### Tutorial on Fast Convolution

- Introduction
- Convolution
- Fourier Transform and Convolution
- Max- and P-Norm Convolution
- Literature

This is a tutorial on how to implement the convolution operation, or *faltung*, on discretized input. We will outline standard, p-norm, and max-norm convolution and see how to utilize the FFT to reduce the computational runtime from quadratic to $\mathcal{O}(n\log n)$ for p-norm and max-norm convolution.

Convolution is a binary operation with various applications in arithmetic, probability theory, signal processing in physics and electrical engineering, and image processing in computer vision. The recent development of convolutional neural networks (see the tutorial on deep learning) for learning from image data is one of its most exciting applications.

Convolution, also known as *faltung* or superposition integral, is a binary operator which measures the amount of overlap of a continuous function $\phi$ as it is shifted over another function $\psi$:
$$
\phi *\psi = \int_{-\infty}^{\infty}\phi(\tau)\psi(t-\tau) d\tau
$$
For many real-world examples the ranges are finite, and $\phi$ and $\psi$ are either unknown in closed form or not solvable analytically in an efficient manner. In the first case we may have a set of measured values; in the latter case we can easily transform the signal into a discrete series by sampling the continuous function.
We therefore continue with the discrete version of the convolution. Assume we have two discrete input vectors $L$ and $R$ over $\mathbb{C}$; the standard convolution $*$ is defined as
$$
(L*R)[m] \equiv \sum_l L[l]\cdot R[m-l].
$$
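As a point of reference, the discrete definition above can be implemented directly. The following sketch (the function name is an illustrative choice, and NumPy is assumed) loops over all output positions and all index pairs that fall inside both vectors:

```python
import numpy as np

def conv_direct(L, R):
    """Naive O(m*n) direct convolution of two 1-D sequences."""
    L, R = np.asarray(L, dtype=float), np.asarray(R, dtype=float)
    m, n = len(L), len(R)
    out = np.zeros(m + n - 1)
    for k in range(m + n - 1):
        # only index pairs (l, k-l) that lie inside both vectors contribute
        for l in range(max(0, k - n + 1), min(k, m - 1) + 1):
            out[k] += L[l] * R[k - l]
    return out
```

The double loop makes the quadratic cost discussed below explicit; the inner range restricts $l$ so that both $L[l]$ and $R[k-l]$ are valid indices.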
Many authors split the above convolution operation at position $m$ logically into two parts. In the first part a vector $u[m]$ is constructed as the point-wise product of every non-zero overlap position of input vector $L$ and reversed $R$.
\begin{equation}
u[m] = L[0:m]\cdot R[m:-1:0]
\end{equation}
In the second part the vector $u[m]$ is reduced to a scalar. The reduction operator used gives the convolution its precise name: instead of the sum we may plug in a p-norm or the Chebyshev norm.

| Reduction function | Convolution type |
| --- | --- |
| $\sum_i u_i$ | standard |
| $\left\lVert u \right\rVert _p$ | p-norm |
| $\left\lVert u \right\rVert _{\infty} = \max (u)$ | max-norm or Chebyshev norm |
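This two-part view can be sketched as follows: a helper builds the overlap vector $u[m]$, and a pluggable reduction turns it into a scalar. Function names and the NumPy dependency are illustrative choices, not part of the original text:

```python
import numpy as np

def overlap_vector(L, R, k):
    """Point-wise product of the entries of L and reversed R that
    overlap at shift position k (the vector u[k] from the text)."""
    L, R = np.asarray(L, dtype=float), np.asarray(R, dtype=float)
    m, n = len(L), len(R)
    lo, hi = max(0, k - n + 1), min(k, m - 1)
    return L[lo:hi + 1] * R[k - np.arange(lo, hi + 1)]

def convolve_reduce(L, R, reduce=np.sum):
    """Convolution with a pluggable reduction: np.sum gives the
    standard convolution, np.max the max-norm variant, and
    lambda u: np.linalg.norm(u, p) a p-norm variant."""
    size = len(L) + len(R) - 1
    return np.array([reduce(overlap_vector(L, R, k)) for k in range(size)])
```

Swapping `np.sum` for `np.max` changes only the reduction, not the quadratic cost of building every overlap vector.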

We have $|L|+|R|-1$ overlap shift positions and a maximal overlap size of $\min (|L|, |R|)$. Let $n$ be the size of the larger vector; the cost of constructing a single $u[m]$ is $O(n)$. Constructing $u[m]$ for all positions $m\in [0:|L|+|R|-2]$ the naive way is therefore $O(n^2)$.
In the standard convolution we compute the scalar product for all overlaps, i.e. we sum up the entries of $u[m]$:
\begin{equation}
(L*R)[m] = \sum_i u[m]_i = \sum_l L[l]\cdot R[m-l]
\end{equation}
Any linear operation on $u$ will not change the asymptotic runtime. For the max- and p-norm convolution, the addition is replaced by the maximum or the p-norm function.
The exact method is to compute a vector $u[m]$ for each overlap shift of $L$ and $R$ and determine its maximum. But even omitting the explicit storage of $u$ will not help us improve the runtime.

The convolution theorem states that under suitable conditions the Fourier transform of a convolution is the pointwise product of Fourier transforms:
$$
\mathcal{F}\{L*R\} = \mathcal{F}\{L\}\cdot \mathcal{F}\{R\}
$$
Applying the inverse of the Fourier transform yields
$$
L*R = \mathcal{F}^{-1}\big \{ \mathcal{F}\{L\}\cdot \mathcal{F}\{R\}\big \}
$$

The circular convolution of two vectors is found by taking an FFT of each and applying an inverse FFT to their point-wise product. Zero padding both vectors to at least $|L|+|R|-1$ entries (typically rounded up to the next power of 2) prevents wrap-around, so the circular convolution coincides with the linear one, and the Cooley-Tukey FFT computes it in $\mathcal{O}(n\log n)$.
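A minimal sketch of this FFT-based convolution for real-valued inputs, assuming NumPy's `fft` module (the function name is illustrative):

```python
import numpy as np

def conv_fft(L, R):
    """Linear convolution via the convolution theorem in O(n log n).
    Zero padding to at least len(L)+len(R)-1, rounded up to the next
    power of 2, avoids the wrap-around of the circular convolution."""
    size = len(L) + len(R) - 1
    nfft = 1 << (size - 1).bit_length()          # next power of 2 >= size
    spec = np.fft.fft(L, nfft) * np.fft.fft(R, nfft)
    # for real inputs the imaginary part is pure rounding noise
    return np.real(np.fft.ifft(spec))[:size]
```

`np.fft.fft(x, nfft)` zero-pads `x` to length `nfft` automatically, so no explicit padding step is needed.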

In order to utilize the faster runtime of the FFT, note that the FFT always fuses information, in contrast to a maximum entry selection, where information gets lost. On the one hand we need this feature of the FFT, on the other hand we want larger entries in $L$ and $R$ to dominate. The trick is to raise the entries to a power $p$ and revert this at the end of the computation. The limit of maximizing $p$ is exactly the Chebyshev norm or supremum norm:
\begin{equation}
\max(u) = \lVert u\rVert_\infty = \lim_{p\rightarrow\infty}\lVert u\rVert_p
\end{equation}
with $\lVert u\rVert_p$ defined as $\big(\sum_i |u_i|^p\big)^{1/p}$ in the discrete case. We can get close to the exact result (with an error in the range of machine precision) by choosing a $p^*$ large enough such that $\lVert u\rVert_{p^*} \in \big [ \lVert u\rVert_\infty-\epsilon, \lVert u\rVert_\infty+\epsilon\big ]$ (see Pfeuffer et al. on how to choose $p$).

The code below demonstrates how to use the faster p-norm convolution to approximate a max convolution:
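A minimal sketch of this idea, assuming non-negative real inputs and NumPy; the function name and the fixed default $p$ are illustrative choices, not the tuned $p^*$ from Pfeuffer et al.:

```python
import numpy as np

def max_conv_approx(L, R, p=12):
    """Approximate max-norm convolution in O(n log n): raise the
    (non-negative) inputs to the p-th power, run a standard FFT
    convolution, and take the p-th root. For large p the p-norm of
    each overlap vector approaches its maximum entry. Larger p is
    more accurate until floating-point range and FFT noise interfere."""
    L, R = np.asarray(L, dtype=float), np.asarray(R, dtype=float)
    size = len(L) + len(R) - 1
    nfft = 1 << (size - 1).bit_length()          # next power of 2 >= size
    spec = np.fft.fft(L ** p, nfft) * np.fft.fft(R ** p, nfft)
    s = np.real(np.fft.ifft(spec))[:size]
    # clamp tiny negative FFT noise before taking the root
    return np.maximum(s, 0.0) ** (1.0 / p)
```

Each output entry is $\big(\sum_l (L[l]\,R[m-l])^p\big)^{1/p}$, i.e. the p-norm of $u[m]$, which upper-bounds and converges to $\max(u[m])$ as $p$ grows.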

#### Literature