Marie Hoffmann

Tutorial on Fast Convolution

  1. Introduction
  2. Convolution
  3. Fourier Transform and Convolution
  4. Max- and P-Norm Convolution
  5. Literature


This is a tutorial on how to implement the convolution operation, or Faltung, on discretized input. We will outline standard, p-norm, and max-norm convolution and see how to utilize the FFT to reduce the computational runtime from quadratic to $\mathcal{O}(n\log n)$ for p-norm and max-norm convolution.

Convolution is a kernel function with various applications in arithmetic, probability theory, signal processing in physics, electrical engineering, and image processing in computer vision. The recent development of convolutional neural networks (see the tutorial on deep learning) for learning image data is one of its most exciting applications.

Convolution
Convolution, also known as Faltung or superposition integral, is a binary operator which measures the amount of overlap of a continuous function $\phi$ as it is shifted over another function $\psi$: $$ \phi *\psi = \int_{-\infty}^{\infty}\phi(\tau)\psi(t-\tau) d\tau $$ For many real-world examples the ranges are finite and $\phi$ and $\psi$ are either unknown or not solvable analytically in an efficient manner. In the first case we may have a set of measured values; in the latter case we can easily transform the signal into a discrete series by sampling from the continuous function. We therefore continue with the discrete version of the convolution. Assume we have two input vectors $L\in\mathbb{C}^m$ and $R\in\mathbb{C}^n$. The standard convolution $*$ at position $m$ is defined as $$ (L*R)[m] \equiv \sum_l L[l]\cdot R[m-l]. $$ Many authors split the above convolution operation at position $m$ logically into two parts. In the first part a vector $u$ is constructed as the point-wise product over every non-zero overlap position of the input vector $L$ and the reversed $R$ (see figure): \begin{equation} u[m] = L[0:m]\cdot R[m:-1:0] \end{equation} In the second part vector $u$ is reduced to a scalar. The reduction operator used gives the convolution its precise name: instead of the sum we may plug in a p-norm or the Chebyshev norm.
Reduction function                                   Convolution type
$\sum_i u_i$                                         standard
$\left\lVert u \right\rVert _p$                      p-norm
$\left\lVert u \right\rVert _{\infty} = \max (u)$    max-norm or Chebyshev norm
We have $|L|+|R|-1$ overlap shift positions and a maximal overlap size of $\min (|L|, |R|)$. Let $n$ be the size of the larger vector; the cost of constructing a single $u[m]$ is $O(n)$. Constructing $u$ for all positions $m\in [0:|L|+|R|-2]$ the naive way is therefore $O(n^2)$. In the standard convolution we compute the scalar product for all overlaps, i.e. we sum up the entries of $u[m]$: \begin{equation} (L*R)[m] = \sum u[m] = \sum_l L[l]\cdot R[m-l] \end{equation} Any linear operation on $u$ will not change the runtime. For the max- and p-norm convolution the addition is replaced by the maximum or the p-norm function. The exact method is to compute a $u$ vector for each overlap shift of $L$ and $R$ and determine the maximum, but even omitting explicit storage of $u$ will not improve the runtime.
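The two-part view above can be sketched as a short NumPy routine (the function name and signature are our own; the reduction operator is passed in, so the same loop yields the standard, max-norm, or p-norm convolution):

```python
import numpy as np

def naive_convolution(L, R, reduce_op=np.sum):
    """Naive O(n^2) convolution: build the overlap vector u for every
    shift position m, then reduce it with the given operator."""
    m_len, n_len = len(L), len(R)
    out = np.empty(m_len + n_len - 1)
    for m in range(m_len + n_len - 1):
        # valid indices l with 0 <= l < |L| and 0 <= m - l < |R|
        lo = max(0, m - n_len + 1)
        hi = min(m_len - 1, m)
        u = np.array([L[l] * R[m - l] for l in range(lo, hi + 1)])
        out[m] = reduce_op(u)
    return out
```

With `reduce_op=np.sum` this reproduces the standard convolution; plugging in `np.max` gives the exact max convolution at the same $O(n^2)$ cost.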

Fourier Transform and Convolution

The convolution theorem states that under suitable conditions the Fourier transform of a convolution is the pointwise product of Fourier transforms: $$ \mathcal{F}\{L*R\} = \mathcal{F}\{L\}\cdot \mathcal{F}\{R\} $$ Applying the inverse of the Fourier transform yields $$ L*R = \mathcal{F}^{-1}\big \{ \mathcal{F}\{L\}\cdot \mathcal{F}\{R\}\big \} $$
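In the discrete setting the theorem holds exactly for the circular convolution of two equal-length vectors; the following snippet (our own illustration, using NumPy) verifies it numerically:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([5.0, 6.0, 7.0, 8.0])
n = len(a)

# circular convolution computed directly from the definition,
# with indices wrapping around modulo n
circ = np.array([sum(a[l] * b[(m - l) % n] for l in range(n))
                 for m in range(n)])

# convolution theorem: the DFT of the circular convolution equals
# the point-wise product of the individual DFTs
lhs = np.fft.fft(circ)
rhs = np.fft.fft(a) * np.fft.fft(b)
assert np.allclose(lhs, rhs)
```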

Together with zero padding to at least $|L|+|R|-1$, rounded up to the next power of 2, we can use the Cooley-Tukey FFT to compute the convolution in $\mathcal{O}(n\log n)$: take an FFT of each padded input, multiply the transforms point-wise, and apply an inverse FFT to the product. Without sufficient padding the FFT yields the circular convolution, in which the overlapping ends wrap around.
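This recipe, padding included, fits in a few lines of NumPy (the function name is our own):

```python
import numpy as np

def fft_convolution(L, R):
    """Linear convolution via FFT in O(n log n): zero-pad both inputs
    to at least |L|+|R|-1 (rounded up to the next power of 2), multiply
    point-wise in the frequency domain, and transform back."""
    size = len(L) + len(R) - 1
    nfft = 1 << (size - 1).bit_length()          # next power of 2
    spectrum = np.fft.fft(L, nfft) * np.fft.fft(R, nfft)
    # discard the imaginary round-off and the padded tail
    return np.real(np.fft.ifft(spectrum))[:size]
```

For real inputs the result agrees with the direct computation (e.g. `np.convolve`) up to floating-point round-off.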

Max- and P-Norm Convolution

In order to utilize the faster runtime of the FFT, note that the FFT always fuses information, in contrast to a maximum entry selection, where all but one entry are discarded. On the one hand we need this fusing property for the FFT-based convolution; on the other hand we want larger entries in $L$ and $R$ to dominate. The trick is to raise the entries to the $p$-th power and revert this by taking the $p$-th root at the end of the computation. Letting $p$ tend to infinity yields exactly the Chebyshev norm or supremum norm: \begin{equation} \max(u) = \lvert\lvert u\rvert\rvert_\infty = \lim_{p\rightarrow\infty}\lvert\lvert u\rvert\rvert_p \end{equation} with $||u||_p$ defined as $(\sum_i u_i^p)^{1/p}$ in the discrete case. We can get close (error in the range of machine precision) to the exact result by choosing a $p^*$ large enough, s.t. $||u||_{p^*} \in \big [ ||u||_\infty-\epsilon; ||u||_\infty+\epsilon\big ]$ (see Pfeuffer et al. on how to choose $p$).

The code below demonstrates how to use the faster p-norm convolution to approximate a max convolution:
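A minimal NumPy sketch of the idea, assuming non-negative inputs; the function name, the default $p$, and the rescaling step (added to avoid overflow when taking large powers) are our own choices:

```python
import numpy as np

def max_convolution(L, R, p=32):
    """Approximate max convolution of non-negative vectors: raise the
    entries to the p-th power, run one FFT convolution, and take the
    p-th root. Larger p brings the p-norm closer to the maximum."""
    L, R = np.asarray(L, dtype=float), np.asarray(R, dtype=float)
    # rescale to [0, 1] so that x**p does not overflow
    scale = max(L.max(), R.max())
    Lp, Rp = (L / scale) ** p, (R / scale) ** p
    size = len(L) + len(R) - 1
    nfft = 1 << (size - 1).bit_length()
    conv = np.real(np.fft.ifft(np.fft.fft(Lp, nfft) * np.fft.fft(Rp, nfft)))[:size]
    # clip FFT round-off below zero, undo the power, undo the rescaling
    # (scale**2 because each u-entry is a product of two scaled values)
    return np.maximum(conv, 0.0) ** (1.0 / p) * scale ** 2
```

For $L=(1,2)$ and $R=(3,4)$ the exact max convolution is $(3, 6, 8)$; the approximation overshoots each entry by at most a factor of $k^{1/p}$, where $k \le \min(|L|,|R|)$ is the overlap size.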