fast 1d convolution algorithms - drone research
TRANSCRIPT
![Page 1: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/1.jpg)
P. Giannakeris, P. Bassia, N. Kilis, R.T. ChadoulisProf. Ioannis Pitas
Aristotle University of Thessaloniki
www.aiia.csd.auth.grVersion 2.6
Fast 1D Convolution
Algorithms
![Page 2: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/2.jpg)
Outline
β’ Signals and systems
β’ 1D convolutions
Linear & Cyclic 1D convolutions
Discrete Fourier Transform, Fast Fourier Transform
Winograd algorithm
Nested convolutions
Block convolutions.
β’ Machine learning
Convolutional neural networks.
![Page 3: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/3.jpg)
Motivationβ’ Machine learning
Fast implementation of 1D/2D/3D convolutions in
Convolutional Neural Networks (CNNs).
β’ Fast implementation of 1D digital filters
1D signal filtering (e.g., audio/music, ECG, EEG)
1D Signal feature calculation
β’ Fast implementation of 1D correlation
1D template matching
Time-of-flight (distance) calculation (e.g., sonar)
![Page 4: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/4.jpg)
Motivationβ’ Fast implementation of 2D/3D convolutions
Image/video filtering
Image/video feature calculation
β’ Gabor filters
β’ Spatiotemporal feature calculation
β’ Fast implementation of 2D correlation
Template matching
Correlation tracking
![Page 5: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/5.jpg)
1D Linear Systems
β’ Linearity:
π ππ₯1 + ππ₯2 = ππ π₯1 + ππ π₯2 .
β’ Shift-Invariance:
π¦(π) = π[π₯(π)] β
π¦ π βπ = π π₯ π βπ .
LSI system convolution : π¦ π = β π β π₯ π .
π[π₯(π)]π₯(π) π¦(π)
![Page 6: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/6.jpg)
1D Linear Systems
![Page 7: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/7.jpg)
Linear 1D convolution
β’ The one-dimensional (linear) convolution of:
β’ an input signal π₯ of length πΏ and
β’ a convolution kernel β (filter mask, finite impulse response) of length π:
π¦ π = β π β π₯ π =
π=0
πβ1
β π π₯ π β π .
β’ For a convolution kernel centered around 0 and π = 2π£ + 1,
convolution takes the form:
π¦ π = β π β π₯ π =
π=βπ£
π£
β π π₯ π β π .
![Page 8: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/8.jpg)
Linear 1D convolution
β’ The one-dimensional (linear) convolution is a linear operator.
β’ Signals π₯, β can be interchanged:
π¦ π = β π β π₯ π = π₯ π β β π .
β’ It can model any linear shift-invariant system.
β’ It has a geometrical interpretation:
β’ Signal is π₯ π flipped around π = 0 producing π₯ βπ .
β’ Flipped signal π₯ βπ is shifted by π samples, producing π₯ π β π .
β’ Products β π π₯ π β π , π = 0,β¦ ,π β 1 are calculated and summed.
β’ This is repeated for other temporal shifts π.
![Page 9: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/9.jpg)
Linear 1D convolution β Example
![Page 10: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/10.jpg)
Linear 1D convolution β Example
![Page 11: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/11.jpg)
Linear 1D convolution β Example
![Page 12: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/12.jpg)
Linear 1D convolution β Example
![Page 13: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/13.jpg)
Linear 1D correlation
β’ Correlation of template signal β and input signal π₯ π (inner
product):
πβπ₯ π =
π=0
πβ1
β π π₯ π + π = πππ π .
β’ π = β 0 ,β¦ , β π β 1 π: template vector.
β’ π π = [π₯ π , ..., π₯ π + Ξ β 1 ]Ξ€ βΆ local signal vector.
![Page 14: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/14.jpg)
Linear 1D correlation
Differences from convolution:
β’ π₯ π is not flipped around π = 0 .
β’ It is often confused with convolution: they are identical only
if β is centered at and is symmetric about π = 0.
β’ It is used for:
β’ 1D template matching;
β’ Time-delay estmation.
![Page 15: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/15.jpg)
Linear 1D convolution: matrix
notation
β’ Vectorial convolution input/output, kernel representation:
π± = π₯ 0 ,β¦ , π₯(πΏ β 1) π: the first input vector.
π‘ = β 0 ,β¦ , β(π β 1) π: the second input vector.
π² = π¦ 0 ,β¦ , π¦ π β 1 π: the output vector, with π = πΏ +π β 1.
β’ 1D linear convolution between two discrete signals π±, π‘ can be
expressed as the matrix-vector product:π² = ππ±,
where π is a π Γ πΏ matrix.
![Page 16: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/16.jpg)
Linear 1D convolution: matrix
notation
β’ π : a π Γ πΏ band matrix of the form:
π― =
β 0 0 β― 0β(1) β(0) β― β―
β― β― β― 0β(πβ1) β πβ1 β― β(0)
0 0 β― β(1)β― β― β― β―
0 0 β― β(πβ1)
.
β’ Alternative matrix notation: π² = ππ‘, where π is an N Γπ matrix.β’ Fast calculation of the product π² = ππ± using BLAS/cuBLAS.
![Page 17: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/17.jpg)
Optimal algorithms: minimal
multiplicative complexityβ’ As multiplications are more expensive than additions, optimal algorithms
having minimal multiplicative complexity have been designed.
β’ Example: Complex number multiplication normally involves four
multiplications and two additions:
π + πΌπ = π + ππ π + ππ = ππ β ππ + ππ + ππ π.β’ Gauss trick: a way to reduce the number of multiplications to three
(optimal algorithm): ππ, ππ, π + π π + π ,as: πΌ = (π + π)(π + π) β ππ β ππ.
β’ Other formulation:
π1 = π π + π , π2= π π β π , π3 = π π + π ,π = π1 β π3,πΌ = π1 + π2.
![Page 18: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/18.jpg)
Optimal Linear Convolution
Algorithmsβ’ Winograd theory for optimal linear convolution algorithms:
β’ Minimal number of multiplications.
β’ The computation of the first π outputs π¦ 0 ,β¦ , π¦(π β 1) of an π-tap filter
can be written as:
π¦ 0
π¦ 1β―
π¦(π β 1)
=
π₯ 0 π₯ 1 β― π₯(πβ1)
π₯ 1 π₯ 2 β― π₯(π)β― β― β―
π₯(πβ1) π₯ π β― π₯ π+πβ2
β 0β 1β―
β πβ1
.
β’ Notation consistent with literature.
![Page 19: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/19.jpg)
Optimal Linear Convolution
Algorithms
β’ Linear convolution can be written as a product π² = ππ‘, where π is an π Γπ matrix:
πππ = π₯(π + π) , 0 β€ π β€ π β 1, 0 β€ π β€ π β 1.
β’ Example: for input π₯ π with size π = 4, kernel β π with size π = 3 and
output π¦ π with size π = 2, the matrix vector product:
π¦ 0
π¦ 1=
π₯ 0 π₯ 1 π₯ 2π₯ 1 π₯ 2 π₯ 3
β 0β 1β 2
,
requires 2 Γ 3 = 6 multiplications and 4 additions.
![Page 20: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/20.jpg)
Optimal Linear Convolution
Algorithmsβ’ Example: optimal Winograd linear convolution algorithm.
β’ 8 additions and 4 multiplications.
β’ 3 additions on β can be pre-calculated:
π¦ 0
π¦ 1=
π₯ 0 π₯ 1 π₯ 2π₯ 1 π₯ 2 π₯ 3
β 0β 1β 2
=π1 +π2 +π3
π2 βπ3 βπ4,
where:
π1 = π₯ 0 β π₯ 2 β 0 , π2= π₯ 1 + π₯ 2β 0 +β 1 +β 2
2,
π4 = π₯ 1 β π₯ 3 β 2 , π3 = π₯ 2 β π₯ 1β 0 ββ 1 +β 2
2.
![Page 21: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/21.jpg)
Optimal Linear Convolution
Algorithms
Intermediate addition result is used 2 times.
![Page 22: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/22.jpg)
Optimal Linear Convolution
Algorithmsβ’ Winograd linear convolution algorithm requires π + π β 1multiplications,
π and π: lengths of π¦ and β, respectively.
β’ General form of optimal Winograd linear convolution algorithms:
π² = πT ππ‘ β πππ± ,
β indicates element-wise π + π β 1multiplications.
π±, π‘, π²: input signal, filter coefficient and output signal vectors.
![Page 23: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/23.jpg)
Optimal Linear Convolution
Algorithms-ExampleInput signal, filter coefficient, output signal vectors for π = 3 and π = 2 :
π± = [π₯ 0 π₯ 1 π₯ 2 π₯ 3 ] π, π‘ = [β 0 β 1 β 2 ] π , π² = [π¦ 0 π¦ 1 ] π.
The involved matrices are:
πT =
1 00 1
β1 01 0
0 β10 1
1 00 β1
, π =
1 0 0Ξ€1 2 Ξ€1 2 Ξ€1 2Ξ€1 2 Ξ€β1 2 Ξ€1 20 0 1
,
πT =1 1 1 00 1 β1 β1
.
![Page 24: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/24.jpg)
Cyclic 1D convolution
β’ One-dimensional cyclic convolution of length π :
π¦ π = π₯ π β β π =
π=0
πβ1
β π π₯ π β π π ,
(π)π= π mod π.
β’ It is of no use in modeling linear systems.
β’ Important use: Embedding linear convolution in a fast cyclic convolution
π¦ π = π₯ π β β π of length π β₯ πΏ +π β 1 and then performing a cyclic
convolution of length π.
![Page 25: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/25.jpg)
Linear convolution embeddingβ’ Zero-padding signal vector π± to length π: π±p = π₯ 0 ,β¦ , π₯ π β 1 , 0, β¦ , 0 π.
β’ Zero-padding convolution kernel vector π‘ to length π: π‘p = β 0 ,β¦ , β πΏ β 1 , 0,β¦ , 0 π.
β’ Linear convolution embedding in a cyclic convolution of length π = πΏ +π β 1:
![Page 26: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/26.jpg)
Cyclic 1D convolution - Example
Cyclic convolution of
π₯(π) = [1 2 0] and
β π = 3 5 4 .
Clock-wise
Anticlock-wise
Folded sequence
0 π ππππ 1 π πππ 2 π ππππ
1
20
3
5 4
3
5 4 3
5
4
4
3 50 0 02 22
1 1 1
π¦ 0 = 1 Γ 3 + 2 Γ 4 + 0 Γ 5 π¦ 1 = 1 Γ 5 + 2 Γ 3 + 0 Γ 4 π¦ 2 = 1 Γ 4 + 2 Γ 5 + 0 Γ 3
![Page 27: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/27.jpg)
Cyclic 1D convolution: matrix
notationβ’ Cyclic convolution definition as matrix-vector product:
π² = ππ±,
where:
π± = π₯ 0 ,β¦ , π₯ π β 1 π: the input vector.
π² = π¦ 0 ,β¦ , π¦ π β 1 π: the output vector.
π : a N Γ N Toeplitz matrix of the form:
π― =
β(0) β(1) β(2)β(π β 1) β(0) β(1)β(π β 2) β(π β 1) β(0)
β―β―β―
β(π β 1)β(π β 2)β(π β 3)
β― β― β― β― β―β(1) β(2) β(3) β― β(0)
.
![Page 28: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/28.jpg)
Cyclic Convolution via DFT
β’ Cyclic convolution calculation using 1D Discrete Fourier Transform (DFT):
π² = πΌπ·πΉπ π·πΉπ π± βπ·πΉπ π‘ .
β’ Fast calculation of DFT, IDFT through an FFT algorithm.
π·πΉπ
π·πΉπ
πΌπ·πΉπ
π₯(π)
β(π)
π(π)
π»(π)
Y(π) π¦(π)
![Page 29: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/29.jpg)
1D FFT
β’ There are various FFT algorithms to speed up the calculation of DFT.
β’ The best known is the radix-2 decimation-in-time (DIT) Fast FourierTransform (FFT) (Cooley-Tuckey).
1. DFT of a sequence π₯ π of length π (π = 0,β¦ ,π β 1):
π π = Οπ=0πβ1 π₯ π πβ
2ππ
πππ , π = 0,β¦ ,π β 1.
π-th complex roots of unity: πππ = πβ
2ππ
ππ, π = 0,β¦ ,π β 1.
![Page 30: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/30.jpg)
1D FFT2. The Radix-2 DIT algorithm rearranges the DFT:
π(π) =
π=0
π2β1
π₯ 2π πβ2πππ2
ππ
π·πΉπ ππ ππ£ππβπππππ₯ππ ππππ‘ ππ π₯(π)
+πβ2ππππ
π=0
π2β1
π₯ 2π + 1 πβ2πππ2
ππ
π·πΉπ ππ πππβπππππ₯ππ ππππ‘ ππ π₯(π)
= πΈ(π) + πβ2πππππ(π).
3. The complex exponential is periodic which means that:
π(π +π
2) =
π=0
π2β1
π₯ 2π πβ2ππΞ€π 2π(π+
π2) +πβ
2πππ (π+
π2)
π=0
π2β1
π₯ 2π + 1 πβ2ππΞ€π 2π(π+
π2)
=
π=0
π2β1
π₯ 2π πβ2ππΞ€π 2πππβ2πππ +πβ
2πππ π πβππ
π=0
π2β1
π₯ 2π + 1 πβ2ππΞ€π 2πππβ2πππ
=
π=0
π2β1
π₯ 2π πβ2ππΞ€π 2ππ
βπβ2πππ π
π=0
π2β1
π₯ 2π + 1 πβ2ππΞ€π 2ππ
= πΈ(π) β πβ2πππ ππ(π).
![Page 31: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/31.jpg)
1D FFT
4. We can rewrite π as :
π π = πΈ π + πβ2πππ ππ π .
π π +π
2= πΈ π β πβ
2πππ ππ π .
5. By inserting a recursion, we can compute in that way π(π) and π(π +π
2) as well.
The algorithm gains its speed by re-using the results of intermediate computations to
compute multiple DFT outputs.
![Page 32: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/32.jpg)
1D FFTβ’ radix-2 FFT breaks a length-N DFT into many size-2 DFTs called
"butterfly" operations.
β’ There are log2N FFT
stages.
![Page 33: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/33.jpg)
Radix-2 1D FFT properties
β’ DFT length π should be a power of 2.
β’ The nice FFT structure is based on the properties of the π-th
complex roots of unity πππ = πβ
2ππ
ππ, π = 0,β¦ , π β 1.
β’ Computational complexity of the 1D FFT is π ππππ2π .
β’ Computational complexity of the 1D FFT is π(π2).
β’ Other FFT algorithms: Radix-4 FFT (π is power of 4),
Prime Factor Algorithm (PFA) π = ππ, π, π co-prime numbers.
![Page 34: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/34.jpg)
Cyclic 1D convolution
computation
β’ Its computation by definition requires:
β’ Two nested for loops
β’ Computational complexity (number of multiplications) is π(π2).
β’ Computational complexity through two 1D FFTs and one inverse 1D
FFT is π ππππ2π .
β’ Fast linear convolution calculation through cyclic convolution
embedding.
![Page 35: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/35.jpg)
π transform
π π§ =
π=0
πβ1
π₯ π π§βπ .
The π transform of a discrete signal π₯(π) having domain [0, β¦ , π] is
given by:
The domain of π transform is the complex plane, since π§ is a complex
number.
Convolution property of the π transform (polynomial product π(π§)π»(π§)):
π¦(π) = π₯(π) β β(π) β π(π§) = π(π§)π»(π§).
![Page 36: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/36.jpg)
Cyclic convolution and π transform
π¦ π = π₯ π β β π =
π=0
πβ1
β π π₯ π β π π ,
where : (π)π = π mod π.
π¦ π = π₯ π β β π < => π π§ = π π§ π» π§ mod π§π β 1.
Polynomial product form of the 1D cyclic convolution:
![Page 37: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/37.jpg)
Convolutions as polynomial
products-examplesβ’ 1D linear convolution π¦ π = β π β π₯ π of length πΏ = π = 3:
β’ 1D cyclic convolution π¦ π = π₯ π β β πof length π = 3:
π π§ = π π§ π» π§ mod π§3 β 1= π₯ 0 + π₯ 1 π§ + π₯ 2 π§2 β 0 + β 1 π§ + β 2 π§2 mod π§3 β 1.
Factorization: π§3 β 1 = π1 π§ π2 π§ = (π§ β 1)(π§2 + π§ + 1)(π£ = 2 factors).
π π§ = π π§ π» π§ = π₯ 0 + π₯ 1 π§ + π₯ 2 π§2 β 0 + β 1 π§ + β 2 π§2 .
![Page 38: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/38.jpg)
Chinese Remainder Theorem (CRT)If we have a polynomial Y(z) and:
π π§ = π§π β 1, where the Greatest Common Divider πΊπΆπ· ππ π§ , ππ π§ = 1 for π β π.
π π§ =ΰ·
π=1
Ξ½
ππ π§ .
ππ π§ = π π§ mod ππ(π§).
0 < πππ π π§ β€ Οπ=1π πππ[ππ(π§)] , where deg is the degree of a polynomial.
π π π§ = πππ mod ππ(π§), 1 β€ π β€ π.
Then the polynomial π π§ is given by: π π§ = Οπ=1Ξ½ π π π§ ππ π§ mod π§π β 1
![Page 39: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/39.jpg)
From CRT to Winograd
convolution algorithm
π(π§) = π»(π§)π(π§) mod π§π β 1.
A 'divide-and-conquer' approach can be used for creating fast convolution
algorithms, by factorizing polynomial π§π β 1 to its prime factors ππ π§ .
Then, we calculate first the items:
ππ π§ = π π§ mod ππ π§ , π = 1,β¦ , ππ¨π π§ = π¨ π§ mod ππ π§ , π = 1,β¦ , π
and their products:
ππ π§ = ππ π§ π»π π§ mod ππ π§ , π = 1,β¦ , π.
CRT helps us to create fast convolution algorithms by embedding linear
convolutions in cyclic ones:
![Page 40: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/40.jpg)
Winograd algorithm
Fast 1D cyclic convolution with
minimal complexityβ’ Winograd convolution algorithms or fast
filtering algorithms:
π² = π ππ±β¨ππ‘ .
β’ They require only 2π β π£ multiplications in
their middle vector product, thus having
minimal complexity.
β’ π: number of cyclotomic polynomial
factors of polynomial π§π β 1 over the
rational numbers π.
![Page 41: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/41.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3
( )( )3
3 3 2
1 3
/
1 ( ) 1 ( ) ( ) 1 1 1N
N
d
c N
z c z z c z c z z z z z=
β = β = β = β + +
( ) ( )3
3( ) ( ) ( ) mod 1 ( ) ( ) ( ) mod 1N
NY z H z X z z Y z H z X z z=
= β = β
1 232
0 1 2
0 0
( ) ( ) ( )N N
n n
n n
n m
X z x z X z x z X z x x z x zβ =
= =
= = = + +
1 232
0 1 2
0 0
( ) ( ) ( )N N
n n
n n
n m
H z h z H z h z H z h h z h zβ =
= =
= = = + +
( )( )2( ) ( ) ( ) mod 1 1Y z H z X z z z z= β + +
I. Definitions:
![Page 42: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/42.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3
( ) ( ) ( )1
2
1 0 1 2 1 0 1 2mod mod 1i
i iX X z P X x x z x z z X x x x=
= = + + β = + +
( ) ( ) ( ) ( )2
2 2
2 0 1 2 2 0 2 1 2mod 1i
X x x z x z z z X x x x x z=
= + + + + = β + β
( ) ( )1 0 1 2 0 2 0 2 1 2 1 2Similarly for H : and i H h h h b H h h h h z b b z= + + = = β + β = +
0 1 2 3 1 0 2 2 1 2Let , and :a x x x a x x a x x= + + = β = β 1 0 2 1 2 and X a X a a z= = +
II. Computing the reduced Polynomials ππ , π»π:
![Page 43: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/43.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3
( ) ( ) ( ) ( )1
1 0 0 1 0 0mod ( ) mod 1i
i i i iY X z H z P z Y a b z Y a b=
= = β =
( ) ( )
( ) ( ) ( )( ) ( )
2
2 2 2 2
2
2 1 2 1 2
2 1 1 2 2 1 2 2 1 2 2
mod ( )
mod 1
i
Y X z H z P z
Y a a z b b z z z
Y a b a b z a b a b a b
=
=
= + + + +
= β + + β
( ) ( )1 2 2 1 2 2 1 2 1 2 1 1 2One can see, that: , thus, Y becomes: a b a b a b a a b b a b+ β = β β β +
( ) ( ) ( )2 1 1 2 2 1 2 1 2 1 1Y a b a b z a a b b a b= β + β β β +
( ) ( )0 0 0 1 1 1 2 2 2 1 2 1 2 1 1Let c , c and ca b a b a b a a b b a b= = β = β β β +
1 0 2 1 2 and Y c Y c c z= = +
III. Computing the products ππ:
6 multiplications
4 multiplications
![Page 44: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/44.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3
( ) ( ) 233
1 1 1
3 3 1
3 3
ppp c z c z z z
R R Rp
=β β β β β= = =
( ) ( ) ( ) ( )0 1 2
2 32
0 1 2 0 1 2 0 1 2
1
1mod 1 2 2
3
NN
i i
iy y y
Y RY z Y c c c c c c z c c c z=
=
= β = = + β + β + + β β
IV. Using the Chinese Remainder Theorem (CRT) to Compute π:
( ) ( ) 233
2 2 2
1
3 3
ppc z c z z z
R R Rp
= + += = =
![Page 45: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/45.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3V. Finding Matrices π΄ and π΅:
0 0 1 20
0
1 0 21
1
2 1 22
2
3 0 11 2
1 1 1
1 0 1
0 1 1
1 1 0
x x x xax
x x xaA A x A
x x xax
x x xa a
+ + β β = = = β β
ββ β
Similarly, we get exactly the same result for matrix π΅!
![Page 46: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/46.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3
( )
( ) ( )
( ) ( )
( ) ( )
1 0 1 2 0 0 1 1 2 2 1 2 1 2
2 0 1 2 0 0 1 1 2 2 1 2 1 23 1
3 0 1 2 0 0 1 1 2 2 1 2 1 2
2 2
2 2
2
y c c c a b a b a b a a b b
Y y c c c a b a b a b a a b b
y c c c a b a b a b a a b b
+ β + β + β β
= = β + = + + β β β β β β + + β β
( ) ( ) ( )
( )( )
0 0
0 0
1 1
1 14 1
2 2
2 2
1 2 1 2
1 1 1 1 1 1
1 0 1 1 0 1
0 1 1 0 1 1
1 1 0 1 1 0
a bx h
a bM A x B h x h
a bx h
a a b b
β β = = = β β β ββ β
VI. Finding matrix πΆ:
![Page 47: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/47.jpg)
Example: Winograd Cyclic
convolution Algorithm for N=3
( ) ( ) ( )
( ) ( )
( ) ( )
( ) ( )( )( )
3 1 3 4 4 1
0 0
0 0 1 1 2 2 1 2 1 2 00 01 02 03
1 1
0 0 1 1 2 2 1 2 1 2 10 11 12 13
2 2
0 0 1 1 2 2 1 2 1 2 20 21 22 23
1 2 1 2
2
2
2
1 1 2 1
1 1
Y C M
a ba b a b a b a a b b C C C C
a ba b a b a b a a b b C C C C
a ba b a b a b a a b b C C C C
a a b b
C
=
+ β + β β
+ + β β β = β + + β β β β
β
= 1 2
1 2 1 1
β β
VI. Finding matrix πΆ:
Theoretically minimal number of multiplications: π!
![Page 48: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/48.jpg)
Winograd algorithm version
with minimal computational
complexity β’ Can be equivalently expressed as:
π² = ππT ππ±β¨πTππ‘ .
β’ Matrices π, π typically have elements 0,+1,β1.
β’ Multiplications πTππ‘, ππTπ²β² are done only by
additions/subtractions.
β’ π is an π Γ π permutation matrix.
β’ πh can be precomputed.
![Page 49: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/49.jpg)
Block diagram: Winograd Cyclic
convolution Algorithm for N=3
![Page 50: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/50.jpg)
Block diagram: Winograd Cyclic
convolution Algorithm for N=3
![Page 51: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/51.jpg)
Winograd algorithm
Fast 1D cyclic convolution with
minimal complexity
β’ Winograd algorithm works on small blocks of the input signal.
β’ The input block and filter are transformed.
β’ The outputs of the transform are multiplied together in an element-
wise fashion.
β’ The result is transformed back to obtain the outputs of the
convolution.
β’ GEneral Matrix Multiplication (GEMM) BLAS or CUBLAS routines can
be used.
![Page 52: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/52.jpg)
Nested convolutions
β’ Winograd algorithms exist for relatively short convolution lengths,
e.g.: π = 3, 5, 7.
β’ Use of efficient short-length convolution algorithms iteratively to
build long convolutions.
β’ Does not achieve minimal multiplication complexity.
β’ Good balance between multiplications and additions.
Decomposition of 1D convolution into a 2D convolution:
β’ 1D convolution of length: π = π1π2β’ with π1, π2 co-prime integers, π1, π2 = 1
β’ results into a 2D π1 Γ π2 convolution.
![Page 53: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/53.jpg)
Nested convolutions
β’ Remember circular convolution definition:
π¦(π) =
π=0
πβ1
β π π₯( π β π π), π = 0, . . , π β 1.
β’ If π = π1π2, π and π can be analyzed as:
π β‘ π1π2 + π2π1 mod π1π2 π1, π1 = 0,β¦ ,π1 β 1.π β‘ π1π2 + π2π1 mod π1π2 π2, π2 = 0,β¦ ,π2 β 1.
β’ 1D circular convolution becomes now a 2D π1 Γ π2 convolution:
π¦(π1π2 + π2π1) =
π1=0
π1β1
π2=0
π2β1
β π1π2 +π2π1 π₯(π1(π2βπ2) + π2(π1 β π1)).
![Page 54: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/54.jpg)
Nested convolutions
β’ The number of multiplications required are π1π2, where, π1 and
π2 is the number of multiplications for convolutions π1 and π2accordingly.
β’ The same applies for the reverse decomposition: π2 Γ π1.
β’ However, the number of additions varies over the nesting order :π1 Γ π2: π΄1π2 +π1π΄2π2 Γ π1: π΄2π1 +π2π΄1,
with π΄1 and π΄2 being the number of additions required for π1 and π2accordingly.
β’ Additions must be calculated beforehand, so as to choose the
nesting order.
![Page 55: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/55.jpg)
Nested convolutionsFor example, lets assume we have the circular convolution
of length π = 12 and π1 = 3,π2 = 4 :
then,π1, π1 = 0,1,2π2, π2 = [0, 1, 2, 3]
and
π0 π§ = π₯0 + π₯3π§ + π₯6π§2 + π₯9π§
3
π1 π§ = π₯4 + π₯7π§ + π₯10π§2 + π₯1π§
3
π2 π§ = π₯8 + π₯11π§ + π₯2π§2 + π₯5π§
3
π»0 π§ = β0 + β3π§ + β6π§2 + β9π§
3
π»1 π§ = β4 + β7π§ + β10π§2 + β1π§
3
π»2 π§ = β8 + β11π§ + β2π§2 + β5π§
3
![Page 56: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/56.jpg)
Nested convolutions
π and π can be defined now as 3Γ4 matrices
with the polynomial terms as elements:
πΏ =
π₯0 π₯3 π₯6 π₯9π₯4 π₯7 π₯10 π₯1π₯8 π₯11 π₯2 π₯5
π― =
β0 β3 β6 β9β4 β7 β10 β1β8 β11 β2 β5
![Page 57: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/57.jpg)
Nested convolutions
β’ We use Winograd 3-point circular convolution C, A, and B matrices at
the outer nesting layer:
π² = π ππβ¨ππ
β’ The matrix-vector products ππ and ππ, are replaced by matrix
multiplication ππ and ππ respectively, each one resulting to a 4Γ4
matrix.
β’ We use a fast 4-point Winograd routine to calculate the 4 circular
convolutions ππ β¨ ππ (one per row). The 4-point routine is βnestedβ
inside the 3-point routine.
β’ The correct order of the elements in the result can be found based on
the nesting formulation:
π¦π1π2+π2π1 .
![Page 58: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/58.jpg)
Block Based convolutions
β’ Input signal x π is split in overlapping/non-overlapping blocks.
β’ Blocks are convolved independently.
β’ Great parallelism is achieved.
β’ Two block-based convolution methods:
β’ Overlap-add method
β’ Overlap-save method.
![Page 59: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/59.jpg)
Overlap-add method
It uses the following signal processing principles:
β’ The linear convolution of two discrete-time signals of length πΏ and πresults in a discrete-time convolved signal of length π = πΏ +π β 1.
β’ Additivity:
[π₯1 π + π₯2 π ] β β π = π₯1 π β β π + π₯2 π β β π .
![Page 60: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/60.jpg)
Overlap-add method
1. Break the input signal π₯(π) into non-overlapping blocks π₯π π of length πΏ.
2. Zero pad β π to be of length π = πΏ +π β 1.
Calculate the π·πΉπ π»(π),π = 0,1,β¦ ,π β 1.
3. For each block π:
β’ Zero pad π₯π π to be of length π = πΏ +π β 1. Calculate the DFT ππ π .
β’ Use the IDFT of π» π ππ π for block output π¦π π ,π = 0,1,β¦ ,π β 1.
4. Form overall output π¦ π by overlapping the last π β 1 samples of π¦π π with the
first π β 1 samples π¦π+1 π and adding the result.
![Page 61: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/61.jpg)
Overlap-add methodπΏ πΏπΏ
π₯2 π
π₯3 π
π₯1 π
π¦2 π
π¦1 π
π¦3 π
(π β 1) zeros
(π β 1) zeros
(π β 1) zeros
(π β 1) points
(π β 1) points
(π β 1) points
(π β 1) points
(π β 1) points
![Page 62: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/62.jpg)
Overlap-add method
![Page 63: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/63.jpg)
Overlap-add method
![Page 64: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/64.jpg)
Overlap-save method
It uses the following signal processing principles:
β’ The linear convolution of two discrete-time signals of length π and πresults in a discrete-time convolved signal of length π = πΏ +π β 1.
β’ Time-Domain Aliasing:
π₯π π =
π=ββ
β
π₯πΏ π β ππ , π = 0,1, β¦ , π β 1.
![Page 65: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/65.jpg)
Overlap-save method
Overlap-Save method: 1. Pad π β 1 zeros at the beginning of the input sequence π₯(π).2. Break the padded input signal into overlapping blocks π₯π π of length π = πΏ +π β 1, where the overlap length is π β 1.
3. Zero-pad β(π) to be of length π = πΏ +π β 1.
Calculate the π·πΉπ π»(π),π = 0,1,β¦ ,π β 1.
4. For each block π:β’ Calculate the π·πΉπ ππ π ,π = 0,1,β¦ ,π β 1.
β’ Calculate the IDFT of: ππ π = ππ π π»(π), π = 0,1,β¦ ,π β 1 to get the block output
π¦π π ,π = 0,1,β¦ ,π β 1.
5. Discard the first π β 1 points of each output block π¦π π . Concatenate the remaining (i.e., last) πΏ samples of each block π¦π π to form the output π¦ π .
![Page 66: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/66.jpg)
Overlap-save method
πΏ πΏ πΏ
(π β 1) zeros(π β 1) point overlap
(π β 1) point overlap
πΌπππ’π‘ π πππππ ππππππ
ππ’π‘ππ’π‘ π πππππ ππππππ
Discard (π β 1) points
![Page 67: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/67.jpg)
Overlap-save method
![Page 68: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/68.jpg)
Overlap-save method
![Page 69: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/69.jpg)
Applications
β’ Convolutional neural networks
β’ Signal processing
Signal filtering
Signal restoration
Signal deconvolution
β’ Signal analysis
Time delay estimation
Distance calculation (e.g., sonar)
1D template matching
![Page 70: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/70.jpg)
Convolutional Neural Networks
β’ Two step architecture:
β’ First layers with sparse NN connections: convolutions.
β’ Fully connected final layers.
β’ Need for fast convolution calculations.
Convergence of machine learning and signal processing
![Page 71: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/71.jpg)
Deep Learning Frameworks
Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs
![Page 72: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/72.jpg)
Convolutions in DL Frameworks
β’ All 5 frameworks work with cuDNN as backend.
β’ cuDNN unfortunately not open source.
β’ cuDNN supports FFT and Winograd convolutions.
Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Le - Performance Analysis of CNN Frameworks for GPUs
![Page 73: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/73.jpg)
Basic Exercises
1. Implement the definition of the 1D linear convolution algorithm
in C/C++ or Python.
2. Implement the definition of the 1D DFT algorithm in C/C++ or
Python.
3. Implement the 1D cyclic convolution algorithm for length π =2π in C/C++ using FFT.
4. Derive analytically the 1D Winograd cyclic convolution
algorithm for length π = 3.
5. Implement the 1D Winograd cyclic convolution algorithm for
length π = 3 in C/C++ or Python.
![Page 74: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/74.jpg)
Advanced Exercises
1. Implement the 1D linear convolution for filter length M and
signal length πΏ in C/C++ using your own myFFT routine.
2. Implement a nested convolution algorithm for length π = π1π2in C/C++.
3. Study the properties of cyclotomic polynomials. Derive
analytically the 1D Winograd cyclic convolution algorithm for
length π = 4,π = 6 or π = 9.
4. Implement the 1D linear convolution algorithm for filter length
M and a large signal length πΏ using overlap-add in a) C/C++ or
b) CUDA.
![Page 75: Fast 1D Convolution Algorithms - Drone Research](https://reader033.vdocuments.site/reader033/viewer/2022043006/626b4d05324c253775534aeb/html5/thumbnails/75.jpg)
References
β’ Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, vol. 521, no. 7553, p. 436, 2015.
β’ V. Oppenheim, Discrete-time signal processing, Pearson Education India, 1999.
β’ I. Pitas, Digital image processing algorithms and applications. John Wiley & Sons, 2000.
β’ S. Winograd, Arithmetic complexity of computations, vol. 33. Siam, 1980.
β’ J. H. McClellan and C. M. Rader, Number theory in digital signal processing," 1979.
β’ H. J. Nussbaumer, Fast Fourier transform and convolution algorithms.
β’ Springer Science & Business Media, 1981.
β’ I. Pitas and M. Strintzis, Multidimensional cyclic convolution algorithms with minimal multiplicative
complexity," IEEE transactions on acoustics, speech, and signal processing, vol. 35, no. 3, pp. 384-
390, 1987.
β’ A. Lavin and S. Gray, Fast algorithms for convolutional neural networks, Proceedings CVPR, pp.
4013-4021, 2016.