Fast Bilinear Algorithms for Convolution
Caleb Ju
CS598EVS
March 5, 2020
Convolution
The discrete convolution between vectors f ∈ Rr and g ∈ Rn is
y_k = \sum_i f_i g_{k-i}.

View this as a matrix-vector product between a matrix T and the vector f,

y_k = \sum_i g_{k-i} f_i = \sum_j T_{k,j} f_j, \quad \text{i.e.,} \quad y = Tf.
What does the matrix T look like?
Denote it as T_{\langle g,r\rangle}, which is a Toeplitz matrix, where T_{\langle g,r\rangle} \in \mathbb{R}^{(n+r-1)\times r}:

T_{\langle g,r\rangle} =
\begin{bmatrix}
g_0     & 0       & \cdots & 0       \\
\vdots  & g_0     & \ddots & \vdots  \\
g_{n-1} & \vdots  & \ddots & 0       \\
0       & g_{n-1} & \ddots & g_0     \\
\vdots  &         & \ddots & \vdots  \\
0       & \cdots  & 0      & g_{n-1}
\end{bmatrix}.
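To make the Toeplitz view concrete, here is a minimal NumPy sketch (the helper name `toeplitz_conv_matrix` is ours, not from the slides) that builds T_{\langle g,r\rangle} and checks that Tf matches direct linear convolution:

```python
import numpy as np

def toeplitz_conv_matrix(g, r):
    """Build T_<g,r> in R^{(n+r-1) x r}: column j holds g shifted down by j."""
    n = len(g)
    T = np.zeros((n + r - 1, r))
    for j in range(r):
        T[j:j + n, j] = g
    return T

g = np.array([1.0, 2.0, 3.0])   # n = 3
f = np.array([4.0, 5.0])        # r = 2
y = toeplitz_conv_matrix(g, len(f)) @ f
# y should equal the direct linear convolution of f and g
```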
Convolution and its Variants
Linear convolution is
y_k = \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,r-1)} f_i g_{k-i}.

The bounds ensure that we never index past either end of the vector g.
We also have cyclic convolution,
y_k = \sum_{i=0}^{r-1} f_i g_{(k-i) \bmod n}.
Can also derive correlation,
y_k = \sum_{i=0}^{r-1} f_i g_{k+i}.
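The three variants can be compared directly in NumPy (a small sketch; the explicit cyclic loop mirrors the index formula above):

```python
import numpy as np

f = np.array([1.0, 2.0])        # r = 2
g = np.array([3.0, 4.0, 5.0])   # n = 3

# Linear convolution: output length n + r - 1
linear = np.convolve(f, g)

# Cyclic convolution: y_k = sum_i f_i g_{(k-i) mod n}, output length n
cyclic = np.array([sum(f[i] * g[(k - i) % len(g)] for i in range(len(f)))
                   for k in range(len(g))])

# Correlation: y_k = sum_i f_i g_{k+i}, here restricted to full overlaps
corr = np.correlate(g, f, mode='valid')
```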
Applications of Convolution
String matching (Clifford and Clifford, 2007)
Let the pattern be p \in \Sigma^m and the text be t \in \Sigma^n. Then

\sum_{j=0}^{m-1} (p_j - t_{i+j})^2 = \sum_{j=0}^{m-1} \left( p_j^2 - 2 p_j t_{i+j} + t_{i+j}^2 \right), \quad \forall\, 0 \le i \le n - m.
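The identity above lets all alignment distances be computed with a few correlations; a hedged NumPy sketch (function name ours):

```python
import numpy as np

def sq_dists(p, t):
    """sum_j (p_j - t_{i+j})^2 for every shift i, via three correlation terms."""
    ones = np.ones(len(p))
    return (np.sum(p**2)                                # sum_j p_j^2
            + np.correlate(t**2, ones, mode='valid')    # sum_j t_{i+j}^2
            - 2 * np.correlate(t, p, mode='valid'))     # sum_j p_j t_{i+j}

p = np.array([1.0, 2.0])
t = np.array([1.0, 2.0, 0.0, 1.0])
d = sq_dists(p, t)   # d[i] == 0 exactly where the pattern matches
```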
Image Processing (Convolutional Neural Network)
Given K filters in tensor F of size r × r and N input images in tensor G of size n × n, we seek to sum over all H channels,

y_{ikxy} = \sum_{c=1}^{H} \sum_{v=1}^{r} \sum_{u=1}^{r} f_{kcuv} \cdot g_{i,c,x+u,y+v}.
Other applications: cosmological simulation, solutions to partial differential equations, signal processing, integer multiplication, ...
Fast Algorithms for Computing Convolution
A direct computation has O(n^2) cost.
Consider complex multiplication,
x \times y = (a + bi) \times (c + di) = (ac - bd) + (ad + bc)i
           = (ac - bd) + \left( ac + bd - (a - b)(c - d) \right) i.
Karatsuba's algorithm applies this idea recursively for O(n^{\log_2 3}) cost.

Convolution can also be computed with the discrete Fourier transform,

a * b = \mathrm{IDFT}\left( \mathrm{DFT}(a) \odot \mathrm{DFT}(b) \right).

Using the fast Fourier transform (FFT), we can compute linear convolution in O(n \log n) time.
Other algorithms: Winograd's minimal filtering method, matrix multiplication, fast symmetric multiplication.
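The same three-multiplication trick drives Karatsuba's algorithm for polynomial products; a minimal sketch for coefficient lists of equal power-of-two length (function name ours):

```python
def karatsuba(a, b):
    """Multiply coefficient lists a, b (equal power-of-two length) using
    3 half-size products instead of 4; returns the linear convolution."""
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    p0 = karatsuba(a0, b0)                             # low * low
    p2 = karatsuba(a1, b1)                             # high * high
    pm = karatsuba([x + y for x, y in zip(a0, a1)],
                   [x + y for x, y in zip(b0, b1)])
    mid = [m - x - y for m, x, y in zip(pm, p0, p2)]   # cross terms
    out = [0] * (2 * n - 1)
    for i, v in enumerate(p0):
        out[i] += v
    for i, v in enumerate(mid):
        out[i + h] += v
    for i, v in enumerate(p2):
        out[i + 2 * h] += v
    return out
```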
Derivation of Bilinear Algorithms
Recall a bilinear algorithm is

c = F^{(C)}\left( (F^{(A)T} a) \odot (F^{(B)T} b) \right), \quad \text{i.e.,} \quad c_i = \sum_j \sum_k t_{ijk} a_j b_k.
The discrete linear convolution of f and g is given by

y_k = \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,r-1)} f_i \cdot g_{k-i} = \sum_{i,j} t_{ijk} f_i g_j,

where the tensor T is defined by

t_{ijk} = \begin{cases} 1 &: i + j - k = 0 \\ 0 &: \text{otherwise.} \end{cases}
Convolution is Multiplication
How can we derive fast bilinear algorithms for convolution?
Define polynomials a(x) = a_0 + a_1 x + \cdots + a_{r-1} x^{r-1} and b(x) = b_0 + b_1 x + \cdots + b_{n-1} x^{n-1}. Their product is

c(x) = a(x)b(x) = \sum_{k=0}^{r+n-2} \left( \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,r-1)} a_i \cdot b_{k-i} \right) x^k.

The coefficients of c(x) = c_0 + c_1 x + \cdots + c_{r+n-2} x^{r+n-2} are determined by linear convolution.
Convolution as Multiplication
How can we compute c(x)? Suppose we know the value c(x_i) at nodes x_0, \ldots, x_{R-1}, where R = \deg c(x) + 1. Let the coefficients of c(x) be c. We can get c from

c(x_i) = \sum_{k=0}^{R-1} x_i^k c_k = V_{i,:}\, c, \quad \text{where} \quad V = \begin{bmatrix} x_0^0 & \cdots & x_0^{R-1} \\ \vdots & & \vdots \\ x_{R-1}^0 & \cdots & x_{R-1}^{R-1} \end{bmatrix} \in \mathbb{C}^{R \times R}.
How can we compute c(x_i)? Recall c(x) = a(x)b(x). Therefore,

c(x_i) = a(x_i) b(x_i).

How can we compute a(x_i)? Let a be the coefficients of the polynomial a(x) (and b those of b(x)). Then computing a(x_i) is an inner product,

a(x_i) = \sum_{k=0}^{r-1} x_i^k a_k = \bar{V}_{i,:}\, a, \quad \text{where } \bar{V} \text{ is the first } r \text{ columns of } V.
Toom-Cook Algorithm
Toom-Cook
1. Evaluate \alpha = \bar{V} a and \beta = \bar{V} b
2. Compute the element-wise products \nu = \alpha \odot \beta
3. Interpolate by solving the linear system V c = \nu

We can express this three-step computation as the following bilinear algorithm,

c = V^{-1}_{(2n-1 \times 2n-1)} \left( V_{(2n-1 \times n)}\, a \odot V_{(2n-1 \times n)}\, b \right),

where V is a Vandermonde matrix,

V = \begin{bmatrix} x_0^0 & \cdots & x_0^{R-1} \\ \vdots & & \vdots \\ x_{R-1}^0 & \cdots & x_{R-1}^{R-1} \end{bmatrix}.
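A minimal NumPy sketch of the three steps (the integer node choice is ours):

```python
import numpy as np

def toom_cook(a, b):
    """Linear convolution by evaluate, multiply, interpolate."""
    r, n = len(a), len(b)
    R = r + n - 1
    nodes = np.arange(R, dtype=float) - R // 2   # small integer nodes
    V = np.vander(nodes, R, increasing=True)     # V[i, k] = nodes[i] ** k
    alpha = V[:, :r] @ a                         # evaluate a at the nodes
    beta = V[:, :n] @ b                          # evaluate b at the nodes
    return np.linalg.solve(V, alpha * beta)      # interpolate c

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])
c = toom_cook(a, b)
```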
Discrete Fourier Transform
Figure: (a) Chebyshev nodes; (b) equispaced nodes on the unit circle.
Discrete Fourier Transform
Let \omega_{(n)} = \exp(-2\pi i/n), the nth primitive root of unity. Set the nodes of V to [\omega_{(R)}^0, \omega_{(R)}^1, \ldots, \omega_{(R)}^{R-1}] with R = 2n - 1. Then V is the Fourier matrix (and V^{-1} is the inverse Fourier matrix), leading to the bilinear algorithm

c = F^{-1}_{(2n-1 \times 2n-1)} \left( F_{(2n-1 \times n)}\, a \odot F_{(2n-1 \times n)}\, b \right).
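In this special case the interpolation becomes an inverse FFT; zero-padding realizes the rectangular F_{(2n-1 \times n)} encoding (a small sketch):

```python
import numpy as np

def fft_linear_conv(a, b):
    """Linear convolution as c = IDFT(DFT(a) * DFT(b)) at length 2n - 1."""
    R = len(a) + len(b) - 1
    return np.fft.ifft(np.fft.fft(a, R) * np.fft.fft(b, R)).real

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0, 5.0])
c = fft_linear_conv(a, b)
```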
Alternative Bilinear Algorithms
The Toom-Cook method and the fast Fourier transform work well for small and large convolution problems, respectively.

- Toom-Cook is numerically inaccurate for convolutions of size greater than four
- The FFT has significant hidden constants

Now we examine alternative algorithms that offer trade-offs between computational efficiency and numerical accuracy.
Modular Polynomial Multiplication
Let’s revisit convolution as a polynomial multiplication problem,
c(x) = a(x)b(x) = \sum_{k=0}^{2n-2} \left( \sum_{i=\max(0,\,k-n+1)}^{\min(k,\,n-1)} a_i \cdot b_{k-i} \right) x^k.
What is the remainder of c(x) divided by a polynomial M where \deg M > \deg c(x)?

c(x) = r(x) \equiv c(x) \pmod{M}.

What if we use a polynomial m where \deg m \le \deg c(x)?

c(x) \ne r(x) \equiv c(x) \pmod{m}.
Modular Polynomial Multiplication
Why use modular polynomial multiplication? Modular multiplication decreases the size of the inputs.

c(x) \equiv a(x)b(x) \equiv \left( a(x) \bmod m \right)\left( b(x) \bmod m \right) \pmod{m}.

However, this yields an answer that is only congruent to the actual product, i.e., not the solution we actually want.

Can we compute the full polynomial product using modular polynomial multiplication?
Yes, using the Chinese Remainder Theorem.
Chinese Remainder Theorem
Theorem. Let m^{(1)}, \ldots, m^{(k)} be coprime integers and M = \prod_{i=1}^{k} m^{(i)}. Given remainders r^{(1)}, \ldots, r^{(k)} where 0 \le r^{(i)} < m^{(i)}, the Chinese Remainder Theorem (CRT) asserts that there exists a unique integer x (modulo M) such that

x \equiv r^{(i)} \pmod{m^{(i)}} \quad \forall i \in [k].

Further, this mapping between an integer and its remainders is a ring isomorphism (structure preserving).

Example
Let m^{(1)} = 3, m^{(2)} = 4, and M = 12. Let x = 7 \pmod{M}, with remainders

x \equiv r^{(1)} \equiv 1 \pmod{3} \quad \text{and} \quad x \equiv r^{(2)} \equiv 3 \pmod{4}.
Chinese Remainder Theorem: Example
Let x ≡ 7 (mod 12). Seek to compute (7× 4) (mod 12).
Figure: Ring Isomorphism
Chinese Remainder Theorem: Example
x \equiv r^{(1)} \equiv 1 \pmod{3} \quad \text{and} \quad x \equiv r^{(2)} \equiv 3 \pmod{4}.
Figure: Ring Isomorphism
Chinese Remainder Theorem: Example
r'^{(1)} \equiv r^{(1)} \times 4 \equiv 4 \equiv 1 \pmod{3} \quad \text{and} \quad r'^{(2)} \equiv r^{(2)} \times 4 \equiv 0 \pmod{4}.
Figure: Ring Isomorphism
Chinese Remainder Theorem: Example
y \equiv 28 \equiv 4 \pmod{12} satisfies r'^{(1)} \equiv 1 \pmod{3} and r'^{(2)} \equiv 0 \pmod{4}.
Figure: Ring Isomorphism
Modular Polynomial Multiplication
Akin to interpolation, modular polynomial multiplication can be computed via:

- Compute the remainders of a(x) and b(x) for a series of coprime divisors m^{(i)}
- Multiply the corresponding remainders (can use ordinary polynomial multiplication)
- Map the remainders back to the (unique) polynomial

How do we recover the polynomial from its remainders? The Chinese Remainder Theorem also tells us how to do so.
Chinese Remainder Theorem (part 2)
Theorem. Recall the polynomial divisors m^{(i)} are coprime, M = \prod_i m^{(i)}, and we have a set of remainders r^{(i)}. To solve for x, we compute

x = \left( \sum_{i=1}^{k} r^{(i)} M^{(i)} N^{(i)} \right) \bmod M,

where M^{(i)} = M/m^{(i)}, and N^{(i)} and n^{(i)} are polynomials satisfying Bezout's identity,

M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1.
Chinese Remainder Theorem (part 2): Example
Recap: coprime polynomial divisors m^{(i)}, where M = \prod_i m^{(i)} and M^{(i)} = M/m^{(i)}; let N^{(i)}, n^{(i)} be such that M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1 for all i. The solution is

x = \left( \sum_{i=1}^{k} r^{(i)} M^{(i)} N^{(i)} \right) \bmod M.

Compute the product y = (4 \times 7) \pmod{12}.

We have M^{(1)} = 4, m^{(1)} = 3, M^{(2)} = 3, m^{(2)} = 4, and M = 12, with remainders r'^{(1)} \equiv 1 \pmod{3} and r'^{(2)} \equiv 0 \pmod{4}.

Note that 4 N^{(1)} + 3 n^{(1)} = 1 and 3 N^{(2)} + 4 n^{(2)} = 1 are satisfied with N^{(1)} = 1, n^{(1)} = -1, N^{(2)} = -1, and n^{(2)} = 1.

So we have

\sum_i r^{(i)} M^{(i)} N^{(i)} = 1(4)(1) + 0(3)(-1) = 4 \equiv 28 \pmod{12}.
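The integer case of this reconstruction fits in a few lines of Python (the modular inverse `pow(x, -1, m)` supplies the Bezout coefficient N^{(i)}):

```python
from math import prod

def crt(remainders, moduli):
    """Reconstruct x mod M from x = r_i (mod m_i) for pairwise coprime m_i."""
    M = prod(moduli)
    x = 0
    for r, m in zip(remainders, moduli):
        Mi = M // m
        Ni = pow(Mi, -1, m)      # Bezout: Mi * Ni = 1 (mod m)
        x += r * Mi * Ni
    return x % M

# The slides' example: remainders of y = 7 * 4 modulo 3 and 4
y = crt([1, 0], [3, 4])
```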
Chinese Remainder Theorem (part 2)
x = \left( \sum_{i=1}^{k} r^{(i)} M^{(i)} N^{(i)} \right) \bmod M

Why does this work?

For j \ne i, M^{(j)} contains the factor m^{(i)}, so those terms vanish modulo m^{(i)}. Since M^{(i)} N^{(i)} = 1 - m^{(i)} n^{(i)}, then for a fixed i,

x = \sum_j r^{(j)} M^{(j)} N^{(j)} \equiv r^{(i)} \underbrace{(1 - m^{(i)} n^{(i)})}_{= M^{(i)} N^{(i)}} \equiv r^{(i)} \pmod{m^{(i)}}.

The Chinese Remainder Theorem tells us there is a bijection between remainders and the original polynomial. Therefore, any polynomial satisfying the remainder equivalences is equivalent to the original polynomial (modulo M)!
Modular Polynomial Multiplication
The Chinese Remainder Theorem required that M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1 for all i. Do such N^{(i)}, n^{(i)} even exist?

Theorem (Bezout's identity)
Let p and q be coprime polynomials (they do not share any roots); then there exist polynomials u and v such that pu + qv = 1.

Since M^{(i)} and m^{(i)} are coprime, there exist polynomials N^{(i)} and n^{(i)} such that

M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1.
Winograd Convolution Algorithm
Let f \in \mathbb{R}^r and g \in \mathbb{R}^n be the vectors we seek to convolve. Recall that we first compute the remainders,

r^{(i)}_{(f)} = f \pmod{m^{(i)}} \quad \text{and} \quad r^{(i)}_{(g)} = g \pmod{m^{(i)}}.

Next, we compute the product of the remainders using a convolution algorithm,

r^{(i)} = \left( r^{(i)}_{(f)} * r^{(i)}_{(g)} \right) \pmod{m^{(i)}}.

We then use the Chinese Remainder Theorem to recover the unique solution,

y = \left( \sum_i r^{(i)} * M^{(i)} * N^{(i)} \right) \pmod{M},

where M^{(i)} = M/m^{(i)} and M^{(i)} N^{(i)} + m^{(i)} n^{(i)} = 1.
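A tiny end-to-end instance with degree-one divisors m^{(1)} = x, m^{(2)} = x - 1, m^{(3)} = x + 1 (a choice we make for illustration; with linear divisors the remainders reduce to point evaluations and the Bezout coefficients are scalars):

```python
import numpy as np

f = np.array([1.0, 2.0])   # f(x) = 1 + 2x, low-order first
g = np.array([3.0, 4.0])

# Step 1: the remainder mod (x - s) is just the evaluation at s
nodes = np.array([0.0, 1.0, -1.0])
rf = f[0] + f[1] * nodes
rg = g[0] + g[1] * nodes

# Step 2: multiply the remainders (scalars here)
r = rf * rg

# Step 3: CRT with M = x^3 - x. M_i = M / m_i, and N_i is the scalar
# satisfying M_i N_i = 1 (mod m_i), i.e. N_i = 1 / M_i(s_i). Precomputed:
Mi = [np.array([1.0, 0.0, -1.0]),   # x^2 - 1  (high-order first)
      np.array([1.0, 1.0, 0.0]),    # x^2 + x
      np.array([1.0, -1.0, 0.0])]   # x^2 - x
Ni = [-1.0, 0.5, 0.5]

y = sum(r[i] * Ni[i] * Mi[i] for i in range(3))   # degree < 3: no final mod
y = y[::-1]                                       # back to low-order first
```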
Toom-Cook vs. Winograd Convolution Algorithm
Toom-Cook
1. Evaluate at a set of unique integer points
2. Compute the element-wise multiplication (these are evaluated points of the product)
3. Interpolate to recover the product polynomial

Winograd Convolution Algorithm
1. Evaluate the remainders with the set of coprime polynomial divisors m^{(i)}
2. Compute the element-wise polynomial multiplication (via convolution)
3. Use the CRT to recover the product polynomial modulo M
Evaluate the Remainder of a Polynomial Division
Denote the coefficients of an arbitrary polynomial p as p; e.g., p = 3x^2 - 1 is represented as p = [-1, 0, 3]^T. Let p and m be polynomials where \deg(m) \le \deg(p).

Modulo Operation

Lemma. Let r = p \pmod{m}, with d = \deg p. There exists a matrix X_{\langle m,d \rangle} such that r = X_{\langle m,d \rangle}\, p.
![Page 28: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/28.jpg)
Evaluate the Remainder of a Polynomial Division
Lemma. Let r = p \pmod{m}, with d = \deg p. There exists a matrix X_{\langle m,d \rangle} such that r = X_{\langle m,d \rangle}\, p.

Proof. We know p = mq + r for some polynomial q. Then, in matrix form,

T_{\langle m,r \rangle}\, q + \begin{bmatrix} r \\ 0 \end{bmatrix} = \begin{bmatrix} m_0 & & 0 \\ \vdots & \ddots & \\ m_{\deg m} & & m_0 \\ & \ddots & \vdots \\ 0 & & m_{\deg m} \end{bmatrix} q + \begin{bmatrix} r \\ 0 \end{bmatrix} = \begin{bmatrix} U \\ L \end{bmatrix} q + \begin{bmatrix} r \\ 0 \end{bmatrix} = \begin{bmatrix} p^{(A)} \\ p^{(B)} \end{bmatrix}.

Solving both systems, we get r = -U L^{-1} p^{(B)} + p^{(A)}.
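The block solve in the proof can be turned directly into a construction of X_{\langle m,d \rangle} (a hedged NumPy sketch with coefficients stored low-order first; the helper name is ours):

```python
import numpy as np

def remainder_matrix(m, d):
    """Build X_<m,d> with (p mod m) = X @ p for deg p = d,
    via r = p_A - U L^{-1} p_B from the block partition [U; L]."""
    dm = len(m) - 1                       # deg m
    qlen = d - dm + 1                     # length of the quotient q
    T = np.zeros((d + 1, qlen))
    for j in range(qlen):                 # Toeplitz matrix of m's coefficients
        T[j:j + dm + 1, j] = m
    U, L = T[:dm, :], T[dm:, :]           # L is triangular with m's leading
    X = np.zeros((dm, d + 1))             # coefficient on the diagonal
    X[:, :dm] = np.eye(dm)
    X[:, dm:] = -U @ np.linalg.inv(L)
    return X

m = np.array([-1.0, 0.0, 1.0])        # m(x) = x^2 - 1
p = np.array([2.0, 0.0, 0.0, 1.0])    # p(x) = x^3 + 2
r = remainder_matrix(m, d=3) @ p      # x^3 + 2 = x + 2 (mod x^2 - 1)
```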
![Page 29: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/29.jpg)
Solve Bezout’s identity
Lemma. Write MN + mn = 1 as

\underbrace{\begin{bmatrix} T_{\langle M,\,\deg m-1 \rangle} & T_{\langle m,\,\deg M-1 \rangle} \end{bmatrix}}_{A} \begin{bmatrix} N \\ n \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots \end{bmatrix}^T.

Proof. Show that the matrix A is invertible.
![Page 30: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/30.jpg)
Winograd Convolution Algorithm
Theorem (Winograd Convolution Algorithm)
Given coprime polynomials m^{(1)}, m^{(2)} such that M = m^{(1)} m^{(2)} and \deg M = n + r - 1, and bilinear algorithms (A^{(i)}, B^{(i)}, C^{(i)}) for a convolution of dimension \deg m^{(i)} for i \in \{1, 2\}, then (A, B, C) is a convolution algorithm for vectors of dimension r and n, where

A = \begin{bmatrix} X^T_{\langle m^{(1)},\,r-1 \rangle} A^{(1)} & X^T_{\langle m^{(2)},\,r-1 \rangle} A^{(2)} \end{bmatrix},
B = \begin{bmatrix} X^T_{\langle m^{(1)},\,n-1 \rangle} B^{(1)} & X^T_{\langle m^{(2)},\,n-1 \rangle} B^{(2)} \end{bmatrix}, \text{ and}
C = \begin{bmatrix} \bar{C}^{(1)} & \bar{C}^{(2)} \end{bmatrix},

with \bar{C}^{(i)} = X_{\langle M,\,\deg M + \deg m^{(i)} - 2 \rangle}\, T_{\langle e^{(i)},\,\deg m^{(i)} \rangle}\, X_{\langle m^{(i)},\,2\deg m^{(i)} - 1 \rangle}\, C^{(i)}, where the polynomial e^{(i)} = M^{(i)} N^{(i)} \bmod M is defined from Bezout's identity.
![Page 31: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/31.jpg)
Rank of Winograd Convolution Algorithm
Given f \in \mathbb{R}^r and g \in \mathbb{R}^n, the solution is y \in \mathbb{R}^{r+n-1}. Therefore, select M to be a degree-(n + r - 1) polynomial.

Remark. The bilinear rank R of the Winograd convolution algorithm with polynomial divisors m^{(1)}, \ldots, m^{(k)} is

\sum_{i=1}^{k} \left( 2 \deg m^{(i)} - 1 \right).

Observation. Increasing the bilinear rank of the Winograd convolution by using at least one divisor of degree greater than one improves the numerical accuracy of the convolution.
![Page 32: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/32.jpg)
Nested and Multidimensional Convolution
Given F, G \in \mathbb{R}^{n \times n}, a 2D convolution is defined as

y_{xy} = \sum_{i=0}^{r} \sum_{j=0}^{r} f_{ij}\, g_{x+i,\,y+j}.

We can nest the tensors,

y_{xy} = \sum_{i=0}^{r} \sum_{j=0}^{r} \sum_{u=0}^{n} \sum_{v=0}^{n} t^{(A)}_{ixu}\, t^{(B)}_{jyv}\, f_{ij}\, g_{uv}.
Equivalently, we have the following nested bilinear algorithm,

\mathrm{vec}(Y) = (C \otimes C)\left[ \left( (A \otimes A)^T \mathrm{vec}(F) \right) \odot \left( (B \otimes B)^T \mathrm{vec}(G) \right) \right],

or, in matrix form,

Y = C\left[ (A^T F A) \odot (B^T G B) \right] C^T.
![Page 33: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/33.jpg)
Overlap Add
We can use multidimensional convolution to solve 1D convolution problems. Let the recomposition matrix be

Q_{(\gamma,\eta)} = \begin{bmatrix}
I_{\eta-1} &            &        &            \\
1          &            &        &            \\
I_{\eta-1} & I_{\eta-1} &        &            \\
           & 1          &        &            \\
           & I_{\eta-1} & \ddots &            \\
           &            & \ddots & I_{\eta-1} \\
           &            &        & 1          \\
           &            &        & I_{\eta-1}
\end{bmatrix}.

Lemma. Let Y = F * G, where F, G \in \mathbb{R}^{\gamma \times \eta}. Then if f = \mathrm{vec}(F) and g = \mathrm{vec}(G), we have f * g = \mathrm{vec}(Q_{(\gamma,\eta)} Y).
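The lemma can be exercised numerically: reshape both inputs into \gamma \times \eta blocks, take a full 2D linear convolution, and add the overlapping rows back with stride \eta (a sketch; the explicit Q matrix is replaced by the equivalent strided accumulation):

```python
import numpy as np

def conv_via_2d(f, g, eta):
    """1D linear convolution of length-(gamma*eta) vectors via nested 2D conv."""
    gamma = len(f) // eta
    F = f.reshape(gamma, eta)     # row a holds coefficients of x^(a*eta + b)
    G = g.reshape(gamma, eta)
    # Full 2D linear convolution: Y[c, d] = sum_{a,b} F[a, b] * G[c-a, d-b]
    Y = np.zeros((2 * gamma - 1, 2 * eta - 1))
    for a in range(gamma):
        for b in range(eta):
            Y[a:a + gamma, b:b + eta] += F[a, b] * G
    # Recomposition: row c contributes at offset c * eta (overlap-add)
    y = np.zeros(2 * gamma * eta - 1)
    for c in range(2 * gamma - 1):
        y[c * eta:c * eta + 2 * eta - 1] += Y[c]
    return y

f = np.arange(1.0, 7.0)   # gamma = 3, eta = 2
g = np.arange(2.0, 8.0)
y = conv_via_2d(f, g, eta=2)
```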
![Page 34: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/34.jpg)
Numerical Accuracy
Figure: 1D convolution error
![Page 35: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/35.jpg)
Numerical Accuracy
Figure: 2D convolution error
![Page 36: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/36.jpg)
Properties of Bilinear Algorithms
Matrix Interchange
- How can we build new algorithms with the same encoding/decoding matrices?
- Can we design new algorithms with the same complexity as similar bilinear algorithms?

Asymptotic Complexity
- The role of bilinear rank.
- How can we nest bilinear algorithms?

Lower Bounds
- What are lower bounds for bilinear algorithms?
![Page 37: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/37.jpg)
Matrix Interchange
Recall the definitions of the discrete convolution and correlation algorithms,

y_k = \sum_{i=0}^{r-1} f_i g_{k-i} \quad \text{and} \quad y_k = \sum_{i=0}^{r-1} f_i g_{k+i}.

Theorem (Matrix Interchange)
Let the bilinear algorithm for the discrete convolution of f and g be defined as C\left( (A^T f) \odot (B^T g) \right). The correlation algorithm with output size m = n is

B\left( (A^T f) \odot (C^T g) \right).
![Page 38: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/38.jpg)
Matrix Interchange
Let the bilinear algorithm for the discrete convolution of f and g be defined as C\left( (A^T f) \odot (B^T g) \right). The correlation algorithm with output size m = n is

B\left( (A^T f) \odot (C^T g) \right).

Proof. The tensor T in y_k = \sum_{ij} t_{ijk} f_i g_j has t_{ijk} = 1 if and only if i + j - k = 0. Moreover, the tensor T^{corr} in y_k = \sum_{ij} t^{corr}_{ijk} f_i g_j has t^{corr}_{ijk} = 1 if and only if i - j + k = 0.

We see the roles of index j (belonging to encoding matrix B) and index k (belonging to decoding matrix C) are swapped.
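The interchange can be checked numerically using the Toom-Cook matrices from earlier (the node choice here is ours):

```python
import numpy as np

r, n = 2, 3
R = r + n - 1
nodes = np.array([0.0, 1.0, -1.0, 2.0])
V = np.vander(nodes, R, increasing=True)
A = V[:, :r].T           # r x R encoding for f
B = V[:, :n].T           # n x R encoding for g
C = np.linalg.inv(V)     # R x R decoding

f = np.array([1.0, 2.0])
g = np.array([3.0, 4.0, 5.0])
h = np.array([1.0, -1.0, 2.0, 0.5])

conv = C @ ((A.T @ f) * (B.T @ g))    # linear convolution of f and g
corr = B @ ((A.T @ f) * (C.T @ h))    # B and C swap roles: correlation
```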
![Page 39: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/39.jpg)
Bilinear Rank
We denote the bilinear algorithm

y_k = \sum_{l=0}^{R-1} c_{kl} \left( \sum_{i=0}^{r-1} a_{il} f_i \right) \left( \sum_{j=0}^{n-1} b_{jl} g_j \right), \quad \text{i.e.,} \quad y = C\left[ (A^T f) \odot (B^T g) \right],

by the triplet (A, B, C). The variable R determines the number of element-wise multiplications.
Theorem (Correlation Rank Lower Bound (Winograd, 1980))
Given a filter of size r and output of size m, the minimum rank of a correlation algorithm is m + r - 1.
Corollary
Given a filter of size r and input of size n, the minimum rank of a linear convolution algorithm is n + r - 1.
![Page 40: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/40.jpg)
Asymptotic Complexity
As in matrix multiplication, we can recursively compute a larger convolution using a smaller one.

Given a convolution algorithm that divides the problem by size b and has bilinear rank R, the cost of the algorithm is

T(n) = R \cdot T(n/b) + c \cdot n = O\left( n^{\log_b R} \right).
![Page 41: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/41.jpg)
Error Bounds
Convolution is an ill-posed problem. Consider the cyclic convolution

\begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \\ \vdots \end{bmatrix} *_{\text{cyclic}} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ \vdots \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \vdots \end{bmatrix}.

Therefore, we will use absolute error rather than relative error.
![Page 42: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/42.jpg)
Error Bounds
Theorem (1D bilinear algorithm convolution error)
Given inputs f \in \mathbb{R}^r and g \in \mathbb{R}^n, the absolute error of the bilinear algorithm is

\|\delta y\| \le 2\left( \|C\| \cdot \|A\| \cdot \|B\| \cdot \|f\| \cdot \|g\| \right)\varepsilon + O(\varepsilon^2),

where \|\cdot\| is the 2-norm.

Corollary
A d-nested convolution with F \in \mathbb{R}^{r \times \cdots \times r} and G \in \mathbb{R}^{n \times \cdots \times n} has an error of

\|\delta Y\| \le 2\left( \|C\|^d \cdot \|A\|^d \cdot \|B\|^d \cdot \|\mathrm{vec}(F)\| \cdot \|\mathrm{vec}(G)\| \right)\varepsilon + O(\varepsilon^2).
![Page 43: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/43.jpg)
Error Bounds
Proof. We can use the fact \|Ax\| \le \|A\| \cdot \|x\| for the encoding and decoding steps. To bound the error from the element-wise product, we use the fact that

\|x \odot y\|^2 = \sum_i |x_i y_i|^2 \le \left( \sum_i |x_i|^2 \right)\left( \sum_i |y_i|^2 \right) = \|x\|^2 \cdot \|y\|^2,

which leads to \|x \odot y\| \le \|x\| \cdot \|y\|.
![Page 44: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/44.jpg)
Error Mitigation
Theorem (Pan 2016)
For a Vandermonde matrix V with s as the largest-magnitude node, the condition number is proportional to

\kappa(V) = \Omega\left( \frac{s^{n-1}}{\sqrt{n}} \right).

We need to find ways to either decrease \kappa(V) or use a different matrix.
![Page 45: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/45.jpg)
Error Mitigation
Better node choice: the numerical accuracy of interpolation improves with better node choices,

- Chebyshev nodes
- Brute-force search

We can also combine small convolution algorithms into larger ones. Given matrices A, B with C = A \otimes B, we have

\kappa(C) = \kappa(A)\,\kappa(B).
Instead of having \|A\| = \Omega(n^n), we have \|A\| = \Omega\left( n^{n^{1/d}} \right).
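The Kronecker identity is easy to spot-check: the singular values of A \otimes B are the pairwise products of those of A and B, so the 2-norm condition numbers multiply.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))

# cond() defaults to the 2-norm; sigma(A kron B) = {sigma_i(A) * sigma_j(B)}
lhs = np.linalg.cond(np.kron(A, B))
rhs = np.linalg.cond(A) * np.linalg.cond(B)
```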
![Page 46: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/46.jpg)
Numerical Accuracy
Figure: 1D convolution error
![Page 47: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/47.jpg)
Numerical Accuracy
Figure: 2D convolution error
![Page 48: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/48.jpg)
Arithmetic Complexity
Let \mathrm{nnz}(A) be the number of nonzeros of A, a(A) the number of additions needed, and m(A) the number of multiplications. We have

a(A) \le \mathrm{nnz}(A) - \#\mathrm{row}(A) \quad \text{and} \quad m(A) \le \mathrm{nnz}(A).

Therefore, the overall cost of a convolution is

a(F) \le a(A) + a(B) + a(C) \quad \text{and} \quad m(F) \le m(A) + m(B) + m(C) + R.
![Page 49: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/49.jpg)
Final Thoughts
Can also use bilinear algorithms to

- Find communication lower bounds
- Discover alternative bilinear algorithms

Concluding Thoughts
We have derived a family of fast bilinear algorithms. We analyzed the error bounds and arithmetic costs of the different algorithms, especially bounded vs. unbounded algorithms.
![Page 50: Fast Bilinear Algorithms for Convolution · Convolution The discrete convolution between vectors f 2Rr and g 2Rn is y k = X i f ig k i: View as a matrix{vector product between matrix](https://reader033.vdocuments.site/reader033/viewer/2022052300/6054125daa7ac4411970a253/html5/thumbnails/50.jpg)
Thanks!
Remaining Questions
- Communication lower bounds for nested convolution algorithms
- Error lower bounds with respect to the choice of nodes and polynomial divisors
- Do polynomial and interpolation-based algorithms cover the entire class of fast bilinear algorithms?

More information is covered in the paper:

Caleb Ju and Edgar Solomonik. Derivation and analysis of fast bilinear algorithms for convolution, arXiv:1910.13367 [math.NA], October 2019.