paolo bientinesi - umu.se
TRANSCRIPT
![Page 1: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/1.jpg)
Teaching computers linear algebra
Paolo BientinesiAachen Institute for Computational Engineering Science
RWTH Aachen University
January 31, 2018Friedrich-Schiller-Universitat Jena
![Page 2: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/2.jpg)
30-second talk
I Computers are great with numbers (scalars)
I Scientists work with vectors, matrices, and tensors
I Computers are not great with those
I Why? What can we do about it?
2 / 19
![Page 3: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/3.jpg)
30-second talk
I Computers are great with numbers (scalars)
I Scientists work with vectors, matrices, and tensors
I Computers are not great with those
I Why? What can we do about it?
2 / 19
![Page 4: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/4.jpg)
30-second talk
I Computers are great with numbers (scalars)
I Scientists work with vectors, matrices, and tensors
I Computers are not great with those
I Why? What can we do about it?
2 / 19
![Page 5: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/5.jpg)
30-second talk
I Computers are great with numbers (scalars)
I Scientists work with vectors, matrices, and tensors
I Computers are not great with those
I Why? What can we do about it?
2 / 19
![Page 6: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/6.jpg)
30-second talk
I Computers are great with numbers (scalars)
I Scientists work with vectors, matrices, and tensors
I Computers are not great with those
I Why? What can we do about it?
2 / 19
![Page 7: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/7.jpg)
The world of scientific computing
∂
∂t(ρ u)+∇· (ρ u⊗ u) = −∇· pI+∇τ+ρg
Cauchy momentum eqn.
VLJ = 4ε[(
σr
)12 − (σr )6]Lennard-Jones potential
i~∂
∂tΨ(r, t) =
[−2~2µ∇2 + V (r, t)
]Ψ(r, t)
Schrodinger eqn....
Continuous mathematics → Discrete mathematics → . . . → Computers
3 / 19
![Page 8: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/8.jpg)
The world of scientific computing
∂
∂t(ρ u)+∇· (ρ u⊗ u) = −∇· pI+∇τ+ρg
Cauchy momentum eqn.
VLJ = 4ε[(
σr
)12 − (σr )6]Lennard-Jones potential
i~∂
∂tΨ(r, t) =
[−2~2µ∇2 + V (r, t)
]Ψ(r, t)
Schrodinger eqn....
Continuous mathematics → Discrete mathematics → . . . → Computers
3 / 19
![Page 9: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/9.jpg)
The world of scientific computing
∂
∂t(ρ u)+∇· (ρ u⊗ u) = −∇· pI+∇τ+ρg
Cauchy momentum eqn.
VLJ = 4ε[(
σr
)12 − (σr )6]Lennard-Jones potential
i~∂
∂tΨ(r, t) =
[−2~2µ∇2 + V (r, t)
]Ψ(r, t)
Schrodinger eqn....
Continuous mathematics → Discrete mathematics → . . . → Computers
3 / 19
![Page 10: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/10.jpg)
Roadmap
∂
∂t(ρ u)+∇· (ρ u⊗ u) = −∇· pI+∇τ+ρg
Cauchy momentum eqn.
VLJ = 4ε[(
σr
)12 − (σr )6]Lennard-Jones potential
i~∂
∂tΨ(r, t) =
[−2~2µ∇2 + V (r, t)
]Ψ(r, t)
Schrodinger eqn....
Continuous mathematics
→ Discrete mathematics → . . . → Computers
3 / 19
![Page 11: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/11.jpg)
Roadmap
∂
∂t(ρ u)+∇· (ρ u⊗ u) = −∇· pI+∇τ+ρg
Cauchy momentum eqn.
VLJ = 4ε[(
σr
)12 − (σr )6]Lennard-Jones potential
i~∂
∂tΨ(r, t) =
[−2~2µ∇2 + V (r, t)
]Ψ(r, t)
Schrodinger eqn....
Continuous mathematics → Discrete mathematics
→ . . . → Computers
3 / 19
![Page 12: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/12.jpg)
Roadmap
∂
∂t(ρ u)+∇· (ρ u⊗ u) = −∇· pI+∇τ+ρg
Cauchy momentum eqn.
VLJ = 4ε[(
σr
)12 − (σr )6]Lennard-Jones potential
i~∂
∂tΨ(r, t) =
[−2~2µ∇2 + V (r, t)
]Ψ(r, t)
Schrodinger eqn....
Continuous mathematics → Discrete mathematics → . . . → Computers
3 / 19
![Page 13: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/13.jpg)
Linear Algebra expressions
x := A(BTB + ATRTΛRA)−1BTBA−1yexponential
transient excision
∀i ∀j bij :=(XTi M−1j Xi
)−1XTi M1
j yj GWAS{C† := PCPT + Q
K := C†HT (HC†H
T )−1
probabilisticNordsieck method
for ODEs
E := Q−1U(I + UTQ−1U)−1UTL1-norm
minimization onmanifolds
xk|k−1 = Fxk−1|k−1 + BuPk|k−1 = FPk−1|k−1F
T + Qxk|k = xk|k−1 + Pk|k−1H
T × (HPk|k−1HT + R)−1(zk − Hxk|k−1)
Pk|k = Pk|k−1 − Pk|k−1HT × (HPk|k−1H
T + R)−1HPk|k−1
Kalman filter
4 / 19
![Page 14: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/14.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
5 / 19
![Page 15: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/15.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
5 / 19
![Page 16: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/16.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
5 / 19
![Page 17: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/17.jpg)
6 / 19
![Page 18: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/18.jpg)
Back in the 50s: by hand
I Assembly code (Mnemonics → machine code)
I “simple” architectures: no memory hierarchy
Cost(Alg) ≡ #operations(Alg)
computer efficiency! . . . but human productivity?
7 / 19
![Page 19: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/19.jpg)
Back in the 50s: by hand
I Assembly code (Mnemonics → machine code)
I “simple” architectures: no memory hierarchy
Cost(Alg) ≡ #operations(Alg)
computer efficiency! . . . but human productivity?
7 / 19
![Page 20: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/20.jpg)
Back in the 50s: by hand
I Assembly code (Mnemonics → machine code)
I “simple” architectures: no memory hierarchy
Cost(Alg) ≡ #operations(Alg)
computer efficiency!
. . . but human productivity?
7 / 19
![Page 21: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/21.jpg)
Back in the 50s: by hand
I Assembly code (Mnemonics → machine code)
I “simple” architectures: no memory hierarchy
Cost(Alg) ≡ #operations(Alg)
computer efficiency! . . . but human productivity?
7 / 19
![Page 22: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/22.jpg)
Since 1957: compiler(s)
I [1954–1957]: FORTRAN (IBM, John Backus)“Specifications for the IBM Mathematical FORmula TRANslating system, FORTRAN”
→ . . .
→ ACM Turing award (1977)
I Pros: No more Assembly! → increased productivity
A gigantic body of work on compilers
I Cons: Often not the “right” level of abstraction
8 / 19
![Page 23: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/23.jpg)
Since 1957: compiler(s)
I [1954–1957]: FORTRAN (IBM, John Backus)“Specifications for the IBM Mathematical FORmula TRANslating system, FORTRAN”
→ . . . → ACM Turing award (1977)
I Pros: No more Assembly! → increased productivity
A gigantic body of work on compilers
I Cons: Often not the “right” level of abstraction
8 / 19
![Page 24: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/24.jpg)
Since 1957: compiler(s)
I [1954–1957]: FORTRAN (IBM, John Backus)“Specifications for the IBM Mathematical FORmula TRANslating system, FORTRAN”
→ . . . → ACM Turing award (1977)
I Pros: No more Assembly! → increased productivity
A gigantic body of work on compilers
I Cons: Often not the “right” level of abstraction
8 / 19
![Page 25: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/25.jpg)
Since the 70s: libraries
I Identification, standardization, optimization of building blocks
Libraries: LINPACK, BLAS, LAPACK, FFTW, . . .
Convenience, portability, separation of concerns
I [80s–early 90s]: Memory hierarchy
Caching, locality, prefetching, . . . ⇒ Cost(Alg) 6≡ #operations(Alg)
Libraries: necessity
9 / 19
![Page 26: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/26.jpg)
Since the 70s: libraries
I Identification, standardization, optimization of building blocks
Libraries: LINPACK, BLAS, LAPACK, FFTW, . . .
Convenience, portability, separation of concerns
I [80s–early 90s]: Memory hierarchy
Caching, locality, prefetching, . . . ⇒ Cost(Alg) 6≡ #operations(Alg)
Libraries: necessity
9 / 19
![Page 27: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/27.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
LINPACK BLAS LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 19
![Page 28: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/28.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
LINPACK BLAS LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 19
![Page 29: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/29.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
LINPACK BLAS LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 19
![Page 30: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/30.jpg)
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
LINPACK BLAS LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 19
![Page 31: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/31.jpg)
Linear Algebra Mapping Problem (LAMP)
I E : a list of assignments vari := EXPi
I K: a list of available computational kernels (BLAS, LAPACK, . . . )
I M: a metric (FLOPs, data movement, stability, time)
LAMP:Find a decomposition of the expressions E in terms of the kernels K,optimal according to the metric M.
I Find a decomposition → easy
I Achieve optimality → NP complete
11 / 19
![Page 32: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/32.jpg)
Linear Algebra Mapping Problem (LAMP)
I E : a list of assignments vari := EXPi
I K: a list of available computational kernels (BLAS, LAPACK, . . . )
I M: a metric (FLOPs, data movement, stability, time)
LAMP:Find a decomposition of the expressions E in terms of the kernels K,optimal according to the metric M.
I Find a decomposition → easy
I Achieve optimality → NP complete
11 / 19
![Page 33: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/33.jpg)
Linear Algebra Mapping Problem (LAMP)
I E : a list of assignments vari := EXPi
I K: a list of available computational kernels (BLAS, LAPACK, . . . )
I M: a metric (FLOPs, data movement, stability, time)
LAMP:Find a decomposition of the expressions E in terms of the kernels K,optimal according to the metric M.
I Find a decomposition → easy
I Achieve optimality → NP complete
11 / 19
![Page 34: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/34.jpg)
Linear Algebra Mapping Problem (LAMP)
I E : a list of assignments vari := EXPi
I K: a list of available computational kernels (BLAS, LAPACK, . . . )
I M: a metric (FLOPs, data movement, stability, time)
LAMP:Find a decomposition of the expressions E in terms of the kernels K,optimal according to the metric M.
I Find a decomposition → easy
I Achieve optimality → NP complete
11 / 19
![Page 35: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/35.jpg)
Linear Algebra Mapping Problem (LAMP)
I E : a list of assignments vari := EXPi
I K: a list of available computational kernels (BLAS, LAPACK, . . . )
I M: a metric (FLOPs, data movement, stability, time)
LAMP:Find a decomposition of the expressions E in terms of the kernels K,optimal according to the metric M.
I Find a decomposition → easy
I Achieve optimality → NP complete
11 / 19
![Page 36: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/36.jpg)
Linear Algebra Mapping Problem (LAMP)
I E : a list of assignments vari := EXPi
I K: a list of available computational kernels (BLAS, LAPACK, . . . )
I M: a metric (FLOPs, data movement, stability, time)
LAMP:Find a decomposition of the expressions E in terms of the kernels K,optimal according to the metric M.
I Find a decomposition → easy
I Achieve optimality → NP complete
11 / 19
![Page 37: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/37.jpg)
b := (XTX )−1XT y
b := ((QR)TQR)−1(QR)T y
b := R−1QT y
b := M−1XT y
Algorithm 3
Algorithm 1 Algorithm 2
Algorithm “-” Algorithm 4
M := XTX
b := (R−1QT )y
b := M−1(XT y) b := (M−1XT )y
b := (M−1)XT y
12 / 19
![Page 38: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/38.jpg)
b := (XTX )−1XT y
b := ((QR)TQR)−1(QR)T y
b := R−1QT y
b := M−1XT y
Algorithm 3
Algorithm 1 Algorithm 2
Algorithm “-” Algorithm 4
QR = qr(X )M := XTX
b := (R−1QT )y
b := M−1(XT y) b := (M−1XT )y
b := (M−1)XT y
12 / 19
![Page 39: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/39.jpg)
b := (XTX )−1XT y
b := ((QR)TQR)−1(QR)T y
b := R−1QT y
b := M−1XT y
Algorithm 3
Algorithm 1 Algorithm 2
Algorithm “-”
Algorithm 4
QR = qr(X )M := XTX
b := (R−1QT )y
b := M−1(XT y) b := (M−1XT )y
b := (M−1)XT y
12 / 19
![Page 40: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/40.jpg)
b := (XTX )−1XT y
b := ((QR)TQR)−1(QR)T y
b := R−1QT y
b := M−1XT y
Algorithm 3
Algorithm 1 Algorithm 2
Algorithm “-”
Algorithm 4
QR = qr(X )
symbolic simplifications
M := XTX
b := (R−1QT )y
b := M−1(XT y) b := (M−1XT )y
b := (M−1)XT y
12 / 19
![Page 41: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/41.jpg)
b := (XTX )−1XT y
b := ((QR)TQR)−1(QR)T y
b := R−1QT y
b := M−1XT y
Algorithm 3
Algorithm 1 Algorithm 2
Algorithm “-” Algorithm 4
QR = qr(X )
symbolic simplifications
M := XTX
b := R−1(QT y) b := (R−1QT )y
b := M−1(XT y) b := (M−1XT )y
b := (M−1)XT y
12 / 19
![Page 42: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/42.jpg)
Many solutions to a well-known problem
High-level languages
I Matlab
I R
I Julia
I Mathematica
I . . .
Libraries
I Armadillo
I Blaze
I Blitz
I Eigen
I . . .
I NumPy
human productivity! . . . but computer efficiency?
13 / 19
![Page 43: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/43.jpg)
Many solutions to a well-known problem
High-level languages
I Matlab
I R
I Julia
I Mathematica
I . . .
Libraries
I Armadillo
I Blaze
I Blitz
I Eigen
I . . .
I NumPy
human productivity!
. . . but computer efficiency?
13 / 19
![Page 44: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/44.jpg)
Many solutions to a well-known problem
High-level languages
I Matlab
I R
I Julia
I Mathematica
I . . .
Libraries
I Armadillo
I Blaze
I Blitz
I Eigen
I . . .
I NumPy
human productivity! . . . but computer efficiency?
13 / 19
![Page 45: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/45.jpg)
A closer look
α, β, x ∈ R, A,B,C ,X ∈ Rm,n
Commutativity(∗)x := α ∗ β ∗ α−1 ⇒ x := β
X := A ∗ B ∗ A−1 6⇒ x := B
Associativity(∗)
X := (A ∗ B) ∗ C ≡ X := A ∗ (B ∗ C )
ButCost ((A ∗ B) ∗ C ) 6≡ Cost (A ∗ (B ∗ C ))
14 / 19
![Page 46: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/46.jpg)
A closer look
α, β, x ∈ R, A,B,C ,X ∈ Rm,n
Commutativity(∗)x := α ∗ β ∗ α−1 ⇒ x := β
X := A ∗ B ∗ A−1 6⇒ x := B
Associativity(∗)
X := (A ∗ B) ∗ C ≡ X := A ∗ (B ∗ C )
ButCost ((A ∗ B) ∗ C ) 6≡ Cost (A ∗ (B ∗ C ))
14 / 19
![Page 47: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/47.jpg)
Challenges
I Parenthesisation
A B c
(AB)c O(n3) A(Bc) O(n2)
15 / 19
![Page 48: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/48.jpg)
Challenges
I Parenthesisation
A B c
(AB)c O(n3) A(Bc) O(n2)
⇒ Matrix Chain Algorithm
15 / 19
![Page 49: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/49.jpg)
Challenges
I Parenthesisation
In practice:
I Unary operators: transposition, inversion (X := ABTC−TD + . . . )
I Overlapping kernels (e.g., L← L−1, X = A−1B)
I Decompositions (e.g., A→ QTDQ, A→ LU)
I Properties & specialized kernels (GEMM, TRMM, SYMM, ...)
15 / 19
![Page 50: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/50.jpg)
Challenges
I Parenthesisation
In practice:
I Unary operators: transposition, inversion (X := ABTC−TD + . . . )
I Overlapping kernels (e.g., L← L−1, X = A−1B)
I Decompositions (e.g., A→ QTDQ, A→ LU)
I Properties & specialized kernels (GEMM, TRMM, SYMM, ...)
⇒ Generalized Matrix Chain Algorithm
15 / 19
![Page 51: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/51.jpg)
Challenges – not all flops were created equal
I Metric: #FLOPs vs. execution time . . . vs. numerical stability
argminA
( FLOPs(A) ) 6= argminA
( time(A) )
⇒ Performance prediction (efficiency)
15 / 19
![Page 52: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/52.jpg)
Challenges – not all flops were created equal
I Metric: #FLOPs vs. execution time . . . vs. numerical stability
argminA
( FLOPs(A) ) 6= argminA
( time(A) )
⇒ Performance prediction (efficiency)
15 / 19
![Page 53: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/53.jpg)
Challenges – not all flops were created equal
I
0 1000 2000 3000 4000 5000 6000i
0
5
10
15
20
perfo
rman
ce [G
flops
/s]
LAPACKRECSYlibFLAMEIntel MKLmedavgstd
15 / 19
![Page 54: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/54.jpg)
Challenges
I Parallelism? Libraries, explicit multi-threading, runtime, hybrid?
X := A((BTC−T )D) vs. X := (ABT )(C−TD) vs. . . .
15 / 19
![Page 55: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/55.jpg)
Challenges
I Parallelism? Libraries, explicit multi-threading, runtime, hybrid?
X := A((BTC−T )D) vs. X := (ABT )(C−TD) vs. . . .
⇒ Performance prediction (efficiency, scalability)
15 / 19
![Page 56: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/56.jpg)
Challenges
I Linear algebra knowledge: operators, identities, theorems
• Distributivity, commutativity, partitionings, . . .
• ((QR)TQR)−1(QR)T y → (RTQTQR)−1RTQT y → R−1R−TRTQT y → R−1QT y
• SPD(A)→ SPD(ABR − ABLA−1TLA
TBL) Schur complement
• · · ·
⇒ “Knowledge base” – expert system – pattern matching
15 / 19
![Page 57: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/57.jpg)
Challenges
I Linear algebra knowledge: operators, identities, theorems
• Distributivity, commutativity, partitionings, . . .
• ((QR)TQR)−1(QR)T y → (RTQTQR)−1RTQT y → R−1R−TRTQT y → R−1QT y
• SPD(A)→ SPD(ABR − ABLA−1TLA
TBL) Schur complement
• · · ·
⇒ “Knowledge base” – expert system – pattern matching
15 / 19
![Page 58: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/58.jpg)
Challenges
I Inference of properties
E := L1 ∗ UT ∗ L2 triangular(E ) ?
E := Q−1U(I + UTQ−1U)−1UT properties(I + UTQ−1U) ?
λ(A,B) ∧{
symm(A)SPD(B)
→ λ(L−TAL−1) symmetric(L−TAL−1) ?
15 / 19
![Page 59: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/59.jpg)
Challenges
I Inference of properties
E := L1 ∗ UT ∗ L2 triangular(E ) ?
E := Q−1U(I + UTQ−1U)−1UT properties(I + UTQ−1U) ?
λ(A,B) ∧{
symm(A)SPD(B)
→ λ(L−TAL−1) symmetric(L−TAL−1) ?
⇒ Symbolic analysis – pattern matching
15 / 19
![Page 60: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/60.jpg)
Challenges
I Common subexpressions
{X := AB−TC
Y := B−1ATD→
Z := AB−T
X := ZC
Y := ZTD
15 / 19
![Page 61: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/61.jpg)
Challenges
I Common subexpressions
{X := AB−TC
Y := B−1ATD→
Z := AB−T
X := ZC
Y := ZTD
⇒ Pattern matching
15 / 19
![Page 62: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/62.jpg)
Example
w := AB−1c , SPD(B)
Naive ← NEVER!!w = A*inv(B)*c
Recommendedw = A*(B\c)
ExpertL = Chol(B)
w = A * (L’\(L\c))
Generated – “Linnea” by H. Barthels
ml0 = A; ml1 = B; ml2 = c;
potrf!(’L’, ml1)
trsv!(’L’, ’N’, ’N’, ml1, ml2)
trsv!(’L’, ’T’, ’N’, ml1, ml2)
ml3 = Array{Float64}(10)
gemv!(’N’, 1.0, ml0, ml2, 0.0, ml3)
w = ml3
16 / 19
![Page 63: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/63.jpg)
Example
w := AB−1c , SPD(B)
Naive ← NEVER!!w = A*inv(B)*c
Recommendedw = A*(B\c)
ExpertL = Chol(B)
w = A * (L’\(L\c))
Generated – “Linnea” by H. Barthels
ml0 = A; ml1 = B; ml2 = c;
potrf!(’L’, ml1)
trsv!(’L’, ’N’, ’N’, ml1, ml2)
trsv!(’L’, ’T’, ’N’, ml1, ml2)
ml3 = Array{Float64}(10)
gemv!(’N’, 1.0, ml0, ml2, 0.0, ml3)
w = ml3
16 / 19
![Page 64: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/64.jpg)
Example
w := AB−1c , SPD(B)
Naive ← NEVER!!w = A*inv(B)*c
Recommendedw = A*(B\c)
ExpertL = Chol(B)
w = A * (L’\(L\c))
Generated – “Linnea” by H. Barthels
ml0 = A; ml1 = B; ml2 = c;
potrf!(’L’, ml1)
trsv!(’L’, ’N’, ’N’, ml1, ml2)
trsv!(’L’, ’T’, ’N’, ml1, ml2)
ml3 = Array{Float64}(10)
gemv!(’N’, 1.0, ml0, ml2, 0.0, ml3)
w = ml3
16 / 19
![Page 65: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/65.jpg)
Example
w := AB−1c , SPD(B)
Naive ← NEVER!!w = A*inv(B)*c
Recommendedw = A*(B\c)
ExpertL = Chol(B)
w = A * (L’\(L\c))
Generated – “Linnea” by H. Barthels
ml0 = A; ml1 = B; ml2 = c;
potrf!(’L’, ml1)
trsv!(’L’, ’N’, ’N’, ml1, ml2)
trsv!(’L’, ’T’, ’N’, ml1, ml2)
ml3 = Array{Float64}(10)
gemv!(’N’, 1.0, ml0, ml2, 0.0, ml3)
w = ml3
16 / 19
![Page 66: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/66.jpg)
Example
w := AB−1c , SPD(B)
Naive ← NEVER!!w = A*inv(B)*c
Recommendedw = A*(B\c)
ExpertL = Chol(B)
w = A * (L’\(L\c))
Generated – “Linnea” by H. Barthels
ml0 = A; ml1 = B; ml2 = c;
potrf!(’L’, ml1)
trsv!(’L’, ’N’, ’N’, ml1, ml2)
trsv!(’L’, ’T’, ’N’, ml1, ml2)
ml3 = Array{Float64}(10)
gemv!(’N’, 1.0, ml0, ml2, 0.0, ml3)
w = ml3
16 / 19
![Page 67: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/67.jpg)
Experiments
# Example
1 b := (XTX )−1XTy FullRank(X )
2 b := (XTM−1X )−1XTM−1y SPD(M), FullRank(X )
3 W := A−1BCD−TEF LowTri(A), UppTri(D,E)
4
{X := AB−1C
Y := DB−1AT SPD(B)
5 x := W (AT (AWAT )−1b − c) FullRank(A,W )
Diag(W ), Pos(W )...
17 / 19
![Page 68: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/68.jpg)
Linnea – Performance results
Jl nJl r
Arma
n
Arma
r
EignEig
rBl n
Mat n
Mat r
0
2
4
6
1Lin
nea
’ssp
eed
up
over
...
I Henrik Barthels
I https://github.com/HPAC/linnea
I “The Generalized Matrix Chain Algorithm”CGO’18
I “Program Synthesis for Algebraic Domains”(submitted)
18 / 19
![Page 69: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/69.jpg)
Linnea – Performance results
Jl nJl r
Arma
n
Arma
r
EignEig
rBl n
Mat n
Mat r
0
2
4
6
1Lin
nea
’ssp
eed
up
over
...
I Henrik Barthels
I https://github.com/HPAC/linnea
I “The Generalized Matrix Chain Algorithm”CGO’18
I “Program Synthesis for Algebraic Domains”(submitted)
18 / 19
![Page 70: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/70.jpg)
Objectives
I Linnea as a compiler (off line) vs. Linnea as an interpreter (real time)
I Integration into languages and libraries
I Challenges: We only scraped the surfaceMemory usage, parallelism, common subexpressions, . . .
I Beyond matrices: tensors
I Ultimately:
human productivity & computer efficiency
19 / 19
![Page 71: Paolo Bientinesi - umu.se](https://reader033.vdocuments.site/reader033/viewer/2022050613/6274543d39f1b748a520aff3/html5/thumbnails/71.jpg)
Objectives
I Linnea as a compiler (off line) vs. Linnea as an interpreter (real time)
I Integration into languages and libraries
I Challenges: We only scraped the surfaceMemory usage, parallelism, common subexpressions, . . .
I Beyond matrices: tensors
I Ultimately:
human productivity & computer efficiency
19 / 19