Cache Oblivious Algorithms
Zhang JiaHui, Neel Kamal
Introduction
- Cache Oblivious vs Cache Aware
- (Z, L) Ideal-Cache Model, with $Z = \Omega(L^2)$
- Large Integer Multiplication & RSA
- Dynamic Programming
  - Floyd All-Pair Shortest Paths
  - Longest Common Sequence
- Cache-Behavior Simulator
- Experimental Results
Large Integer Multiplication(1)
We have two large integers x and y. x has m digits and y has n digits.
- If m > n, append zeros to the left side of y.
- If n > m, append zeros to the left side of x.
Suppose m > n; after padding, we have two large integers, both of length m.
Large Integer Multiplication(1)
[Figure: x is split into halves A and B, and y into halves C and D. The four partial products B x D, A x D, B x C and A x C are aligned and added to give the Final Result.]
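The four-way split in the figure can be sketched as a short recursive routine. This is a minimal illustration, not the authors' implementation: it assumes decimal digits, non-negative Python integers, and a hypothetical function name `multiply`; A and C are taken to be the high halves.

```python
def multiply(x: int, y: int, digits: int) -> int:
    """Multiply non-negative x and y, each with at most `digits` decimal digits.

    Assumption for illustration: both operands were already padded to the
    same length, as in Large Integer Multiplication(1).
    """
    if digits <= 1:
        return x * y  # single-digit base case

    half = digits // 2           # length of the low halves B and D
    high = digits - half         # length of the high halves A and C
    base = 10 ** half

    a, b = divmod(x, base)       # x = A * base + B
    c, d = divmod(y, base)       # y = C * base + D

    # Four sub-products, each on operands of roughly half the length.
    ac = multiply(a, c, high)
    ad = multiply(a, d, high)
    bc = multiply(b, c, high)
    bd = multiply(b, d, half)

    # Shift the partial products into place and add them.
    return ac * base * base + (ad + bc) * base + bd


print(multiply(1234, 5678, 4) == 1234 * 5678)
```

Each call spawns four sub-problems of half the size, which is where a recurrence of the form 4 Q(m/2) comes from.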
Large Integer Multiplication(1)
Assume m > n.
CASE I: $m > Z/4$
$$Q(m) = \begin{cases} \Theta\left(\dfrac{4m}{L}\right), & \text{if } m \in \left(\dfrac{Z}{8}, \dfrac{Z}{4}\right) \\ 4\,Q\left(\dfrac{m}{2}\right) + O(1), & \text{otherwise} \end{cases}$$
After k steps, m is reduced to $\dfrac{m}{2^k} \in \left(\dfrac{Z}{8}, \dfrac{Z}{4}\right)$:
$$Q(m) = 4^k\, Q\left(\frac{m}{2^k}\right) = 4^k \cdot \Theta\left(\frac{4m}{2^k L}\right) = \Theta\left(\frac{16 m^2}{LZ}\right)$$
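The final step of CASE I can be spelled out. The following is a worked sketch under the assumption that the recursion stops when m/2^k is about Z/4 and that constant factors are absorbed into Θ; the recurrence is Q(m) = 4 Q(m/2) + O(1) with base case Θ(4m/L).

```latex
\frac{m}{2^k} \approx \frac{Z}{4}
\;\Longrightarrow\;
2^k \approx \frac{4m}{Z},
\qquad
Q(m) = 4^k \cdot \Theta\!\left(\frac{4m}{2^k L}\right)
     = \Theta\!\left(2^k \cdot \frac{4m}{L}\right)
     = \Theta\!\left(\frac{4m}{Z} \cdot \frac{4m}{L}\right)
     = \Theta\!\left(\frac{16\,m^2}{LZ}\right)
```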
Large Integer Multiplication(1)
Assume m > n.
CASE II: $m \le Z/4$
$$Q(m) = \Theta\left(\frac{4m}{L}\right)$$
If n > m, in CASE I, $Q(n) = \Theta\left(\dfrac{16 n^2}{LZ}\right)$, and in CASE II, $Q(n) = \Theta\left(\dfrac{4n}{L}\right)$.
So, combining all the cases, we have
$$Q(m,n) = \Theta\left(\frac{n}{L} + \frac{m}{L} + \frac{m^2}{LZ} + \frac{n^2}{LZ}\right)$$
Large Integer Multiplication(2)
We do not append zeros to the left-hand side of the shorter integer.
CASE I: $m, n > Z$
$$Q(m,n) = \begin{cases} \Theta\left(\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L}\right), & \text{if } m, n \in \left(\dfrac{Z}{2}, Z\right) \\ 2\,Q\left(\dfrac{m}{2}, n\right) + O(1), & \text{otherwise, if } m > n \\ 2\,Q\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise} \end{cases}$$
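Without padding, the recursion halves whichever operand is currently longer, giving two sub-problems per step (the 2 Q(m/2, n) and 2 Q(m, n/2) branches). A minimal sketch of that idea, assuming decimal digits and a hypothetical function name `multiply_unpadded`; it is an illustration, not the authors' code.

```python
def multiply_unpadded(x: int, y: int, m: int, n: int) -> int:
    """Multiply x (at most m digits) by y (at most n digits), with m, n >= 1.

    Only the longer operand is split at each step, so no zero padding
    of the shorter operand is needed.
    """
    if m <= 1 and n <= 1:
        return x * y  # single-digit base case

    if m > n:
        # Split x into a high part A and a low part B: x = A * base + B.
        half = m // 2
        base = 10 ** half
        a, b = divmod(x, base)
        return (multiply_unpadded(a, y, m - half, n) * base
                + multiply_unpadded(b, y, half, n))

    # Otherwise split y into a high part C and a low part D: y = C * base + D.
    half = n // 2
    base = 10 ** half
    c, d = divmod(y, base)
    return (multiply_unpadded(x, c, m, n - half) * base
            + multiply_unpadded(x, d, m, half))


print(multiply_unpadded(123456789, 42, 9, 2) == 123456789 * 42)
```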
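Large Integer Multiplication(2)
After $k_1$ steps, m is reduced to $\dfrac{m}{2^{k_1}} \in \left(\dfrac{Z}{2}, Z\right)$.
After $k_2$ steps, n is reduced to $\dfrac{n}{2^{k_2}} \in \left(\dfrac{Z}{2}, Z\right)$.
$$Q(m,n) = 2^{k_1} 2^{k_2}\, Q\left(\frac{m}{2^{k_1}}, \frac{n}{2^{k_2}}\right) = 2^{k_1} 2^{k_2}\, \Theta\left(\frac{2}{L}\left(\frac{m}{2^{k_1}} + \frac{n}{2^{k_2}}\right)\right) = \Theta\left(\frac{2\left(2^{k_2} m + 2^{k_1} n\right)}{L}\right) = \Theta\left(\frac{mn}{LZ}\right)$$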
Large Integer Multiplication(2)
CASE II
If $m \le Z$:
$$Q(m,n) = \begin{cases} \Theta\left(\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L}\right), & \text{if } n \in \left(\dfrac{Z}{2}, Z\right) \\ 2\,Q\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise} \end{cases}$$
$$Q(m,n) = \Theta\left(\frac{n}{L} + \frac{mn}{LZ}\right)$$
If $n \le Z$:
$$Q(m,n) = \Theta\left(\frac{m}{L} + \frac{mn}{LZ}\right)$$
Large Integer Multiplication(2)
CASE III: $m, n \le Z$
$$Q(m,n) = \Theta\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L}\right)$$
Combine all the cases:
$$Q(m,n) = \Theta\left(\frac{m}{L} + \frac{n}{L} + \frac{m+n}{L} + \frac{mn}{LZ}\right)$$
Total work: $O(mn)$
Large Integer Multiplication & RSA
Summary of RSA
- n = pq, where p and q are distinct primes.
- phi, φ = (p-1)(q-1)
- e < n such that gcd(e, phi) = 1
- d = e^-1 mod phi
- c = m^e mod n (encryption)
- m = c^d mod n (decryption)
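The RSA summary translates almost line for line into code. This is a toy sketch with deliberately tiny primes; the function names `make_keys`, `encrypt` and `decrypt` are hypothetical, and `pow(e, -1, phi)` for the modular inverse assumes Python 3.8 or later. Real keys use primes hundreds of digits long, which is where large integer multiplication matters.

```python
from math import gcd


def make_keys(p: int, q: int, e: int):
    """Build (n, e, d) from distinct primes p, q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi")
    d = pow(e, -1, phi)  # d = e^-1 mod phi
    return n, e, d


def encrypt(m: int, e: int, n: int) -> int:
    return pow(m, e, n)  # c = m^e mod n


def decrypt(c: int, d: int, n: int) -> int:
    return pow(c, d, n)  # m = c^d mod n


# Toy parameters chosen only for illustration.
n, e, d = make_keys(61, 53, 17)
message = 65
cipher = encrypt(message, e, n)
print(decrypt(cipher, d, n) == message)
```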
All-pair shortest Paths
Floyd
for k=1 to n
    for i=1 to n
        for j=1 to n
            d[i][j][k] = min(d[i][j][k-1], d[i][k][k-1] + d[k][j][k-1])
For each k (1..n), every pair (i, j) is updated once, so the total work is $O(n^3)$.
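A runnable version of the triple loop is sketched below. One assumption differs from the slide's notation: instead of a separate table for every k, a single two-dimensional table is updated in place, which is the usual space-saving form of Floyd's algorithm. The function name `floyd` and the sample graph are illustrative only.

```python
INF = float("inf")


def floyd(weights):
    """Return all-pairs shortest path lengths for an n x n weight matrix.

    weights[i][j] is the edge weight from i to j, INF if there is no edge,
    and 0 on the diagonal.
    """
    n = len(weights)
    d = [row[:] for row in weights]  # copy so the input is left untouched
    for k in range(n):               # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                through_k = d[i][k] + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d


# A small directed cycle 0 -> 1 -> 2 -> 0.
graph = [
    [0, 3, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
dist = floyd(graph)
print(dist[0][2], dist[1][0], dist[2][1])
```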
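All-pair shortest Paths
For each iteration of k:
CASE I: $n > Z$
$$Q(n) = \begin{cases} \Theta\left(\dfrac{n^2}{L}\right), & \text{if } n \in \left(\dfrac{Z}{2}, Z\right) \\ 4\,Q\left(\dfrac{n}{2}\right) + O(1), & \text{otherwise} \end{cases}$$
After k steps (here k counts recursion steps, not the Floyd iteration), n is reduced to $\dfrac{n}{2^k} \in \left(\dfrac{Z}{2}, Z\right)$:
$$Q(n) = 4^k\, Q\left(\frac{n}{2^k}\right) = 4^k \cdot \Theta\left(\frac{(n/2^k)^2}{L}\right) = \Theta\left(\frac{n^2}{L}\right)$$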
All-pair shortest Paths
CASE II: $n \le Z$
$$Q(n) = \Theta(n)$$
Combine the cases:
$$Q(n) = \Theta\left(n + \frac{n^2}{L}\right)$$
We have n iterations, so
$$Q_{total}(n) = \Theta\left(n^2 + \frac{n^3}{L}\right)$$
Longest Common Sequence
We have 2 long sequences x and y, x is of length m, and y is of length n.
Try to find the Longest Common Sequence of x and y.
Dynamic Programming
Longest Common Sequence

|      | y1 B | y2 D | y3 C | y4 A | y5 B | y6 A |
|------|------|------|------|------|------|------|
| x1 A | 0 | 0 | 0 | 1 | 1 | 1 |
| x2 B | 1 | 1 | 1 | 1 | 2 | 2 |
| x3 C | 1 | 1 | 2 | 2 | 2 | 2 |
| x4 B | 1 | 1 | 2 | 2 | 3 | 3 |
| x5 D | 1 | 2 | 2 | 2 | 3 | 3 |
| x6 A | 1 | 2 | 2 | 3 | 3 | 4 |
| x7 B | 1 | 2 | 2 | 3 | 4 | 4 |

If x[i] == y[j]
    c[i][j] = c[i-1][j-1] + 1;
else
    c[i][j] = max{c[i-1][j], c[i][j-1]};
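The recurrence can be run directly on the slide's example, x = ABCBDAB and y = BDCABA. This is a minimal bottom-up sketch; the function name `lcs_table` is hypothetical, and row 0 and column 0 are assumed to hold zeros for the empty prefixes.

```python
def lcs_table(x: str, y: str):
    """Fill the table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 and column 0 stay zero
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


table = lcs_table("ABCBDAB", "BDCABA")
for row in table[1:]:
    print(row[1:])       # one printed row per character of x
print(table[-1][-1])     # length of the longest common sequence
```

The total work is one constant-time update per table cell, which matches the O(mn) bound.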
Longest Common Sequence
[Figure: the same 7 x 6 table for x = ABCBDAB and y = BDCABA, with its top half (rows x1 to x4) shown separately as a sub-table.]
Longest Common Sequence
CASE I: $m, n > Z$
$$Q(m,n) = \begin{cases} \Theta\left(\dfrac{mn}{L}\right), & \text{if } m, n \in \left(\dfrac{Z}{2}, Z\right) \\ 2\,Q\left(\dfrac{m}{2}, n\right) + O(1), & \text{otherwise, if } m > n \\ 2\,Q\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise} \end{cases}$$
Longest Common Sequence
Suppose:
After $k_1$ steps, m is reduced to $\dfrac{m}{2^{k_1}} \in \left(\dfrac{Z}{2}, Z\right)$.
After $k_2$ steps, n is reduced to $\dfrac{n}{2^{k_2}} \in \left(\dfrac{Z}{2}, Z\right)$.
$$Q(m,n) = 2^{k_1} 2^{k_2}\, Q\left(\frac{m}{2^{k_1}}, \frac{n}{2^{k_2}}\right) = 2^{k_1} 2^{k_2} \cdot \Theta\left(\frac{mn}{2^{k_1} 2^{k_2} L}\right) = \Theta\left(\frac{mn}{L}\right)$$
Longest Common Sequence
CASE II: $m \le Z$
$$Q(m,n) = \begin{cases} \Theta(1 + m), & \text{if } n \in \left(\dfrac{Z}{2}, Z\right) \\ 2\,Q\left(m, \dfrac{n}{2}\right) + O(1), & \text{otherwise} \end{cases}$$
$$Q(m,n) = \Theta\left(\frac{mn}{Z}\right)$$
In the case when $n \le Z$:
$$Q(m,n) = \begin{cases} \Theta(1 + m), & \text{if } m \in \left(\dfrac{Z}{2}, Z\right) \\ 2\,Q\left(\dfrac{m}{2}, n\right) + O(1), & \text{otherwise} \end{cases}$$
$$Q(m,n) = \Theta(m)$$
Longest Common Sequence
CASE III: $m, n \le Z$
$$Q(m,n) = \Theta(1 + m)$$
Combine all 3 cases:
$$Q(m,n) = \Theta\left(1 + m + \frac{2mn}{Z} + \frac{mn}{L}\right)$$
Total Work: $O(mn)$
Cache Oblivious approaches for Dynamic Programming
- Dynamic Programming: find an optimal solution when sub-problems overlap
- Approaches
  - bottom up
  - top down (by recursion usually), but with a table to memoize earlier solutions
- Can a Divide and Conquer method build the table recursively, making the approach cache oblivious?
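One way to build a dynamic programming table by divide and conquer is to halve the longer side of the current block and fill the two halves in dependency order. The sketch below does this for the longest common sequence table. It is an assumption-laden illustration, not the authors' method: the name `lcs_recursive_blocks` and the `leaf` block size are hypothetical, and no cache parameter appears anywhere in the code.

```python
def lcs_recursive_blocks(x: str, y: str, leaf: int = 2) -> int:
    """Fill the LCS table block by block, always halving the longer side."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]

    def fill(r0: int, r1: int, c0: int, c1: int) -> None:
        # Fills rows r0..r1-1 and columns c0..c1-1. The row above and the
        # column to the left of the block are assumed to be complete already.
        rows, cols = r1 - r0, c1 - c0
        if rows <= 0 or cols <= 0:
            return
        if rows <= leaf and cols <= leaf:
            for i in range(r0, r1):
                for j in range(c0, c1):
                    if x[i - 1] == y[j - 1]:
                        c[i][j] = c[i - 1][j - 1] + 1
                    else:
                        c[i][j] = max(c[i - 1][j], c[i][j - 1])
            return
        if rows > cols:
            mid = r0 + rows // 2
            fill(r0, mid, c0, c1)   # top half first
            fill(mid, r1, c0, c1)   # bottom half depends on the top half
        else:
            mid = c0 + cols // 2
            fill(r0, r1, c0, mid)   # left half first
            fill(r0, r1, mid, c1)   # right half depends on the left half

    fill(1, m + 1, 1, n + 1)
    return c[m][n]


print(lcs_recursive_blocks("ABCBDAB", "BDCABA"))
```

Because each block only reads the row above it and the column to its left, filling the top (or left) half before the other half keeps every dependency satisfied.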
Cache Simulator
- With a tall cache assumption: $Z = \Omega(L^2)$
- A fully associative cache
Other Assumptions
• No temporary variable is put into the cache
• All input data is assumed to be already present in cache.
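A fully associative cache with Z words and lines of L words can be simulated in a few lines. The sketch below is illustrative only: the class name `CacheSimulator` is hypothetical, LRU replacement is an assumption (the slides do not state the policy), and the cache starts empty for simplicity.

```python
from collections import OrderedDict


class CacheSimulator:
    """Fully associative cache of `cache_size` words, `line_size` words per line."""

    def __init__(self, cache_size: int, line_size: int):
        self.line_size = line_size
        self.num_lines = cache_size // line_size
        self.lines = OrderedDict()   # line tag -> True, oldest entry first
        self.misses = 0

    def access(self, address: int) -> None:
        tag = address // self.line_size      # which line the word falls in
        if tag in self.lines:
            self.lines.move_to_end(tag)      # hit: mark as most recently used
            return
        self.misses += 1                     # miss: the line must be loaded
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)   # evict the least recently used line
        self.lines[tag] = True


# Tall cache example: Z = L^2 with L = 4.
sim = CacheSimulator(cache_size=16, line_size=4)
for address in range(32):    # a sequential scan touches 32 / 4 = 8 lines
    sim.access(address)
print(sim.misses)
```

An algorithm under test would call `access` for every array element it reads or writes, and the miss counter then plays the role of Q.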
Results
Summarizing the theoretical results:
- Large Integer Multiplication: $Q(m,n) = \Theta\left(\dfrac{m}{L} + \dfrac{n}{L} + \dfrac{m+n}{L} + \dfrac{mn}{LZ}\right)$
- All-pair shortest Paths (for each iteration of k): $Q(n) = \Theta\left(n + \dfrac{n^2}{L}\right)$
- Longest Common Sequence: $Q(m,n) = \Theta\left(1 + m + \dfrac{2mn}{Z} + \dfrac{mn}{L}\right)$
Results from Simulation
Target Machine:
- arch: IA-64
- family: Itanium 2
- CPU MHz: 896.262997
- Cache size: 303312 KB
- OS: Linux version 2.4.22
- gcc version 2.96 20000731
We will see that there is a very close match between the theoretical results and the simulation results.
Results
Cache Oblivious Large Integer Multiplication
Size of Integer: M = 1000, N = 1000

| Size of cache Line | Number of Cache Misses |
|--------------------|------------------------|
| 10 | 248115 |
| 20 | 28904 |
| 30 | 7097 |
| 40 | 2712 |
| 50 | 1089 |
| 60 | 465 |
| 70 | 270 |
| 80 | 101 |
| 90 | 89 |
| 100 | 81 |

$$Q(m,n) = \Theta\left(\frac{mn}{L^3}\right)$$
Comparing Results

| Case | L | Z | Theoretical Result | Simulator Result | ratio |
|------|----|------|--------------------|------------------|--------|
| 1 | 20 | 400 | Θ(1000/8) | 28904 | 0.0041 |
| 2 | 30 | 900 | Θ(1000/27) | 7097 | 0.0044 |
| 3 | 40 | 1600 | Θ(1000/64) | 2712 | 0.0052 |
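The theoretical column follows from mn / L^3 with m = n = 1000, for example 1000000 / 20^3 = 1000/8. The sketch below evaluates that expression and divides it by the simulator counts quoted on the slide. It assumes a constant factor of 1 inside Θ, and the slide does not say how its ratio column was computed, so the quotients printed here need not coincide with it. The names `predicted_misses` and `quotients` are hypothetical.

```python
def predicted_misses(m: int, n: int, line_size: int) -> float:
    """Evaluate m * n / L^3, i.e. mn / (L * Z) with Z = L^2 and constant 1."""
    return m * n / line_size ** 3


# Simulator results quoted on the slide, keyed by cache line size L.
simulated = {20: 28904, 30: 7097, 40: 2712}


def quotients(m: int, n: int, measured: dict) -> dict:
    """Predicted value divided by the measured miss count, per line size."""
    return {L: predicted_misses(m, n, L) / misses for L, misses in measured.items()}


for L, q in quotients(1000, 1000, simulated).items():
    print(f"L={L}: predicted={predicted_misses(1000, 1000, L):.3f}, "
          f"predicted/simulated={q:.4f}")
```

If the quotient stays roughly constant across line sizes, the measured miss counts scale like the predicted bound.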
More Results
[Figure: Cache Oblivious Longest Common Sequence. Number of Cache Misses (axis from 0 to 2000000) against Size of Cache Line (10 to 150). Size of Sequence: M = 1000, N = 1000.]
$$Q(m,n) = \Theta\left(\frac{mn}{L}\right)$$
More Results
[Figure: Cache Oblivious Floyd Algorithm. Number of Cache Misses (axis from 0 to 700000) against Size of Cache Line (20 to 100). Number of Vertices N = 100.]
$$Q(n) = \Theta\left(n + \frac{n^2}{L}\right)$$
Some More Work
We also implemented parallel solutions to each of these problems and tested their performance on Cilk.
Thank You