Design and Analysis of Algorithms (Oblikovanje i analiza algoritama, OAA)
Lecture 4
Sasa Singer
web.math.pmf.unizg.hr/~singer
PMF (Faculty of Science) – Department of Mathematics, Zagreb
OAA 2019, Lecture 4 – p. 1/68
Lecture contents
Complexity in practice: experiments (continued):
  Multiplication of matrices of order n.
  Block multiplication of matrices of order n.
Information: web page
My web page for Design and Analysis of Algorithms is
https://web.math.pmf.unizg.hr/~singer/oaa/
or, in short,
https://web.math.hr/~singer/oaa/
A copy is available at
http://degiorgi.math.hr/~singer/oaa/
The official web page for Design and Analysis of Algorithms is
https://web.math.pmf.unizg.hr/nastava/oaa/
Matrix multiplication
Matrix multiplication
Problem: Given a natural number $n \in \mathbb{N}$ and three matrices $A$, $B$ and $C$ of order $n$, compute the expression
$$C := C + A * B.$$
Accumulating the product $A * B$ into the matrix $C$
  is the standard form of the BLAS-3 routine xGEMM for matrix multiplication,
  i.e. exactly this operation is frequently used in practice.
Incidentally, this will once again
  “fool” the compiler's optimizer
  when the experiment is repeated many times.
Matrix multiplication: formula
The “mathematical” realization of the matrix operation
$$C := C + A * B$$
is trivial when written element by element:
$$c_{ij} := c_{ij} + \sum_{k=1}^{n} a_{ik} \cdot b_{kj},$$
for all indices
$$i = 1, \ldots, n, \quad j = 1, \ldots, n.$$
So, in a program, we just need to run three nested loops.
Matrix multiplication: subroutine
      subroutine mulijk (lda, n, a, b, c)
c
c     Matrix multiply
c     C(n, n) = C(n, n) + A(n, n) * B(n, n).
c
c     lda = declared (leading) dimension of the arrays,
c     n   = order of the matrices actually multiplied.
c
      implicit none
c
      integer lda, n
      double precision a(lda, lda), b(lda, lda),
     $                 c(lda, lda)
c
      integer i, j, k, nn
Matrix multiplication: subroutine (continued)
c
c     IJK loop, inner
c
      nn = n
      do 30, i = 1, nn
         do 20, j = 1, nn
            do 10, k = 1, nn
               c(i, j) = c(i, j) + a(i, k) * b(k, j)
   10       continue
   20    continue
   30 continue
c
      return
      end
Loop permutation
We call this variant of the algorithm ijk, again after the order of the loops (their indices), from the outermost to the innermost.
All three loops can be permuted, i.e. written in any order. This gives 6 variants of the algorithm in total, which we name in lexicographic order:
ijk, ikj, jik, jki, kij, kji.
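As an illustration of permuting the loops, here is a minimal sketch of the jki variant in the same fixed-form style as mulijk. The names muljki and tstjki and the 2 x 2 test data are assumptions made for this example only; they do not come from the lecture.

```fortran
      program tstjki
c     Hypothetical driver: checks muljki on a 2 x 2 example.
      implicit none
      integer lda, n
      parameter (lda = 2, n = 2)
      double precision a(lda, lda), b(lda, lda), c(lda, lda)
      integer i, j
c     A = [1 2; 3 4] stored by columns, B = identity, C = 0.
      data a / 1.0d0, 3.0d0, 2.0d0, 4.0d0 /
      data b / 1.0d0, 0.0d0, 0.0d0, 1.0d0 /
      data c / 4 * 0.0d0 /
      call muljki (lda, n, a, b, c)
c     With B = I and C = 0 the result C must equal A.
      do 20, i = 1, n
         write (*, *) (c(i, j), j = 1, n)
   20 continue
      end
c
      subroutine muljki (lda, n, a, b, c)
c     C(n, n) = C(n, n) + A(n, n) * B(n, n), loop order jki.
      implicit none
      integer lda, n
      double precision a(lda, lda), b(lda, lda),
     $                 c(lda, lda)
      integer i, j, k
      do 30, j = 1, n
         do 20, k = 1, n
            do 10, i = 1, n
               c(i, j) = c(i, j) + a(i, k) * b(k, j)
   10       continue
   20    continue
   30 continue
      return
      end
```

Only the order of the three do statements differs from mulijk; the innermost statement is unchanged, so all six variants compute the same result.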
Number of operations
Each pass through the innermost loop performs two operations:
  multiplying the matrix elements $a_{ik} \cdot b_{kj}$,
  adding that product to $c_{ij}$.
Each of the three loops makes exactly $n$ passes.
The total number of operations in all variants of the algorithm is therefore
$$F(n) = 2n^3.$$
The number of repetitions $N(n)$ is chosen so that the enclosing loop (with the repetitions), whose time we measure, runs for an approximately constant time, until $N(n)$ drops to 1 at $n = 450$.
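The measurement described above can be sketched as follows. This is a hypothetical harness, not the author's code: the program name timing, the rule for choosing the repetition count nrep (about 1.0d8 operations per timed run), the test data, and the use of the cpu_time intrinsic are all assumptions.

```fortran
      program timing
c     Hypothetical timing harness (not the author's code).
      implicit none
      integer lda
      parameter (lda = 200)
      double precision a(lda, lda), b(lda, lda), c(lda, lda)
      double precision flops, t0, t1, speed
      integer n, nrep, irep, i, j
c     Assumed rule: about 1.0d8 operations per timed run.
      n = 200
      flops = 2.0d0 * dble(n)**3
      nrep = max(1, int(1.0d8 / flops))
      do 20, j = 1, n
         do 10, i = 1, n
            a(i, j) = 1.0d0 / dble(i + j)
            b(i, j) = 1.0d0
            c(i, j) = 0.0d0
   10    continue
   20 continue
      call cpu_time (t0)
      do 30, irep = 1, nrep
         call mulijk (lda, n, a, b, c)
   30 continue
      call cpu_time (t1)
c     Speed in Mflops = repetitions * F(n) / (seconds * 1.0d6).
      if (t1 .gt. t0) then
         speed = dble(nrep) * flops / ((t1 - t0) * 1.0d6)
         write (*, *) 'n =', n, ' Mflops =', speed
      end if
      end
c
      subroutine mulijk (lda, n, a, b, c)
c     C = C + A * B, loop order ijk, as in the lecture.
      implicit none
      integer lda, n
      double precision a(lda, lda), b(lda, lda),
     $                 c(lda, lda)
      integer i, j, k
      do 30, i = 1, n
         do 20, j = 1, n
            do 10, k = 1, n
               c(i, j) = c(i, j) + a(i, k) * b(k, j)
   10       continue
   20    continue
   30 continue
      return
      end
```

Because C is accumulated on every call, each repetition does real work, which is the point made earlier about the accumulating form of the operation.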
Colors in the graphs
Legend for reading the graphs (rank by speed, 1 is the fastest):
  ijk loop: green, rank 3;
  ikj loop: orange, rank 5;
  jik loop: yellow, rank 4;
  jki loop: purple, rank 1;
  kij loop: red, rank 6;
  kji loop: blue, rank 2.
BabyBlue, CVF, normal
Compaq Visual Fortran:
normal optimization:
  first 6 individual figures, one per loop, in lexicographic order,
  and then a joint graph for all 6 loops.
fast optimization:
  permutes the loops,
  so that all 6 loops run at almost the same speed.
Comparison of:
  the fastest loop, jki, with fast optimization, and
  MKL's DGEMM algorithm.
BabyBlue, CVF, normal: ijk
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication, ijk loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, CVF, normal: ikj
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication, ikj loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, CVF, normal: jik
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication, jik loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, CVF, normal: jki
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication, jki loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, CVF, normal: kij
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication, kij loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, CVF, normal: kji
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication, kji loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, CVF, normal
[Figure: Pentium 4/660, 3.6 GHz, CVF, normal. Matrix multiplication. Speed in Mflops (0 to 2500) versus matrix order (0 to 2000).]
BabyBlue, CVF, fast
[Figure: Pentium 4/660, 3.6 GHz, CVF, fast. Matrix multiplication. Speed in Mflops (0 to 2500) versus matrix order (0 to 2000).]
BabyBlue, CVF, fast: fastest loop and MKL
[Figure: Pentium 4/660, 3.6 GHz, CVF, MKL. Matrix multiplication. Speed in Mflops (0 to 6000) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal
Intel Visual Fortran:
normal optimization:
  first 6 individual figures, one per loop, in lexicographic order,
  and then a joint graph for all 6 loops.
fast optimization:
  permutes the loops,
  so that we get 3 pairs of loops with almost the same speed.
Comparison of:
  the fastest loop, jki, with fast optimization, and
  MKL's DGEMM algorithm.
BabyBlue, IVF, normal: ijk
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, ijk loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal: ikj
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, ikj loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal: jik
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, jik loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal: jki
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, jki loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal: kij
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, kij loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal: kji
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, kji loop. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication. Speed in Mflops (0 to 2500) versus matrix order (0 to 2000).]
BabyBlue, IVF, fast
[Figure: Pentium 4/660, 3.6 GHz, IVF, fast. Matrix multiplication. Speed in Mflops (0 to 2500) versus matrix order (0 to 2000).]
BabyBlue, IVF, fast: fastest loop and MKL
[Figure: Pentium 4/660, 3.6 GHz, IVF, MKL. Matrix multiplication. Speed in Mflops (0 to 6000) versus matrix order (0 to 2000).]
Table of speeds for large n
Comparison of speeds (in Mflops), on BabyBlue only:
  by loop (including MKL),
  for the normal and fast options of both compilers.

| Loop | normal CVF | normal IVF | fast CVF | fast IVF |
|------|-----------:|-----------:|---------:|---------:|
| ijk  |      229.0 |      228.4 |   1186.3 |   1086.4 |
| ikj  |      106.7 |      106.5 |   1186.5 |   1086.0 |
| jik  |      204.8 |      205.1 |   1185.3 |   1248.7 |
| jki  |     1034.2 |      839.0 |   1186.4 |   1249.1 |
| kij  |       95.0 |       94.9 |   1185.1 |    543.3 |
| kji  |      544.0 |      538.6 |   1185.7 |    543.3 |
| MKL  |     5945.7 |     5967.3 |   5966.5 |   5967.6 |
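To put the table in perspective, a few ratios can be read off directly. The arithmetic below is only a worked illustration using the CVF columns; the rounding to a few significant digits is mine.

```latex
\[
\frac{\text{jki}}{\text{kij}}\bigg|_{\text{normal CVF}}
  = \frac{1034.2}{95.0} \approx 10.9,
\qquad
\frac{\text{MKL}}{\text{jki}}\bigg|_{\text{normal CVF}}
  = \frac{5945.7}{1034.2} \approx 5.7,
\qquad
\frac{\text{MKL}}{\text{jki}}\bigg|_{\text{fast CVF}}
  = \frac{5966.5}{1186.4} \approx 5.0.
\]
```

So the loop order alone changes the speed by roughly an order of magnitude with the normal option, and MKL is still about five times faster than the best hand-written loop.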
Other computers
Very similar behavior of the loop speeds is seen on the other computers as well.
The graphs are “abridged” so that they contain, in order:
  a comparison of the speeds of all 6 loops for the normal and fast compiler options (CVF only),
  a comparison of the fastest fast loop with MKL.
Klamath5, CVF, normal
[Figure: Pentium III, 500 MHz, CVF, normal. Matrix multiplication. Speed in Mflops (0 to 200) versus matrix order (0 to 2000).]
Klamath5, CVF, fast
[Figure: Pentium III, 500 MHz, CVF, fast. Matrix multiplication. Speed in Mflops (0 to 200) versus matrix order (0 to 2000).]
Veliki, CVF, normal
[Figure: Pentium 4, 3.0 GHz, CVF, normal. Matrix multiplication. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
Veliki, CVF, fast
[Figure: Pentium 4, 3.0 GHz, CVF, fast. Matrix multiplication. Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
Klamath5, CVF, fast: fastest loop and MKL
[Figure: Pentium III, 500 MHz, CVF, MKL. Matrix multiplication. Speed in Mflops (0 to 400) versus matrix order (0 to 2000).]
Veliki, CVF, fast: fastest loop and MKL
[Figure: Pentium 4, 3.0 GHz, CVF, MKL. Matrix multiplication. Speed in Mflops (0 to 6000) versus matrix order (0 to 2000).]
Comments on the results
In matrix multiplication, unlike in addition,
  every input element is used many times (more precisely, exactly $n$ times).
This is why the speed of cache memory can make a difference, so we can obtain
  substantially higher speeds than for addition.
Cache memory is the “main culprit” for:
  the differences in speed among the variants, and
  the higher speed for small $n$.
Repeating the experiment matters only for very small orders $n$. Moreover, for $n \ge 450$ there are no repetitions.
Comments on the results (continued)
The faster variants are those that
  reuse the same data more often while the data are still in the cache.
Proof: The cache is filled in “blocks”, following the way the matrices are stored. The fastest variant should be the one that
  passes sequentially through the elements of all 3 matrices
in the innermost statement
      c(i, j) = c(i, j) + a(i, k) * b(k, j)
In Fortran, matrices are stored by columns, so the first index varies fastest in memory. Therefore we need:
  i inside j, i inside k, k inside j.
Hence the fastest variant of the algorithm is jki, which is indeed the case!
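The column-major argument can be made concrete with a little address arithmetic. The sketch below assumes 1-based indices and a leading dimension lda, as in mulijk; the function name offset is introduced here only for illustration. The table lists, for each choice of innermost loop index, the memory stride (in array elements) between consecutive accesses to each operand, with 0 meaning that the element stays fixed.

```latex
\[
\operatorname{offset}(i, j) = (i - 1) + (j - 1) \cdot \mathrm{lda}
\]
\[
\begin{array}{c|ccc}
\text{innermost index} & c(i,j) & a(i,k) & b(k,j) \\ \hline
i & 1            & 1            & 0 \\
j & \mathrm{lda} & 0            & \mathrm{lda} \\
k & 0            & \mathrm{lda} & 1
\end{array}
\]
```

Only with i innermost (variants jki and kji) does no operand have stride lda, which agrees with these two variants having ranks 1 and 2 in the measurements.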
Comments on the results (continued)
A final argument that cache memory is the “culprit”.
Constructive proof: with a “block” realization of the algorithm,
  for large $n$ we can reach almost the same speeds as for small $n$ (i.e. prevent the drop in speed).
This, of course, works only when
  we actually get a drop in speed for large $n$.
Otherwise, the compiler has “already taken care” of using the cache optimally.
An example with IVF follows, showing that this works with the normal option and even with the fast option.
Block matrix multiplication
example
Block matrix multiplication: example
IVF with the normal option gives the following speeds for the jik loop:
  1050 Mflops for $n \le 50$,
  205 Mflops for large $n$.
IVF with the normal option gives the following speeds for the jki loop:
  1100 Mflops for $n \le 300$,
  840 Mflops for large $n$.
IVF with the fast option gives the following speeds for the jki loop:
  2000 Mflops for $n \le 300$,
  1250 Mflops for large $n$.
BabyBlue, IVF, normal: jik, plain and block (50)
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, jik (50). Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, normal: jki, plain and block (300)
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal. Matrix multiplication, jki (300). Speed in Mflops (0 to 1500) versus matrix order (0 to 2000).]
BabyBlue, IVF, fast: jki, plain and block (300)
[Figure: Pentium 4/660, 3.6 GHz, IVF, fast. Matrix multiplication, jki (300). Speed in Mflops (0 to 2500) versus matrix order (0 to 2000).]
Block matrix multiplication
Matrix multiplication
Problem: Given a natural number $n \in \mathbb{N}$ and three matrices $A$, $B$ and $C$ of order $n$, compute the expression
$$C := C + A * B.$$
We know that the element-wise realization is trivial:
$$c_{ij} := c_{ij} + \sum_{k=1}^{n} a_{ik} \cdot b_{kj},$$
for all indices
$$i = 1, \ldots, n, \quad j = 1, \ldots, n.$$
So, in a program, we just need to run three nested loops.
Matrix multiplication: element-wise realization
The program realization at the “scalar” level (element by element) has this general form:
  3 loops over $i$, $j$, $k$, each from 1 to $n$,
  the operation inside these loops is
  $$c_{ij} := c_{ij} + a_{ik} \cdot b_{kj},$$
  i.e. multiplication and addition of scalars.
These three loops may be permuted, which gives 6 different variants of the basic algorithm:
ijk, ikj, jik, jki, kij, kji.
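Matrix multiplication: partition into blocks
The matrices $A$ and $B$ can be partitioned into blocks
$$
A = \begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1r} \\
A_{21} & A_{22} & \cdots & A_{2r} \\
\vdots & \vdots & \cdots & \vdots \\
A_{p1} & A_{p2} & \cdots & A_{pr}
\end{bmatrix},
\quad
B = \begin{bmatrix}
B_{11} & B_{12} & \cdots & B_{1q} \\
B_{21} & B_{22} & \cdots & B_{2q} \\
\vdots & \vdots & \cdots & \vdots \\
B_{r1} & B_{r2} & \cdots & B_{rq}
\end{bmatrix}.
$$
If the blocks $A_{ik}$ and $B_{kj}$ are such that they can be multiplied for all indices $i$, $j$, $k$, then the operation $C = C + A * B$ can be computed “block by block”, where
$$C_{ij} = C_{ij} + \sum_{k=1}^{r} A_{ik} * B_{kj}, \quad i = 1, \ldots, p, \quad j = 1, \ldots, q.$$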
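As a small illustration of the block formula, consider the simplest nontrivial case. The choice $p = q = r = 2$ is an assumption made only to keep the example short; it is not taken from the lecture.

```latex
\[
\begin{aligned}
C_{11} &= C_{11} + A_{11} * B_{11} + A_{12} * B_{21}, &
C_{12} &= C_{12} + A_{11} * B_{12} + A_{12} * B_{22}, \\
C_{21} &= C_{21} + A_{21} * B_{11} + A_{22} * B_{21}, &
C_{22} &= C_{22} + A_{21} * B_{12} + A_{22} * B_{22}.
\end{aligned}
\]
```

Each block of $C$ is updated by $r = 2$ products of blocks, exactly as each scalar $c_{ij}$ is updated by $n$ products of scalars in the element-wise formula.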
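Matrix multiplication: blocks (continued)
A partition of the matrices $A$ and $B$ into blocks that can be multiplied induces a partition of the matrix $C$ into blocks
$$
C = \begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1q} \\
C_{21} & C_{22} & \cdots & C_{2q} \\
\vdots & \vdots & \cdots & \vdots \\
C_{p1} & C_{p2} & \cdots & C_{pq}
\end{bmatrix}.
$$
Simplification: all three input matrices are square of order $n$,
  so we partition them into blocks in the same way.
Hence $p = q = r$, and we denote this common value by $N$, where $N$ is the so-called “block order” of the matrix.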
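Matrix multiplication: blocks (continued)
The partition of all three matrices $A$, $B$ and $C$ has the same form (written here for $C$)
$$
C = \begin{bmatrix}
C_{11} & C_{12} & \cdots & C_{1N} \\
C_{21} & C_{22} & \cdots & C_{2N} \\
\vdots & \vdots & \cdots & \vdots \\
C_{N1} & C_{N2} & \cdots & C_{NN}
\end{bmatrix}.
$$
The individual blocks, i.e. the submatrices $A_{ij}$, $B_{ij}$ and $C_{ij}$, are
  matrices of the same type, which we denote by $n_i \times n_j$.
Note that the blocks no longer have to be square matrices; in general they are rectangular.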
Matrix multiplication: blocks (continued)
The block sizes must satisfy
$$\sum_{i=1}^{N} n_i = n.$$
How the block sizes $n_i$, for $i = 1, \ldots, N$, are determined will be explained a little later.
The matrix operation $C = C + A * B$ now has the “block” form
$$C_{ij} = C_{ij} + \sum_{k=1}^{N} A_{ik} * B_{kj}, \quad i = 1, \ldots, N, \quad j = 1, \ldots, N.$$
Matrix multiplication: block-wise realization
The program realization at the “block” level (block by block) has this general form:
  3 loops over $i$, $j$, $k$, each from 1 to $N$,
  the operation inside these loops is
  $$C_{ij} := C_{ij} + A_{ik} \cdot B_{kj},$$
  i.e. multiplication and addition of matrices.
This operation has the same xGEMM form as the whole original problem (“recursion”), except that the matrices need not be square:
$$(n_i \times n_j) = (n_i \times n_j) + (n_i \times n_k) * (n_k \times n_j).$$
Block matrix multiplication: loops
The three loops over blocks may be permuted, which gives 6 different variants of the block algorithm:
ijk, ikj, jik, jki, kij, kji.
For the “inner” multiplication of individual blocks we likewise have the corresponding 6 variants of the basic algorithm.
So, all together, there are 36 variants!
Whoever wants to may try them all. I will not.
In what follows I use
  the same variant (loop permutation) for both the block algorithm and the basic (scalar) algorithm.
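A minimal sketch of such a block algorithm, with the jki order used both for the loops over blocks and for the inner scalar loops, might look as follows. The names mulblk and tstblk, the block-order parameter nb, and the rule that every block has order nb except possibly the last one in each direction are assumptions made for illustration; the lecture's own way of choosing the block sizes may differ.

```fortran
      program tstblk
c     Hypothetical driver: compares blocked and plain jki results.
      implicit none
      integer lda, n, nb
      parameter (lda = 7, n = 7, nb = 3)
      double precision a(lda, lda), b(lda, lda)
      double precision c1(lda, lda), c2(lda, lda), err
      integer i, j
      do 20, j = 1, n
         do 10, i = 1, n
            a(i, j) = dble(i + 2 * j)
            b(i, j) = dble(i - j)
            c1(i, j) = 1.0d0
            c2(i, j) = 1.0d0
   10    continue
   20 continue
c     nb = 3 gives blocks of order 3, 3, 1; nb = n gives one block.
      call mulblk (lda, n, nb, a, b, c1)
      call mulblk (lda, n, n, a, b, c2)
      err = 0.0d0
      do 40, j = 1, n
         do 30, i = 1, n
            err = max(err, abs(c1(i, j) - c2(i, j)))
   30    continue
   40 continue
      write (*, *) 'max difference =', err
      end
c
      subroutine mulblk (lda, n, nb, a, b, c)
c     Blocked C = C + A * B; blocks have order nb, the last block
c     in each direction may be smaller (assumed splitting rule).
      implicit none
      integer lda, n, nb
      double precision a(lda, lda), b(lda, lda),
     $                 c(lda, lda)
      integer i, j, k, i0, j0, k0, i1, j1, k1
c     Outer jki loops run over blocks.
      do 60, j0 = 1, n, nb
         j1 = min(j0 + nb - 1, n)
         do 50, k0 = 1, n, nb
            k1 = min(k0 + nb - 1, n)
            do 40, i0 = 1, n, nb
               i1 = min(i0 + nb - 1, n)
c              Inner jki loops multiply one pair of blocks.
               do 30, j = j0, j1
                  do 20, k = k0, k1
                     do 10, i = i0, i1
                        c(i, j) = c(i, j) + a(i, k) * b(k, j)
   10                continue
   20             continue
   30          continue
   40       continue
   50    continue
   60 continue
      return
      end
```

Calling the routine with nb = n collapses it to the plain jki algorithm, which gives a simple consistency check: the blocked and unblocked results should agree.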
Blocked matrix multiplication: block sizes
Idea: choose the block sizes so that the inner multiplication of blocks
$$C_{ij} := C_{ij} + A_{ik} \cdot B_{kj}$$
(an xGEMM operation) is carried out in cache.
Procedure. From the table of speeds for the chosen basic algorithm,
find the approximate largest order n for which we still get the full “cache” speed.
Call that order $n_{\text{cache}}$.
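One possible way to read $n_{\text{cache}}$ off a speed table is sketched below in Python. The function name `find_ncache`, the tolerance `tol`, and the rule "within 5% of the peak speed" are assumptions introduced for this sketch, not part of the lecture. The table in the usage note is made-up toy data, not a measurement.

```python
def find_ncache(table, tol=0.05):
    """Return the last order n of the first run of table entries whose speed
    is within `tol` of the peak speed (None for an empty table)."""
    table = sorted(table)  # sort (n, speed) pairs by order n
    if not table:
        return None
    peak = max(speed for _, speed in table)
    ncache = None
    for n, speed in table:
        if speed >= (1.0 - tol) * peak:
            ncache = n      # still at (roughly) full cache speed
        elif ncache is not None:
            break           # first drop after the plateau: stop
    return ncache
```

For example, with the toy table `[(10, 900), (25, 1000), (50, 990), (100, 600), (200, 300)]` the function returns 50.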
Block sizes (continued)
The goal of the partition into blocks is that
the inner multiplication of blocks works with matrices of order at most $n_{\text{cache}}$.
Hence we require
$$n_i \le n_{\text{cache}}, \quad i = 1, \dots, N.$$
To this we add the earlier condition
$$\sum_{i=1}^{N} n_i = n.$$
From these conditions we can determine the number of blocks N.
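Number of blocks N
Substitute $n_i \le n_{\text{cache}}$, for i = 1, ..., N, into the relation for the sum. We get
$$n = \sum_{i=1}^{N} n_i \le N \cdot n_{\text{cache}},$$
or
$$N \ge \frac{n}{n_{\text{cache}}}.$$
The number of blocks N must be an integer, and (naturally) we also want
N to be as small as possible: the smallest possible value!
So we should take
$$N = \left\lceil \frac{n}{n_{\text{cache}}} \right\rceil = \left\lfloor \frac{n + n_{\text{cache}} - 1}{n_{\text{cache}}} \right\rfloor.$$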
Block sizes (continued)
Two approaches are standard for choosing the $n_i$.
“equal-sized” or “uniform”: all $n_i$ have roughly the same size $n_i \approx n/N$, i.e. they differ from one another by at most 1.
“greedy”: all $n_i$ have the maximal size $n_{\text{cache}}$, except possibly one of them (the first or the last).
If we want the partition into blocks to be unique, it is convenient to take
the block sizes $n_i$ sorted, in ascending or descending order.
In what follows we take the descending order
$$n_1 \ge n_2 \ge \cdots \ge n_N.$$
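“Equal-sized”: block sizes that are roughly equal
Define the remainder
$$n_r := n \bmod N.$$
The partition into blocks follows from writing n in the form
$$n = \left\lfloor \frac{n}{N} \right\rfloor \cdot N + n_r = (N - n_r) \cdot \left\lfloor \frac{n}{N} \right\rfloor + n_r \cdot \left( \left\lfloor \frac{n}{N} \right\rfloor + 1 \right).$$
The block sizes $n_i$ in descending order are
$$n_i = \begin{cases} \left\lfloor \dfrac{n}{N} \right\rfloor + 1, & \text{for } i = 1, \dots, n_r, \\[1ex] \left\lfloor \dfrac{n}{N} \right\rfloor, & \text{for } i = n_r + 1, \dots, N. \end{cases}$$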
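The equal-sized partition can be computed with a single `divmod`. The following Python sketch assumes a hypothetical function name `equal_sized_blocks` and takes the number of blocks N as given:

```python
def equal_sized_blocks(n, N):
    """Block sizes in descending order: n_r blocks of size q + 1,
    followed by N - n_r blocks of size q, where q = n // N and n_r = n % N."""
    q, nr = divmod(n, N)
    return [q + 1] * nr + [q] * (N - nr)
```

For n = 2000 and N = 7 this yields five blocks of size 286 followed by two of size 285, which sum to 2000.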
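“Greedy”: maximal block sizes
Define the remainder
$$n_r := n \bmod n_{\text{cache}}.$$
The partition into blocks follows from writing n in the form
$$n = \left\lfloor \frac{n}{n_{\text{cache}}} \right\rfloor \cdot n_{\text{cache}} + n_r.$$
The block sizes $n_i$ in descending order are
$$n_i = n_{\text{cache}}, \quad i = 1, \dots, N - 1,$$
$$n_N = \begin{cases} n_{\text{cache}}, & \text{for } n_r = 0 \text{ (i.e. } n_{\text{cache}} \text{ divides } n\text{)}, \\ n_r, & \text{for } n_r > 0. \end{cases}$$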
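The greedy partition is equally short in code. A Python sketch, with the function name `greedy_blocks` chosen only for illustration:

```python
def greedy_blocks(n, ncache):
    """Block sizes in descending order: full blocks of size ncache,
    followed by one smaller block of size n % ncache if that remainder is nonzero."""
    q, nr = divmod(n, ncache)
    sizes = [ncache] * q
    if nr > 0:
        sizes.append(nr)
    return sizes
```

For n = 2000 and ncache = 300 this gives six blocks of size 300 and a last block of size 200, so the number of blocks is 7, in agreement with N = ⌈n / ncache⌉.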
Block sizes (continued)
The examples use the “equal-sized” or “uniform” partition.
Note. The right (best) value of $n_{\text{cache}}$ is determined
by testing the blocked algorithm!
(That algorithm should be as fast as possible.)
Blocked matrix multiplication: indices
One more thing: so that the indices and the notation do not get mixed up,
i.e. what indexes blocks and what indexes elements,
we add the subscript “b” to everything that refers to blocks.
Blocked matrix multiplication: example
IVF with the normal option gives these speeds for the jik loop:
1050 MFlops for n ≤ 50,
205 MFlops for large n.
IVF with the normal option gives these speeds for the jki loop:
1100 MFlops for n ≤ 300,
840 MFlops for large n.
IVF with the fast option gives these speeds for the jki loop:
2000 MFlops for n ≤ 300,
1250 MFlops for large n.
BabyBlue, IVF, normal: jik plain and blocked (50)
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal; matrix multiplication jik (50). Horizontal axis: matrix order (0 to 2000); vertical axis: speed in Mflops (0 to 1500).]
BabyBlue, IVF, normal: jki plain and blocked (300)
[Figure: Pentium 4/660, 3.6 GHz, IVF, normal; matrix multiplication jki (300). Horizontal axis: matrix order (0 to 2000); vertical axis: speed in Mflops (0 to 1500).]
BabyBlue, IVF, fast: jki plain and blocked (300)
[Figure: Pentium 4/660, 3.6 GHz, IVF, fast; matrix multiplication jki (300). Horizontal axis: matrix order (0 to 2000); vertical axis: speed in Mflops (0 to 2500).]