Matrix Factorizations for Parallel Integer Transforms
Yiyuan She (1, 2, 3), Pengwei Hao (1, 2), Yakup Paker (2)
(1) Center for Information Science, Peking University
(2) Queen Mary, University of London
(3) Department of Statistics, Stanford University
Contents
1. Introduction
2. Point & block factorizations
3. Parallel ERM factorization (PERM)
4. Parallel computational complexity
5. Matrix blocking strategy
6. Conclusions
Why reversible integer transforms?
[Figure: coding pipeline. Encoding: a B/W, color, or multi-component image passes through a color space transform or multi-component transform (MCT) and a spatial transform, then is stored or communicated. Decoding: inverse spatial transform, then inverse color transform or inverse MCT (IMCT). Each stage is marked "Lossless?"]
How to implement?
• Wavelet construction
  – S transform (Blume & Fand, 1989)
  – TS transform (Zandi et al., 1995)
  – S+P transform (Said & Pearlman, 1996)
• Ladder structure (Bruekers & van den Enden, 1992)
• Lifting scheme (2D, Sweldens, 1996)
• Approximated color transform (Gormish et al., 1997)
• General wavelet transform (2D, Daubechies et al., 1998)
Matrix factorizations
• P. Hao and Q. Shi, "Invertible linear transforms implemented by integer mapping," Science in China, Series E (in Chinese), 2000, 30, pp. 132-141.
• P. Hao and Q. Shi, "Matrix factorizations for reversible integer mapping," IEEE Trans. Signal Processing, 2001, 49, pp. 2314-2324.
• P. Hao and Q. Shi, "Proposal of reversible integer implementation for multiple component transforms," ISO/IEC JTC1/SC29/WG1 N1720, Arles, France, 2000.
• Y. She and P. Hao, "A block TERM factorization of nonsingular uniform block matrices," Science in China, Series E (in Chinese), 2004, 34(2).
Can we make it more efficient?
• Fewer factor matrices
• Less rounding error
• Integer computation
• Parallel computing
How to increase the degree of parallelism?
Elementary reversible structure
[Figure: forward path x → ×j → +[b] → y; inverse path y → −[b] → ×1/j → x]
• Integer factor: j
• Flexible rounding: round(), floor(), ceil(), …
• Generalized lifting scheme: for j = 1, it is the same as the ladder structure and the lifting scheme
• Implementation: y = jx + [b] and x = (1/j)·(y − [b]), where [·] is the chosen rounding and b must be computable by both the encoder and the decoder
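A minimal Python sketch of the elementary reversible structure, assuming an integer input x, a nonzero integer factor j, and a real value b that the decoder can recompute (for example, from components it has already reconstructed). The helper names `ers_forward` and `ers_inverse` are hypothetical.

```python
import math
from typing import Callable

Rounding = Callable[[float], int]

def ers_forward(x: int, b: float, j: int, rnd: Rounding = math.floor) -> int:
    """Forward step y = j*x + [b]; j is a nonzero integer factor."""
    return j * x + rnd(b)

def ers_inverse(y: int, b: float, j: int, rnd: Rounding = math.floor) -> int:
    """Inverse step x = (y - [b]) / j; the division is exact for valid y."""
    r = y - rnd(b)
    if r % j != 0:
        raise ValueError("y is not an output of ers_forward for this b and j")
    return r // j

# b is assumed to come from another, already decoded component.
b = 0.37 * 12            # about 4.44, so floor(b) == 4
y = ers_forward(5, b, j=2)
x = ers_inverse(y, b, j=2)
```

Any deterministic rounding rule works as long as the encoder and decoder use the same one; with j = 1 the step is an ordinary lifting step.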
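Elementary reversible matrix (ERM)
• Diagonal elements: integer factors
• Triangular ERM (TERM)
– Upper TERM
– Lower TERM
• Single-row ERM (SERM)
– Only one row has nonzero off-diagonal elements:
$$S_m = J + e_m s_m^T$$
where J is the diagonal matrix of integer factors, e_m is the m-th unit vector, and the m-th entry of s_m is zero.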
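A sketch of applying one SERM S_m = J + e_m s_m^T to an integer vector, assuming the diagonal of J holds nonzero integer factors, s_m has a zero m-th entry, and rounding (floor here) is applied only to the off-diagonal update of row m. The function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Rounding = Callable[[float], int]

def serm_forward(x: Sequence[int], j: Sequence[int], s: Sequence[float], m: int,
                 rnd: Rounding = math.floor) -> List[int]:
    """y = S_m x with rounding on row m; j is the diagonal of J, s[m] must be 0."""
    if s[m] != 0:
        raise ValueError("s[m] must be zero")
    y = [jn * xn for jn, xn in zip(j, x)]
    y[m] += rnd(sum(sn * xn for sn, xn in zip(s, x)))
    return y

def serm_inverse(y: Sequence[int], j: Sequence[int], s: Sequence[float], m: int,
                 rnd: Rounding = math.floor) -> List[int]:
    """Rows n != m are exact divisions; row m undoes the rounded update."""
    x = [0] * len(y)
    for n, (jn, yn) in enumerate(zip(j, y)):
        if n != m:
            x[n] = yn // jn  # exact, because y[n] = j[n] * x[n]
    # s[m] == 0, so s^T x does not depend on x[m] and can be recomputed here.
    x[m] = (y[m] - rnd(sum(sn * xn for sn, xn in zip(s, x)))) // j[m]
    return x

y = serm_forward([3, -7, 10], [1, -1, 1], [0.5, 0, -1.25], m=1)
x = serm_inverse(y, [1, -1, 1], [0.5, 0, -1.25], m=1)
```

Because only row m is rounded and its update never reads x_m, the inverse can always recompute exactly the same rounded value.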
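Point factorizations (PLUS)
$$A = P D_R L U S_0 = P D_R S_N S_{N-1} \cdots S_1 S_0 \quad \text{if} \quad \det\left(P^T A\right) = \det D_R \neq 0$$
where
$$S_0 = I + e_N s_0^T = I + e_N \cdot [s_1, s_2, \ldots, s_{N-1}, 0], \qquad D_R = \mathrm{Diag}\left(1, 1, \ldots, 1, \det\left(P^T A\right)\right), \qquad S_m = I + e_m s_m^T$$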
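To make SERM factorizations concrete for N = 2, here is a small Python sketch. It does not run the PLUS algorithm itself. Instead it uses the well-known factorization of a plane rotation (det = 1) into three SERMs, R(θ) = U(a) L(b) U(a) with a = (cos θ − 1)/sin θ and b = sin θ, and applies each factor with rounding. The integer output only approximates the real rotation, but the inverse is exact. Function names are hypothetical.

```python
import math

def lifting_coeffs(theta: float) -> tuple[float, float]:
    """R(theta) = U(a) L(b) U(a), U(a) = [[1, a], [0, 1]], L(b) = [[1, 0], [b, 1]].
    Requires sin(theta) != 0."""
    a = (math.cos(theta) - 1.0) / math.sin(theta)  # equals -tan(theta / 2)
    b = math.sin(theta)
    return a, b

def int_rotate(x1: int, x2: int, theta: float) -> tuple[int, int]:
    a, b = lifting_coeffs(theta)
    x1 += round(a * x2)  # rightmost U(a): only row 1 changes
    x2 += round(b * x1)  # L(b): only row 2 changes
    x1 += round(a * x2)  # leftmost U(a)
    return x1, x2

def int_rotate_inverse(y1: int, y2: int, theta: float) -> tuple[int, int]:
    a, b = lifting_coeffs(theta)
    y1 -= round(a * y2)  # undo the three SERMs in reverse order
    y2 -= round(b * y1)
    y1 -= round(a * y2)
    return y1, y2

def matmul2(A, B):
    """2 x 2 matrix product, used only to check the factorization."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

theta = math.pi / 6
a, b = lifting_coeffs(theta)
R = matmul2(matmul2([[1, a], [0, 1]], [[1, 0], [b, 1]]), [[1, a], [0, 1]])
y = int_rotate(100, 37, theta)   # the exact rotation gives about (68.1, 82.0)
```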
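Block factorizations (BLUS)
$$A = P D_R L U S_0 = P D_R S_N S_{N-1} \cdots S_1 S_0 \quad \text{if} \quad \mathrm{DET}\left(P^T A\right) = \mathrm{DET}(D_R) \text{ exists}$$
where all entries are now blocks:
$$S_0 = I + e_N s_0^T = I + e_N \cdot [s_1, s_2, \ldots, s_{N-1}, 0], \qquad D_R = \mathrm{Diag}\left(I, I, \ldots, I, \mathrm{DET}\left(P^T A\right)\right), \qquad S_m = I + e_m s_m^T$$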
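A minimal NumPy sketch of one block SERM S_m = I + e_m s_m^T acting on an integer block vector. It assumes the block row s_m^T is stored as a list of real matrices whose m-th (diagonal) block is zero, and that rounding (floor here) is applied entrywise to the block update. All rows of block m change in one vectorized step, which hints at the extra parallelism a block factorization exposes. Names are hypothetical.

```python
import numpy as np

def block_serm_forward(xs, S_row, m):
    """xs: list of int64 vectors (one per block); S_row: list of real matrices
    S_{m,n}; S_row[m] must be zero. Only block m changes."""
    if np.any(S_row[m]):
        raise ValueError("diagonal block S_row[m] must be zero")
    update = sum(S @ x for S, x in zip(S_row, xs))
    ys = [x.copy() for x in xs]
    ys[m] = xs[m] + np.floor(update).astype(np.int64)
    return ys

def block_serm_inverse(ys, S_row, m):
    xs = [y.copy() for y in ys]
    xs[m] = np.zeros_like(ys[m])  # block m does not enter the update (S_row[m] == 0)
    update = sum(S @ x for S, x in zip(S_row, xs))
    xs[m] = ys[m] - np.floor(update).astype(np.int64)
    return xs

rng = np.random.default_rng(0)
nb, bs, m = 4, 3, 1                      # 4 blocks of size 3; update block 1
S_row = [rng.normal(size=(bs, bs)) for _ in range(nb)]
S_row[m] = np.zeros((bs, bs))
xs = [rng.integers(-100, 100, size=bs) for _ in range(nb)]
ys = block_serm_forward(xs, S_row, m)
back = block_serm_inverse(ys, S_row, m)
```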
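Parallel factorizations (PERM)
$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \to \cdots \to m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$
$$A = P^{(1)} P^{(2)} \cdots P^{(K)} D^{(K)} L^{(K)} U^{(K)} \cdots L^{(1)} U^{(1)} = P D \prod_{k=K}^{1} \left(S^{(k)}_{n^{(k)}} \cdots S^{(k)}_{1}\right)$$
PERM(0): level k contributes the block SERMs S^{(k)}_{n^{(k)}}, …, S^{(k)}_1.
PERM(1): level k also contributes S^{(k)}_0, i.e. S^{(k)}_{n^{(k)}} ⋯ S^{(k)}_1 S^{(k)}_0.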
Parallel computing PERM(0)
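[Figure: data flow for PERM(0) with K = 2 and n^{(1)} = n^{(2)} = 4: input x, permutation P, block SERMs S^{(1)}_1, …, S^{(1)}_4 and S^{(2)}_1, …, S^{(2)}_4, output y]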
Parallel computing PERM(1)
[Figure: data flow for PERM(1) with K = 2 and n^{(1)} = n^{(2)} = 4: x → S^{(1)}_0, …, S^{(1)}_4 → S^{(2)}_0, …, S^{(2)}_4 → P → y]
Parallel multiplication
For p processors computing the products of n pairs of numbers, the computational time is
$$T^{*}_{p} = \left\lceil \frac{n}{p} \right\rceil$$
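A toy check of T*_p = ⌈n/p⌉ in Python, assuming each processor completes one multiplication per time step and there is no communication cost. The function names are hypothetical.

```python
def parallel_mul_time(n: int, p: int) -> int:
    """T*_p = ceil(n / p), computed with integer arithmetic."""
    return -(-n // p)

def simulate_mul_steps(n: int, p: int) -> int:
    """Each step, up to p of the remaining products are computed at once."""
    steps = 0
    while n > 0:
        n -= min(p, n)
        steps += 1
    return steps

print(parallel_mul_time(63, 4), simulate_mul_steps(63, 4))
```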
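Parallel addition
[Figure: parallel addition in computing S_1 x; row-1 entries S(1,5), S(1,6), …, S(1,16) shown]
For p processors adding n numbers, the computational time is
$$T^{+}_{p} = \begin{cases} \left\lceil \log_2 n \right\rceil, & n < 2p, \\ n/p + C \log_2 p, & n \ge 2p, \end{cases}$$
where C is a constant.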
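The two-case rule, ⌈log₂ n⌉ for n < 2p and n/p + C log₂ p for n ≥ 2p, can be compared with a toy model in Python. The sketch assumes each processor performs one addition per time step and ignores communication; under that assumption, greedy pairwise summation needs exactly ⌈log₂ n⌉ steps when n < 2p. Function names are hypothetical.

```python
import math

def parallel_add_model(n: int, p: int, C: float = 1.0) -> float:
    """Two-case parallel addition time for n numbers on p processors."""
    if n < 2 * p:
        return math.ceil(math.log2(n))
    return n / p + C * math.log2(p)

def simulate_add_steps(n: int, p: int) -> int:
    """Greedy pairwise summation: each step, up to p disjoint pairs are added."""
    steps, k = 0, n
    while k > 1:
        k -= min(k // 2, p)  # each addition merges two numbers into one
        steps += 1
    return steps

for n, p in [(5, 8), (16, 4), (63, 4)]:
    print(n, p, simulate_add_steps(n, p), parallel_add_model(n, p))
```

For n ≥ 2p the toy simulation charges nothing for moving partial sums between processors, so it is not expected to match the C log₂ p term exactly.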
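Computational complexity: multiplications
For the blocking
$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \to \cdots \to m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2, \qquad n^{(k)} m^{(k)} = m^{(k-1)},$$
the parallel multiplication time is
$$T^{*}_{\mathrm{PERM}(1)} = \sum_{k=1}^{K} \left(n^{(k)}+1\right) \left\lceil \frac{m^{(k)}\left(m^{(k-1)}-m^{(k)}\right)}{p} \right\rceil \approx \sum_{k=1}^{K} \frac{\left(m^{(k-1)}\right)^2-\left(m^{(k)}\right)^2}{p} = \frac{N_1^2-N_2^2}{p}$$
$$T^{*}_{\mathrm{PERM}(0)} = \sum_{k=1}^{K} \frac{N_1}{m^{(k-1)}}\, n^{(k)} \left\lceil \frac{m^{(k)}\left(m^{(k-1)}-m^{(k)}\right)}{p} \right\rceil \approx \sum_{k=1}^{K} \frac{N_1\left(m^{(k-1)}-m^{(k)}\right)}{p} = \frac{N_1\left(N_1-N_2\right)}{p}$$
Apart from the ceilings, this time is independent of how the matrix is blocked: since n^{(k)} m^{(k)} = m^{(k−1)}, each term reduces to a difference of consecutive values, and the sums telescope.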
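A Python sketch that evaluates the two multiplication-time formulas with their ceilings, reading them as T*_PERM(1) = Σ (n^{(k)}+1)⌈m^{(k)}(m^{(k−1)} − m^{(k)})/p⌉ and T*_PERM(0) = Σ (N_1/m^{(k−1)}) n^{(k)}⌈m^{(k)}(m^{(k−1)} − m^{(k)})/p⌉. The blocking is given by the list of factors n^{(k)}; all function names are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def blocking_chain(N1: int, factors: list[int]) -> list[int]:
    """m^(0) = N1 and m^(k) = m^(k-1) / n^(k), so n^(k) m^(k) = m^(k-1)."""
    m = [N1]
    for n in factors:
        if m[-1] % n != 0:
            raise ValueError(f"n = {n} does not divide block size {m[-1]}")
        m.append(m[-1] // n)
    return m

def t_mul_perm1(N1: int, factors: list[int], p: int) -> int:
    m = blocking_chain(N1, factors)
    return sum((n + 1) * ceil_div(m[k] * (m[k - 1] - m[k]), p)
               for k, n in enumerate(factors, start=1))

def t_mul_perm0(N1: int, factors: list[int], p: int) -> int:
    m = blocking_chain(N1, factors)
    return sum((N1 // m[k - 1]) * n * ceil_div(m[k] * (m[k - 1] - m[k]), p)
               for k, n in enumerate(factors, start=1))

# Three blockings of a 64 x 64 matrix down to points (N2 = 1).
for factors in ([64], [8, 8], [4, 4, 4]):
    print(factors, t_mul_perm1(64, factors, 1), t_mul_perm0(64, factors, 1),
          t_mul_perm1(64, factors, 16))
```

With p = 1 there are no ceiling effects, so every blocking gives exactly N_1² − N_2² = 4095 for PERM(1) and N_1(N_1 − N_2) = 4032 for PERM(0); with p = 16 the PERM(1) values differ only slightly between blockings. The single-level blocking [64] is the point case, (N + 1)⌈(N − 1)/p⌉.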
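Computational complexity: additions
For the same blocking
$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \to \cdots \to m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2, \qquad n^{(k)} m^{(k)} = m^{(k-1)},$$
the parallel addition time is
$$T^{+}_{\mathrm{PERM}(0)} = \sum_{k=1}^{K_p-1} \frac{N_1}{m^{(k-1)}}\, n^{(k)} \left[\frac{m^{(k)}\left(m^{(k-1)}-m^{(k)}\right)}{p} + C\left(\log_2 p - \log_2 m^{(k)}\right)\right] + \sum_{k=K_p}^{K} \frac{N_1}{m^{(k-1)}}\, n^{(k)} \log_2\left(m^{(k-1)}-m^{(k)}\right)$$
$$T^{+}_{\mathrm{PERM}(1)} = \sum_{k=1}^{K_p-1} \left(n^{(k)}+1\right) \left[\frac{m^{(k)}\left(m^{(k-1)}-m^{(k)}\right)}{p} + C\left(\log_2 p - \log_2 m^{(k)}\right)\right] + \sum_{k=K_p}^{K} \left(n^{(k)}+1\right) \log_2\left(m^{(k-1)}-m^{(k)}\right)$$
There is a turning point K_p: the level at which m^{(K_p)}(m^{(K_p−1)} − m^{(K_p)}) is close to but less than 2p. Levels before K_p fall in the n ≥ 2p case of the parallel addition time; levels from K_p on fall in the n < 2p case.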
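A Python sketch for locating the turning point K_p, reading m^{(k)}(m^{(k−1)} − m^{(k)}) as the number of products one block SERM at level k must add (m^{(k)} rows, each with m^{(k−1)} − m^{(k)} off-diagonal products). It assumes this count usually shrinks as k grows and returns the first level where it drops below 2p. Function names are hypothetical.

```python
def level_counts(N1: int, factors: list[int]) -> list[int]:
    """m^(k) (m^(k-1) - m^(k)) for k = 1..K, with m^(k) = m^(k-1) / n^(k)."""
    m = [N1]
    for n in factors:
        if m[-1] % n != 0:
            raise ValueError(f"n = {n} does not divide block size {m[-1]}")
        m.append(m[-1] // n)
    return [m[k] * (m[k - 1] - m[k]) for k in range(1, len(m))]

def turning_point(N1: int, factors: list[int], p: int):
    """First level k whose count falls below 2p, or None if no level does."""
    for k, count in enumerate(level_counts(N1, factors), start=1):
        if count < 2 * p:
            return k
    return None

for p in (1, 16, 64, 1024):
    print(p, level_counts(64, [4, 4, 4]), turning_point(64, [4, 4, 4], p))
```

Larger p pushes the turning point toward the first level, so more of the factorization falls in the logarithmic regime.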
Blocking strategy
Since the parallel computational time has a turning point (ignoring factors such as communication time), we propose a three-phase blocking strategy for
$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \to \cdots \to m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$
[Equations: one blocking rule for each of three ranges of N relative to p, with thresholds involving 2p]
Computational complexity
With the three-phase blocking N_1 = m^{(0)} → m^{(1)} → ⋯ → m^{(K)} = N_2, the times T^{*}_{PERM(1)}(N, p) and T^{+}_{PERM(1)}(N, p) are given piecewise by functions f_1, f_2, f_3 of (N, p), one per phase. For example, when N ≥ 2p the first phase blocks N → p, and the PERM(1) multiplication time of that level gives
$$f^{*}_1(N,p) = \left(\frac{N}{p}+1\right)\left(\frac{N}{p}-1\right)p + f^{*}_2(p,p), \qquad N \ge 2p.$$
[Equations: the remaining cases f*_2 and f*_3 (the last involves a 5 log_4 N term) and the corresponding addition-time functions f+_1, f+_2, f+_3]
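Complexity comparison
For parallel SERM with the SERM(1) factorization:
$$T^{*}_{\mathrm{pSERM}(1)}(N,p) = (N+1)\left\lceil \frac{N-1}{p} \right\rceil, \qquad T^{+}_{\mathrm{pSERM}(1)}(N,p) = (N+1)\left(\frac{N-1}{p} + C \log_2 p\right)$$

| Operation | Method | p = O(N) | p = O(N²) |
|---|---|---|---|
| Multiplications | SERM(1) | O(N) | O(N) |
| Multiplications | PERM(1) | O(N) | O(log N) |
| Additions | SERM(1) | O(N log N) | O(N log N) |
| Additions | PERM(1) | O(N) | O(log² N) |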
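A Python sketch that evaluates the parallel SERM(1) formulas, T* = (N+1)⌈(N−1)/p⌉ and T+ = (N+1)((N−1)/p + C log₂ p), over the processor counts used in the comparison plots (N = 64, C = 1). Under this reading it should give the SERM curves; the function names are hypothetical.

```python
import math

def t_mul_pserm1(N: int, p: int) -> int:
    """(N + 1) * ceil((N - 1) / p)."""
    return (N + 1) * (-(-(N - 1) // p))

def t_add_pserm1(N: int, p: int, C: float = 1.0) -> float:
    """(N + 1) * ((N - 1) / p + C * log2(p))."""
    return (N + 1) * ((N - 1) / p + C * math.log2(p))

N = 64
for p in (1, 4, 16, 64, 256, 1024):
    print(p, t_mul_pserm1(N, p), t_add_pserm1(N, p))
```

Once p ≥ N − 1 the multiplication time stops falling at N + 1 steps, while the C log₂ p term makes the addition time grow again; this matches the O(N) and O(N log N) entries for SERM(1) in the table.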
PERM vs. parallel SERM
[Figure: computational complexity (N = 64, C = 1) versus the number of processors p from 1 to 1024, log scales; curves: PERM multiplications, PERM additions, SERM multiplications, SERM additions]
PERM vs. parallel SERM
[Figure: relative speedup PERM/SERM (N = 64, C = 1) versus the number of processors p from 1 to 1024; curves: PERM multiplication / SERM multiplication, PERM addition / SERM addition]
Conclusions
For parallel computing:
• PERM increases the degree of parallelism
• It accommodates more processors
For sequential computing:
• It may be more efficient when used with special matrix computation software such as BLAS
• More factorization levels may result in greater rounding error