Factoring and Eliminating Common Subexpressions in Polynomial Expressions
International Conference on Computer Aided Design (ICCAD), 2004
Farzan Fallah
Advanced CAD Research
Fujitsu Labs. of America
Anup Hosangadi
Ryan Kastner
ECE Department, UCSB
Outline
Introduction
Related Work
Algebraic techniques for redundancy elimination
Experimental results
Conclusions
Introduction
Embedded system applications need to compute polynomial expressions
– Continuous functions can be approximated by polynomials to a desired degree of accuracy
– Adaptive signal processing (polynomial filters)
– Polynomial interpolation/extrapolation in computer graphics
– Encryption
sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7!
Introduction
Multiplications are expensive in embedded systems
No good optimization tool exists for reducing the complexity of polynomials
– Designers rely on hand-optimized libraries
Conventional optimization techniques
– CSE, value numbering: not suited for polynomials
– Horner form: most popular representation
– a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0 = (...((a_n x + a_(n-1))x + a_(n-2))x + ... + a_1)x + a_0
– Not good for multivariate polynomials
– Only a single polynomial expression at a time
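The Horner form quoted on this slide can be sketched in a few lines of Python. The function name `horner` and the coefficient ordering (from a_n down to a_0) are choices made for this illustration only.

```python
def horner(coeffs, x):
    """Evaluate a_n*x^n + ... + a_1*x + a_0 using n multiplications.

    `coeffs` lists the coefficients from a_n down to a_0.
    """
    result = coeffs[0]
    for a in coeffs[1:]:
        # One multiplication and one addition per remaining coefficient.
        result = result * x + a
    return result


# 2x^3 - 6x^2 + 2x - 1 evaluated at x = 3
value = horner([2, -6, 2, -1], 3)
```

The loop handles one variable and one polynomial at a time, which is the limitation the slide points out for multivariate expressions.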
Introduction
Quartic-spline polynomial (3-D graphics)
P = zu^4 + 4avu^3 + 6bu^2v^2 + 4uv^3w + qv^4
Horner form (from Maple(TM))
P = zu^4 + (4au^3 + (6bu^2 + (4uw + qv)v)v)v
(17 multiplications)
Proposed algebraic method:
d1 = v^2 ; d2 = 4v
P = u^3(uz + a*d2) + d1(q*d1 + u(w*d2 + 6bu))
(11 multiplications)
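As a sanity check, the three forms of the quartic-spline polynomial can be compared numerically. This is a minimal sketch: the function names are invented here, and it takes d2 = 4v, the choice under which the factored form expands back to P.

```python
import random


def quartic_expanded(z, u, a, v, b, w, q):
    return z*u**4 + 4*a*v*u**3 + 6*b*u**2*v**2 + 4*u*v**3*w + q*v**4


def quartic_horner(z, u, a, v, b, w, q):
    return z*u**4 + (4*a*u**3 + (6*b*u**2 + (4*u*w + q*v)*v)*v)*v


def quartic_factored(z, u, a, v, b, w, q):
    d1 = v * v
    d2 = 4 * v  # assumption: the value that makes the factored form equal P
    return u**3 * (u*z + a*d2) + d1 * (q*d1 + u * (w*d2 + 6*b*u))


# Integer inputs keep the comparison exact.
random.seed(0)
for _ in range(100):
    args = [random.randint(-9, 9) for _ in range(7)]
    assert quartic_expanded(*args) == quartic_horner(*args) == quartic_factored(*args)
```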
Related Work
Expression factorization (M. A. Breuer, JACM'69)
– Allows only one kind of operator at a time
Symbolic algebra techniques (A. Peymandoust, De Micheli, DAC'01)
– Used for mapping DSP datapaths (polynomials) to library elements
– Results depend upon exponential library search, e.g. a^2 - b^2 = (a+b)(a-b) iff (a+b) or (a-b) is in the library
– Manipulates only one expression at a time
F1 = A + B + C + D;
F2 = A + P + D;
=> Extract (A + D)
Motivating Example
Consider the set of expressions
P1 = x^3y + x^2y^2z
P2 = 4x + 4yz - xyz
P3 = 4xy - x^2y
– Naïve implementation: 16 multiplications, 4 additions/subtractions
Using CSE
d1 = x^2 ; d2 = xy ; d3 = 4x
P1 = d1*d2 + d1*y^2*z
P2 = d3 + 4yz - d2*z
P3 = d3*y - d1*y
– 12 multiplications, 4 additions/subtractions
Motivating Example
Using our algebraic techniques
d1 = x + yz ; d2 = 4 - x ; d3 = xy
P1 = x*d1*d3
P2 = 4*d1 - d3*z
P3 = d3*d2
– Total 7 multiplications, 3 additions/subtractions
– Savings of 5 multiplications, 1 addition/subtraction compared to CSE
Impossible to obtain such results using conventional techniques
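The operation counts quoted for the motivating example can be reproduced with a small counting wrapper. This is only a sketch: `Counted`, `count_ops`, `naive` and `factored` are names invented here, and powers are assumed to be computed by repeated multiplication, which is what yields the naive count of 16.

```python
class Counted:
    """Number wrapper that tallies multiplications and additions/subtractions."""
    muls = 0
    adds = 0

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _val(other):
        return other.value if isinstance(other, Counted) else other

    def __mul__(self, other):
        Counted.muls += 1
        return Counted(self.value * self._val(other))
    __rmul__ = __mul__

    def __add__(self, other):
        Counted.adds += 1
        return Counted(self.value + self._val(other))
    __radd__ = __add__

    def __sub__(self, other):
        Counted.adds += 1
        return Counted(self.value - self._val(other))

    def __rsub__(self, other):
        Counted.adds += 1
        return Counted(self._val(other) - self.value)


def count_ops(fn, *args):
    """Run fn on wrapped inputs; return (values, multiplications, add/subs)."""
    Counted.muls = Counted.adds = 0
    results = fn(*(Counted(a) for a in args))
    return [r.value for r in results], Counted.muls, Counted.adds


def naive(x, y, z):
    p1 = x*x*x*y + x*x*y*y*z
    p2 = 4*x + 4*y*z - x*y*z
    p3 = 4*x*y - x*x*y
    return p1, p2, p3


def factored(x, y, z):
    d1 = x + y*z
    d2 = 4 - x
    d3 = x*y
    return x*d1*d3, 4*d1 - d3*z, d3*d2
```

Calling `count_ops(naive, 2, 3, 5)` and `count_ops(factored, 2, 3, 5)` returns the same three values with different operation tallies, so the factored form can be checked for correctness and cost in one pass.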
Introduction to algebraic techniques for redundancy elimination
Algebraic techniques in multi-level logic synthesis (MLLS)
– Decomposition and factoring reduce the number of literals
– Distill and Condense use rectangle covering methods
Polynomial expressions (our technique)
– Factoring and single-term common subexpressions reduce the number of multiplications
– Multiple-term common subexpressions reduce the number of additions and possibly multiplications
Key differences (generalization to handle higher orders)
– Kernelling techniques
– Finding single cube intersections
Introduction to our technique (Outline)
Find a subset of all possible subexpressions (kernel generation)
Transformation of polynomial expressions
– Problem formulation
Extract multiple-term common subexpressions and factors
Extract single-term common factors
Introduction to our technique
Terminology
– Literal: a variable or a constant, e.g. a, b, 2, 3.14
– Cube: product of literals, e.g. +3a^2b, -2a^3b^2c
– SOP: sum of cubes, e.g. +3a^2b - 2a^3b^2c
– Cube-free expression: no literal or cube can divide all the cubes of the expression
– Kernel: a cube-free sub-expression of an expression, e.g. 3 - 2abc
– Co-kernel: a cube that is used to divide an expression to get a kernel, e.g. a^2b
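The terminology can be made concrete on the slide's own example, 3a^2b - 2a^3b^2c. The sketch below assumes a representation chosen for illustration (a cube is a numeric coefficient plus a map from variable to exponent), and the helper names are hypothetical.

```python
def common_cube(sop):
    """Largest cube (variable -> exponent) dividing every term of the SOP."""
    shared = set.intersection(*(set(exps) for _, exps in sop))
    return {lit: min(exps[lit] for _, exps in sop) for lit in shared}


def divide(sop, cube):
    """Divide every term by `cube`; assumes the cube divides each term."""
    quotient = []
    for coeff, exps in sop:
        reduced = {lit: e - cube.get(lit, 0) for lit, e in exps.items()}
        quotient.append((coeff, {lit: e for lit, e in reduced.items() if e > 0}))
    return quotient


def is_cube_free(sop):
    """True when no variable divides all terms (numeric coefficients ignored)."""
    return not common_cube(sop)


# 3a^2b - 2a^3b^2c
sop = [(3, {"a": 2, "b": 1}), (-2, {"a": 3, "b": 2, "c": 1})]
co_kernel = common_cube(sop)     # a^2b
kernel = divide(sop, co_kernel)  # 3 - 2abc
```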
Introduction to our Technique
Matrix Representation of Arithmetic Expressions
– F = x^3y - xy^2z is represented by
+/-  x  y  z
 +   3  1  0
 -   1  2  1
– Each row represents a product term
– Each column represents a variable/constant
– Each element (i,j) represents the power of variable j in term i
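One possible way to hold this matrix in code is a list of rows, each carrying a sign and an exponent tuple. The names `LITERALS`, `F` and `evaluate` are assumptions made for this sketch.

```python
LITERALS = ("x", "y", "z")

# One row per product term: (sign, exponents in LITERALS order).
F = [("+", (3, 1, 0)),   # +x^3 y
     ("-", (1, 2, 1))]   # -x y^2 z


def evaluate(matrix, values):
    """Evaluate the expression encoded by `matrix` at the given variable values."""
    total = 0
    for sign, exps in matrix:
        term = 1
        for lit, e in zip(LITERALS, exps):
            term *= values[lit] ** e
        total += term if sign == "+" else -term
    return total


result = evaluate(F, {"x": 2, "y": 3, "z": 5})
```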
Generation of Kernels (example)
P1 = x^3y + x^2y^2z, {L} = {x, y, z}
– Divide by x: Ft = P1/x = x^2y + xy^2z
P1:
x  y  z
3  1  0
2  2  1
Ft = P1/x:
x  y  z
2  1  0
1  2  1
Generation of Kernels (example)
Ft = P1/x = x^2y + xy^2z
C = biggest cube dividing all cubes of Ft = xy (row 1 1 0)
Ft / C:
x  y  z
1  0  0
0  1  1
Generation of Kernels (example)
Obtain kernel: F1 = Ft/C = (x^2y + xy^2z)/(xy) = (x + yz)
Obtain co-kernel: D1 = x*(xy) = x^2y
– No kernels within F1. Go back to P1
P1 = x^3y + x^2y^2z
– Divide now by the next variable y
Ft = x^3 + x^2yz
– C = x^2
– But C contains x, and x precedes y in the literal order
Stop here, to avoid repeating the same kernel Ft/C = (x + yz)
– No more kernels extracted
– Record kernel F1 = P1 with co-kernel '1'
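The walk-through above can be sketched as a short recursive routine. It is an illustration under assumptions rather than the authors' implementation: terms are (coefficient, exponent tuple) pairs, the function name `kernels` is made up, a constant such as 4 would need its own literal column, and the trivial entry with co-kernel '1' recorded on the slide is left out.

```python
def kernels(terms, n_lits, start=0, cokernel=None):
    """Return (co-kernel, kernel) pairs found by dividing by literals in order.

    terms    -- list of (coefficient, exponent tuple) pairs
    n_lits   -- number of literal columns
    start    -- index of the first literal to try (avoids duplicate kernels)
    cokernel -- exponent tuple of the cube divided out so far
    """
    if cokernel is None:
        cokernel = (0,) * n_lits
    found = []
    for j in range(start, n_lits):
        with_j = [(c, e) for c, e in terms if e[j] > 0]
        if len(with_j) < 2:
            continue  # literal j must appear in more than one term
        # Ft = terms containing literal j, each divided by that literal
        ft = [(c, e[:j] + (e[j] - 1,) + e[j + 1:]) for c, e in with_j]
        # C = biggest cube dividing all cubes of Ft
        cube = tuple(min(e[k] for _, e in ft) for k in range(n_lits))
        if any(cube[k] > 0 for k in range(j)):
            continue  # C contains an earlier literal: kernel already generated
        kernel = [(c, tuple(a - b for a, b in zip(e, cube))) for c, e in ft]
        new_ck = tuple(cokernel[k] + cube[k] + (1 if k == j else 0)
                       for k in range(n_lits))
        found.append((new_ck, kernel))
        found.extend(kernels(kernel, n_lits, j, new_ck))
    return found


# P1 = x^3y + x^2y^2z over literals (x, y, z)
P1 = [(1, (3, 1, 0)), (1, (2, 2, 1))]
pairs = kernels(P1, 3)
```

For P1 the routine returns a single pair, co-kernel x^2y with kernel x + yz, and skips the division by y for the reason given on the slide.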
Concept of kernels and co-kernels
Theorem: Two expressions f and g can have a multiple-term common subexpression iff there are two kernels Kf and Kg having a multiple-term intersection
Detection of multiple-term common subexpressions by intersection of sets of kernels
Each co-kernel : kernel pair represents a possible factorization
– e.g. x^3y + x^2y^2z = [x^2y](x + yz)
The set of kernels is a subset of all possible subexpressions
All Kernels and Co-Kernels
P1 = x^3y (1) + x^2y^2z (2)
P2 = 4x (3) + 4yz (4) - xyz (5)
P3 = 4xy (6) - x^2y (7)
Written as [kernel](co-kernel):
P1: [x + yz](x^2y), [x^3y + x^2y^2z](1)
P2: [4 - x](yz), [4 - yz](x), [x + yz](4), [4x + 4yz - xyz](1)
P3: [4 - x](xy), [4xy - x^2y](1)
Which kernels to choose?
Kernel Cube Matrix (KCM)
One row for each kernel generated
One column for each distinct kernel cube
Each non-zero element represents a term
Co-kernels \ Kernel cubes
        x     yz    4     -yz   -x
4       1(3)  1(4)  0     0     0
x^2y    1(1)  1(2)  0     0     0
x       0     0     1(3)  1(5)  0
xy      0     0     1(6)  0     1(7)
yz      0     0     1(4)  0     1(5)
(entry in row x^2y, column x: term x^3y)
Finding Kernel Intersections (Distill Algorithm)
Each kernel intersection or factor appears as a rectangle
– Rectangle: set of rows and columns such that all elements are '1'
Value of a rectangle = weighted sum of the number of operations saved
Goal: maximum-valued rectangular covering of the KCM
Greedy heuristic: covering by prime rectangles
– Prime rectangle: rectangle not covered by any other rectangle
Finding Kernel Intersections (Distill Algorithm)
Formula for the value of a rectangle
R = number of rows; C = number of columns
M(R_i) = # of multiplications in row (co-kernel) i
M(C_i) = # of multiplications in column (kernel cube) i
m = ratio of weights of multiplication to addition
Value = m * { (C - 1) * (R + Σ_{i=1..R} M(R_i)) + (R - 1) * Σ_{i=1..C} M(C_i) } + (R - 1) * (C - 1)
Formula calculates savings in operation count
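The value formula, read as m times the multiplications saved plus (R - 1)(C - 1) additions saved, can be tried on the KCM of the running example. This is a rough sketch: the dictionaries below are transcribed by hand, the per-row and per-column multiplication counts are assumptions (a lone literal costs 0, x^2y costs 2), and the exhaustive search merely stands in for the greedy prime-rectangle heuristic. It also ignores terms already covered by an earlier rectangle.

```python
from itertools import combinations


def rectangle_value(row_mults, col_mults, m=1):
    """Weighted savings of a rectangle with the given M(R_i) and M(C_i)."""
    R, C = len(row_mults), len(col_mults)
    mults_saved = (C - 1) * (R + sum(row_mults)) + (R - 1) * sum(col_mults)
    adds_saved = (R - 1) * (C - 1)
    return m * mults_saved + adds_saved


# KCM of the running example: co-kernel -> kernel cubes holding a '1'.
KCM = {"4": {"x", "yz"}, "x^2y": {"x", "yz"}, "x": {"4", "-yz"},
       "xy": {"4", "-x"}, "yz": {"4", "-x"}}
ROW_MULTS = {"4": 0, "x^2y": 2, "x": 0, "xy": 1, "yz": 1}
COL_MULTS = {"x": 0, "yz": 1, "4": 0, "-yz": 1, "-x": 0}


def best_rectangle(kcm, m=1):
    """Exhaustively search column subsets; fine only for tiny matrices."""
    columns = sorted(set().union(*kcm.values()))
    best = (0, (), ())
    for k in range(1, len(columns) + 1):
        for cols in combinations(columns, k):
            rows = tuple(r for r in kcm if set(cols) <= kcm[r])
            if not rows:
                continue
            value = rectangle_value([ROW_MULTS[r] for r in rows],
                                    [COL_MULTS[c] for c in cols], m)
            if value > best[0]:
                best = (value, rows, cols)
    return best
```

With m = 1 the search picks rows {4, x^2y} and columns {x, yz}, i.e. the kernel intersection x + yz, with 5 multiplications and 1 addition saved.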
Distill Algorithm
Selected rectangle in the KCM: rows {4, x^2y}, columns {x, yz}
4x + 4yz = 4d1, d1 = (x + yz)
x^3y + x^2y^2z = x^2y*d1
Saves 5 multiplications and 1 addition
Distill Algorithm
Remove covered terms
Next rectangle in the KCM: row {xy}, columns {4, -x}
4xy - x^2y = xy*d2
d2 = 4 - x
Saves 2 multiplications
Distill Algorithm
Distill algorithm exits after no more kernel intersections can be found
P1 = x^2y*d1      d1 = x + yz
P2 = 4d1 - xyz    d2 = 4 - x
P3 = xy*d2
Can further optimize by finding single cube intersections
Finding single cube intersections (Condense Algorithm)
Need an algorithm for finding single-term common subexpressions
Consider two single-term expressions
– F1 = a^4b^3c
– F2 = a^2b^4c^2
Form the Cube Variable Incidence Matrix (CIM)
a  b  c
4  3  1
2  4  2
One row for each product term
One column for each variable
Finding single cube intersections (Condense algorithm)
Each (single-term) common subexpression appears as a rectangle
– Rectangle: set of rows and columns where all elements are non-zero
Value of a rectangle is the number of multiplications saved by selecting it
– C = cube corresponding to the rectangle
– Value = (Rows - 1) * ((Σ C[i]) - 1)
Maximum-valued rectangular covering will give the minimum number of multiplications
Use greedy iterative covering by prime rectangles
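A minimal sketch of the rectangle value for the CIM follows. It counts net multiplications saved as (Rows - 1) * (ΣC[i] - 1): each row saves ΣC[i] - 1 multiplications and the cube itself costs ΣC[i] - 1 once. That reading is an assumption, chosen because it agrees with the later "save 2 multiplications by extracting xy" example; the function names are illustrative.

```python
def common_cube(rows):
    """Column-wise minimum exponent over the selected CIM rows."""
    return tuple(min(col) for col in zip(*rows))


def cube_value(rows):
    """Net multiplications saved by extracting the common cube of `rows`."""
    size = sum(common_cube(rows))
    if size < 2:
        return 0  # a single literal is not worth extracting
    return (len(rows) - 1) * (size - 1)


CIM = [(4, 3, 1),   # F1 = a^4 b^3 c
       (2, 4, 2)]   # F2 = a^2 b^4 c^2

cube = common_cube(CIM)   # exponents of a^2 b^3 c
saved = cube_value(CIM)
```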
Finding single cube intersections (Condense algorithm)
        a  b  c
F1      4  3  1
F2      2  4  2
cube    2  3  1
d1 = a^2b^3c
        a  b  c  d1
F1      2  0  0  1
F2      0  1  1  1
d1      2  3  1  0
cube    0  1  1  0
d2 = bc
Finding single cube intersections (Condense algorithm)
        a  b  c  d1 d2
F1      2  0  0  1  0
F2      0  0  0  1  1
d1      2  2  0  0  1
d2      0  1  1  0  0
cube    2  0  0  0  0
d3 = a^2
Finding single cube intersections (Condense algorithm)
Final CIM
        a  b  c  d1 d2 d3
F1      0  0  0  1  0  1
F2      0  0  0  1  1  0
d1      0  2  0  0  1  1
d2      0  1  1  0  0  0
d3      2  0  0  0  0  0
Final implementation (7 multiplications)
d3 = a*a
d2 = b*c
d1 = b*b*d2*d3
F1 = d1*d3
F2 = d1*d2
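The final implementation can be checked against the direct definitions F1 = a^4b^3c and F2 = a^2b^4c^2. The function names are illustrative; the comments count the multiplications in each line.

```python
def direct(a, b, c):
    return a**4 * b**3 * c, a**2 * b**4 * c**2


def condensed(a, b, c):
    d3 = a * a            # 1 multiplication
    d2 = b * c            # 1
    d1 = b * b * d2 * d3  # 3, gives a^2 b^3 c
    f1 = d1 * d3          # 1
    f2 = d1 * d2          # 1  (7 in total)
    return f1, f2


check = condensed(2, 3, 5) == direct(2, 3, 5)
```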
Cube Literal Matrix (Condense Algorithm)
CIM for our example after the Distill algorithm
Term  +/-  x  y  z  4  d1  d2
1     +    2  1  0  0  1   0
2     +    0  0  0  1  1   0
3     -    1  1  1  0  0   0
4     +    1  1  0  0  0   1
5     +    1  0  0  0  0   0
6     +    0  1  1  0  0   0
7     +    0  0  0  1  0   0
8     -    1  0  0  0  0   0
Save 2 multiplications by extracting xy
Condense Algorithm
Extracting xy
Term  +/-  x  y  z  4  d1  d2
1     +    1  0  0  0  1   0
2     +    0  0  0  1  1   0
3     -    0  0  1  0  0   0
4     +    0  0  0  0  0   1
5     +    1  0  0  0  0   0
6     +    0  1  1  0  0   0
7     +    0  0  0  1  0   0
8     -    1  0  0  0  0   0
No more favorable cube intersections found
Final Implementation
d1 = x + yz ; d2 = 4 - x ; d3 = xy
P1 = x*d1*d3
P2 = 4*d1 - d3*z
P3 = d3*d2
– Total 7 multiplications, 3 additions/subtractions
– Savings of 5 multiplications, 1 addition/subtraction compared to CSE
Impossible to obtain such results using conventional techniques
Optimization of sin(x)
sin(x) ≈ x - S3*x^3 + S5*x^5 - S7*x^7
Co-kernels \ Kernel cubes
        1     -S3x^2  S5x^4  -S7x^6  -S3   S5x^2  -S7x^4  S5    -S7x^2
x       1(1)  1(2)    1(3)   1(4)    0     0      0       0     0
x^3     0     0       0      0       1(2)  1(3)   1(4)    0     0
x^5     0     0       0      0       0     0      0       1(3)  1(4)
sin(x) ≈ x + x^3(-S3 + S5x^2 - S7x^4)
Saves 6 multiplications
Optimization of sin(x)
Co-kernels \ Kernel cubes
        1     x^2d1  S5    -S7x^2
x       1(1)  1(2)   0     0
x^2     0     0      1(4)  1(5)
Final implementation:
X = x*x
sin(x) ≈ x*(1 + (-S3 + (S5 - S7*X)*X)*X)
– Total 5 multiplications and 3 additions/subtractions
Same as the GNU C hand-optimized form
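The final form can be compared against the expanded series and against the library sine. In this sketch S3, S5 and S7 are taken to be 1/3!, 1/5! and 1/7!, following the Taylor expansion quoted in the introduction; the function names are illustrative.

```python
import math

S3 = 1 / math.factorial(3)
S5 = 1 / math.factorial(5)
S7 = 1 / math.factorial(7)


def sin_expanded(x):
    return x - S3 * x**3 + S5 * x**5 - S7 * x**7


def sin_factored(x):
    X = x * x                                        # 1 multiplication
    return x * (1 + (-S3 + (S5 - S7 * X) * X) * X)   # 4 more multiplications


gap_to_series = abs(sin_factored(0.5) - sin_expanded(0.5))
gap_to_libm = abs(sin_factored(0.5) - math.sin(0.5))
```

Both gaps are small at x = 0.5: the first is floating-point rounding only, the second is bounded by the first omitted Taylor term x^9/9!.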
Experimental Setup (Sequential processor)
Signal processing and multimedia applications
– MP3 decoder, Mesa (graphics), adaptive filter, FFT, FIR
– Taylor series approximation of trigonometric functions
– Optimizations on arithmetic subgraphs from dataflow graphs (DFGs)
Polynomials from computer graphics
– Multivariate polynomial approximation
Compared number of operations with CSE and Horner form
Estimated savings in clock cycles on ARM core
Experimental results (comparing number of operations from different methods)
A = additions, M = multiplications
Application            Function        Unoptimized   CSE          Horner       Our technique
                                       A     M       A     M      A     M      A     M
MP3 decoder            hwin_init       80    260     72    162    80    110    64    86
MP3 decoder            imdct           63    189     63    108    63    90     63    54
Mesa                   gl_rotation     10    92      10    34     10    37     10    15
Adaptive filter        LMS             35    130     35    85     35    55     35    40
Gaussian noise filter  FIR             36    224     36    143    36    89     36    63
Fast convolution       FFT             45    194     45    112    45    83     45    56
Graphics               quartic-spline  4     23      4     17     4     20     4     14
Graphics               quintic-spline  5     34      5     22     5     23     5     16
Graphics               chebyshev       8     32      8     18     8     18     8     11
Graphics               cos-wavelet     17    43      17    24     17    19     15    17
Average                                30.3  122     29.5  72.5   30.3  54.4   28.5  37.2
Average run time = 0.45s for our technique
Experimental results (Improvement over CSE and Horner)
M = multiplications
Application            Function        Over CSE                        Over Horner
                                       M       Clock cycles on ARM 7   M       Clock cycles on ARM 7
MP3 decoder            hwin_init       46.9%   44.0%                   21.8%   21.6%
MP3 decoder            imdct           50.0%   44.7%                   40.0%   35.1%
Mesa                   gl_rotation     55.9%   52.8%                   59.5%   56.4%
Adaptive filter        LMS             52.9%   48.9%                   27.3%   24.2%
Gaussian noise filter  FIR             55.9%   53.3%                   29.2%   27.0%
Fast convolution       FFT             50.0%   46.3%                   32.5%   29.3%
Graphics               quartic-spline  17.6%   16.8%                   30.0%   28.8%
Graphics               quintic-spline  27.3%   26.1%                   30.4%   29.2%
Graphics               chebyshev       38.9%   35.7%                   38.9%   35.7%
Graphics               cos-wavelet     29.2%   27.0%                   10.5%   10.7%
Average                                42.5%   39.6%                   32.0%   29.8%
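The M columns of the improvement table appear to follow directly from the multiplication counts in the previous table, as (baseline - ours) / baseline. A small sketch on three of the rows, with the helper name `improvement` chosen for illustration:

```python
# (function, CSE multiplications, Horner multiplications, our multiplications)
ROWS = [("hwin_init", 162, 110, 86),
        ("imdct", 108, 90, 54),
        ("gl_rotation", 34, 37, 15)]


def improvement(baseline, ours):
    """Percentage reduction in multiplications relative to `baseline`."""
    return round(100 * (baseline - ours) / baseline, 1)


for name, cse, horner, ours in ROWS:
    print(name, improvement(cse, ours), improvement(horner, ours))
```

The clock-cycle columns cannot be derived this way, since they depend on the ARM cycle estimates, which the slides do not list.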
Conclusions
Development of a new algebraic technique for optimizing polynomial expressions
Currently used for minimizing the number of arithmetic operations using greedy rectangular covering
Results better than conventional techniques
Future Work
Develop and implement optimal algorithms to compare results with our greedy heuristic
Optimization for delay, energy
Integrate our technique with a conventional compiler optimization pass to measure impact on the whole application
Thank You
Questions?
Extra slides
Finding Kernel Intersections (Distill Algorithm)
Worst case scenario for Distill algorithm
Number of prime rectangles is exponential in the number of rows/columns
– Heuristic methods to find the best prime rectangle
– In practice polynomial expressions are not so large
1  1  1  1
1  1  1  1
1  1  1  1
1  1  1  1
1  1  1  1