exploiting symmetry in sat-based boolean matching for heterogeneous fpga technology mapping yu hu 1,...
TRANSCRIPT
Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA
Technology Mapping
Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous FPGA
Technology Mapping
Yu Hu1, Victor Shih2, Rupak Majumdar2 and Lei He1
1Electrical Engineering Dept., UCLA
2Computer Science Dept., UCLAPresented by Victor ShihPresented by Victor Shih
Address comments to [email protected] comments to [email protected] http://eda.ee.ucla.edu/publications.htmlhttp://eda.ee.ucla.edu/publications.html
OutlineOutline
Background and Motivation
Review of Standard SAT-based Boolean Matching
Proposed Improvements
Experimental Results
Conclusion and Future Work
BackgroundBackground
FPGA technology mapping Map a design into a network of Programmable
Logic Blocks (PLBs) Optimize for area, speed and/or power
PLBs containing heterogeneous devices requires Boolean matching (BM) to determine whether function fcan be implemented by hardware component H 3A.1 Design, Synthesis and Evaluation of Heterogeneous FPGA
with Mixed LUTs and Macro-Gates [Hu, ICCAD’07]
LUT4
AND2
PLB_aLUT4
PLB_b
LUT4
MUX2
LUT4
PLB_d
LUT4
LUT3
Example: Boolean Matching (BM) for PLBsExample: Boolean Matching (BM) for PLBs
Answers a Yes-No question: Can a Boolean function f be implemented in PLB p? If yes, give the configuration bits for all LUTs.
f1 = e*a + c*a + d*a + b*a
f2 = a + b + c + d + e
LUT4
AND2
x1
x2
x3
x4
x5f
PLB p
f1 = (e + c + d + b)*a
i.e.,
f1 = z*a
z = e + c + d + b
z
ec
db
a f1
L0 (0000) 0
L1 (0001) 1
L2 (0010) 1
…… …
L15(1111) 1
Motivation for SAT-Based PLB BMMotivation for SAT-Based PLB BM
Application of FPGA PLB Boolean matching Technology mapping Re-synthesis
Existing BM algorithms Decomposition-based BM lacks flexibility, i.e., the algorithm is
only applicable to selected BLE structure [Cong, TCAD’01] BDD based BM is not scalable (memory explosion) [Ciric, TCAD’03] Fast BM is hard to deal with programmable devices [Wei, ISQED’06] Improvements over Ling’s algorithm obtain 3x speedup [Sean,
DAC’06] and 10x speedup [Jason, FPGA’07]
SAT-based BM [Ling, DAC’05][Safarpour, DAC’06][Cong, FPGA’07]
Introduces extreme flexibility Provide a tradeoff between memory and runtime to deal with
complicated BLE structures Still slow, hard to be applied to complex PLBs
Review: SAT-Based Encoding for BMReview: SAT-Based Encoding for BM
Encoding non-programmable devices: Requires common/interconnect variables Is a linear time procedure
Example:
x1x2 g
x3
z1
f AND= (x2+¬z1) (x1+¬z1) (¬x2+¬x1+ z1)
f OR= (¬x3+g) (¬z1+g) (x3+z1+ ¬g)
f total= fAND fOR
= (x2+¬z1) (x1+¬z1) (¬x2+¬x1+ z1) (¬x3+g) (¬z1+g) (x3+z1+ ¬g)
Review: SAT-Based Encoding for BMReview: SAT-Based Encoding for BM
Encoding programmable devices: Configuration bits are encoded
f LUT= ( x1 + x2+ ¬L0 + z1) ( x1 + x2+ L0 + ¬ z1)
( x1 + ¬ x2+ ¬L1 + z1) ( x1 + ¬ x2+ L1 + ¬ z1)
(¬ x1 + x2+ ¬L2 + z1) (¬ x1 + x2+ L2 + ¬ z1)
(¬ x1 + ¬ x2+ ¬L3 + z1) (¬ x1 + ¬ x2+ L3 + ¬ z1)
L0
L3
L1 4-1 MUX
x1
z
LUT-2
00
11
01
L2 10
x2
LUT2 z
x1
x2
Review: SAT-Based Encoding for BMReview: SAT-Based Encoding for BM
G LUT2 =( x1 + x2+ ¬L0 + z) ( x1 + x2+ L0 + ¬ z)
( x1 + ¬ x2+ ¬L1 + z) ( x1 + ¬ x2+ L1 + ¬ z)
(¬ x1 + x2+ ¬L2 + z) (¬ x1 + x2+ L2 + ¬ z)
(¬ x1 + ¬ x2+ ¬L3 + z) (¬ x1 + ¬ x2+ L3 + ¬ z)
G AND2 = ( x3 + ¬f ) (¬ x3 + ¬f ) ( ¬x3 + ¬z + f )
G = G AND2 · G LUT2
x1x2x3f
000 0
001 0
010 1
011 0
100 1
101 1
110 1
111 1
SAT:
G expand = G[X/000, f/0, z/z0] · G[X/001, f/0, z/z1]
G[X/010, f/1, z/z2] · G[X/011, f/0, z/z3]
G[X/100, f/1, z/z4] · G[X/101, f/1, z/z5]
G[X/110, f/1, z/z6] · G[X/111, f/1, z/z7]
LUT2
AND2
x1
x2
x3f
z
Boolean function
Configuration bits are encoded as
SAT literals
The solution of this SAT problem
corresponds to the Boolean matching
results
The Problem of Input Permutations [Ling, DAC’05]The Problem of Input Permutations [Ling, DAC’05]
Virtual MUXes increase runtime
exponentially!
? ? ?
? ? ?
? ? ?
Impact of Virtual MUXesImpact of Virtual MUXes
0
2000
4000
6000
8000
10000
12000
5 6 7 8 9
input#
SA
T V
aria
ble
#
V-MUX w /o V-MUX
0
20000
40000
60000
80000
100000
120000
5 6 7 8 9
input#
Cla
use
#
V-MUX w /o V-MUX
0.001
0.01
0.1
1
10
100
5 6 7 8 9
Ru
nti
me
(s
)
input#
V-MUX w/o V-MUX
LUT4
PLB_d
LUT4
LUT3
OutlineOutline
Background and Motivation
Review of Standard SAT-based Boolean Matching
Proposed Improvements
Experimental Results
Conclusion and Future Work
Symmetries in Circuits and PLBSymmetries in Circuits and PLB
Functional Symmetries Variable a and variable b are symmetric if swapping a and b
does not change the truth table of function F(…, a, …, b, …) E.g.: F(a, b, c) = a·(b + c), where b and c are symmetric General symmetries which consider the permutations of more
than two variables can also be explored
Architectural Symmetries Structures of certain inputs of a PLB are equivalent E.g.: Inputs of the primary input LUTs of PLBs are symmetric
F(b, a, c, d)
Impact of Considering SymmetriesImpact of Considering Symmetries
The number of distinct permutations under symmetries decreases substantially Functional symmetries and architecture symmetries
independently reduce permutations by 100x
1
10
100
1000
10000
100000
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Nu
mb
er o
f dis
tin
ct p
erm
uta
tio
ns
Test case - MCNC circuits
Number of distinct permutations (9-input Boolean functions, 9!=362880)
Per#F_symm Per#Arch_symm
Overall AlgorithmOverall Algorithm
Boolean function
Functional symmetry detection
Prune via architecture symmetries
Non-redundant Permutation Set (NPS)
Is NPS empty?
Pre-processing target PLB - one-time cost
Target architecture
Pre-calculate architecture symmetry patterns
Architecture symmetry
information
Characteristic function template
Generate characteristic function template
Pop permutation pExit
YN
Overall Algorithm (cont.)Overall Algorithm (cont.)
Replicate CNFs of p
Solve SAT problem
SAT?
Return implementable
Target architecture
Pre-calculate architecture symmetry patterns
Architecture symmetry
information
Characteristic function template
Generate characteristic function template
Pre-processing target PLB - one-time cost Exit
Y
Is NPS empty?
Pop permutation p
YN
N
Experimental ResultsExperimental Results
Experimental conditions: Tested with Boolean functions in MCNC circuits Target PLB was “PLB_d” Used “miniSAT1.14” as SAT solver engine
Obtained over 100x speedup compared to the standard approach [Ling’05] Additional 2x
achieved usingimplicant tablerepresentation
LUT4
PLB_d
LUT4
LUT3
0.001
0.01
0.1
1
10
100
5 6 7 8 9
Ru
nti
me
(s
)
Number of inputs of Boolean function
Ling'05 Ours Ours+Cong'07
Experimental ResultsExperimental Results
Improvement makes SAT-based Boolean matching feasible in the context of technology mapping and re-synthesis
0.1
1
10
100
1000
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Ru
nti
me
(s
)
Test case
Breakdown of speedup techniques
SAT-BM F_symm_SAT-IP Arch_symm_SAT-IP SAT-IP
Other SAT Symmetry Detection TechniquesOther SAT Symmetry Detection Techniques
Shatter/Saucy - general symmetry-breaking tool for SAT (version released Sept ‘07) [Aloul, Markov, Sakallah, DAC’03] Augments SAT problem, adding symmetry breaking clauses Same SAT engine used
Problems in the Proposed AlgorithmProblems in the Proposed Algorithm
(Following improvements yet to be published)
Conjunctive normal form (CNF) clause generation dominates runtime Generally CNF generation time to SAT solution time is an order
of magnitude greater
Improvement - Iterative Clause TestingImprovement - Iterative Clause Testing
miniSAT’s incremental SAT reasoning suggests possible improvement: If any subset of clauses is unsatisfiable, the entire set is
unsatisfiable Since SAT testing is relatively inexpensive, we can iteratively
test larger and larger subsets to minimize CNF generation Asymptotically, worst-case runtime
is equivalent to non-incrementaltesting – overhead for additionaliterations is insignificant
Experimentally, this resulted inroughly 3x speedup
a + j + d
!b + e + s
t + a + !f
e + !m +
!p + !e +
g + r + !e
…
12
3
Iterations Clauses
…
Improvement – Template Clause-Set ApproachImprovement – Template Clause-Set Approach
miniSAT’s “assumptions” parameter suggests possible improvement: Facilitates parameterization of SAT literals for later resolution We can minimize clause generation time by formulating a set of
clauses which is common to all permutations to test, and adding only the clauses specific to each permutation as assumptions
Experimentally, this resulted in roughly 10x speedup Note that this approach precludes the iterative clause testing
approach
x1x2x3f'
000 f1
001 f5
010 f3
011 f7
100 f2
101 f6
110 f4
111 f8
Template Clause-Set Approach DetailTemplate Clause-Set Approach Detail
Example: Every permutation will have the same set of input values, but
each will correspond to different outputs
x1x2x3f
000 f1
001 f2
010 f3
011 f4
100 f5
101 f6
110 f7
111 f8
x1x2x3f'
000 f1
100 f2
010 f3
110 f4
001 f5
101 f6
011 f7
111 f8
f f' = f (x3, x2, x1) f'
P1 = Gexpand ∙ ¬f1 ∙ ¬ f2 ∙ f3 ∙ ¬ f4 ∙ f5 ∙ f6 ∙ f7 ∙ f8
P2 = Gexpand ∙ ¬ f1 ∙ f2 ∙ ¬ f3 ∙ f4 ∙ ¬ f5 ∙ ¬ f6 ∙ ¬ f7 ∙ f8
…
SAT:
G expand = G[X/000, f/f1, z/z0] · G[X/001, f/f2, z/z1]
G[X/010, f/f3, z/z2] · G[X/011, f/f4 , z/z3]
G[X/100, f/f5, z/z4] · G[X/101, f/f6, z/z5]
G[X/110, f/f7, z/z6] · G[X/111, f/f8, z/z7]
CNF Generation Runtime BreakdownCNF Generation Runtime Breakdown
Template clause-set implementation obtains average 10x speedup of CNF generation time
Conclusions and Future WorkConclusions and Future Work
An improvement for SAT-based Boolean matching is presented by considering functional and architectural symmetries
Over 100x speedup is obtained compared to the standard SAT-based Boolean matching approach
Future Work Integrate the improved SAT-based Boolean matcher into
heterogeneous FPGA technology mapping phase Perform architecture exploration using our improved technology
mapper
ThanksThanks
Exploiting Symmetry in SAT-Based Boolean Matching for Heterogeneous
FPGA Technology MappingYu Hu, Victor Shih, Rupak Majumdar and Lei He