Variable Latency Speculative Addition: A New Paradigm for Arithmetic Circuit Design
Ajay K. Verma, Philip Brisk and Paolo Ienne
Processor Architecture Laboratory (LAP) & Centre for Advanced Digital Systems (CSDA)
Ecole Polytechnique Fédérale de Lausanne (EPFL)
Do We Always Need 100% Accuracy?
Ariane 5 explosion (1996), Patriot missile failure (1991): yes, accuracy is essential.
Cryptography attacks: not necessarily.
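Ciphertext-Only Attacks (1 of 2)
Figure: flowchart of the attack. Starting from the ciphertext, guess a key, decrypt, and apply frequency analysis to the result. If the frequencies look like plaintext (Yes), the key is found; if not (No), guess another key.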
Ciphertext-Only Attacks (2 of 2)
Speeding up the decryption process allows a larger amount of ciphertext to be deciphered and more keys to be guessed.
An error in the decryption of a few blocks will NOT significantly affect the character frequencies, and so will NOT reduce the efficacy of the attack.
Extremely fast, almost correct arithmetic components are therefore desirable.
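The argument above can be illustrated with a toy model. This is a minimal sketch under loud simplifying assumptions: a Caesar (shift) cipher stands in for the real cipher, "frequency analysis" is reduced to counting the letter e, and the helper names (`shift_text`, `guess_key`, `corrupt`) are hypothetical. The loop follows the flowchart (guess a key, decrypt, score the frequencies), and corrupting about 5% of the characters, which has the same effect as a few wrongly decrypted characters, leaves the recovered key unchanged.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def shift_text(text: str, key: int) -> str:
    """Caesar-shift every lowercase letter by `key`; other characters pass through."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
        for c in text
    )


def guess_key(ciphertext: str) -> int:
    """Try every key and keep the one whose decryption contains the most 'e's."""
    best_key, best_score = 0, -1
    for key in range(26):                          # guess a key
        plaintext = shift_text(ciphertext, -key)   # decrypt
        score = plaintext.count("e")               # crude frequency analysis
        if score > best_score:
            best_key, best_score = key, score
    return best_key


def corrupt(text: str, fraction: float, rng: random.Random) -> str:
    """Replace a small fraction of characters with random letters (simulated errors)."""
    chars = list(text)
    for i in rng.sample(range(len(chars)), int(fraction * len(chars))):
        chars[i] = rng.choice(ALPHABET)
    return "".join(chars)


plain = "these three trees were seen here every evening " * 5
cipher = shift_text(plain, 7)
noisy = corrupt(cipher, 0.05, random.Random(1))
print(guess_key(cipher), guess_key(noisy))  # 7 7
```

A few wrong characters only nudge the letter counts, so the best-scoring key stays the same, which is exactly why an almost correct datapath is acceptable here.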
Our Contribution
Almost Correct Adder (ACA): exponentially faster than the fastest reliable adder, produces the correct result in 99.99% of cases, and offers a trade-off between delay and precision.
Variable Latency Speculative Adder (VLSA): intended for processors that allow variable-latency instructions. It uses the ACA as a component, always produces the correct result, and is extremely fast in more than 99.99% of cases.
Outline
Related work
Main idea: limited carry propagation occurs in most cases
Design of the ACA: delay-optimal design with minimal area
Design of the VLSA: error detection and recovery for the ACA
Results
Extension to other arithmetic components (parallel counters, multipliers, etc.)
Conclusions
Related Work
Design of optimal adders with respect to different metrics:
Delay and area: ripple-carry adder, carry-lookahead adder, prefix adders, etc.
Maximum fanout and wire tracks: Kogge-Stone, Brent-Kung, and Knowles adders
Generation of all Pareto-optimal prefix adders [Liu07]
Probabilistic and error-tolerant arithmetic:
Probabilistic arithmetic components to save energy [George06]
Razor: circuit-level correction for low-power operation [Ernst05]
Detection and correction of errors caused by reducing the power-supply voltage [Hegde01]
Asynchronous speculative adder [Nowick96, Nowick97]
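Recurrence for a Typical Adder
Figure: 16-bit operands a15 … a0 and b15 … b0 producing sum bits s15 … s0.
Generate: $g_i = a_i b_i$
Propagate: $p_i = a_i \oplus b_i$
Kill: $k_i = \overline{a_i + b_i}$
Sum: $s_i = a_i \oplus b_i \oplus c_{i-1}$
Carry:
$$c_i = \begin{cases} 0 & \text{if } k_i = 1 \\ 1 & \text{if } g_i = 1 \\ c_{i-1} & \text{if } p_i = 1 \end{cases}$$
Figure: at a generate or kill position, c_i does not depend on c_{i-1}; only a propagate position passes c_{i-1} through to c_i.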
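The recurrence can be executed directly as a bit-serial (ripple-carry) adder. This is a minimal sketch; the function name `ripple_carry_add` is hypothetical and the carry-in c_{-1} is assumed to be 0.

```python
import random


def ripple_carry_add(a: int, b: int, n: int) -> tuple[int, int]:
    """n-bit addition using the generate/propagate/kill carry recurrence.

    Returns (sum mod 2**n, carry out); the carry-in c_{-1} is assumed to be 0.
    """
    carry = 0  # holds c_{i-1}
    s = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi            # generate
        p = ai ^ bi            # propagate
        k = 1 - (ai | bi)      # kill: neither operand bit is 1
        s |= (p ^ carry) << i  # s_i = a_i XOR b_i XOR c_{i-1}
        if k:
            carry = 0
        elif g:
            carry = 1
        # otherwise p == 1 and c_i = c_{i-1}: the carry is left unchanged
    return s, carry


rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(16), rng.getrandbits(16)
    assert ripple_carry_add(a, b, 16) == ((a + b) & 0xFFFF, (a + b) >> 16)
print(ripple_carry_add(0b1011, 0b0110, 4))  # (1, 1)
```

The loop makes the serial dependence explicit: c_i needs c_{i-1} only at propagate positions, which is the observation the rest of the talk builds on.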
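Main Idea: Limited Carry Propagation
Figure: a carry produced at a generate position travels only through the following run of propagate positions and stops at the next kill (or generate) position; positions beyond that point do not depend on it.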
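Longest Sequence of Propagates
The longest sequence of propagates is the longest run of 1s in the XOR of the input integers, $A \oplus B$. For random inputs this behaves like the longest run of heads in n tosses of a fair coin.
Let $T_k$ be the expected number of tosses needed to see k consecutive heads, with $T_0 = 0$. Then $T_k = T_{k-1}$ + (average number of steps to advance from k-1 to k). After reaching k-1 heads, the next toss is a head with probability 1/2 (one step) or a tail with probability 1/2 (one step, after which the count restarts and $T_k$ further steps are needed):
$$T_k = T_{k-1} + \frac{1 + (1 + T_k)}{2} \quad\Longrightarrow\quad T_k = 2T_{k-1} + 2 = 2^{k+1} - 2$$
A run of k propagates therefore appears only about once every $2^{k+1}$ bit positions, so in n-bit addition the longest carry chain is typically about $\log_2 n$ bits long.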
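Both ingredients of this slide can be checked numerically. The sketch below (function names hypothetical) measures the longest propagate chain of two operands and estimates $T_k$ by Monte Carlo, under the assumption that random operands make each $p_i$ an independent fair coin.

```python
import random


def longest_propagate_chain(a: int, b: int, n: int) -> int:
    """Length of the longest run of 1s in the n-bit XOR A ^ B (run of propagates)."""
    x = (a ^ b) & ((1 << n) - 1)
    best = run = 0
    for i in range(n):
        if (x >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def tosses_until_k_heads(k: int, rng: random.Random) -> int:
    """Toss a fair coin until k consecutive heads appear; return the number of tosses."""
    tosses = streak = 0
    while streak < k:
        tosses += 1
        streak = streak + 1 if rng.random() < 0.5 else 0
    return tosses


rng = random.Random(42)
trials = 20000
for k in range(1, 5):
    avg = sum(tosses_until_k_heads(k, rng) for _ in range(trials)) / trials
    print(k, round(avg, 2), 2 ** (k + 1) - 2)  # Monte Carlo estimate vs closed form
```

The simulated averages should track $2^{k+1} - 2$ closely; the exact recurrence above is what justifies the closed form, and the simulation only provides a sanity check.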
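Probabilistic Bounds on the Longest Sequence of Propagates
Let $A_n(x)$ be the number of n-bit input pairs (A, B) for which the longest sequence of propagates is at most x. Each bit position has two propagate combinations and two non-propagate (generate or kill) combinations. Conditioning on the number of propagate positions (at most x) above the most significant non-propagate position gives
$$A_n(x) = \begin{cases} 2^{2n} & \text{if } n \le x \\ \sum_{i=1}^{x+1} 2^{i} A_{n-i}(x) & \text{otherwise} \end{cases}$$
The probability that the longest sequence of propagates is at most x is then $A_n(x) / 2^{2n}$.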
Bitwidth    Longest sequence of propagates
            with 99% probability    with 99.99% probability
64          11                      17
128         12                      18
256         13                      20
512         14                      21
1024        15                      22
2048        16                      23
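The recurrence for $A_n(x)$ is easy to evaluate exactly with arbitrary-precision integers. This sketch (function names `count_bounded` and `min_chain_bound` are hypothetical) computes $A_m(x)$ bottom-up and searches for the smallest x that reaches a given confidence, so its output can be compared with the table above.

```python
def count_bounded(n: int, x: int) -> list[int]:
    """A_m(x) for m = 0..n: number of m-bit input pairs whose longest run of
    propagates (bits with a_i XOR b_i = 1) is at most x."""
    A = []
    for m in range(n + 1):
        if m <= x:
            A.append(4 ** m)  # every input pair qualifies
        else:
            # i-1 propagates (2 choices each) on top of one generate/kill
            # position (2 choices), followed by an (m-i)-bit instance
            A.append(sum(2 ** i * A[m - i] for i in range(1, x + 2)))
    return A


def min_chain_bound(n: int, confidence: float) -> int:
    """Smallest x such that P(longest propagate run <= x) >= confidence."""
    x = 0
    while count_bounded(n, x)[n] / 4 ** n < confidence:
        x += 1
    return x


for n in (64, 256, 1024):
    print(n, min_chain_bound(n, 0.99), min_chain_bound(n, 0.9999))
```

Python's exact integer division keeps the probability accurate even though $4^n$ is far beyond floating-point range for these bitwidths.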
A Primitive Design of ACA (1 of 2)
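A Primitive Design of ACA (2 of 2)
Figure: each sum bit is computed by its own small adder over a 6-bit window of the operands. ADD(A[5, 0], B[5, 0]) yields S[0] through S[5]; ADD(A[6, 1], B[6, 1]) yields S[6]; ADD(A[7, 2], B[7, 2]) yields S[7]; and so on up to ADD(A[19, 14], B[19, 14]), which yields S[19].
The multitude of small adders causes a large area overhead.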
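A behavioural model of this primitive ACA helps to see when it fails. In this sketch (names hypothetical) sum bit i is the top bit of a small adder over the w-bit window [i-w+1, i], clipped at bit 0, with carry-in 0; the 64-bit width and w = 18 in the demo are arbitrary illustrative choices, not values from the slides.

```python
import random


def aca_add(a: int, b: int, n: int, w: int) -> int:
    """Almost correct n-bit addition: sum bit i comes from a small adder over
    the w-bit window [i-w+1, i] (clipped at bit 0) with carry-in 0."""
    s = 0
    for i in range(n):
        lo = max(0, i - w + 1)
        width = i - lo + 1
        mask = (1 << width) - 1
        window_sum = ((a >> lo) & mask) + ((b >> lo) & mask)
        s |= ((window_sum >> (width - 1)) & 1) << i  # keep only the window's top sum bit
    return s


def exact_add(a: int, b: int, n: int) -> int:
    return (a + b) & ((1 << n) - 1)


# A carry generated at bit 0 must cross three propagates; a 4-bit window misses it.
print(aca_add(0b1111, 0b0001, 8, 4), exact_add(0b1111, 0b0001, 8))  # 0 16

rng = random.Random(0)
n, w, trials = 64, 18, 10000
errors = 0
for _ in range(trials):
    a, b = rng.getrandbits(n), rng.getrandbits(n)
    errors += aca_add(a, b, n, w) != exact_add(a, b, n)
print(f"error rate: {errors / trials:.4%}")
```

A sum bit can only be wrong when all w-1 bits below it in its window propagate and a carry arrives from outside the window, which is why the error rate falls off so quickly as w grows.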
Area Overhead in ACA (1 of 2)
Figure: 16-bit operands a15 … a0 and b15 … b0, with the group signals (p, g)(15, 0), (p, g)(14, 0), … shown against bit position.
Area Overhead in ACA (2 of 2)
Step 1: compute (p, g) for every group of two consecutive bit positions.
Step 2: compute (p, g) for every group of four consecutive bit positions.
Final step: combine the computed (p, g) pairs to obtain (p, g) for every group of k consecutive bit positions.
A slightly more complicated design can reduce the hardware area further.
Error Detection
An error can occur only if there is a long chain of propagates:
$$ER = \sum_i p_i\, p_{i+1} \cdots p_{i+k}$$
where the sum denotes Boolean OR and the products denote Boolean AND.
Delay of error detection: higher than the delay of an ACA, but smaller than the delay of a traditional adder; experimentally, about 2/3 of the delay of a traditional adder.
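The ER signal is an OR of ANDs over every window of consecutive propagate bits. A word-level sketch of it, with a hypothetical `error_flag` helper: AND-ing the propagate vector with shifted copies of itself leaves a 1 at position i exactly when $p_i \cdots p_{i+\text{chain}-1}$ are all 1. Here `chain` counts the ANDed terms, so the slide's $p_i \cdots p_{i+k}$ corresponds to chain = k + 1.

```python
def error_flag(a: int, b: int, n: int, chain: int) -> bool:
    """ER = OR over i of (p_i AND p_{i+1} AND ... AND p_{i+chain-1}).

    True if the n-bit operands contain a run of at least `chain` propagates.
    """
    p = (a ^ b) & ((1 << n) - 1)  # propagate vector
    x = p
    for s in range(1, chain):
        x &= p >> s  # now bit i of x = AND of p_i .. p_{i+s}
    return x != 0


print(error_flag(0b1111, 0b0001, 8, 3))  # True  (p = 0b1110 has a run of 3)
print(error_flag(0b1010, 0b0000, 8, 2))  # False (p = 0b1010 has no run of 2)
```

The detector is conservative: it flags every long propagate run, even one that receives no incoming carry, which is acceptable because a false alarm only costs an extra recovery step.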
Error Recovery
A significant amount of the ACA computation can be reused to compute the correct sum during error recovery.
Variable Latency Speculative Adder
Example of VLSA Computation
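A behavioural sketch of a VLSA computation, under assumptions that are not in the slides: the speculative ACA result costs one cycle, a detected error costs one extra cycle, and the recovery path is modelled by ordinary exact addition rather than by reusing ACA logic as the slides describe. The names `aca_add` and `vlsa_add`, the 64-bit width, and the window w = 18 are hypothetical; w >= 2 is assumed.

```python
import random


def aca_add(a: int, b: int, n: int, w: int) -> int:
    """Speculative sum: bit i comes from a w-bit window [i-w+1, i] with carry-in 0."""
    s = 0
    for i in range(n):
        lo = max(0, i - w + 1)
        mask = (1 << (i - lo + 1)) - 1
        window_sum = ((a >> lo) & mask) + ((b >> lo) & mask)
        s |= ((window_sum >> (i - lo)) & 1) << i
    return s


def vlsa_add(a: int, b: int, n: int, w: int) -> tuple[int, int]:
    """Return (sum, cycles): 1 cycle if no run of w-1 propagates is detected,
    otherwise 2 cycles (speculation plus recovery)."""
    spec = aca_add(a, b, n, w)
    p = (a ^ b) & ((1 << n) - 1)
    chain = p
    for s in range(1, w - 1):
        chain &= p >> s  # nonzero iff some run of w-1 propagates exists
    if chain == 0:
        return spec, 1  # speculation is guaranteed correct
    return (a + b) & ((1 << n) - 1), 2  # stand-in for the recovery adder


rng = random.Random(1)
n, w, trials = 64, 18, 10000
total_cycles = 0
for _ in range(trials):
    a, b = rng.getrandbits(n), rng.getrandbits(n)
    s, cycles = vlsa_add(a, b, n, w)
    assert s == (a + b) & ((1 << n) - 1)  # the VLSA result is always exact
    total_cycles += cycles
print(f"average latency: {total_cycles / trials:.4f} cycles")
```

Without a run of w-1 propagates no window can miss an incoming carry, so the speculative result is returned only when it is provably correct, and the average latency stays very close to one cycle.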
Experimental Setup
For each input bitwidth N, the following circuits are generated:
traditional fast adder (prefix adder)
almost correct adder (ACA)
error detection
ACA + error recovery (VLSA)
Logic synthesis: Synopsys Design Compiler (compile_ultra, minimizing delay) with Artisan standard cells, UMC 0.18 µm.
Results
Average delay of VLSA = 0.70 × delay of traditional adder
Delay of ACA = 0.52 × delay of traditional adder
Conclusions
We have presented an exponentially fast adder that works correctly in more than 99.99% of cases.
We have also presented a reliable version of this adder that works correctly in all cases, is extremely fast in more than 99.99% of cases, and has almost the same delay as a traditional adder in the remaining cases.
Extending a similar approach to other arithmetic components is desirable.
Future Work: Can We Have a Fast, Almost Correct Counter or Multiplier?
Figure: a parallel counter over a string of input bits, whose output (the "path number", e.g. 1001) is the number of 1s. E[path number] = sum of the bits, but Var[path number] is high.
Since each output bit depends equally on every input bit, one cannot discard some input bits when computing an output bit.
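The least significant output bit gives the simplest instance of this claim: it is the parity of all inputs, so flipping any single input bit flips it. A small sketch, where `counter_output` is a hypothetical stand-in for a parallel counter and the 32-bit random input is arbitrary:

```python
import random


def counter_output(bits: list[int]) -> int:
    """A parallel counter simply returns the number of 1s among its inputs."""
    return sum(bits)


rng = random.Random(5)
bits = [rng.randint(0, 1) for _ in range(32)]
lsb = counter_output(bits) & 1

# Flip each input in turn: the least significant output bit changes every time,
# so no input can be dropped when computing it.
depends_on_all = all(
    (counter_output(bits[:j] + [1 - bits[j]] + bits[j + 1:]) & 1) != lsb
    for j in range(len(bits))
)
print(depends_on_all)  # True
```

This contrasts with addition, where a sum bit can usually be computed correctly from a short window of nearby input bits.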
Future Work: Few Most Significant Bits in Multiplier
Figure: a 16-bit multiplication example compared with the same multiplication after ignoring the lower half bits of both operands.
Even if we ignore the lower half bits of the two inputs, the most significant log n bits of the output remain the same with high probability.
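This claim is easy to probe empirically. A Monte Carlo sketch, assuming n is a power of two and that "ignoring" the lower half bits means zeroing them; the name `top_bits_match`, the 32-bit width, and the trial count are hypothetical choices.

```python
import random


def top_bits_match(a: int, b: int, n: int) -> bool:
    """Compare the top log2(n) bits of the 2n-bit product a*b with the product
    computed after zeroing the lower n/2 bits of both operands."""
    half = n // 2
    top = n.bit_length() - 1  # log2(n) for a power-of-two n
    shift = 2 * n - top       # the top `top` bits of a 2n-bit product
    exact = (a * b) >> shift
    trunc_a = (a >> half) << half  # ignore the lower half bits
    trunc_b = (b >> half) << half
    approx = (trunc_a * trunc_b) >> shift
    return exact == approx


rng = random.Random(2)
n, trials = 32, 10000
mismatches = sum(
    not top_bits_match(rng.getrandbits(n), rng.getrandbits(n), n)
    for _ in range(trials)
)
print(f"mismatch rate: {mismatches / trials:.4%}")
```

The ignored partial products are at most about $2^{3n/2+1}$, far below the weight of the top output bits, so they can only change those bits through a rare carry ripple.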