h5h4, h5e7 fixed point arithmetic lecture 2 (used to be 5)iverbauw/courses/h05h4/... · 2009. 3....
H5H4, H5E7 Fixed point arithmetic
Lecture 2 (used to be 5)
I. Verbauwhede
Acknowledgements: H. DeMan, V. Öwall, D. Hwang
2008-2009
K.U.Leuven
Overview
Lecture 1: what is a system-on-chip
Lecture 2: terminology for the different steps
Lecture 3: models of computations, SDFG
Lecture 4: control flow
Lecture 5 (today): fixed point refinement, because we need it for exercises
Lecture 3: invited lecture
Friday Feb. 27, 10:30 to 12:30, room 00.62
Prof. Çetin Kaya Koç (University of California, Santa Barbara), “A brief history of cryptographic hardware design”
H5H4 goal: Skiing down a mountain
[Figure: design flow descending from the Specification to the target architectures: ASIC, special-purpose retargetable coprocessor, DSP processor, DSP-RISC, RISC.]
Steps on the way down:
• Algorithm Transformations
• Memory Transformations and Optimizations
• Floating-point to Fixed-point
Other labels in the figure: SPW, Matlab, C++; pipelining, unrolling; loop merging, compaction; 40 bit accumulator
References
P. Lapsley, et al., “DSP Processor fundamentals: Architectures and features,” IEEE Press, 1997, Chapter 3.
W. Sung, K. Kum, “Simulation-based Word-Length Optimization Method for Fixed-point Digital Signal processing systems,” IEEE Trans. On Signal Proc. Vol. 43, No. 12, Dec. 1995.
Viktor Öwall, Dept. of Electroscience, Lund, Sweden, www.es.lth.se/ugradcourses/DSPDesign/
M. Ercegovac, T. Lang, “Digital Arithmetic,” Morgan Kaufmann Publishers, 2004.
Fridge project: http://www.ert.rwth-aachen.de/Projekte/Tools/FRIDGE/fridge.html
Finite word lengths: a must for DSP
DSP applications: high speed, minimum area, low power
Floating-point
– powerful
– expensive (storage & ops): 3 bytes (mantissa) + 1 byte (exponent)
Fixed-point refinement
[Figure: multiplier with word lengths 8 and 6 at the inputs and 14 at the output]
Consequences of Bad Use of Approximations
Example: Failure of Patriot Missile (1991 Feb. 25)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
An American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks, killing 28.
Cause, per GAO/IMTEC-92-26 report: “software problem” (inaccurate calculation of the time since boot)
Specifics of the problem: time in tenths of a second, as measured by the system’s internal clock, was multiplied by 1/10 to get the time in seconds. Internal registers were 24 bits wide.
1/10 = 0.0001 1001 1001 1001 1001 100 (chopped to 24 b)
Error ≅ 0.1100 1100 × 2^-23 ≅ 9.5 × 10^-8
Error in 100-hr operation period ≅ 9.5 × 10^-8 × 100 × 60 × 60 × 10 = 0.34 s
Distance traveled by Scud = (0.34 s) × (1676 m/s) ≅ 570 m
This put the Scud outside the Patriot’s “range gate”. Ironically, the fact that the bad time calculation had been improved in some (but not all) code parts contributed to the problem, since it meant that inaccuracies did not cancel out.
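The Patriot arithmetic can be checked in a few lines. This is a minimal sketch, not the missile software: the names `chop` and `FRAC_BITS` are hypothetical, and the 23 fractional bits are an assumption read off the chopped binary expansion of 1/10 given on the slide.

```python
from fractions import Fraction

FRAC_BITS = 23  # assumption: fractional bits kept in the chopped expansion of 1/10

def chop(value: Fraction, frac_bits: int) -> Fraction:
    """Truncate a non-negative value to frac_bits fractional bits."""
    scale = 2 ** frac_bits
    return Fraction(int(value * scale), scale)

tenth = Fraction(1, 10)
error_per_tick = tenth - chop(tenth, FRAC_BITS)   # about 9.5e-8
ticks = 100 * 60 * 60 * 10                        # tenths of a second in 100 hours
drift_s = float(error_per_tick * ticks)           # accumulated clock error in seconds
miss_m = drift_s * 1676                           # Scud speed in m/s, from the slide

print(f"error per tick: {float(error_per_tick):.3e}")
print(f"drift after 100 h: {drift_s:.3f} s, distance: {miss_m:.0f} m")
```

Using unrounded intermediate values gives a distance slightly above the slide's 570 m, which was computed from the rounded 0.34 s.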
Consequences of Bad Approximations
Example: Explosion of Ariane Rocket (1996 June 4)
Source: http://www.math.psu.edu/dna/455.f96/disasters.html
The unmanned Ariane 5 rocket launched by the European Space Agency veered off its flight path, broke up, and exploded only 30 seconds after lift-off (altitude of 3700 m).
The $500 million rocket (with cargo) was on its 1st voyage after a decade of development costing $7 billion.
Cause: “software error in the inertial reference system”
Specifics of the problem: a 64 bit floating point number relating to the horizontal velocity of the rocket was being converted to a 16 bit signed integer.
An SRI (inertial reference system) software exception arose during conversion because the 64-bit floating point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767).
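The failure mode is easy to reproduce in miniature. The sketch below is an illustration only and not the flight software; the function names are hypothetical. It contrasts a conversion that raises an exception when the value does not fit with one that saturates instead.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(x: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    value = int(x)  # truncates toward zero
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return value

def to_int16_saturating(x: float) -> int:
    """Convert to a 16-bit signed integer, clipping out-of-range values."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))

print(to_int16_saturating(40000.0))  # clipped to the largest representable value
```

Which of the two behaviours is appropriate depends on the system context; the point is that the range of the source value has to be known before the conversion is chosen.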
Outline
• Number representation
• Location of decimal point
• Precision
• Dynamic range
• Truncation, rounding
• Overflow
Binary numbers, unsigned integers
Bit weights: 2^2 2^1 2^0
0 0 0 (0)
0 0 1 (1)
0 1 0 (2)
0 1 1 (3)
1 0 0 (4)
1 0 1 (5)
1 1 0 (6)
1 1 1 (7)
MSB = Most Significant Bit
LSB = Least Significant Bit
N bits give 2^N values
[V. Öwall]
Dynamic range and Resolution
Nr. of bits | Nr. of levels | Resolution (Vfs = 0.5V) | Dynamic range (VLSB = 0.03125V)
4           | 16            | 0.03125V                | 0.5V
8           | 256           | 2mV                     | 8V
12          | 4096          | 0.12mV                  | 128V
16          | 65 536        | 7.6μV                   | 2048V
How do we use the bits? Depends on the application!
[V. Öwall]
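The two columns follow from two different choices: fix the full-scale voltage and let the step shrink, or fix the step and let the range grow. A minimal sketch (the function names are hypothetical):

```python
def resolution(full_scale_v: float, bits: int) -> float:
    """Step size when a fixed full-scale range is divided into 2**bits levels."""
    return full_scale_v / 2 ** bits

def dynamic_range(lsb_v: float, bits: int) -> float:
    """Range covered by 2**bits levels of a fixed step size."""
    return lsb_v * 2 ** bits

for bits in (4, 8, 12, 16):
    print(bits, 2 ** bits, resolution(0.5, bits), dynamic_range(0.03125, bits))
```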
Number Representation
• Unsigned numbers
• Signed digit numbers
  • Sign magnitude
  • One’s complement
  • Two’s complement
Notation: K.L with W = K + L
W = wordlength
L = number of bits behind decimal (or binary) point
Signed-Digit Representations
Representations
– 1) Signed-Magnitude: redundant
– 2) Biased: non-redundant
– 3) Complement
» A) Radix Complement (r=2 “two's complement”): non-redundant
» B) Digit Complement or Diminished-Radix Complement (r=2 “one's complement”): redundant
Redundant: two representations for the same number
Non-redundant: each representation is a different number
Sign Magnitude
Unsigned numbers with a sign-bit
- Two Zeros
+ Low Power?
[Figure: 3-bit signed-magnitude number wheel: 000 = 0, 001 = 1, 010 = 2, 011 = 3, 100 = -0, 101 = -1, 110 = -2, 111 = -3]
[V. Öwall]
One’s Complement
Signed numbers by inverting (Complement)
- Two Zeros
+ Easy to convert to Negative
[Figure: 3-bit one's-complement number wheel: 000 = 0, 001 = 1, 010 = 2, 011 = 3, 100 = -3, 101 = -2, 110 = -1, 111 = -0]
[V. Öwall]
Two’s Complement
Complement + LSB (invert all bits, then add one LSB)
+ One Zero
+ Easy Addition
- Not so easy to convert to Neg.
[Figure: 3-bit two's-complement number wheel: 000 = 0, 001 = 1, 010 = 2, 011 = 3, 100 = -4, 101 = -3, 110 = -2, 111 = -1]
Most widely used fixed point numbering system
[V. Öwall]
Value | Signed-magnitude | Biased | Two’s complement | One’s complement
7     | 0111 | 1111 | 0111 | 0111
6     | 0110 | 1110 | 0110 | 0110
5     | 0101 | 1101 | 0101 | 0101
4     | 0100 | 1100 | 0100 | 0100
3     | 0011 | 1011 | 0011 | 0011
2     | 0010 | 1010 | 0010 | 0010
1     | 0001 | 1001 | 0001 | 0001
0     | 0000 | 1000 | 0000 | 0000
-0    | 1000 | -    | -    | 1111
-1    | 1001 | 0111 | 1111 | 1110
-2    | 1010 | 0110 | 1110 | 1101
-3    | 1011 | 0101 | 1101 | 1100
-4    | 1100 | 0100 | 1100 | 1011
-5    | 1101 | 0011 | 1011 | 1010
-6    | 1110 | 0010 | 1010 | 1001
-7    | 1111 | 0001 | 1001 | 1000
-8    | -    | 0000 | 1000 | -
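The four encodings can be generated mechanically. The sketch below assumes a bias of 2^(bits-1) (8 for 4 bits), which is what the biased column implies; the function name `encode` is hypothetical. Negative zero cannot be passed in as a Python integer, so those two table entries are not produced.

```python
def encode(value: int, bits: int = 4) -> dict:
    """Return the bit patterns of value in four signed representations.

    Entries are None where the value is not representable.
    """
    mask = (1 << bits) - 1
    max_mag = (1 << (bits - 1)) - 1          # 7 for 4 bits
    bias = 1 << (bits - 1)                   # 8 for 4 bits (assumed bias)
    fmt = lambda n: format(n, f"0{bits}b")
    out = {"signed-magnitude": None, "biased": None,
           "two's complement": None, "one's complement": None}
    if abs(value) <= max_mag:
        sign = (1 << (bits - 1)) if value < 0 else 0
        out["signed-magnitude"] = fmt(sign | abs(value))
        # one's complement of a negative number: invert the bits of its magnitude
        out["one's complement"] = fmt(value if value >= 0 else ~(-value) & mask)
    if -bias <= value <= max_mag:
        out["biased"] = fmt(value + bias)
        out["two's complement"] = fmt(value & mask)
    return out

print(encode(-3))
```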
Position of decimal point
Total number of bits W
Fractional bits L
[Figure: word of W bits, L of them fractional, with bit weights i·2^3, 2^2, 2^1, 2^0, 2^-1, 2^-2, 2^-3; the MSB sits at position W-L, the LSB at position L]
• Value representation
• 2’s complement (i=-1)
• unsigned (i=1)
How do you store this decimal point?
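One common answer is that the point is not stored at all: the word is a plain integer, and the number of fractional bits L is a convention kept by the designer. A minimal sketch under that assumption (the function name is hypothetical):

```python
def fxp_value(bits: str, frac_bits: int, signed: bool = True) -> float:
    """Value of a W-bit pattern with frac_bits (L) bits behind the binary point."""
    raw = int(bits, 2)
    if signed and bits[0] == "1":
        raw -= 1 << len(bits)     # the MSB carries a negative weight (i = -1)
    return raw / (1 << frac_bits)

print(fxp_value("0110110", 3))                 # W = 7, L = 3
print(fxp_value("1110110", 3))                 # same word length, MSB set
print(fxp_value("1110110", 3, signed=False))   # unsigned reading (i = 1)
```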
Fixed point for DSP processors
Simple binary integer (two’s complement)
[Figure: W-bit word with bit weights -2^7 (sign bit), 2^6, 2^5, 2^4, 2^3, 2^2, 2^1, 2^0]
Simple binary fractional representation
[Figure: W-bit word with bit weights -2^0 (sign bit), 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6, 2^-7; L = W-1]
Values in [-1, 1[ (that is, -1 ≤ x < 1)
Mantissa representation
• Mantissa: e.g. 24 bit
– One sign bit
– Mantissa bit = 1 (always!): [-1, -2] and [+1, +2]
• Exponent: e.g. 8 bit
• Value = Mantissa × 2^exponent
[Figure: mantissa with bit weights -2^1 (sign bit), 2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6; exponent with bit weights 2^3, 2^2, 2^1, 2^0]
Precision
• Quantization error = error when a longer numeric format is converted to a shorter one
• E.g.: round 1.325 to 1.33, error = 0.005
Maximum precision (in bits) = log2 (|maximum value| / |max quantization error|)
• E.g.: 16 bit fractional representation
• max value = -1, max error = 2^-16 (with rounding)
• maximum precision = 16 bits
Importance of scaling!!
Dynamic range
• Dynamic range = largest number / smallest number in a given data format
• E.g. 32 bit fractional value: ratio = (1 - 2^-31) / 2^-31 ≈ 2^31 ≈ 2.15 × 10^9 ≈ 187 dB
• Telecom: 50 dB
• High End Audio: 90 dB +
• DSP processors: provide a few more bits than the dynamic range requires
Scaling !!
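The dB figure is 20·log10 of the ratio. A minimal sketch for a signed fractional format with one sign bit (the function name is hypothetical):

```python
import math

def dynamic_range_db(word_bits: int) -> float:
    """Dynamic range of a signed fractional format with word_bits bits, in dB."""
    frac_bits = word_bits - 1
    largest = 1 - 2.0 ** -frac_bits      # largest positive value
    smallest = 2.0 ** -frac_bits         # one LSB
    return 20 * math.log10(largest / smallest)

print(round(dynamic_range_db(32)))   # the 32 bit example
print(round(dynamic_range_db(16)))   # a 16 bit word, for comparison
```

Each extra bit adds about 6 dB, which is one way to translate a dB requirement into a word length.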
Rounding
How do we quantize?
[Figure: fixed-point (fxp) versus floating-point (flp) value for four quantization modes]
• floor ⎣x⎦: 2-compl truncate; cheap, nasty
• round: best, expensive
• magnitude truncate: sign-magnitude; unusual
• ceil ⎡x⎤
Rounding
Rounding occurs when we want to approximate a more precise number (i.e. more fractional bits L) with a less precise number (i.e. fewer fractional bits L')
Example 1: “down”
– old: 000110.11010001 (K=6, L=8)
– new: 000110.11 (K'=6, L'=2)
Example 2: “up”
– old: 000110.11010001 (K=6, L=8)
– new: 000111. (K'=6, L'=0)
The following slides show rounding from L>0 fractional bits to L'=0 bits, but the mathematics holds true for any L' < L
Usually, keep the number of integral bits the same: K'=K
Rounding Equation
y = round(x)
x_{k-1} x_{k-2} ... x_1 x_0 . x_{-1} x_{-2} ... x_{-l} → Round → y_{k-1} y_{k-2} ... y_1 y_0
(whole part before the point, fractional part after it)
Rounding Techniques
Different rounding techniques:
– 1) truncation
» results in round towards zero in signed magnitude
» results in round towards -∞ in two's complement
– 2) round to nearest number
– 3) round to nearest even number (or odd number)
– 4) round towards +∞
Other rounding techniques
– 5) jamming or von Neumann
– 6) ROM rounding
Each will differ in its error depending on the representation of numbers, i.e. signed magnitude versus two's complement
– Error = round(x) – x
1) Truncation
The simplest possible rounding scheme: chopping or truncation
x_{k-1} x_{k-2} ... x_1 x_0 . x_{-1} x_{-2} ... x_{-l} → trunc → x_{k-1} x_{k-2} ... x_1 x_0
Truncation in signed-magnitude results in a number chop(x) whose magnitude is never larger than that of x. This is called round towards zero or inward rounding
– 011.10 (3.5)_10 → 011 (3)_10
» Error = -0.5
– 111.10 (-3.5)_10 → 111 (-3)_10
» Error = +0.5
Truncation in two's complement results in a number chop(x) that is never larger than x. This is called round towards -∞ or downward-directed rounding
– 011.10 (3.5)_10 → 011 (3)_10
» Error = -0.5
– 100.10 (-3.5)_10 → 100 (-4)_10
» Error = -0.5
Truncation Function Graph: chop(x)
[Fig. 17.5: Truncation or chopping of a signed-magnitude number (same as round toward 0); chop(x) plotted against x from -4 to 4.]
[Fig. 17.6: Truncation or chopping of a 2’s-complement number (same as round to -∞); chop(x) plotted against x from -4 to 4.]
Bias in two's complement truncation
X (binary) | X (decimal) | chop(x) (binary) | chop(x) (decimal) | Error (decimal)
011.00     | 3           | 011              | 3                 | 0
011.01     | 3.25        | 011              | 3                 | -0.25
011.10     | 3.5         | 011              | 3                 | -0.5
011.11     | 3.75        | 011              | 3                 | -0.75
100.01     | -3.75       | 100              | -4                | -0.25
100.10     | -3.5        | 100              | -4                | -0.5
100.11     | -3.25       | 100              | -4                | -0.75
101.00     | -3          | 101              | -3                | 0
Assuming all combinations of positive and negative values of x equally possible, average error is -0.375
In general, average error = -(2^-L' - 2^-L)/2, where L' = new number of fractional bits and L = old number of fractional bits
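The average error of -0.375 can be reproduced by enumerating the eight tabulated values. The sketch below also computes the same average for sign-magnitude truncation as a comparison; that second figure is not on the slide. Function names are hypothetical.

```python
import math
from fractions import Fraction

def chop_twos(x: Fraction) -> int:
    """Two's-complement truncation to an integer: round toward -infinity."""
    return math.floor(x)

def chop_sign_magnitude(x: Fraction) -> int:
    """Sign-magnitude truncation to an integer: round toward zero."""
    return math.trunc(x)

# The values 3, 3.25, 3.5, 3.75 and their negatives (L = 2 fractional bits).
xs = [Fraction(n, 4) for n in (12, 13, 14, 15, -12, -13, -14, -15)]
avg_twos = sum(chop_twos(x) - x for x in xs) / len(xs)
avg_sm = sum(chop_sign_magnitude(x) - x for x in xs) / len(xs)
print(avg_twos, avg_sm)
```

The two's-complement errors all have the same sign and add up to a bias; the sign-magnitude errors cancel over a symmetric set of inputs.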
Implementation of truncation in hardware
Easy: just ignore (i.e. truncate) the fractional digits x_{-(L'+1)} through x_{-L}
x_{k-1} x_{k-2} ... x_1 x_0 . x_{-1} x_{-2} ... x_{-L}
= y_{k-1} y_{k-2} ... y_1 y_0 .
ignore (i.e. truncate) the rest
2) Round to nearest number
Rounding to the nearest number is what we normally think of when we say round
rtn in two's complement
– 010.10 (2.5)_10 → 011 (3)_10
» Error = +0.5
– 101.10 (-2.5)_10 → 110 (-2)_10
» Error = +0.5
Round to Nearest Function Graph: rtn(x)
[Figure: rtn(x) plotted against x from -4 to 4]
Bias in two's complement round to nearest
X (binary) | X (decimal) | rtn(x) (binary) | rtn(x) (decimal) | Error (decimal)
010.00     | 2           | 010             | 2                | 0
010.01     | 2.25        | 010             | 2                | -0.25
010.10     | 2.5         | 011             | 3                | +0.5
010.11     | 2.75        | 011             | 3                | +0.25
101.01     | -2.75       | 101             | -3               | -0.25
101.10     | -2.5        | 110             | -2               | +0.5
101.11     | -2.25       | 110             | -2               | +0.25
110.00     | -2          | 110             | -2               | 0
Assuming all combinations of positive and negative values of x equally possible, average error is +0.125
– Smaller average error than truncation, but still not a symmetric error
– We have a problem with the midway value: exactly at 2.5 or -2.5 the error is always positive, which causes the bias
Overflow problem: if only K' = K integral bits are allocated
– Example: rtn(011.10) overflows
– This overflow only occurs on positive numbers near the maximum positive value, not on negative numbers
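The +0.125 average can be checked the same way as for truncation. The sketch assumes that ties go toward +∞ (add one half, then floor), which matches the tabulated results; the function name `rtn` follows the slide.

```python
import math
from fractions import Fraction

def rtn(x: Fraction) -> int:
    """Round to nearest, ties toward +infinity (add one half, then floor)."""
    return math.floor(x + Fraction(1, 2))

# The values 2, 2.25, 2.5, 2.75 and their negatives.
xs = [Fraction(n, 4) for n in (8, 9, 10, 11, -8, -9, -10, -11)]
errors = [rtn(x) - x for x in xs]
avg = sum(errors) / len(errors)
print(avg)
```

All errors cancel except the two tie cases at 2.5 and -2.5, which both contribute +0.5.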
Truncation and rounding
Truncation: cheapest but introduces “bias”
E.g.: with 0011 = 3 and 1100 = -4:
– 0011.1 = 3.5 truncates to 3
– 1100.1 = -3.5 truncates to -4
Always a smaller number
Rounding: “round to the nearest”
Simple hardware trick: add 1/2 of the smallest number (half an LSB) and truncate
E.g.:
– 0011.1 = 3.5 rounds to 4
– 1100.1 = -3.5 rounds to -3
How in hardware?
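On raw two's-complement integers the trick is one addition and one shift. A minimal sketch (the function name is hypothetical); in hardware the added constant can be a single 1 injected at the bit position just below the new LSB.

```python
def round_half_up(raw: int, drop_bits: int) -> int:
    """Drop drop_bits LSBs of a two's-complement value after adding half an LSB."""
    half = 1 << (drop_bits - 1)
    return (raw + half) >> drop_bits   # >> on Python ints is an arithmetic shift

# 0011.1 has raw value 7 with one fractional bit; 1100.1 has raw value -7.
print(7 >> 1, round_half_up(7, 1))     # truncation versus rounding of 3.5
print(-7 >> 1, round_half_up(-7, 1))   # truncation versus rounding of -3.5
```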
Rounding
Rounding to the nearest: still bias for numbers exactly half way
More expensive: “convergent” rounding
[Figure: 8-bit input a7 (sign bit) a6 a5 a4 a3 a2 a1 a0, rounded to the 4-bit output b3 (sign bit) b2 b1 b0]
If a3:a0 > 1000: b3:b0 = a7:a4 + a3
If a3:a0 < 1000: b3:b0 = a7:a4 + a3
If a3:a0 = 1000: b3:b0 = a7:a4 + a4
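The three cases can be written out directly. This is a sketch under the assumption that the input is an 8-bit two's-complement value held in a Python int (range -128 to 127); the function name is hypothetical. Adding a4 in the half-way case rounds to the nearest even result.

```python
def convergent_round(a: int) -> int:
    """Round an 8-bit two's-complement value a7:a0 to its upper 4 bits b3:b0."""
    upper = a >> 4            # a7:a4, sign preserved by the arithmetic shift
    lower = a & 0xF           # a3:a0
    if lower == 0b1000:       # exactly half way: add a4, i.e. round to even
        return upper + (upper & 1)
    return upper + (lower >> 3)   # otherwise add a3

print(convergent_round(0b00101000))  # 2.5 in units of 16
print(convergent_round(0b00111000))  # 3.5 in units of 16
```

The sketch does not handle the overflow that occurs when the upper bits are already at the maximum positive value and the result is rounded up.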
Overflow
What happens on an overflow?
[Figure: fixed-point (fxp) versus floating-point (flp) value for wrap-around and for saturation; saturation clips at the max. value]
Page 20
39
Adding Two's Complement Numbers: Adding Two's Complement Numbers: Ignoring OverflowIgnoring Overflow
Ignoring overflow, adding a K.L two's complement number to a K.L binary unsigned number results in a K.L numberExample: 0111.01 + 1000.00 +
0110.10 = 1001.00 =01101.11 10001.00
Ignore cK
Adding 7.25 + 6.5 results in -2.25: must add 2^K = 16 to get correct result (13.75)
Adding -8 + -7 results in +1: must add -2^K = -16 to get correct result
Ignore cK
Two's Complement Wraparound Property
Temporary wraparounds are fine as long as final value is in the correct dynamic range:
– Example: add (-8 + -6) + 7 = -7
– 1000 + 1010 = 0010
» Should be (-14)_10, not (+2)_10: wraparound/overflow
– 0010 + 0111 = 1001
» Final result is correct: (-7)_10
» If final result guaranteed to be in the correct dynamic range [-8,+7] then intermediate wraparounds are fine
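The example can be replayed with a wrapping adder. A minimal sketch (the function name is hypothetical):

```python
def add_wrap(a: int, b: int, bits: int = 4) -> int:
    """Add two values and wrap the sum into the two's-complement range."""
    total = (a + b) & ((1 << bits) - 1)
    return total - (1 << bits) if total & (1 << (bits - 1)) else total

step1 = add_wrap(-8, -6)     # intermediate result wraps: the true sum is -14
step2 = add_wrap(step1, 7)   # final result is back in range
print(step1, step2)
```

This works because wrapping is arithmetic modulo 2^bits, so the intermediate error of 16 disappears as soon as the true result fits again.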
Adding Two's Complement Numbers: Avoiding or Detecting Overflow
To avoid overflow, adding a K.L two's complement number to a K.L two's complement number results in a (K+1).L number. To compute, sign extend MSB, ignore cK+1
Example: 00111.01 + 00110.10 = 001101.11; ignore cK+1, leaving 01101.11
If result is confined to a K.L number, need overflow detection, which is cK xor cK-1
Example: 0111.01 + 0110.10 = 01101.11; cK XOR cK-1 indicates overflow
K=4, L=2
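The overflow flag compares the carry into the sign bit with the carry out of it. In this sketch the operands are raw bit patterns of the whole K+L-bit word, so for K=4, L=2 the word is 6 bits wide; the function name is hypothetical.

```python
def add_with_overflow(a: int, b: int, bits: int):
    """Add two bit patterns; return (sum pattern, overflow flag).

    a and b are raw bit patterns (non-negative ints below 2**bits).
    """
    mask = (1 << bits) - 1
    low_mask = mask >> 1                                   # all bits below the sign bit
    carry_in = ((a & low_mask) + (b & low_mask)) >> (bits - 1)   # carry into sign bit
    total = a + b
    carry_out = total >> bits                              # carry out of sign bit
    return total & mask, bool(carry_in ^ carry_out)

pattern, overflow = add_with_overflow(0b011101, 0b011010, 6)   # 0111.01 + 0110.10
print(format(pattern, "06b"), overflow)
```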
Subtracting Two's Complement Numbers: Ignoring Overflow
Ignoring overflow, subtracting a K.L two's complement number from a K.L two's complement number results in a K.L number
Example: 0111 - 1000 is computed as 0111 + 0111 with carry-in 1 = 01111; ignore cK, leaving 1111
7 – (-8) resulted in -1
– A wraparound/overflow occurred
– Must add 2^K=2^4=16 to get correct value of +15
Again we see the modulo effect
– As with addition, temporary wraparounds are okay as long as final result is in correct dynamic range
Subtracting Two's Complement Numbers: Avoiding or Detecting Overflow
If result is confined to a K.L number, need overflow detection, which is cK xor cK-1
Example: 0111.01 - 1000.00 is computed as 0111.01 + 0111.11 with carry-in 1 = 01111.01; cK XOR cK-1 indicates overflow
To avoid overflow, subtracting a K.L two's complement number from a K.L two's complement number results in a (K+1).L number
Example: 00111.01 - 11000.00 is computed as 00111.01 + 00111.11 with carry-in 1 = 001111.01; ignore cK+1, leaving 01111.01
Negating a Two's Complement Number
Negating a K.L two's complement number usually only requires a K.L digit result. The only exception is when you negate the largest negative number, and you need a (K+1).L digit result.
» - 0111 = 1001
» - 1000 = 01000: need extra bit to negate largest negative number
Again overflow detection needed
Outline
• Number representation
• Location of decimal point
• Precision
• Dynamic range
• Truncation, rounding
• Overflow
• Now: what to do?
The Wordlength, i.e. nr of bits
[Figure: 4-tap FIR filter with input x(n), three delay elements D, coefficients h0, h1, h2, h3 and output y(n); UMTS-filter example comparing 7 bits with float]
Every extra bit costs
• energy/power
• delay
• area
⇒ the word length has to be reduced
[V. Öwall]
The Wordlength, i.e. nr of bits
[Figure: 4-tap FIR filter with input x(n), three delay elements D, coefficients h0, h1, h2, h3 and output y(n)]
The output of
• an adder needs an extra bit to be sure of no overflow, e.g. 2+2 = 4 ⇒ 10+10=100
• a multiplier with MxN bits needs M+N bits for full precision
⇒ Precision has to be limited
[V. Öwall]
During design: specify fixed-point formats for signals
[Figure: floating-point algorithm in its system context, with an AD converter (word lengths 8 and 7 shown), multipliers and an adder whose signal formats are still unknown (“??”); coefficients and data each need a format W, L, Q]
Fixed-point refinement: optimization problem
Minimize overall cost:
– minimal word lengths
– truncate and wrap-around
MSB determination:
– goal: avoid unwanted overflows
– method: find min, max signal values
– result: MSB position, value representation, overflow behaviour
LSB determination:
– goal: keep “required” precision
– method: evaluate difference between flp and fxp behavior
– result: LSB position, quantization
[Figure: cost trade-off; signal versus time t with its safe range (MSB side) and its quantization (LSB side)]
1. MSB determination: range calculations
[Figure: data-flow graph with a multiplier and an adder and signals x, c, m, d, y; range info is put on the inputs and a range calc. step propagates it]
Analytical method
• Put range (min, max) on inputs, states
• Propagate range over the operators
• This gives a safe (pessimistic) estimate
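Range propagation is interval arithmetic on (min, max) pairs. The sketch below assumes, for illustration only, that the graph computes m = x · c followed by y = m + d; the function names and the input ranges are hypothetical.

```python
def add_range(a, b):
    """Range of a + b for ranges given as (min, max) tuples."""
    return (a[0] + b[0], a[1] + b[1])

def mul_range(a, b):
    """Range of a * b: the extremes lie among the four corner products."""
    corners = [x * y for x in a for y in b]
    return (min(corners), max(corners))

x = (-1.0, 1.0)      # assumed input range
c = (0.5, 0.5)       # assumed constant coefficient
d = (-2.0, 2.0)      # assumed input range
m = mul_range(x, c)
y = add_range(m, d)
print(m, y)
```

The estimate is pessimistic because the worst cases of all inputs are assumed to occur at the same time.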
Word length propagation
Range propagation translates to word length growth
E.g. Two’s complement integer addition A + B
A and B represented by N bits: A + B needs N+1 bits, A – B needs N+1 bits
In general: A is represented by N_A bits, B by N_B bits, A + B needs max(N_A, N_B) + 1 bits
Gets more complicated for multiplication
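The growth rules for two's-complement integers can be checked against the extreme operand values. A minimal sketch; the function names are hypothetical, and the multiplication rule used is the full-precision M+N rule.

```python
def bits_needed(lo: int, hi: int) -> int:
    """Smallest two's-complement word length that holds every integer in [lo, hi]."""
    n = 1
    while not (-(1 << (n - 1)) <= lo and hi <= (1 << (n - 1)) - 1):
        n += 1
    return n

def add_bits(n_a: int, n_b: int) -> int:
    return max(n_a, n_b) + 1

def mul_bits(n_a: int, n_b: int) -> int:
    return n_a + n_b

# 4-bit operands cover [-8, 7]: sums cover [-16, 14], products cover [-56, 64].
print(bits_needed(-16, 14), add_bits(4, 4))
print(bits_needed(-56, 64), mul_bits(4, 4))
```

For multiplication the full M+N bits are only needed because of the single product of the two most negative values.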
Range calculation grows unbounded?
[Figure: feedback loop with a multiplier (coefficient a) and an adder]
Alternative: Collect signal statistics during simulations
[Figure: the data-flow graph with signals x, c, m, d, y driven by stimuli; min and max are recorded on the internal signals q1, q2]
Perform simulation with realistic stimuli.
Collect minimum and maximum value on each signal during the simulation.
This gives an optimistic, stimuli dependent estimate
Combine both methods for accurate MSB determination
name    | signal statistic: min | max | MSB1 | range propagation: min | max  | MSB2
signal1 | -1.5                  | 1.6 | 2    | -1.9                   | 1.9  | 2
signal2 | -1.3                  | 1.4 | 2    | -2.1                   | 2.1  | 3
signal3 | -1.2                  | 1.2 | 2    | -22.0                  | 22.0 | 6
If MSB1 == MSB2: wrap-around(MSB1)
If MSB1 < MSB2: choose saturate(MSB1) or wrap-around(MSB2)
If MSB1
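The MSB columns are consistent with counting the integer bits, sign included, of the smallest two's-complement range that covers [min, max]. The sketch below makes that assumption explicit; the function names are hypothetical, and only the two decision rules that are fully legible above are implemented.

```python
def msb_bits(lo: float, hi: float) -> int:
    """Integer bits (sign included) of a two's-complement format covering [lo, hi]."""
    bits = 1
    while not (-(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1)):
        bits += 1
    return bits

def choose(msb1: int, msb2: int) -> str:
    """Apply the two decision rules that are legible on the slide."""
    if msb1 == msb2:
        return f"wrap-around({msb1})"
    if msb1 < msb2:
        return f"saturate({msb1}) or wrap-around({msb2})"
    return "not covered here"   # the remaining case is cut off in the source

for name, stat, prop in [("signal1", (-1.5, 1.6), (-1.9, 1.9)),
                         ("signal2", (-1.3, 1.4), (-2.1, 2.1)),
                         ("signal3", (-1.2, 1.2), (-22.0, 22.0))]:
    print(name, choose(msb_bits(*stat), msb_bits(*prop)))
```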
Transform DFG for cheaper solution
Scaling by moving multiplications or shifters over operators; use commutativity, associativity, distributivity (check accuracy!)
Need to verify also LSB behavior
[Figure: an adder with a 2^-4 scaling, drawn twice: once with 20-bit signals around the adder, once with the scaling moved so that all signals are 16 bits]
2. Quantization effects can be modeled as additive noise (LSB)
[Figure: a B-bit quantizer Q between input and output is replaced by an adder that adds noise to the input]
Quantization noise is approximated by a statistical model with the following assumptions:
– the noise is uncorrelated to the input.
– the noise is white.
– the probability distribution is uniform.
Page 29
57
Each quantization effect is modeled Each quantization effect is modeled by a mean and varianceby a mean and variance
Rounding:
Truncation:
Magnitude truncation:
12 and 0
2Δ== nnm σ
12 and
2
2Δ=
Δ−= nnm σ
3 and 0
2Δ== nnm σ
Δ is quantization step
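These moments can be checked empirically. The sketch below is a Monte Carlo estimate under the assumption of inputs uniformly distributed over a range that is symmetric around zero and much wider than Δ; the function name, sample count and seed are hypothetical.

```python
import math
import random

def noise_stats(mode: str, delta: float, samples: int = 50_000, seed: int = 1):
    """Return (mean, variance) of the error q(x) - x for random inputs x."""
    rng = random.Random(seed)
    errors = []
    for _ in range(samples):
        x = rng.uniform(-100.0, 100.0)
        steps = x / delta
        if mode == "rounding":
            q = math.floor(steps + 0.5)
        elif mode == "truncation":
            q = math.floor(steps)
        elif mode == "magnitude truncation":
            q = math.trunc(steps)
        else:
            raise ValueError(mode)
        errors.append(q * delta - x)
    mean = sum(errors) / samples
    variance = sum((e - mean) ** 2 for e in errors) / samples
    return mean, variance

for mode in ("rounding", "truncation", "magnitude truncation"):
    print(mode, noise_stats(mode, 0.25))
```

The zero mean of magnitude truncation depends on positive and negative inputs being equally likely; its larger variance reflects errors spread over a span of 2Δ instead of Δ.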
This results in an equivalent linear network
[Figure: first-order recursive filter with input X(n), output Y(n), delay z^-1, coefficient a and quantizer Q; in the equivalent network Q is replaced by an adder that injects the noise e(n)]
But quantization is a non-linear operation!
Limit cycles are an example of non-linear behavior
[Figure: first-order recursive filter with input X(n), output Y(n), delay z^-1, coefficient -0.96 and a B-bit quantizer Q that rounds to the nearest integer]
X(0) = 14, x(n) = 0 for n > 0
[Figure: output sequences with rounding and without rounding]
LimitcycleLimitcycle exampleexample
-
Page 31
61
a) LSB determination must be based a) LSB determination must be based on simulationson simulations
All fixed-point
simulate
outputok
yes
no
* +
stimuli
0.6
x
ymQQ
* +0.6
x
ym compare
QQ
z-1
b) Gradual refinement is necessary to keep the problem manageable
For each signal S:
– quantize S only
– simulate
– Perf. ok? (yes / no)
return
[Figure: stimuli drive a first-order loop with coefficient 0.6, delay z^-1 and signals x, m, y, with a single quantizer Q; the output is compared against the reference simulation]
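One refinement step for a single signal might look as follows. This is a sketch under several assumptions: the filter is y(n) = 0.6 · y(n-1) + x(n), only the product m is quantized (by rounding), the performance measure is the signal-to-quantization-noise ratio against the floating-point reference, and all names and the 40 dB target are hypothetical.

```python
import math
import random

def run_filter(stimuli, frac_bits=None):
    """y(n) = 0.6 * y(n-1) + x(n); quantize the product if frac_bits is given."""
    y, out = 0.0, []
    for x in stimuli:
        m = 0.6 * y
        if frac_bits is not None:
            scale = 1 << frac_bits
            m = math.floor(m * scale + 0.5) / scale
        y = m + x
        out.append(y)
    return out

def sqnr_db(reference, test):
    """Signal-to-quantization-noise ratio of test against reference, in dB."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)

def smallest_frac_bits(stimuli, target_db, max_bits=24):
    """Fewest fractional bits for the quantized signal that still meet the target."""
    reference = run_filter(stimuli)
    for bits in range(1, max_bits + 1):
        if sqnr_db(reference, run_filter(stimuli, bits)) >= target_db:
            return bits
    return None

rng = random.Random(0)
stimuli = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
print(smallest_frac_bits(stimuli, target_db=40.0))
```

Repeating this for one signal at a time keeps each simulation easy to interpret, at the price of ignoring how the noise sources interact until a final all-fixed-point run.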
Conclusion
• Number representation
• Location of decimal point
• Precision
• Dynamic range
• Truncation, rounding
• Overflow
• Now: what to do?
• Why are we doing this?
– Next exercise: floating point to fixed point
– Area/time/power optimization
– Important design optimization for JPEG project