algorithms and architectures for decimal transcendental function computation

Ph.D final defence 1 Algorithms and Architectures for Decimal Transcendental Function Computation Ph.D Candidate: Dongdong Chen Department of Electrical and Computer Engineering University of Saskatchewan


Page 1: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 1

Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D Candidate: Dongdong Chen

Department of Electrical and Computer Engineering

University of Saskatchewan

Page 2: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 2

Outline

• Research Background and Motivation

• Table-based First-Order Polynomial Approximation

• Digit-Recurrence with Selection by Rounding

• Function Iteration Method

• Conclusion

Page 3: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 3

Research Background

• Why Decimal Arithmetic?

Page 4: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 4

Objectives (Cont.)

• Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2)

– DFP adder/subtractor

– DFP multiplier

– DFP divider

– DFP transcendental function computation

Page 5: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 5

Background

“Decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now.” Is that true?

Page 6: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 6

Introduction

• Decimal is still essential for specific applications

– Numbers in commercial databases are decimal

– Extensive use of decimal in commercial applications

– Survey of commercial databases report

– Decimal fixed-point or floating-point number

• How to process decimal computation

– Software computation

– Convert back to decimal representation

– Problems

Page 7: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 7

Introduction (Cont.)

• Errors from decimal and binary conversion

– Example 1: represent 0.1 in DFP or BFP

Decimal representation (BCD code): 0.0001

Binary representation: 0.00011… ≈ 0.09… (the binary fraction does not terminate, so 0.1 is only approximated)

– Example 2: telephone billing. Cost: 0.70; Tax: 5%

BFP arithmetic: 0.6999…8 × 1.05 = 0.734999…, which rounds to 0.73

DFP arithmetic: 0.70 × 1.05 = 0.735, which rounds to 0.74

• Decimal integer, fixed-point or floating-point?

• Decimal hardware or software solutions?
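The telephone-billing example can be reproduced in software. The following is a minimal sketch that uses Python's standard `decimal` module as a stand-in for DFP arithmetic; the helper names `bill_binary` and `bill_decimal` are hypothetical, and rounding half-up to whole cents is an assumed billing rule.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def bill_binary(cost: float, rate: float) -> Decimal:
    """Multiply in binary floating point, then round the exact double to cents."""
    product = Decimal(cost * rate)  # Decimal(float) converts the double exactly
    return product.quantize(CENT, rounding=ROUND_HALF_UP)

def bill_decimal(cost: str, rate: str) -> Decimal:
    """Multiply in decimal arithmetic, then round to cents."""
    product = Decimal(cost) * Decimal(rate)  # exactly 0.7350 for the slide's inputs
    return product.quantize(CENT, rounding=ROUND_HALF_UP)

print(bill_binary(0.70, 1.05))       # binary path
print(bill_decimal("0.70", "1.05"))  # decimal path
```

With the inputs from the slide, the binary path yields 0.73 and the decimal path yields 0.74, because the double-precision product lies slightly below 0.735.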

Page 8: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 8

Current Research

• DFP arithmetic is defined in IEEE 754-2008

• IBM computing systems include DFP hardware

– IBM Power6, z9, z10

• Intel includes a DFP software solution

– Intel DFP software computation library

• DFP arithmetic IP blocks:

– Basic DFP arithmetic IPs:

DFP adder/subtractor, multiplier, divider, square root, etc.

– Transcendental DFP arithmetic IPs:

DFP CORDIC, logarithm, antilogarithm, reciprocal, etc.

Page 9: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 9

DFP Arithmetic in IEEE 754-2008

• Review BFP arithmetic in IEEE 754-2008

• How to define new DFP in IEEE 754-2008

Page 10: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 10

BFP Floating-point representation

• Representation:

– sign, exponent, significand (or mantissa):

$(-1)^{sign} \times significand \times 2^{exponent}$

– more bits for significand gives more accuracy

– more bits for exponent increases range

• IEEE 754 floating point standard:

– single precision: 8 bit exponent, 23 bit significand

– double precision: 11 bit exponent, 52 bit significand

Page 11: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 11

BFP floating-point Number

• Leading “1” bit of significand is implicit

– Example: if the significand is 011010110…0, the actual significand is 1.011010110…0

• This is called a normalized number; there is exactly one non-zero digit to the left of the point.

– Unique representation of a number

– We get a little more precision: there are 24 bits in the significand, but only 23 of them are stored.

Page 12: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 12

Exponent

• Exponent is “biased” to make sorting easier

– all 0s is smallest exponent, all 1s is largest

– The actual exponent is e − 127 for single precision, and e − 1023 for double precision

– Bias of 127 for single precision and 1023 for double precision

– By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers.

• If e = 1000 0011 ($131_{10}$), the actual exponent is 131 − 127 = 4

• If e = 0101 1101 ($93_{10}$), the actual exponent is 93 − 127 = −34

Page 13: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 13

BFP Floating-Point Formats

Short (32-bit) format: Sign | Exponent | Significand

– Exponent: 8 bits, bias = 127, –126 to 127

– Significand: 23 bits for fractional part (plus hidden 1 in integer part)

Long (64-bit) format: Sign | Exponent | Significand

– Exponent: 11 bits, bias = 1023, –1022 to 1023

– Significand: 52 bits for fractional part (plus hidden 1 in integer part)

Page 14: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 14

BFP Floating-Point Formats (Con.)

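[Figure: number line of the short (32-bit) format, from −∞ through 0 to +∞]

– Negative overflow: below $-(2 - 2^{-23}) \times 2^{127}$

– Expressible negative numbers (normalized): from $-(2 - 2^{-23}) \times 2^{127}$ to $-2^{-126}$

– Negative underflow and positive underflow: the gaps between $-2^{-126}$ and 0, and between 0 and $2^{-126}$

– Expressible positive numbers (normalized): from $2^{-126}$ to $(2 - 2^{-23}) \times 2^{127}$

– Positive overflow: above $(2 - 2^{-23}) \times 2^{127}$

– Positive and negative zero: sign 0 or 1, biased exponent 00000000, fraction 00000000000000000000000

– Positive and negative infinity: sign 0 or 1, biased exponent 11111111, fraction 00000000000000000000000

– Exponent = 128 (biased exponent 11111111) and fraction ≠ 0: this is called “not a number” or NaN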

Page 15: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 15

Example

• Summary: FP representation

$(-1)^{sign} \times significand \times 2^{exponent - bias}$

• Example:

– decimal: $-.75 = -3/4 = -3/2^{2}$

– binary: $-.11 = -1.1 \times 2^{-1}$

– floating point: exponent = 126 = 01111110

– IEEE single precision:

1 01111110 10000000000000000000000
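The bit pattern of this example can be checked mechanically. The sketch below splits a single-precision value into its three fields using Python's standard `struct` module; the helper name `float32_fields` is hypothetical.

```python
import struct

def float32_fields(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, 23-bit fraction) of an IEEE 754 single."""
    (word,) = struct.unpack(">I", struct.pack(">f", value))  # raw 32-bit pattern
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF  # the hidden leading 1 is not stored
    return sign, biased_exponent, fraction

sign, exponent, fraction = float32_fields(-0.75)
print(sign, f"{exponent:08b}", f"{fraction:023b}")
# prints: 1 01111110 10000000000000000000000
```

Subtracting the bias of 127 from the biased exponent 126 gives the actual exponent −1 used in the binary form above.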

Page 16: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 16

DFP Number Representation

• Representation:

– sign, exponent, significand (or mantissa):

$(-1)^{sign} \times significand \times 10^{exponent}$

– more digits for significand gives more accuracy

– more bits for exponent increases range

• DFP formats:

– decimal32: DFP storage format encoded in 32 bits

– decimal64: DFP computational format encoded in 64 bits

– decimal128: DFP computational format encoded in 128 bits

Page 17: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 17

DFP Number format

• The 1-bit sign (S) is defined the same way as in the BFP format

• The (w+5)-bit combination field (G) is split into two subfields:

– 5 bits (G0…G4) encode the 2 MSBs of the exponent and the most significant digit (MSD) of the significand, or indicate Not-a-Number (NaN) or infinity (Inf);

– w bits (G5…Gw+4) are appended as a suffix to the 2 MSBs derived from G0…G4; together they form the (w+2)-bit nonnegative biased exponent.

Page 18: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 18

DFP Exponent

• Exponent is “biased” to make sorting easier

– Binary format (not decimal)

– The actual exponent is e − 101 for decimal32, e − 398 for decimal64, e − 6176 for decimal128

– Range of exponent is (emin − q + 1) ≤ e ≤ (emax − q + 1)

Page 19: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 19

DFP Number format (Cont.)

• J×10-bit Trailing Significand (T) Field:

– Densely packed decimal (DPD) encoding

3-digit decimal number encoded to 10-bit binary number

DPD converted to binary coded decimal (BCD)

– Binary integer decimal (BID) encoding

decimal number encoded by binary integer

– Non-normalized decimal significand

$(-1)^{0} \times 0.00900 \times 10^{2}$ and $(-1)^{0} \times 0.09000 \times 10^{1}$ represent the same value

– DFP number’s Cohort

Page 20: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 20

Parameters in DFP Format

Page 21: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 21

Example

• Summary: DFP representation

$(-1)^{sign} \times significand \times 10^{exponent - bias}$

• Convert $-835 \times 10^{-2}$ (= −8.35) to decimal64

– Sign bit: “1” negative, “0” positive (here the sign is 1)

– Exponent: −2 + 398 = 396 (10-bit “0110001100”)

– Significand: 835 (50-bit DPD coding “0…00 02 3D”)

– Encoding of the 5-bit MSBs (G0…G4) of the combination field: “01000”

– decimal64: “10100010001100…..00…1000111101” (binary)

“A2 30 00 00 00 00 02 3D” (hex)
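The encoding steps of this example (sign 1, exponent −2, significand 835) can be sketched in code. The block below is an illustrative sketch, not a complete IEEE 754-2008 encoder: the function names `dpd_encode` and `encode_decimal64` are hypothetical, special values and range checks are omitted, and the declet table follows the published densely packed decimal encoding rules.

```python
def dpd_encode(d2: int, d1: int, d0: int) -> int:
    """Pack three decimal digits (d2 most significant) into one 10-bit DPD declet."""
    a, b, c, d = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1
    e, f, g, h = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1
    i, j, k, m = (d0 >> 3) & 1, (d0 >> 2) & 1, (d0 >> 1) & 1, d0 & 1
    # Keyed by the top bit of each digit (1 means the digit is 8 or 9).
    table = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    declet = 0
    for bit in table[(a, e, i)]:
        declet = (declet << 1) | bit
    return declet

def encode_decimal64(sign: int, coefficient: int, exponent: int) -> int:
    """Encode a finite value sign/coefficient/exponent as a decimal64 word (DPD)."""
    biased = exponent + 398                    # decimal64 bias
    digits = f"{coefficient:016d}"             # 16 significand digits
    msd = int(digits[0])
    exp_msbs, exp_rest = biased >> 8, biased & 0xFF
    if msd < 8:                                # G0..G4 = ee ddd
        combination = (exp_msbs << 3) | msd
    else:                                      # G0..G4 = 11 ee d
        combination = 0b11000 | (exp_msbs << 1) | (msd & 1)
    word = (sign << 63) | (combination << 58) | (exp_rest << 50)
    for n in range(5):                         # five declets, 15 trailing digits
        d2, d1, d0 = (int(ch) for ch in digits[1 + 3 * n: 4 + 3 * n])
        word |= dpd_encode(d2, d1, d0) << (10 * (4 - n))
    return word

print(f"{encode_decimal64(1, 835, -2):016X}")  # prints: A23000000000023D
```

The printed word matches the hex pattern on the slide, and `dpd_encode(8, 3, 5)` reproduces the declet “1000111101”.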

Page 22: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 22

DFP special values

• Not-a-Number: G0…G4 = “11111”

• Infinity: G0…G4 = “11110”; the sign of the infinity follows the sign bit

• Overflow: occurs if the absolute value of a DFP number is larger than the largest DFP number, $|v_{max}| = (10^{q} - 1) \times 10^{emax - q + 1}$

• Underflow: occurs if the absolute value of a DFP number is less than the smallest DFP number, $|v_{min}| = 10^{emin - q + 1}$. If the absolute value is less than $10^{emin}$ and at least $10^{emin - q + 1}$, the number is subnormal.

• Normal number: the remaining exponent values and significands represent normal numbers.

Page 23: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 23

DFP Arithmetic Operations

• Basic DFP arithmetic operations

• Two decimal-specific DFP operations

– SameQuantum(DFP1, DFP2)

– Quantize(DFP1, DFP2)

• DFP comparison operations

– do not distinguish between redundant representations of the same number

• DFP conversion operations

– DFP to BFP conversion (correctly rounded)

– DFP to integer conversion

• Recommended DFP operations
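The two decimal-specific operations and the comparison rule can be tried out in software. The sketch below uses the `same_quantum` and `quantize` methods of Python's standard `decimal.Decimal` type as a stand-in for the DFP operations; the variable names are illustrative.

```python
from decimal import Decimal

a = Decimal("2.50")  # coefficient 250, exponent -2
b = Decimal("2.5")   # coefficient 25,  exponent -1

same_value = (a == b)                    # comparison ignores the representation
same_quantum_before = a.same_quantum(b)  # exponents differ: -2 vs -1
b_requantized = b.quantize(a)            # re-express b with the exponent of a
same_quantum_after = a.same_quantum(b_requantized)

print(same_value, same_quantum_before, b_requantized, same_quantum_after)
# prints: True False 2.50 True
```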


Page 25: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 25

DFP Number’s Cohort

• Non-normalized decimal significand

• DFP number’s Cohort

• Standard defines the preferred (required) exponent (quantum)

– Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for a DFP result of that operation

– Inexact operation results: the cohort member of least possible exponent is used to get the maximum number of significant digits
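The cohort-selection rules can be observed with Python's standard `decimal` module, which follows the same preferred-exponent conventions for addition, multiplication and division. This is a sketch for illustration; the helper `exponent` is hypothetical, and a 7-digit precision is assumed to mimic decimal32.

```python
from decimal import Decimal, localcontext

def exponent(x: Decimal) -> int:
    """Exponent (quantum) of the cohort member that represents x."""
    return x.as_tuple().exponent

# Exact results: the preferred exponent decides the cohort member.
exact_sum = Decimal("1.20") + Decimal("1.3")      # min(-2, -1) = -2
exact_product = Decimal("1.20") * Decimal("1.3")  # (-2) + (-1) = -3

# Inexact result: the least possible exponent keeps the most digits.
with localcontext() as ctx:
    ctx.prec = 7
    inexact_quotient = Decimal(1) / Decimal(3)

print(exact_sum, exact_product, inexact_quotient)
# prints: 2.50 1.560 0.3333333
```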

Page 26: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 26

DFP Rounding Modes

• Five types of active rounding modes

– roundTiesToEven

– roundTiesToAway

– roundTowardPositive

– roundTowardNegative

– roundTowardZero

• Correct rounding and faithful rounding

• IEEE 754-2008 requires correctly rounded results for all DFP arithmetic operations

• DFP operations should support all rounding modes
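The effect of the five rounding modes is easiest to see on a tie. The sketch below maps the IEEE 754-2008 mode names onto the rounding constants of Python's standard `decimal` module (the mapping table `MODES` and the helper `round_to_integer` are illustrative names) and rounds ±2.5 to an integer.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

# IEEE 754-2008 mode name -> Python decimal rounding constant
MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,       # ties go away from zero
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}

def round_to_integer(text: str, mode: str) -> int:
    """Round a decimal string to an integer under the named mode."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=MODES[mode]))

for name in MODES:
    print(name, round_to_integer("2.5", name), round_to_integer("-2.5", name))
```

For 2.5 the modes give 2, 3, 3, 2, 2 and for −2.5 they give −2, −3, −2, −3, −2, in the order listed in `MODES`.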

Page 27: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 27

DFP Exception Handling

• Invalid operation: an operand is NaN; 0 × Inf; square root of a negative operand. The default result is NaN.

• Division by zero: the dividend is a finite non-zero number and the divisor is zero. The default result is +Inf or −Inf.

• Overflow: the magnitude of a result exceeds the largest finite number representable in the format of the operation.

• Underflow: the magnitude of a result is below $10^{emin}$.

• Inexact: the correctly rounded result of an operation differs from the infinite-precision result.
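These exceptions and their default results can be observed in software. The sketch below uses a `decimal.Context` from Python's standard library with decimal32-like parameters (7 digits, emax = 96, emin = −95, which are assumptions chosen for illustration) and with traps disabled, so each operation returns its default result and only raises a status flag.

```python
from decimal import (Context, Decimal, DivisionByZero, Inexact,
                     InvalidOperation, Overflow)

# decimal32-like context; traps=[] means exceptions only set flags.
ctx = Context(prec=7, Emin=-95, Emax=96, traps=[])

invalid_result = ctx.multiply(Decimal(0), Decimal("Infinity"))  # 0 x Inf
invalid_flag = ctx.flags[InvalidOperation]

ctx.clear_flags()
div_result = ctx.divide(Decimal(1), Decimal(0))                 # finite / 0
div_flag = ctx.flags[DivisionByZero]

ctx.clear_flags()
overflow_result = ctx.multiply(Decimal("9E96"), Decimal(10))    # too large
overflow_flag = ctx.flags[Overflow]

ctx.clear_flags()
inexact_result = ctx.divide(Decimal(1), Decimal(3))             # rounded
inexact_flag = ctx.flags[Inexact]

print(invalid_result, div_result, overflow_result, inexact_result)
# prints: NaN Infinity Infinity 0.3333333
```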

Page 28: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 28

DFP Addition/Subtraction

Page 29: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 29

DFP Add/Sub Data flow

Page 30: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 30

DFP Addition

• Step 1: equalize the exponents

– The mantissas can be added only when the exponents are the same.

– The operand with the smaller exponent shifts its point to the left, and the operand with the larger exponent shifts its point to the right.

– Rewriting the operand with the smaller exponent can lose its least significant digits.

– Keep a guard digit, a round digit and a sticky digit for the operand with the smaller exponent.

Page 31: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 31

DFP addition

• Step 2: add the mantissas

$0099999 \times 10^{1} + 0016234 \times 10^{-3}$

$= 0999990 \times 10^{0} + 0000016(234) \times 10^{0}$

$= 1000006(234) \times 10^{0}$

• Step 3: Normalize the result if necessary

Page 32: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 32

DFP addition

• Step 4: Round the number if needed

$1000006(234) \times 10^{0} \rightarrow 1000006 \times 10^{0}$

• Step 5: Repeat step 3 if the result is no longer normalized

• The final result is 1000006

• The exact answer is 1000006.234
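The five addition steps can be sketched on integer coefficients. The function below is an illustrative model, not the hardware algorithm: the name `dfp_add` is hypothetical, alignment is done exactly with wide integers instead of guard, round and sticky digits, and round-half-even to 7 digits is assumed.

```python
def dfp_add(c1: int, e1: int, c2: int, e2: int, precision: int = 7):
    """Add c1*10**e1 and c2*10**e2; return (coefficient, exponent)."""
    # Step 1: equalize the exponents (align to the smaller one).
    exponent = min(e1, e2)
    # Step 2: add the aligned coefficients.
    total = c1 * 10 ** (e1 - exponent) + c2 * 10 ** (e2 - exponent)
    sign = -1 if total < 0 else 1
    magnitude = abs(total)
    # Steps 3 and 4: drop the digits that do not fit and round half-even.
    extra = len(str(magnitude)) - precision
    if extra > 0:
        magnitude, dropped = divmod(magnitude, 10 ** extra)
        half = 5 * 10 ** (extra - 1)
        if dropped > half or (dropped == half and magnitude % 2 == 1):
            magnitude += 1
        # Step 5: rounding may carry out (99...9 + 1), so renormalize.
        if magnitude == 10 ** precision:
            magnitude //= 10
            extra += 1
        exponent += extra
    return sign * magnitude, exponent

print(dfp_add(99999, 1, 16234, -3))  # prints: (1000006, 0)
```

The call reproduces the slide example: the exact sum 1000006.234 is rounded to the 7-digit result 1000006.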

Page 33: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 33

Guard digits

• To help minimize rounding problems, intermediate steps of operations store guard digits: additional internal digits that increase the precision of the operations.

• Previous example: add one extra digit.

• The correct rounding required by IEEE 754-2008 is typically obtained with one guard digit, one round digit and one sticky digit.

Page 34: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 34

DFP add/sub

Page 35: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 35

General Description: Addition

Page 36: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 36

Example: Addition

Page 37: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 37

Example: Addition (Con.)

Page 38: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 38

DFU: IBM POWER6 and Z10

Page 39: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 39

High performance Implementation

Page 40: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 40

High performance Implementation

Page 41: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 41

High performance Implementation

[12] A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 08-10, 2009

Page 42: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 42

Evaluation Results and Comparison

[Proposed]: A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 08-10, 2009

Page 43: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 43

DFP Multiplication

Page 44: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 44

Scheme of decimal multiplier

x = 1963, y = 8145

x·y0 (y0 = 5): 5x = 9815, plus 0 = 00000

x·y1 (y1 = 4): 5x = 9815, minus x = −1963

x·y2 (y2 = 1): x = 1963, plus 0 = 00000

x·y3 (y3 = 8): 10x = 19630, minus 2x = −3926

x·y = 15988635
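The scheme above writes each multiplier digit as a multiple from {0, 5, 10} plus a small correction from {−2, …, 2}, so only x, 2x, 5x and 10x are needed. The sketch below models that recoding on plain integers; the function names are hypothetical, and the exact recoding table is an assumption inferred from the four digits shown on the slide.

```python
def recode_digit(d: int) -> tuple[int, int]:
    """Split a decimal digit into upper in {0, 5, 10} and lower in {-2..2}."""
    upper = 5 * ((d + 2) // 5)
    lower = d - upper
    return upper, lower

def multiply(x: int, y: int) -> int:
    """Multiply x by y digit by digit, using only the recoded multiples of x."""
    product = 0
    weight = 1                       # 10**i for multiplier digit y_i
    while y > 0:
        y, digit = divmod(y, 10)
        upper, lower = recode_digit(digit)
        # Each partial product is the sum of two easy multiples of x.
        product += (upper * x + lower * x) * weight
        weight *= 10
    return product

print([recode_digit(d) for d in (5, 4, 1, 8)])
# prints: [(5, 0), (5, -1), (0, 1), (10, -2)]
print(multiply(1963, 8145))  # prints: 15988635
```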

Page 45: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 45

Partial product generation

Generate XYi

Yi ∈ {1, 2, 3, …, 7, 8, 9}

XYi is in carry-save format

Page 46: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 46

Partial product generation

Solid circles: BCD sum (digit)

Hollow circles: carry (bit)

n-digit radix-10 CSA

m-digit radix-10 counter

Page 47: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 47

Carry Save Adder Tree

CSA Tree to Generate Multiplication Result

Page 48: Algorithms and Architectures for Decimal Transcendental Function Computation

48Nov. 26, 2010

Flowchart of DFP Multiplier

Page 49: Algorithms and Architectures for Decimal Transcendental Function Computation

49Nov. 26, 2010

Architecture of DFP Multiplier

Page 50: Algorithms and Architectures for Decimal Transcendental Function Computation

50Nov. 26, 2010

Exception Detection & Handling

• Invalid operation

– sNaN (pass significand of sNaN)

– 0 × ∞ (produce qNaN with significand 0)

• Overflow (and Inexact)

– IEIP – SLA > Emax

– Increase SLA until all LZs removed

• Underflow (and possibly Inexact)

– IEIP – SLA < Emin

– Decrease SLA until 0, then shift right

• Inexact

Page 51: Algorithms and Architectures for Decimal Transcendental Function Computation

51Nov. 26, 2010

Implementation Highlights

• Leverage operands' LZCs

– SC, SLA, and IESIP

• Handle NaNs with minimal overhead

– No dataflow modification

– Coerce multiplicand or multiplier to 1

• Support gradual underflow

– No dataflow modification

– Simply extend number of iterations

• Simple, control-based rounding scheme

Page 52: Algorithms and Architectures for Decimal Transcendental Function Computation

52Nov. 26, 2010

Synthesis Results

• 64-bit (16 digit) operands, DPD encoded

• LSI Logic's gflxp 0.11um CMOS, 55ps FO4

• Synopsys Design Compiler

• Results

– Fixed-point: 119,653 um2, 14.72 FO4s

– Floating-point: 237,607 um2, 15.45 FO4s

• Critical path

– Fixed-point: 4:2 compressor (accumulator)

– Floating-point: 128-bit barrel shifter

Page 53: Algorithms and Architectures for Decimal Transcendental Function Computation

53Nov. 26, 2010

Applicability to Parallel Designs

• IE and IP shift generation

• Rounding scheme

• NaN handling

• Exception detection and handling

• On-the-fly sticky bit generation... NO

Page 54: Algorithms and Architectures for Decimal Transcendental Function Computation

54Nov. 26, 2010

Sequential vs. Parallel

• Sequential

– Less area

– Potentially better cycle time

• Parallel

– Less latency

– Higher throughput

Page 55: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 55

DFP Division

Page 56: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 56

DFP Division Data Flow

[Figure: data-flow diagram of the decimal64 divider. Each 64-bit operand is unpacked into a sign (1 bit), a combination field (5 bits), an exponent field (8 bits) and a significand field (50 bits). Recoverable block labels: Unpacking, Combin_Register, DPD_to_BCD, Sign Logic, Exponent Subtraction, Bias Addition, Significand_Div (Mantissa Division), Normalization, Exponent Adjustment, Rounding (with Rounding Control), BCD_to_DPD, Packing.]

• Unpack the decimal floating-point numbers

• Check for zeros and infinity

• Subtract exponents

• Divide mantissas

• Normalize and detect overflow and underflow

• Perform rounding

• Replace sign

• Pack

Page 57: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 57

Unpacking and Sign Logic

• Step 1: Unpack each 64-bit floating-point number into sign (1 bit), combination field (5 bits), exponent field (8 bits) and significand field (50 bits). Check for zeros and infinity (if F = 0, stop).

• Step 2: Sign process (Sign Logic block with 1-bit inputs S1 and S2 and 1-bit output Sq)

$S_q = S_1 \oplus S_2$

Page 58: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 58

Exponent Subtraction

[Figure: Exponent Subtraction block (11-bit inputs E1 and E2, 11-bit output E12) followed by the Bias Addition block (11-bit output Eb)]

• Step 3: Exponent subtraction

$E_b = E_1 - E_2 + bias$

Page 59: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 59

Mantissa Division

[Figure: Mantissa Division block (64-bit inputs M1 and M2, 68-bit output M12)]

• Step 4: Mantissa division

$0.1 \le M_1 < 1$, $\quad 0.1 \le M_2 < 1$

$M_{min} = 0.1$, $\quad M_{max} = 1 - 10^{-p} < 1$

$0.1 < M_{min}/M_{max} \le M_1/M_2 \le M_{max}/M_{min} < 10$

Which algorithm to choose here?

1. Restoring division

2. Non-restoring division

3. High-Radix division

4. Convergence division
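As a point of reference for the first option, the sketch below models radix-10 restoring division on integer significands. It is an illustration under stated assumptions, not the divider of the cited designs: the name `restoring_division` is hypothetical, and the trial subtraction is modelled by comparing before subtracting, which is equivalent to subtracting and restoring when the remainder goes negative.

```python
def restoring_division(m1: int, m2: int, frac_digits: int) -> list[int]:
    """Return the quotient digits q0.q1q2... of m1/m2, one digit per step.

    m1 and m2 are significands scaled to integers with the same weight,
    with 0 < m1 < 10*m2 so that every quotient digit is in 0..9.
    """
    digits = []
    remainder = m1
    for _ in range(frac_digits + 1):
        digit = 0
        # Subtract the divisor as often as the remainder stays non-negative.
        while remainder >= m2:
            remainder -= m2
            digit += 1
        digits.append(digit)
        remainder *= 10          # shift the partial remainder one digit left
    return digits

quotient_digits = restoring_division(1234567, 7654321, 7)
print(quotient_digits)
```

Each iteration needs up to nine subtractions, which is one reason the higher-radix and convergence methods in the list are attractive for hardware.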

Page 60: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 60

Normalization

• Step 5: A left shift by one digit is needed to normalize the mantissa result; overflow and underflow must also be detected.

[Figure: Normalization block (68-bit input M12, flag Fa, 68-bit output Mn) and Exponent Adjustment block (10-bit input Eb, 10-bit output Ea)]

For example, “0934…2140819564” is shifted left by one digit to “934…21408195640”, and the exponent is adjusted accordingly: Ea = Eb − 1

Page 61: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 61

Rounding and Packing

[Figure: Rounding block (68-bit input Mn, Rounding Control input, flag Fr, 64-bit output Mq), Exponent Adjustment block (10-bit input Ea, 10-bit output Eq) and Packing block (sign, combination field, exponent field, significand field)]

• Step 6: Rounding. Truncate, round-up and round-to-nearest are possible policies, but they can be biased; the IEEE rounding standard's “round to nearest even” is preferable.

• Step 7: Pack the sign bit, exponent bits and significand bits together, and detect NaN and infinity.

Page 62: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 62

High performance Implementation

[1] L.-K. Wang and M. J. Schulte, “Decimal Floating-Point Division Using Newton-Raphson Iteration,” Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 84-95, Sep. 2004.

Page 63: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 63

High performance Implementation

[2] Tomás Lang and Alberto Nannarelli, “A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture,” IEEE Transactions on Computers, pp. 727–739, IEEE, June 2007.

Page 64: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 64

High performance Implementation

Page 65: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 65

Evaluation Results and Comparison

                     DFP Divider [1]    DFP Divider [2]
Precision (digits)   16 (decimal64)     16 (decimal64)
Cycle time (ns)      0.57               1
Number of cycles     150                20
Latency (ns)         85.5               20

1: Synthesized with a STM 90-nm standard cell library

Page 66: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 66

DFP Transcendental Arithmetic

Page 67: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 67

Contents

• Introduction

• Decimal Logarithmic Converter

• Decimal Antilogarithmic Converter

• Conclusions

• Future Work

Page 68: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 68

32-bit DFP Logarithm

$X = (-1)^{s} \times 10^{e} \times coefficient$

$R = \log_{10}(X) = \log_{10}(10^{e}) + \log_{10}(coefficient)$

The coefficient is a non-normalized decimal integer.

To guarantee 32-bit DFP accuracy, a 14-digit FXP logarithmic calculation needs to be kept.

Example: $R = \log_{10}((-1)^{0} \times 10^{8} \times 0024589) = 8 + 5 + \log_{10}(0.2458900)$
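The decomposition in this example can be checked numerically. The sketch below splits a positive DFP operand into an integer part (exponent plus digit count) and the logarithm of a fraction in [0.1, 1); it uses `Decimal.log10` from Python's standard library as a reference logarithm, and the helper name `dfp_log10` is hypothetical.

```python
from decimal import Decimal

def dfp_log10(coefficient: int, exponent: int) -> Decimal:
    """log10(coefficient * 10**exponent) as integer part + log10 of a fraction."""
    k = len(str(coefficient))                   # significant digits, e.g. 5
    fraction = Decimal(coefficient).scaleb(-k)  # 24589 -> 0.24589, in [0.1, 1)
    return Decimal(exponent + k) + fraction.log10()

# Slide example: coefficient 0024589, exponent 8 -> 8 + 5 + log10(0.24589)
result = dfp_log10(24589, 8)
reference = Decimal("24589E8").log10()
print(result, reference)
```

Only the logarithm of the fraction needs to be computed by the converter; the integer part comes directly from the exponent and the leading-zero count.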

Page 69: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 69

32-bit DFP Antilogarithm

$P = \mathrm{Antilog}_{10}(X) = 10^{X}$

$\mathrm{Antilog}_{10}(X) = 10^{X_{Int} + X_{Frac}} = 10^{X_{Int}} \times 10^{X_{Frac}}$

Here: $\log_{10}(X_{min}) \le X \le \log_{10}(X_{max})$

For 32-bit DFP: $X \in [-101, 96.99999]$

Example: $\mathrm{Antilog}_{10}((-1)^{1} \times 1940467 \times 10^{-5}) = \mathrm{Antilog}_{10}(-19.40467) = 10^{-19} \times 10^{-0.4046700}$

To guarantee 32-bit DFP accuracy, an 8-digit FXP antilog calculation needs to be kept.

Page 70: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 70

Digit-Recurrence Algorithm (Log)

The corresponding recurrences:

$E[j+1] = E[j](1 + e_j 10^{-j})$

$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j})$

Here: $E[1] = m$, $\quad L[1] = 0$

$e_j$ is selected so that $E[j+1]$ converges to 1

$e_j \in \{-9, -8, -7, \dots, 0, 1, \dots, 7, 8, 9\}$

Page 71: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 71

Digit-Recurrence Algorithm (Antilog)

Any 7-digit fixed-point decimal input N:

$10^{m} = e^{m \ln(10)} = e^{m'}$

The corresponding recurrences:

$L[j+1] = L[j] - \ln(1 + e_j 10^{-j})$

$E[j+1] = E[j](1 + e_j 10^{-j})$

Here: $E[1] = 1$, $\quad L[1] = m'$

$e_j$ is selected so that $L[j+1]$ converges to 0

$f_j = 1 + e_j 10^{-j}$

$e_j \in \{-9, -8, -7, \dots, 0, 1, \dots, 7, 8, 9\}$

Page 72: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 72

Selection By Rounding (cont.)

A scaled remainder is defined as:

Log: $W[j] = 10^{j}(1 - E[j])$

Antilog: $W[j] = 10^{j} L[j]$

$e_j$ is obtained by rounding $W[j]$:

$e_j = round(W[j])$

$e_1$ is obtained from a look-up table; $e_2 \dots e_j$ are obtained with selection by rounding.
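The logarithm recurrence with selection by rounding can be simulated in a few lines. The sketch below is a software model under stated assumptions, not the presented architecture: the name `log10_digit_recurrence` is hypothetical, the look-up table for $e_1$ is replaced by a direct search, the constants $\log_{10}(1 + e_j 10^{-j})$ come from `Decimal.log10` instead of a table, the input is assumed to be pre-scaled into [0.5, 1), and the sign convention $L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j})$ is assumed.

```python
from decimal import Decimal, ROUND_HALF_EVEN, localcontext

def log10_digit_recurrence(m: Decimal, iterations: int = 8) -> Decimal:
    """Approximate log10(m) for 0.5 <= m < 1 by the radix-10 digit recurrence."""
    if not Decimal("0.5") <= m < 1:
        raise ValueError("this sketch assumes 0.5 <= m < 1")
    with localcontext() as ctx:
        ctx.prec = 30                        # working precision of the model
        E, L = m, Decimal(0)                 # E[1] = m, L[1] = 0
        for j in range(1, iterations + 1):
            scale = Decimal(10) ** -j
            if j == 1:
                # Stand-in for the look-up table: digit that brings E closest to 1.
                e = min(range(-9, 10), key=lambda d: abs(1 - E * (1 + d * scale)))
            else:
                W = (1 - E) / scale          # scaled remainder W[j] = 10^j (1 - E[j])
                e = int(W.to_integral_value(rounding=ROUND_HALF_EVEN))
            factor = 1 + e * scale           # 1 + e_j 10^-j
            E *= factor                      # E[j+1] converges to 1
            L -= factor.log10()              # L[j+1] accumulates log10(m)
        return L

m = Decimal("0.7")
print(log10_digit_recurrence(m), m.log10())
```

Each iteration contributes roughly one decimal digit of accuracy, which is why the number of cycles in such a design grows with the target precision.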

Page 73: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 73

Architecture: Decimal Log Converter

[Figure: two-stage architecture of the decimal logarithmic converter. Recoverable block labels: Detector; Tab I and Tab II; Mult1, Mult2, Mult3; Mux 1 to Mux 9; Reg 1 to Reg 6; 9's complement units; 14-digit decimal CLA adders; 16-digit decimal CLA; rounding logic; shifters (×10, ×100, ×10^-j); adjusted constant log10(5, 2, 3); constant 1/ln(10). Signals: m, 2m, 3m, 5m, m', e1, ej, W[j]. The critical path is marked.]

Page 74: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 74

Implementation Results

Logic utilization      Used    Available*   Utilization
Occupied slices        2842    13696        21%

Maximum frequency: 47.7 MHz

Number of clock cycles: 17

*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7

Critical path detail (ns):

Reg2    Mux2    Mult 2   Shifter   Mux5    CLA     Round   Total
1.188   1.564   9.347    1.438     1.350   5.519   0.566   20.97

Page 75: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 75

Architecture: Dec. Antilog Converter

[Figure: two-stage architecture of the decimal antilogarithmic converter. Recoverable block labels: TAB I; TABLE II; AddGen; Mult; Cons Mul (constant ln(10)); Mux 1 to Mux 5; Reg 1 to Reg 6; Shifter_Reg; 9's complement units; 10-digit decimal CLAs; rounding logic; final rounding; shifters (×10^(j+1), ×10^-j). Signals: Xfrac, m', e1, ej, W[j], L(j). The critical path is marked.]

Page 76: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 76

Implementation Results

Logic utilization      Used    Available*   Utilization
Occupied slices        2315    13696        17%

Maximum frequency: 51.5 MHz

Number of clock cycles: 11

*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7

Critical path detail (ns):

Reg6    Mult    Mux4    Shifter   CLA     Round   Total
1.599   7.839   1.539   1.100     6.794   0.545   19.42

Page 77: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 77

Comparison (with Binary FXP Log and Exponential Converters)

• Similar dynamic range for the normalized coefficients:

$2^{52} < 10^{16} \approx 2^{53}$, $\quad 2^{23} < 10^{7} < 2^{24}$

• A binary reference is available that uses the same digit-recurrence algorithm with selection by rounding.

• Radix-10 is close to radix-8.

Page 78: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 78

Comparison (cont.) (with Binary FXP Log and Exponential Converters)

                         Radix-10 Decimal (1)            Radix-8 Binary [1]
                         Log.           Exp.            Log.           Exp.
Precision (digits/bits)  7      16      7      16       24     53      24     53
Area (fa) (2)            1630   2640    1370   2260     647    1829    627    1777
Cycle time (T) (3)       17     19      16     18       7      8       7      8
Number of cycles         8      17      8      17       8      18      11     21
Latency (T) (3)          136    323     128    306      56     144     77     168

(1): Synthesized with a TSMC 0.18-um standard cell library

(2): fa is the area of a 1-bit full adder

(3): T is the delay of a 1-bit full adder

Page 79: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 79

Conclusions

• Achieved 32-bit DFP accuracy for the decimal log and antilog results.

• Implemented them on FPGA and ASIC.

• Compared them with binary converters.

Page 80: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 80

EE990, April 2009: Decimal Log and Antilog Converters

Future Work

• The 64-bit and 128-bit DFP logarithm and antilog converters.

• The presented architecture can be optimized to achieve a faster speed or occupy a smaller area.

Page 81: Algorithms and Architectures for Decimal Transcendental Function Computation

Ph.D final defence 81

Summary

• IEEE 754-2008 defines a DFP standard that specifies

– number representation in several precisions

– correct DFP arithmetic operations

– rounding modes

• Implementation of DFP adder, multiplier, divider, logarithmic and antilogarithmic converters

• Implementing and programming DFP are both really hard.