
. BRIEF REPORT .

SCIENCE CHINA Information Sciences

September 2013, Vol. 56 099101:1–099101:6

doi: 10.1007/s11432-013-4996-1

© Science China Press and Springer-Verlag Berlin Heidelberg 2013 info.scichina.com www.springerlink.com

Improved Goldschmidt division method using mapping of divisors

YAN Wen1, QU XiuJie1∗, CHEN He1, YU JiYang2 & LONG Teng1

1School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China; 2The Electronic Engineering Technology Research Center of China Academy of Space Tech, Beijing 100094, China

Received May 23, 2013; accepted July 18, 2013

Abstract To achieve high precision with fewer storage resources, an improved Goldschmidt division method using the mapping of divisors is presented. The improved division method does not need the initial approximation, which means that the look-up table can be eliminated. A mapping method is then proposed to reduce the relative errors of the iteration results by multiplying the dividends and divisors by the mapping coefficients simultaneously. Since the mapping coefficients are all fixed factors, the mapping method applies CSD coding to the multiplication by fixed factors to reduce the hardware resources. As a result, the proposed method achieves smaller relative errors with fewer hardware resources.

Keywords Goldschmidt division method, mapping of divisors, CSD coding, multiplication with fixed factors,

fewer hardware resources

Citation Yan W, Qu X J, Chen H, et al. Improved Goldschmidt division method using mapping of divisors. Sci

China Inf Sci, 2013, 56: 099101(6), doi: 10.1007/s11432-013-4996-1

1 Introduction

Floating-point division is increasingly common in scientific and engineering applications. Although division is used less frequently than addition and multiplication, it strongly affects overall computing performance. Division algorithms are usually based on iterated multiplication. Newton-Raphson [1,2] and Goldschmidt [3–5] division are the most popular iterative algorithms. The Goldschmidt algorithm is usually chosen over the Newton-Raphson algorithm because of its higher intrinsic parallelism [6], which leads to a shorter execution time.

Many researchers have tried to reduce the computation time of division by convergence. The most popular approach is to reduce the number of iterations by increasing the precision of the initial approximation to the reciprocal, which requires more digits in the look-up table. The storage resources then grow exponentially. Since the storage resources of a system are usually limited, it is important to save table area. Several methods have been suggested to reduce the area of the reciprocal table. For example, a Goldschmidt division method with faster than quadratic convergence is proposed in [5], which converges faster with a smaller table area.

∗Corresponding author (email: [email protected])


Although the method proposed in [5] can reduce the table area, it still needs more storage resources when the precision requirement is very high. To achieve high precision using fewer storage resources, an improved Goldschmidt division method using mapping of divisors is proposed. This method does not need the initial approximation, so the look-up table is eliminated. The proposed method has smaller relative errors than the conventional Goldschmidt method and the DFQC method in [5]. In addition, the mapping of divisors is equivalent to multiplication by fixed factors, so CSD coding is applied to reduce the multiplication resources [7].

The main contribution of our method is that it eliminates the initial approximation and achieves high precision with fewer storage resources. In Section 2, the improved Goldschmidt division method using mapping of divisors is explained in detail. The errors and resources are analyzed in Section 3 and Section 4, where the results of the new design are compared with other methods to demonstrate its better performance. Our conclusion is given in Section 5.

2 Divisor mapping GLD method

Division can be written as Q = N/D, where N is the numerator, D is the denominator and Q is

the quotient. The principle of the Goldschmidt division method is to multiply the numerator and the

denominator with a group of factors Fi at the same time to make the denominator close to 1, such that

the numerator is close to the quotient [3,8]. Details on the conventional Goldschmidt division method can

be seen in [3,5]. Results of the conventional GLD division method can be obtained through the following equations

$$
\begin{cases}
F_{i+1} = 2 - D_{i+1},\\
N_{i+1} = N_i F_i,\\
D_{i+1} = D_i F_i,
\end{cases}
\tag{1}
$$

where $i = 0, 1, 2, \ldots, M-1$, $M$ is the number of iterations and $M \ge 1$. $N_i$ and $D_i$ represent the dividend and divisor after the ith iteration respectively. Here $N_0 = N$, $D_0 = D$, and $F_0$ is produced by a reciprocal table with limited precision, which has an initial error s.

We know that the conventional Goldschmidt division method needs to store the initial approximations

in a look-up table before implementation. However, to improve the precision of the initial approximation,

it needs to increase the number of digits of the look-up table. Then the storage resources will grow

accordingly in an exponential manner.
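The conventional iteration in (1) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `build_reciprocal_table` and `goldschmidt_conventional` are hypothetical, the table simply stores the reciprocal of the midpoint of each of $2^5$ equal sub-intervals of [1, 2) (it is not the optimized table of [5]), and double-precision floats stand in for hardware arithmetic.

```python
def build_reciprocal_table(bits=5):
    """Hypothetical look-up table: reciprocal of the midpoint of each
    of the 2**bits equal sub-intervals of [1, 2)."""
    size = 1 << bits
    return [1.0 / (1.0 + (k + 0.5) / size) for k in range(size)]


def goldschmidt_conventional(n, d, iterations=3, bits=5):
    """Approximate n / d for a normalized divisor d in [1, 2) using (1)."""
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    table = build_reciprocal_table(bits)
    # F_0 comes from the table, indexed by the leading fraction bits of d.
    f = table[int((d - 1.0) * (1 << bits))]
    for _ in range(iterations):
        n, d = n * f, d * f   # N_{i+1} = N_i F_i, D_{i+1} = D_i F_i
        f = 2.0 - d           # F_{i+1} = 2 - D_{i+1}
    return n
```

The table size doubles with every extra index bit, which is the exponential storage growth described above.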

2.1 Improved GLD method

In this paper, an improved division method, named the Divisor Mapping GLD (Goldschmidt Division)

method is proposed, which doesn’t need the initial approximation. Instead, it needs to preprocess the

dividends and divisors using the mapping method before the iterations.

We reformulate the iteration equations as follows

$$
\begin{cases}
F_i = 2 - D_i,\\
N_{i+1} = N_i F_i,\\
D_{i+1} = D_i F_i,
\end{cases}
\tag{2}
$$

where $i = 0, 1, \ldots, M-1$, $M$ is the number of iterations and $M \ge 1$. Here, $N_0 = N$, $D_0 = D$. We can find that $F_0 = 2 - D$, so the Divisor Mapping GLD method does not need the initial approximation.
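The table-free iteration in (2) can be sketched as below. The function name `goldschmidt_no_table` is hypothetical, and double-precision floats are assumed in place of the hardware number format.

```python
def goldschmidt_no_table(n, d, iterations):
    """Approximate n / d for d in [1, 2) using (2), with no look-up table."""
    for _ in range(iterations):
        f = 2.0 - d           # F_i = 2 - D_i, so F_0 = 2 - D
        n, d = n * f, d * f   # N_{i+1} = N_i F_i, D_{i+1} = D_i F_i
    return n
```

Each pass performs one subtraction and two multiplications, and the two multiplications are independent of each other.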

With the first and the third equations in (2), we can get

$$
D_{i+1} - 1 = -(D - 1)^{2^{i+1}}. \tag{3}
$$
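The step from (2) to (3) can be made explicit. Substituting the first equation of (2) into the third gives

$$
D_{i+1} = D_i (2 - D_i) = 1 - (1 - D_i)^2
\quad\Longrightarrow\quad
D_{i+1} - 1 = -(D_i - 1)^2 .
$$

Starting from $D_0 = D$, this yields $D_1 - 1 = -(D-1)^2$, $D_2 - 1 = -(D-1)^4$, and so on: the exponent doubles at every iteration, which is the form stated in (3).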

With (3), we can get

$$
N_{i+1} = \frac{N}{D} D_{i+1} = \frac{N}{D}\left[1 - (D - 1)^{2^{i+1}}\right]. \tag{4}
$$


When the divisor is close to 1, the quotient is approximately equal to the dividend. After the Mth iteration, the quotient $Q_M$ can be expressed as

$$
Q_M \approx N_M = \frac{N}{D}\left[1 - (D - 1)^{2^M}\right]. \tag{5}
$$

Define the relative error as

$$
e_{i+1} = \operatorname{abs}\left(\frac{Q_{i+1} - N/D}{N/D}\right). \tag{6}
$$

Combining (5) and (6), the relative error of results after the Mth iteration is

$$
e_M = (D - 1)^{2^M}. \tag{7}
$$
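Equation (7) can be checked with exact rational arithmetic. The sketch below is illustrative only: `relative_error` is a hypothetical helper, and Python's `fractions.Fraction` is used so that no rounding error hides the identity.

```python
from fractions import Fraction


def relative_error(d, iterations):
    """Exact relative error of N_M against N/D for the iteration in (2)."""
    d = Fraction(d)
    d0 = d
    n = Fraction(1)           # the dividend cancels in (6), so N = 1 is enough
    for _ in range(iterations):
        f = 2 - d
        n, d = n * f, d * f
    true_quotient = 1 / d0
    return abs((n - true_quotient) / true_quotient)
```

For a divisor D and M iterations, the returned value equals $(D-1)^{2^M}$ exactly, in agreement with (7).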

In a floating-point system, N and D can be expressed in scientific notation as $1.F \times 2^G$, where F and G stand for the mantissa and the exponent respectively. The division result is obtained by subtracting the exponents and dividing the mantissas, so the normalized N and D both lie in [1, 2).
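The split into mantissa and exponent can be sketched with the standard `math.frexp` and `math.ldexp` functions. The helper names `normalize` and `divide_normalized` are hypothetical, only positive operands are handled, and the mantissa divider is passed in as a parameter so that any of the iterations above could be plugged in.

```python
import math


def normalize(x):
    """Return (m, g) with x == m * 2**g and m in [1, 2), for x > 0."""
    m, e = math.frexp(x)      # x == m * 2**e with m in [0.5, 1)
    return 2.0 * m, e - 1


def divide_normalized(n, d, mantissa_divide):
    """Divide positive n by positive d: subtract exponents, divide mantissas."""
    mn, gn = normalize(n)
    md, gd = normalize(d)
    return math.ldexp(mantissa_divide(mn, md), gn - gd)
```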

From (7), we can see that the improved GLD method still achieves quadratic convergence, but the relative error of the results no longer depends on an initial approximation. When the number of iterations M is fixed, the relative error increases with the divisor: the closer D is to 2, the larger the relative error. We therefore propose a divisor mapping method, which reduces the relative errors of the improved GLD method.

2.2 Mapping of divisors

Assume the precision requirement to be E. To meet the precision requirement, the divisors should satisfy $e_M \le E$. Then we can get

$$
D \le 10^{\lg E / 2^M} + 1 = p. \tag{8}
$$

Here, p is the largest divisor that meets the precision requirement, and p ∈ [1, 2]. From the above analysis, the precision requirement is met only when the divisor D ∈ [1, p]. The question is then how to accommodate divisors in the interval (p, 2). Our approach is to map divisors in (p, 2) onto [1, p].
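The bound in (8) is easy to evaluate numerically, since $10^{\lg E / 2^M}$ is just the $2^M$-th root of E. The sketch below assumes double-precision floats; the name `largest_divisor` is hypothetical.

```python
def largest_divisor(precision, iterations):
    """Largest divisor p that satisfies (D - 1)**(2**M) <= E, following (8)."""
    return 1.0 + precision ** (1.0 / 2 ** iterations)
```

A tighter precision requirement E or a smaller number of iterations M pushes p closer to 1, which shrinks the interval [1, p] of divisors that need no mapping.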

When $p \ge 1.5$, one mapping is enough: the divisors in (p, 2) can be mapped onto [1, p] by multiplying by a fixed factor. When D ∈ (p, 2), we define a mapping coefficient λ ∈ (0.5, 1) to get D′ = Dλ, N′ = Nλ, where N′ and D′ represent the dividend and the divisor after mapping respectively. Here, D′ ∈ [1, p]. Converting N/D into N′/D′ guarantees that the results for all divisors meet the precision requirement.

But when p < 1.5, it is hard to accomplish the process with just one mapping, so several mappings are needed. Let T be the number of mappings, that is, the interval (p, 2) is divided into T sections. The interval of the ith mapping is denoted $(\mathrm{Range}_{(i,\min)}, \mathrm{Range}_{(i,\max)}]$, where $i = 0, 1, \ldots, T-1$.

The relationship of all the intervals should satisfy

$$
\begin{cases}
\mathrm{Range}_{(0,\min)} = p,\\
\mathrm{Range}_{(i-1,\max)} = \mathrm{Range}_{(i,\min)},\\
\mathrm{Range}_{(T-1,\max)} = 2.
\end{cases}
\tag{9}
$$

Assume the coefficients of the mappings to be $C = \{c(0), c(1), \ldots, c(T-1)\}$. Then we can get

$$
\begin{cases}
c(i)\,\mathrm{Range}_{(i,\max)} = p,\\
c(i)\,\mathrm{Range}_{(i,\min)} = 1.
\end{cases}
\tag{10}
$$

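Combining (9) and (10), we have the following mapping coefficients:

$$
\begin{cases}
\mathrm{Range}_{(i,\max)} = \dfrac{2}{p^{T-1-i}},\\[2mm]
\mathrm{Range}_{(i,\min)} = \dfrac{2}{p^{T-i}},\\[2mm]
c(i) = \dfrac{p^{T-i}}{2}.
\end{cases}
\tag{11}
$$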


Figure 1 Relative errors before and after the mapping method. [Plot: relative errors (dB), from −350 to 0, against the divisor D from 1.0 to 2.0, with the curves "Before mapping" and "After mapping" and the level E marked.]

To make sure that all the divisors in (p, 2) can be mapped onto [1, p], we usually keep a moderate margin, i.e., the neighboring sections can overlap. Then we have

$$
\begin{cases}
c(i)\,\mathrm{Range}_{(i,\max)} \le p,\\
c(i)\,\mathrm{Range}_{(i,\min)} \ge 1.
\end{cases}
\tag{12}
$$

With (11) and (12), we can get

$$
\log_p 2 - 1 \le T \le \log_p 2. \tag{13}
$$

Considering that T is an integer, we have

$$
T = \mathrm{ceil}(\log_p 2 - 1), \tag{14}
$$

where ceil represents rounding up.
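Equations (11) and (14) together define a small table of sections and coefficients. The following sketch builds that table; the name `mapping_table` is hypothetical, sections are indexed from 0 to T − 1 as in (9) and (11), and floating-point logarithms are assumed to be accurate enough for the `ceil` in (14).

```python
import math


def mapping_table(p):
    """Return (T, rows), where rows[i] = (range_min, range_max, c) per (11)."""
    sections = math.ceil(math.log(2.0, p) - 1.0)      # T from (14)
    rows = []
    for i in range(sections):
        range_min = 2.0 / p ** (sections - i)
        range_max = 2.0 / p ** (sections - 1 - i)
        coefficient = p ** (sections - i) / 2.0
        rows.append((range_min, range_max, coefficient))
    return sections, rows
```

Because T is rounded up, the lowest section starts at or below p, so it overlaps [1, p]; this is the margin allowed by (12).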

Through the mapping method, all divisors can be mapped onto [1, p], so the results for all divisors meet the precision requirement. We choose 100 different numbers within the interval [1, 2] as divisors and simulate the improved GLD method using mapping of divisors with these numbers. The relative errors before and after the mapping method are shown in Figure 1.

Figure 1 shows that, without the mapping method, the relative errors increase with the divisor, and that the mapping method effectively reduces them. With the precision requirement E, the results for all divisors meet the requirement after the mapping method is applied.
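Putting the mapping and the iteration together gives the following sketch of the whole procedure. It is an illustration under assumptions, not the authors' implementation: `divide_with_mapping` is a hypothetical name, the section is found by a linear search over (11), and double-precision floats replace the hardware format.

```python
import math


def divide_with_mapping(n, d, precision, iterations):
    """Approximate n / d for d in [1, 2): map d into [1, p], then iterate (2)."""
    p = 1.0 + precision ** (1.0 / 2 ** iterations)        # bound from (8)
    if d > p:
        sections = math.ceil(math.log(2.0, p) - 1.0)      # T from (14)
        for i in range(sections):
            range_max = 2.0 / p ** (sections - 1 - i)
            if d <= range_max:
                c = p ** (sections - i) / 2.0             # c(i) from (11)
                n, d = n * c, d * c                       # N' = N c, D' = D c
                break
    for _ in range(iterations):
        f = 2.0 - d
        n, d = n * f, d * f
    return n
```

Scaling both operands by the same coefficient leaves the quotient unchanged, so only the convergence speed is affected.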

3 Error analysis

We use Matlab to compute the relative errors of the Divisor Mapping GLD method and the conventional GLD method. To compare them more clearly, we subtract the relative errors of the two methods. The results show that the difference between the relative errors of the two methods is smaller than $1 \times 10^{-14}$, so the Divisor Mapping GLD method achieves approximately the same precision as the conventional division method.

The performance of the Divisor Mapping GLD method is evaluated by determining the deviation from

the true quotient at each step. The relative error is used as the performance metric to evaluate the speed

of the convergence for the improved division method. Then we compare the performance of our method

with the conventional division method and the DFQC method proposed in [5].

Based on the previous analysis, all divisors can be transformed to the interval [1, p], namely $1 \le D \le p$. By (7) and (8), the maximum relative error of the improved method after the ith iteration is

$$
E_{\max\,\mathrm{improved}} = (p - 1)^{2^i} = \exp\left(\frac{\ln E}{2^{M-i}}\right). \tag{15}
$$


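Table 1 Comparison of maximum relative errors after the ith iteration

| Algorithms | i = 0 | i = 1 | i = 2 | i = 3 |
|---|---|---|---|---|
| Conventional method | $2^{-5.4}$ | $2^{-10.8}$ | $2^{-21.6}$ | $2^{-43.2}$ |
| DFQC method in [5] | $2^{-5.4}$ | $2^{-13.2}$ | $2^{-29.6}$ | $2^{-58.4}$ |
| Our method | $2^{-7.4}$ | $2^{-14.7}$ | $2^{-29.5}$ | $2^{-58.9}$ |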

High-precision floating-point arithmetic units play an important role in high-performance computation. For the IEEE-754 rounding modes, the final quotient error must be less than $2^{-55}$ when Q is in [0.5, 1) [5]. In high-performance processor design, which requires a low error rate, we set $E = 2^{-59}$ [9].

In [5], the maximum relative error of the conventional method after the ith iteration is

$$
E_{\max\,\mathrm{conventional}} = \varepsilon_0^{2^i}. \tag{16}
$$

Our method does not need the look-up table. The conventional method and the DFQC method in [5] use a 5-bit-in 5-bit-out reciprocal ROM table. The initial approximation error $\varepsilon_0 = 2^{-5.4}$ is calculated using the error of an optimized reciprocal table [5].

The errors of a three-iteration divider after the ith iteration for the three division methods are shown in Table 1. The relative errors of the conventional method and the DFQC method are taken from [5]. Although the convergence of the Divisor Mapping GLD method is still quadratic, its initial error is smaller than that of the other two division methods. From the above analysis, we can see that the Divisor Mapping GLD method has smaller relative errors than the conventional division method and the DFQC method in [5].
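The bounds (15) and (16) can be evaluated directly as base-2 exponents, which is the form used in Table 1. The helper names below are hypothetical. With $E = 2^{-59}$, $M = 3$ and $\varepsilon_0 = 2^{-5.4}$, the sketch gives exponents close to the "Our method" and "Conventional method" rows of Table 1; small differences in the last printed digit depend on how the table values were rounded.

```python
import math


def log2_error_improved(precision, iterations, i):
    """Base-2 exponent of the bound (15): log2(E) / 2**(M - i)."""
    return math.log2(precision) / 2 ** (iterations - i)


def log2_error_conventional(initial_error, i):
    """Base-2 exponent of the bound (16): 2**i * log2(eps_0)."""
    return math.log2(initial_error) * 2 ** i


if __name__ == "__main__":
    for i in range(4):
        print(i,
              log2_error_conventional(2.0 ** -5.4, i),
              log2_error_improved(2.0 ** -59, 3, i))
```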

4 Resource analysis

For the M-iteration Divisor Mapping GLD method, we can deduce from (2) that every iteration needs two multiplications and one addition (here we consider the resources of additions and subtractions to be equal). However, in a hardware implementation of a floating-point system, the resources of additions are negligible compared with those of multiplications, so we only need to consider the multiplications.

In the Divisor Mapping GLD method, we also need to consider the multiplications by the mapping coefficients. Since one mapping needs two multiplications, the mapping method needs 2T multiplications when the number of mappings is T. However, because the multiplications in the mapping use fixed factors, the hardware resources can be reduced by coding the fixed factors. Define β (β ∈ [0, 1]) as the ratio of the resources of a multiplication by a fixed factor to those of a regular multiplication.

After the Mth iteration, the required number of multiplications can be defined as

$$
R(M) = 2M + 2T\beta. \tag{17}
$$

Together with (8) and (14), we can get

$$
R(M) = 2M + 2\beta \cdot \mathrm{ceil}\left[\log_{\,10^{\lg E / 2^M} + 1} 2 - 1\right]. \tag{18}
$$
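The cost model in (17) and (18) can be evaluated as follows. The function name `multiplication_cost` is hypothetical, and the result is in units of one regular multiplication, as in the definition of β.

```python
import math


def multiplication_cost(iterations, precision, beta):
    """Equivalent number of regular multiplications R(M), following (18)."""
    p = 1.0 + precision ** (1.0 / 2 ** iterations)     # p from (8)
    sections = math.ceil(math.log(2.0, p) - 1.0)       # T from (14)
    return 2 * iterations + 2 * sections * beta        # R(M) from (17)
```

Increasing M raises the first term but lowers T, so the two terms of R(M) pull in opposite directions.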

When CSD coding is applied to the multiplication by fixed factors, we know from [10–12] that the average value of β is 0.3. A multiplication method with fixed factors using the least resources is proposed in [7], which can reduce the resources by 20%. Through this method, the average value of β is 0.2.
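To illustrate why a fixed factor is cheaper than a general multiplication, the sketch below encodes an integer constant in canonical signed digit (CSD) form, with digits in {−1, 0, 1} and no two adjacent nonzero digits, and then multiplies by it using only shifts, additions and subtractions. The names `to_csd` and `multiply_by_constant` are hypothetical, and the assumption that a mapping coefficient has been quantized to an integer (for example 0.75 with 8 fractional bits becomes 192) is made here for illustration only.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0 or 1), least significant first,
    of a non-negative integer."""
    digits = []
    while value != 0:
        if value & 1:
            digit = 2 - (value & 3)   # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def multiply_by_constant(x, digits):
    """Multiply integer x by the constant encoded in digits using only
    shifts, additions and subtractions."""
    total = 0
    for shift, digit in enumerate(digits):
        if digit:
            total += digit * (x << shift)
    return total
```

Each nonzero digit costs one adder or subtractor, so a constant with few nonzero CSD digits needs far fewer resources than a full multiplier array.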

We can see that the Divisor Mapping GLD method saves storage resources at the cost of more multiplications. However, the mapping of divisors is performed before the iteration process, so the multipliers used for the mapping can be reused in the iteration process and need no extra resources. Besides, a processing system usually contains many dividers, so the mapping module can be shared, which also saves a lot of hardware resources.


5 Conclusion

The Divisor Mapping GLD method presented in this paper does not need the initial approximation, and its relative error depends only on the value of the divisor and the number of iterations. To improve the precision of the iteration results, a mapping method is proposed, which helps reach the required precision within fewer iterations. Finally, the method applies multiplication by fixed factors using CSD coding to reduce hardware resources. Comparison with the conventional GLD method and the DFQC method in [5] shows that the Divisor Mapping GLD method can achieve higher precision without using storage resources.

References

1 Flynn M J. On division by functional iteration. IEEE Trans Comput, 1970, C-19: 702–706

2 Gallagher W L, Swartzlander E E. Fault-tolerant Newton-Raphson and Goldschmidt dividers using time shared TMR.

IEEE Trans Comput, 2000, 49: 588–595

3 Goldschmidt R E. Applications of Division by Convergence. MIT Press, 1964

4 Kong I, Swartzlander E E. A rounding method to reduce the required multiplier precision for Goldschmidt division.

IEEE Trans Comput, 2010, 59: 1703–1708

5 Kong I, Swartzlander E E. A Goldschmidt division method with faster than quadratic convergence. IEEE Trans Very Large Scale

Integr (VLSI) Syst, 2011, 19: 696–700

6 Ercegovac M D, Matula D W. Improving Goldschmidt division, square root, and square root reciprocal. IEEE Trans

Comput, 2000, 49: 759–763

7 Mahesh R, Vinod A P. A new common subexpression elimination algorithm for realizing low-complexity higher order

digital filters. IEEE Trans Comput-Aided Des Integr Circuits Syst, 2008, 27: 217–229

8 Piso D, Bruguera J D. Variable latency Goldschmidt algorithm based on a new rounding method and a remainder

estimate. IEEE Trans Comput, 2011, 60: 1535–1546

9 IEEE. IEEE Standard for Binary Floating-Point Arithmetic. IEEE Std754-2008. IEEE Computer Society, 2008

10 Xu F, Chang C H, Jong C C. Design of low-complexity FIR filters based on signed-powers-of-two coefficients with

reusable common subexpressions. IEEE Trans Comput-Aided Des Integr Circuits Syst, 2007, 26: 1898–1907

11 Chang C H, Faust M. On a new common subexpression elimination algorithm for realizing low-complexity higher order

digital filters. IEEE Trans Comput-Aided Des Integr Circuits Syst, 2010, 29: 844–848

12 Faust M, Gustafsson O, Chang C H. Fast and VLSI efficient binary-to-CSD encoder using bypass signal. Electron

Lett, 2011, 47: 18–20
