chuyên san công ngh» thông tin và truy•n thông - sŁ 7 (10-2015)...

16
Chuyên san Công ngh thông tin và Truyn thông - S 7 (10-2015) LOW AREA ASIC IMPLEMENTATION OF LOGARITHM APPROXIMATION FOR DIGITAL SIGNAL PROCESSING Hoang Van Phuc 1 , Sai Van Thuan 1 , Pham Cong Kha 2 Abstract This paper presents an efficient ASIC implementation of the hardware logarithm approx- imation which can be used for emerging digital signal processing applications. By employing the quasi-symmetrical logarithm approximation method, the modified barrel shifter circuit and the optimized leading one detector and encoder, the proposed approach can reduce the hardware area and improve the logarithm computation speed significantly while achieve the similar accuracy compared with other implementations. The ASIC implementation results using a 0.18-μm CMOS technology standard library are also presented and discussed. Bài báo trình bày vic hin thc hóa hiu qu trên ASIC cho xp x hàm logarithm ng dng trong các h thng x lý tín hiu s tiên tin. Bng vic áp dng phương pháp xp x cn đi xng, mch dch bit tùy bin và b phát hin bit ‘1’ trng s cao nht kt hp mã hóa đưc ti ưu, phương án thit k đ xut cho phép gim lưng tài nguyên phn cng cn s dng và nâng cao tc đ tính toán logarithm mt cách đáng k, trong khi vn đm bo đ chính xác tính toán tương đương vi các phương án thc thi khác. Các kt qu thc thi trên thư vin chun ASIC công ngh CMOS 0.18-μm cũng đưc trình bày và tho lun trong bài báo này. Index terms ASIC, DSP, logarithm approximation. 1. Introduction Currently, many digital signal processing (DSP) systems requires efficient logarithm hardware approximation methods. For example, with the increasing demand for high dynamic range (HDR) image rendering, processing and display devices and applications, the efficient HDR image processing hardware implementation is highly required for such real-time applications. In this kind of applications, the efficient hardware approximation method for the logarithm transformation is one of the key issues [1]-[3]. Figure 1 (1)Le Quy Don Technical University, Hanoi, Vietnam. (2)University of Electro-Communications, Tokyo, Japan. 42

Upload: others

Post on 10-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

LOW AREA ASIC IMPLEMENTATIONOF LOGARITHM APPROXIMATION

FOR DIGITAL SIGNAL PROCESSING

Hoang Van Phuc1, Sai Van Thuan1, Pham Cong Kha2

Abstract

This paper presents an efficient ASIC implementation of the hardware logarithm approx-imation which can be used for emerging digital signal processing applications. By employingthe quasi-symmetrical logarithm approximation method, the modified barrel shifter circuitand the optimized leading one detector and encoder, the proposed approach can reduce thehardware area and improve the logarithm computation speed significantly while achieve thesimilar accuracy compared with other implementations. The ASIC implementation resultsusing a 0.18-µm CMOS technology standard library are also presented and discussed.

Bài báo trình bày việc hiện thực hóa hiệu quả trên ASIC cho xấp xỉ hàm logarithm ứngdụng trong các hệ thống xử lý tín hiệu số tiên tiến. Bằng việc áp dụng phương pháp xấp xỉcận đối xứng, mạch dịch bit tùy biến và bộ phát hiện bit ‘1’ trọng số cao nhất kết hợp mãhóa được tối ưu, phương án thiết kế đề xuất cho phép giảm lượng tài nguyên phần cứng cầnsử dụng và nâng cao tốc độ tính toán logarithm một cách đáng kể, trong khi vẫn đảm bảo độchính xác tính toán tương đương với các phương án thực thi khác. Các kết quả thực thi trênthư viện chuẩn ASIC công nghệ CMOS 0.18-µm cũng được trình bày và thảo luận trong bàibáo này.

Index terms

ASIC, DSP, logarithm approximation.

1. Introduction

Currently, many digital signal processing (DSP) systems requires efficient logarithmhardware approximation methods. For example, with the increasing demand for highdynamic range (HDR) image rendering, processing and display devices and applications,the efficient HDR image processing hardware implementation is highly required for suchreal-time applications. In this kind of applications, the efficient hardware approximationmethod for the logarithm transformation is one of the key issues [1]-[3]. Figure 1

(1)Le Quy Don Technical University, Hanoi, Vietnam.(2)University of Electro-Communications, Tokyo, Japan.

42

Page 2: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

Buffer

Luminance

Calculator

HDR

RGB

Color Tone

Mapper

Binary

Logarithm

L

Mapped RGB

log2L

Fig. 1. An typical HDR image encoder [2].

presents the block diagram of a typical HDR image encoder using logarithm trans-formation [2] where L denotes the luminance component. Efficient logarithm functiongenerators are also highly required in HDR video compression [3], digital communi-cation systems and real time digital signal processing (DSP) applications [4]. Besides,speech processing systems (including feature extraction) use logarithm generators asessential blocks, as depicted in Fig. 2. Moreover, the logarithmic and anti-logarithmicconverters are essential components in logarithmic number system (LNS) and hybridnumber system (HNS) processors [5]. Figure 3 shows that logarithm conversion (LOGC)consumes 47% chip area for typical 3-D graphics applications. As a result, reducingthe hardware complexity of these components can lead to a great impact on the over-all system performance of these applications. Therefore, many researchers have beenfocused on efficient methods to implement the logarithm approximation in differenthardware platforms [4]-[15]. In [16], we have presented an efficient method calledquasi-symmetrical method for the logarithm approximation.

This paper presents an low area ASIC (Application Specific Integrated Circuit) imple-mentation of an efficient binary logarithm approximation which can be used for manyDSP applications as mentioned above. The rest of this paper is organized as follows.In section 2, we will introduce briefly about the hardware architecture for logarithmapproximation and the proposed method for its key components. Then, section 3 willpresent the implementation results in 0.18-µm CMOS technology. Finally, the conclusionwill be included in section 4.

2. Logarithm hardware approximation

Without lost of generality, an unsigned integer number N can always be decomposedas [4]:

N = 2n(1 + x) (1)

43

Page 3: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

Windowing FFTMel-scale

Filter Banks

Speech

signal

LOGCepstral

coefficientDCT

Fig. 2. Feature extraction hardware architecture for the speech recognition system.

17% (ALOGC)

11% (FXP)

1%(Test Vector)

24% (LNS)

47%(LOGC)

Fig. 3. Area breakdown of a 3-D graphics chip in [5] (LOGC: logarithmic conversion, ALOGC:anti-logarithmic conversion, LNS: logarithmic number system, FXP: Fixed point).

where n called the characteristics of N , corresponds to the position of the most signif-icant ‘1’ bit in the binary representation of N and x is the fraction part with: 0 ≤ x <

1. As a result, the binary logarithm can be expressed as:

log2N = n+ log2(1 + x) (2)

Therefore, the binary logarithm can be computed by detecting the most significant ‘1’bit and approximating log2(1 + x) called the fundamental function. Many researchershave been focused on finding the efficient methods to approximate log2(1 + x) sinceit is the essential step in logarithm approximation. Based on the above mathematical

44

Page 4: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

LODEN

INV

Conv.

Barrel Shifterlog2(1+x)

W k

k

l

k

l

n

F

z

x

Modified

Barrel Shifter

Fig. 4. Proposed hardware architecture for the logarithm function generator.

properties of the logarithm approximation, its hardware architecture can be developedas depicted in Fig. 4. The leading one detector and encoder (LODE) block computesthe characteristic n and encodes it into the binary form. The modified barrel shifterincluding an inverter block (INV) and a conventional barrel shifter are used to generatethe fraction part x. The fundamental function approximation block provides the fractionpart (F ) of logarithm result by approximating log2(1+ x). Moreover, a flag (z) is usedto indicate the case of zero input. In this figure, W denotes the input bit-width, k andl are bit-widths of the integer and fraction parts in the result, respectively.

2.1. Mitchell approximation

J. N. Mitchell proposed a method of using a very simple linear approximation method[6] as follows:

log2(1 + x) ≈ x (3)

The error function due to this approximation method as:

EL(x) = log2(1 + x)− x (4)

where EL(x), called Mitchell error, denotes the error functions for this approximation.With the maximum value of the error function is 0.08639, the accuracy is only 3.5bits which is too low for many DSP applications. Therefore, there have been manyresearches focusing on the more accurate approximation methods.

45

Page 5: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

2.2. Shift-add piece-wise linear approximation

In this method, the fraction range of [0, 1) is divided into number of regions. Witheach region, EL(x) is approximated by a linear function called a segment. Basically, alinear function can be expressed as:

y = Slope ∗ x+ Offset (5)

The complicated multiplication with Slope in Eq. (5) can be simplified by using theshift-add logic. Some shift-add piece-wise linear approximation methods with differentnumbers of segments and parameter values of the linear function are presented in [5],[7]-[12]. In [5], [7]-[11], some methods were proposed with the number of segmentsof 2, 4 and 6 in which the parameters of each linear segment are selected by the“trial and error" method. B.-G. Nam et al. [5] presented a method of dividing theinput range into 24 regions for the logarithmic and 16 regions for the anti-logarithmicapproximation. In [12], the authors proposed an optimizing method for the integer binarynumber to logarithmic conversion. However, these methods should be improved forhigher accuracy applications. Increasing the number of segments [5], [12] or usinghigher order approximation [13] can improve the accuracy, but also leads to higherhardware complexity.

2.3. Table-based approximation methods

The multipartite table method (MTM) to approximate the elementary functions in-cluding logarithm was presented in [14] in which only tables and adders are used. Thisapproach can reduce the table size considerably compared with the direct look-up table(LUT)-based method in which only an LUT is used. S. Paul et al. [15] presented a table-based method by combining an LUT and a multiplier-less linear interpolation methodso that the table size can be reduced compared with some other table-based methodswith the same accuracy.

2.4. Truncated logarithmic approximation

M.B. Sullivan et al. [17] proposed the truncated logarithmic approximation methodwhich can improve the resource usage efficiency. However, its accuracy is very low dueto the simplicity of Mitchell method which the paper [17] is based on.

2.5. Combining LUT-based correction and difference method

The idea behind this method is that an LUT is used to store the difference function asdepicted in Fig. 5 so that higher accuracy approximation can be achieved. The difference

46

Page 6: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

EL(x) - D(x)

(LUT)

D(x)

+EL(x)

x

Fig. 5. Combined linear difference/LUT-based correction method.

function is defined by the difference value between EL(x) and the approximated functionD(x). R. Gutierrez et al. [4] proposed an improved method by using the 4-segment linearapproximation combined with a small error LUT. It is reported in [4] that this methodoutperforms previous methods for logarithm approximation in both area efficiency andperformance. However, the more improvements are desired and the selection of thelinear function parameters and coefficients which is performed by the “trial and error"method may lead to a non-optimal architecture.

In [18] and [19], the authors proposed two-stage logarithmic converter method whichis optimized for FPGA implementation only. However, the optimization method forASIC implementation is highly required.

Therefore, we have proposed an optimization algorithm to find the optimal parame-ters for the approximation. In the next section, we will present the quasi-symmetricalapproximation method for logarithm function [16].

2.6. Quasi-symmetrical logarithm approximation method

This section presents our approximation method which combines the 2-step parameteroptimization algorithm with the LUT-based correction [16] so that it is better than the“trial and error" method [4] which is often used in many previous works. As shownin Eq. (4), to approximate the fundamental function log2(1 + x), we can approximateEL(x) and use a simple addition. Consider the curves of the original EL(x), its mirrorfunction EL(1−x) and the mean function EM(x) as depicted in Fig. 6. Based on thesedefinitions, their relationship can be derived as follows:

EM(x) = 0.5× [(EL(x) + EL(1− x)] = EM(1− x) (6)

EM(x) is symmetrical via the line of x = 0.5, as shown in Eq. (4), and the maximum

47

Page 7: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1−0.01

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

x

EL

EL(1−x)

Mean curveDifference

Fig. 6. The idea for the quasi-symmetrical approach.

value of the function of (EM(x) − EL(x)) is 0.0075 which is much smaller than themaximum value of EL(x) (0.08639). As a result, it will be promisingly efficient if wecan approximate the mean function because it requires only the approximation for ahalf range of x and the other half can be interpolated by a simple complement circuitusing the maximum significant bit (MSB) as the control bit.

Since the difference method uses an approximated function and an error correctionLUT, the parameter optimization algorithm has to take into account both the complexityof approximated function and the LUT size.

The 2-step optimization algorithm is proposed to find the optimal parameters fortwo segments that minimize the LUT size (by minimizing the maximum value of thedifference function) while achieve the low hardware complexity. Firstly, the entire range(0 ≤ x < 1) is divided into two halves. Then, the left half range (0 ≤ x ≤ 0.5) is furtherdivided into two equal intervals of [0; 0.25) and [0.25; 0.5] to make the selecting circuitsimple. In each interval, EL(x) is approximated by a linear function. Table 1 presentsthe 2-step optimization algorithm to find the optimal parameters of the linear segmentsin which Peak_point denotes the value of approximate function at the point x = 0.5and the difference function is defined by the difference between EL(x) and the linearfunction. Figure 7 depicts the optimization algorithm in which two linear segments arechosen independently. In step 1, a full search in the restricted ranges of Peak_point and

48

Page 8: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

Table 1. The 2-step optimization algorithm.

Step 1: For {Offset1L ≤ Offset1 ≤ Offset1H andPeak_pointL ≤ Peak_point ≤ Peak_pointH } :

Find the optimal values of Slope1 and Slope2.Step 2: Reassign the optimal Slope1 and Slope2 values in step 1 to

the adjacent power-by-2 values and find the optimal offset values.

Fig. 7. The 2-step optimization algorithm for the logarithmic conversion.

Offset1 is performed to find the optimal values of Slope1 and Slope2 that minimizethe maximum value (MaxDiff ) of the difference function. Then, in step 2, Slope1 andSlope2 are reassigned to the adjacent power-by-2 values and another search is performedto find the optimal offset values which minimize MaxDiff. The power-by-2 slope valuesare used so that the multiplications can be performed by the one-shift operations. Theranges of Peak_point and Offset1 are chosen to guarantee an acceptable level of theapproximation accuracy.

Table 2 summarizes the optimization results of the 2-step optimization algorithm forapproximating the fundamental function log2(1 + x). It can be seen that after step 2,MaxDiff increases slightly but the LUT size remains the same as the result of step 1.Two error factors including the mean error and the maximum error are used to evaluatethe approximation accuracy. The error analysis results are presented in Table 3 for the

49

Page 9: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

Table 2. Optimal parameter results of the 2-step optimization algorithm.

Step Slope1 Offset1 Slope2 Offset2 MaxDiff

Step 1 0.2332 0.008 0.728 0.0341 0.0089 (1/112)Step 2 0.25 0.004 0.0625 0.0518 0.0101 (1/99)

Table 3. Optimal parameters and error analysis of our quasi-symmetrical method and the method in [4].

Method In [4] Quasi-symmetrical methodNo. of segments 4 2 (4)

MaxDiff 0.0080 (1/125) 0.0101 (1/99)LUT size (bit) 640 640

Mean error 2.3 ×10−4 2.3 ×10−4

Maximum error 8.0 ×10−4 8.0 ×10−4

case of W = 13. It can be seen that the quasi-symmetrical method can achieve thesimilar accuracy and LUT size compared with the method in [4]. However, the quasi-symmetrical method can reduce the hardware complexity significantly because EL(x)

is approximated by four linear segments, but only two segments are computed actually.Figure 8 depicts the 2-segment quasi-symmetrical method after optimization and Fig. 9shows the hardware architecture to compute log2(1 + x). In Fig. 9, two shifters (2-bitand 4-bit left shift) are used to perform the slope multiplication. The MSB is usedto select the appropriate segment. An error correction LUT is added to improve theapproximation accuracy. The Sign Ext. (sign extension) block is used to provide 13-bitvalues from this LUT. Finally, the approximation result is computed by a 13-bit adder.

2.7. Decomposed leading one detector-encoder and modified barrel shifter

As shown in Fig. 4, LODE is used to generate the integer part (n) of the result andprovide the control signals for the barrel shifter. In this section, we present an efficientLODE circuit based on the merged LODE cell which can reduce both hardware areaand computation delay. Merged LODE means that the leading one detector and one-hotencoder are merged into a module in which the binary coded output (for the position ofthe leading one bit) is computed directly from the input word. Figure 10 presents theblock symbol of 4-bit merged LODE and its truth table in which a1a0 denotes the 2-bitresult with a 4-bit input (d3d2d1d0). A zero flag (z) is used to indicate the case of zeroinput. This method can also be applied directly for 8-bit and higher bit-width LODEcircuits. However, it may lead to the high hardware complexity and power consumption.Therefore, to get the trade-off between speed and hardware complexity for higher bit-

50

Page 10: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

x

EL

Appro. func.Difference

Fig. 8. The quasi-symmetrical linear method for logarithm approximation (after parameteroptimization). Appro. func. is the approximation function.

+log2(1+x)

x

MUX

LUT Sign Ext.

>> 2

>> 4

13

1357 MSB

1313

13

13

1 MSB

Fig. 9. The hardware architecture to approximate log2(1 + x) with W=13.

width designs, we employ the method of decomposing the input binary word into smallerparts which are processed in parallel. Some additional combinational circuits are alsoneeded to generate the final result as shown in Fig. 11 for the case of 16-bit LODE. Theinput word is decomposed into four parts which are fed to four 4-bit LODEs. Then, two4-input multiplexers (MUX4) are used to select the proper 4-bit input part and constantvalue and a 4-bit adder provides the final result. The architectures for 32-bit and 64-bitLODE circuits are the same as that of 16-bit LODE with the primitive components of8-bit and 16-bit LODEs, respectively, instead of 4-bit LODE.

Moreover, the modified barrel shifter (BS) architecture is used to reduce the hardware

51

Page 11: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

LODE4

d3

d2

d1

d0

a1

a0

z

Fig. 10. The 4-bit LODE and its truth table.

complexity by minimizing the number of control bits. In a conventional BS, the numberof shifts is (W + 1 − m) where m is the output of LODE. In our modified BS, theoutput of INV (inverter) block which is (W −m) is used to control the conventionalbarrel shifter. Then, an additional shift is performed to generate the fraction part (x).Since the number of shifts is smaller, the BS complexity can be reduced as well.

3. ASIC implementation results

The fundamental function log2(1+x), LODE circuit and the 16-bit logarithm genera-tor are implemented with a 0.18-µm CMOS standard cell library using Synopsys DesignCompiler and IC Compiler tools. For the implementation comparison, each circuit isimplemented with different methods using the same design constraints. Moreover, tocompare the overall converter performance, the results of area-delay product (ADP)are also presented. For the first prototype of our proposed method, we implemented alogarithm generator with W = 16 and l = 13 to support the 16-bit fixed-point formatincluding 13-bit fraction. In [4], a comparison for the fundamental function implementa-tion shows that the method proposed in [4] can reduce the hardware complexity by about43% compared with the method presented in [15] while achieve the similar conversionspeed and accuracy. Therefore, the implementation results of our method are comparedwith the method in [4] with the same hardware platforms to clarify the improvementsof the proposed architecture and parameter optimization method. As presented in Table4, the quasi-symmetrical method can reduce the ADP value of 44% compared withthe method in [4] for the hardware approximation of log2(1 + x). Table 5 presents theimplementation results of different LODE architectures in which the conventional LODEindicates the architecture with separate leading one detector and one-hot encoder circuitsand direct merged LODE indicates the merged LODE without input decomposition. Itis shown that the input decomposed LODE architecture outperforms other methods in

52

Page 12: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

Fig. 11. Input decomposed 16-bit LODE.

Table 4. Fundamental function implementation results in 0.18-µm CMOS technology with W = 13.

MethodCell Area Delay ADP

(×103 µm2) (ns) (×103)

Direct LUT-based 37.2 6.3 234.4MTM-based 7.5 9.6 72.0

Method in [4] 5.2 8.7 45.2Quasi-symmetrical method 4.2 6.0 25.2

both hardware area and delay. The ADP of the input decomposed LODE is 84% and14% lower compared with conventional and direct merged LODEs, respectively.

The implementation results for the logarithm generator is presented in Table 6. It canbe seen that our logarithm generator can achieve an ADP reduction of 37% comparedwith the one in [4]. Figure 12 is the chip microphotograph of the fabricated 16-bit

53

Page 13: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

Table 5. Implementation of different 16-bit LODE architectures using 0.18-µm CMOS technology.

ArchitectureCell Area Delay ADP

(µm2) (ns) (×103)

Conventional LODE 900 5.8 5.22Direct merged LODE 506 1.9 0.96

Input decomposed LODE 490 1.7 0.83

Fig. 12. Chip microphotograph of the proposed 16-bit logarithm generator.

logarithm generator and Table 7 shows the chip parameters in which the chip dimensionand power consumption are accounted only for the area of the logarithm generator. Thechip measurement also confirms that the design logarithm generator has performed itsfunction correctly.

4. Conclusion

In this paper, we have presented an efficient ASIC implementation for the low com-plexity logarithm hardware approximation which is highly required for HDR image/videoprocessing applications, speech feature extraction, HNS DSP processors and many otherDSP applications. The presented ASIC logarithm generator implementation combines

54

Page 14: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

Table 6. Implementation results in 0.18-µm CMOS technology of a 16-bit logarithm generatorusing different methods.

MethodCell Area Delay ADP

(×103 µm2) (ns) (×103)Direct LUT-based 41.8 14.2 593.6

MTM-based 20.5 13.6 278.8Method in [4] 9.4 10.3 96.8

Quasi-symmetrical method 7.6 8.0 60.8

Table 7. Chip parameters of the 16-bit logarithm generator.

Technology 0.18-µm CMOSSupply voltage (V DD) 1.8 V

Chip dimension 116 × 116 µmAverage power consumption 1.680 mW @ 50 MHz

the efficient quasi-symmetrical logarithm approximation method, the modified barrelshifter circuit and the decomposed leading one detector and encoder techniques. Theimplementation results in 0.18-µm CMOS technology have clarified the improvementof the proposed logarithm generator. In the future work, we will consider to implementthe higher bit-width logarithm generators for emerging applications such as a full HDRimage processing system.

Acknowledgment

This chip presented in this paper was fabricated in the chip fabrication program ofVLSI Design and Education Center (VDEC), The University of Tokyo in collaborationwith ROHM CO. LTD.

References[1] S. P. R. Xu and C. E. Hughes, “High-dynamic-range still image encoding in JPEG 2000,” IEEE Computer

Graphics and Applications, vol. 25, issue 6, pp. 57–64, Nov. 2005.[2] F. Hassan and Joan Carletta, “A high throughput encoder for high dynamic range images,” in Proc. IEEE

International Conference on Image Processing (ICIP 2007), vol.6, pp.VI-213 - VI-216, Sep. 2007.[3] A. Motra and H. Thoma, “An adaptive Logluv transform for high dynamic range video compression,” in Proc.

17th IEEE International Conference on Image Processing (ICIP), pp.2061-2064, Sep. 2010.[4] R. Gutierrez and J. Valls, “Low cost hardware implementation of logarithm approximation,” IEEE Trans. Very

Large Scale Integr. (VLSI) Syst., vol. 19, no. 12, pp. 2326-2330, Dec. 2011.

55

Page 15: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Chuyên san Công nghệ thông tin và Truyền thông - Số 7 (10-2015)

[5] B.-G. Nam, H. Kim and H.-J. Yoo: “A low-power unified arithmetic unit for programmable handheld 3-Dgraphics systems,” IEEE J. Solid-State Circuits, vol. 42, no. 8, pp. 1767-1778, Aug. 2007.

[6] J. N. Mitchell, “Computer multiplication and division using binary logarithms,” IEEE Trans. Electron. Comput.,vol. 11, no. 11, pp. 512-517, Aug. 1962.

[7] M. Combet, H. Van Zonneveld, and L. Verbeek, “Computation of the base two logarithm of binary numbers,”IEEE Trans. Electron Comput., vol. EC-14, no. 6, pp. 863-867, Dec. 1965.

[8] E. L. Hall, D. D. Lynch, and S. J. Dwyer, “Generation of products and quotients using approximate binarylogarithms for digital filtering applications,” IRE Trans. Comput., vol. 19, pp. 97-105, Feb. 1970.

[9] S. Sangregory, C. Brothers, D. Gallagher, and R. Siferd, “A fast, low-power logarithm approximation with CMOSVLSI implementation,” in Proc. 42nd Midwest Symp. Circuits Syst., pp. 388-391, Aug. 1999.

[10] K. H. Aded and R. E. Siferd, “CMOS VLSI implementation of low power logarithmic converter,” IEEE Trans.Comput., vol. 52, no. 9, pp. 1221-1228, Nov. 2003.

[11] Tso-Bing Juang, Sheng-Hung Chen, and Huang-Jia Cheng, “A lower error and ROM-free logarithmic converterfor digital signal processing applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 12, pp. 931-935,Dec. 2009.

[12] D. De Caro, N. Petra, and A. G. M. Strollo, “Efficient logarithmic converters for digital signal processingapplications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 10, pp. 667-671, Oct. 2011.

[13] S. Nasayama, T. Sasao, and J. T. Butler, “Programmable numerical function generators based on quadraticapproximation: architecture and synthesis method,” in Proc. Asia and South Pacific Conference on DesignAutomation, pp. 378-383, Jan. 2006.

[14] Florent de Dinechin, and Arnaud Tisserand, “Multipartite table methods,” IEEE Trans. Comput., vol. 54, no.3, pp. 319-330, Mar. 2005.

[15] S. Paul, N. Jayakumar, and S. P. Khatri, “A fast hardware approach for approximate, efficient logarithm andantilogarithm computations,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 269-277,Feb. 2009.

[16] Van-Phuc Hoang and Cong-Kha Pham, “Novel Quasi-Symmetrical Approach for Efficient Logarithmic andAnti-logarithmic Converters,” in Proc. VDE-IEEE 8th Conference on Ph.D. Research in Microelectronics &Electronics (PRIME2012), pp. 111-114, Jun. 2012.

[17] M. B. Sullivan and E. E. Swartzlander, “Truncated Logarithmic Approximation,” in Proc. 2013 21st IEEESymposium on Computer Arithmetic (ARITH), pp. 191-198, Apr. 2013.

[18] M. Chaudhary and P. Lee, “Two-stage logarithmic converter with reduced memory requirements,” IETComputers & Digital Techniques, vol.8, no. 1, pp. 23-29, Jan. 2014.

[19] M. Chaudhary and P. Lee, “An Improved Two-Step Binary Logarithmic Converter for FPGAs,” IEEETransactions on Circuits and Systems II: Express Briefs, vol. 62, no. 5, pp. 476-480, May 2015.

Manuscript received 01-12-2014; accepted 24-11-2015.�

Hoang Van Phuc was born in Hung Yen, Vietnam. He received the B.E. (Bachelor ofEngineering) degree in Electronics-Telecommunications, the M.E. (Master of Engineering)degree in Electronic Engineering both from Le Quy Don Technical University, Hanoi, Vietnam,in 2006 and 2008, respectively and the PhD degree in Electronic Engineering from theUniversity of Electro-Communications, Tokyo, Japan in September 2012. He is working asa lecturer, Head of Microelectronic and Microprocessing Laboratory at Faculty of Radio-Electronics, Le Quy Don Technical University, Hanoi, Vietnam. His research interests includedigital circuits and systems, reconfigurable hardware systems and VLSI architecture for digitalsignal processing.

56

Page 16: Chuyên san Công ngh» thông tin và Truy•n thông - SŁ 7 (10-2015) …fit.mta.edu.vn/chuyensan/Pdf_Files/102015/07_04.pdf · Chuyên san Công ngh» thông tin và Truy•n

Tạp chí Khoa học và Kỹ thuật - Học viện KTQS - Số 172 (10-2015)

Sai Van Thuan received the B.E. degree in Electronics-Telecommunications and the M.E.degree in Electronic Engineering both from Le Quy Don Technical University, Hanoi, Vietnam,in 2004 and 2010, respectively. He is working towards the PhD degree at Faculty of Radio-Electronics, Le Quy Don Technical University, Hanoi, Vietnam. His research interests includearithmetic hardware design and VLSI design for digital signal processing.

Pham Cong Kha received the B.S., M.S. and Ph.D. degrees all from the Sophia University,Tokyo, Japan, in Electronics Engineering. From April 1992 to March 1996, he was with theDepartment of Electrical and Electronics Engineering, Sophia University, Tokyo, Japan, wherehe was a researcher assistant. From April 1996 to March 2000, he was with the Departmentof Information Systems at the Tokyo University of Information Science, Chiba, Japan, wherehe was an Assistant Professor. Since April 2000, he has been an Associate Professor of theDepartment of Electronic Engineering at the University of Electro-Communications, Tokyo,Japan. His research interests include design of analog and digital systems using integratedcircuits.

57