A NOVEL COMPRESSION APPROACH FOR DUAL
QUALITY 32 BIT DADDA MULTIPLIER
Swayamvarapu Rajesh Kumar1
M-Tech scholar
Department of Electronics and Communication Engineering
Visakha Institute of Engineering and Technology, Vishakhapatnam, Andhra Pradesh, India.
Bighneswar Panda2
Associate Professor
Department of Electronics and Communication Engineering
Visakha Institute of Engineering and Technology, Vishakhapatnam, Andhra Pradesh, India.
Dr.Murali Krishna Gurram3
Senior Research Fellow
Department of Geospatial Analytics
National University, Singapore
J.Harini Nayana4
Assistant Professor
Department of Electronics and Communication Engineering
Visakha Institute of Engineering and Technology, Vishakhapatnam, Andhra Pradesh, India.
Abstract
In this paper, we propose four 4:2 compressors that have the flexibility to switch between exact and
approximate modes of operation. In approximate mode, these dual-quality compressors provide higher speed
and lower power consumption at the cost of lower accuracy. Each of these compressors has
its own level of accuracy in approximate mode, as well as different delays and power dissipations in the approximate and exact modes.
The use of these compressors in parallel multiplier structures gives configurable multipliers whose precision
can be switched during runtime. The efficiencies of these compressors in a 32-bit Dadda multiplier are evaluated in a 45-nm standard CMOS
technology by comparing their parameters with those of state-of-the-art approximate multipliers. The results of the
analysis show an average reduction in delay and power consumption of 46 per cent and 68 per cent, respectively. The efficacy of these
compressors is also evaluated in certain image processing applications. Compared with approximate multipliers that are not
based on compressors, the errors of the proposed multipliers were higher while the
design parameters were considerably better. Finally, our studies showed that the multipliers realized based on
the suggested compressors have, on average, about 93% smaller figure of merit (FOM) value compared with the considered
approximate multipliers.
Keywords: Approximate computing, 4:2 compressor, accuracy, configurable, delay, power consumption.
Science, Technology and Development
Volume IX Issue VIII AUGUST 2020
ISSN : 0950-0707
Page No : 65
1. Introduction
In conventional digital VLSI design, one usually assumes that a usable circuit or system should
always provide definite and precise results. In practice, however, such exact operation is seldom needed in our
everyday, non-digital experience. The world accepts “analog computation,” which generates “good
enough” results rather than totally accurate ones. The data processed by many digital systems may
already contain errors. In many applications, such as a communication system, the analog signal
coming from the outside world must first be sampled before being converted to digital data. The digital
data are then processed and transmitted over a noisy channel before being converted back to an analog signal.
During this process, errors may occur anywhere. Furthermore, due to advances in transistor size
scaling, factors such as noise and process variations, which were previously insignificant, are becoming
important in today’s digital IC design. Of course, not all digital systems can adopt the error-tolerant
concept. In digital systems such as control systems, the correctness of the output signal is extremely important,
and this rules out the use of error-tolerant circuits. However, for many digital signal processing (DSP)
systems that process signals relating to human senses such as hearing, sight, smell, and touch, e.g.,
image processing and speech processing systems, error-tolerant circuits may be applicable.
While there are many works on designing approximate multipliers, research efforts on accuracy-configurable
approximate multipliers are limited. In [10], a static segment method (SSM) is presented,
which performs the multiplication operation on an m-bit segment starting from the leading 1 bit of the
input operands, where m is equal to or greater than n/2. Hence, an m × m multiplier is used, which consumes much
less energy than an n × n multiplier. Also, a dynamic range unbiased multiplier (DRUM),
which selects an m-bit segment starting from the leading 1 bit of the input operands and sets the least
significant bit of the truncated values to “1,” has been proposed in [11]. In this structure, the truncated
values are multiplied and shifted to the left to generate the final output. Although, by exploiting
smaller values for m, the structure of [11] provides higher accuracy designs than those of [10], its
approach requires extra complex circuitry.
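The segment-and-shift idea described for [11] can be illustrated with a short Python sketch. This is a minimal behavioural model under stated assumptions, not the circuit of [11]: the function name drum_multiply, the segment width k, and the choice to pass operands that already fit in k bits through unchanged are illustrative assumptions.

```python
def drum_multiply(a: int, b: int, k: int = 6) -> int:
    """Approximate unsigned product from k-bit segments (illustrative sketch).

    Assumption: operands that already fit in k bits are used unchanged.
    """
    def segment(x: int):
        n = x.bit_length()          # position of the leading 1, plus one
        if n <= k:
            return x, 0             # small operand: no truncation, no shift
        shift = n - k               # number of low bits dropped
        seg = (x >> shift) | 1      # keep the top k bits, force the LSB to 1
        return seg, shift

    seg_a, shift_a = segment(a)
    seg_b, shift_b = segment(b)
    # Multiply the short segments, then shift left to restore the magnitude.
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    exact = 1000 * 1000
    approx = drum_multiply(1000, 1000, k=6)
    print(exact, approx, abs(approx - exact) / exact)
```

With k = 6, the operand 1000 is reduced to the segment 63 with a shift of 4, so the sketch returns 63 × 63 × 2^8 = 1016064 for an exact product of 1000000, about 1.6% above the exact value.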
1.1. Exact 4:2 Compressor
To reduce the delay of the partial product summation stage of parallel multipliers, 4:2 and 5:2
compressors are widely employed [18]. Some compressor structures, which have been optimized for
one or more design parameters (e.g., delay, area, or power consumption), have been proposed. The
focus of this project is on approximate 4:2 compressors. First, some background on the exact 4:2
compressor is presented.
Fig. 1 Block diagram of 4:2 compressor.
This type of compressor, shown schematically in Fig. 1, has four inputs (x1–x4) along with an
input carry (Cin), and two outputs (sum and carry) along with an output Cout. The internal structure of
an exact 4:2 compressor is composed of two serially connected full adders, as shown in Fig. 2. In this
structure, the weights of all the inputs and the sum output are the same, whereas the weights of the carry
and Cout outputs are one binary bit position higher. The outputs sum, carry, and Cout are obtained from
the two full adders: the first adds x1, x2, and x3 and produces Cout, and the second adds the resulting
sum bit, x4, and Cin to produce sum and carry.
Fig. 2 Structure of the conventional 4:2 compressor.
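The two-full-adder structure of the exact 4:2 compressor can be modelled in a few lines of Python. The sketch below is a behavioural model only; the function names are illustrative, and the assignment of x1, x2, x3 to the first full adder follows the conventional structure rather than anything specific to this paper. It checks the defining property that the number of 1s among the five inputs equals sum + 2(carry + Cout).

```python
from itertools import product


def full_adder(a: int, b: int, c: int):
    """Return (sum, carry) of three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def exact_compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """Exact 4:2 compressor built from two serially connected full adders."""
    s1, cout = full_adder(x1, x2, x3)      # first stage produces Cout
    total, carry = full_adder(s1, x4, cin)  # second stage produces sum, carry
    return total, carry, cout


def check_exact() -> bool:
    """Verify x1+x2+x3+x4+Cin == sum + 2*(carry + Cout) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        total, carry, cout = exact_compressor_4_2(*bits)
        if sum(bits) != total + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(check_exact())
```

Because Cout depends only on x1, x2, and x3, a chain of such compressors does not ripple a carry along the row, which is why the structure is attractive for partial product reduction.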
II Literature Survey
The work in [1] describes a novel multiplier architecture with tunable error
characteristics that leverages a modified inaccurate 2×2 building block. The inaccurate multipliers
achieve an average power saving of 31.78% - 45.4% over corresponding accurate multiplier designs,
for an average error of 1.39% - 3.32%. Using image filtering and JPEG compression as sample
applications, the authors show that the architecture can achieve 2X - 8X better signal-to-noise ratio (SNR) for
the same power savings when compared to recent voltage over-scaling based power-error tradeoff
methods. The multiplier power savings are projected to bigger designs, highlighting the fact that the benefits are
strongly design-dependent. An enhancement of the design that allows correct operation of the
multiplier using a residual adder, for non-error-resilient applications, is also presented.
2.1 Multiplier structures for low power applications
The work in [2] explores energy-efficient serial and parallel multiplier structures to
assess their suitability in the low and ultra-low power design regimes. 16 × 16-bit serial and
state-of-the-art parallel multipliers are compared in 45 nm CMOS. A multiplier structure is proposed by
optimizing the architecture, gate sizes, and the supply voltage. The proposed structure provides 15%
more throughput compared to a two-cycle parallel multiplier with the same energy consumption for
high-speed applications. In the low-speed design region, it provides a 3.7X energy reduction compared to
the serial multiplier.
2.2 Voltage scalable high-speed robust hybrid arithmetic units using adaptive clocking
The work in [3] explores various arithmetic units for possible use in high-speed, high-yield
ALUs operated at scaled supply voltage with adaptive clock stretching. It demonstrates that logic optimization of
the existing arithmetic units (to create hybrid units) makes them further amenable to supply
voltage scaling. Such hybrid units result from mixing the right amount of fast arithmetic
into the slower ones. Simulations on different hybrid adders and multipliers in BPTM 70 nm
technology show 18%-50% improvements in power compared to standard adders with only a 2%-8%
increase in die area at iso-yield. These optimized data path units can be used to construct voltage
scalable robust ALUs that can operate at high clock frequency with minimal performance degradation
due to occasional clock stretching.
2.3 A reconfigurable approximate carry look-ahead adder
The work in [4] implements a fast yet energy-efficient reconfigurable approximate carry
look-ahead adder (RAP-CLA). This adder has the ability of switching between the
approximate and exact operating modes making it suitable for both error-resilient and exact
applications. The structure, which is more area and power efficient than state-of-the-art reconfigurable
approximate adders, is achieved by some modifications to the conventional carry look ahead adder
(CLA). The efficacy of the proposed RAP-CLA adder is evaluated by comparing its characteristics to
those of two state-of-the-art reconfigurable approximate adders as well as the conventional (exact) CLA
in a 15 nm FinFET technology. The results reveal that, in the approximate operating mode, the proposed
32-bit adder provides up to 55% and 28% delay and power reductions compared to those of the exact
CLA, respectively, at the cost of up to 35.16% error rate. It also provides up to 49% and 19% lower
delay and power consumption, respectively, compared to the other approximate adders considered in that
brief. Finally, the effectiveness of the proposed adder on two image processing applications of
smoothing and sharpening is demonstrated.
2.4 Approximate data types for safe and general low-power computation
Energy is increasingly a first-order concern in computer
systems. Exploiting energy-accuracy trade-offs is an attractive choice in applications that can tolerate
inaccuracies. Recent work has explored exposing this trade-off in programming models. A key
challenge, though, is how to isolate parts of the program that must be precise from those that can be
approximated so that a program functions correctly even as quality of service degrades. The work in [5]
proposes using type qualifiers to declare data that may be subject to approximate computation. Using these
types, the system automatically maps approximate variables to low-power storage, uses low-power
operations, and even applies more energy-efficient algorithms provided by the programmer. In addition,
the system can statically guarantee isolation of the precise program component from the approximate
component. This allows a programmer to control explicitly how information flows from approximate
data to precise data. Importantly, employing static analysis eliminates the need for dynamic checks,
further improving energy savings. As a proof of concept, EnerJ, an extension to Java that adds
approximate data types, is developed, together with a hardware architecture that offers explicit
approximate storage and computation.
2.5 Reducing area complexity multiplier reduction
Multipliers are a basic unit in signal processing and many other applications. They
play a vital role in every technology advancement, where the targets are low power consumption, increase in
speed, reduction in area, etc. The number of computations performed by modern computers, including
microcomputers and microprocessors, is astronomical. Even with high-speed computer chips,
processing the data coming from devices all over the world requires efficient algorithms, and to
achieve compatibility the chip area must be used effectively. The most often encountered
computation in data processing or signal processing is multiplication. This architecture
presents a novel solution to reduce the total area of the multiplier by modifying the partial-product
addition. Generally, to compute data at high speed, modern hardware uses
the Wallace tree or Dadda multiplication techniques. By reducing the number of partial-product
additions, the number of gates used to obtain the final result can be reduced. The proposed method
reduces the chip real estate by using a larger number of full adders in the earlier stages of the
partial-product addition, which is not done in conventional multipliers.
III EXISTING SYSTEM
To reduce the time taken by the partial-product addition stages of multipliers, and to provide high speed and lower power consumption with minimized area, compressors are
used instead of regular adders. Depending on their size, compressors can reduce several inputs at
a time, resulting in improved speed, reduced delay, minimized area on chip, and lowered power
consumption. The basic adders, the half adder and the full adder, can reduce only two and three inputs at a
time, and are therefore referred to as the 2:2 compressor and the 3:2 compressor, respectively.
Fig. 3 Half adder, also termed a 2:2 compressor.
Fig. 4 Full adder, also termed a 3:2 compressor.
IV. Proposed Dual-Quality 4:2 Compressors
The proposed dual-quality 4:2 compressors (DQ4:2Cs) operate in two accuracy modes: approximate and exact. The
general block diagram of the compressors is shown in Fig. 5. The diagram consists of two main parts:
approximate and supplementary. During the approximate mode, only the approximate part is
exploited while the supplementary part is power gated. During the exact operating mode, the
supplementary part and some components of the approximate part are utilized.
Fig. 5 Block diagram of the proposed approximate 4:2 compressors.
The hatched box in the approximate part indicates the components that are not shared
between this and the supplementary part. In the proposed structure, to reduce the power consumption and
area, most of the components of the approximate part are also used during the exact operating mode.
We use the power gating technique to turn OFF the unused components of the approximate part. Also
note that, as is evident from Fig. 5, in the exact operating mode, tristate buffers are utilized to
disconnect the outputs of the approximate part from the primary outputs. In this design, the
switching between the approximate and exact operating modes is fast. Thus, it provides us with the
opportunity of designing parallel multipliers that are capable of switching between different accuracy
levels during runtime. Next, we discuss the details of our four DQ4:2Cs based on the diagram shown
in Fig. 5. The structures have different accuracies, delays, power consumptions, and area usages. Note
that the ith proposed structure is denoted by DQ4:2Ci. The basic idea behind suggesting the
approximate compressors was to minimize the difference (error) between the outputs of the exact and
approximate ones.
Structure 1 (DQ4:2C1): For the approximate part of the first proposed DQ4:2C structure, as shown in
Fig. 4(a), the approximate output carry (i.e., carry_) is directly connected to the input x4 (carry_ = x4),
and also, in a similar approach, the approximate output sum (i.e., sum_) is directly connected to input
x1 (sum_ = x1).
In the approximate part of this structure, the output Cout is ignored. While the approximate part of
this structure is considerably fast and low power, its error rate is large (62.5%).
Structure 2 (DQ4:2C2): In the first structure, while ignoring Cout simplified the internal structure of
the reduction stage of the multiplication, its error was large. In the second structure, compared with
the DQ4:2C1, the output Cout is generated by connecting it directly to the input x3 in the approximate
part. Fig. 6 shows the internal structure of the approximate part and the overall structure of DQ4:2C2.
While the error rate of this structure is the same as that of DQ4:2C1, namely, 62.5%, its relative error is
lower.
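The 62.5% error rates quoted for DQ4:2C1 and DQ4:2C2 can be checked by enumerating the inputs. The sketch below is a behavioural model only, under two assumptions that are not stated in the text: the error rate is counted over the 16 combinations of x1 to x4 with Cin tied to 0, and an output counts as an error whenever its weighted value sum_ + 2(carry_ + Cout_) differs from x1 + x2 + x3 + x4. The function names are illustrative.

```python
from itertools import product


def exact_value(x1: int, x2: int, x3: int, x4: int) -> int:
    """Value an exact compressor would encode (Cin assumed 0)."""
    return x1 + x2 + x3 + x4


def dq42c1_approx(x1: int, x2: int, x3: int, x4: int) -> int:
    """DQ4:2C1 approximate part: sum_ = x1, carry_ = x4, Cout ignored."""
    sum_, carry_ = x1, x4
    return sum_ + 2 * carry_


def dq42c2_approx(x1: int, x2: int, x3: int, x4: int) -> int:
    """DQ4:2C2 approximate part: as DQ4:2C1, plus Cout_ = x3."""
    sum_, carry_, cout_ = x1, x4, x3
    return sum_ + 2 * (carry_ + cout_)


def error_rate(approx) -> float:
    """Fraction of the 16 input combinations with a wrong encoded value."""
    cases = list(product((0, 1), repeat=4))
    wrong = sum(1 for c in cases if approx(*c) != exact_value(*c))
    return wrong / len(cases)


if __name__ == "__main__":
    print(error_rate(dq42c1_approx), error_rate(dq42c2_approx))
```

Under these assumptions both structures are wrong for 10 of the 16 input combinations, which reproduces the 62.5% figure given for each.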
Fig. 6 (a) Approximate part and (b) overall structure of DQ4:2C2.
Structure 3 (DQ4:2C3): The previous structures, in the approximate operating mode, had maximum
power and delay reductions compared with those of the exact compressor. In some applications,
however, a higher accuracy may be needed. In the third structure, the accuracy of the approximate
operating mode is improved by increasing the complexity of the approximate part whose internal
structure is shown in Fig. 7(a).
Fig. 7(a) Approximate part of DQ4:2C3 and (b) overall structure of DQ4:2C3.
In this structure, the accuracy of output sum_ is increased. Similar to DQ4:2C1, the
approximate part of this structure does not support output Cout. The error rate of this structure,
however, is reduced to 50%. The overall structure of DQ4:2C3 is shown in Fig. 7(b), where the
supplementary part is enclosed in a red dashed line rectangle. Note that in this structure, the
NAND gate of the approximate part (denoted by a blue dotted line rectangle) is not used during the
exact operating mode.
Hence, during this operating mode, we suggest disconnecting the supply voltage of this gate by
using power gating.
Structure 4 (DQ4:2C4): In this structure, we improve the accuracy of the
output carry_ compared with that of DQ4:2C3 at the cost of larger delay and power consumption
where the error rate is reduced to 31.25%. The internal structure of the approximate part and the overall
structure of DQ4:2C4 are shown in Fig. 8. The supplementary part is indicated by a red dashed line
rectangle, while the gates of the approximate part that are powered OFF during the exact operating mode
are indicated by the blue dotted line. Note that the error rate corresponds to the occurrence of errors
in the output over the complete range of the inputs.
Fig. 8 (a) Approximate part of DQ4:2C4 and (b) overall structure of DQ4:2C4.
4.1 MULTIPLIER DESIGN
Dadda multipliers realized with the proposed compressors are studied. A proper combination of
the proposed compressors may be utilized to achieve a better tradeoff between the accuracy and
design parameters.
As an option, both DQ4:2C1 and DQ4:2C4 may be used, for the LSB and MSB parts of the
multiplication, respectively. Essential design targets of a multiplier include high speed, low power
consumption, and regularity of layout (and hence less area); a combination of these in one multiplier
is often required, making it suitable for various VLSI implementations.
The Dadda multiplier is a hardware multiplier design similar to the Wallace multiplier. Unlike
Wallace multipliers, which perform as many reductions as possible on each layer, Dadda multipliers do
as few reductions as possible. Because of this, Dadda multipliers have a less expensive reduction phase, but
the numbers may be a few bits longer, thus requiring slightly bigger adders. This implies that fewer
columns are compressed in the initial stages of the column compression tree, and more columns in the
later levels of the multiplier.
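The rule of doing as few reductions as possible is usually expressed through the classical Dadda height sequence d1 = 2, d(j+1) = floor(1.5 × dj), which gives the maximum column height allowed after each reduction stage. The sketch below lists these target heights for an n-bit multiplier. It assumes the textbook scheme built from half adders and full adders; a tree built from 4:2 compressors, as in this work, may follow a different schedule. The function name is illustrative.

```python
def dadda_stage_heights(n_bits: int) -> list:
    """Target maximum column heights, stage by stage, for an n-bit multiplier.

    Assumes the classical Dadda scheme with half and full adders.
    Returns an empty list when the partial product matrix already has
    height 2 or less.
    """
    if n_bits <= 2:
        return []
    heights = [2]
    while True:
        nxt = (3 * heights[-1]) // 2   # floor(1.5 * d_j)
        if nxt >= n_bits:
            break
        heights.append(nxt)
    # Reduction starts from the largest target below n and ends at 2.
    return heights[::-1]


if __name__ == "__main__":
    print(dadda_stage_heights(8))
    print(dadda_stage_heights(32))
```

For an 8-bit multiplier the targets are 6, 4, 3, 2 (four stages); for a 32-bit multiplier they are 28, 19, 13, 9, 6, 4, 3, 2 (eight stages).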
Fig. 9 Reduction circuitry of an 8-bit Dadda multiplier.
V. EVALUATED RESULTS
In this section, the efficiencies of the proposed 4:2 compressors in approximate mode and of the exact
4:2 compressor are evaluated by describing them in Verilog, using them in the multiplier, and collecting the
simulated results. The outputs are simulated for given inputs. By comparing the results to
the conventional (theoretical) multiplier outputs, the error percentage of the multiplier is calculated.
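The error calculation described above can be sketched as an exhaustive comparison against the exact product. The following is a minimal sketch, not the authors' Verilog test bench: truncating_multiply is a hypothetical stand-in for an approximate multiplier, and the two reported metrics (error rate and mean relative error) are assumed definitions of the error percentage.

```python
from itertools import product


def truncating_multiply(a: int, b: int, drop: int = 2) -> int:
    """Hypothetical approximate multiplier: zero the low `drop` bits of each operand."""
    mask = ~((1 << drop) - 1)
    return (a & mask) * (b & mask)


def error_stats(approx_mul, n_bits: int = 4) -> dict:
    """Compare approx_mul with exact multiplication over all n-bit operand pairs."""
    total = wrong = nonzero = 0
    rel_sum = 0.0
    for a, b in product(range(1 << n_bits), repeat=2):
        exact = a * b
        approx = approx_mul(a, b)
        total += 1
        if approx != exact:
            wrong += 1
        if exact != 0:
            # Relative error is only defined for a nonzero exact product.
            nonzero += 1
            rel_sum += abs(approx - exact) / exact
    return {
        "error_rate_pct": 100 * wrong / total,
        "mean_rel_err_pct": 100 * rel_sum / nonzero,
    }


if __name__ == "__main__":
    print(error_stats(lambda a, b: a * b))
    print(error_stats(truncating_multiply))
```

Exhaustive enumeration is practical only for small operand widths; for a 32-bit multiplier the same loop would normally run over a random or application-derived sample of operand pairs instead.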
Fig. 10 Simulation results of approximate multipliers.
Fig. 11 RTL schematic of approximate multipliers.
Fig. 12 Technology schematic of approximate multiplier.
Fig. 13 Design report of approximate multipliers.
Fig. 14 Timing report of approximate multiplier.
VI. Conclusion
For 8-bit, 16-bit, 32-bit, and 64-bit multipliers, the delay, area, power, energy,
and energy-delay product (EDP) of Dadda multipliers using the proposed approximate compressors are improved
compared to Dadda multipliers using the exact compressor. The improvements increase as the bit length increases.
From this, we infer that system speed can be improved through the reduced area and
delay of the approximate compressors. The design has been successfully synthesized and
simulated using the Xilinx tool. It has the flexibility of switching between the exact and
approximate operating modes by using a control signal. The Dadda multiplier is faster than other
multipliers, requires fewer gates than the Wallace multiplier, and has low power consumption. Each of the
compressors used has its own accuracy level in both the exact and approximate modes, with variable
delay and lower power consumption.
In future, the proposed work can be extended to reduce the
remainder size to N/2 bits, as in the Karatsuba algorithm, with the remainder calculated using a bit
reduction technique. Such an extension could reduce the delay further and could be used in
ultra-high-speed processors. The design can also be extended to Binary Coded Decimal (BCD) numbers and
signed numbers. In the BCD number system, a group of binary bits is used to represent each of the 10 decimal
digits, and BCD code is also referred to as 8421 code. BCD numbers have a vital role in
real-time applications: they are used to transfer decimal information into a computer, and
pocket calculators, electronic counters, digital voltmeters, and digital
clocks are among their applications. The implementation of
arithmetic operations such as addition, multiplication, and division on BCD numbers using Vedic
Mathematics is expected to give better results than the conventional methods.
VII. References
[1] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power with an underdesigned
multiplier architecture,” in Proc. 24th Int. Conf. VLSI Design, Jan. 2011, pp. 346–351.
[2] D. Baran, M. Aktan, and V. G. Oklobdzija, “Multiplier structures for low power applications in
deep-CMOS,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2011, pp. 1061–1064.
[3] S. Ghosh, D. Mohapatra, G. Karakonstantis, and K. Roy, “Voltage scalable high-speed robust
hybrid arithmetic units using adaptive clocking,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 18, no. 9, pp. 1301–1309, Sep. 2010.
[4] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, “RAP-CLA: A reconfigurable
approximate carry look-ahead adder,” IEEE Trans. Circuits Syst. II, Express Briefs, doi:
10.1109/TCSII.2016.2633307.
[5] A. Sampson et al., “EnerJ: Approximate data types for safe and general low-power
computation,” in Proc. 32nd ACM SIGPLAN Conf. Program. Lang. Design Implement. (PLDI), 2011,
pp. 164–174.
[6] A. Raha, H. Jayakumar, and V. Raghunathan, “Input-based dynamic reconfiguration of
approximate arithmetic units for video encoding,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 24, no. 3, pp. 846–857, May 2015.
[7] J. Joven et al., “QoS-driven reconfigurable parallel computing for NoC-based clustered
MPSoCs,” IEEE Trans. Ind. Informat., vol. 9, no. 3, pp. 1613–1624, Aug. 2013.
[8] R. Ye, T. Wang, F. Yuan, R. Kumar, and Q. Xu, “On reconfiguration-oriented approximate
adder design and its application,” in Proc. IEEE/ACM Int. Conf. Comput. Aided Design (ICCAD),
Nov. 2013, pp. 48–54.
[9] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, “A low latency generic accuracy configurable
adder,” in Proc. 52nd ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2015, pp. 1–6.
[10] S. Narayanamoorthy, H. A. Moghaddam, Z. Liu, T. Park, and N. S. Kim, “Energy-efficient
approximate multiplication for digital signal processing and classification applications,” IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 6, pp. 1180–1184, Jun. 2015.
[11] S. Hashemi, R. I. Bahar, and S. Reda, “DRUM: A dynamic range unbiased multiplier for approximate
applications,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Austin, TX, USA, Nov. 2015,
pp. 418–425.
[12] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, “Low-power high-speed multiplier for error-tolerant
application,” in Proc. IEEE Int. Conf. Electron Devices Solid-State Circuits (EDSSC), Dec. 2010, pp. 1–4.
[13] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired imprecise computational
blocks for efficient VLSI implementation of soft-computing applications,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.
[14] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate compressors
for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[15] C. H. Lin and I. C. Lin, “High accuracy approximate multiplier with error correction,” in Proc. IEEE
31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 33–38.