International Journal of Emerging Engineering Research and Technology
Volume 3, Issue 8, August 2015, PP 110-116
ISSN 2349-4395 (Print) & ISSN 2349-4409 (Online)
Design and Implementation of Wallace Tree Multiplier Using Kogge Stone Adder and Brent Kung Adder
G. Shireesha1, Dr. G. Kanaka Durga2
1M.E, Department of Electronics and Communication Engineering, M.V.S.R Engineering College, Hyderabad
2Professor & HOD, Department of Information Technology Engineering, M.V.S.R Engineering College, Hyderabad
ABSTRACT
A fixed point Wallace tree multiplier architecture is used to perform multiple multiplications on different data paths. A Wallace tree multiplier using Kogge Stone and Brent Kung adders performs more multiplications in parallel, with fewer extra carry save adder stages, than the existing multiplier. The modified n-bit Wallace tree multiplier structure performs four (n/2)×(n/2)-bit multiplications, two n×(n/2)-bit multiplications and one n×n-bit multiplication in parallel. The existing Wallace tree multiplier design uses a Carry Look-Ahead adder (CLA). To further improve the speed and reduce the area, parallel prefix adders are used in the modified Wallace tree multiplier. The Kogge Stone adder (KSA) is used for high speed and the Brent Kung adder (BKA) is used to reduce the area. The Wallace tree multipliers using KSA and BKA are implemented using Xilinx 13.2.
Keywords: Kogge Stone adder, Brent Kung adder, Wallace tree multiplier.
INTRODUCTION
The performance of embedded systems, microprocessors and many modern DSP applications depends mainly on the performance of the multiplier, which is the key element. An efficient system can be built by choosing a suitable multiplier for it. A multiplier has two stages: partial product generation and partial product addition. An n-bit multiplier produces n partial products, and the modified Booth algorithm [1] was used to reduce this number. However, the Booth encoder in such a multiplier increases the circuit depth. The n-bit Baugh-Wooley array multiplier [2] has a circuit depth of O(n). In modern technology, vector processors [3] play a major role in achieving data level parallelism (DLP). Here multiple operands follow multiple data paths in the same hardware, so only one instruction is needed to operate on a vector of data. The twin precision based array multiplier explained in [4] is used for data level parallelism: the full precision multiplier implements two half precision multiplications with a circuit depth of O(n). That is, an 8-bit multiplier performs either two 4-bit multiplications or one 8-bit multiplication at a time. The depth of a Carry Save Adder (CSA) is O(1). The depth of the carry save addition tree is O(log2 n) for a Wallace tree [5] based multiplier and O(n) for a Braun based [5] multiplier, where n is the number of bits. The quarter precision Wallace tree multiplier [6] produces multiple multiplications to improve the speed, with increased circuit depth as the trade-off. To further improve the speed and reduce the area, a Wallace tree multiplier using the Kogge Stone adder and the Brent Kung adder [7-8] is used.
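The two multiplier stages described above can be illustrated with a small behavioural model. This is a minimal sketch, not the paper's hardware: the function names are hypothetical, and grouping the rows in threes at every stage is the textbook Wallace schedule, assumed here because the text does not spell out the grouping.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Stage 1: one shifted copy of a for every bit of the n-bit operand b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: three rows in, a sum row and a carry row out.

    No carry ripples along the word, which is why one CSA has depth O(1).
    """
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Stage 2: reduce the rows with layers of CSAs, then add the last two.

    Returns (product, number of carry save stages used).
    """
    rows = partial_products(a, b, n)
    stages = 0
    while len(rows) > 2:
        full = 3 * (len(rows) // 3)          # rows that fit into groups of three
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        rows = reduced + rows[full:]         # leftover rows pass through unchanged
        stages += 1
    return sum(rows), stages                 # sum() stands in for the final adder


product, stages = wallace_multiply(0xDEADBEEF, 0x12345678, 32)
print(hex(product), stages)
```

Each stage shrinks the row count by roughly one third, which is where the logarithmic depth of the carry save tree comes from.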
32-BIT WALLACE TREE MULTIPLIER
Modern digital signal processors require large multipliers to compute complex signal processing operations. A DSP processor shows the need for a 64-bit multiplier, where four 32×32-bit multiplications, sixteen 16×16-bit multiplications, four 16×32-bit multiplications or four 8×8-bit multiplications are performed in parallel using one 64-bit multiplier. In the same way, the Wallace tree multiplier architecture is allowed to perform more than one multiplication in parallel to achieve data level parallelism in vector processors. The existing 32-bit Wallace tree multiplier has 8 carry save stages and a 54-bit final adder to produce the product, whereas the modified 32-bit Wallace structure has four, two and one instances of Block I, Block II and Block III respectively. Each Block I acts as the carry reduction tree of a 16-bit Wallace tree multiplier and receives the partial products of one of the quarters. Block I has six carry save stages. The final carry and sum from CSA14 of each Block I are sent to a 25-bit CLA to obtain four 16-bit multiplication results at a time.
Fig1. Wallace tree multiplier
Fig2. Partial product arrangement
*Address for correspondence:
sirihanc y < g m a il.com
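The relationship between the four 16×16-bit results and the wider products can be checked arithmetically. The sketch below is only an identity check under stated assumptions: it splits both operands into 16-bit halves, treats operand a as the one contributing 16 bits to each 16×32-bit product, and works on finished products, whereas the hardware blocks operate on carry save (sum and carry) outputs. All names are hypothetical.

```python
MASK16 = 0xFFFF


def parallel_results(a: int, b: int) -> dict[str, object]:
    """Derive the 16x32 and 32x32 products from the four 16x16 quarter products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    # Four quarter products, one per 16x16 reduction tree.
    ll, lh = a_lo * b_lo, a_lo * b_hi
    hl, hh = a_hi * b_lo, a_hi * b_hi

    # Two 16x32 products: each combines two quarter products.
    lo_by_b = ll + (lh << 16)        # a_lo * b
    hi_by_b = hl + (hh << 16)        # a_hi * b

    # One 32x32 product: combines all four quarter products.
    full = lo_by_b + (hi_by_b << 16)
    return {"quarter": (ll, lh, hl, hh), "half": (lo_by_b, hi_by_b), "full": full}


r = parallel_results(0x89ABCDEF, 0x01234567)
print(hex(r["full"]))
```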
Blocks of 32-bit Wallace Tree Multiplier
Fig3. Block I of Wallace tree multiplier
Block II
The inputs to Block II come from the CSA14 outputs of two of the Block I instances, which yields two 16×32-bit multiplication results in parallel. Block II has two carry save stages and one 40-bit recursive doubling based CLA. Similarly, the inputs to Block III come from the CSA14 outputs of all the Block I instances, which yields one 32×32-bit multiplication result. Block III has four carry save stages and one 54-bit recursive doubling based CLA. Block II, Block III and the 25-bit CLAs all operate in parallel, so the critical path of the modified structure runs through Block I and Block III. The total critical depth of the modified 32-bit Wallace tree multiplier is therefore the number of carry save stages of Block I plus the number of carry save stages of Block III plus the depth of the 54-bit recursive doubling based CLA, that is, 10 carry save stages and one 54-bit recursive doubling based CLA. The modified structure thus requires two more carry save stages than the existing 32-bit Wallace structure, which slightly increases its worst path delay compared with the existing multiplier. The existing 32-bit Wallace structure has 30 carry save adders and one 54-bit recursive doubling based CLA. Block I has 14 carry save adders (CSA), Block II has 2 and Block III has six. The modified 32-bit Wallace structure therefore has (4×14)+(2×2)+(1×6) = 66 carry save adders, four 25-bit recursive doubling based CLAs, two 40-bit recursive doubling based CLAs and one 54-bit recursive doubling based CLA. This large difference increases the total cell area, the total number of cells and the net power compared with the conventional structure.
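The stage and adder counts quoted above can be reproduced by counting rows alone. The sketch assumes the textbook Wallace schedule (every full group of three rows goes through one CSA per stage, leftover rows pass through), and it assumes Block II and Block III start from the sum and carry rows of two and four Block I instances respectively. Neither assumption is stated in the text, and the function name is hypothetical.

```python
def reduction_profile(rows: int) -> tuple[int, int]:
    """Return (carry save stages, carry save adders) needed to reach two rows."""
    stages = csas = 0
    while rows > 2:
        groups = rows // 3               # one CSA per full group of three rows
        rows = 2 * groups + rows % 3     # each CSA emits a sum row and a carry row
        stages += 1
        csas += groups
    return stages, csas


existing = reduction_profile(32)         # 32 partial products, unmodified tree
block_1 = reduction_profile(16)          # one quarter: 16 partial products
block_2 = reduction_profile(2 * 2)       # sum/carry pairs from two Block I trees
block_3 = reduction_profile(4 * 2)       # sum/carry pairs from four Block I trees

total_csas = 4 * block_1[1] + 2 * block_2[1] + 1 * block_3[1]
critical_stages = block_1[0] + block_3[0]
print(existing, block_1, block_2, block_3, total_csas, critical_stages)
```

Under these assumptions the model agrees with the figures in the text: 8 stages and 30 adders for the existing tree, and 66 adders with a 10-stage critical path for the modified one.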
Block III
Fig4. Block II of Wallace tree multiplier
Fig5. Block III of Wallace tree multiplier
MODIFIED WALLACE TREE MULTIPLIER
In the Wallace tree multiplier of [6], the CLA is replaced with the Kogge Stone adder (KSA) and the Brent Kung adder (BKA) to further improve the speed and reduce the area. The Kogge Stone and Brent Kung adders are parallel prefix adders. Parallel prefix structures are common in high performance adders because the delay is logarithmically proportional to the adder width. Parallel prefix adders are more flexible and are used to speed up binary addition. They are derived from the Carry Look-Ahead (CLA) structure. The construction of a parallel prefix adder involves three stages:
1. Pre-processing stage
2. Carry generation network
3. Post processing
Pre-processing stage
In this stage we compute the generate and propagate signals for each pair of input bits A and B. These signals are given by logic equations (1) and (2):
\[ P_i = A_i \oplus B_i \tag{1} \]
\[ G_i = A_i \land B_i \tag{2} \]
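As an illustration of the pre-processing stage, the per-bit signals can be computed as follows. This is a minimal sketch; the function name and the least-significant-bit-first list layout are assumptions made for the example.

```python
def preprocess(a: int, b: int, width: int) -> tuple[list[int], list[int]]:
    """Return the propagate and generate bits, least significant bit first."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # equation (1)
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # equation (2)
    return p, g


p, g = preprocess(0b1100, 0b1010, 4)
print(p, g)
```

A bit position with P = 1 passes an incoming carry on, and a position with G = 1 creates a carry regardless of what arrives.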
Carry generation network
In this stage we compute the carries corresponding to each bit. These operations are executed in parallel. After the carries are computed in parallel, they are segmented into smaller pieces. The stage uses carry propagate and carry generate as intermediate signals, which are given by logic equations (3) and (4):
\[ CP_{i:j} = P_{i:k+1} \land P_{k:j} \tag{3} \]
\[ CG_{i:j} = G_{i:k+1} \lor (P_{i:k+1} \land G_{k:j}) \tag{4} \]
Post processing
This is the final step, which computes the summation of the input bits. It is common to all adders, and the sum bits are computed by logic equations (5) and (6):
\[ C_{i-1} = (P_i \land C_{in}) \lor G_i \tag{5} \]
\[ S_i = P_i \oplus C_{i-1} \tag{6} \]
These parallel prefix adders contain gray cells and black cells. The black cell (BC) generates the ordered pair, while the gray cell (GC) generates only the left signal [7].
Fig6. Black cell and gray cell
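The two cell types and the post-processing step can be sketched as below. The assumptions are loud ones: the ordered pair is taken to be (generate, propagate), the "left signal" is taken to be the generate output, and carry[i] denotes the carry into bit i, which the text writes as C with index i-1. For brevity the carries are produced by a serial chain of gray cells, which is a ripple arrangement and not a Kogge Stone or Brent Kung network. All names are hypothetical.

```python
def black_cell(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Merge two adjacent groups; returns the ordered pair (G, P)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo     # equations (4) and (3)


def gray_cell(hi: tuple[int, int], g_lo: int) -> int:
    """Same merge, but only the generate output is produced."""
    g_hi, p_hi = hi
    return g_hi | (p_hi & g_lo)


def prefix_add(a: int, b: int, width: int, c_in: int = 0) -> tuple[int, int]:
    """Add two width-bit numbers; returns (sum, carry out)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    carry = [c_in]                               # carry[i] is the carry into bit i
    for i in range(width):
        carry.append(gray_cell((g[i], p[i]), carry[i]))
    total = sum((p[i] ^ carry[i]) << i for i in range(width))   # equation (6)
    return total, carry[width]


print(prefix_add(11, 6, 4))
```

The merge is associative, which is what allows a prefix network to regroup the same cells into a shallow tree instead of a chain.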
32-BIT WALLACE TREE MULTIPLIER USING KSA
Fig7. Modified Wallace tree multiplier using KSA
The Kogge Stone adder is a parallel prefix form of the carry look-ahead adder. A parallel prefix adder can be represented as a parallel prefix graph consisting of carry operator nodes. The time required to generate the carry signals in this prefix adder is O(log n). It is one of the fastest adder designs and a common design for high performance adders in industry. In the pre-computation stage, the propagate and generate operations are performed. Stage 1 uses 14 black cells and one gray cell. Stage 2 uses 12 black cells and two gray cells. Stage 3 uses 8 black cells and 4 gray cells. Stage 4 uses 8 gray cells. In the final computation stage, the sum and carry are generated.
16-bit Kogge Stone adder
Fig8. 16-bit Kogge Stone adder
Block II
Fig9. Block II of modified Wallace tree multiplier using KSA
Block III
Fig10. Block III of modified Wallace tree multiplier using KSA
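The per-stage cell counts given for the 16-bit Kogge Stone adder can be reproduced with a behavioural model. This is a sketch under assumptions: carry-in is fixed at 0, a cell is counted as gray when its merged group reaches bit 0 (so only the generate output is needed) and as black otherwise, and the function name is hypothetical.

```python
def kogge_stone_add(a: int, b: int, width: int = 16):
    """Return (sum, carry out, [(black, gray) cell counts per prefix stage])."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    half_sum = p[:]                      # per-bit propagate, kept for the sum
    cells = []
    d = 1                                # span merged at this stage: 1, 2, 4, 8
    while d < width:
        black = gray = 0
        new_g, new_p = g[:], p[:]
        for i in range(d, width):        # every bit i >= d gets a cell
            new_g[i] = g[i] | (p[i] & g[i - d])
            new_p[i] = p[i] & p[i - d]   # unused for gray cells, kept for brevity
            if i < 2 * d:                # merged group now reaches bit 0
                gray += 1
            else:
                black += 1
        g, p = new_g, new_p
        cells.append((black, gray))
        d *= 2
    carry = [0] + g                      # carry[i] is the carry into bit i
    total = sum((half_sum[i] ^ carry[i]) << i for i in range(width))
    return total, carry[width], cells


print(kogge_stone_add(0x1234, 0x0FED))
```

Every stage touches almost every bit position, which is the source of both the short depth (four stages for 16 bits) and the large cell and wiring count.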
32-BIT WALLACE TREE MULTIPLIER USING BKA
Fig11. Modified Wallace tree multiplier using BKA
The cost and wiring complexity are greatly reduced by using Brent Kung adders. In the pre-computation stage, the propagate and generate operations are performed. After that, stage 1 uses seven black cells and one gray cell. Stage 2 uses three black cells and one gray cell. Stage 3 uses one black cell and one gray cell. Stage 4 uses one gray cell. Stage 5 uses one gray cell. Stage 6 uses three gray cells. Stage 7 uses seven gray cells. In the final computation stage, the sum and carry are generated.
16-bit Brent Kung Adder
Fig12. 16-bit Brent Kung adder
Block II
Fig13. Block II of modified Wallace tree multiplier using BKA
Block III
Fig14. Block III of modified Wallace tree multiplier using BKA
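The seven stages listed for the 16-bit Brent Kung adder correspond to an up-sweep of four stages followed by a down-sweep of three. The model below is a sketch under assumptions: carry-in is 0, the width is a power of two of at least 4, cells whose merged group reaches bit 0 are counted as gray, and the function name is hypothetical.

```python
def brent_kung_add(a: int, b: int, width: int = 16):
    """Return (sum, carry out, [(black, gray) cell counts per prefix stage])."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    half_sum = p[:]                      # per-bit propagate, kept for the sum
    cells = []

    d = 1
    while d < width:                     # up-sweep: build a binary tree of groups
        black = gray = 0
        for i in range(2 * d - 1, width, 2 * d):
            g[i] |= p[i] & g[i - d]      # uses p[i] before it is updated below
            p[i] &= p[i - d]
            if i == 2 * d - 1:           # merged group now reaches bit 0
                gray += 1
            else:
                black += 1
        cells.append((black, gray))
        d *= 2

    d = width // 4
    while d >= 1:                        # down-sweep: fill in the missing carries
        gray = 0
        for i in range(3 * d - 1, width, 2 * d):
            g[i] |= p[i] & g[i - d]      # lower group already reaches bit 0
            gray += 1
        cells.append((0, gray))
        d //= 2

    carry = [0] + g                      # carry[i] is the carry into bit i
    total = sum((half_sum[i] ^ carry[i]) << i for i in range(width))
    return total, carry[width], cells


print(brent_kung_add(0x1234, 0x0FED))
```

The model uses 26 cells in seven stages, against 49 cells in four stages for the Kogge Stone model, which mirrors the area versus speed trade-off described in the text.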
SIMULATION WAVEFORMS
The simulation is done using the Xilinx ISE 13.2 tool. In the above waveform, a and b are 32-bit numbers. The final output is the 64-bit multiplication result. In parallel, four 16×16-bit and two 16×32-bit results are also produced.
COMPARISON OF AREA AND DELAY OF WALLACE TREE MULTIPLIER USING ADDERS
The delay of the Wallace tree multiplier using the Kogge Stone adder improves from 80.977 ns to 32.416 ns. With the Brent Kung adder, the delay improves from 80.977 ns to 50.916 ns, and the area is also reduced by three slices.
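To put the reported delays on a common scale, the relative reduction and the speed-up factor can be computed directly from the three figures above. The helper names are hypothetical and the calculation adds no data beyond those figures.

```python
def delay_reduction_percent(before_ns: float, after_ns: float) -> float:
    """Percentage by which the delay shrinks relative to the baseline."""
    return 100.0 * (before_ns - after_ns) / before_ns


def speedup(before_ns: float, after_ns: float) -> float:
    """How many times faster the modified design is than the baseline."""
    return before_ns / after_ns


BASELINE_NS = 80.977
for name, delay_ns in (("KSA", 32.416), ("BKA", 50.916)):
    print(name,
          round(delay_reduction_percent(BASELINE_NS, delay_ns), 2),
          round(speedup(BASELINE_NS, delay_ns), 2))
```

This works out to a delay reduction of roughly 60% with the KSA and roughly 37% with the BKA.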
CONCLUSION
A 32-bit Wallace tree multiplier is modified and redesigned. The modified Wallace tree multiplier achieves data level parallelism in vector processors. In the addition part, the CSA and CLA adders are enhanced with the prefix adders KSA and BKA. The KSA offers high speed and the BKA takes less area. The 32-bit Wallace tree multiplier using the three adders is implemented with Xilinx 13.2 and on a Spartan3E FPGA hardware kit.
REFERENCES
[1] P.E. Madrid, B. Millar, and E.E. Swartzlander, "Modified booth algorithm for high radix multiplication", IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 118-121, Oct. 1992.
[2] C.E. Kozyrakis and D.A. Patterson, "Scalable, vector processors for embedded systems", Micro, IEEE Journals and magazines, vol. 23, no. 6, pp. 36-45, 2003.
[3] Sjalander M and Larsson-Edefors P, "High-Speed and Low-Power Multipliers Using the Baugh-Wooley Algorithm and HPM Reduction Tree", IEEE International Conference on Electronics, Circuits and Systems, page(s) 33-36, Sep. 2008.
[4] M. Sjalander and P. Larsson-Edefors, "Multiplication acceleration through twin precision", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 9, pp. 1233-1246, Sept. 2009.
[5] C. S. Wallace, "A suggestion for a fast multiplier", IEEE Transactions on Electronic Computers, vol. EC-13, no. 1, pp. 14-17, Feb. 1964.
[6] Mohamed Asan Basiri, M. Samaresh Chandra Nayak and Noor Mahammad Sk, "Multiplication Acceleration Through Quarter Precision Wallace Tree Multiplier", IEEE 2014.
[7] Sudheer kumar Yezerla, B Rajendra Naik, "Design and estimation of delay, power, and area for parallel prefix adders", IEEE 2014.
[8] Adilakshmi Siliveru, M. Bharathi, "Design of Kogge stone and Brent kung adders using degenerate pass transistor logic", International journal of emerging science and engineering, 2013.