digital filter design using vhdl

DIGITAL FILTER DESIGN

CONTENTS

1. INTRODUCTION

2. ELECTRICAL FILTER

3. COMPARISON OF IIR & FIR FILTER
   I. BUTTERWORTH FILTER
   II. ELLIPTIC FILTER
   III. CHEBYSHEV FILTER

4. EFFECT OF POLES & ZEROES

5. BI-LINEAR TRANSFORMATION

6. IIR FILTER REALIZATION
   I. DIRECT FILTER REALIZATION
   II. CASCADE FILTER REALIZATION

7. VHDL: THE LANGUAGE
   I. LEVELS OF ABSTRACTION
   II. BIT PARALLEL ARITHMETIC
      A. ADDITION & SUBTRACTION
      B. MULTIPLICATION
   III. BIT SERIAL ARITHMETIC
      A. ADDITION & SUBTRACTION
      B. MULTIPLICATION
      C. SHIFT & ADD MULTIPLIERS
      D. SHIFT & PARALLEL MULTIPLIERS
      E. LATENCY
      F. THROUGHPUT

8. IMPLEMENTATION & ANALYSIS OF SUB-BLOCKS
   I. ADDER
   II. DELAY
   III. SERIAL-PARALLEL MULTIPLIER
   IV. BOOTH MULTIPLIER
   V. MAC

9. IMPLEMENTATION & ANALYSIS OF FIR FILTERS
   I. DIRECT FORM OF REALIZATION
      A. USING BIT PARALLEL ARITHMETIC
      B. USING BIT SERIAL ARITHMETIC
      C. AREA ANALYSIS
      D. POWER ANALYSIS
   II. CASCADE REALIZATION
      A. USING BIT PARALLEL ARITHMETIC
      B. USING BIT SERIAL ARITHMETIC
      C. AREA ANALYSIS
      D. POWER ANALYSIS

10. CONCLUSION

11. FUTURE PLANS

12. VHDL CODES FOR FIR FILTERS
   I. USING BIT PARALLEL ARITHMETIC
      A. 4 BIT COUNTER
      B. BOOTH MULTIPLIER
      C. 16 BIT FULL ADDER
      D. MULTIPLIER
      E. SERIAL PARALLEL CONVERTER
      F. FIR FILTER
      G. ALU DESIGN
   II. USING BIT SERIAL ARITHMETIC
      A. D FLIP-FLOP
      B. FULL ADDER
      C. HALF ADDER
      D. RIGHT SHIFTER
      E. DELAY
      F. PIPELINE
      G. FIR FILTER

13. REFERENCES



LIST OF IMAGES

FIG1: MAGNITUDE RESPONSE OF BUTTERWORTH FILTER
FIG2: MAGNITUDE RESPONSE OF ELLIPTIC FILTER
FIG3: MAGNITUDE RESPONSE OF CHEBYSHEV FILTER
FIG4: EFFECTS OF POLES & ZEROES
FIG5: STABLE TRANSFORMATION
FIG6: IIR FILTER BLOCK
FIG7: DIRECT REALIZATION OF IIR FILTER
FIG8: CASCADE REALIZATION OF IIR FILTER
FIG9: BIT PARALLEL RIPPLE CARRY ADDER
FIG10: MATRIX PRODUCT OF MULTIPLICATION
FIG11: ARRAY MULTIPLIER OF TWO'S COMPLEMENT NUMBERS
FIG12: BIT SERIAL ADDER & SUBTRACTOR
FIG13: SHIFT & ADD MULTIPLIER
FIG14: S/P MULTIPLIER USING SHIFT ACCUMULATOR
FIG15: SIMPLIFIED S/P MULTIPLIER
FIG16: SIMPLIFIED MULTIPLIER STRUCTURE
FIG17: LATENCY & THROUGHPUT OF A PROCESSING ELEMENT
FIG18: INCREASED THROUGHPUT WITHOUT AFFECTING LATENCY
FIG19: TWO INPUT ADDER BLOCK
FIG20: RTL SCHEMATIC OF ADDER BLOCK
FIG21: OUTPUT RESULT OF ADDER BLOCK
FIG22: BIT SERIAL ADDER BLOCK
FIG23: TEST BENCH WAVEFORM OF BIT-SERIAL ADDER
FIG24: BIT-SERIAL & PARALLEL DELAY BLOCK
FIG25: TEST BENCH WAVEFORM OF DELAY BLOCK
FIG26: S/P MULTIPLIER BLOCK
FIG27: OUTPUT RESULT OF S/P MULTIPLIER
FIG28: SIMULATION RESULT OF BOOTH'S MULTIPLIER
FIG29: MAC CIRCUIT
FIG30: DIRECT FORM REALIZATION OF FIR FILTER
FIG31: FIR FILTER DIAGRAM
FIG32: RTL REPRESENTATION OF FIR FILTER
FIG33: DIRECT FORM REALIZATION OF FIR FILTER (BIT PARALLEL)
FIG34: DIRECT FORM REALIZATION OF FIR FILTER (BIT SERIAL)
FIG35: OUTPUT WAVEFORM OF POWER ANALYSIS OF FIR FILTER (BIT PARALLEL)
FIG36: OUTPUT WAVEFORM OF POWER ANALYSIS OF FIR FILTER (BIT SERIAL)
FIG37: CASCADE REALIZATION
FIG38: CASCADE REALIZATION
FIG39: FIR FILTER CASCADE REALIZATION
FIG40: STRUCTURE OF ARITHMETIC OPERATION
FIG41: FIR FILTER CASCADE REALIZATION USING BIT SERIAL ARITHMETIC
FIG42: OUTPUT WAVEFORM OF POWER ANALYSIS OF LATTICE REALIZATION USING BIT PARALLEL ARITHMETIC



LIST OF CHARTS

CHART 1: DESIGN SUMMARY OF DIRECT FORM REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC

CHART 2: DESIGN SUMMARY OF DIRECT FORM REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC

CHART 3: POWER SUMMARY OF DIRECT FORM REALIZATION

CHART 4: DESIGN SUMMARY OF CASCADE FORM REALIZATION USING BIT PARALLEL ARITHMETIC

CHART 5: DESIGN SUMMARY OF CASCADE FORM REALIZATION USING BIT SERIAL ARITHMETIC

CHART 6: POWER SUMMARY OF CASCADE FORM REALIZATION

CHART 7: DESIGN SUMMARY OF LATTICE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC

CHART 8: DESIGN SUMMARY OF LATTICE REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC

CHART 9: POWER SUMMARY OF LATTICE REALIZATION OF FIR FILTER



INTRODUCTION

Many of today’s electronic applications contain various types of signal processing. This includes systems used for music, radar, sonar, audio, video, and communication. Some of these represent small-volume markets, while others are high-volume consumer products like mobile phones.

There are many reasons behind the increased use of digital signal processing compared to its analog counterparts. One is the advent of VLSI, where large complex systems can be manufactured in large quantities at a low cost per unit. Another reason is that the use of digital circuitry removes the need for tuning, which analog circuits generally require. A stringent requirement on communication systems to efficiently utilize limited resources such as bandwidth and transmitter power has led to the use of complex signal processing algorithms that are only practical to implement using digital signal processing (DSP). Typical DSP operations are frequency selective and adaptive filtering, time-frequency transformations, and sample rate changes.

The signals to be processed are obtained either from nature itself or from man-made machines. Signal processing is generally aimed at extracting information or at transforming the signal into a form more suited for transmission or storage. Signal processing systems often have a finite time available to compute the result. Some systems can accept a missed deadline, while others give an unacceptable error if a deadline is exceeded. The latter are called hard real-time systems and are the type of signal processing system discussed here.

Signal processing can be implemented using various signal representations and circuit techniques. The evolution has gone from analog continuous-time signal processing such as passive LC filters, through discrete-time analog circuits such as switched-capacitor filters, to purely digital implementations. Analog and switched-capacitor circuits are, however, still needed for interfacing analog signals to the digital signal processing system through anti-aliasing filters and A/D and D/A converters, and for systems with very high bandwidths.

Theory, design, and implementation of high-performance DSP (sub)systems in terms of throughput, size, and cost are important research and development areas. In addition, the increasing use of portable equipment, together with the cost of cooling electronic equipment, is a strong incentive to reduce the power consumption of DSP (sub)systems.

The work presented in this report addresses several important issues in DSP algorithm and hardware co-design. The aim is to obtain architectures that are efficient with respect to design effort, throughput, chip area, and power consumption, with emphasis on high-speed, low-power implementation of recursive digital filters.



ELECTRICAL FILTER

An electrical filter is a system that modifies, reshapes, or manipulates the frequency spectrum of an electrical signal according to prescribed requirements, for example to attenuate a selected frequency component or to locate or isolate a frequency component.

Digital filters can be designed using analog design methods by following these steps:

1. The filter specifications are given in the digital domain. The filter type (highpass, lowpass, etc.) is specified.
2. An equivalent analog lowpass filter is designed that meets these specifications.
3. The analog lowpass filter is transformed using spectral transformations into the correct type of filter.
4. The analog filter is transformed into a digital filter using a particular mapping.

Analog filters:

Classical theory for analogue filters operating below about 100 MHz is generally based on "lumped parameter" resistors, capacitors, inductors and operational amplifiers (with feedback), which obey LTI differential equations: $i(t) = C\,dv(t)/dt$, $v(t) = L\,di(t)/dt$, $v(t) = i(t)R$, and $v_o(t) = A\,v_i(t)$. Analysis of such LTI circuits gives a relationship between input $x(t)$ and output $y(t)$ in the form of a differential equation:

$$b_0 y(t) + b_1 \frac{dy(t)}{dt} + b_2 \frac{d^2 y(t)}{dt^2} + \cdots = a_0 x(t) + a_1 \frac{dx(t)}{dt} + a_2 \frac{d^2 x(t)}{dt^2} + \cdots$$

whose system (or transfer) function is of the form:

$$H_a(s) = \frac{a_0 + a_1 s + a_2 s^2 + \cdots + a_N s^N}{b_0 + b_1 s + b_2 s^2 + \cdots + b_M s^M}$$

This is a ratio of polynomials in $s$. The order of the system function is $\max(N, M)$. Replacing $s$ by $j\omega$ gives the frequency response $H_a(j\omega)$, where $\omega$ denotes frequency in radians/second. For values of $s$ with non-negative real parts, $H_a(s)$ is the Laplace transform of the analogue filter's impulse response $h_a(t)$. $H_a(s)$ may be expressed in terms of its poles and zeros as:

$$H_a(s) = k\,\frac{(s - z_1)(s - z_2)\cdots(s - z_N)}{(s - p_1)(s - p_2)\cdots(s - p_M)}$$



All real-life signals that are taken as inputs and processed are analog signals. Today, however, most systems and their components are digital. To be used and processed in a digital computer, an analog signal has to be sampled, processed, and reconstructed via the digital system. Samplers and digital filters are thus an integral part of today's electrical components.

There are many methods for transforming an analog filter into a digital filter. Some preferred methods are listed below:

i. Backward Difference Method
ii. Impulse Invariance
iii. Bilinear Transformation
iv. Step Invariance, and so on.

There is no optimum method. The selection criteria depend on the sampling frequency, the highest frequency component of the system, etc.



COMPARISON OF IIR AND FIR DIGITAL FILTERS

IIR type digital filters have the advantage of being economical in their use of delays, multipliers and adders. They have the disadvantage of being sensitive to coefficient round-off inaccuracies and the effects of overflow in fixed point arithmetic. These effects can lead to instability or serious distortion. Also, an IIR filter cannot be exactly linear phase.

FIR filters may be realized by non-recursive structures, which are simpler and more convenient to program, especially on devices specifically designed for digital signal processing. These structures are always stable and, because there is no recursion, round-off and overflow errors are easily controlled. An FIR filter can be exactly linear phase. The main disadvantage of FIR filters is that large orders can be required to perform fairly simple filtering tasks.

Note that the frequency response is the transfer function H(z) evaluated around the unit circle on the Argand diagram of z. Since the shape of the transfer function is determined by the positions of its poles and zeroes, so is the frequency response.

The frequency response can be determined by tracing around the unit circle on the Argand diagram of the z plane:

project poles and zeroes radially to hit the unit circle

poles cause bumps

zeroes cause dips

the closer to the unit circle, the sharper the feature
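The rules above can be checked numerically. The following is a minimal Python sketch, not taken from the report: the function name `freq_response` and the example pole and zero positions are hypothetical, chosen only to show a bump near a pole and a dip at a zero.

```python
import cmath
import math

def freq_response(zeros, poles, omega, gain=1.0):
    """Evaluate H(z) = gain * prod(z - q) / prod(z - p) at z = exp(j*omega)."""
    z = cmath.exp(1j * omega)
    num = complex(gain)
    for q in zeros:
        num *= (z - q)
    den = complex(1.0)
    for p in poles:
        den *= (z - p)
    return num / den

# Hypothetical example: a conjugate pole pair at radius 0.9 and angle pi/4,
# plus one zero on the unit circle at omega = pi.
poles = [0.9 * cmath.exp(1j * math.pi / 4), 0.9 * cmath.exp(-1j * math.pi / 4)]
zeros = [-1.0]

peak = abs(freq_response(zeros, poles, math.pi / 4))  # next to the pole: bump
away = abs(freq_response(zeros, poles, math.pi / 2))  # farther from the pole
dip = abs(freq_response(zeros, poles, math.pi))       # on the zero: dip
print(peak, away, dip)
```

Moving the pole radius closer to 1 in this sketch makes the bump taller and narrower, which is the "sharper feature" effect described above.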



IIR filters can be designed using different methods. One of the most commonly used is via the reference analog prototype filter. This method is the best for designing all standard types of filters such as low-pass, high-pass, band-pass and band-stop filters.

Here is a summary of three continuous time low pass filters:

BUTTERWORTH FILTERS

Butterworth ensures a flat response in the passband and an adequate rate of rolloff. A good "all rounder," the Butterworth filter is simple to understand and suitable for applications such as audio processing.

FIG1: Magnitude response of Butterworth Filters
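The flat passband and monotonic rolloff can be illustrated with the standard magnitude formula of an analog Butterworth low-pass prototype, $|H(j\Omega)| = 1/\sqrt{1 + (\Omega/\Omega_c)^{2N}}$. The sketch below is an illustration only; the function name and the normalized cutoff of 1 rad/s are assumptions, not values from the report.

```python
import math

def butterworth_magnitude(omega, cutoff, order):
    """|H(j*omega)| of an analog Butterworth low-pass prototype."""
    return 1.0 / math.sqrt(1.0 + (omega / cutoff) ** (2 * order))

# Assumed normalized cutoff of 1 rad/s; compare orders 1, 2 and 4
# below, at, and above the cutoff frequency.
for order in (1, 2, 4):
    gains = [butterworth_magnitude(w, 1.0, order) for w in (0.5, 1.0, 2.0)]
    print(order, [round(g, 4) for g in gains])
```

Every order passes through the same gain of $1/\sqrt{2}$ at the cutoff, while a higher order gives a flatter passband and a steeper rolloff.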

ELLIPTIC FILTERS

This filter is equiripple: it has ripple in both the passband and the stopband.

FIG2: Magnitude response of Elliptic Filters



CHEBYSHEV FILTERS

The Chebyshev filter has ripple in the passband. There is also an inverse Chebyshev analog filter, also known as the Chebyshev II filter, which has ripple in the stopband.

FIG3: Magnitude response of Chebyshev Filters

The z-transform of the transfer function is of great importance for IIR filters. The location of the poles in the z-plane is used to test the stability of the designed IIR filter: the poles of the transfer function must be located within the unit circle for the filter to be stable.

Figure illustrates zeros and poles of the transfer function of a stable IIR filter in the z plane.

Transfer function zeros are denoted by small circles, whereas its poles are denoted by small crosses.



EFFECTS OF THE POLES AND ZEROS OF THE TRANSFER FUNCTION

The location of the poles and zeros of the transfer function is very important for discrete-time system analysis and synthesis. For a discrete-time system to be stable, all poles of its transfer function must be located within the unit circle. The location of the zeros does not affect the stability of discrete-time systems. FIR filters have no feedback, which makes them inherently stable. This does not apply to IIR filters. It is therefore preferable to use the bilinear transformation, because it always maps a stable analog prototype to a stable digital filter.
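The stability condition can be tested directly for a second-order denominator. The sketch below assumes a denominator of the form $1 + a_1 z^{-1} + a_2 z^{-2}$, whose poles are the roots of $z^2 + a_1 z + a_2$; the function names and example coefficients are hypothetical.

```python
import cmath

def biquad_poles(a1, a2):
    """Roots of z**2 + a1*z + a2, i.e. the poles of 1/(1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return ((-a1 + disc) / 2.0, (-a1 - disc) / 2.0)

def is_stable(a1, a2):
    """True when both poles lie strictly inside the unit circle."""
    return all(abs(p) < 1.0 for p in biquad_poles(a1, a2))

print(is_stable(-1.2, 0.81))  # conjugate poles at radius 0.9
print(is_stable(-2.2, 1.21))  # poles at about 1.1, outside the unit circle
```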

In the invariance methods, the derived digital filter has exactly the same unit-step, impulse, or sinusoid response as the original analog filter at the sampling instants t = nT. Here aliasing may occur. But if the order of the filter 'N' is high enough, the aliasing will be small enough to be acceptable, i.e., within our tolerance.



BILINEAR TRANSFORMATION

A transformation T(z) : z → w is called bilinear if it takes the form

$$w = T(z) = \frac{az + b}{cz + d}, \qquad ad - bc \neq 0$$

This type of transformation occurs numerous times in electrical engineering, for example, as dielectric hysteresis, mutual impedance coupling between circuits, transmission line calculations, propagation in a stratified medium, loudspeaker impedance, & many more.

A continuous-time (CT) signal must be appropriately band-limited in order to avoid frequency aliasing distortions. Additionally, if the number of time samples used in a particular computation is constrained, the Nyquist approximation may do a poor job of representing the original signal.

In the 1960’s a basis expansion was proposed implementing a nonlinear frequency warping between a CT signal and its discrete-time (DT) representation according to the bilinear transform. Since there is a one-to-one relationship between the two frequency domains, this bilinear expansion theoretically avoids both the band-limited requirement and the frequency aliasing distortions associated with Nyquist sampling.

Furthermore, the DT expansion coefficients can be obtained using a cascade of first-order analog systems. Modern-day integrated circuit technology has made it practical to compute these coefficients through conventional circuit design techniques. Consequently, the bilinear expansion can be considered as a better procedure in filter designs in various applications.

In the Bilinear Transformation technique (BLT), we compress the analog frequency scale [0 to ∞] to [0 to π] in the digital filter. That is, we compress an infinite frequency span to a finite span.

The philosophy of BLT is the following: if we are given an analog transfer function Ha(s), we can always simulate Ha(s) in a basic analog circuit.

In the simulation of Ha(s), we require summation, multiplication by a constant and a dynamic element, namely an integrator. What people used to do is to use op-amps for integration and simulate any given transfer function by op-amps only. Multiplication by a constant alpha is either a potentiometer, if alpha is less than 1, or an op-amp if alpha is greater than 1. We can take care of both the plus sign and minus signs by inverting op-amp and non inverting op-amp.



Integration is done by putting a capacitor in the feedback loop and the integration is usually associated with a negative sign before the integral. The basic fact is that if we have an adder, a multiplier and a block of transfer function 1/s which describes an integrator, we can simulate any given analog transfer function. If we simulate the given transfer function by adders, multipliers and integrators, then we can convert that diagram into a digital filter because in a digital filter addition and multiplication are the same and there is no change; the only change is that we shall require a digital integrator.

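The bilinear transform is defined by

$$H(z) = H_a(s)\Big|_{s = \frac{2}{T}\,\frac{z - 1}{z + 1}}$$

which is accomplished by replacing 's' by

$$s = \frac{2}{T}\,\frac{1 - z^{-1}}{1 + z^{-1}}$$

where $T$ is the sampling period. With the normalization $T = 2$ this reduces to $s = (z - 1)/(z + 1)$.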

s-plane to z-plane mapping

Here the entire jω axis maps into one complete revolution of the unit circle.

($z = e^{sT}$ maps the jω axis into an infinite number of revolutions of the unit circle)

FIG4:STABLE TRANSFORMATION



PROCEDURE FOR BILINEAR TRANSFORMATION

Points:

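1) The left half of the s-plane maps to the inside of the unit circle in the z-plane, i.e., $\mathrm{Re}(s) < 0 \Rightarrow |z| < 1$.

2) The right half of the s-plane maps to the outside of the unit circle in the z-plane, i.e., $\mathrm{Re}(s) > 0 \Rightarrow |z| > 1$.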

Hence, a causal and stable continuous time system will be mapped to a causal and stable discrete-time system.
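These mapping properties can be spot-checked numerically. The sketch below assumes the standard textbook form of the mapping, $z = (1 + sT/2)/(1 - sT/2)$, with a sampling period $T$; the function name `bilinear_map` and the test points are hypothetical.

```python
def bilinear_map(s, T=1.0):
    """Map an s-plane point to the z-plane with z = (1 + s*T/2) / (1 - s*T/2)."""
    return (1 + s * T / 2) / (1 - s * T / 2)

left = bilinear_map(complex(-3.0, 40.0))   # Re(s) < 0: expect |z| < 1
right = bilinear_map(complex(0.5, -7.0))   # Re(s) > 0: expect |z| > 1
axis = bilinear_map(complex(0.0, 1e6))     # on the j-omega axis: expect |z| = 1

print(abs(left), abs(right), abs(axis))
```

The last point also shows the compression of the frequency axis: a very large analog frequency lands close to z = -1, the end of a single half-revolution, rather than wrapping around the circle again.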

..… (1)



Unlike Impulse Invariant Transformation where the relationship was simply ω = ΩT as indicated, in BLT, there is a deviation from linearity because the relation between Ω and ω is nonlinear.

This is how an infinite axis is compressed to a finite axis, that is, 0 to infinity is compressed to 0 to π; this phenomenon is called warping. The frequency scale is therefore warped, which is a disadvantage. We apply pre-warping (anti-warping) so that the effect of warping is ultimately cancelled and we get what we want. So the ω's are transformed to Ω's by the relationship

$$\Omega = \frac{2}{T}\tan\left(\frac{\omega}{2}\right)$$

This is pre-warping, that is the digital filter frequencies are pre-warped to analog frequencies. Thus we get the specs on the corresponding analog filter.
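As a worked illustration of pre-warping, the sketch below assumes the standard relationship $\Omega = (2/T)\tan(\omega/2)$ and its inverse $\omega = 2\arctan(\Omega T/2)$. The function names `prewarp` and `warp` and the sample frequencies are hypothetical.

```python
import math

def prewarp(omega_digital, T=1.0):
    """Digital frequency (rad/sample) -> analog frequency (rad/s)."""
    return (2.0 / T) * math.tan(omega_digital / 2.0)

def warp(omega_analog, T=1.0):
    """Analog frequency (rad/s) -> digital frequency (rad/sample)."""
    return 2.0 * math.atan(omega_analog * T / 2.0)

# A digital band edge at 0.3*pi rad/sample, pre-warped to the analog domain
# and then mapped back through the bilinear transform.
edge = 0.3 * math.pi
analog_edge = prewarp(edge)
print(analog_edge, warp(analog_edge), edge)
```

Designing the analog prototype at `analog_edge` rather than at the unwarped value means the band edge of the final digital filter falls where it was specified.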

There is absolutely no aliasing in Bilinear Transformation because the total transfer function is being transformed. In Impulse Invariant Transformation, only poles were transformed.

In Impulse Invariant Transformation, it is simply ωs/ωp because the relationship is linear. IIT is an approximation of the BLT relationship because, for small θ, tan θ can be replaced by θ, which gives IIT. For small ω or Ω, IIT and BLT are indistinguishable.

Alternatively, if we have an inverse bilinear transform, we can follow these steps:

1. Use the inverse bilinear transform on the filter specifications in the digital domain to produce equivalent specifications in the analog domain.
2. Construct the analog filter transfer functions to meet those specifications.
3. Use the bilinear transform to convert the resultant analog filter into a digital filter.

With the sampling period normalized to $T = 2$, the inverse bilinear transform can be expressed as $z = (1 + s)/(1 - s)$.



Two well-known methods, the impulse invariance method and the matched Z-transform method, are conceptually similar to the familiar sampling of a continuous waveform. Denoting the inverse Laplace transform by $\mathcal{L}^{-1}$ and the Z-transform by $\mathcal{Z}$, both methods involve calculating the impulse response of the analog filter as $a(t) = \mathcal{L}^{-1}\{A(s)\}$ and sampling $a(t)$ at a sampling interval $T$ that is short enough to avoid aliasing. The transfer function of the digital filter is then obtained from the sampled sequence $a[n]$ as

$$D_a(z) = \mathcal{Z}\{a[n]\}$$

However, there are key differences between the two.

Impulse invariance method:

In this method, you expand the analog transfer function in partial fractions as

$$A(s) = \sum_{m} \frac{C_m}{s - \alpha_m}$$

where $C_m$ is some constant and $\alpha_m$ are the poles.

Mathematically, any transfer function with a numerator of lesser degree than the denominator can be expressed as a sum of partial fractions. Low-pass prototypes satisfy this criterion. High-pass and band-stop filters have numerator and denominator of the same degree, and their responses are not band-limited, so sampling them would alias; the impulse invariant method is therefore used essentially for low-pass (band-limited) designs.
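For a single partial-fraction term the method can be traced by hand. Assuming one term $C/(s - \alpha)$ with impulse response $C e^{\alpha t}$, sampling at $t = nT$ gives the digital section $C/(1 - e^{\alpha T} z^{-1})$. The Python sketch below checks this; the function names and the values $C = 1$, $\alpha = -2$, $T = 0.1$ are illustrative assumptions.

```python
import math

def impulse_invariant_pole(alpha, T):
    """z-plane pole that corresponds to the s-plane pole alpha."""
    return math.exp(alpha * T)

def digital_impulse_response(C, alpha, T, n_samples):
    """Impulse response of C / (1 - exp(alpha*T) * z^-1) via its recursion."""
    p = impulse_invariant_pole(alpha, T)
    h, y_prev = [], 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y = C * x + p * y_prev   # y[n] = C*x[n] + p*y[n-1]
        h.append(y)
        y_prev = y
    return h

h = digital_impulse_response(1.0, -2.0, 0.1, 20)
analog_samples = [math.exp(-2.0 * n * 0.1) for n in range(20)]
print(max(abs(a - b) for a, b in zip(h, analog_samples)))
```

A left-half-plane pole ($\alpha < 0$) gives $e^{\alpha T} < 1$, so the mapped pole stays inside the unit circle.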

Matched Z-transform

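In this method, instead of expanding the transfer function in partial fractions, you apply a simple transform to both the zeros and the poles in the same (matched) manner, $\beta_m \to e^{\beta_m T}$ and $\alpha_m \to e^{\alpha_m T}$ (also stability preserving), giving

$$D_a(z) = k\,\frac{\prod_m \left(1 - e^{\beta_m T} z^{-1}\right)}{\prod_m \left(1 - e^{\alpha_m T} z^{-1}\right)}$$

where $\beta_m$ are the zeros, $\alpha_m$ are the poles, and $k$ is a gain constant.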

The limitations of both methods are easy to see. The impulse invariant method is practical mainly when the filter is low pass (band-limited), while the matched Z-transform method is applicable to bandstop and bandpass filters (and to high-pass filters up to the Nyquist frequency).


http://en.wikipedia.org/wiki/Partial_fraction

http://en.wikipedia.org/wiki/Matched_Z-transform_method


http://en.wikipedia.org/wiki/Impulse_invariance


Digital filters designed via bilinear transformation are guaranteed to be stable. However, the exact coefficient values are available only immediately after the bilinear transformation is applied. On filter realization, it is impossible to represent the coefficients without error. In a software digital filter realization (implementation), the resulting coefficients are quantized, which also introduces an error. Any error made during coefficient quantization affects the frequency response to some degree and may reduce the stopband attenuation.
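The effect of coefficient quantization can be illustrated on one second-order section. The sketch below is a hedged example, not a result from the report: the section (pole radius 0.95, pole angle 0.1 rad), the rounding scheme, and the function names are all assumptions.

```python
import cmath
import math

def quantize(value, frac_bits):
    """Round a coefficient to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step

def max_pole_radius(a1, a2):
    """Largest pole magnitude of 1 / (1 + a1*z^-1 + a2*z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return max(abs((-a1 + disc) / 2.0), abs((-a1 - disc) / 2.0))

# Hypothetical narrow-band section: pole radius 0.95, pole angle 0.1 rad.
a1 = -2.0 * 0.95 * math.cos(0.1)
a2 = 0.95 ** 2

for frac_bits in (12, 4):
    q1, q2 = quantize(a1, frac_bits), quantize(a2, frac_bits)
    print(frac_bits, q1, q2, max_pole_radius(q1, q2))
```

With 12 fractional bits the poles barely move, while with only 4 fractional bits the rounded coefficients push a pole out to approximately the unit circle, so the section is no longer safely stable.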



IIR FILTER REALIZATION

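The IIR filter transfer function can be expressed as:

$$H(z) = \frac{\sum_{k=0}^{M-1} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}}$$

where $N$ is the filter order, $b_k$ are the coefficients of the non-recursive part of the IIR filter, and $a_k$ are the coefficients of the feedback (recursive) part.

The IIR filter difference equation can be expressed as:

$$y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_{M-1} x[n-(M-1)] - a_1 y[n-1] - a_2 y[n-2] - \cdots - a_N y[n-N]$$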
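The difference equation translates almost line for line into software. The following is a minimal sketch, assuming zero initial conditions and the convention that the feedback list starts with a leading 1 (so `a[0]` is unused); the function name `iir_filter` is hypothetical.

```python
def iir_filter(b, a, x):
    """y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k], assuming a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Non-recursive part: weighted sum of current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Recursive part: weighted sum of past outputs, subtracted.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# First-order example: y[n] = x[n] + 0.5*y[n-1], driven by a unit impulse.
impulse = [1.0] + [0.0] * 5
print(iir_filter([1.0], [1.0, -0.5], impulse))
```

The impulse response of this example halves at every sample and never reaches exactly zero, which is the "infinite impulse response" that gives the filter type its name.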

The block diagram of IIR filter is as follows :

FIG5: BLOCK DIAGRAM OF IIR FILTER



DIRECT REALIZATION

Direct realization of IIR filters starts with this expression:

$$y[n] = \sum_{k=0}^{M-1} b_k x[n-k] - \sum_{k=1}^{N} a_k y[n-k]$$

The first part of the expression refers to non-recursive part and the other refers to recursive part of IIR filter. In IIR filter direct realization, these two parts are separately considered and realized.

The realization of non-recursive part of IIR filter is identical to the direct realization of FIR filter. Figure illustrates the block diagram of direct realization of non-recursive part of IIR filter.

Realization of the recursive part of the IIR filter is similar to that of the non-recursive part. Figure illustrates the direct realization of the recursive part of the filter.



As the non-recursive and recursive parts of the IIR filter are realized separately, it does not matter which of them is applied first in the filtering process.

Direct realization is very convenient for software implementation, and this is where it is most commonly used.

Its disadvantages are the greatest sensitivity to the accuracy of the realized coefficients (i.e., the largest finite word-length effect) and the greatest implementation complexity (i.e., it needs the most resources).

CASCADE REALIZATION

The cascade realization structure is the most difficult to obtain from the transfer function (compared to the other realization structures considered here). Its advantages are its modular structure and its lower sensitivity to the accuracy of the realized non-recursive and recursive coefficients. In cascade IIR filter realization, a filter is divided into several mutually independent sections of first or second order.

Since the sections are mutually independent after the design process, the finite word-length effect on coefficient accuracy, the change in frequency response, and the IIR filter stability are examined separately for each section. This simplifies the analysis.



The IIR filter transfer function is expressed as:

bi are the coefficients of the transfer function numerator;
aj are the coefficients of the transfer function denominator;
H0 is a constant;
qi are the zeros of the transfer function;
pj are the poles of the transfer function;
B(z) is the transfer function of the non-recursive part;
A(z) is the transfer function of the recursive part;
M is the number of sections in the cascade realization structure.

Cascade realization requires the given expression to be factorized so that the transfer function is expressed as follows:

a[i, k] are the coefficients of the recursive part of the i-th IIR filter section;
b[i, k] are the coefficients of the non-recursive part of the i-th IIR filter section.

Figure illustrates a second-order section.

FIG6: SECOND ORDER FILTER



The use of the transposed direct realization structure reduces the necessary number of delay lines and adders as well. Dividing the filter into independent sections reduces the sensitivity to coefficient quantization and simplifies the stability analysis of the resulting filter. Besides, the possibility that the IIR filter becomes unstable after quantization is drastically reduced, because the coefficients are quantized after the filter is divided into sections, so the changes in pole locations are smaller.

Software realization requires M buffers of length 2 or 1. Each section must have its own buffer for saving samples of intermediate signals. This complexity and the required factorization are the two main disadvantages of this realization structure.
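A software cascade with one two-element buffer per section might look like the sketch below. It assumes each section is a second-order block in transposed direct form II, described by a hypothetical tuple `(b0, b1, b2, a1, a2)` with a leading denominator coefficient of 1; neither the tuple layout nor the function names come from the report.

```python
def biquad_step(coeffs, state, x):
    """Process one sample through a second-order section (transposed form).

    `state` is this section's own buffer of length 2 and is updated in place.
    """
    b0, b1, b2, a1, a2 = coeffs
    y = b0 * x + state[0]
    state[0] = b1 * x - a1 * y + state[1]
    state[1] = b2 * x - a2 * y
    return y

def cascade_filter(sections, x):
    """Run the input through all sections in series, sample by sample."""
    states = [[0.0, 0.0] for _ in sections]  # one buffer per section
    out = []
    for sample in x:
        for coeffs, state in zip(sections, states):
            sample = biquad_step(coeffs, state, sample)
        out.append(sample)
    return out

# Two identical first-order sections, y[n] = x[n] + 0.5*y[n-1], in cascade.
section = (1.0, 0.0, 0.0, -0.5, 0.0)
print(cascade_filter([section, section], [1.0, 0.0, 0.0, 0.0, 0.0]))
```

A first-order section is simply a second-order one whose `b2` and `a2` are zero, which is why a buffer of length 1 suffices for it.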

Figure illustrates the block diagram describing cascade IIR filter structure.

FIG7: CASCADE REALIZATION OF IIR FILTER



VHDL - THE LANGUAGE

The VHSIC Hardware Description Language (VHDL) is an industry-standard language used to describe hardware from the abstract to the concrete level. The language defines not only the syntax but also very clear simulation semantics for each language construct. Although it provides an extensive range of modelling capabilities, it is possible to quickly assimilate a core subset of the language that is easy to understand without learning the more complex features. It is very useful in teaching top-down design. We can design a system at a high level and express the algorithm in VHDL. We can then simulate and debug the design at this level before proceeding with detailed logic design. A dataflow level of description offers a combination of the behavioural and structural levels of description.

LEVELS OF ABSTRACTION

1) Data flow level: In this style of modelling, the flow of data through the entity is expressed using concurrent signal assignment statements.

2) Structural level: In this style of modelling, the entity is described as a set of interconnected components.

3) Behavioral level: This style of modelling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified order.

Arithmetic in a VHDL design can be organized in two ways:

1) Bit-Parallel Arithmetic
2) Bit-Serial Arithmetic

1) Inputs to a bit-parallel arithmetic operation are stored in registers. In bit-parallel arithmetic, all bits are conceptually processed at once, i.e. all bits in the inputs are applied in parallel and all of the bits in the output occur simultaneously and the obtained output is stored in the registers.

An advantage of bit-parallel arithmetic is that the amount of work performed by a processing element during one clock cycle is relatively large, so the clock frequency can be kept low while still achieving a high computational speed.



A disadvantage of bit-parallel arithmetic is that it has higher power consumption and chip area than bit-serial arithmetic.

2) In bit-serial arithmetic, one bit of the input data is processed in each clock cycle, generally starting with the LSB.

An advantage of bit-serial arithmetic is its power consumption. Bit-serial digital filters consume less power because they use a serial/parallel multiplier. They also occupy a smaller area than bit-parallel filters.

A disadvantage of bit-serial arithmetic is its design complexity. The design time for a bit-serial system increases because of the higher complexity of timing the bit-serial streams.

The potential performance of bit-serial processing elements may be somewhat degraded by practical problems with high-frequency clocking.

BIT SERIAL ARITHMETIC & BIT PARALLEL ARITHMETIC

Numbers may be described as floating-point or fixed-point numbers. Floating-point numbers use a signed mantissa M and a signed exponent E to represent a number $F = M \times \beta^{E}$, where β is the base of the number. Fixed-point numbers, on the other hand, have a fixed exponent, so the binary point in the mantissa is always located at the same position, independent of the represented number.

The variable exponent in the floating-point number representation enables a large number range, but quantization introduces value-dependent errors which may be troublesome in some algorithms. Most DSP algorithms do not require the increased number range of floating-point numbers if appropriate measures are taken to scale the signal levels in the algorithm.

Parasitic oscillations in a system using floating-point arithmetic are in general harder to suppress than in a system using fixed-point arithmetic. Implementation of fixed-point arithmetic is also less complex than that of floating-point arithmetic, making fixed-point arithmetic the preferred number representation in many cases. Floating-point arithmetic is correspondingly slower, consumes more power, and requires more chip area.

Signed fixed-point numbers can be described using various representations. One representation is sign-magnitude representation, where a sign bit denotes the sign of the number and the rest of the bits denote the magnitude. In this case there are two representations of the number zero, which increases the complexity of implementing additions and subtractions. Other representations include one's-complement, two's-complement, bias, and signed-digit code.

Most fixed-point systems use two's-complement to represent signed fixed-point numbers. Signed addition and subtraction are then treated as unsigned addition and subtraction. The most significant bit has a negative weight, while the other bits have positive weights. Two's-complement representation will be assumed throughout the rest of the text.

A number X represented in two's complement is shown in Eq. (4.1). The number range is here limited to $-1 \le X < 1$. Larger number ranges are achieved by scaling the representation by a factor $2^k$, where k is the required number of integer bits.

$$X = -x_0 + \sum_{i=1}^{W-1} x_i 2^{-i} \qquad \text{(Eq. 4.1)}$$

Here $x_0$ is the sign bit, $x_i \in \{0, 1\}$, and $W$ denotes the word length.
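As a small worked example of this representation, the sketch below evaluates a fractional two's-complement word under the usual assumption that the sign bit has weight $-1$ and bit $i$ has weight $2^{-i}$. The function name and the 4-bit examples are hypothetical.

```python
def twos_complement_value(bits):
    """Value of a fractional two's-complement word.

    bits[0] is the sign bit (weight -1); bits[i] has weight 2**-i for i >= 1.
    """
    return -bits[0] + sum(b * 2.0 ** -i for i, b in enumerate(bits) if i > 0)

for word in ([0, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0]):
    print(word, twos_complement_value(word))
```

The four example words cover the largest positive value, zero, the negative value closest to zero, and the most negative value $-1$, which shows why the range is $-1 \le X < 1$.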

One important property of two's-complement representation is that a sum of numbers can be computed in an arbitrary order. An overflow in an intermediate result can be neglected if the correct sum is within the available number range. This means that the order of the additions is unimportant with regard to overflow, and it is therefore possible to rearrange the order without affecting the final result.
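This property is easy to demonstrate with integers. The sketch below models a 4-bit two's-complement accumulator (range -8 to 7) that wraps around on overflow; the helper names `wrap` and `wrapped_sum` and the operand values are illustrative assumptions.

```python
def wrap(value, width):
    """Wrap an integer into the two's-complement range of `width` bits."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value

def wrapped_sum(values, width):
    """Accumulate with wrap-around after every addition."""
    acc = 0
    for v in values:
        acc = wrap(acc + v, width)
    return acc

# 5 + 6 overflows the 4-bit range (it wraps to -5), but the final sum
# 5 + 6 - 7 = 4 is representable, so the wrapped result is still correct.
print(wrapped_sum([5, 6, -7], 4))
print(wrapped_sum([-7, 5, 6], 4))
```

Both orderings give the same result even though only the first one overflows on the way, which is what allows the additions in a filter to be reordered freely.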

Besides the ordinary binary number representations there are also redundant representations [8], with multiple representations of a single number. Operations involving comparison of numbers in this type of representation are, however, often difficult to implement. Some of these redundant representations are easy to convert into ordinary binary numbers, e.g., signed-digit code. Others, like residue number systems, are difficult to encode and decode to and from ordinary non-redundant binary numbers, but they are efficient for certain operations as long as conversion between the number systems is not required.

The most common operations in DSP algorithms are additions, subtractions, and multiplications.

Multiplications with fixed coefficients are common, which enables the designer to simplify the

hardware. Such simplifications save resources with a possible speed-up.

1) BIT-PARALLEL ARITHMETIC:-

Typically, inputs and outputs to a bit-parallel arithmetic operation are stored in registers. In bit-

parallel arithmetic, all bits are conceptually processed at once, i.e., all bits in the inputs are

applied in parallel and all of the bits in the output occur simultaneously. However, in practice it

is necessary to process them sequentially. An advantage of bit-parallel arithmetic compared to

bit-serial arithmetic is that the amount of work performed by a processing element during one

clock cycle is relatively large, and the clock frequency can therefore be kept low.

ADDITION AND SUBTRACTION:-

A sum Z of two numbers X and Y in two’s-complement representation is computed by adding the bits pairwise, as shown in Eq. (4.2). Carry values propagate from the least significant bit (LSB) up to the most significant bit (MSB).

$$Z = X + Y = -(x_0 + y_0) + \sum_{i=1}^{W_d-1} (x_i + y_i)\, 2^{-i} \qquad \text{(Eq. 4.2)}$$

This can be implemented in parallel using a set of full-adders, which adds the bits on the same

significance level including a carry bit from the lower significance level. A straightforward

implementation is shown in Fig.4.1. The carry input at the LSB is set to zero, and the carry

output from each significance level is connected to the next significance level.

The result bit s_i depends on every input bit of equal or lower significance level. There will

therefore be a combinatorial path from LSB through all full-adders to the MSB resulting in a

long propagation delay.

FIG 8: Bit-parallel ripple-carry adder.

The computation of the result will be sequential in the worst case, starting at LSB and generating

carry values up to MSB.

Many techniques have been proposed to avoid this problem of long carry propagation paths, e.g.,

carry look-ahead, carry-save, and carry-select. One common property of these solutions is the

increase of resources that are required to speed up the computation compared to the ripple-carry

implementation.

Unwanted switching in the logic circuits is generated by implementations using the simple full-adder based structure in Fig.4.1, as intermediate incorrect results are computed before the correct carry has arrived at a bit-level stage. The number of full-adders, and therefore also the carry propagation path limiting the addition time, is proportional to the data word length Wd.

Subtraction is carried out using the same structure as for addition. By using the property that the

sign of a two’s complement number is changed by inverting all bits and adding one to the LSB

position, the addition is converted into subtraction by inverting the value to be subtracted, and

setting the input carry bit at the LSB.
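The ripple-carry structure and the invert-and-set-carry trick for subtraction can be modelled bit by bit. This is a hedged Python sketch of the behaviour described above, not the VHDL used in this work; the function names and the 8-bit width in the usage lines are illustrative assumptions.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return sum_bit, carry_out

def to_bits(value, width):
    """Two's-complement bits of a signed integer, LSB first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Interpret LSB-first bits as a signed two's-complement integer."""
    value = sum(bit << i for i, bit in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value

def ripple_carry(x, y, width, subtract=False):
    """Add (or subtract) two signed integers with a chain of full adders."""
    carry = 1 if subtract else 0           # carry input at the LSB
    result = []
    for x_bit, y_bit in zip(to_bits(x, width), to_bits(y, width)):
        if subtract:
            y_bit ^= 1                     # invert the value to be subtracted
        sum_bit, carry = full_adder(x_bit, y_bit, carry)
        result.append(sum_bit)             # carry ripples to the next stage
    return from_bits(result)               # carry out of the MSB is dropped

print(ripple_carry(3, -5, 8))                 # -2
print(ripple_carry(3, -5, 8, subtract=True))  # 8
```

The loop is sequential on purpose: each stage needs the carry of the stage below it, which is exactly the long combinatorial path discussed above.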

MULTIPLICATION:-

Binary multiplication may be carried out using a scheme similar to common hand calculation.

An array of partial-product terms is generated and then added as shown in Fig. 4.2. Each dot in the summation array corresponds to the product of two digits, which in the binary case is equivalent to a logic AND function of two bits.

FIG 9: Matrix of partial products generated in multiplication.

Summation of the partial-products can be performed in various ways [8]. The straightforward method of using a full-adder for the addition of each dot will result in the array multiplier shown in Fig. 4.3, with a multiplication time proportional to the sum of the data word length and coefficient word length (propagating down and then from right to left). The required area will be proportional to the product of the data word length and the coefficient word length (Wd×Wc).

FIG 10: Array multiplier for two’s-complement numbers.

Other methods of adding the partial product terms include Wallace trees and similar structures,

where the carry propagation is reduced by changing the addition order of the input data [10].

Such addition schemes use a treelike adder structure to speed up the additions, thereby reducing

the propagation delay. Carry is only propagated from one level to another, resulting in short

combinatorial paths. Only the final step, where the two last intermediate results are to be added,

requires a carry-propagate adder.
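The partial-product array can be illustrated with a short Python sketch. It assumes unsigned operands for simplicity (the array multiplier in the figure handles two’s-complement numbers, which needs extra sign handling not shown here); the function names are hypothetical.

```python
def partial_products(x, a, wd, wc):
    """Rows of the partial-product array for unsigned x (wd bits) and a (wc bits).

    Row j holds bit j of `a` ANDed with every bit of `x`, LSB first.
    """
    return [[((x >> i) & 1) & ((a >> j) & 1) for i in range(wd)]
            for j in range(wc)]

def sum_rows(rows):
    """Add all dots of the array, weighting row j, column i by 2**(i + j)."""
    return sum(bit << (i + j)
               for j, row in enumerate(rows)
               for i, bit in enumerate(row))

rows = partial_products(12, 13, 4, 4)
for row in rows:
    print(row)
print(sum_rows(rows))   # 156
```

The array has Wd×Wc dots, which is why the area of the array multiplier grows with the product of the two word lengths.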

2) BIT-SERIAL ARITHMETIC:-

In bit-serial arithmetic one bit of the input data is processed in each clock cycle, generally

starting with the LSB. The complexity of an operation is low as there are few input bits to

operate on in each clock cycle. Combinatorial paths through the logic are short, allowing for high

bit-rates, which will make the total computation time comparable to bit-parallel ripple carry

implementations.

Using bit-serial arithmetic results in small processing elements and short interconnection paths between the processing elements. The total chip area therefore becomes smaller, which makes the interconnection between the processing elements shorter. This allows for a higher clock frequency and also reduces the power consumption, as the capacitive loads on the gates are reduced.

ADDITION AND SUBTRACTION:-

A bit-serial adder adds two bits during one clock cycle, generating a sum bit. A carry bit is also generated, which is added in the next clock cycle, as shown to the left in Fig.4.4. The carry is saved in a flip-flop, which is reset at the start of the addition. This reset of the D flip-flop corresponds to the zero at the LSB carry input in the bit-parallel case.

FIG 11: Bit-serial adder and subtractor.

The area of the adder is independent of the data word length, but the number of clock cycles is

proportional to the word length. Power consumption is lower in the bit-serial case compared to a

long bit-parallel ripple carry implementation because the combinatorial depth of the circuit is

smaller, and the output is correctly computed directly without excessive switching.

Subtraction may be implemented as in the bit-parallel case, i.e., by changing the sign of the

subtrahend. This is accomplished by inverting all bits and adding a one to the LSB position. In

the bit-serial case an adder with one inverted input is sufficient to implement the subtraction as

shown to the right in Fig. 4.4. The carry flip-flop is set at the beginning of the subtraction.
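A bit-serial adder/subtractor reduces to one full adder plus a carry flip-flop. The Python sketch below models that behaviour one clock cycle at a time; the class and function names are assumptions for illustration and do not correspond to the VHDL entities in this work.

```python
class BitSerialAdder:
    """One full adder plus a carry flip-flop, clocked once per bit (LSB first)."""

    def __init__(self, subtract=False):
        self.subtract = subtract
        self.reset()

    def reset(self):
        # Carry flip-flop: cleared before an addition, set before a subtraction.
        self.carry = 1 if self.subtract else 0

    def clock(self, x_bit, y_bit):
        if self.subtract:
            y_bit ^= 1                      # inverted input for subtraction
        sum_bit = x_bit ^ y_bit ^ self.carry
        self.carry = (x_bit & y_bit) | (x_bit & self.carry) | (y_bit & self.carry)
        return sum_bit

def serial_op(x, y, width, subtract=False):
    """Feed two signed integers bit-serially and collect the signed result."""
    adder = BitSerialAdder(subtract)
    result = 0
    for i in range(width):                  # one clock cycle per bit
        result |= adder.clock((x >> i) & 1, (y >> i) & 1) << i
    return result - (1 << width) if result & (1 << (width - 1)) else result

print(serial_op(5, 3, 8))                  # 8
print(serial_op(3, 5, 8, subtract=True))   # -2
```

The hardware cost is the same for any `width`; only the number of calls to `clock` grows with the word length.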

MULTIPLICATION:-

Multiplication of two numbers can be accomplished using two bit-serial inputs, generating a bit-

serial output [1]. Many DSP algorithms like digital filters and FFTs only use multiplications of

data and a fixed coefficient. We will only discuss this type of multiplication here.

SHIFT-AND-ADD MULTIPLIERS:-

A common case is multiplication with a fixed coefficient, which may be realized as multiplication of a bit-serial input of Wd bits with a bit-parallel coefficient of Wc bits, generating a bit-serial output of Wd+Wc-1 bits. Both the input and output bit-serial data streams are in LSB-first order. This shift-and-add multiplier structure computes the product by adding rows in the matrix representation, generating a new row of bits after each addition. The input stage to a shift-and-add multiplier consists of a row of AND gates, which performs a bit-wise multiplication of the serial input bit with the parallel coefficient. This stage is in Fig.4.5 implemented as a multiplexer that selects either the coefficient a or zero. The result of this bit-wise multiplication is then added to the partial product. The accumulated sum is then shifted right one position. The rightmost bit yields one bit in the product. Once the last addition is completed, the multiplier is clocked for an additional Wc clock cycles with a zero input in order to shift out the Wc most significant bits.

FIG 12: Shift-and-add multiplier.

Use of a coefficient in two’s-complement form requires the shifting to be done using arithmetic

shifts, copying the sign-bit, as the intermediate result may be negative.

Serial input data in two’s-complement form requires special treatment compared to positive binary data. The last bit, the sign-bit, has a bit weight of -1. The sign-bit should therefore be multiplied with the coefficient and the resulting partial product should then be subtracted from the accumulated sum. One approach is to include logic that converts the addition of (x_0×a) to a subtraction. Finally, the last Wc bits are generated while keeping the serial input at zero. Another approach to handle the sign-bit is to sign-extend the serial input [4]. The subtraction in the multiplication of two’s-complement numbers may be eliminated by sign-extending the serial input as shown in Eq. (4.3). The left part of the last expression is the subtraction operation of the coefficient. It only contributes to the product in bit-positions with bit-weights above 2^0. The right part of the last expression only contributes to the product at bit-positions with bit-weights up to 2^0. The final subtraction is therefore not required and the multiplier can therefore be implemented using only additions. The sign-extension logic may consist of a latch.

……………………….Eq.4.3

The multiplication time in a serial/parallel multiplier is Wd+Wc-1 clock cycles, where Wd is the

bit-serial data word length and Wc is the coefficient word length. The maximum clock frequency

will be limited by the addition time in one bit-adder. Only the least significant bit is used as

output at each clock cycle, allowing the rest of the intermediate result to be in an arbitrary

number format. A redundant representation of the intermediate result is therefore acceptable, as

long as the LSB is calculated. Use of carry-save adders will therefore allow for a high clock

frequency since they have a short combinatorial path. Shifting is automatically performed each

clock cycle due to the wiring.
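The shift-and-add scheme, including the subtraction for the sign-bit, can be modelled in a few lines of Python. This is a behavioural sketch only: it uses integer operands instead of the fractional format of Eq. (4.1), it converts the last addition into a subtraction (the first of the two approaches described above), it emits Wd+Wc output bits, and the function name is an assumption.

```python
def serial_parallel_multiply(x, a, wd, wc):
    """Multiply signed x (wd bits, fed LSB first) by signed coefficient a (wc bits).

    Returns (output_bits, value): the serial output bits, LSB first, and the
    signed integer they represent.
    """
    acc = 0
    out_bits = []
    for cycle in range(wd + wc):
        if cycle < wd:
            x_bit = (x >> cycle) & 1
            partial = a if x_bit else 0   # row of AND gates / multiplexer
            if cycle == wd - 1:
                partial = -partial        # sign-bit has weight -1: subtract
        else:
            partial = 0                   # zero input while flushing
        acc += partial
        out_bits.append(acc & 1)          # rightmost bit is one product bit
        acc >>= 1                         # arithmetic shift right (keeps the sign)
    value = sum(bit << i for i, bit in enumerate(out_bits))
    if out_bits[-1]:
        value -= 1 << len(out_bits)
    return out_bits, value

print(serial_parallel_multiply(-3, 5, 4, 4))   # ([1, 0, 0, 0, 1, 1, 1, 1], -15)
```

Python's `>>` on a negative integer is an arithmetic shift, which plays the role of the sign-copying shift required for two’s-complement coefficients.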

SERIAL/PARALLEL MULTIPLIERS:-

An alternative realization of the shift-and-add algorithm is shown in Fig.4.6. This realization is

referred to as a serial/parallel (S/P) multiplier [4]. It consists of two parts. The first part generates

the partial bit-products and the second part is a so-called shift-accumulator. A serial/parallel

multiplier requires little chip area and can be clocked with high clock frequency [11].

Serial/parallel multipliers are natural building blocks for more complex operations. For example,

a processing element corresponding to a two-port adaptor, can be built using a single multiplier,

three bit-serial adders, and a number of D flip-flops. Several implementations of digital filters

with multiplexed processing elements of this type have successfully been implemented using

both standard-cell and full-custom layout styles [9, 12, 6].

FIG 13: Serial/parallel multiplier using a shift-accumulator.

S/P MULTIPLICATION WITH FIXED COEFFICIENTS:-

The serial/parallel multiplier structure may be significantly simplified if the coefficient is fixed

[5]. The number of full-adders in a simplified implementation is equal to the number of non-zero

bits in the coefficient minus one if the coefficient is positive, and the number of non-zero bits in

the case of a negative coefficient. Procedures for simplifying serial/parallel multipliers with fixed coefficients, either in two’s-complement or signed-digit code, are presented in [2]. An example of a simplified serial/parallel multiplier is shown in Fig.4.7. Here, the logic drawn with dotted lines can be removed.

FIG 14: Simplified serial/parallel multiplier with coefficient 0.011₂.

Multiplication generates a product that has a larger data word length than the word length of the

serial input. The number of fractional bits in the coefficient determines the number of extra bits

(of lower significance level compared to the input data). These additional bits must be truncated or rounded in a recursive path.
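The adder-count rule and the growth in fractional bits can be made concrete for the coefficient 0.011₂. The Python sketch below simply transcribes the rule stated above; the function names and the representation of the coefficient as a string of fractional bits are assumptions for illustration.

```python
from fractions import Fraction

def adders_needed(fraction_bits, negative=False):
    """Full-adders in the simplified multiplier, per the rule stated in the text."""
    ones = fraction_bits.count("1")
    return ones if negative else ones - 1

def multiply_fixed(x, fraction_bits):
    """Multiply x by a positive coefficient 0.b1b2... using one shifted copy per 1-bit."""
    return sum(Fraction(x, 2 ** (i + 1))
               for i, bit in enumerate(fraction_bits) if bit == "1")

coefficient = "011"                       # 0.011 in binary = 0.375
print(adders_needed(coefficient))         # 1
print(multiply_fixed(8, coefficient))     # 3
print(len(coefficient))                   # 3 extra fractional bits in the product
```

Only the non-zero coefficient bits contribute a shifted copy of the input, so zero bits cost no adder at all.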

LATENCY:-

The computational speed is characterized by two parameters, latency and throughput. The

latency for an operation is defined as the time required for an input of a given significance level

to affect the output at the same significance level [7, 3, 4]. It describes how long it takes for an input value to be transformed into an output value. It is often convenient to measure the latency in clock cycles rather than in real time units.

Latency depends on the function of the processing element (PE). One example is the simplified serial/parallel multiplier in Fig.4.8, which may be used in multiplication with 0.11₂ or 0.011₂, without changing the structure. The latency is, however, different in the two cases since the multiplication with 0.011₂ will generate one more fractional bit compared to the multiplication with 0.11₂. The 0.011₂ case will therefore require one more clock cycle before a result bit of the same significance level is available at the output.

FIG 15: Simplified multiplier structures for fixed coefficients 0.11₂ and 0.011₂.

THROUGHPUT:-

Throughput is defined as the reciprocal of the time between successive outputs as illustrated in

Fig.4.9 [7, 3, 4]. The throughput is measured in operations per time unit.

FIG 16: Latency and throughput of a processing element.

Throughput is not directly connected to the latency and it is possible to modify the throughput

without affecting the latency of a system. This is illustrated in Fig. 4.10, which describes how the

throughput of a system consisting of a single multiplier may be doubled by interleaving of two

multipliers. However, the latency has not changed.
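The distinction between latency and throughput can be illustrated with a toy schedule. The Python sketch below assumes a non-pipelined multiplier that is busy for `latency` clock cycles per operation and that inputs are handed to the interleaved units in round-robin order; the function name and the cycle counts are hypothetical.

```python
def schedule(n_inputs, latency, n_units=1):
    """Return (start, finish) clock cycles for each input.

    Assumes `latency` is divisible by `n_units`, so a new input can be
    accepted every latency / n_units cycles.
    """
    period = latency // n_units
    return [(i * period, i * period + latency) for i in range(n_inputs)]

for units in (1, 2):
    times = schedule(4, latency=8, n_units=units)
    finishes = [finish for _, finish in times]
    spacing = finishes[1] - finishes[0]          # cycles between successive outputs
    per_op = times[0][1] - times[0][0]           # cycles from input to output
    print(units, finishes, "latency", per_op, "throughput 1/%d" % spacing)
```

With two interleaved units the outputs arrive every 4 cycles instead of every 8, while each individual operation still takes 8 cycles.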

FIG 17: Increased throughput without affecting latency.

Upper and lower limits on throughput and latency will depend on the technology used for the

implementation.

IMPLEMENTATION & ANALYSIS OF SUB-BLOCKS

A filter consists of various sub-blocks such as adders, multipliers and delays. To design a filter, these sub-blocks must be designed first; the filter is then built by combining them as required. This chapter describes the design, implementation and analysis of the sub-blocks required for filter design.

IMPLEMENTATION OF ADDER SUB-BLOCKS USING VHDL:-

The figure below illustrates a block diagram of a 15-bit fixed-point adder sub-block.

FIG 18: Two input Adder block.

To design a 15-bit full adder, a single-bit three-input adder (full adder) is created first. The 15-bit adder is then built by port-mapping instances of this three-input adder block. The generic VHDL code of the 15-bit adder is used as a library component. The carry output of each stage is fed to the carry input of the next stage, as shown in fig.5.1.2.

FIG 19: RTL schematic of adder block.

The figure below shows the output of the 15-bit adder block, where ‘a’ and ‘b’ are 15-bit input vectors. The output vector ‘yout’ holds the sum of the inputs ‘a’ and ‘b’. 32-bit and 64-bit adder blocks are created in the same way. These adder blocks are used in the bit-parallel implementation of digital filters.

FIG 20: output result of two input adder block.

The figure below shows the block diagram of the bit-serial adder. To implement this adder, memory elements are needed to store the sum and carry; D flip-flops are used for this purpose. In this adder circuit the carry output of the current clock cycle is fed back as the carry input for the next clock cycle. The reset bit is used to reset the output. The output is not available at the output port until the set bit is asserted, as shown in fig.5.1.5.

FIG 21: BIT serial adder block.

FIG 22: Test bench waveform of bit serial adder.

IMPLEMENTATION OF DELAY SUB-BLOCKS USING VHDL:-

The figures below show the bit-serial and bit-parallel implementations of the delay sub-block. These delay sub-blocks are used as memory elements that store the data for one clock cycle. The reset bit is used to reset the output.

FIG 23: Bit serial delay block.

FIG 24: Bit parallel delay block.

Inputs are applied at the rising edge of the clock, and the same value appears at the output after a delay of one clock pulse. This memory block behaves like a D flip-flop. The output is shown in fig.5.2.3. In this figure, ‘d’ is the input vector and ‘q’ is the output vector. The input vector ‘d’ appears in the time slot from 90 ns to 180 ns. The output vector ‘q’ appears in the time slot immediately after the first rising edge of the clock (that is, 180 ns to 360 ns). The memory block holds the output for at least one clock cycle.

FIG 25: Test bench wave form of delay block.

IMPLEMENTATION OF MULTIPLIER SUB-BLOCKS USING VHDL:-

There are different ways of designing a multiplier. Two such design methods are discussed here.

SERIAL PARALLEL MULTIPLIER SUB-BLOCK USING VHDL:-

The figure below shows the RTL schematic of a serial-parallel multiplier. One of the input vectors, ‘a’, is applied serially to the circuit (one bit at a time, starting from the LSB), while the other, ‘b’, is applied in parallel (all bits simultaneously). Say that ‘a’ has M bits while ‘b’ has N. Then, after all M bits of ‘a’ have been presented to the system, a string of M ‘0’s must follow in order to complete the (M+N)-bit output product. As can be seen in the figure, the system is pipelined and is constructed using AND gates, full-adder units and registers. Each unit of the pipeline (except the leftmost one) requires one adder, two registers and an AND gate to compute one of the inputs.

FIG 26: serial parallel multiplier.

Simulation results are shown in the figure below. ‘a=1100’ (decimal 12) was applied to the serial input. Notice that this input must start with the LSB (a(0)=’0’), which appears in the time slot from 50 ns to 100 ns, while the MSB (a(3)=’1’) is situated in 350 ns to 400 ns. Recall that four zeros must then follow. At the parallel input, b=’1101’ (decimal 13) was applied. The expected result ‘prod=10011100’ (decimal 156) can be observed in the lower plot. Recall that the first bit out is the LSB, that is ‘prod(0)=0’, which appears in the time slot immediately after the first rising edge of the clock (that is, 100 ns to 200 ns), while the last bit (MSB) of prod is situated in 600 ns to 700 ns. This kind of serial-parallel multiplier is used as the multiplier in bit-serial arithmetic.
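The simulated case (a=1100, b=1101) can be reproduced with a register-level Python model of the pipeline described above: the leftmost unit has an AND gate and a register, and every other unit has an AND gate, a full adder, a sum register and a carry register. This is a sketch under those assumptions, with hypothetical names; it is not the VHDL code of this work, and timing in nanoseconds is not modelled.

```python
def sp_multiplier(a_bits, b_bits):
    """a_bits: serial input, LSB first. b_bits: parallel input, LSB first.

    Returns the serial output bits, LSB first.
    """
    n = len(b_bits)
    s = [0] * n            # sum registers; s[0] drives the serial output
    c = [0] * (n - 1)      # carry registers of the n-1 adder units
    out = []
    for a in list(a_bits) + [0] * n:        # trailing zeros flush the upper product bits
        new_s = [0] * n
        new_c = [0] * (n - 1)
        new_s[n - 1] = a & b_bits[n - 1]    # leftmost unit: AND gate and register only
        for i in range(n - 1):
            x = a & b_bits[i]               # AND gate
            y = s[i + 1]                    # sum register of the unit to the left
            new_s[i] = x ^ y ^ c[i]
            new_c[i] = (x & y) | (x & c[i]) | (y & c[i])
        s, c = new_s, new_c                 # clock edge: all registers update together
        out.append(s[0])
    return out

a = [0, 0, 1, 1]   # 1100 (decimal 12), LSB first
b = [1, 0, 1, 1]   # 1101 (decimal 13), LSB first
prod = sp_multiplier(a, b)
print(prod)                                           # [0, 0, 1, 1, 1, 0, 0, 1]
print(sum(bit << i for i, bit in enumerate(prod)))    # 156
```

Read MSB first, the output is 10011100, matching the expected product. With M = N = 4, the four flushing zeros correspond to the four zeros mentioned in the text.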

FIG 27: Simulation result of serial parallel multiplier.

BOOTH MULTIPLIER:-

Booth multiplication algorithm for radix 4

One way of realizing high-speed multipliers is to enhance parallelism, which helps to decrease the number of subsequent calculation stages. The original version of the Booth algorithm (radix-2) had two drawbacks: (i) the number of add/subtract operations and the number of shift operations are variable, which is inconvenient when designing parallel multipliers; (ii) the algorithm becomes inefficient when there are isolated 1’s. These problems are overcome by using the modified radix-4 Booth algorithm, which scans strings of three bits according to the algorithm given below:

1) Extend the sign bit 1 position if necessary to ensure that n is even.

2) Append a 0 to the right of the LSB of the multiplier.

3) According to the value of each vector, each partial product will be 0, +y, -y, +2y or -2y.

The negative values of y are made by taking the 2’s complement, and in this paper carry-look-ahead (CLA) fast adders are used. The multiplication of y by two is done by shifting y by one bit to the left. Thus, in any case, in designing an n-bit parallel multiplier, only n/2 partial products are generated.
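The three steps of the radix-4 recoding can be sketched in Python. The model below assumes the multiplier is passed as a signed integer that fits in `n_bits`, and it sums the partial products with ordinary integer addition rather than CLA adders; the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, n_bits):
    """Radix-4 Booth digits of a signed multiplier, least significant digit first.

    Each digit is in {-2, -1, 0, +1, +2} and selects one partial product.
    """
    if n_bits % 2:
        n_bits += 1            # step 1: extend the sign bit so that n is even
    digits = []
    previous = 0               # step 2: the 0 appended to the right of the LSB
    for i in range(0, n_bits, 2):
        b0 = (multiplier >> i) & 1         # Python's >> sign-extends negative values
        b1 = (multiplier >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + previous)   # step 3: value of the 3-bit group
        previous = b1
    return digits

def booth_multiply(y, multiplier, n_bits):
    """Sum the n/2 partial products 0, +y, -y, +2y or -2y, each weighted by 4**k."""
    return sum(digit * y * 4 ** k
               for k, digit in enumerate(booth_radix4_digits(multiplier, n_bits)))

print(booth_radix4_digits(6, 4))   # [-2, 2], i.e. 6 = -2 + 2*4
print(booth_multiply(7, -3, 4))    # -21
```

A 4-bit multiplier produces two digits and an 8-bit multiplier four, which is the n/2 partial-product count stated above.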

FIG 28: simulation result for booths multiplier.

The Booth multiplier is used in bit-parallel arithmetic. The output of the Booth multiplier is shown in the figure above.

MULTIPLY ACCUMULATE SUB-BLOCKS USING VHDL:-

Multiplication followed by accumulation is a common operation in many digital systems, particularly highly interconnected ones such as digital filters, neural networks, data quantizers, etc.

FIG 29: MAC circuit.

One typical MAC (multiply-accumulate) architecture is illustrated in Fig.29. It consists of multiplying two values, then adding the result to the previously accumulated value, which must then be stored back in the register for future accumulations. Another feature of a MAC circuit is that it must check for overflow, which might happen when the number of MAC operations is large. The design can be done using components, because each of the units shown in the figure has already been designed. However, since it is a relatively simple circuit, it can also be designed directly. In any case, the MAC circuit as a whole can be used as a component in applications like digital filters and neural networks.
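The multiply, add and overflow-check sequence can be sketched as follows. The text does not say how overflow is handled, so the Python model below assumes saturation at the limits of a signed accumulator; the 16-bit default width and the function name are likewise assumptions.

```python
def mac(pairs, width=16):
    """Multiply-accumulate over (a, b) pairs with a signed accumulator of `width` bits.

    Returns (accumulator, overflow_flag). Overflow handling by saturation is an
    assumption of this sketch.
    """
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    acc = 0
    overflow = False
    for a, b in pairs:
        total = acc + a * b            # multiplier followed by adder
        if total > high:
            total, overflow = high, True
        elif total < low:
            total, overflow = low, True
        acc = total                    # result stored back in the register
    return acc, overflow

print(mac([(3, 4), (9, 3)]))           # (39, False)
print(mac([(100, 100)] * 4))           # (32767, True)
```

The second call shows why the check matters: four products of 10000 exceed the 16-bit range even though each product fits on its own.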

IMPLEMENTATION & ANALYSIS OF FIR FILTERS

Digital signal processing finds innumerable applications in the fields of audio, video and communications. Such applications are generally based on LTI (linear time-invariant) systems, which can be implemented with digital circuitry. An LTI system is represented by the following equation:

$$\sum_{k=0}^{N} a_k\, y[n-k] = \sum_{k=0}^{M} b_k\, x[n-k]$$

where a_k and b_k are the filter coefficients and x[n-k], y[n-k] are the current (for k=0) and earlier (for k>0) input and output values, respectively. To implement this expression, registers are necessary to store x[n-k] and/or y[n-k] (for k>0), besides multipliers and adders, which are well-known building blocks in the digital domain.

The impulse response of a digital filter can be divided into two categories: IIR (infinite impulse response) and FIR (finite impulse response). The former corresponds to the general case described by the equation above, while the latter occurs when N=0. Only FIR filters can exhibit linear phase, so they are indispensable when linear phase is required, as in many telecom applications.

With N=0, the equation above becomes

$$y[n] = \sum_{k=0}^{M} c_k\, x[n-k]$$

where c_k = b_k/a_0 are the coefficients of the FIR filter. This equation can be implemented by the system shown in the FIR filter diagram below, where D (delay) represents a register (flip-flops), a triangle is a multiplier, and a circle is an adder.

TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF

FIR FILTER:-

The system function of an FIR filter can be written as

$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} = h(0) + h(1) z^{-1} + h(2) z^{-2} + \dots + h(N-1) z^{-(N-1)} \qquad \text{(Eq. 6.1.1)}$$

$$Y(z) = h(0) X(z) + h(1) z^{-1} X(z) + h(2) z^{-2} X(z) + \dots + h(N-1) z^{-(N-1)} X(z)$$

This equation is realized in the figure below, which is known as the transversal structure. This structure requires N multipliers, N-1 adders, and N-1 delay elements.

FIG 30: Transversal structure or Direct form realization on FIR Filter (with five coefficients).

An equivalent RTL representation is shown in Fig.6.1.3. As shown, the values of ‘x’ are stored in a shift register, whose outputs are connected to the multipliers and then to the adders. The coefficients must be stored on chip. However, if the coefficients are always the same, their values can be implemented by means of logic gates rather than registers. On the other hand, if it is a general-purpose filter, then registers are required for the coefficients. In this architecture the output vector ‘y’ is stored in a register, in order to provide a clean synchronous output.

FIG 31: FIR Filter diagram (with four coefficients)

FIG 32: RTL representation of FIR Filter

The circuit of the figure above can be constructed in many ways. However, if it is intended for future reuse or sharing, then it should be as generic as possible. The lower section of the filter contains a MAC (multiply-accumulate) pipeline. This circuit is closely related to the MAC circuit discussed previously. Here too, overflow can happen, so an add/truncate procedure must be included in the design. In this circuit the coefficients are chosen arbitrarily as constants; no algorithm is used to generate them. The values chosen are coeff(0)=3, coeff(1)=9, coeff(2)=6, coeff(3)=13. Simulation results are shown in the figure below.

FIR FILTER USING BIT PARALLEL ARITHMETIC:-

FIG 33: FIR Filter Direct form realization using bit parallel arithmetic.

The figure above shows the output of the direct-form realization of the FIR filter using bit-parallel arithmetic. Here the 8-bit input vector ‘x’ is fed in parallel on the rising edge of the clock. Recall that the coefficients are coeff(0)=3, coeff(1)=9, coeff(2)=6, coeff(3)=13. The sequence applied to the input was x[0]=4, x[1]=3, x[2]=5, x[3]=2. Therefore, with all the flip-flops previously reset, at the first positive edge of the clock the expected output is y[0]=coef(0)*x[0]=12, which coincides with the first result of the output for ‘y’ in Fig.3. At the next upward transition of the clock, the expected value is y[1]=coef(0)*x[1]+coef(1)*x[0]=45. One clock cycle later, y[2]=coef(0)*x[2]+coef(1)*x[1]+coef(2)*x[0]=66, and so on.
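The expected output values can be reproduced with a few lines of Python that model the transversal structure as a shift register followed by multipliers and an adder chain. This is a behavioural sketch with a hypothetical function name; word lengths, overflow and clock timing are not modelled.

```python
def fir_direct(coeffs, samples):
    """Direct-form (transversal) FIR filter: y[n] = sum of coeffs[k] * x[n-k]."""
    taps = [0] * len(coeffs)           # x[n], x[n-1], ...: delay line starts reset
    outputs = []
    for x in samples:
        taps = [x] + taps[:-1]         # shift the delay line by one sample
        outputs.append(sum(c * t for c, t in zip(coeffs, taps)))
    return outputs

coeffs = [3, 9, 6, 13]                 # coefficient values used in the text
samples = [4, 3, 5, 2]                 # input sequence used in the text
print(fir_direct(coeffs, samples))     # [12, 45, 66, 121]
```

The first three values match y[0]=12, y[1]=45 and y[2]=66 above; the fourth follows from the same formula as 3*2+9*5+6*3+13*4=121.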


FIR FILTER USING BIT SERIAL ARITHMETIC:-

The figure below shows the output of the direct-form realization of the FIR filter using bit-serial arithmetic. Here five single-bit inputs, ‘x0’ to ‘x4’, are used, which are fed in on the rising edge of the clock. As the data are fed serially, the LSB is applied first and appears in the time slot from 50 ns to 100 ns. The sequence applied to the input was x(0)=4, x(1)=3, x(2)=5, x(3)=2. Therefore, with all the flip-flops previously reset, at the first positive edge of the clock the expected output is y(0)=coef(0)*x(0)=12. In the output the LSB appears first, in the time slot immediately after the first rising edge of the clock (that is, 50 ns to 100 ns), while the last bit (MSB) of ‘y0’ is situated in 200 ns to 250 ns. The expected value of y1 is coef(0)*x1+coef(1)*x0=45. Here one addition takes place, and each bit-serial addition delays the output by one clock pulse. The output ‘y1’ therefore appears one clock pulse later than ‘y0’; that is, the LSB of ‘y1’ appears in the time slot from 100 ns to 150 ns, an initial latency of one clock pulse. The same trend holds for ‘y2’, whose LSB appears in the time slot from 150 ns to 200 ns, an initial latency of two clock pulses. The next output has an initial latency of three clock pulses, and the trend continues for the remaining outputs.

FIG 34: FIR Filter Direct form realization using bit serial arithmetic

SIMULATION TIME ANALYSIS OF TRANSVERSAL STRUCTURE OR

DIRECT FORM REALIZATION FILTER USING BIT PARALLEL

ARITHMETIC & BIT SERIAL ARITHMETIC:-

In bit-serial arithmetic the data are fed serially: the LSB is applied first, the second bit in the next clock pulse, and so on. All input variables are fed in this way and the output is obtained in the same fashion. This way of entering input data and extracting output data introduces latency in the output waveform. As the latency increases with every individual output, it takes time for the last bit (MSB) of each output to appear in the waveform. The last bit (MSB) of output ‘y0’ is situated in 200 ns to 250 ns and the last bit (MSB) of ‘y1’ is situated in 450 ns to 500 ns. In bit-parallel arithmetic there is also an initial latency in each output, but the output bits of a stage (y15 to y0) appear synchronously with the clock pulse, so the complete output word of a stage, LSB to MSB, is obtained within a single clock pulse. The bit-parallel result in Fig.6.1.4 shows Y[2]=coef(0)*x[2]+coef(1)*x[1]+coef(2)*x[0]=66 situated in 350 ns to 450 ns, whereas the bit-serial result in Fig.6.1.5 shows Y[2]=66 situated in 200 ns to 600 ns (LSB at 200 ns and MSB at 600 ns). In bit-serial arithmetic the bits of an output are thus spread over different clock pulses, which is not the case in bit-parallel arithmetic. For this reason, bit-serial arithmetic takes longer to deliver the complete output data than bit-parallel arithmetic. This is one disadvantage of bit-serial arithmetic compared to bit-parallel arithmetic.

AREA ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT FORM

REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC

& BIT SERIAL ARITHMETIC:-

According to the design summary of the direct-form realization of the FIR filter using bit-parallel arithmetic, the number of 4-input LUTs is 479, the number of occupied slices is 271, and the number of bonded inputs/outputs is 26. Fig.6.1.7 shows the design summary of the direct-form realization of the FIR filter using bit-serial arithmetic. According to this figure, the number of 4-input LUTs is 73, the number of occupied slices is 47, and the number of bonded inputs/outputs is 13. Comparing the two design summaries shows that the bit-parallel realization uses more 4-input LUTs, more occupied slices, and more bonded inputs/outputs than the bit-serial realization. The extra number of LUTs used is (479-73)=406, the extra number of occupied slices is (271-47)=224, and the extra number of bonded inputs/outputs is (26-13)=13.

Number of Slices Flip Flops 34

Number of 4 input LUTs 479

Number of occupied slices 271

Number of bonded INPUT/OUTPUT 26

Chart 1: Design summary of Direct form realization of FIR Filter using bit parallel arithmetic.

From this comparison it is found that the bit-parallel implementation of the direct-form realization needs more chip area than the bit-serial implementation. As modern electronic devices become smaller and smaller, chip area is an important design parameter for any electronic circuit. In terms of chip area, the bit-serial implementation of this digital filter is therefore advantageous compared to the bit-parallel implementation. Power consumption in a circuit is also related to the chip area: if the chip area increases, the power consumption increases as well.





Chart 2: Design summary of Direct form realization of FIR Filter using bit serial arithmetic.

POWER ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT

FORM REALIZATION OF FIR FILTER USING BIT PARALLEL

ARITHMETIC & BIT SERIAL ARITHMETIC:-

A comparative study of the total estimated power consumption for the direct-form realization of the FIR filter reveals that the bit-parallel realization consumes more power than the bit-serial realization. Fig.6.1.8 shows the XPower analysis data for the direct-form realization of the FIR filters, obtained with the Xilinx tool. It indicates that the direct-form FIR filter using bit-serial arithmetic consumes 0.084 W, while the same filter using bit-parallel arithmetic consumes 0.090 W.

Realization | Total estimated power consumption (W)
Direct form realization of FIR filter using bit parallel arithmetic | 0.090
Direct form realization of FIR filter using bit serial arithmetic | 0.084

Chart 3: Power summary of Direct form realization of FIR filters.

According to the first waveform of Fig.6.1.9, total power consumption is the sum of quiescent power, logic power, IO power and digital clock manager power. Quiescent power (also called static power) is the power drawn by the device when it is powered up, configured with user logic, and there is no switching activity. In XPower Analyzer, the value reported for Total Quiescent Power is composed of these quiescent power components:

Device static power – This represents power consumed by the device when it is powered

up without programming the user logic. The main contributor to this number is the

junction temperature. Any change affecting the device operating environment will affect

this power.

Design static power – This represents the power consumed by the user logic when the

device is programmed and without any switching activity. For instance, depending on

the device family and resource configuration, some blocks used in a design (such as

clock management, I/Os, and Multi-Gigabit Transceivers) will consume a set amount of

power regardless of activity.

The Logic power is used to account for the number of CLB resources, including LUTs, SRLs,

LUT-based RAMs, and flip-flops estimated for use in the design. By implementing the pre-

existing blocks that constitute a design, it is possible to accurately estimate resource utilization

for the bulk of a design. These resource utilization estimates help to predict the logic power,

which is typically the larger share of the dynamic power consumed in any design.

With higher switching speeds and capacitive loads, switching I/O power can be a substantial part

of the total power consumption of an FPGA. Because of this, it is important to accurately define

all I/O related parameters in order to measure IO power.



The Digital Clock Manager (DCM) primitive in Xilinx FPGA parts is used to implement, for example, a delay locked loop, a digital frequency synthesizer, a digital phase shifter or a digital spread spectrum. The digital clock manager module is a wrapper around the DCM primitive that allows it to be used in the EDK tool suite.

FIG 35: Output wave form of power analysis of FIR filter (Direct form) using bit parallel

arithmetic.

Bit parallel arithmetic uses more input/output ports than bit serial arithmetic. The first waveforms of Fig. 35 and Fig. 36 reveal that the bit parallel implementation consumes more power because of its larger number of input/output ports. Junction temperature plays an important part in determining the device static power: a small change in junction temperature can change the device power consumption significantly. The third and fourth waveforms of Fig. 35 and Fig. 36 show how power changes with junction temperature for bit parallel and bit serial arithmetic.



From this analysis, if power is one of the design criteria, it is better to design direct form FIR filters using bit serial arithmetic. The results above show that the direct form realization of FIR filters using bit serial arithmetic consumes less power than the bit parallel realization.

FIG 36: Output wave form of power analysis of FIR filter (direct form) using bit serial

arithmetic

CASCADE REALIZATION OF FIR FILTER:-

The transversal structure can be realized in cascade form from the factored form of H(z). For N odd:

$$H(z) = \prod_{k=1}^{(N-1)/2} \left(b_{k0} + b_{k1} z^{-1} + b_{k2} z^{-2}\right) = \left(b_{10} + b_{11} z^{-1} + b_{12} z^{-2}\right)\left(b_{20} + b_{21} z^{-1} + b_{22} z^{-2}\right) \cdots \left(b_{\frac{N-1}{2}0} + b_{\frac{N-1}{2}1} z^{-1} + b_{\frac{N-1}{2}2} z^{-2}\right) \qquad \text{(Eq.6.2.1)}$$

For N odd, N-1 is even and H(z) has (N-1)/2 second order factors. Each second order factor of H(z) is realized in direct form, and the factors are cascaded to realize H(z) as shown in Fig.6.2.1.

FIG 37: Cascade realization of Eq.6.2.1.

For N even:

$$H(z) = \left(1 + b_{10} z^{-1}\right) \prod_{k=2}^{N/2} \left(b_{k0} + b_{k1} z^{-1} + b_{k2} z^{-2}\right) \qquad \text{(Eq.6.2.2)}$$

When N is even, N-1 is odd and H(z) has one first order factor and (N-2)/2 second order factors:

$$H(z) = \left(1 + b_{10} z^{-1}\right)\left(b_{20} + b_{21} z^{-1} + b_{22} z^{-2}\right)\left(b_{30} + b_{31} z^{-1} + b_{32} z^{-2}\right) \cdots \left(b_{\frac{N}{2}0} + b_{\frac{N}{2}1} z^{-1} + b_{\frac{N}{2}2} z^{-2}\right)$$

Each factor of H(z) is realized in direct form, and the factors are cascaded to obtain the realization of H(z) as shown in Fig. 38.



FIG 38: Cascade realization of Eq.6.2.2.
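The equivalence between the factored (cascade) form and the single direct form H(z) can be checked numerically by multiplying the factors out. The following is a minimal Python sketch, not part of the original design flow; the function names `poly_mul` and `cascade_to_direct` are hypothetical, and the section coefficients are illustrative values (the same 1 to 9 used in the three-slice example of this chapter).

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def cascade_to_direct(sections):
    """Expand a product of first/second order factors into one direct-form H(z)."""
    h = [1]
    for section in sections:
        h = poly_mul(h, section)
    return h


# N odd (N = 7): three second order factors [b_k0, b_k1, b_k2] (illustrative values)
sections_odd = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
direct_odd = cascade_to_direct(sections_odd)
print(direct_odd)   # [28, 123, 336, 530, 594, 387, 162]

# N even (N = 4): one first order factor followed by one second order factor
sections_even = [[1, 2], [1, 1, 1]]
direct_even = cascade_to_direct(sections_even)
print(direct_even)  # [1, 3, 3, 2]
```

The length of the expanded coefficient list equals N, which gives a quick check that the number of factors matches the (N-1)/2 or (N-2)/2 rule stated above.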

CASCADE REALIZATION OF FIR FILTER USING BIT PARALLEL

ARITHMETIC:-

Fig. 39 shows the output of the cascade realization of the FIR filter using bit parallel arithmetic. Here the 8 bit input vector ‘x’ is fed in parallel on the rising edge of the clock pulse. Recall that the coefficients are coef(0)=1, coef(1)=2, coef(2)=3, coef(3)=4, coef(4)=5, coef(5)=6, coef(6)=7, coef(7)=8, coef(8)=9. The sequence applied to the input was x[0]=4, x[1]=3, x[2]=2. For this realization three slices are chosen: the first stage output is stored in the ‘Y’ vector, the second stage output in the ‘Z’ vector and the final stage output in the ‘P’ vector. Therefore, with all the flip flops previously reset, at the first positive edge of the clock the expected output is Y[0]=coef(0)*x[0]=4. At the next upward transition of the clock, the expected value is Y[1]=coef(0)*x[1]+coef(1)*x[0]=11, and one clock cycle later Y[2]=coef(0)*x[2]+coef(1)*x[1]+coef(2)*x[0]=20.

According to Fig.6.2.4, for the output of the second slice, the expected value at the first positive edge of the clock is Z[0]=coef(3)*Y[0]=16. At the next upward transition of the clock, Z[1]=coef(3)*Y[1]+coef(4)*Y[0]=64, and one clock cycle later Z[2]=coef(3)*Y[2]+coef(4)*Y[1]+coef(5)*Y[0]=159.

FIG 39: FIR Filter Cascade realization using bit parallel arithmetic.

Finally, the output of the third slice (slice 2) is P[0]=coef(6)*Z[0]=112. At the next upward transition of the clock, P[1]=coef(6)*Z[1]+coef(7)*Z[0]=576, and one clock cycle later P[2]=coef(6)*Z[2]+coef(7)*Z[1]+coef(8)*Z[0]=1769. P[0] appears in the time slot from 300 ns to 380 ns.
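The expected values quoted for the three slices can be reproduced with a short behavioural model. This is a Python sketch under stated assumptions, not the VHDL used in the project: the helper `fir` is hypothetical, each slice is treated as an ideal three-tap direct-form section with zero initial state, and the register delays between slices seen in the waveform are ignored.

```python
def fir(coeffs, x):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k], zero initial state."""
    return [
        sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
        for n in range(len(x))
    ]


coef = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # coef(0) .. coef(8) from the text
x = [4, 3, 2]                        # input sequence x[0] .. x[2]

Y = fir(coef[0:3], x)                # first slice
Z = fir(coef[3:6], Y)                # second slice
P = fir(coef[6:9], Z)                # third slice (final output)

print(Y)  # [4, 11, 20]
print(Z)  # [16, 64, 159]
print(P)  # [112, 576, 1769]
```

Because the model is purely arithmetic, it confirms the values but says nothing about the clock cycle in which each value appears.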



CASCADE REALIZATION OF FIR FILTER USING BIT SERIAL

ARITHMETIC:-

The bit serial implementation of this filter is built from bit serial adders, serial-parallel multipliers and delay elements. The bit serial adder also uses a register, which stores the carry output bit and feeds it back as the carry input in the next clock cycle. Because digital filters are built from adder blocks, a delay of at least one clock pulse accumulates for every addition operation. This is one of the reasons why the bit serial filter has a high initial latency.
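The role of the carry register in a bit serial adder can be illustrated with a small software model. The sketch below is a Python approximation written for this explanation, not the project's VHDL; the names `bit_serial_add`, `to_bits` and `from_bits` are hypothetical, and one loop iteration stands for one clock cycle.

```python
def to_bits(value, width):
    """Return the bits of value, LSB first, as a list of 0/1."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an unsigned integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two LSB-first bit streams, one bit per clock cycle."""
    carry = 0                      # carry flip-flop, reset to 0
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry      # full adder on the current bit pair
        sum_bits.append(total & 1)
        carry = total >> 1         # stored and used in the next clock cycle
    return sum_bits, carry


s, c = bit_serial_add(to_bits(11, 8), to_bits(20, 8))
print(from_bits(s), c)  # 31 0
```

An 8 bit addition therefore takes eight clock cycles here, whereas a bit parallel adder produces all eight sum bits together.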

Fig. 41 shows the output of the cascade realization of the FIR filter using bit serial arithmetic. Here three single bit inputs, x0 to x2, are used, which are fed on the rising edge of the clock pulse. As the data are fed serially, the LSB is applied first; it appears in the time slot from 60 ns to 120 ns. The sequence applied to the input was x(0)=4, x(1)=3, x(2)=2. Therefore, with all the flip flops previously reset, the expected value of the first output is y(0)=112. In the output the LSB appears first (from 180 ns to 240 ns) and the last bit (MSB) of ‘y0’ lies between 480 ns and 540 ns. The output ‘y0’ appears after a delay of three clock pulses from the first rising edge of the clock, because the result propagates through three slices and the delay of the serial addition accumulates in each slice. The expected value ‘y1’=576 appears after a delay of six clock pulses from the first rising edge of the clock. The delay increases further for ‘y2’: its LSB appears after a delay of nine clock pulses from the first rising edge of the clock.



FIG 41: FIR Filter Cascade realization using bit serial arithmetic.

SIMULATION TIME ANALYSIS OF CASCADE REALIZATION OF FIR

FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL

ARITHMETIC:-

In bit serial arithmetic the data are fed serially, one bit per clock pulse. As discussed earlier, the bit serial adder uses a register to store its carry output bit and feed it back as the carry input in the next clock cycle. Serial adder blocks are an integral part of this filter design, so a delay of one clock pulse accumulates for every addition operation. This is one of the reasons why the bit serial filter has a high initial latency. In this example of the cascade filter three slices are used. If one addition takes place in each slice, there is an initial latency of three clock cycles for each output, because each output passes through each slice only once.

According to Fig. 39, in bit parallel arithmetic Y[0]=112 lies between 300 ns and 400 ns, after an initial latency of two clock pulses, while in the bit serial arithmetic of Fig. 41 Y[0]=112 appears after an initial latency of three clock pulses. In bit parallel arithmetic Y[1]=576 appears after an initial latency of three clock pulses, and in bit serial arithmetic after six clock pulses. In bit parallel arithmetic Y[2]=1769 appears after an initial latency of four clock pulses, and in bit serial arithmetic after nine clock pulses. This comparison shows that if the number of slices is increased further, the initial latency in bit serial arithmetic will grow much faster than the initial latency in bit parallel arithmetic. The delay in bit parallel arithmetic is due only to the presence of registers: to get the output data from a register we have to wait for at least one clock cycle.

So, if simulation time is taken as a design parameter, the cascade realization of the FIR filter using bit parallel arithmetic is advantageous compared to bit serial arithmetic.

AREA ANALYSIS OF CASCADE REALIZATION OF FIR FILTER

USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-

According to the design summary of the cascade realization of the FIR filter using bit parallel arithmetic (Chart 4), the number of slice flip flops is 149, the number of 4 input LUTs is 729, the number of occupied slices is 395 and the number of bonded INPUT/OUTPUT is 74. Fig.6.2.7 shows the design summary of the cascade realization of the FIR filter using bit serial arithmetic. According to this figure the number of slice flip flops is 80, the number of 4 input LUTs is 63, the number of occupied slices is 42 and the number of bonded INPUT/OUTPUT is 9. A comparison of these two design summaries shows that the bit parallel realization uses more slice flip flops, more 4 input LUTs, more occupied slices and more bonded INPUT/OUTPUT than the bit serial realization. The extra slice flip flops number (149-80)=69, the extra LUTs (729-63)=666, the extra occupied slices (395-42)=353 and the extra bonded INPUT/OUTPUT (74-9)=65.
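The resource differences quoted above can be recomputed directly from the two design summaries. The following Python sketch is only a bookkeeping aid; the dictionary keys and the helper `extra_resources` are hypothetical names, and the counts are the ones reported in the text.

```python
# Resource counts from the two design summaries (cascade realization)
parallel = {"slice flip flops": 149, "4 input LUTs": 729,
            "occupied slices": 395, "bonded IOBs": 74}
serial = {"slice flip flops": 80, "4 input LUTs": 63,
          "occupied slices": 42, "bonded IOBs": 9}


def extra_resources(a, b):
    """Return how many more of each resource design a uses than design b."""
    return {name: a[name] - b[name] for name in a}


extra = extra_resources(parallel, serial)
for name, diff in extra.items():
    ratio = parallel[name] / serial[name]
    print(f"{name}: {diff} more ({ratio:.1f}x the bit serial count)")
```

Expressing the difference as a ratio as well as a count makes it easier to see that the LUT usage dominates the area gap.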



This comparison shows that the bit parallel implementation of the cascade realization needs more chip area than the bit serial implementation. Chip area is an important design parameter for any electronic circuit. If the design is judged with respect to chip area, the bit serial implementation of this digital filter is advantageous compared to the bit parallel implementation. Power consumption is also related to the chip area: if the chip area increases, the power consumption increases as well.





Chart 4: Design summary of Cascade realization of FIR Filter using bit parallel arithmetic.





Chart 5: Design summary of Cascade realization of FIR Filter using bit serial arithmetic.



POWER ANALYSIS OF CASCADE REALIZATION OF FIR FILTER

USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-

The study of the total estimated power consumption for the cascade realization of the FIR filter reveals that the bit parallel implementation consumes more power than the bit serial implementation. The XPower analysis data for the cascade realization, obtained with the Xilinx tool and summarized in Chart 6, show that the cascade realization using bit serial arithmetic consumes 0.083 W, while the same filter built with bit parallel arithmetic consumes 0.091 W.

Realization                                                        Total estimated power consumption (W)
Cascade realization of FIR filter using bit parallel arithmetic    0.091
Cascade realization of FIR filter using bit serial arithmetic      0.083

Chart 6: Power summary of Cascade realization of FIR filters.

According to the waveform of Fig.6.2.9, the total power consumption is the sum of quiescent power, logic power, IO power and digital clock manager power. A detailed description of each individual power component is given earlier in this chapter.



FIG 42: Output wave form of power analysis of FIR filter (Cascade realization) using bit

parallel arithmetic.

A comparative study of the first waveforms of Fig. 42 and Fig. 43 reveals that quiescent power, logic power and digital clock manager (DCM) power are almost the same in both cases, but IO power consumption is higher in the bit parallel case. Bit parallel arithmetic uses more input/output ports than bit serial arithmetic, which is why the bit parallel cascade filter consumes more IO power. As a result, the overall power consumption of the bit parallel cascade FIR filter is higher than that of the bit serial cascade filter.



FIG 43: Output wave form of power analysis of FIR filter (Cascade realization) using bit serial

arithmetic.

The third and fourth waveforms of Fig. 42 and Fig. 43 show how power changes with junction temperature for bit parallel and bit serial arithmetic. Junction temperature plays an important role in determining the device static power: a small change in junction temperature can change the device power consumption drastically. Here the operational junction temperature is chosen as 27.1 degrees Celsius. So the study reveals that, if power is one of the design criteria, it is better to design the cascade realization of FIR filters using bit serial arithmetic than bit parallel arithmetic.



LATTICE STRUCTURE OF AN FIR FILTER:-

Let us consider an FIR filter with system function

$$H(z) = A_m(z) = 1 + \sum_{k=1}^{m} \alpha_m(k)\, z^{-k}, \qquad m \ge 1$$

from which we have

$$Y(z) = X(z)\left[1 + \sum_{k=1}^{m} \alpha_m(k)\, z^{-k}\right]$$

Taking the inverse Z-transform on both sides, we get

$$y(n) = x(n) + \sum_{k=1}^{m} \alpha_m(k)\, x(n-k) \qquad \text{(Eq.6.3.1)}$$

Eq.6.3.1 represents an FIR system with system function H(z) = A_m(z).

The lattice structure for an all zero FIR system is obtained by interchanging the roles of input and output. For an all pole filter the input is x(n) = f_N(n) and the output is y(n) = f_0(n). For an all zero FIR system of order M-1 the input is x(n) = f_0(n) and the output is y(n) = f_{M-1}(n).

For m = 1, Eq.6.3.1 reduces to

$$y(n) = x(n) + \alpha_1(1)\, x(n-1)$$

The output can be obtained from the single stage lattice filter shown in Fig. 44, from which we have

$$x(n) = f_0(n) = g_0(n)$$

$$y(n) = f_1(n) = f_0(n) + k_1\, g_0(n-1) = x(n) + k_1\, x(n-1)$$

$$g_1(n) = k_1\, f_0(n) + g_0(n-1) = k_1\, x(n) + x(n-1)$$

Comparing the lattice output y(n) with the m = 1 form of Eq.6.3.1, we get α_1(0) = 1 and α_1(1) = k_1.



FIG 44: single stage all zero Lattice Filter.

Now let us consider an FIR filter for which m = 2. Then

$$y(n) = x(n) + \alpha_2(1)\, x(n-1) + \alpha_2(2)\, x(n-2)$$

By cascading two lattice stages as shown in Fig. 45, it is possible to obtain the output y(n).

FIG 45: Two stage all zero Lattice Filter.

From Fig. 45, the outputs of the second stage are

$$y(n) = f_2(n) = f_1(n) + k_2\, g_1(n-1)$$

$$g_2(n) = k_2\, f_1(n) + g_1(n-1)$$

Substituting f_1(n) and g_1(n-1) from the single stage equations, we get

$$y(n) = f_2(n) = x(n) + k_1\, x(n-1) + k_2\left[k_1\, x(n-1) + x(n-2)\right] = x(n) + k_1(1 + k_2)\, x(n-1) + k_2\, x(n-2)$$

This is identical to the m = 2 difference equation, from which we have

$$\alpha_2(0) = 1, \qquad \alpha_2(2) = k_2, \qquad \alpha_2(1) = k_1(1 + k_2) = \alpha_1(1)\left[1 + \alpha_2(2)\right]$$

Similarly,

$$g_2(n) = k_2\, x(n) + k_1(1 + k_2)\, x(n-1) + x(n-2) = \alpha_2(2)\, x(n) + \alpha_2(1)\, x(n-1) + x(n-2)$$
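The relation between the lattice coefficients k_m and the direct-form coefficients α_m(k) derived above for m = 1 and m = 2 generalizes to the standard step-up recursion α_m(i) = α_{m-1}(i) + k_m α_{m-1}(m-i). The Python sketch below is an illustration under stated assumptions rather than the project's implementation: the function names are hypothetical, and k_2 = 2 is an arbitrary example value (only k_1 = 3 appears in this report).

```python
def lattice_to_direct(ks):
    """Step-up recursion: k_1..k_M -> [alpha_M(0), ..., alpha_M(M)]."""
    alpha = [1]
    for k in ks:
        prev = alpha + [0]             # alpha_{m-1}(i), with alpha_{m-1}(m) = 0
        rev = [0] + alpha[::-1]        # alpha_{m-1}(m - i)
        alpha = [p + k * r for p, r in zip(prev, rev)]
    return alpha


def lattice_filter(ks, x):
    """Run the all zero lattice sample by sample and return f_M(n)."""
    g_delayed = [0] * len(ks)          # g_{m-1}(n-1) for every stage, reset to 0
    y = []
    for sample in x:
        f = g = sample                 # f_0(n) = g_0(n) = x(n)
        for m, k in enumerate(ks):
            f_next = f + k * g_delayed[m]
            g_next = k * f + g_delayed[m]
            g_delayed[m] = g           # keep g_{m-1}(n) for the next sample
            f, g = f_next, g_next
        y.append(f)
    return y


def fir(coeffs, x):
    """Direct-form reference: y[n] = sum_k coeffs[k] * x[n-k]."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]


ks = [3, 2]                            # k1 = 3, k2 = 2 (k2 is an assumed value)
alpha = lattice_to_direct(ks)
print(alpha)                           # [1, 9, 2] = [1, k1*(1 + k2), k2]
print(lattice_filter(ks, [1, 2, 3, 4]) == fir(alpha, [1, 2, 3, 4]))  # True
```

The printed coefficient list matches α_2(1) = k_1(1 + k_2) and α_2(2) = k_2, and the lattice and direct-form outputs agree sample by sample.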

LATTICE STRUCTURE OF FIR FILTER USING BIT PARALLEL

ARITHMETIC:-

Fig. 46 shows the output of the single stage lattice realization of the FIR filter using bit parallel arithmetic. Here the 7 bit input vector x is fed in parallel on the rising edge of the clock pulse. The single coefficient chosen for this stage is k1=3. The block diagram of the single stage lattice realization of the FIR filter is shown in Fig. 44. The sequence applied to the input was x[0]=1, x[1]=2, x[2]=3, x[3]=4. With all the flip flops previously reset, at the first positive edge of the clock the expected output is y[0]=x[0]+k1*x[0-1]=1. At the next upward transition of the clock, the expected value is y[1]=x[1]+k1*x[1-1]=5, one clock cycle later y[2]=x[2]+k1*x[2-1]=9, and finally y[3]=x[3]+k1*x[3-1]=13.

As each stage has two sets of outputs, there is another set of outputs in terms of ‘g’. At the first positive edge of the clock pulse the value is g[0]=k1*x[0]+x[0-1]=3. At the next upward transition of the clock pulse the value is g[1]=k1*x[1]+x[1-1]=7, one clock cycle later g[2]=k1*x[2]+x[2-1]=11, and finally g[3]=k1*x[3]+x[3-1]=15.
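Both output sequences of the single stage can be reproduced with a few lines of arithmetic. This Python sketch is a behavioural check only; the function name `single_stage_lattice` is hypothetical, and x[-1] is taken as 0 because the flip flops are reset before the first sample.

```python
def single_stage_lattice(k1, x):
    """Return (y, g) of a one-stage all zero lattice with zero initial state."""
    y, g = [], []
    prev = 0                             # x(n-1); flip flops reset to zero
    for sample in x:
        y.append(sample + k1 * prev)     # f_1(n) = x(n) + k1 * x(n-1)
        g.append(k1 * sample + prev)     # g_1(n) = k1 * x(n) + x(n-1)
        prev = sample
    return y, g


y, g = single_stage_lattice(3, [1, 2, 3, 4])
print(y)  # [1, 5, 9, 13]
print(g)  # [3, 7, 11, 15]
```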



FIG 46: FIR Filter Lattice realization using bit parallel arithmetic.

LATTICE STRUCTURE OF FIR FILTER USING BIT SERIAL

ARITHMETIC:-

The bit serial implementation of this filter is built from bit serial adders, serial-parallel multipliers and delay elements. Fig.6.3.4 shows the output of the lattice realization of the FIR filter using bit serial arithmetic. Here four single bit inputs, x0 to x3, are used, which are fed on the rising edge of the clock pulse. As the data are fed serially, the LSB is applied first; it appears in the time slot from 50 ns to 150 ns for all four input data. The sequence applied to the input was x0=1, x1=2, x2=3 and x3=4. With all the flip flops previously reset, the expected first outputs are y0=1 and g0=3. In both outputs ‘y’ and ‘g’ the LSB appears first, after a delay of one clock pulse from the first rising edge of the clock (that is, from 100 ns to 200 ns). The expected values of ‘y1’ and ‘g1’ appear after a delay of two clock pulses from the first rising edge of the clock. The same pattern continues for y2 and g2. Finally, the LSBs of y3 and g3 appear after a delay of four clock pulses from the first rising edge of the clock.



FIG 47: FIR Filter Lattice realization using bit serial arithmetic.

SIMULATION TIME ANALYSIS OF LATTICE REALIZATION OF FIR

FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL

ARITHMETIC:-

As discussed earlier, the bit serial adder uses a register to store its carry output bit and feed it back as the carry input in the next clock cycle, so there is a delay of one clock cycle. As the filter is built from this adder block, a delay of one clock pulse accumulates for every addition operation in the filter. This is one of the reasons why the bit serial filter has a high initial latency. In this example of the lattice filter one stage is used. In this realization each output depends on the present and the previous value of the input, where the previous value is the input at the preceding clock. The delay therefore propagates through the addition operation in each output. In both the bit serial and the bit parallel realization of this filter, each output has one more clock cycle of initial latency than the previous output. For example, output ‘y1’ has one clock cycle more initial latency than output ‘y0’, which means that the LSB of ‘y1’ appears one clock cycle later than the LSB of ‘y0’.



In the bit serial case, however, the LSB and MSB of an input or output do not appear in the same clock cycle. As the output LSB appears late, the MSB takes still longer to appear at the output. This is not the case in the bit parallel implementation, where the LSB and MSB of any output appear in the same clock cycle, so it takes less simulation time to obtain the output. From this study it is found that the simulation time for the lattice realization of the FIR filter using bit serial arithmetic is much higher than for the bit parallel implementation. So, from the point of view of simulation time, bit parallel arithmetic is advantageous compared to bit serial arithmetic for designing the lattice filter.

AREA ANALYSIS OF LATTICE REALIZATION OF FIR FILTER USING

BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-

According to the design summary of the lattice realization of the FIR filter using bit parallel arithmetic in Fig.6.3.5, the number of slice flip flops is 5, the number of 4 input LUTs is 123, the number of occupied slices is 65 and the number of bonded INPUT/OUTPUT is 42. Fig.6.3.6 shows the design summary of the lattice realization of the FIR filter using bit serial arithmetic. According to this figure the number of slice flip flops is 24, the number of 4 input LUTs is 24, the number of occupied slices is 14 and the number of bonded INPUT/OUTPUT is 15. A comparison of these two design summaries shows that the bit parallel realization uses more 4 input LUTs, more occupied slices and more bonded INPUT/OUTPUT than the bit serial realization. The extra LUTs number (123-24)=99, the extra occupied slices (65-14)=51 and the extra bonded INPUT/OUTPUT (42-15)=27.







Chart 7: Design summary of Lattice realization of FIR Filter using bit parallel arithmetic.

The bit serial realization, however, uses more slice flip flops (24-5=19) than the bit parallel realization, which differs from the previous two cases. Overall, the analysis still shows that the bit parallel implementation of the lattice realization needs more chip area than the bit serial implementation. Chip area is an important design parameter for any electronic circuit. If the design is judged in terms of chip area, the bit serial implementation of this digital filter is advantageous compared to the bit parallel implementation.





Chart 8: Design summary of Lattice realization of FIR Filter using bit serial arithmetic.



POWER ANALYSIS OF LATTICE REALIZATION OF FIR FILTER

USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-

The XPower analysis data for the lattice realization of the FIR filters, obtained with the Xilinx tool, show that the lattice realization using bit serial arithmetic consumes 0.057 W, while the same filter built with bit parallel arithmetic consumes 0.068 W in the internal circuitry.



Realization                                                        Total estimated power consumption (W)
Lattice realization of FIR filter using bit parallel arithmetic    0.068
Lattice realization of FIR filter using bit serial arithmetic      0.057

Chart 9: Power summary of lattice realization of FIR filters.

So the study of the total estimated power consumption for the lattice realization of the FIR filter reveals that the bit parallel implementation consumes more power than the bit serial implementation. According to the waveform, the total power consumption is the sum of quiescent power, logic power, IO power and digital clock manager power. A detailed description of each individual power component is given earlier in this chapter.
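To put the three power summaries (Charts 3, 6 and 9) side by side, the relative saving of the bit serial version can be computed from the reported totals. The Python sketch below only post-processes the numbers quoted in this chapter; the names `power_w` and `saving_percent` are hypothetical.

```python
# (bit parallel, bit serial) total estimated power in watts, from Charts 3, 6 and 9
power_w = {
    "direct form": (0.090, 0.084),
    "cascade": (0.091, 0.083),
    "lattice": (0.068, 0.057),
}


def saving_percent(parallel, serial):
    """Power saved by the bit serial version, relative to the bit parallel one."""
    return 100.0 * (parallel - serial) / parallel


for name, (par, ser) in power_w.items():
    print(f"{name}: bit serial saves {saving_percent(par, ser):.1f} % "
          f"({par - ser:.3f} W)")
```

On these figures the saving is largest for the lattice realization and smallest for the direct form, although all three differences are within a few hundredths of a watt.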



FIG 48:Output wave form of power analysis of FIR filter (Lattice realization) using bit parallel

arithmetic.

A comparative study of the first waveforms of Fig. 48 and Fig. 49 shows that quiescent power, logic power and digital clock manager (DCM) power are almost the same in both cases, but IO power consumption is higher in the bit parallel case. Bit parallel arithmetic uses more input/output ports than bit serial arithmetic, which is why the bit parallel lattice filter consumes more IO power. As a result, the overall power consumption of the bit parallel lattice FIR filter is higher than that of the bit serial lattice filter.



FIG 49: Output wave form of power analysis of FIR filter (Lattice realization) using bit serial

arithmetic.

Junction temperature plays an important role in determining the device static power: a small change in junction temperature can change the device power consumption drastically. The third and fourth waveforms of Fig. 48 and Fig. 49 show how power changes with junction temperature for bit parallel and bit serial arithmetic. Here the operational junction temperature is chosen as 26.3 degrees Celsius. So the study reveals that, if power is one of the design criteria, it is better to design the lattice realization of FIR filters using bit serial arithmetic than bit parallel arithmetic.

CONCLUSION

This work deals with an approach to the design and implementation of very fast fixed-function digital filters using bit-serial and bit-parallel arithmetic. The main concerns of the filter designs are high throughput, small chip area and low power consumption. The increased throughput can be traded for reduced power consumption through power supply voltage scaling. VHDL and FPGA provided the platform for the realization of the direct form, cascade form and lattice structure of digital filters using bit serial and bit parallel arithmetic. A comparative study of all these filters in terms of simulation time, chip area and power consumption led to the following observations:

(i) Simulation time - bit parallel digital filters take less time than bit serial implementations.

(ii) Initial latency - bit serial filters have a higher initial latency than bit parallel filters.

(iii) Chip area - bit parallel implementations of digital filters consume a much larger area than the same filters realized using bit serial arithmetic.

(iv) Power consumption - bit serial digital filters consume less power than bit parallel implementations.

VHDL has been used successfully for designing the filters, using the Xilinx software, version 7.1i, installed on a PC. For implementing the design, a Spartan-3E kit was chosen, connected through the USB port of the PC. With this direct connection, however, incompatibilities arise when the input exceeds 16 bits. As the Spartan-3E kit has 16 input ports and 8 output ports, our implementation is restricted to checking the peripheral filter circuitry (such as the adder, multiplier and subtractor). All of our designed filters have 32 input bits and 32 output bits, so proper interfacing has to be developed to handle the larger number of input and output bits.

FUTURE PLANS

In the present work our study is restricted to three different FIR filter realizations. Other realizations of the FIR filter, such as direct form two and the linear phase realization, could be achieved with the same arithmetic. Other filter arithmetic, such as distributed arithmetic and digit serial arithmetic, can be incorporated as part of our future plans.

The project has certain limitations regarding the measurement of the area consumed by the designed filter, due to the unavailability of proper simulation tools.

According to Chapter 6, the total chip area is measured by counting the number of look up tables, flip flops and similar resources. One of our future goals is therefore to develop a simulation tool that can measure the exact chip area in mm² or µm². In the same chapter, the power consumed was measured indirectly with software tools. No free simulation tool is available that measures power directly from the designed filters and analyses their power performance.

In future, we will also apply our design expertise to explore the domain of IIR filters.

VHDL CODES FOR FIR FILTERS

USING BIT-PARALLEL ARITHMETIC

VHDL Code for 4-BIT COUNTER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity counter_4_bit is
    Port ( clk  : in    STD_LOGIC;
           rst  : in    STD_LOGIC;
           q    : inout STD_LOGIC_VECTOR (4 downto 1);
           qbar : inout STD_LOGIC_VECTOR (4 downto 1));
end counter_4_bit;

architecture Behavioral of counter_4_bit is

    component d_flip_flop is
        Port ( d   : in  STD_LOGIC;
               clk : in  STD_LOGIC;
               rst : in  STD_LOGIC;
               q   : out STD_LOGIC);
    end component;

    -- D inputs (next state) of counter bits 1 to 4
    signal i,j,k,l : STD_LOGIC;

begin

    qbar <= not q;

    -- Next-state logic of a synchronous up counter: bit n toggles
    -- when all lower bits are '1'
    i <= qbar(1);
    j <= q(1) xor q(2);
    k <= (q(1) and q(2) and qbar(3)) or (q(3) and (qbar(1) or qbar(2)));
    l <= (q(4) and (qbar(1) or qbar(2) or qbar(3))) or (q(1) and q(2) and q(3) and qbar(4));

    -- One D flip-flop per counter bit
    d1: d_flip_flop port map(i,clk,rst,q(1));
    d2: d_flip_flop port map(j,clk,rst,q(2));
    d3: d_flip_flop port map(k,clk,rst,q(3));
    d4: d_flip_flop port map(l,clk,rst,q(4));

end Behavioral;

VHDL CODE FOR BOOTH MULTIPLIER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;   -- sxt()
use IEEE.STD_LOGIC_SIGNED.ALL;  -- signed "<" and "+" on std_logic_vector

entity encoder is
    Port ( a     : in  std_logic_vector(7 downto 0);
           arg   : in  std_logic_vector(2 downto 0);
           pprod : out std_logic_vector(15 downto 0));
end encoder;

architecture Behavioral of encoder is

    -- Radix-4 Booth encoding of one 3-bit group: returns 0, +data, +2*data,
    -- -2*data or -data as a 9-bit two's complement value.
    function encoder(arg1 : std_logic_vector(2 downto 0);
                     data : std_logic_vector(7 downto 0))
        return std_logic_vector is
        variable temp,temp1,temp2 : std_logic_vector(8 downto 0);
    begin
        case arg1 is
            when "001"|"010" =>              -- +data (sign extended)
                if data < 0 then
                    temp := '1' & data;
                else
                    temp := '0' & data;
                end if;
            when "011" =>                    -- +2*data
                if data < 0 then
                    temp1 := '1' & data;
                    temp  := temp1(7 downto 0) & '0';
                else
                    temp  := '0' & data(6 downto 0) & '0';
                end if;
            when "100" =>                    -- -2*data
                if data < 0 then
                    temp1 := '1' & data;
                else
                    temp1 := '0' & data;
                end if;
                temp2 := (not temp1) + "000000001";
                temp  := temp2(7 downto 0) & '0';
            when "101"|"110" =>              -- -data
                if data < 0 then
                    temp1 := '1' & data;
                else
                    temp1 := '0' & data;
                end if;
                temp := (not temp1) + "000000001";
            when others =>                   -- "000" and "111": zero
                temp := "000000000";
        end case;
        return temp;
    end encoder;

    signal s1 : std_logic_vector(8 downto 0);

begin

    s1    <= encoder(arg,a);
    pprod <= sxt(s1,16);                     -- sign extend the partial product to 16 bits

end Behavioral;

VHDL CODE FOR SIXTEEN BIT FULL ADDER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

entity sixteenbit_fa is
    Port ( a    : in  STD_LOGIC_VECTOR (15 downto 0);
           b    : in  STD_LOGIC_VECTOR (15 downto 0);
           yout : out STD_LOGIC_VECTOR (15 downto 0));
end sixteenbit_fa;

architecture Behavioral of sixteenbit_fa is

    -- carry1(i) is the carry into bit i; carry1(16) is the final carry out
    signal carry1 : std_logic_vector(16 downto 0);

    COMPONENT twobit_add
        PORT( a    : IN  std_logic;
              b    : IN  std_logic;
              cin  : IN  std_logic;
              sum  : OUT std_logic;
              cout : OUT std_logic);
    END COMPONENT;

begin

    carry1(0) <= '0';

    -- Ripple carry chain: one full adder cell per bit
    g1 : for i in 0 to 15 generate
        f0 : twobit_add PORT MAP(a(i), b(i), carry1(i), yout(i), carry1(i+1));
    end generate g1;

    -- The carry out carry1(16) is not brought to a port:
    --cout<=carry1(16);

end Behavioral;

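VHDL CODE FOR MULTIPLIER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;  -- "+" on std_logic_vector

entity multiply is
    Port ( clk : in  STD_LOGIC;
           a   : in  STD_LOGIC_VECTOR (4 downto 1);
           b   : in  STD_LOGIC_VECTOR (4 downto 1);
           y   : out STD_LOGIC_VECTOR (8 downto 1));
end multiply;

architecture Behavioral of multiply is

    -- Partial products: a shifted left by the weight of each bit of b
    signal x1,x2,x3,x4 : std_logic_vector(8 downto 1);

begin

    -- a and b are in the sensitivity list so that simulation follows
    -- every input change, as the synthesized logic does
    process (clk, a, b) is
    begin
        if (b(1)='1') then
            x1 <= "0000" & a;
        else
            x1 <= "00000000";
        end if;
        if (b(2)='1') then
            x2 <= "000" & a & '0';
        else
            x2 <= "00000000";
        end if;
        if (b(3)='1') then
            x3 <= "00" & a & "00";
        else
            x3 <= "00000000";
        end if;
        if (b(4)='1') then
            x4 <= '0' & a & "000";
        else
            x4 <= "00000000";
        end if;
    end process;

    -- Product = sum of the four partial products
    y <= x1 + x2 + x3 + x4;

end Behavioral;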

VHDL CODE FOR SERIAL PARALLEL CONVERTER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity converter is
    Port ( rst   : in  STD_LOGIC;
           clk   : in  STD_LOGIC;
           start : in  STD_LOGIC;
           din   : in  STD_LOGIC_VECTOR (7 downto 0);
           dout  : out STD_LOGIC);
end converter;

architecture Behavioral of converter is

    signal dst       : std_logic_vector(7 downto 0) := (others => '0');  -- shift register
    signal data,stop : std_logic := '0';

begin

    process (clk,rst)
    begin
        if rst = '1' then
            dst  <= (others => '0');
            data <= '0';
            stop <= '0';
        elsif rising_edge(clk) then
            if start = '1' then
                -- Load the parallel word and send a '1' on the serial output
                data <= '1';
                stop <= '1';
                dst  <= din;
            else
                -- Shift out the MSB; the previous value of stop enters at the LSB
                data <= dst(7);
                stop <= '0';
                dst  <= dst(6 downto 0) & stop;
            end if;
        end if;
    end process;

    dout <= data;

end Behavioral;

VHDL CODE FOR FIR FILTERS

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity Filter is
    Port ( h0,h1,h2,h3,h4 : in  STD_LOGIC_VECTOR (4 downto 1);
           cp  : in  STD_LOGIC;
           rst : in  STD_LOGIC;
           clk : in  STD_LOGIC;
           xin : in  STD_LOGIC_VECTOR (4 downto 1);
           y   : out STD_LOGIC);
end Filter;

architecture Behavioral of Filter is

    component multiply is
        Port ( clk : in  STD_LOGIC;
               a   : in  STD_LOGIC_VECTOR (4 downto 1);
               b   : in  STD_LOGIC_VECTOR (4 downto 1);
               y   : out STD_LOGIC_VECTOR (8 downto 1));
    end component;

    component converter is
        Port ( rst   : in  STD_LOGIC;
               clk   : in  STD_LOGIC;
               start : in  STD_LOGIC;
               din   : in  STD_LOGIC_VECTOR (7 downto 0);
               dout  : out STD_LOGIC);
    end component;

    component d_flip_flop is
        Port ( d   : in  STD_LOGIC;
               clk : in  STD_LOGIC;
               rst : in  STD_LOGIC;
               q   : out STD_LOGIC);
    end component;

    signal p1,p2,p3,p4,p5,p6,p7,p8,p9,y1,y2,y3,y4,y5 : STD_LOGIC;
    signal yy1,yy2,yy3,yy4,yy5 : STD_LOGIC_VECTOR (8 downto 1);

begin

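    -- Tap 1
    m1: multiply port map(cp,xin,h4,yy1);
    c1: converter port map(rst,clk,'1',yy1,y1);
    d1: d_flip_flop port map(y1,clk,rst,p1);

    -- Taps 2 to 5. The multiply, converter and d3/d4 instances below are
    -- reconstructed on the pattern of tap 1; the coefficient order h3 down
    -- to h0 is an assumption.
    m2: multiply port map(cp,xin,h3,yy2);
    c2: converter port map(rst,clk,'1',yy2,y2);
    p2 <= p1 or y2;
    d2: d_flip_flop port map(p2,clk,rst,p3);

    m3: multiply port map(cp,xin,h2,yy3);
    c3: converter port map(rst,clk,'1',yy3,y3);
    p4 <= p3 or y3;
    d3: d_flip_flop port map(p4,clk,rst,p5);

    m4: multiply port map(cp,xin,h1,yy4);
    c4: converter port map(rst,clk,'1',yy4,y4);
    p6 <= p5 or y4;
    d4: d_flip_flop port map(p6,clk,rst,p7);

    m5: multiply port map(cp,xin,h0,yy5);
    c5: converter port map(rst,clk,'1',yy5,y5);
    y <= p7 or y5;

end Behavioral;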

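VHDL Code for ALU-DESIGN

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ALU is
    Port ( a   : in  STD_LOGIC_VECTOR (3 downto 0);
           b   : in  STD_LOGIC_VECTOR (3 downto 0);
           ch  : in  STD_LOGIC_VECTOR (1 downto 0);
           y   : out STD_LOGIC_VECTOR (7 downto 0);
           clk : in  STD_LOGIC);
end ALU;

architecture Behavioral of ALU is

begin

    -- Registered logic unit: ch selects the operation applied to a and b.
    -- The 4-bit result is zero-extended to fit the 8-bit output.
    process (clk) is
    begin
        if (rising_edge( clk )) then
            case ch is
                when "00" =>
                    y <= "0000" & (a or b);
                when "01" =>
                    y <= "0000" & (a nor b);
                when "10" =>
                    y <= "0000" & (a xor b);
                when "11" =>
                    y <= "0000" & (a nand b);
                when others =>
                    y <= "00000000";
            end case;
        end if;
    end process;

end Behavioral;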

USING SERIAL-BIT ARITHMETIC


VHDL CODE FOR D FLIP-FLOP

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity d_flip_flop is
    Port ( d   : in  STD_LOGIC;
           clk : in  STD_LOGIC;
           rst : in  STD_LOGIC;
           q   : out STD_LOGIC);
end d_flip_flop;

architecture Behavioral of d_flip_flop is

begin

    -- rising-edge D flip-flop with asynchronous active-high reset
    dff : process (clk,rst) is
    begin
        if (rst='1') then
            q <= '0';
        elsif (rising_edge (clk)) then
            q <= d;
        end if;
    end process dff;

end Behavioral;


VHDL Code for FULL ADDER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity fa1 is
    Port ( a  : in  STD_LOGIC;
           b  : in  STD_LOGIC;
           ci : in  STD_LOGIC;
           s  : out STD_LOGIC;
           co : out STD_LOGIC);
end fa1;

architecture Behavioral of fa1 is

    component or1
        port(a,b : in  std_logic;
             y   : out std_logic);
    end component;

    component ha1
        port(a,b  : in  std_logic;
             s,co : out std_logic);
    end component;

    signal s1,c1,c2 : std_logic;

begin

    -- full adder built from two half adders and an OR gate
    a1: ha1 port map(a,b,s1,c1);
    a2: ha1 port map(s1,ci,s,c2);
    a3: or1 port map(c1,c2,co);

end Behavioral;

VHDL Code for HALF ADDER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity ha1 is
    Port ( a  : in  STD_LOGIC;
           b  : in  STD_LOGIC;
           s  : out STD_LOGIC;
           co : out STD_LOGIC);
end ha1;

architecture Behavioral of ha1 is

begin

    s  <= a xor b;
    co <= a and b;

end Behavioral;
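
The full adder fa1 relies on an or1 component whose code is not included in this appendix. The sketch below supplies a two-input OR gate with the port list declared in fa1 and adds an exhaustive test bench covering all eight input combinations. The architecture name Dataflow, the entity name fa1_tb and the use of IEEE.NUMERIC_STD in the test bench are assumptions made for illustration only.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical two-input OR gate matching the or1 component declared in fa1.
entity or1 is
    port(a,b : in  std_logic;
         y   : out std_logic);
end or1;

architecture Dataflow of or1 is
begin
    y <= a or b;
end Dataflow;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical exhaustive test bench for fa1.
entity fa1_tb is
end fa1_tb;

architecture Sim of fa1_tb is
    signal a, b, ci, s, co : std_logic;
begin
    dut : entity work.fa1 port map (a => a, b => b, ci => ci, s => s, co => co);

    stimulus : process
        variable v        : unsigned(2 downto 0);
        variable ones     : natural;
        variable expected : unsigned(1 downto 0);
    begin
        for i in 0 to 7 loop
            v := to_unsigned(i, 3);
            a <= v(2); b <= v(1); ci <= v(0);
            wait for 10 ns;

            -- the two-bit result (co, s) must equal the number of inputs at '1'
            ones := 0;
            for k in 0 to 2 loop
                if v(k) = '1' then
                    ones := ones + 1;
                end if;
            end loop;
            expected := to_unsigned(ones, 2);

            assert s = expected(0) and co = expected(1)
                report "fa1 mismatch for input pattern " & integer'image(i)
                severity error;
        end loop;
        wait;  -- end of stimulus
    end process;
end Sim;
```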

VHDL Code for RIGHT SHIFTER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity rt_shifter is
    Port ( clk : in  STD_LOGIC;
           rst : in  STD_LOGIC;
           sin : in  STD_LOGIC;
           y   : out STD_LOGIC);
end rt_shifter;

architecture Behavioral of rt_shifter is

    component d_flip_flop
        Port ( d   : in  STD_LOGIC;
               clk : in  STD_LOGIC;
               rst : in  STD_LOGIC;
               q   : out STD_LOGIC);
    end component;

    signal tmp,y4,y3,y2,yo : STD_LOGIC;

begin

    -- input stage: sample sin on the rising clock edge while rst is low
    process (clk) is
    begin
        if rising_edge(clk) then
            if rst = '0' then
                tmp <= sin;
            end if;
        end if;
    end process;

    -- four-stage shift chain
    d1: d_flip_flop port map(tmp,clk,rst,y4);
    d2: d_flip_flop port map(y4,clk,rst,y3);
    d3: d_flip_flop port map(y3,clk,rst,y2);
    d4: d_flip_flop port map(y2,clk,rst,y);

end Behavioral;

VHDL CODE FOR DELAY

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity reg is
    Port ( d   : in  std_logic;
           clk : in  std_logic;
           rst : in  std_logic;
           q   : out std_logic);
end reg;

architecture reg of reg is

    signal state : std_logic;

begin

    -- one-clock delay element with asynchronous active-high reset
    process(clk,rst)
    begin
        if (rst='1') then
            state <= '0';
        elsif (clk'event and clk='1') then
            state <= d;
        end if;
    end process;

    q <= state;

end reg;


VHDL CODE FOR PIPELINE

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity pipe is
    Port ( a   : in  std_logic;
           b   : in  std_logic;
           clk : in  std_logic;
           rst : in  std_logic;
           q   : out std_logic);
end pipe;

architecture structural of pipe is

    component reg is
        Port ( d   : in  std_logic;
               clk : in  std_logic;
               rst : in  std_logic;
               q   : out std_logic);
    end component;

    component fau is
        Port ( a    : in  std_logic;
               b    : in  std_logic;
               cin  : in  std_logic;
               s    : out std_logic;
               cout : out std_logic);
    end component;

    signal s,cin,cout : std_logic;

begin

    -- bit-serial adder: u2 feeds the carry back for the next bit,
    -- u3 registers the sum bit
    u1: component fau port map(a ,b,cin, s , cout);
    u2: component reg port map(cout ,clk ,rst, cin);
    u3: component reg port map(s ,clk ,rst, q);

end structural;
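
The pipe block adds two operands one bit per clock cycle, least significant bit first, with the carry held in a register between cycles. The sketch below illustrates this with a self-checking test bench that adds 3 and 5 serially. The fau component is not listed in this appendix, so a plain full adder with the same port list is assumed; the names pipe_tb, op_a, op_b and done, and the 10 ns clock period, are likewise hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical full-adder cell matching the fau component declared in pipe.
entity fau is
    Port ( a    : in  std_logic;
           b    : in  std_logic;
           cin  : in  std_logic;
           s    : out std_logic;
           cout : out std_logic);
end fau;

architecture Dataflow of fau is
begin
    s    <= a xor b xor cin;
    cout <= (a and b) or (cin and (a xor b));
end Dataflow;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical test bench: feeds 3 and 5 into pipe, LSB first.
entity pipe_tb is
end pipe_tb;

architecture Sim of pipe_tb is
    constant op_a : std_logic_vector(3 downto 0) := "0011";  -- 3
    constant op_b : std_logic_vector(3 downto 0) := "0101";  -- 5
    signal a, b, q : std_logic;
    signal clk     : std_logic := '0';
    signal rst     : std_logic := '1';
    signal done    : boolean   := false;
begin
    -- 10 ns clock that stops once the stimulus has finished
    clk <= not clk after 5 ns when not done else '0';

    dut : entity work.pipe
        port map (a => a, b => b, clk => clk, rst => rst, q => q);

    stimulus : process
        variable result : std_logic_vector(3 downto 0);
    begin
        a <= '0'; b <= '0';
        wait until falling_edge(clk);
        rst <= '0';                        -- release reset, carry register is '0'
        for i in 0 to 3 loop
            a <= op_a(i); b <= op_b(i);    -- present bit i of each operand
            wait until falling_edge(clk);  -- a rising edge has registered the sum bit
            result(i) := q;
        end loop;
        assert result = "1000" report "serial 3 + 5 failed" severity error;
        done <= true;
        wait;
    end process;
end Sim;
```

Operands are changed on the falling clock edge so that they are stable at the rising edge on which the sum and carry registers capture their inputs.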

VHDL CODE FOR FIR FILTER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity fir is
    Port ( x0  : in  STD_LOGIC;
           x1  : in  STD_LOGIC;
           clk : in  STD_LOGIC;
           rst : in  STD_LOGIC;
           y0  : out STD_LOGIC;
           y1  : out STD_LOGIC);
end fir;

architecture Behavioral of fir is

    COMPONENT adder
    PORT(
        a   : IN  std_logic;
        b   : IN  std_logic;
        clk : IN  std_logic;
        rst : IN  std_logic;
        s   : OUT std_logic);
    END COMPONENT;

    COMPONENT multiplier
    PORT(
        a    : IN  std_logic;
        clk  : IN  std_logic;
        rst  : IN  std_logic;
        b    : IN  std_logic_vector(3 downto 0);
        prod : OUT std_logic);
    END COMPONENT;

    -- Two 4-bit coefficients. The array range is reduced from (3 downto 0)
    -- to (1 downto 0) so that it matches the two-element aggregate:
    -- coef(1) = "1001", coef(0) = "0011".
    type coefficients is array (1 downto 0) of std_logic_vector(3 downto 0);
    constant coef : coefficients := ("1001","0011");

    signal p1,p2 : std_logic;

begin

    m1: multiplier port map(x0,clk,rst,coef(0),y0);
    m2: multiplier port map(x0,clk,rst,coef(1),p1);
    m3: multiplier port map(x1,clk,rst,coef(0),p2);
    a1: adder port map(p1,p2,clk,rst,y1);

end Behavioral;


REFERENCES

Articles from published conference proceedings:

[1] M. Vesterbacka, K. Palmkvist, and L. Wanhammar: Serial Squarers and Serial/Serial Multipliers, National Conference on Radio Science (RVK-96), Luleå, Sweden, June 3-6, 1996.

[2] M. Vesterbacka, K. Palmkvist, and L. Wanhammar: Realization of Serial/Parallel Multipliers with Fixed Coefficients, National Conference on Radio Science (RVK-93), Lund Institute of Technology, Lund, Sweden, pp. 209-212, April 5-7, 1993.

[3] K. Palmkvist, M. Vesterbacka, P. Sandberg, and L. Wanhammar: Scheduling of Data-Independent Recursive Algorithms, Proc. European Conference on Circuit Theory and Design (ECCTD'95), Istanbul, Turkey, pp. 855-858, Aug. 27-31, 1995.

Books:

[4] L. Wanhammar: DSP Integrated Circuits, Linköping University, 1996.

[5] A. P. Chandrakasan and R. W. Brodersen: Low Power Digital CMOS Design, Kluwer Academic Publ., 1995.

[6] P. Sandberg, K. Palmkvist, L. Wanhammar, and R. Gustavsson: Synthesis of the SIC Architecture from VHDL, LiTH-ISY-R-1610, Linköping University, Sweden.

[7] A. Bellaouar and M. Elmasry: Low-Power Digital VLSI Design: Circuits and Systems, Kluwer Academic Publ., 1995.

[8] I. Koren: Computer Arithmetic Algorithms, Prentice Hall, 1993.

Technical reports:

[9] M. Vesterbacka, K. Palmkvist, P. Sandberg, and L. Wanhammar: Implementation of Fast Bit-Serial Lattice Wave Digital Filters, Proc. IEEE Int. Symposium on Circuits and Systems (ISCAS'94), Vol. 2, pp. 113-116, London, England, May 29-June 2, 1994.

[10] C. S. Wallace: A Suggestion for a Fast Multiplier, IEEE Trans. Electronic Computers, Vol. EC-13, pp. 14-17, February 1964.

[11] M. Vesterbacka: Implementation of Maximally Fast Wave Digital Filters, Linköping Studies in Science and Technology, Thesis No. 495, Linköping University, Sweden, 1995.

[12] P. Sandberg, K. Palmkvist, and L. Wanhammar: Some Experiences From Automatic Synthesis of Digital Filters, Proc. NorChip-94, Göteborg, Sweden, Nov. 8-9, 1994.
