Chapter 4 Low-Power VLSI Design Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering National Central University Jhongli, Taiwan



EE4012 VLSI Design, National Central University

Outline

• Introduction

• Low-Power Gate-Level Design

• Low-Power Architecture-Level Design

• Algorithmic-Level Power Reduction

• RTL Techniques for Optimizing Power

Introduction

• Most SoC design teams now regard power as one of their top design concerns

• Why low-power design?
  - Battery lifetime (especially for portable devices)
  - Reliability

• Power consumption
  - Peak power
  - Average power

Overview of Power Consumption

• Average power consumption
  - Dynamic power consumption
  - Short-circuit power consumption
  - Leakage power consumption
  - Static power consumption

• Dynamic power dissipation during switching

[Figure: a CMOS gate driving other gates; the switched capacitances are labeled C_drain, C_interconnect, and C_input.]

Overview of Power Consumption

• Generic representation of a CMOS logic gate for switching power calculation

[Figure: a pMOS network and an nMOS network with inputs V_A and V_B driving the output node V_out.]

$$C_{load} = C_{drain} + C_{interconnect} + C_{input}$$

$$P_{avg} = \frac{1}{T}\left[\int_{0}^{T/2} V_{out}\left(-C_{load}\frac{dV_{out}}{dt}\right)dt + \int_{T/2}^{T} (V_{DD} - V_{out})\left(C_{load}\frac{dV_{out}}{dt}\right)dt\right]$$

Overview of Power Consumption

• The average power consumption can be expressed as

$$P_{avg} = \frac{1}{T} C_{load} V_{DD}^2 = C_{load} V_{DD}^2 f_{CLK}$$

• The node transition rate can be slower than the clock rate. To better represent this behavior, a node transition factor ($\alpha_T$) should be introduced

$$P_{avg} = \alpha_T C_{load} V_{DD}^2 f_{CLK}$$

• The switching power expressions above are derived by taking into account the output node load capacitance
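As a rough numerical illustration of the switching power expression $P_{avg} = \alpha_T C_{load} V_{DD}^2 f_{CLK}$, the sketch below evaluates it for one node. The function name and all numeric values (50 fF, 1.2 V, 100 MHz) are assumptions chosen for illustration; they do not come from the slides.

```python
def switching_power(c_load, v_dd, f_clk, alpha_t=1.0):
    """Average switching power in watts: alpha_T * C_load * V_DD^2 * f_CLK."""
    return alpha_t * c_load * v_dd ** 2 * f_clk


# Hypothetical example values (not from the slides).
p_every_cycle = switching_power(c_load=50e-15, v_dd=1.2, f_clk=100e6)
p_quarter_rate = switching_power(c_load=50e-15, v_dd=1.2, f_clk=100e6, alpha_t=0.25)

print(f"alpha_T = 1.00: {p_every_cycle * 1e6:.2f} uW")
print(f"alpha_T = 0.25: {p_quarter_rate * 1e6:.2f} uW")
```

Because the supply voltage enters quadratically, halving V_DD in this model cuts the switching power to a quarter, while halving the transition factor or the clock rate only halves it.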

Overview of Power Consumption

[Figure: a two-input CMOS gate with inputs V_A and V_B, an internal node V_internal with capacitance C_internal, and an output node V_out with capacitance C_load.]

The generalized expression for the average power dissipation can be rewritten as

$$P_{avg} = \left(\sum_{i=1}^{\#\ of\ nodes} \alpha_{Ti}\, C_i\, V_i\right) V_{DD}\, f_{CLK}$$
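The generalized expression sums the contribution of every node, $P_{avg} = (\sum_i \alpha_{Ti} C_i V_i) V_{DD} f_{CLK}$, where $V_i$ is taken here to be the voltage swing of node $i$. A minimal sketch follows, with a hypothetical two-node gate (an output node and one internal node with reduced swing); all node values are assumptions for illustration only.

```python
def multi_node_power(nodes, v_dd, f_clk):
    """Average power in watts for a list of (alpha_t, capacitance, swing) nodes."""
    weighted_sum = sum(alpha_t * cap * swing for alpha_t, cap, swing in nodes)
    return weighted_sum * v_dd * f_clk


# Hypothetical node list: (transition factor, capacitance in F, swing in V).
nodes = [
    (0.50, 40e-15, 1.2),  # output node, full-rail swing
    (0.25, 10e-15, 0.8),  # internal node, reduced swing
]

p_total = multi_node_power(nodes, v_dd=1.2, f_clk=100e6)
print(f"total average power: {p_total * 1e6:.2f} uW")
```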

Gate-Level Design – Technology Mapping

• The objective of logic minimization is to simplify the Boolean function.

• For low-power design, the signal switching activity is minimized by restructuring the logic circuit.

• Power minimization is constrained by the delay; the area, however, may increase.

• During this phase of logic minimization, the function to be minimized is

$$\sum_i P_i (1 - P_i) C_i$$

where $P_i$ is the signal probability of node $i$ and $C_i$ is its capacitance.

Gate-Level Design – Technology Mapping

• The first step in technology mapping is to decompose each logic function into two-input gates

• The objective of this decomposition is to minimize the total power dissipation by reducing the total switching activity

[Figure: two decompositions of the function ABCD into two-input gates, with input signal probabilities A = 0.2, B = 0.2, C = 0.5, D = 0.5. Node switching activities in one decomposition: 0.0384, 0.0196, 0.0099; in the other: 0.0384, 0.1875, 0.0099.]
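The node activities on this slide can be reproduced with the cost term P(1 - P). The sketch below assumes that the function is a four-input AND, that the inputs are statistically independent, and that every node has unit capacitance; none of these assumptions is stated explicitly on the slide. Under them, a chain ((AB)C)D and a tree (AB)(CD) give different total activity.

```python
def and_probability(p_x, p_y):
    """Signal probability of x AND y for independent inputs."""
    return p_x * p_y


def activity(p):
    """Switching activity term P * (1 - P) for a node with signal probability P."""
    return p * (1.0 - p)


prob = {"A": 0.2, "B": 0.2, "C": 0.5, "D": 0.5}

# Chain decomposition: ((A AND B) AND C) AND D
p_ab = and_probability(prob["A"], prob["B"])
p_abc = and_probability(p_ab, prob["C"])
p_abcd = and_probability(p_abc, prob["D"])
chain = [activity(p_ab), activity(p_abc), activity(p_abcd)]

# Tree decomposition: (A AND B) AND (C AND D)
p_cd = and_probability(prob["C"], prob["D"])
tree = [activity(p_ab), activity(p_cd), activity(and_probability(p_ab, p_cd))]

chain_total = sum(chain)
tree_total = sum(tree)
print("chain:", [round(a, 4) for a in chain], "total", round(chain_total, 4))
print("tree: ", [round(a, 4) for a in tree], "total", round(tree_total, 4))
```

With these input probabilities the chain keeps the high-probability inputs C and D away from a shared node, so its total activity is lower than that of the tree. A different set of input probabilities could reverse the ranking.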

Gate-Level Design – Phase Assignment

[Figure: two phase assignments of the same logic with signals A, B, and C; the high-activity node is marked in each version.]

Gate-Level Design – Pin Swapping

[Figure: a gate with inputs a, b, c, d shown with two different pin assignments, together with the switching activity of each input.]

Gate-Level Design – Glitching Power

• Glitches are spurious transitions caused by imbalanced path delays

• A design with more balanced path delays has fewer glitches and therefore dissipates less power

• Note that there are no glitches in dynamic CMOS logic

[Figure: two versions of a circuit with signals A, B, C, D, and E.]

Gate-Level Design – Glitching Power

• A chain structure has more glitches

• A tree structure has fewer glitches

[Figure: a chain structure and a tree structure, each with inputs A, B, C, and D.]

Gate-Level Design – Precomputation

[Figure: combinational logic between registers R1 and R2, shown without and with precomputation logic.]

Gate-Level Design – Precomputation

[Figure: precomputation applied to an n-bit comparator. Blocks: a 1-bit comparator (MSB), an (n-1)-bit comparator, registers R1 to R4, and precomputation logic that generates Enable. Inputs: A<n-1>, B<n-1>, A<n-2:0>, B<n-2:0>. Output: F.]
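A behavioral sketch of the comparator precomputation idea follows. It assumes that F is the unsigned comparison A > B and that the enable for the lower-order registers is asserted only when the two MSBs are equal; the slide does not spell out either detail, and the function name is hypothetical.

```python
def compare_with_precomputation(a, b, n):
    """Return (a > b, lower_bits_enabled) for n-bit unsigned integers a and b."""
    msb_a = (a >> (n - 1)) & 1
    msb_b = (b >> (n - 1)) & 1
    if msb_a != msb_b:
        # The MSBs alone decide the result, so the registers feeding the
        # (n-1)-bit comparator can keep their old values (no switching).
        return msb_a > msb_b, False
    mask = (1 << (n - 1)) - 1
    return (a & mask) > (b & mask), True


n = 4
enabled_count = sum(
    compare_with_precomputation(a, b, n)[1]
    for a in range(2 ** n)
    for b in range(2 ** n)
)
print(f"lower-order registers enabled for {enabled_count} of {4 ** n} input pairs")
```

Under the additional assumption of uniformly distributed inputs, the lower-order registers are loaded for only half of the input pairs, which is where the power saving would come from.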

Gate-Level Design – Gating Clock

[Figure: two chains of D flip-flops with a gated clock clk. The first version fails DFT rule checking; the second adds a control pin T to solve the DFT violation problem.]

Gate-Level Design – Input Gating

[Figure: input gating applied to an adder (+); labels: clk, f1, f2, select.]

Clock-Gating in Low-Power Flip-Flop

[Figure: a D flip-flop with input D, output Q, and clock CK, with clock-gating logic.]

Source: Prof. V. D. Agrawal

Reduced-Power Shift Register

[Figure: a shift register split into parallel chains of D flip-flops clocked at CK(f/2), with a multiplexer producing the output.]

Flip-flops are operated at full voltage and half the clock frequency.

Source: Prof. V. D. Agrawal

Power Consumption of Shift Register

$$P = C' V_{DD}^2 f / n$$

[Figure: normalized power (0.0 to 1.0) versus degree of parallelism n (1, 2, 4).]

Deg. of parallelism   Freq (MHz)   Power (μW)
1                     33.0         1535
2                     16.5         887
4                     8.25         738

16-bit shift register, 2μ CMOS

C. Piguet, "Circuit and Logic Level Design," pages 103-133 in W. Nebel and J. Mermet (ed.), Low Power Design in Deep Submicron Electronics, Springer, 1997.

Source: Prof. V. D. Agrawal
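To compare the ideal scaling P = C'V_DD^2 f/n with the measured figures for the 16-bit shift register, the sketch below normalizes the tabulated power values to the n = 1 case. The helper names are hypothetical; the data are the three rows of the table.

```python
# Measured data from the table: degree of parallelism -> power in microwatts.
measured_uw = {1: 1535.0, 2: 887.0, 4: 738.0}


def ideal_normalized_power(n):
    """Ideal scaling from P = C' * V_DD^2 * f / n, normalized to n = 1."""
    return 1.0 / n


def measured_normalized_power(n):
    """Measured power for parallelism n, normalized to the n = 1 measurement."""
    return measured_uw[n] / measured_uw[1]


for n in sorted(measured_uw):
    print(n, ideal_normalized_power(n), round(measured_normalized_power(n), 3))
```

The measured savings fall short of the ideal 1/n curve, which suggests that part of the power does not scale with the flip-flop clock frequency.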

Architecture-Level Design – Parallelism

[Figure: a reference datapath in which registers RA and RB, clocked at fref, feed a 16x16 multiplier with a 32-bit output, and a parallel datapath in which two 16x16 multipliers with input registers clocked at fref/2 feed a MUX that delivers the 32-bit output at fref.]

Assume that, with the same 16x16 multiplier, the supply voltage can be reduced from $V_{ref}$ to $V_{ref}/1.83$.

$$P_{parallel} = 2.2\, C_{ref} \left(\frac{V_{ref}}{1.83}\right)^2 \frac{f_{ref}}{2} \approx 0.33\, P_{ref}$$
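The factor 0.33 follows directly from dividing the parallel-datapath power by the reference power, taking $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$ as in the switching power expression earlier in the chapter:

```latex
\frac{P_{parallel}}{P_{ref}}
  = \frac{2.2\, C_{ref} \left(V_{ref}/1.83\right)^2 \left(f_{ref}/2\right)}
         {C_{ref}\, V_{ref}^2\, f_{ref}}
  = \frac{2.2}{2 \times 1.83^2}
  = \frac{2.2}{6.6978}
  \approx 0.33
```

The capacitance grows by a factor of 2.2 because of the duplicated multiplier and the extra registers and multiplexer, but the quadratic voltage term and the halved clock rate more than compensate.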

Architecture-Level Design – Pipelining

Because the hardware between the pipeline stages is reduced, the supply voltage $V_{ref}$ can be lowered to $V_{new}$ while maintaining the same worst-case delay. For example, suppose a 50 MHz multiplier is broken into two equal parts. The pipeline can still be clocked at 50 MHz when $V_{new}$ is equal to $V_{ref}/1.83$.

[Figure: input register REG(A, B) clocked at fref, a half multiplier, a 32-bit pipeline register REG, and a second half multiplier with a 32-bit output.]

$$P_{pipeline} = 1.2\, C_{ref} \left(\frac{V_{ref}}{1.83}\right)^2 f_{ref} \approx 0.36\, P_{ref}$$
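The pipelining and parallelism estimates use the same scaling rule: relative power equals the capacitance factor times the frequency factor, divided by the square of the voltage reduction. The sketch below reproduces both figures; the function name is hypothetical, and the factors 2.2, 1.2, and 1.83 are the ones given on the slides.

```python
def relative_power(cap_factor, voltage_divisor, freq_factor):
    """Power relative to P_ref = C_ref * V_ref^2 * f_ref after scaling C, V, and f."""
    return cap_factor * freq_factor / voltage_divisor ** 2


p_parallel = relative_power(cap_factor=2.2, voltage_divisor=1.83, freq_factor=0.5)
p_pipeline = relative_power(cap_factor=1.2, voltage_divisor=1.83, freq_factor=1.0)

print(f"parallel: {p_parallel:.2f} x P_ref")
print(f"pipeline: {p_pipeline:.2f} x P_ref")
```

Both techniques reach a similar saving here. Pipelining keeps the full clock rate and adds less capacitance, while parallelism halves the clock rate at the cost of roughly doubling the hardware.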

Architecture-Level Design – Retiming

Retiming is a transformation technique used to change the locations of delay elements in a circuit without affecting the input/output characteristics of the circuit.

[Figure: Two versions of an IIR filter, related by retiming. Labels: input x(n), output y(n), internal signals w(n), w1(n), and w2(n), coefficients a and b, delay elements D and 2D, and node annotations (1) and (2).]

Architecture-Level Design – Retiming

[Figure: Retiming for pipeline design. Two arrangements of registers (REG, clocked at fref) around the combinational blocks C1 (6 ns), C2 (2 ns), and C3 (4 ns).]


Architecture-Level Design – Retiming

Clock cycle is 4 gate delays

Clock cycle is 2 gate delays

Architecture-Level Design – Power Management

[Figure: two versions of a design with blocks C1 and C2 and their control signals C1_FREEZE and C2_FREEZE.]

Architecture-Level Design – Bus Segmentation

• Avoid the sharing of resources
  - Reduces the switched capacitance

• For example: a global system bus
  - A single shared bus is connected to all modules. This structure results in a large bus capacitance due to
    - The large number of drivers and receivers sharing the same bus
    - The parasitic capacitance of the long bus line

• A segmented bus structure
  - Switched capacitance during each bus access is significantly reduced
  - Overall routing area may be increased

Architecture-Level Design – Bus Segmentation

[Figure: a single shared bus with capacitance Cbus, and a segmented bus whose segments (labeled Cbus1) are joined by a bus interface.]

Algorithmic-Level Design – f_activity Reduction

Minimization of the switching activity at a high level is one way to reduce the power dissipation of digital processors. One method to minimize the switching of signals at the algorithmic level is to use an appropriate coding for the signals rather than straight binary code. The table below compares the 3-bit binary and Gray code representations.

Binary Code   Gray Code   Decimal Equivalent
000           000         0
001           001         1
010           011         2
011           010         3
100           110         4
101           111         5
110           101         6
111           100         7
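The 3-bit Gray code column can be generated from the binary value with the usual reflected-binary conversion (XOR of the value with itself shifted right by one). The sketch below is a minimal illustration; the function names are hypothetical.

```python
def binary_to_gray(value):
    """Convert a non-negative integer to its reflected-binary Gray code."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Invert binary_to_gray by XOR-ing together all right shifts of the code."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


gray_column = [format(binary_to_gray(i), "03b") for i in range(8)]
print(gray_column)
```

Successive Gray codes differ in exactly one bit, which is the property that reduces switching on a bus or counter that steps through consecutive values.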

State Encoding for a Counter

• Two-bit binary counter:
  - State sequence: 00 → 01 → 10 → 11 → 00
  - Six bit transitions in four clock cycles
  - 6/4 = 1.5 transitions per clock

• Two-bit Gray-code counter:
  - State sequence: 00 → 01 → 11 → 10 → 00
  - Four bit transitions in four clock cycles
  - 4/4 = 1.0 transition per clock

• Gray-code counter is more power efficient.

G. K. Yeap, Practical Low Power Digital VLSI Design, Boston: Kluwer Academic Publishers (now Springer), 1998.

Source: Prof. V. D. Agrawal

Binary Counter: Original Encoding

Present state   Next state
a  b            A  B
0  0            0  1
0  1            1  0
1  0            1  1
1  1            0  0

A = a'b + ab'
B = a'b' + ab'

[Figure: counter circuit with present-state bits a and b, next-state signals A and B, and inputs CK and CLR.]

Source: Prof. V. D. Agrawal

Binary Counter: Gray Encoding

Present state   Next state
a  b            A  B
0  0            0  1
0  1            1  1
1  0            0  0
1  1            1  0

A = a'b + ab
B = a'b' + a'b

[Figure: counter circuit with present-state bits a and b, next-state signals A and B, and inputs CK and CLR.]

Source: Prof. V. D. Agrawal

Three-Bit Counters

Binary                     Gray-code
State   No. of toggles     State   No. of toggles
000     -                  000     -
001     1                  001     1
010     2                  011     1
011     1                  010     1
100     3                  110     1
101     1                  111     1
110     2                  101     1
111     1                  100     1
000     3                  000     1

Av. transitions/clock = 1.75 (binary), 1 (Gray-code)

Source: Prof. V. D. Agrawal

N-Bit Counter: Toggles in Counting Cycle

• Binary counter: $T(binary) = 2(2^N - 1)$

• Gray-code counter: $T(gray) = 2^N$

• $T(gray)/T(binary) = 2^{N-1}/(2^N - 1) \to 0.5$

Bits   T(binary)   T(gray)   T(gray)/T(binary)
1      2           2         1.0
2      6           4         0.6667
3      14          8         0.5714
4      30          16        0.5333
5      62          32        0.5161
6      126         64        0.5079
∞      -           -         0.5000

Source: Prof. V. D. Agrawal
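The toggle counts for one full counting cycle can be checked by brute force. The sketch below counts bit flips between consecutive states, including the wrap-around back to zero, and compares the totals with the closed forms 2(2^N - 1) for binary and 2^N for Gray code. The function names are hypothetical.

```python
def toggles_per_cycle(codes):
    """Total bit flips over one full pass through codes, including wrap-around."""
    total = 0
    for index, code in enumerate(codes):
        next_code = codes[(index + 1) % len(codes)]
        total += bin(code ^ next_code).count("1")
    return total


def binary_sequence(n_bits):
    return list(range(2 ** n_bits))


def gray_sequence(n_bits):
    return [i ^ (i >> 1) for i in range(2 ** n_bits)]


for n_bits in range(1, 7):
    t_binary = toggles_per_cycle(binary_sequence(n_bits))
    t_gray = toggles_per_cycle(gray_sequence(n_bits))
    print(n_bits, t_binary, t_gray, round(t_gray / t_binary, 4))
```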

FSM State Encoding

[Figure: the same three-state state transition graph under two state encodings, (11, 01, 00) and (01, 11, 00). Edges are labeled with transition probabilities based on PI statistics: 0.1, 0.1, 0.3, 0.4, 0.6, 0.6, 0.9.]

Expected number of state-bit transitions:

1(0.3+0.4+0.1) + 2(0.1) = 1.0

2(0.3+0.4) + 1(0.1+0.1) = 1.6

State encoding can be selected using a power-based cost function.

Source: Prof. V. D. Agrawal
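A power-based cost function for state encoding can be sketched as the probability-weighted Hamming distance over all state transitions. The edge list below is an assumption: the endpoints of each edge are not recoverable from the slide, so they were chosen so that the two encodings reproduce the slide's totals of 1.6 and 1.0. The state names T, L, and R are hypothetical, and edges that stay in the same state are omitted because they cause no bit transitions.

```python
def expected_transitions(edges, encoding):
    """Sum of probability * Hamming distance over (source, target, probability) edges."""
    return sum(
        prob * bin(encoding[src] ^ encoding[dst]).count("1")
        for src, dst, prob in edges
    )


# Hypothetical edge placement, consistent with the totals on the slide.
edges = [
    ("T", "R", 0.3),
    ("R", "T", 0.4),
    ("T", "L", 0.1),
    ("L", "R", 0.1),
]

encoding_a = {"T": 0b11, "L": 0b01, "R": 0b00}
encoding_b = {"T": 0b01, "L": 0b11, "R": 0b00}

cost_a = expected_transitions(edges, encoding_a)
cost_b = expected_transitions(edges, encoding_b)
print(round(cost_a, 2), round(cost_b, 2))
```

The second encoding is cheaper because the two most probable transitions connect states whose codes differ in only one bit.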

FSM: Clock-Gating

• Moore machine: Outputs depend only on the state variables.
  - If a state has a self-loop in the state transition graph (STG), then the clock can be stopped whenever a self-loop is to be executed.

[Figure: STG fragment with states Si, Sj, and Sk and edges labeled Xi/Zk, Xj/Zk, and Xk/Zk.]

Clock can be stopped when the (Xk, Sk) combination occurs.

Source: Prof. V. D. Agrawal

Clock-Gating in Moore FSM

[Figure: combinational logic and flip-flops forming the FSM, with primary inputs PI and primary outputs PO; clock activation logic and a latch gate the clock CK to the flip-flops.]

L. Benini and G. De Micheli, Dynamic Power Management, Boston: Springer, 1998.

Source: Prof. V. D. Agrawal

Bus Encoding for Reduced Power

• Example: Four-bit bus
  - 0000 → 1110 has three transitions.
  - If the bits of the second pattern are inverted, then 0000 → 0001 will have only one transition.

• Bit-inversion encoding for N-bit bus:

[Figure: number of bit transitions after inversion encoding (0 to N) plotted against number of bit transitions (0 to N), with N/2 marked on both axes.]

Source: Prof. V. D. Agrawal
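The four-bit example can be reproduced with a small behavioral model of bit-inversion encoding. This is a simplified sketch: it inverts the word whenever more than half of the data lines would toggle and sends one extra polarity bit, and it does not count transitions on the polarity line itself. The function names are hypothetical.

```python
def bus_invert_encode(prev_bus, data, width):
    """Return (bus_value, polarity_bit) for the next word on a width-bit bus."""
    mask = (1 << width) - 1
    transitions = bin((prev_bus ^ data) & mask).count("1")
    if transitions > width / 2:
        return (~data) & mask, 1  # send the inverted word, polarity = 1
    return data & mask, 0


def bus_invert_decode(bus_value, polarity, width):
    """Recover the original data word at the receiver."""
    mask = (1 << width) - 1
    return (~bus_value) & mask if polarity else bus_value & mask


bus_value, polarity = bus_invert_encode(prev_bus=0b0000, data=0b1110, width=4)
print(format(bus_value, "04b"), polarity)
```

With this rule the number of toggling data lines never exceeds N/2 per transfer, at the cost of one extra wire and the encode/decode logic.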

Bus-Inversion Encoding Logic

[Figure: sent data passes through polarity decision logic and a bus register onto the bus, accompanied by a polarity bit; the receiver uses the polarity bit to recover the received data.]

M. Stan and W. Burleson, "Bus-Invert Coding for Low Power I/O," IEEE Trans. VLSI Systems, vol. 3, no. 1, pp. 49-58, March 1995.

Source: Prof. V. D. Agrawal

RTL-Level Design – Signal Gating

Simple Decoder

    // sel follows every change on a.
    module decoder (a, sel);
      input  [1:0] a;
      output [3:0] sel;
      reg    [3:0] sel;

      always @(a) begin
        case (a)
          2'b00: sel = 4'b0001;
          2'b01: sel = 4'b0010;
          2'b10: sel = 4'b0100;
          2'b11: sel = 4'b1000;
        endcase
      end
    endmodule

Decoder with enable

    // While en is 0, sel is held at 0 regardless of changes on a.
    module decoder (en, a, sel);
      input        en;
      input  [1:0] a;
      output [3:0] sel;
      reg    [3:0] sel;

      always @(en or a) begin
        case ({en, a})
          3'b100:  sel = 4'b0001;
          3'b101:  sel = 4'b0010;
          3'b110:  sel = 4'b0100;
          3'b111:  sel = 4'b1000;
          default: sel = 4'b0000;
        endcase
      end
    endmodule

RTL-Level Design – Datapath Reordering

[Figure: two multiplexer arrangements, labeled Initial and Reordered, with a stable input, a glitchy input, and a select condition A<B.]

RTL-Level Design – Memory Partition

[Figure: a memory partitioned into 128x32 blocks whose 32-bit dout outputs are combined by a MUX. Signal labels: clk, write, addr, din, addr[7:0], addr[7:1], addr0, pre_addr, and noe.]

RTL-Level Design – Memory Partition

[Figure: an ARM core connected through Data, Addr, and R/W to a single 64K-byte memory, with a plot of reads versus address range (labels 64K, 28K, 4K, 32K).]

• Application-driven memory partition

RTL-Level Design – Memory Partition

[Figure: an ARM core connected to three memory blocks, each with Data, Addr, R/W, and CS ports, and a Decoder.]

• A power-optimal partitioned memory organization