
Post on 19-Sep-2020


inst.eecs.berkeley.edu/~ee241b

Borivoje Nikolić

EE241B : Advanced Digital Circuits

Lecture 21 – DVS II

EECS241B L21 DVS2

April 8, NY Times: Computers Already Learn From Us. But Can They Teach Themselves?

Scientists are exploring approaches that would help machines develop their own sort of common sense

Features Pieter Abbeel and Sergey Levine from UC Berkeley

Announcements

• Assignment 4 due next Friday.

• Reading
  • T. Burd, et al., JSSC, Nov. 2000.


Outline

• Module 5
  • Dynamic voltage and frequency scaling


5.F Dynamic Voltage Scaling


Power/Energy Optimization Space

Energy | Design Time | Sleep Mode | Run Time
Throughput/Latency | Constant | Constant | Variable
Active | Logic design, scaled VDD, trans. sizing, multi-VDD | Clock gating | DFS, DVS
Leakage | Stack effects, trans. sizing, scaling VDD, + multi-VTh | Sleep T’s, multi-VDD, variable VTh, + input control | DVS, variable VTh

Adaptive Supply Voltages


Processors for Portable Devices

[Figure: performance (MIPS, 1 to 1000) versus processor energy (Watt*sec). Labeled regions: PDAs, Pocket-PCs, notebook computers, dynamic voltage scaling.]

• Eliminate performance-energy trade-off

Burd, ISSCC’00

Typical MPEG IDCT Histogram


Processor Usage Model

[Figure: desired throughput versus time, bounded by the maximum processor speed. Labeled regions: compute-intensive and low-latency processes, background and high-latency processes, system idle.]

System Optimizations:
• Maximize peak throughput
• Minimize average energy/operation

Burd, ISSCC’00

Common Design Approaches (Fixed VDD)

Compute ASAP:
[Figure: delivered throughput versus time. Throughput is always high, leaving excess throughput.]

Clock Frequency Reduction:
[Figure: delivered throughput versus time with fCLK reduced.]

Energy/operation remains unchanged while throughput scales down with fCLK.

Scale VDD with Clock Frequency

[Figure: energy/operation (0 to 1) versus throughput (fCLK, 0 to 1). Curves: constant supply voltage; VDD reduced from 3.3V to 1.1V, giving ~10x energy reduction.]

Reduce VDD, slow circuits down.

Burd, ISSCC’00
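The ~10x figure is consistent with the quadratic dependence of switching energy on supply voltage. A minimal sketch, assuming energy/operation is dominated by dynamic switching energy (E proportional to C·VDD²) and ignoring leakage; the function name is illustrative and not from the lecture:

```python
def relative_energy_per_op(vdd: float, vdd_ref: float) -> float:
    """Energy/operation at vdd relative to vdd_ref, assuming E ~ C * VDD^2."""
    return (vdd / vdd_ref) ** 2


# Supply endpoints taken from the slide: 3.3 V and 1.1 V.
reduction = 1.0 / relative_energy_per_op(1.1, 3.3)
print(f"Energy reduction from 3.3 V to 1.1 V: {reduction:.1f}x")
```

Under this simplified model the reduction is (3.3/1.1)², or about 9x, in line with the ~10x annotation.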

CMOS Circuits Track Over VDD

[Figure: normalized max. fCLK (0 to 1.0) versus VDD (VT to 4VT) for inverter, ring oscillator, register file, and SRAM.]

Delay tracks within +/- 10%

Burd, ISSCC’00

Dynamic Voltage Scaling (DVS)

[Figure: delivered throughput versus time. Annotations: vary fCLK, VDD; dynamically adapt.]

• Dynamically scale energy/operation with throughput.
• Always minimize speed → minimize average energy/operation.
• Extend battery life up to 10x with the exact same hardware!

Burd, ISSCC’00
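To make the benefit of minimizing speed concrete, the sketch below compares three ways of finishing the same workload before a deadline: compute ASAP at full VDD, clock frequency reduction at fixed VDD, and DVS at a lower VDD. It assumes energy per cycle scales as C·VDD² with no leakage or idle energy. The operating points, capacitance, and workload are made-up illustrative values, not measurements from the lecture.

```python
def energy_joules(cycles: float, c_eff: float, vdd: float) -> float:
    """Switching energy for a workload: cycles * C_eff * VDD^2 (leakage ignored)."""
    return cycles * c_eff * vdd ** 2


# Hypothetical operating points: (clock frequency in Hz, supply in volts).
F_MAX, VDD_MAX = 80e6, 3.3
F_LOW, VDD_LOW = 20e6, 1.5

C_EFF = 1e-9      # assumed effective switched capacitance per cycle (F)
CYCLES = 20e6     # work to finish
DEADLINE = 1.0    # seconds available

# Compute ASAP: run at F_MAX, then idle until the deadline.
e_asap = energy_joules(CYCLES, C_EFF, VDD_MAX)

# Clock frequency reduction only: slower clock, same VDD, same energy.
e_dfs = energy_joules(CYCLES, C_EFF, VDD_MAX)

# DVS: run just fast enough to meet the deadline at the lower supply.
busy_time_dvs = CYCLES / F_LOW
e_dvs = energy_joules(CYCLES, C_EFF, VDD_LOW)

print(f"ASAP: {e_asap * 1e3:.1f} mJ, DFS: {e_dfs * 1e3:.1f} mJ, DVS: {e_dvs * 1e3:.1f} mJ")
print(f"DVS meets deadline: {busy_time_dvs <= DEADLINE}")
```

Lowering only the clock leaves the energy for the task unchanged; the saving comes from lowering VDD along with it.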

Operating System Sets Processor Speed

• DVS requires a voltage scheduler (VS).
• VS predicts workload to estimate CPU cycles.
• Applications supply completion deadlines.

$F_{\mathrm{DESIRED}} = \dfrac{\text{CPU cycles}}{\text{time}}$

[Figure: processor speed for MPEG, F_DESIRED (MHz, 0 to 80) versus time (sec, 0 to 1.4).]
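A voltage scheduler of this kind could be sketched as follows. This is a hypothetical illustration, not the scheduler from the cited work: the `Task` fields, the earliest-deadline-first ordering, and the example numbers are all assumptions. It applies the cycles-over-time relation to the cumulative work due by each deadline.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Task:
    name: str
    predicted_cycles: float  # scheduler's workload estimate
    deadline_s: float        # completion deadline supplied by the application


def desired_frequency_hz(tasks: Sequence[Task], now_s: float, f_max_hz: float) -> float:
    """Lowest clock that meets every deadline when tasks run in
    earliest-deadline-first order (cumulative cycles / time remaining)."""
    f_desired = 0.0
    cumulative_cycles = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline_s):
        cumulative_cycles += task.predicted_cycles
        time_left = task.deadline_s - now_s
        if time_left <= 0:
            return f_max_hz  # already late: run at full speed
        f_desired = max(f_desired, cumulative_cycles / time_left)
    return min(f_desired, f_max_hz)


# Illustrative workload: one audio task and one video task.
tasks = [Task("video", 1.6e6, 0.04), Task("audio", 0.4e6, 0.01)]
f_desired = desired_frequency_hz(tasks, now_s=0.0, f_max_hz=80e6)
print(f"F_DESIRED = {f_desired / 1e6:.1f} MHz")
```

Here the audio task alone needs about 40 MHz, but the combined work due by the video deadline needs about 50 MHz, so the scheduler requests the larger value.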

Converter Loop Sets VDD, fCLK

[Figure: block diagram of the converter loop. Labeled blocks and signals: register holding FDES (set by O.S.), counter with reset and 1 MHz reference, latch, FMEAS, FERR, digital loop filter, PENAB and NENAB, buck converter (L, CDD), VDD, IDD, ring oscillator, fCLK, processor.]

• Feedback loop sets VDD so that FERR → 0.
• Ring oscillator delay-matched to CPU critical paths.
• Custom loop implementation → can optimize CDD.

Burd, ISSCC’00
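The behavior of such a loop can be illustrated with a toy simulation. This is a rough sketch under strong assumptions: the ring oscillator is modeled as linear in (VDD - VT), the counter measures whole cycles per 1 µs window, and the loop filter plus buck converter are collapsed into a single proportional VDD update. None of the constants come from the lecture.

```python
from typing import Tuple


def ring_osc_mhz(vdd: float, vt: float = 0.8, k_mhz_per_v: float = 30.0) -> float:
    """Hypothetical ring-oscillator model: frequency rises linearly above VT."""
    return max(0.0, k_mhz_per_v * (vdd - vt))


def settle_vdd(f_des_mhz: int, vdd: float = 1.2,
               gain_v_per_mhz: float = 0.01, steps: int = 200) -> Tuple[float, int]:
    """Iterate the loop: measure frequency, compare with FDES, nudge VDD."""
    f_err = 0
    for _ in range(steps):
        f_meas = int(ring_osc_mhz(vdd))  # counter: whole cycles per 1 us window
        f_err = f_des_mhz - f_meas       # FERR = FDES - FMEAS
        if f_err == 0:
            break
        vdd += gain_v_per_mhz * f_err    # stand-in for loop filter + buck converter
    return vdd, f_err


vdd, f_err = settle_vdd(f_des_mhz=60)
print(f"VDD settles near {vdd:.2f} V with FERR = {f_err}")
```

Because the oscillator is powered from the regulated supply, driving FERR to zero sets VDD to whatever value the requested frequency needs, without a stored frequency-to-voltage table.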

Design Over Wide Range of Voltages

• Circuit design constraints. (Functional verification)

• Circuit delay variation. (Timing verification)

• Noise margin reduction. (Power grid, coupling)

• Delay sensitivity. (Local power distribution)

Design verification complexity is similar to high-performance processor design at a fixed VDD.

Delay Variation & Circuit Constraints

• Cannot use NMOS pass gates – fails for VDD < 2VT.

• Functional verification only needed at one VDD value.

[Figure: normalized max. fCLK (0 to 1.0) versus VDD (VT to 4VT) for inverter, ring oscillator, register file, and SRAM.]

Burd, ISSCC’00

Relative Delay Variation

[Figure: percent delay variation (-20 to +40), relative to the ring oscillator, versus VDD (VT to 4VT).]

Four extreme cases of critical paths:
• Gate
• Interconnect
• Diffusion
• Series

All vary monotonically with VDD.

• Timing verification only needed at min. & max. VDD.

Burd, ISSCC’00

Multiple Path Tracking

A. Drake, ISSCC’07

Multiple Path Tracking

Cho, ISSCC’16

Tracking with SRAM in Critical Path

Mismatch between logic and SRAM

SRAM multiplicative replica

Niki, JSSC’11

Alternative: Error Detection

Bull, ISSCC’2010

Design for Dynamically Varying VDD

• Static CMOS logic.

• Ring oscillator.

• Dynamic logic (& tri-state busses).

• Sense amp (& memory cell).

Max. allowed |dVDD/dt| → Min. CDD = 100nF (0.6 µm)

Circuits continue to properly operate as VDD changes

Static CMOS Logic

• Static CMOS robustly operates with varying VDD.

[Figure: inverter with Vin = 0, so Vout = VDD through the PMOS on-resistance rds charging the load CL; max. time constant = 4ns.]

0.6 µm CMOS: |dVDD/dt| < 200V/µs
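One way to see why static CMOS tolerates a moving supply is to treat the conducting PMOS and its load as a first-order RC circuit driven by a ramping VDD. That model is a simplification introduced here, not something stated on the slide. The sketch uses the 4 ns time constant and assumes the ramp limit is 200 V/µs.

```python
import math


def tracking_lag_v(slope_v_per_s: float, tau_s: float, t_s: float) -> float:
    """Lag VDD(t) - Vout(t) of an RC node following a supply ramp that
    starts at t = 0, with the node initially equal to VDD."""
    return slope_v_per_s * tau_s * (1.0 - math.exp(-t_s / tau_s))


TAU = 4e-9     # max. time constant from the slide (4 ns)
SLOPE = 200e6  # assumed ramp limit: 200 V/us expressed in V/s

# The lag grows toward slope * tau and never exceeds it.
worst_case_lag = SLOPE * TAU
lag_after_5_tau = tracking_lag_v(SLOPE, TAU, 5 * TAU)
print(f"Worst-case lag: {worst_case_lag:.2f} V, after 5 tau: {lag_after_5_tau:.2f} V")
```

In this model the output of a gate holding a logic high trails the supply by a bounded voltage set by the ramp rate and the time constant, so a faster ramp or a slower node increases the error.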

Ring Oscillator

• Output fCLK instantaneously adapts to new VDD.

[Figure: simulated fCLK and VDD waveforms, volts (0 to 4) versus time (60 to 260 ns).]

Simulated with dVDD/dt = 20V/µs

Dynamic Logic

[Figure: dynamic gate (VDD, clk, Vin, Vout) and Vout/VDD waveforms versus time, with clk = 1 while VDD changes.]

Errors:
• False logic low: ΔVDD > VTP
• Latch-up: ΔVDD > Vbe

• Cannot gate clock in evaluation state.
• Tri-state busses fail similarly → use hold circuit.

0.6 µm CMOS: |dVDD/dt| < 20V/µs

Measured System Performance & Energy

[Figure: Dhrystone 2.1 MIPS (0 to 100) versus energy (mW/MIPS, 0 to 6), comparing static VDD and dynamic VDD.]

Measured operating points:
• 85 MIPS @ 5.6 mW/MIPS (3.8V)
• 6 MIPS @ 0.54 mW/MIPS (1.2V)

• Dynamic operation can increase energy efficiency > 10x.

Burd, ISSCC’00
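The two measured operating points can be checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide; the comparison against a pure VDD² model is an added assumption for illustration.

```python
# Operating points as listed on the slide: (MIPS, mW/MIPS, VDD in volts).
HIGH = (85, 5.6, 3.8)
LOW = (6, 0.54, 1.2)


def power_mw(point):
    """Total power: throughput (MIPS) times energy per instruction (mW/MIPS)."""
    mips, mw_per_mips, _vdd = point
    return mips * mw_per_mips


# 1 mW/MIPS equals 1 nJ per instruction, so this is a ratio of energy/instruction.
efficiency_gain = HIGH[1] / LOW[1]

# What a pure E ~ VDD^2 model (an assumption) would predict for the same supplies.
vdd_squared_ratio = (HIGH[2] / LOW[2]) ** 2

print(f"Power: {power_mw(HIGH):.0f} mW at 3.8 V, {power_mw(LOW):.2f} mW at 1.2 V")
print(f"Measured efficiency gain: {efficiency_gain:.1f}x, VDD^2 ratio: {vdd_squared_ratio:.1f}x")
```

Both ratios come out slightly above 10, which matches the "> 10x" claim.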

VDD-Hopping

[Figure: normalized power (0 to 1) versus number of frequency levels (1, 2, 3, 8) for MPEG-4 encoding. Transition time between ƒ levels = 200µs.]

[Figure: timeline of application slices #n and #n+1, marking where the n-th slice finished and the next milestone.]

Application slicing and software feedback guarantee real-time operation.

Two hopping levels are sufficient.
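The software feedback idea can be sketched as a per-slice level selection: when a slice finishes, pick the lowest frequency that still completes the next slice before its milestone, budgeting for one level transition. This is a hypothetical illustration rather than the algorithm from the original work; the function name, the worst-case cycle estimate, and the example levels are assumptions. Only the 200 µs transition time comes from the slide.

```python
from typing import Sequence


def pick_level_hz(levels_hz: Sequence[float], wcet_cycles: float,
                  time_to_milestone_s: float, transition_s: float = 200e-6) -> float:
    """Lowest frequency that finishes the next slice before its milestone,
    allowing for one level transition. Falls back to the highest level."""
    budget = time_to_milestone_s - transition_s
    for f in sorted(levels_hz):
        if budget > 0 and wcet_cycles / f <= budget:
            return f
    return max(levels_hz)


# Two hopping levels (illustrative): half speed and full speed.
LEVELS = [100e6, 50e6]
print(pick_level_hz(LEVELS, wcet_cycles=1e6, time_to_milestone_s=25e-3))  # slack: low level
print(pick_level_hz(LEVELS, wcet_cycles=1e6, time_to_milestone_s=15e-3))  # tight: high level
```

When a slice finishes early, the extra slack lets the next slice run at the lower level. When it finishes late, the next slice runs at full speed to catch up.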

Dithering Between Supply Levels

Keller et al, ESSCIRC’16

Dithering Between Supply Levels

• Done with switched-capacitor DC-DC converters, which work efficiently only at discrete levels

• Dithering fills in between fixed DC-DC modes

Keller et al, ESSCIRC’16
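A short sketch of how dithering fills in between two fixed modes: time-share the two nearest modes so that the average clock frequency hits the target. The function names and the example frequencies and energies are illustrative assumptions, and transition overhead is ignored.

```python
def dither_fraction_high(f_target: float, f_low: float, f_high: float) -> float:
    """Fraction of time to spend in the higher mode so the average
    clock frequency equals f_target (f_low <= f_target <= f_high)."""
    if not f_low <= f_target <= f_high:
        raise ValueError("target must lie between the two modes")
    return (f_target - f_low) / (f_high - f_low)


def average_energy_per_cycle(frac_high: float, f_low: float, e_low: float,
                             f_high: float, e_high: float) -> float:
    """Cycle-weighted mean energy/cycle when time-sharing two modes."""
    cycles_low = (1.0 - frac_high) * f_low
    cycles_high = frac_high * f_high
    return (cycles_low * e_low + cycles_high * e_high) / (cycles_low + cycles_high)


# Hypothetical modes: 100 MHz at 1.0 energy unit/cycle, 200 MHz at 2.0 units/cycle.
frac = dither_fraction_high(150e6, 100e6, 200e6)
e_avg = average_energy_per_cycle(frac, 100e6, 1.0, 200e6, 2.0)
print(f"Time in high mode: {frac:.0%}, average energy/cycle: {e_avg:.3f}")
```

The average is weighted by cycles rather than by time, because more cycles execute per second in the higher mode.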

Next Lecture

• Low-power design
  • Clock gating
  • Power gating
