vlsi digital circuits

Upload: arash-mazandarani

Post on 04-Apr-2018

248 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/30/2019 VLSI Digital Circuits

    1/22

    CSE477 L26 System Power.1 Irwin&Vijay, PSU, 2002

    CSE477

    VLSI Digital CircuitsFall 2002

    Lecture 26: Low Power Techniques

    inMicroarchitectures and Memories

    Mary Jane Irwin ( www.cse.psu.edu/~mji )www.cse.psu.edu/~cg477

    [Adapted from Rabaeys Digital Integrated Circuits, 2002, J. Rabaey et al.]

    http://www.cse.psu.edu/~mjihttp://www.cse.psu.edu/~cg477http://www.cse.psu.edu/~cg477http://www.cse.psu.edu/~mji
  • 7/30/2019 VLSI Digital Circuits

    2/22

    CSE477 L26 System Power.2 Irwin&Vijay, PSU, 2002

    Review: Energy & Power Equations

    E = CL VDD2 P01 + tsc VDD Ipeak P01 + VDD Ileakage

    P = CL VDD2 f01 + tscVDD Ipeak f01 + VDD Ileakage

    Dynamic power

    (~90% today anddecreasingrelatively)

    Short-circuit

    power(~8% today and

    decreasingabsolutely)

    Leakage power

    (~2% today andincreasing)

    f01 = P01 * fclock

  • 7/30/2019 VLSI Digital Circuits

    3/22

    CSE477 L26 System Power.3 Irwin&Vijay, PSU, 2002

    Power and Energy Design Space

    ConstantThroughput/Latency

    VariableThroughput/Latency

    Energy Design Time Non-active Modules Run Time

    Active Logic Design

    Reduced Vdd

    Sizing

    Multi-Vdd

    Clock Gating DFS, DVS

    (Dynamic Freq,VoltageScaling)

    Leakage + Multi-VT Sleep Transistors

    Multi-Vdd

    Variable VT

    + Variable VT

  • 7/30/2019 VLSI Digital Circuits

    4/22

    CSE477 L26 System Power.4 Irwin&Vijay, PSU, 2002

    Bus Multiplexing

    Share long data buses with time multiplexing (S1 uses even

    cycles, S2 odd)

    S2

    S1D1

    D2

    S1

    S2 D2

    D1

    Buses are a significant source of power dissipation due tohigh switching activities and large capacitive loadingq 15% of total power in Alpha 21064

    q 30% of total power in Intel 80386

    But what if data samples are correlated (e.g., sign bits)?

  • 7/30/2019 VLSI Digital Circuits

    5/22

    CSE477 L26 System Power.5 Irwin&Vijay, PSU, 2002

    Correlated Data Streams

    0

    0.5

    1

    14 12 10 8 6 4 2 0

    Muxed

    Dedicated

    Bit positionMSB LSB

    Bitswitching

    probabilities

    For a shared (multiplexed)

    bus advantages of datacorrelation are lost (buscarries samples from twouncorrelated datastreams)q Bus sharing should not be

    used forpositivelycorrelated data streams

    q Bus sharing may proveadvantageous in a

    negatively correlated datastream (where successivesamples switch sign bits) -more random switching

  • 7/30/2019 VLSI Digital Circuits

    6/22

    CSE477 L26 System Power.6 Irwin&Vijay, PSU, 2002

    Glitch Reduction by Pipelining Glitches depend on the logic depth of the circuit - gates

    deeper in the logic network are more prone to glitchingq arrival times of the gate inputs are more spread due to delay

    imbalancesq usually affected more by primary input switching

    Reduce logic depth by adding pipeline registers

    q additional energy used by the clock and pipeline registers

    PC

    Fetch Decode Execute Memory WriteBack

    Instruction

    MA

    R

    MD

    R

    I$ D$

    clk

    pipelinestage

    isolationregister

  • 7/30/2019 VLSI Digital Circuits

    7/22CSE477 L26 System Power.7 Irwin&Vijay, PSU, 2002

    Power and Energy Design Space

    ConstantThroughput/Latency

    VariableThroughput/Latency

    Energy Design Time Non-active Modules Run Time

    Active Logic Design

    Reduced Vdd

    Sizing

    Multi-Vdd

    Clock Gating DFS, DVS

    (Dynamic Freq,VoltageScaling)

    Leakage + Multi-VT Sleep Transistors

    Multi-Vdd

    Variable VT

    + Variable VT

  • 7/30/2019 VLSI Digital Circuits

    8/22CSE477 L26 System Power.8 Irwin&Vijay, PSU, 2002

    Clock Gating

    Gate off clock to idle functionalunitsq

    e.g., floating point unitsq need logic to generate

    disablesignal

    - increases complexity of control logic

    - consumes power

    - timing critical to avoid clock glitchesat

    OR gate output

    q additional gate delay on clock signal

    - gating OR gate can replace a buffer

    in the clock distribution tree

    Most popular method for power reduction of clock signalsand functional units

    Reg

    clockdisable

    Functional

    unit

  • 7/30/2019 VLSI Digital Circuits

    9/22CSE477 L26 System Power.9 Irwin&Vijay, PSU, 2002

    Clock Gating in a Pipelined Datapath

    For idle units (e.g., floating point units in Exec stage, WBstage for instructions with no write back operation)

    PC

    Fetch Decode Execute Memory WriteBack

    Instruction

    MAR

    MDRI$ D$

    clk

    No FP No WB

  • 7/30/2019 VLSI Digital Circuits

    10/22CSE477 L26 System Power.10 Irwin&Vijay, PSU, 2002

    Power and Energy Design Space

    ConstantThroughput/Latency

    VariableThroughput/Latency

    Energy Design Time Non-active Modules Run Time

    Active Logic Design

    Reduced Vdd

    Sizing

    Multi-Vdd

    Clock Gating DFS, DVS

    (Dynamic Freq,VoltageScaling)

    Leakage + Multi-VT Sleep Transistors

    Multi-Vdd

    Variable VT

    + Variable VT

  • 7/30/2019 VLSI Digital Circuits

    11/22CSE477 L26 System Power.11 Irwin&Vijay, PSU, 2002

    Review: Dynamic Power as a Function of VDD

    Decreasing the VDD

    decreases dynamicenergy consumption(quadratically)

    But, increases gatedelay (decreasesperformance)

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    5.5

    0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4

    VDD (V)

    tp

    (nor

    ma

    liz

    ed)

    Determine the critical path(s) at design time and use highVDD for the transistors on those paths for speed. Use alower VDD on the other logic to reduce dynamic energyconsumption.

  • 7/30/2019 VLSI Digital Circuits

    12/22CSE477 L26 System Power.12 Irwin&Vijay, PSU, 2002

    Dynamic Frequency and Voltage Scaling

    Intels SpeedStepq

    Hardware that steps down the clock frequency (dynamic frequencyscaling DFS) when the user unplugs from AC power

    - PLL from 650MHz 500MHz

    q CPU stalls during SpeedStep adjustment

    Transmeta LongRunq Hardware that applies both DFS and DVS (dynamic supply

    voltage scaling)

    - 32 levels of VDD from 1.1V to 1.6V

    - PLL from 200MHz 700MHz in increments of 33MHz

    q Triggered when CPU load change is detected by software- heavier load ramp up VDD, when stable speed up clock

    - lighter load slow down clock, when PLL locks onto new rate,ramp down VDD

    q CPU stalls only during PLL relock (< 20 microsec)

  • 7/30/2019 VLSI Digital Circuits

    13/22CSE477 L26 System Power.13 Irwin&Vijay, PSU, 2002

    Dynamic Thermal Management (DTM)

    Initiation Mechanism:How do we enable

    technique?

    Response Mechanism:What technique do we

    enable?

    Trigger Mechanism:When do we enable

    DTM techniques?

  • 7/30/2019 VLSI Digital Circuits

    14/22

    CSE477 L26 System Power.14 Irwin&Vijay, PSU, 2002

    DTM Trigger Mechanisms

    Mechanism: How to deducetemperature?

    Direct approach: on-chiptemperature sensorsq Based on differential voltage

    change across 2 diodes of

    different sizesq May require >1 sensor

    q Hysteresis and delay areproblems

    Policy: When to beginresponding?q Trigger level set too high

    means higher packagingcosts

    q Trigger level set too lowmeans frequent triggering

    and loss in performance

    Choose trigger level toexploit difference betweenaverage and worst case

    power

  • 7/30/2019 VLSI Digital Circuits

    15/22

    CSE477 L26 System Power.15 Irwin&Vijay, PSU, 2002

    DTM Initiation and Response Mechanisms

    Operating system or microarchitectural control?

    q Hardware support can reduce performance penalty by 20-30%

    Initiation of policy incurs some delayq When using DVS and/or DFS, much of the performance penalty

    can be attributed to enabling/disabling overhead

    q Increasing policy delay reduces overhead; smarter initiationtechniques would help as well

    Thermal window (100Kcycles+)q Larger thermal windows smooth short thermal spikes

  • 7/30/2019 VLSI Digital Circuits

    16/22

    CSE477 L26 System Power.16 Irwin&Vijay, PSU, 2002

    DTM Activation and Deactivation Cycle

    Trigger

    Reached

    Response Delay Invocation time (e.g., adjust clock)

    ResponseDelay

    Policy Delay Number of cycles engaged

    PolicyDelay

    Check

    Temp

    Check

    Temp

    Turn

    ResponseOn

    InitiationDelay

    Initiation Delay OS interrupt/handler

    ShutoffDelay

    Turn

    ResponseOff

    Shutoff Delay Disabling time (e.g., re-adjust clock)

  • 7/30/2019 VLSI Digital Circuits

    17/22

  • 7/30/2019 VLSI Digital Circuits

    18/22

    CSE477 L26 System Power.18 Irwin&Vijay, PSU, 2002

    Power and Energy Design Space

    ConstantThroughput/Latency

    VariableThroughput/Latency

    Energy Design Time Non-active Modules Run Time

    Active Logic Design

    Reduced Vdd

    Sizing

    Multi-Vdd

    Clock Gating DFS, DVS

    (Dynamic Freq,VoltageScaling)

    Leakage + Multi-VT Sleep Transistors

    Multi-Vdd

    Variable VT

    + Variable VT

  • 7/30/2019 VLSI Digital Circuits

    19/22

    CSE477 L26 System Power.19 Irwin&Vijay, PSU, 2002

    Speculated Power of a 15mm P

    0.25 , 15mm die, 2V

    0% 0% 0% 0% 1% 1% 1% 2% 3%

    -

    10

    20

    30

    4050

    60

    70

    30 40 50 60 70 80 90 100

    110

    Temp (C)

    Power(Watts

    Leakage

    Active

    0.18 , 15mm die, 1.4V

    0% 0% 1% 1% 2% 3% 5%7% 9%

    -

    10

    20

    30

    40

    50

    60

    70

    30 40 50 60 70 80 90 100

    110

    Temp (C)

    Power(Watts

    Leakage

    Active

    0.13 , 15mm die. 1V

    1% 2% 3% 5%8% 11%

    15%20%

    26%

    -

    10

    20

    30

    40

    50

    60

    70

    30 40 50 60 70 80 90 100

    110

    Temp (C)

    Power

    (Watts

    Leakage

    Active

    0.1 , 15mm die, 0.7V

    6% 9%14%

    19%26%

    33%

    41% 49% 56%

    -

    10

    20

    30

    40

    50

    60

    70

    30 40 50 60 70 80 90 100

    110

    Temp (C)

    Power

    (Watts

    Leakage

    Active

  • 7/30/2019 VLSI Digital Circuits

    20/22

    CSE477 L26 System Power.20 Irwin&Vijay, PSU, 2002

    Review: Leakage as a Function of Design Time VT

    Reducing the VT

    increases the sub-threshold leakagecurrent (exponentially)

    But, reducing VT

    decreases gate delay(increases performance)

    0 0.2 0.4 0.6 0.8 1

    VGS (V)

    ID(

    A)

    VT=0.4VVT=0.1V

    Determine the critical path(s) at design time and use lowVT devices on the transistors on those paths for speed.Use a high VT on the other logic for leakage control.

  • 7/30/2019 VLSI Digital Circuits

    21/22

    CSE477 L26 System Power.21 Irwin&Vijay, PSU, 2002

    Review: Variable VT (ABB) at Run Time

    VT = VT0 + (|-2F + VSB| - |-2F|)

    where VT0 is the threshold voltage at VSB = 0VSB is the source-bulk (substrate) voltage

    is the body-effect coefficient

    0.4

    0.45

    0.5

    0.55

    0.6

    0.65

    0.7

    0.75

    0.8

    0.85

    0.9

    -2.5 -2 -1.5 -1 -0.5 0

    VSB (V)

    VT

    (V)

    For an n-channel device,

    the substrate is normally tiedto ground

    A negative bias causes VT

    to increase from 0.45V to

    0.85V Adjusting the substratebias at run time is calledadaptive body-biasing (ABB)

  • 7/30/2019 VLSI Digital Circuits

    22/22

    CSE477 L26 System Power 22 Irwin&Vijay PSU 2002

    Next Lecture and Reminders Next lecture

    q System level interconnect

    - Reading assignment Rabaey, et al, xx

    Remindersq Project final reports due December 5th

    q Final grading negotiations/correction (except for the finalexam) must be concluded by December 10th

    q Final exam scheduled

    - Monday, December 16th from 10:10 to noon in 118 and 121Thomas