Recap: Lectures 5 & 6 Classic Pipeline Styles


Upload: clark

Post on 25-Feb-2016



TRANSCRIPT

Page 1: Recap: Lectures 5 & 6 Classic Pipeline Styles

Recap: Lectures 5 & 6
Classic Pipeline Styles

1. Williams and Horowitz’s PS0 pipeline
2. Sutherland’s micropipelines

Page 2: Recap: Lectures 5 & 6 Classic Pipeline Styles

Different Points in the Design Space

Williams/Horowitz’s PS0:
- Dual-rail
- Data-dependent completion
- Dynamic logic
- No extra latches
- “Zero-overhead” latency
- 4-phase handshakes: resetting overhead

Sutherland’s micropipelines:
- Single-rail
- Worst-case matched delay
- Static logic
- Explicit latches
- Latch latencies = overhead
- Elegant transition signaling

Page 3: Recap: Lectures 5 & 6 Classic Pipeline Styles

PS0 Protocol

PRECHARGE N: when N+1 completes evaluation
- delete data: after next stage has copied it

EVALUATE N: when N+1 completes precharging
- accept new data: after next stage is emptied

Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events

[Figure: stages N, N+1, N+2 with events numbered 1 to 6. Event labels in the diagram: “evaluates” (three times), indicates “done”, “precharges”, indicates “done”.]

Page 4: Recap: Lectures 5 & 6 Classic Pipeline Styles

PS0 Performance

$T_{EVAL}$: Evaluation Time
$T_{PRECH}$: Precharge Time
$T_{DETECT}$: Completion Detection Time

[Figure: PS0 cycle with events numbered 1 to 6.]

Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
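The cycle-time expression can be cross-checked by replaying the two PS0 rules (PRECHARGE N when N+1 completes evaluation, EVALUATE N when N+1 completes precharging) as a timing recurrence. The following is a minimal sketch, not the lecture's own model. The function name `simulate_ps0`, the uniform stage delays, the integer delay values, and the always-ready source and sink are all assumptions made for illustration.

```python
def simulate_ps0(stages, tokens, t_eval, t_prech, t_detect):
    """Return the times at which stage 0 finishes evaluating each token."""
    eval_done = [0] * stages    # when stage i last finished evaluating
    prech_done = [0] * stages   # when stage i last finished precharging
    prech_seen = [0] * stages   # when that precharge was detected ("done")
    stage0_times = []
    for _ in range(tokens):
        # EVALUATE N: when N+1 completes precharging (and input data arrived).
        for i in range(stages):
            # Assumed: the source always has data and the sink never stalls.
            data_ready = eval_done[i - 1] if i > 0 else 0
            next_empty = prech_seen[i + 1] if i + 1 < stages else 0
            eval_done[i] = max(data_ready, prech_done[i], next_empty) + t_eval
        stage0_times.append(eval_done[0])
        # PRECHARGE N: when N+1 completes evaluation (after detection).
        for i in range(stages):
            # Assumed: the last stage is acknowledged once its own result is detected.
            watched = i + 1 if i + 1 < stages else i
            prech_done[i] = eval_done[watched] + t_detect + t_prech
            prech_seen[i] = prech_done[i] + t_detect
    return stage0_times


# Hypothetical delays in arbitrary time units.
times = simulate_ps0(stages=6, tokens=5, t_eval=4, t_prech=3, t_detect=2)
cycles = [later - earlier for earlier, later in zip(times, times[1:])]
print(cycles)  # spacing between successive tokens at stage 0
```

With these assumed delays the spacing between tokens settles at 3·4 + 3 + 2·2 = 19 time units, which agrees with the formula on this slide.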

Page 5: Recap: Lectures 5 & 6 Classic Pipeline Styles

Drawbacks of PS0 Pipelining

1. Poor throughput:
- long cycle time: 6 events per cycle
- data “tokens” are forced far apart in time

2. Limited storage capacity:
- max only 50% of stages can hold distinct tokens
- data tokens must be separated by at least one spacer

Our Research Goals:
- address both issues
- still maintain very low latency
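The 50% capacity limit follows directly from the spacer rule and can be confirmed by brute force. This sketch is an illustration only: it treats the pipeline as a ring, an assumption made so that every stage has a successor, and enumerates every token/spacer pattern in which no token is immediately followed by another token. The function name `max_tokens_on_ring` is hypothetical.

```python
from itertools import product


def max_tokens_on_ring(num_stages):
    """Largest number of tokens when each token must be followed by a spacer."""
    best = 0
    # 1 = stage holds a distinct data token, 0 = stage holds a spacer.
    for pattern in product([0, 1], repeat=num_stages):
        legal = all(
            not (pattern[i] and pattern[(i + 1) % num_stages])
            for i in range(num_stages)
        )
        if legal:
            best = max(best, sum(pattern))
    return best


print(max_tokens_on_ring(8))  # at most half of the 8 stages hold tokens
```

For 8 stages the best legal pattern alternates token and spacer, so only 4 stages carry data.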

Page 6: Recap: Lectures 5 & 6 Classic Pipeline Styles

Lecture 7: Recent Approaches

Page 7: Recap: Lectures 5 & 6 Classic Pipeline Styles

Recent Approaches

3 novel styles for high-speed async pipelining:
- “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
- “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
- MOUSETRAP Pipelines [Singh/Nowick, TAU-00]

Goal: significantly improve throughput of PS0

Two Distinct Strategies:
- LP: introduce protocol optimizations
  - “shave off” components from critical cycle
- HC: fundamentally new protocol
  - greater concurrency: “loosely-coupled” stages

Page 8: Recap: Lectures 5 & 6 Classic Pipeline Styles

Outline

New Asynchronous Pipelines:
- Lookahead Pipelines (LP): dynamic circuit style
- High-Capacity Pipelines (HC): dynamic circuit style
- MOUSETRAP Pipelines: static circuit style

Page 9: Recap: Lectures 5 & 6 Classic Pipeline Styles

Lookahead Pipelines: Strategy #1

Use non-neighbor communication:
- stage receives information from multiple later stages
- allows “early evaluation”

Benefit: stage gets head-start on next cycle

Page 10: Recap: Lectures 5 & 6 Classic Pipeline Styles

Lookahead Pipelines: Strategy #2

Use early completion detection:
- completion detector moved before stage (not after)
- stage indicates “early done” in parallel with computation

Benefit: again, stage gets head-start on next cycle

[Figure: stage with an early completion detector.]

Page 11: Recap: Lectures 5 & 6 Classic Pipeline Styles

Lookahead Pipelines: Overview

5 New Designs:

“Dual-Rail” Data Signaling:
- LP3/1: “early evaluation”
- LP2/2: “early done”
- LP2/1: “early evaluation” + “early done”

“Single-Rail” Bundled-Data Signaling:
- LP_SR2/2: “early done”
- LP_SR2/1: “early evaluation” + “early done”

Page 12: Recap: Lectures 5 & 6 Classic Pipeline Styles

Dual-Rail Design #1: LP3/1

Optimization = “early evaluation”
- each stage has two control inputs: from stages N+1 and N+2

Idea: shorten precharge phase
- terminate precharge early: when N+2 is done evaluating

[Figure: stages N, N+1, N+2 between Data in and Data out, each a Processing Block with a Completion Detector; stage N has control inputs labeled PC and Eval, with the Eval input marked “From N+2”.]

Page 13: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP3/1 Protocol

PRECHARGE N: when N+1 completes evaluation
EVALUATE N: when N+2 completes evaluation (New!)
- Enables “early evaluation”!

[Figure: stages N, N+1, N+2 with events numbered 1 to 4. Event labels in the diagram: N evaluates, N+1 evaluates, N+2 evaluates, N+1 indicates “done”, N+2 indicates “done”.]

Page 14: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP3/1: Comparison with PS0

PRECHARGE N (both styles): when N+1 completes evaluation

PS0:
- EVALUATE N: when N+1 completes precharging
- 6 events in cycle

LP3/1:
- EVALUATE N: when N+2 completes evaluation
- Enables “early evaluation”!
- Only 4 events in cycle!

[Figure: two timing diagrams over stages N, N+1, N+2, one for PS0 (events 1 to 6) and one for LP3/1 (events 1 to 4), with “evaluates” and indicates “done” event labels.]

Page 15: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP3/1 Performance

[Figure: LP3/1 cycle with events numbered 1 to 4; the path removed relative to PS0 is marked “saved path”.]

Cycle Time = $3\,T_{EVAL} + T_{DETECT}$

Savings over PS0: 1 Precharge + 1 Completion Detection
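The shorter cycle can be reproduced with a small timing recurrence that changes only the evaluate rule: stage N is released when N+2 completes evaluation, rather than when N+1 completes precharging. This is a sketch under assumptions, not the authors' model. The function name `simulate_lp31`, the uniform integer delays, and the always-ready environment are hypothetical, and the delays are chosen so that precharge fits inside the window before N+2 signals “done”.

```python
def simulate_lp31(stages, tokens, t_eval, t_prech, t_detect):
    """Return the times at which stage 0 finishes evaluating each token."""
    eval_done = [0] * stages    # when stage i last finished evaluating
    eval_seen = [0] * stages    # when that evaluation was detected ("done")
    prech_done = [0] * stages   # when stage i last finished precharging
    stage0_times = []
    for _ in range(tokens):
        for i in range(stages):
            data_ready = eval_done[i - 1] if i > 0 else 0
            # EVALUATE N: when N+2 completes evaluation of the previous token.
            # Assumed: stages without an N+2 neighbor are never held back.
            released = eval_seen[i + 2] if i + 2 < stages else 0
            eval_done[i] = max(data_ready, prech_done[i], released) + t_eval
        stage0_times.append(eval_done[0])
        for i in range(stages):
            eval_seen[i] = eval_done[i] + t_detect
        # PRECHARGE N: when N+1 completes evaluation (unchanged from PS0).
        for i in range(stages):
            watched = i + 1 if i + 1 < stages else i  # assumed sink behavior
            prech_done[i] = eval_seen[watched] + t_prech
    return stage0_times


# Same hypothetical delays as a PS0 pipeline with t_eval=4, t_prech=3, t_detect=2.
times = simulate_lp31(stages=6, tokens=5, t_eval=4, t_prech=3, t_detect=2)
print([later - earlier for earlier, later in zip(times, times[1:])])
```

With these values the token spacing is 3·4 + 2 = 14 time units, 5 units less than the PS0 formula gives for the same delays, which is exactly one precharge (3) plus one completion detection (2).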

Page 16: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP3/1: Inside a Stage

Merging 2 Control Inputs:
A NAND gate merges 2 control inputs:
- PC (from stage N+1)
- Eval (from stage N+2)

Precharge when PC=1 (and Eval=0)
Evaluate “early” when Eval=1 (or PC=0)

Problem: “early” Eval=1 is non-persistent!
- may be de-asserted before stage completes evaluation!

[Figure: NAND gate with inputs PC and Eval driving the stage; annotations “early Eval” and “old Eval”.]
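The two conditions on this slide are De Morgan duals of each other, which a truth table makes explicit. In the sketch below the gate is modeled as NAND(PC, NOT Eval) with output 0 meaning precharge and 1 meaning evaluate. The slide only states that a NAND gate merges the two inputs, so the inverted Eval input and the output polarity are assumptions chosen to reproduce the stated conditions, and `stage_control` is a hypothetical name.

```python
def stage_control(pc, eval_):
    """Merged control signal: 0 = precharge, 1 = evaluate (assumed polarity)."""
    # Assumed gate: NAND of PC and the inverted Eval input.
    return int(not (pc and not eval_))


print("PC Eval -> control")
for pc in (0, 1):
    for eval_ in (0, 1):
        control = stage_control(pc, eval_)
        # Slide's evaluate condition: Eval=1 or PC=0.
        assert control == int(eval_ == 1 or pc == 0)
        print(f" {pc}   {eval_}   ->    {control}")
```

Only the row PC=1, Eval=0 yields a precharge; in every other row the stage evaluates.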

Page 17: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP3/1 Timing Constraints: Example

Problem (cont.): “early” Eval=1 non-persistent

Observation: PC=0 soon after Eval=1, and is persistent

Solution: no change!
- use PC as safe “takeover” for Eval!

Timing Constraint: PC=0 must arrive before Eval de-asserted
- simple one-sided timing requirement
- other constraints as well… all easily satisfied in practice

[Figure: NAND gate with inputs PC (from stage N+1) and Eval (from stage N+2).]
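The “takeover” condition can be illustrated by sampling the merged control signal over time. This is a hedged sketch rather than a circuit-level model: it assumes the stage evaluates while Eval=1 or PC=0, uses integer time steps, and all function names and event times are hypothetical.

```python
def merged_control(t, t_eval_rises, t_eval_falls, t_pc_falls):
    """Assumed merge: 1 (evaluate) while Eval=1 or PC=0, else 0 (precharge)."""
    eval_ = t_eval_rises <= t < t_eval_falls  # non-persistent pulse from N+2
    pc = t < t_pc_falls                       # PC=1 until N+1 lowers it, then stays 0
    return int(eval_ or not pc)


def control_stays_high(t_eval_rises, t_eval_falls, t_pc_falls, horizon):
    """True if the stage keeps evaluating from Eval=1 until `horizon`."""
    return all(
        merged_control(t, t_eval_rises, t_eval_falls, t_pc_falls)
        for t in range(t_eval_rises, horizon)
    )


# PC falls (t=14) before Eval is de-asserted (t=20): PC takes over safely.
print(control_stays_high(10, 20, 14, 30))
# PC falls (t=23) after Eval is de-asserted: the stage briefly precharges again.
print(control_stays_high(10, 20, 23, 30))
```

The first call reports an uninterrupted evaluation and the second reports a glitch. The requirement is one-sided: PC=0 only has to arrive early enough, with no lower bound on its arrival.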

Page 18: Recap: Lectures 5 & 6 Classic Pipeline Styles

Dual-Rail Design #2: LP2/2

Optimization = “early done”

Idea: move completion detector before processing block
- stage indicates when “about to” precharge/evaluate

[Figure: “early” Completion Detector placed ahead of the Processing Block, between Data in and Data out, producing the “early done” signal.]

Page 19: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP2/2 Protocol

Completion Detection:
- performed in parallel with evaluation/precharge of stage

[Figure: stages N, N+1, N+2 with events numbered 1 to 4. Event labels in the diagram: N evaluates, N+1 evaluates, “early done” of N+1 eval, “early done” of N+2 eval, “early done” of N+1 prech.]

Page 20: Recap: Lectures 5 & 6 Classic Pipeline Styles

LP2/2 Performance

[Figure: LP2/2 cycle with events numbered 1 to 4.]

Cycle Time = $2\,T_{EVAL} + 2\,T_{DETECT}$

LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
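The three cycle-time formulas stated so far can be put side by side. The sketch below simply evaluates them for one set of hypothetical delay values; `cycle_times` is an illustrative helper, not part of the lecture material.

```python
def cycle_times(t_eval, t_prech, t_detect):
    """Cycle times from the PS0, LP3/1 and LP2/2 performance slides."""
    return {
        "PS0": 3 * t_eval + t_prech + 2 * t_detect,
        "LP3/1": 3 * t_eval + t_detect,
        "LP2/2": 2 * t_eval + 2 * t_detect,
    }


# Hypothetical delays in arbitrary time units.
results = cycle_times(t_eval=4, t_prech=3, t_detect=2)
for style, cycle in results.items():
    saving = results["PS0"] - cycle
    print(f"{style:6s} cycle = {cycle:2d}  saving over PS0 = {saving}")
```

For these values LP2/2 saves 7 units over PS0, which is one evaluation (4) plus one precharge (3). The difference between LP3/1 and LP2/2 is $T_{EVAL} - T_{DETECT}$, so which of the two is faster depends on whether evaluation or completion detection takes longer.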

Page 21: Recap: Lectures 5 & 6 Classic Pipeline Styles

Dual-Rail Design #3: LP2/1

Hybrid of LP3/1 and LP2/2…

Combines:
- early evaluation of LP3/1
- early done of LP2/2

Cycle time: [formula not preserved in the transcript]

Best of our dual-rail lookahead pipelines…