greendroid: an architecture for the dark silicon...

41
GreenDroid: An Architecture for the Dark Silicon Age Nathan Goulding-Hotta, Jack Sampson, Qiaoshi Zheng, Vikram Bhatt, Joe Auricchio, Steven Swanson, Michael Bedford Taylor University of California, San Diego

Upload: others

Post on 31-Aug-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

GreenDroid: An Architecture for the Dark Silicon Age

Nathan Goulding-Hotta, Jack Sampson, Qiaoshi Zheng, Vikram Bhatt, Joe Auricchio,

Steven Swanson, Michael Bedford Taylor

University of California, San Diego

Page 2: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

This Talk

The Dark Silicon Problem

How to use Dark Silicon to improve energy efficiency (Conservation Cores)

The GreenDroid Mobile Application Processor

Page 3: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Where does dark silicon come from? And how dark is it going to be?

The Utilization Wall:

With each successive process generation, the percentage of a chip that can switch at full frequency drops exponentially due to power constraints.

[Venkatesh, ASPLOS ‘10]

Page 4: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Scaling 101: Moore’s Law

90 65 45 32 22 16 11 8 nm

S = 22

16 = ~1.4x

Page 5: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

16 cores 64 cores

MIT Raw Tilera TILE64

180 nm 90 nm S = 2x Transistors = 4x

Scaling 101: Transistors scale as S2

Page 6: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Advanced Scaling: Dennard: “Computing Capabilities Scale by S3 = 2.8x”

S

S2

S3

1 Design of Ion-Implanted MOSFETs with Very Small Dimensions Dennard et al, 1974

If S=1.4x …

Page 7: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

S

S2

S3

1

S2 = 2x More Transistors

If S=1.4x …

Advanced Scaling: Dennard: “Computing Capabilities Scale by S3 = 2.8x”

Page 8: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

S

S2

S3

1

S2 = 2x More Transistors

S = 1.4x Faster Transistors

If S=1.4x …

Advanced Scaling: Dennard: “Computing Capabilities Scale by S3 = 2.8x”

Page 9: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

S

S2

S3

1

S2 = 2x More Transistors

S = 1.4x Faster Transistors

But wait: switching 2.8x times as many transistors per unit time – what about power??

If S=1.4x …

Advanced Scaling: Dennard: “Computing Capabilities Scale by S3 = 2.8x”

Page 10: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Dennard: “We can keep power consumption constant”

S

S2

S3

1

S2 = 2x More Transistors

S = 1.4x Faster Transistors

S = 1.4x Lower Capacitance

Page 11: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Dennard: “We can keep power consumption constant”

S

S2

S3

1

S2 = 2x More Transistors

S = 1.4x Faster Transistors

S = 1.4x Lower Capacitance

Scale Vdd by S=1.4x S2 = 2x

Page 12: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Fast forward to 2005: Threshold Scaling Problems due to Leakage Prevents Us From Scaling Voltage

S

S2

S3

1

S2 = 2x More Transistors

S = 1.4x Faster Transistors

S = 1.4x Lower Capacitance

Scale Vdd by S=1.4x S2 = 2x

Page 13: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Full Chip, Full Frequency Power Dissipation Is increasing exponentially by 2x with every process generation

S

S2

S3

1

Factor of S2

= 2X shortage!!

[ASPLOS 2010, Venkatesh]

Page 14: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Multicore has a lot of Dark Silicon in its Future

4 cores @ 1.8 GHz

4 cores @ 2x1.8 GHz (12 cores dark)

2x4 cores @ 1.8 GHz (8 cores dark, 8 dim)

(Industry’s Choice, next slide)

.…

65 nm 32 nm

.…

.…

Spectrum of tradeoffs between # of cores and frequency

Example: 65 nm 32 nm (S = 2)

[Hotchips 2010]

Page 15: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Multicore has a lot of Dark Silicon in its Future

4 cores @ 1.8 GHz

4 cores @ 2x1.8 GHz (12 cores dark)

2x4 cores @ 1.8 GHz (8 cores dark, 8 dim)

(Industry’s Choice, next slide)

.…

65 nm 32 nm

.…

.…

Spectrum of tradeoffs between # of cores and frequency

Example: 65 nm 32 nm (S = 2)

[Hotchips 2010]

The utilization wall will change the way everyone builds chips.

Page 16: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

This Talk

The Dark Silicon Problem

How to use Dark Silicon to improve energy efficiency (Conservation Cores)

The GreenDroid Mobile Application Processor

Page 17: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

17

What do we do with Dark Silicon?   Idea: Leverage dark silicon to “fight” the

utilization wall

  Insights: –  Power is now more expensive than area –  Specialized logic can improve energy efficiency by

10-1000x

  Our Approach: –  Fill dark silicon with Conservation Cores, or c-cores,

which are specialized energy-saving coprocessors that save energy on common apps –  Execution jumps from c-core to c-core –  Power-gate c-cores that are not currently in use

  Conservation Cores provide an architectural way to trade dark area for an effective increase in power budget!

Dark Silicon

Page 18: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

  Although they also tend to be more energy-efficient, accelerators focus on attaining speedup and target codes which are relatively easy to parallelize: medium or high parallelism, high regularity and a relatively small code base.

  Many applications use irregular, non-parallelizable code:

  Amdahl's Law: Overall energy efficiency depends on the fraction of the total code execution that is optimized!

  To gain large energy savings through specialization: –  We need to target irregular code as well as regular code, and –  We need many, many such coprocessors to get high coverage

•  need to solve both design effort and architectural scalability problems

C-cores compared to Accelerators

Page 19: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Conservation Cores (C-cores)   Specialized coprocessors for

reducing energy in irregular code –  Hot code implemented by c-cores,

cold code runs on host CPU; –  C-cores use up to 18x less energy –  Shared D-cache Coherent Memory –  Patching support in hardware

  Fully-automated toolchain –  No “deep” analysis or

transformations required –  C-cores automatically generated from

hot program regions –  Design-time scalable

•  Emphasize Quantity over Quality! •  Simple conversion into HW buys us big

gains, no need for heroic compiler efforts.

D-cache

Host CPU

(general-purpose processor)

I-cache

Hot code

Cold code

"Conservation Cores: Reducing the Energy of Mature Computations," Venkatesh et al., ASPLOS '10

C-core

Page 20: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

C-core Generation

for (i=0; i<N; i++) {! x = A[i];! y = B[i];! C[x] = D[y] + x+y+x*y;!}!

Start with ordinary C code. Irregular or regular is fine. Arbitrary control flow, arbitrary memory access patterns and complex data structures are supported.

Page 21: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

BB1

BB0

BB2

CFG

C-core Generation

Build a CFG; run ordinary compiler optimizations.

Page 22: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

BB1

BB0

BB2

CFG

+

+

*

LD

+ LD LD

+1 <N?

+ + +

ST + Datapath

C-core Generation

Each BB becomes a datapath; each operator turned into HW equivalent.

Memory ops mux’d into L1 cache.

Multiplier and FPUs may or may not be shared.

Page 23: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

BB1

BB0

BB2

CFG

+

+

*

LD

+ LD LD

+1 <N?

+ + +

ST + Datapath

Inter-BB State Machine

C-core Generation

Create a state machine that determines which BB (datapath) is next.

Page 24: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

BB1

BB0

BB2

CFG

+

+

*

LD

+ LD LD

+1 <N?

+ + +

ST + Datapath

Inter-BB State Machine

0.01 mm2 in 45 nm TSMC runs at 1.4 GHz

.V

Synopsys IC Compiler, P&R, CTS

C-core Generation

.V

Via verilog, run through standard

CAD flow.

Page 25: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

C-cores Experimental Data   We automatically built 21 c-cores for 9 "hard"

applications like Spec, Mediabench, etc.

–  45 nm TSMC

–  Vary in size from 0.10 to 0.25 mm2

Application # C-cores

Area (mm2)

bzip2 1 0.18 cjpeg 3 0.18 djpeg 3 0.21 mcf 3 0.17 radix 1 0.10 sat solver 2 0.20 twolf 6 0.25 viterbi 1 0.12 vpr 1 0.24

Page 26: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

D-cache 6% Datapath

3%

Energy Saved 91%

D-cache 6%

Datapath 38%

Reg. File 14%

Fetch/ Decode

19%

I-cache 23%

Typical Energy Savings

RISC baseline 91 pJ/instr.

C-cores 8 pJ/instr.

~11x

Page 27: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

This Talk

The Dark Silicon Problem

How to use Dark Silicon to improve energy efficiency

The GreenDroid Mobile Application Processor

Page 28: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Examing today’s smartphone

  Lots of irregular code (desktop mobile)

  Utilization Wall problem is even worse on mobile! –  Active Power budget is set by

(battery capacity) / (# hrs active use between recharges) rather than thermal design point.

Page 29: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Hardware

Linux Kernel

Libraries Dalvik

Applications

Android™

  Google’s OS + app. environment for mobile devices

  Java applications run on the Dalvik virtual machine

  Apps share a set of libraries (libc, OpenGL, SQLite, etc.)

Page 30: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Applying C-cores to Android   Android is well-suited for c-cores

–  Concentrated set of commonly used applications •  73% of time in top 51 apps •  33% of time just in browser

–  Lots of hot code is inside the Libraries and Dalvik VM;

c-cores that target these parts are reused across many apps.

Hardware

Linux Kernel

Libraries Dalvik

Applications

C-cores

Page 31: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Targeted

Broad-based

  Profiled common Android apps to find the hot spots, including: –  Google: Browser, Gallery, Mail, Maps, Music, Video –  Pandora –  Photoshop Mobile –  Robo Defense game

  Broad-based c-cores –  72% code sharing

  Targeted c-cores –  95% coverage with just

43,000 static instructions (approx. 7 mm2)

Android Workload Profile

Page 32: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

GreenDroid Tiled Architecture   Scalable C-core fabric   16-tiles on a NOC   Each tile contains

–  6-10 Android c-cores (~125 total)

–  32 KB D-cache (shared with CPU)

–  RISC “host” core •  32 bit, in-order,

7-stage pipeline •  16 KB I-cache •  Single-precision FPU

–  On-chip network router

CP

U

L1 L1

L1 L1

CP

U

CP

U

CP

U

CP

U

L1 L1

L1 L1

CP

U

CP

U

CP

U

CP

U

L1 L1

L1 L1 C

PU

CP

U

CP

U

CP

U

L1 L1

L1 L1

CP

U

CP

U

CP

U

Page 33: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

GreenDroid Tile Floorplan

  Norm to 45 nm: –  1.0 mm2 per tile –  1.5 GHz

  25% RISC core, I-cache, and on-chip network

  25% D-cache   50% C-core “fill”

1 mm

1 mm

OCN

D $

CP

U

I $

C C C

C

C

C

C

C

C C

Page 34: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

GreenDroid: Using c-cores to reduce energy in mobile application processors

Android workload

Automatic c-core generator

C-cores Placed-and-routed chip with 9 Android c-cores

"The GreenDroid Mobile Application Processor: An Architecture for Silicon's Dark Future," Goulding-Hotta et al., IEEE Micro Mar./Apr. 2011

Page 35: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

GreenDroid: Projected Energy   GreenDroid c-cores use 11x less energy per instruction

than an aggressive mobile application processor

  Including cold code, GreenDroid will still save ~7.5x energy

Aggressive mobile application processor (45 nm, 1.5 GHz)

GreenDroid c-cores, 45 nm

GreenDroid c-cores + cold code (est.)

91 pJ/instr.

8 pJ/instr.

12 pJ/instr.

Page 36: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Quad-core GreenDroid Prototype

  Four heterogeneous tiles with ~40 C-cores.

  Synopsys IC Compiler   28-nm Global Foundries   1.5-2 GHz   2 mm^2   In verification stages   Multiproject Tapeout w/ UCSC

Page 37: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

D$

CPU I$

ASPLOS ‘10

C-cores; Patching; Util. Wall

HOTCHIPS ‘10

GreenDroid; Dark Silicon

Selective Depipelining; Cachelets

+

HPCA ’11

C-cores for FPGAs

FPL ’11

Selective Depipelining for FPGAs

+

IEEE MICRO ‘11

GreenDroid P&R Tile

FCCM ’11

=

sum

sum i

+

=

sum

sum sum

*

=

sum

sum

alu

mux

sum i

sum += i sum *= sum

sum = alu(sum, mux(sum,i,muxControl), aluControl)

MICRO ’11

Automatic C-core General- ization

Page 38: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

The UCSD GreenDroid Team

greendroid.org

Prof. Taylor Prof. Swanson

Heroic Students

Page 39: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Conclusions

  Dark Silicon is opening up a whole new class of exciting new research areas. (Submit to DaSi!)

  Conservation cores use dark silicon to attack the utilization wall.

  GreenDroid is evaluating the benefits of c-cores for mobile application processors; a 28-nm prototype is underway.

Page 40: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use
Page 41: GreenDroid: An Architecture for the Dark Silicon Agecseweb.ucsd.edu/~mbtaylor/papers/ASPDAC_2012_slides.pdf · 2012. 7. 20. · GreenDroid: Projected Energy GreenDroid c-cores use

Patchability Payoff: Longevity

  Graceful degradation –  Lower initial efficiency –  Much longer useful lifetime

  Can support any changes, through patching –  Arbitrary insertion of code –

software exception mechanism –  Changes to program constants –  Changes to operators