The Dark Silicon Implications for Microprocessors

Karu Sankaralingam, University of Wisconsin-Madison

Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger

Post on 05-Jan-2016

TRANSCRIPT

Page 1:

The Dark Silicon Implications for Microprocessors

Karu Sankaralingam
University of Wisconsin-Madison

Collaborators: Hadi Esmaeilzadeh, Emily Blem, Renee St. Amant, and Doug Burger

Page 2:

Multicore Decade?

We have relied on multicore scaling for over five years.

How much longer will it be our primary performance scaling technique?

[Figure: timeline from 2000 to 2015 showing Pentium Extreme Dual-Core, Core 2 Quad-Core, i7 980x Hex-Core, and "?" for what comes next.]

Page 3:

Finding Optimal Multicore Designs

Comprehensive design space:
- Fixed area budget
- Fixed power budget
- Two sets of CMOS scaling projections
- Optimal core and diverse multicore organizations
- Parallel benchmarks

For the next 5 technology generations, we find the best-performing multicore from a comprehensive design space search for each of the PARSEC benchmarks.

Page 4:

Symmetric Multicore Projections

Symmetric multicores alone will not sustain the multicore era.

[Figure: speedup (0 to 20) versus year (0 to 10), comparing Target and Symmetric. Annotations: 18x; 3.4x in 10 years.]

Page 5:

Multicore Solutions

Asymmetric Topologies

[Figure: speedup (0 to 20) versus year (0 to 10). Legend: Target, Symmetric. Annotation: 3.5x.]

Page 6:

Multicore Solutions

Dynamic Topologies

[Chakraborty (2008), Suleman et al (2009)]

[Figure: speedup (0 to 20) versus year (0 to 10). Legend: Target, Symmetric, Asymmetric. Annotation: 3.5x.]

Page 7:

Multicore Solutions

Composed/Fused Topologies

[Ipek et al (2007), Kim et al (2007)]

[Figure: speedup (0 to 20) versus year (0 to 10). Legend: Target, Symmetric, Asymmetric, Dynamic. Annotation: 3.7x.]

Page 8:

Multicore Solutions

GPU-Style Cores

[Figure: speedup (0 to 20) versus year (0 to 10). Legend: Target, Symmetric, Asymmetric, Dynamic. Annotation: 2.7x.]

Page 9:

Multicore Era Projections

The best designs speed up 14% per year rather than the recent trend of 34% per year.

[Figure: speedup (0 to 20) versus year (0 to 10), comparing Target and Composed. Annotations: 18x; 3.7x (Composed).]
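The two annual rates and the two ten-year speedups quoted on this slide are linked by simple compounding. The short sketch below checks that arithmetic; the function name is illustrative and is not taken from the talk.

```python
def cumulative_speedup(annual_rate: float, years: int) -> float:
    """Total speedup after compounding `annual_rate` (0.14 means 14%) for `years`."""
    return (1.0 + annual_rate) ** years


# Rates quoted on the slide, compounded over the 10-year horizon of the chart.
best_designs = cumulative_speedup(0.14, 10)
recent_trend = cumulative_speedup(0.34, 10)

print(f"14% per year for 10 years: {best_designs:.1f}x")  # 3.7x
print(f"34% per year for 10 years: {recent_trend:.1f}x")  # 18.7x
```

The results line up with the 3.7x and 18x annotations on the chart.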

Page 10:

Why Diminishing Returns?

- Transistor area is still scaling
- Voltage and capacitance scaling have slowed
- Result: designs are power limited, not area limited

Page 11:

Overview

Page 12:

Device Scaling Projections

From 45 nm to 8 nm:

            Conservative     Optimistic
            [Borkar 2007]    [ITRS 2010]
Area        32x              32x
Power       4.5x             8.3x
Frequency   1.3x             3.9x

Page 13:

Modeling Ideal Core Power/Perf.

[Figure: two scatter plots of power (TDP, Watts) versus SPECmark score for Intel Nehalem, AMD Shanghai, Intel Core, and Intel Atom processors, with the Atom and Nehalem points labeled.]

The Pareto frontier includes all optimal power/performance points.

Repeat using core area for optimal area/performance points.
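As a rough illustration of what extracting a Pareto frontier from measured cores involves, here is a minimal sketch. The data points are made up for the example and are not the processor measurements from the talk; the talk does not say how its frontier was fitted, so this only shows the basic dominance filter.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (performance score, power in watts)


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Keep the points that no other point beats on both axes.

    A point is dropped when another point offers at least as much
    performance for no more power.
    """
    # Walk from the fastest core down; ties go to the lower-power core.
    ordered = sorted(points, key=lambda p: (-p[0], p[1]))
    frontier: List[Point] = []
    lowest_power = float("inf")
    for perf, power in ordered:
        if power < lowest_power:
            frontier.append((perf, power))
            lowest_power = power
    return sorted(frontier)


# Hypothetical (score, watts) pairs, for illustration only.
cores = [(5, 3), (10, 8), (12, 20), (20, 15), (25, 28), (30, 25)]
print(pareto_frontier(cores))  # [(5, 3), (10, 8), (20, 15), (30, 25)]
```

The same function applies to the area/performance case by passing (score, area) pairs instead.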

Page 14:

Combining Device and Core Models

[Figure: power (TDP, Watts) versus SPECmark score, showing the 45 nm frontier and the 32 nm frontier, with an arrow labeled "Device Scaling" between them.]
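One plausible way to read the "Device Scaling" arrow is that each frontier point moves right by the frequency gain and down by the power reduction. The sketch below assumes exactly that: performance scales with frequency, and the "Power" factor in the projections is a reduction in power per core. Both are assumptions of this example, not statements from the slides. The frontier points are hypothetical; the factors are the conservative 45 nm to 8 nm values from the Device Scaling Projections slide, since the 45 nm to 32 nm factors are not given in the transcript.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (performance score, power in watts)


def scale_frontier(frontier: List[Point],
                   frequency_gain: float,
                   power_reduction: float) -> List[Point]:
    """Shift every frontier point by the device scaling factors.

    Assumption: performance grows with frequency, power shrinks by
    `power_reduction`.
    """
    return [(perf * frequency_gain, power / power_reduction)
            for perf, power in frontier]


# Hypothetical 45 nm frontier points, for illustration only.
frontier_45nm = [(5.0, 3.0), (10.0, 9.0), (20.0, 18.0)]

# Conservative projection, 45 nm to 8 nm: frequency 1.3x, power 4.5x.
frontier_8nm = scale_frontier(frontier_45nm, frequency_gain=1.3, power_reduction=4.5)
for perf, power in frontier_8nm:
    print(f"score {perf:.1f}, power {power:.2f} W")
```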

Page 15:

Overview

Page 16:

What belongs in multicore model?

[Diagram; labels listed in extraction order:]
- Styles
- Applications
- Topologies
- Area & Power / Performance Tradeoffs
- Architectures
- PARSEC: fparallel, data use
- Number of threads, cache sizes
- Area & power budget
- Cache & memory latencies, memory bandwidth
- Pareto frontiers

Page 17:

Multicore Speedup Model

\[
\text{Multicore Speedup} =
\frac{1}{\dfrac{1 - f_{\text{parallel}}}{\text{Serial Speedup}} + \dfrac{f_{\text{parallel}}}{\text{Parallel Speedup}}}
\]
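The speedup model on this slide is Amdahl's Law with separate speedups for the serial and parallel portions. A minimal sketch follows; the function and parameter names are chosen for this example, and the input values are illustrative rather than taken from the talk's results.

```python
def multicore_speedup(f_parallel: float,
                      serial_speedup: float,
                      parallel_speedup: float) -> float:
    """Amdahl-style speedup with distinct serial and parallel speedups."""
    if not 0.0 <= f_parallel <= 1.0:
        raise ValueError("f_parallel must lie in [0, 1]")
    serial_time = (1.0 - f_parallel) / serial_speedup
    parallel_time = f_parallel / parallel_speedup
    return 1.0 / (serial_time + parallel_time)


# Illustrative inputs: 99% parallel code, 2x serial and 50x parallel speedup.
print(f"{multicore_speedup(0.99, 2.0, 50.0):.1f}x")  # 40.3x
```

Even with 99% of the work parallelized, the remaining 1% keeps the result well below the 50x parallel speedup, which is why the serial core's performance matters in the model.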

Page 18:

Multicore Performance Model

Performance is limited by:

Computation: Ncores × (core frequency / CPIexe) × core utilization

and

Memory bandwidth: BWmax / (bytes transferred from memory per instruction)

[Guz et al, 2009]
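The two limits can be combined by taking whichever is smaller. The sketch below is a simplified reading of this slide, not the full model of Guz et al. It expresses memory traffic as bytes per instruction, an assumption chosen so that both limits come out in instructions per second. All names and numbers are illustrative.

```python
def multicore_performance(n_cores: int,
                          frequency_hz: float,
                          cpi_exe: float,
                          utilization: float,
                          bw_max_bytes_per_s: float,
                          bytes_per_instruction: float) -> float:
    """Instructions per second: the smaller of the compute and bandwidth limits."""
    compute_limit = n_cores * (frequency_hz / cpi_exe) * utilization
    bandwidth_limit = bw_max_bytes_per_s / bytes_per_instruction
    return min(compute_limit, bandwidth_limit)


# Hypothetical chip: 8 cores at 2 GHz, CPI of 1, half utilized,
# 16 GB/s of memory bandwidth, 4 bytes of memory traffic per instruction.
perf = multicore_performance(8, 2e9, 1.0, 0.5, 16e9, 4.0)
print(f"{perf:.1e} instructions/s")  # 4.0e+09, bandwidth bound
```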

Page 19:

Core Utilization Model

Core utilization is limited by:

Fraction of time the core is ready to issue: Number of threads in core / Number of threads to keep busy

[Guz et al, 2009]
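The thread ratio on this slide can be sketched as below. Capping the result at 1 is an assumption of this example (a core cannot be more than fully utilized); the slide shows only the ratio. How the number of threads needed to keep a core busy is derived is not given in the transcript, so it is taken here as an input.

```python
def core_utilization(threads_in_core: float, threads_to_keep_busy: float) -> float:
    """Fraction of time the core is ready to issue, capped at 1.0 (assumed cap)."""
    if threads_to_keep_busy <= 0:
        raise ValueError("threads_to_keep_busy must be positive")
    return min(1.0, threads_in_core / threads_to_keep_busy)


# Illustrative values: a core that needs 10 threads to hide memory latency.
print(core_utilization(4, 10))   # 0.4
print(core_utilization(16, 10))  # 1.0
```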

Page 20:

Multicore Model & Pareto Frontiers

[Figure: Pareto frontier sampled at 100 points; each point has a performance q with corresponding area A(q) and power P(q).]

Page 21:

Translating from SPECmark

1. From q, find the core's SPECmark speedup
2. Frequency is linearly distributed from Atom to Nehalem
3. Recall: the model predicts benchmark performance as f(benchmark characteristics, frequency, CPIexe)
4. Compute CPIexe such that Benchmark Speedup = SPECmark Speedup
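The steps above can be sketched under one strong simplification: the core is compute bound, so benchmark performance is proportional to frequency / CPIexe. The talk's model also accounts for utilization and memory effects, which this example ignores. All names and values are hypothetical.

```python
def interpolate_frequency(q: float, q_atom: float, q_nehalem: float,
                          f_atom: float, f_nehalem: float) -> float:
    """Step 2: frequency varies linearly with q between the two endpoint cores."""
    t = (q - q_atom) / (q_nehalem - q_atom)
    return f_atom + t * (f_nehalem - f_atom)


def solve_cpi_exe(specmark_speedup: float, frequency: float,
                  base_frequency: float, base_cpi_exe: float) -> float:
    """Step 4, compute-bound simplification.

    Requiring (frequency / cpi) / (base_frequency / base_cpi) == speedup gives
    cpi = base_cpi * (frequency / base_frequency) / speedup.
    """
    return base_cpi_exe * (frequency / base_frequency) / specmark_speedup


# Hypothetical endpoints: q from 10 to 30, frequency from 1.0 to 3.0 GHz.
freq = interpolate_frequency(20.0, 10.0, 30.0, 1.0, 3.0)
print(freq)                                # 2.0
print(solve_cpi_exe(4.0, freq, 1.0, 2.0))  # 1.0
```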

Page 22:

Area and Power Constraints

Ncores × A(q) ≤ Area Budget

Ncores × P(q) ≤ Power Budget

Dark silicon = 1 − Ncores / (number of cores that fit in the chip area)
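A minimal sketch of how the two budgets bound the core count follows. Dark silicon is taken here as the fraction of cores that fit in the area but cannot be powered, that is, one minus the ratio of powered cores to cores that fit. The budgets and per-core figures are hypothetical.

```python
import math
from typing import Tuple


def cores_and_dark_fraction(area_budget: float, power_budget: float,
                            core_area: float, core_power: float) -> Tuple[int, int, float]:
    """Return (powered cores, cores that fit by area, dark silicon fraction)."""
    fit_by_area = math.floor(area_budget / core_area)
    fit_by_power = math.floor(power_budget / core_power)
    n_cores = min(fit_by_area, fit_by_power)
    # Area holding cores that cannot be switched on is counted as dark.
    dark_fraction = 1.0 - n_cores / fit_by_area if fit_by_area else 0.0
    return n_cores, fit_by_area, dark_fraction


# Hypothetical chip: 100 mm^2 and 50 W, with 5 mm^2, 5 W cores.
n_cores, fit, dark = cores_and_dark_fraction(100.0, 50.0, 5.0, 5.0)
print(n_cores, fit, dark)  # 10 20 0.5
```

In this example the area budget admits 20 cores but the power budget admits only 10, so half of the chip stays dark.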

Page 23:

Overview

Page 24:

Dark Silicon

[Figure: two bar charts of percentage dark silicon (0% to 100%) for blackscholes, bodytrack, canneal, ferret, streamcluster, and their geometric mean (GM), with bars for the ITRS and Conservative projections.]

Sources of dark silicon: power + limited parallelism

At 22 nm: 17% and 26%

At 8 nm: 51% and 71%

Page 25:

Overall Performance

[Figure: speedup (0 to 20) versus year (0 to 10). Series: Target, ITRS: All Topologies, ITRS: Symmetric, Conservative: All Topologies, Conservative: Symmet... Annotations: fparallel = 0.99; 18x, 16x, 8x, 6x, 3x.]

Page 26:

Conclusions

Multicore performance gains are limited.

Need at least 18%-40% per generation from architecture alone, without additional power.

[Figure labels: Unicore Era, Multicore Era, ?]

Page 27:

Specialization

Shrinking chips

Pervasive

Efficiency