power, cooling, and energy consumption for the petascale and … · 2012-06-28 · 1 power,...

28
1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley National Laboratory Bill Tschudi: Lawrence Berkeley National Laboratory Stephen Elbert: Pacific Northwest National Laboratory Rob Pennington: National Center for Supercomputing Applications Andres Marques: Pacific Northwest National Laboratory Tim McCann: Silicon Graphics Inc. Tahir Cader: ISR Spray Cooling

Upload: others

Post on 19-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

1

Power, Cooling, and Energy Consumptionfor the Petascale and Beyond

John Shalf: Lawrence Berkeley National Laboratory

Bill Tschudi: Lawrence Berkeley National Laboratory

Stephen Elbert: Pacific Northwest National Laboratory

Rob Pennington: National Center for Supercomputing Applications

Andres Marques: Pacific Northwest National Laboratory

Tim McCann: Silicon Graphics Inc.

Tahir Cader: ISR Spray Cooling

Page 2: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

2

Looming Power Crisis

• New Constraints– Power limits clock rates

– Cannot squeeze moreperformance from ILP(complex cores) either!

• But Moore’s Law continues!– What to do with all of those

transistors if everything else isflat-lining?

– Now, #cores per chip doublesevery 18 months instead ofclock frequency!

• The “Free Lunch” is over!

Figure courtesy of Kunle Olukotun, LanceHammond, Herb Sutter, and Burton Smith

Page 3: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

3

Intel Desktop Processor Max Power Consumption, Pentium through P4

0

20

40

60

80

100

120

140

160

19-Sep-91 31-Jan-93 15-Jun-94 28-Oct-95 11-Mar-97 24-Jul-98 6-Dec-99 19-Apr-01 1-Sep-02 14-Jan-04 28-May-05

Date of Introduction

Po

wer

(W

)

0.090.130.180.250.280.350.50.8

Source:sandpile.org

Microprocessors: Up Against the Wall(s)

• Microprocessors are hitting apower wall

– Higher clock rates and greaterleakage increasing powerconsumption

• Reaching the limits of whatnon-heroic heat solutions canhandle

• Newer technology becomingmore difficult to produce,removing the previous trend of“free” power improvement

From Joe Gebis

Page 4: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

4

New Design Constraint: POWER

• Transistors still getting smaller– Moore’s Law is alive and well

• But Dennard scaling is dead!

– No power efficiency improvements with smaller transistors

– No clock frequency scaling with smaller transistors

– All “magical improvement of silicon goodness” has ended

• Traditional methods for extracting more performance are well-mined

– Cannot expect exotic architectures to save us from the “powerwall”

– Even resources of DARPA can only accelerate existingresearch prototypes (not “magic” new technology)!

Page 5: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

ORNL Computing Power and Cooling 2006 - 2011

• Immediate need to add 8 MW toprepare for 2007 installs of newsystems

• NLCF petascale system could requirean additional 10 MW by 2008

• Need total of 40-50 MW for projectedsystems by 2011

• Numbers just for computers: add 75%for cooling

• Cooling will require 12,000 – 15,000tons of chiller capacity 0

10

20

30

40

50

60

70

80

90

Po

wer

(M

W)

2005 2006 2007 2008 2009 2010 2011

Year

Computer Center Power Projections

Cooling

Computers

Cost estimates based on $0.05 kW/hr

$3M

$17M

$9M

$23M

$31M

OAK RIDGE NATIONAL LABORATORYU. S. DEPARTMENT OF ENERGY

Site FY 2005 FY 2006 FY 2007 FY 2008 FY 2009 FY 2010LBNL 43.70 50.23 53.43 57.51 58.20 56.40 *ANL 44.92 53.01ORNL 46.34 51.33PNNL 49.82 N/A

Annual Average Electrical Power Rates $/MWh

Data taken from Energy Management System-4 (EMS4). EMS4 is the DOE corporatesystem for collecting energy information from the sites. EMS4 is a web-basedsystem that collects energy consumption and cost information for all energysources used at each DOE site. Information is entered into EMS4 by the site andreviewed at Headquarters for accuracy.

Yikes!

Page 6: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

6

Power Consumption by Top500 Systems

Growth in Power Consumption (Top50)

0.00

500.00

1000.00

1500.00

2000.00

2500.00

3000.00

3500.00

4000.00

Jun-00

Sep-00

Dec-00

Mar-01

Jun-01

Sep-01

Dec-01

Mar-02

Jun-02

Sep-02

Dec-02

Mar-03

Jun-03

Sep-03

Dec-03

Mar-04

Jun-04

Sep-04

Dec-04

Mar-05

Jun-05

Sep-05

Dec-05

Mar-06

Jun-06

System

Po

wer (

kW

)

max pwr

avg pwr

Growth in Power Consumption (Top50)Excluding Cooling

0.00

100.00

200.00

300.00

400.00

500.00

600.00

700.00

800.00

Jun-00

Sep-00

Dec-00

Mar-01

Jun-01

Sep-01

Dec-01

Mar-02

Jun-02

Sep-02

Dec-02

Mar-03

Jun-03

Sep-03

Dec-03

Mar-04

Jun-04

Sep-04

Dec-04

Mar-05

Jun-05

Sep-05

Dec-05

Mar-06

Jun-06

System

Po

wer (

kW

)

Avg. Power Top50 from Top500 List

Page 7: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

7

Other Estimates of Power Requirements

• Baltimore Sun Article (Jan 23, 2007): NSA drawing 65-75 MW in Maryland

– Crisis: Baltimore Gas & Electric does not have sufficient power for city of Baltimore!

– expected to increase by 10-15 MW next year!

• LBNL IJHPCA Study for ~1/5 Exaflop for Climate Science in 2008

– Extrapolation of Blue Gene and AMD design trends

– Estimate: 20 MW for BG and 179 MW for AMD

• DOE E3 Report

– Extrapolation of existing design trends to exascale in 2016

– Estimate: 130 MW

• DARPA Study

– More detailed assessment of component technologies

– Estimate: 20 MW just for memory alone, 60 MW aggregate extrapolated from

current design trends

The current approach is not sustainable!

Page 8: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

8

Power is an Industry Wide Problem

“Hiding in Plain Sight, Google Seeks More Power”, by John Markoff, June 14, 2006

New Google Plant in The Dulles, Oregon, from NYT, June 14, 2006

Page 9: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

9

How Big is the Problem?

• Estimated Computing PowerConsumption

– 200 TWh/year

– $16 billion/year• Based on .08$/KWh, closer to $.10 now

(2005)

– Nearly 150 million tonsof CO2 per year

• Roughly equivalent to 30 millioncars!

One central baseloadpower plant(about 7 TWh/yr)

Numbers representU.S. only

Page 10: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

10

Cost of Power Will Dominate, and UltimatelyLimit Practical Scale of Future Systems

Source: Luiz André Barroso, (Google) “The Price of Performance,” ACM Queue, Vol. 2, No. 7, pp. 48-53, September 2005.(Modified with permission.)

UnrestrainedIT powerconsumptioncould eclipsehardwarecosts and putgreatpressure onaffordability,data centerinfrastructure,and theenvironment.

Page 11: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

11

Power Efficiency BoF

• Chip Architecture Trends for Power EfficientComputing

• Review Facility Design features for improvedpower and cooling efficiency

• Discuss cooling technology for future HPCsystem designs and its impact on facility design

• System architecture features to save power• Discuss emerging energy efficiency standards

and groups– ASHRAE– Green Grid– Green500

Page 12: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

12

More Information• All of the BoF Talks Online at

– http://esdc.pnl.gov

• More Information on Power Efficient Datacenters:– http://hightech.lbl.gov/datacenters

• Computer Architecture– http://view.eecs.berkeley.edu/– http://www.nersc.gov/projects/SDSA/reports

• Information / Metrics / Standards Bodies– http://www.ashrae.org/– http://www.thegreengrid.org/– http://www.green500.org/– http://www.80plus.org/– http://www.climatesaverscomputing.org/

Page 13: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

13

Some Short Remarks on Computer ArchitectureTrends

Page 14: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

14

What is Happening Now?

• Moore’s Law– Silicon lithography will improve by 2x every 18

months– Double the number of transistors per chip

every 18mo.

• CMOS PowerTotal Power = V2 * f * C + V * Ileakage active power passive power

– As we reduce feature size Capacitance ( C )decreases proportionally to transistor size

– Enables increase of clock frequency ( f )proportionally to Moore’s law lithographyimprovements, with same power use

– This is called “Fixed Voltage Clock FrequencyScaling” (Borkar `99)

• Since ~90nm– V2 * f * C ~= V * Ileakage

– Can no longer take advantage of frequencyscaling because passive power (V * Ileakage )dominates

– Result is recent clock-frequency stall reflectedin Patterson Graph at right

SPEC_Int benchmark performance since1978 from Patterson & Hennessy Vol 4.

Page 15: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

15

What is Happening Now?

• Moore’s Law– Silicon lithography will improve by 2x every 18

months– Double the number of transistors per chip

every 18mo.

• CMOS PowerTotal Power = V2 * f * C + V * Ileakage active power passive power

– As we reduce feature size Capacitance ( C )decreases proportionally to transistor size

– Enables increase of clock frequency ( f )proportionally to Moore’s law lithographyimprovements, with same power use

– This is called “Fixed Voltage Clock FrequencyScaling” (Borkar `99)

• Since ~90nm– V2 * f * C ~= V * Ileakage

– Can no longer take advantage of frequencyscaling because passive power (V * Ileakage )dominates

– Result is recent clock-frequency stall reflectedin Patterson Graph at right

SPEC_Int benchmark performance since1978 from Patterson & Hennessy Vol 4.

We are here!We are here!

Page 16: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

16

Multicore vs. Manycore

• Multicore: current trajectory– Stay with current fastest core design– Replicate every 18 months (2, 4, 8 . . . Etc…)– Advantage: Do not alienate serial workload– Example: AMD X2 (2 core), Intel Core2 Duo (2 cores), Madison (2 cores), AMD

Barcelona (4 cores)

• Manycore: converging in this direction– Simplify cores (shorter pipelines, lower clock frequencies, in-order processing)– Start at 100s of cores and replicate every 18 months– Advantage: easier verification, defect tolerance, highest compute/surface-area, best

power efficiency– Examples: Cell SPE (8 cores), Nvidia G80 (128 cores), Intel Polaris (80 cores),

Cisco/Tensilica Metro (188 cores)

• Convergence: Ultimately toward Manycore– Manycore if we can figure out how to program it!– Hedge: Heterogenous Multicore

Page 17: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

17

How Small is “Small”• Power5 (Server)

– 389mm^2– 120W@1900MHz

• Intel Core2 sc (laptop)– 130mm^2– 15W@1000MHz

• ARM Cortex A8 (automobiles)– 5mm^2– 0.8W@800MHz

• Tensilica DP (cell phones / printers)– 0.8mm^2– 0.09W@600MHz

• Tensilica Xtensa (Cisco router)– 0.32mm^2 for 3!– 0.05W@600MHz

Intel Core2

ARM

TensilicaDPXtensa x 3

Power 5

Each core operates at 1/3 to 1/10th efficiency of largest chip, but you can pack 100x more cores onto a chip and consume 1/20 the power

Page 18: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

18

From Doug CarmeanIntel Inc.

Page 19: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

19

Convergence of Platforms– Multiple parallel general-purpose processors (GPPs)– Multiple application-specific processors (ASPs)

“The Processor isthe new Transistor”

[Rowen]

Intel 4004 (1971):4-bit processor,2312 transistors,

~100 KIPS,10 micron PMOS,

11 mm2 chip

1000s ofprocessorcores per

die

Sun Niagara8 GPP cores (32 threads)

Intel®XScale

™ Core32K IC32K DC

MEv2

10

MEv2

11

MEv2

12

MEv2

15

MEv2

14

MEv2

13

Rbuf64 @128B

Tbuf64 @128BHash

48/64/128Scratch

16KB

QDRSRAM

2

QDRSRAM

1

RDRAM1

RDRAM3

RDRAM2

GASKET

PCI

(64b)66

MHz

IXP280IXP28000 16b16b

16b16b

1188

1188

1188

1188

1188

1188

1188

64b64b

SPI4orCSIX

Stripe

E/D Q E/D Q

QDRSRAM

3E/D Q1188

1188

MEv29

MEv2

16

MEv22

MEv23

MEv24

MEv27

MEv26

MEv25

MEv21

MEv28

CSRs-Fast_wr

-UART-Timers

-GPIO-BootROM/SlowPort

QDRSRAM

4E/D Q1188

1188

Intel Network Processor1 GPP Core

16 ASPs (128 threads)

IBM Cell1 GPP (2 threads)

8 ASPs

Picochip DSP1 GPP core248 ASPs

Cisco CRS-1188 Tensilica GPPs

Page 20: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

20

Power Wall Drives Concurrency Increases

Total # of Processors in Top15

0

50000

100000

150000

200000

250000

300000

350000Jun-93

Dec-93

Jun-94

Dec-94

Jun-95

Dec-95

Jun-96

Dec-96

Jun-97

Dec-97

Jun-98

Dec-98

Jun-99

Dec-99

Jun-00

Dec-00

Jun-01

Dec-01

Jun-02

Dec-02

Jun-03

Dec-03

Jun-04

Dec-04

Jun-05

Dec-05

Jun-06

List

Proc

esso

rs

Must ride exponential wave of increasing concurrency for forseeable future!You will hit 1M cores sooner than you think!

Page 21: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

21

Tension between concurrency and power efficiency

• Highly concurrent systems can be more power efficient– Dynamic power is proportional to V2fC

– Build systems with even higher concurrency?

• However, many algorithms are unable to exploit massiveconcurrency yet– If higher concurrency cannot deliver faster time to solution, then

power efficiency benefit wasted

– So we should build fewer/faster processors?

Page 22: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

22

Path to Power EfficiencyReducing Waste in Computing

• Examine methodology of low-power embedded computing market– optimized for low power, low cost, and high computational

efficiency

“Years of research in low-power embedded computing haveshown only one design technique to reduce power: reducewaste.”

Mark Horowitz, Stanford University & Rambus Inc.

• Sources of Waste– Wasted transistors (surface area)– Wasted computation (useless work/speculation/stalls)– Wasted bandwidth (data movement)– Designing for serial performance

Page 23: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

23

Designing for Efficiency isApplication Class Specific

Page 24: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

24

Consumer Electronics ConvergenceFrom: Tsugio Makimoto

Page 25: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

25

Consumer Electronics has Replaced PCs as the DominantMarket Force in CPU Design!!

AppleIntroduces

IPod

IPod+ITunesexceeds 50% of

Apple’s Net Profit

Apple IntroducesCell Phone

(iPhone)

From: Tsugio Makimoto

Page 26: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

26

BG/L—the Rise of the Embedded Processor?TOP 500 Performance by Architecture

1

10

100

1000

10000

100000

1000000

10000000

06/1993

MPPSMPClusterConstellationsSingle ProcessorSIMDOthersMPP embedded

06/1994

06/1995

06/1996

06/1997

06/1998

06/1999

06/2000

06/2001

06/2002

06/2003

06/2004

06/2005

Ag

gre

gat

e R

max

(Tfl

op

/s)

Page 27: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

27

Questions

• Is Multicore really the answer? (sounds boring)– FPGAs? Quantum computing?

– What else might be waiting in the wings

• What about advances in circuit fabrication?– SOI, Hafnium doping,

• What about memory?– Its starting to consume more memory than CPU

cores!

– Packaging changes (3D Stacking? OpticalInterfaces?)

Page 28: Power, Cooling, and Energy Consumption for the Petascale and … · 2012-06-28 · 1 Power, Cooling, and Energy Consumption for the Petascale and Beyond John Shalf: Lawrence Berkeley

28

Next UpDesigning Facilities for Power Efficiency