specpower ssj2008** characterizationfebruary 7, 2008 spec workshop january 2008 slide 2 agenda...

26
SPECpower_ssj2008** Characterization Anil Kumar, Larry Gray and Harry Li Intel ® Corporation * Other names and brands may be claimed as the property of others ** SPEC and the benchmark names are trademarks of the Standard Performance Evaluation Corporation Performance Data as of 30 January 2008. SPEC Workshop – January 27, 2008

Upload: others

Post on 22-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

SPECpower_ssj2008** CharacterizationAnil Kumar, Larry Gray and Harry Li

Intel® Corporation

* Other names and brands may be claimed as the property of others** SPEC and the benchmark names are trademarks of the Standard Performance Evaluation CorporationPerformance Data as of 30 January 2008.

SPEC Workshop – January 27, 2008

Page 2: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 2

Agenda

SPECpower_ssj2008 quick overview

SPECpower_ssj2008 initial characterization□ System resources utilization□ Impact of JVM Optimizations □ Frequency scaling□ Processor scaling□ Platform generation scaling

General observations

Summary

Page 3: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 3

SPECpower_ssj2008

Quick overview

Page 4: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 4

9.9%20.0%30.6%40.1%49.9%60.2%69.7%79.6%90.1%99.8%Actual Average Per Cent of Calibrated Peak Throughput

Seeks Peak Throughput, Runs and Reports a “Load Line”

SPECpower* - A “Graduated” Workload

First: A Calibration Phase: Run to Peak Transaction Throughput□ # warehouses or threads = # cores, scheduling is “ungated”

Next: Load Levels: Gradations Based on Calibrated ThroughputAverage of last two calibration levels = peak calibrated throughputExample Below is x10 or 10% increments – the benchmark

SPECpower_ssj2008 - TransactionsWoodcrest 3.0, 4x 1GB, 1x HDD, Pwr Mgmt On

0

20

40

60

80

100

120

140

160

1 201 401 601 801 1001 1201 1401 1601 1801 2001 2201 2401 2601 2801 3001 3201 3401 3601 3801 4001 4201

Thou

sand

s

ssj o

ps p

er s

econ

d

SPECpower_ssj2008 – Transactions Dual-Core Intel Xeon 3.0, 4x1GB, 1x HDD, Pwr Mgmt On~ Peak

Throughput

~40%~20%~100% Active

Idle

calibration levels

~80%

SPECpower_ssj2008 - Power and Processor %Woodcrest 3.0, 4x 1GB, 1x HDD, Pwr Mgmt On

04080

120160200240280320360

1 201 401 601 801 1001 1201 1401 1601 1801 2001 2201 2401 2601 2801 3001 3201 3401 3601 3801 4001 4201seconds

wat

ts

0102030405060708090100110

CPU

%

w atts PM On CPU % PM On

SPECpower_ssj2008 – Power and Processor % Dual-Core Intel Xeon 3.0, 4x1GB, 1x HDD, Pwr Mgmt On

buildslide

Page 5: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 5

Exit

SSJ@

10%

SSJ@

80%

SSJ@

20%

Activ

e id

le

SSJ_

2008

Initi

aliz

atio

n

SSJ_

2008

Rep

orte

r

Calibrations Graduated Load LevelsActive

Idle

SSJ@

90%

SSJ@

70%

SSJ@

100%

Cal

ibra

tion

n

Cal

ibra

tion

2

Cal

ibra

tion

1

Controlling Measurements

Each load level has a “measurement interval” of 240 seconds, plus, □ “inter-level”

(delay between load levels), □ ramp up

(pre-measurement) □ ramp down

(post-measurement)

Enables synchronization,power with performance,data captureProvides settle timeRequired for Consistent, Repeatable Measurements

pre-

mea

sure

men

t

load level

240seconds

30secs

oper

atio

ns p

er s

econ

d

time

post

-mea

sure

men

t

dela

y be

twee

n lo

ad le

vel

10secs

dela

y be

twee

n lo

ad le

vel

measurementinterval

“go” “stop”power measurement

30secs

10secs

not to scale

Page 6: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 6

SPECjbb2005 vs. SSJ_OPS@100%

SSJ_2008 derived from SPECjbb2005 - But different!Base code and transaction types are from SPECjbb2005

Substantive changes!

The two are not comparable:Notable Differences□ Different transaction mix□ Transaction scheduling and timing□ Modified throughput accounting □ Data collection via network – TCP/IP□ More logging increases disk I/O□ Plus others

Page 7: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 7

468∑ssj_ops / ∑power =01980Active Idle

11020622,64910.20%10%20721344,15719.90%20%30222166,87530.10%30%39022989,38840.20%40%465237110,22249.60%50%541245132,52559.60%60%616254156,34470.30%70%677261176,68479.50%80%746269200,86090.40%90%799276220,30699.10%100%

Average Power

(W)ssj opsActual

LoadTarget Load

Performance to Power

Ratio

PowerPerformance

SPECpower_ssj2008 - Metric Definition

(includes power at the active idle state)

SPECpower_ssj2008 Intel publication #017http://www.spec.org/power_ssj2008/results/res2007q4/power_ssj2008-20071129-00017.html

overall ssj_ops/watt = ∑ ssj_ops @ 11 pts / ∑ avg watts @11 pts

The Primary Metric for SPECpower_ssj2008:

Much more data in SPEC report !

Table from SPECpower_ssj2008 Full Disclosure Report

performance / power each level

average powereach level

ssj_ops@100%

ssj_ops each level

overall ssj_ops/watt

Page 8: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 8

Initial characterization of SPECpower_ssj2008

Page 9: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 9

Hardware and Software

SUT: Intel® “White Box”□ Dual and Quad Core Intel® Xeon® 2.0 & 3.0 GHz□ Supermicro* X7DB8/ Main Board, Super Micro 5000P (Blackford chipset)□ 4x 2GB FBDIMMs□ 1x 700W PSU□ 5U Tower Platform

Microsoft* Windows Server 2003 64 bit□ Power Options: Server Balanced Processor Power and Performance

JVM: BEA* JRockit* P27.4.0 64 bit□ JVM Command Line similar to published results

Sampling Rates: □ Power: 1 second (average from meter)

SPECpower_ssj2008 setup□ SSJ Director on SUT □ load levels 120 seconds

Page 10: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 10

Collecting OS Counters

Intel Written Daemon, “OSctrD.exe”□ Counters defined in ccs.props

Daemon runs on SUT □ Data to CCS via TCP/IP□ Can run on CCS□ CCS logs counters along with

watts, trans, etc.

Windows Only□ Linux port under consideration

“Integrated” Log□ Primary advantage

“Any” OS“Any” OS

ssj_2008instance(s)

ssj_2008director

PowerAnalyzer

Linux, Solaris*,Windows*

Linux, Solaris*,Windows*

AC PowerSource

Control & Collect

AC Power

CCSControl & Collection System

SUTSystem Under Test

PTD

TemperatureSensor

PTD

IntelDaemon

OSctrD

ccs-log.csv

“Any” OS“Any” OS

ssj_2008instance(s)ssj_2008instance(s)

ssj_2008director

PowerAnalyzer

Linux, Solaris*,Windows*

Linux, Solaris*,Windows*

AC PowerSource

Control & Collect

AC Power

CCSControl & Collection System

SUTSystem Under Test

PTDPTD

TemperatureSensor

PTDPTD

IntelDaemon

OSctrD

ccs-log.csv

. . .

. . . PF Temp Processor %............

% C1 timeampsWattsRTxactionsTime

Page 11: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 11

SSJ_2008 Memory Usage

Code footprint:□ ~1.5M (total of all methods JIT’ed and optimized)Data footprint:□ ~50MB per warehouse “database” size□ ~8KB of transient objects per transaction

JVMs□ 32 bit JVM - Max. 4GB heap□ 64 bit JVM - much larger heap (max. 264 Bytes)□ Multiple instances can/will increase memory footprint

Optimal memory size is throughput capacity dependent□ Platform and configuration specific

Example: Quad-Core Intel Xeon based Dual Processor system□ ~8GB optimal for SPECpower_ssj2008

All above specific to BEA JRockit JVM

Page 12: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 12

Transactions (SSJ OPS)

CPU % tracks load□ As expected on Intel Core 2 architecture

Other architectures will vary (SMT etc.)

Load level targets are % of SSJ_OPS@calibratedCPU utilization is no part of the benchmark

Transactions and Processor Utilization

0

40000

80000

120000

160000

200000

240000

152 352 552 752 952 1152 1352 1552 1752 1952 2152

seconds

ssj o

ps

0102030405060708090100110

Perc

ent

avg txs % CPU

Page 13: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 13

Power and Processor Utilization

Average SSJ OPS per level tracking as expected□ Throughput per sec showing desired variability within load level

Negative Exponential inter-arrival time batch scheduling

Power consumption varies with load

Power and Processor Utilization

0

50

100150

200

250

300350

400

450

152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds

wat

ts

0

20

40

60

80

100

120

Per

cent

watts % CPU

Page 14: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 14

All Three (SSJ OPS, CPU% and Power)

At all load levels including active idle:□ All three,

SSJ OPS, CPU % utilization and Average Watts

Three on One Chart – interesting capability

Transactions and Processor Utilization

0

40000

80000

120000

160000

200000

240000

152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds

ssj o

ps

0

50

100

150

200

250

300

350

400

450

Perc

ent

avg txs % CPU watts

Page 15: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 15

Memory Utilization

Java heap size remains same throughout the run □ With typical tuning; -Xmx==-Xms

Constant committed memory in use at all load levels□ including active idle – JVM(s) still active

Memory usage profile will change with heap settings – especially if “dynamic”

memory consumption

0102030405060708090

100110

150 350 550 750 950 1150 1350 1550 1750 1950 2150seconds

% p

roce

ssor

util

izat

ion

Total % Processor Time mem % Committed Bytes in Use

Page 16: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 16

Network I/O

~1.5 K Bytes/sec of network I/O at all load levels□ including active idle:

Network I/O from per sec request/response□ between Control & Collect (CCS) and SSJ_2008 Director

Network I/O

0

20

40

60

80

100

120

1 201 401 601 801 1001 1201 1401 1601 1801 2001seconds

% p

roce

ssor

util

izat

ion

0

500

1000

1500

2000

2500

3000

3500

4000

Byt

es p

er S

ec

CPU % NIC Bytes Total/sec

Page 17: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 17

Disk I/O

Disk I/O – Regular bursts of ~140K Byte writes, □ ~3.3K Bytes/sec average for all load levels□ Most disk writes related to SSJ_2008 logging

Disk reads average zeroPhysical Disk I/O

0

10

20

30

40

5060

70

80

90

100

110

152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds

% p

roce

ssor

util

izat

ion

100

20,100

40,100

60,100

80,100

100,100

120,100

140,100

160,100

180,100

Byte

s pe

r Sec

% Processor Time Disk Write Bytes/sec

Page 18: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 18

C1 state

% Time in C1 State – Inverse of CPU %□ C1/C1E Time contributes to power saving

Varies with architecture, OS and policies□ Intel EIST and C1E “enabled” in BIOS

C1 state

020000400006000080000

100000120000140000160000180000200000220000240000260000

152 352 552 752 952 1152 1352 1552 1752 1952 21520102030405060708090100110

Perc

ent

avg txs % CPU total % C1 time

Page 19: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 19

Basic system events

Interrupts: ~700 /sec at all load levelsContext switches ~800 /sec□ Below 50% declining to ~400 at active idle

Rates are OS and platform dependentMore Investigation Needed Here

Context Switches and Interrrupts

0102030405060708090

100110

152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds

% p

roce

ssor

util

izat

ion

0

200

400

600

800

1,000

1,200

1,400

per s

econ

d

% Processor Time Context Switches/sec Interrupts/sec

Page 20: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 20

Impact of JVM Optimizations

Experiment with JVM Options□ JAVAOPTIONS_SSJ=“ “ (None, default heap and optimization) □ JAVAOPTIONS_SSJ=“-Xms3000m -Xmx3000m -Xns2400m -XXaggressive -

XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=12k,preferred=1024k”

PerformanceLoss ~50%Power Less by0 to 3% less

Your Resultsdependent onJVM and options

Comparing JVM with and without 'options'

0

50,000

100,000

150,000

200,000

250,000

300,000

cal 1 cal 2 cal 3 cal T 100 90 80 70 60 50 40 30 20 10 0

load level

ssj_

ops

200

250

300

350

400

450

wat

ts

NoOpt avg-txs All avg-txs NoOpt Watts All Watts

Page 21: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 21

Processor scaling

Dual Core Intel Xeon Quad Core Intel Xeon: (2.0GHz / 4MBL2) (2.0GHz 2x4MBL2)

SSJ_OPS@100% increased by ~77% Similar power@100%Overall SSJ_OPS/Watt improved by ~73%73%Overall SSJ_OPS/Watt

1%Power@100%

77%SSJ_OPS@100%

%increase

Dual to Quad Core(Intel Xeon)

Comparing Dual Core and Quad Core Processors

0100

200

300

400

500

600700

800900

0 50,000 100,000 150,000 200,000 250,000ssj_ops

ssj_

ops/

wat

tDual Core Intel Xeon Quad Core Intel Xeon

Page 22: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 22

Frequency scaling

2.0 to 3.0 GHz Quad Core Intel Xeon (2x6MBL2):

Frequency increase of 50%□ SSJ_OPS@100% increases by ~24% □ Power@100% increased by ~10%□ Overall SSJ_OPS/Watt improved by ~16%

16%Overall SSJ_OPS/Watt

10%Power@100%

24%SSJ_OPS@100%

50%2GHz-->3GHz

% increaseQuad Core Intel Xeon

Comparing Frequency on Similar Processors

0100200300400500600700800900

1000

0 50,000 100,000 150,000 200,000 250,000 300,000ssj_ops

ssj_

ops/

wat

tQuad Core Intel Xeon 2.0 GHz Quad Core Intel Xeon 3.0 GHz

Page 23: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 23

Platform generation scaling

Quad Core Intel Xeon 3.0GHz versus Single Core Intel Xeon 3.6GHz:□ SSJ_OPS@100% is ~7.5x, Power@100% less by ~20% □ Overall SSJ_OPS/Watt is ~8.0x

702%-20%654%Improvement698269308,022Quad Core Intel Xeon E5450, 3.0GHz2008338291163,768Dual Core Intel Xeon 5160, 3.0GHz200687.433640,852Single Core Intel Xeon, 3.6GHz 2004

ProcessorAnnouncedOverall

ssj_ops/wattPower

watts@100%Performance

ssj_ops@100%

All data as of 30 Jan, 2008 from SPEC published results at: http://www.spec.org/power_ssj2008/results/power_ssj2008.html

Power and Performance Scaling Across Generations

0

200

400

600

800

1000

1200

1400

0 40,000 80,000 120,000 160,000 200,000 240,000 280,000 320,000 360,000

ssj_ops

ssj_

ops/

wat

t

Single Core Intel Xeon, 3.6GHz Dual Core Intel Xeon 5160, 3.0GHzQuad Core Intel Xeon E5450, 3.0GHz

Page 24: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 24

General Observations

CPU Utilization follows the load line (architecture dependent)

% Time in C1 State – Inverse of CPU %□ C1 Transitions per second highest at idle

Memory % Committed – constant across load lineDisk I/O – Regular bursts of ~140K byte writes, □ ~3.3K bytes/sec for all load levels

Network I/O - ~1.5K Bytes/sec, ~constant across load lineBasic system events require more investigationBenchmark metric and other data do effectively show scaling with frequency, cores and across platform generations

Page 25: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

February 7, 2008 SPEC Workshop January 2008 slide 25

Summary

Results are specific to the platform and OS measured, etcSPEC FDR contains unprecedented amount of dataSome system resources track graduated loadsBenchmark metric and load level data fairly reflect configuration and OS settings changesNext Steps□ We are just getting started.□ First look, more refinements required

More measurements planned for in-depth characterization

Page 26: SPECpower ssj2008** CharacterizationFebruary 7, 2008 SPEC Workshop January 2008 slide 2 Agenda SPECpower_ssj2008 quick overview SPECpower_ssj2008 initial characterization System resources

END