specpower ssj2008** characterizationfebruary 7, 2008 spec workshop january 2008 slide 2 agenda...
TRANSCRIPT
SPECpower_ssj2008** CharacterizationAnil Kumar, Larry Gray and Harry Li
Intel® Corporation
* Other names and brands may be claimed as the property of others** SPEC and the benchmark names are trademarks of the Standard Performance Evaluation CorporationPerformance Data as of 30 January 2008.
SPEC Workshop – January 27, 2008
February 7, 2008 SPEC Workshop January 2008 slide 2
Agenda
SPECpower_ssj2008 quick overview
SPECpower_ssj2008 initial characterization□ System resources utilization□ Impact of JVM Optimizations □ Frequency scaling□ Processor scaling□ Platform generation scaling
General observations
Summary
February 7, 2008 SPEC Workshop January 2008 slide 3
SPECpower_ssj2008
Quick overview
February 7, 2008 SPEC Workshop January 2008 slide 4
9.9%20.0%30.6%40.1%49.9%60.2%69.7%79.6%90.1%99.8%Actual Average Per Cent of Calibrated Peak Throughput
Seeks Peak Throughput, Runs and Reports a “Load Line”
SPECpower* - A “Graduated” Workload
First: A Calibration Phase: Run to Peak Transaction Throughput□ # warehouses or threads = # cores, scheduling is “ungated”
Next: Load Levels: Gradations Based on Calibrated ThroughputAverage of last two calibration levels = peak calibrated throughputExample Below is x10 or 10% increments – the benchmark
SPECpower_ssj2008 - TransactionsWoodcrest 3.0, 4x 1GB, 1x HDD, Pwr Mgmt On
0
20
40
60
80
100
120
140
160
1 201 401 601 801 1001 1201 1401 1601 1801 2001 2201 2401 2601 2801 3001 3201 3401 3601 3801 4001 4201
Thou
sand
s
ssj o
ps p
er s
econ
d
SPECpower_ssj2008 – Transactions Dual-Core Intel Xeon 3.0, 4x1GB, 1x HDD, Pwr Mgmt On~ Peak
Throughput
~40%~20%~100% Active
Idle
calibration levels
~80%
SPECpower_ssj2008 - Power and Processor %Woodcrest 3.0, 4x 1GB, 1x HDD, Pwr Mgmt On
04080
120160200240280320360
1 201 401 601 801 1001 1201 1401 1601 1801 2001 2201 2401 2601 2801 3001 3201 3401 3601 3801 4001 4201seconds
wat
ts
0102030405060708090100110
CPU
%
w atts PM On CPU % PM On
SPECpower_ssj2008 – Power and Processor % Dual-Core Intel Xeon 3.0, 4x1GB, 1x HDD, Pwr Mgmt On
buildslide
February 7, 2008 SPEC Workshop January 2008 slide 5
Exit
SSJ@
10%
SSJ@
80%
SSJ@
20%
Activ
e id
le
SSJ_
2008
Initi
aliz
atio
n
SSJ_
2008
Rep
orte
r
Calibrations Graduated Load LevelsActive
Idle
SSJ@
90%
SSJ@
70%
SSJ@
100%
Cal
ibra
tion
n
Cal
ibra
tion
2
Cal
ibra
tion
1
Controlling Measurements
Each load level has a “measurement interval” of 240 seconds, plus, □ “inter-level”
(delay between load levels), □ ramp up
(pre-measurement) □ ramp down
(post-measurement)
Enables synchronization,power with performance,data captureProvides settle timeRequired for Consistent, Repeatable Measurements
pre-
mea
sure
men
t
load level
240seconds
30secs
oper
atio
ns p
er s
econ
d
time
post
-mea
sure
men
t
dela
y be
twee
n lo
ad le
vel
10secs
dela
y be
twee
n lo
ad le
vel
measurementinterval
“go” “stop”power measurement
30secs
10secs
not to scale
February 7, 2008 SPEC Workshop January 2008 slide 6
SPECjbb2005 vs. SSJ_OPS@100%
SSJ_2008 derived from SPECjbb2005 - But different!Base code and transaction types are from SPECjbb2005
Substantive changes!
The two are not comparable:Notable Differences□ Different transaction mix□ Transaction scheduling and timing□ Modified throughput accounting □ Data collection via network – TCP/IP□ More logging increases disk I/O□ Plus others
February 7, 2008 SPEC Workshop January 2008 slide 7
468∑ssj_ops / ∑power =01980Active Idle
11020622,64910.20%10%20721344,15719.90%20%30222166,87530.10%30%39022989,38840.20%40%465237110,22249.60%50%541245132,52559.60%60%616254156,34470.30%70%677261176,68479.50%80%746269200,86090.40%90%799276220,30699.10%100%
Average Power
(W)ssj opsActual
LoadTarget Load
Performance to Power
Ratio
PowerPerformance
SPECpower_ssj2008 - Metric Definition
(includes power at the active idle state)
SPECpower_ssj2008 Intel publication #017http://www.spec.org/power_ssj2008/results/res2007q4/power_ssj2008-20071129-00017.html
overall ssj_ops/watt = ∑ ssj_ops @ 11 pts / ∑ avg watts @11 pts
The Primary Metric for SPECpower_ssj2008:
Much more data in SPEC report !
Table from SPECpower_ssj2008 Full Disclosure Report
performance / power each level
average powereach level
ssj_ops@100%
ssj_ops each level
overall ssj_ops/watt
February 7, 2008 SPEC Workshop January 2008 slide 8
Initial characterization of SPECpower_ssj2008
February 7, 2008 SPEC Workshop January 2008 slide 9
Hardware and Software
SUT: Intel® “White Box”□ Dual and Quad Core Intel® Xeon® 2.0 & 3.0 GHz□ Supermicro* X7DB8/ Main Board, Super Micro 5000P (Blackford chipset)□ 4x 2GB FBDIMMs□ 1x 700W PSU□ 5U Tower Platform
Microsoft* Windows Server 2003 64 bit□ Power Options: Server Balanced Processor Power and Performance
JVM: BEA* JRockit* P27.4.0 64 bit□ JVM Command Line similar to published results
Sampling Rates: □ Power: 1 second (average from meter)
SPECpower_ssj2008 setup□ SSJ Director on SUT □ load levels 120 seconds
February 7, 2008 SPEC Workshop January 2008 slide 10
Collecting OS Counters
Intel Written Daemon, “OSctrD.exe”□ Counters defined in ccs.props
Daemon runs on SUT □ Data to CCS via TCP/IP□ Can run on CCS□ CCS logs counters along with
watts, trans, etc.
Windows Only□ Linux port under consideration
“Integrated” Log□ Primary advantage
“Any” OS“Any” OS
ssj_2008instance(s)
ssj_2008director
PowerAnalyzer
Linux, Solaris*,Windows*
Linux, Solaris*,Windows*
AC PowerSource
Control & Collect
AC Power
CCSControl & Collection System
SUTSystem Under Test
PTD
TemperatureSensor
PTD
IntelDaemon
OSctrD
ccs-log.csv
“Any” OS“Any” OS
ssj_2008instance(s)ssj_2008instance(s)
ssj_2008director
PowerAnalyzer
Linux, Solaris*,Windows*
Linux, Solaris*,Windows*
AC PowerSource
Control & Collect
AC Power
CCSControl & Collection System
SUTSystem Under Test
PTDPTD
TemperatureSensor
PTDPTD
IntelDaemon
OSctrD
ccs-log.csv
. . .
. . . PF Temp Processor %............
% C1 timeampsWattsRTxactionsTime
February 7, 2008 SPEC Workshop January 2008 slide 11
SSJ_2008 Memory Usage
Code footprint:□ ~1.5M (total of all methods JIT’ed and optimized)Data footprint:□ ~50MB per warehouse “database” size□ ~8KB of transient objects per transaction
JVMs□ 32 bit JVM - Max. 4GB heap□ 64 bit JVM - much larger heap (max. 264 Bytes)□ Multiple instances can/will increase memory footprint
Optimal memory size is throughput capacity dependent□ Platform and configuration specific
Example: Quad-Core Intel Xeon based Dual Processor system□ ~8GB optimal for SPECpower_ssj2008
All above specific to BEA JRockit JVM
February 7, 2008 SPEC Workshop January 2008 slide 12
Transactions (SSJ OPS)
CPU % tracks load□ As expected on Intel Core 2 architecture
Other architectures will vary (SMT etc.)
Load level targets are % of SSJ_OPS@calibratedCPU utilization is no part of the benchmark
Transactions and Processor Utilization
0
40000
80000
120000
160000
200000
240000
152 352 552 752 952 1152 1352 1552 1752 1952 2152
seconds
ssj o
ps
0102030405060708090100110
Perc
ent
avg txs % CPU
February 7, 2008 SPEC Workshop January 2008 slide 13
Power and Processor Utilization
Average SSJ OPS per level tracking as expected□ Throughput per sec showing desired variability within load level
Negative Exponential inter-arrival time batch scheduling
Power consumption varies with load
Power and Processor Utilization
0
50
100150
200
250
300350
400
450
152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds
wat
ts
0
20
40
60
80
100
120
Per
cent
watts % CPU
February 7, 2008 SPEC Workshop January 2008 slide 14
All Three (SSJ OPS, CPU% and Power)
At all load levels including active idle:□ All three,
SSJ OPS, CPU % utilization and Average Watts
Three on One Chart – interesting capability
Transactions and Processor Utilization
0
40000
80000
120000
160000
200000
240000
152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds
ssj o
ps
0
50
100
150
200
250
300
350
400
450
Perc
ent
avg txs % CPU watts
February 7, 2008 SPEC Workshop January 2008 slide 15
Memory Utilization
Java heap size remains same throughout the run □ With typical tuning; -Xmx==-Xms
Constant committed memory in use at all load levels□ including active idle – JVM(s) still active
Memory usage profile will change with heap settings – especially if “dynamic”
memory consumption
0102030405060708090
100110
150 350 550 750 950 1150 1350 1550 1750 1950 2150seconds
% p
roce
ssor
util
izat
ion
Total % Processor Time mem % Committed Bytes in Use
February 7, 2008 SPEC Workshop January 2008 slide 16
Network I/O
~1.5 K Bytes/sec of network I/O at all load levels□ including active idle:
Network I/O from per sec request/response□ between Control & Collect (CCS) and SSJ_2008 Director
Network I/O
0
20
40
60
80
100
120
1 201 401 601 801 1001 1201 1401 1601 1801 2001seconds
% p
roce
ssor
util
izat
ion
0
500
1000
1500
2000
2500
3000
3500
4000
Byt
es p
er S
ec
CPU % NIC Bytes Total/sec
February 7, 2008 SPEC Workshop January 2008 slide 17
Disk I/O
Disk I/O – Regular bursts of ~140K Byte writes, □ ~3.3K Bytes/sec average for all load levels□ Most disk writes related to SSJ_2008 logging
Disk reads average zeroPhysical Disk I/O
0
10
20
30
40
5060
70
80
90
100
110
152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds
% p
roce
ssor
util
izat
ion
100
20,100
40,100
60,100
80,100
100,100
120,100
140,100
160,100
180,100
Byte
s pe
r Sec
% Processor Time Disk Write Bytes/sec
February 7, 2008 SPEC Workshop January 2008 slide 18
C1 state
% Time in C1 State – Inverse of CPU %□ C1/C1E Time contributes to power saving
Varies with architecture, OS and policies□ Intel EIST and C1E “enabled” in BIOS
C1 state
020000400006000080000
100000120000140000160000180000200000220000240000260000
152 352 552 752 952 1152 1352 1552 1752 1952 21520102030405060708090100110
Perc
ent
avg txs % CPU total % C1 time
February 7, 2008 SPEC Workshop January 2008 slide 19
Basic system events
Interrupts: ~700 /sec at all load levelsContext switches ~800 /sec□ Below 50% declining to ~400 at active idle
Rates are OS and platform dependentMore Investigation Needed Here
Context Switches and Interrrupts
0102030405060708090
100110
152 352 552 752 952 1152 1352 1552 1752 1952 2152seconds
% p
roce
ssor
util
izat
ion
0
200
400
600
800
1,000
1,200
1,400
per s
econ
d
% Processor Time Context Switches/sec Interrupts/sec
February 7, 2008 SPEC Workshop January 2008 slide 20
Impact of JVM Optimizations
Experiment with JVM Options□ JAVAOPTIONS_SSJ=“ “ (None, default heap and optimization) □ JAVAOPTIONS_SSJ=“-Xms3000m -Xmx3000m -Xns2400m -XXaggressive -
XXlargePages -XXthroughputCompaction -XXcallprofiling -XXlazyUnlocking -Xgc:genpar -XXtlasize:min=12k,preferred=1024k”
PerformanceLoss ~50%Power Less by0 to 3% less
Your Resultsdependent onJVM and options
Comparing JVM with and without 'options'
0
50,000
100,000
150,000
200,000
250,000
300,000
cal 1 cal 2 cal 3 cal T 100 90 80 70 60 50 40 30 20 10 0
load level
ssj_
ops
200
250
300
350
400
450
wat
ts
NoOpt avg-txs All avg-txs NoOpt Watts All Watts
February 7, 2008 SPEC Workshop January 2008 slide 21
Processor scaling
Dual Core Intel Xeon Quad Core Intel Xeon: (2.0GHz / 4MBL2) (2.0GHz 2x4MBL2)
SSJ_OPS@100% increased by ~77% Similar power@100%Overall SSJ_OPS/Watt improved by ~73%73%Overall SSJ_OPS/Watt
1%Power@100%
77%SSJ_OPS@100%
%increase
Dual to Quad Core(Intel Xeon)
Comparing Dual Core and Quad Core Processors
0100
200
300
400
500
600700
800900
0 50,000 100,000 150,000 200,000 250,000ssj_ops
ssj_
ops/
wat
tDual Core Intel Xeon Quad Core Intel Xeon
February 7, 2008 SPEC Workshop January 2008 slide 22
Frequency scaling
2.0 to 3.0 GHz Quad Core Intel Xeon (2x6MBL2):
Frequency increase of 50%□ SSJ_OPS@100% increases by ~24% □ Power@100% increased by ~10%□ Overall SSJ_OPS/Watt improved by ~16%
16%Overall SSJ_OPS/Watt
10%Power@100%
24%SSJ_OPS@100%
50%2GHz-->3GHz
% increaseQuad Core Intel Xeon
Comparing Frequency on Similar Processors
0100200300400500600700800900
1000
0 50,000 100,000 150,000 200,000 250,000 300,000ssj_ops
ssj_
ops/
wat
tQuad Core Intel Xeon 2.0 GHz Quad Core Intel Xeon 3.0 GHz
February 7, 2008 SPEC Workshop January 2008 slide 23
Platform generation scaling
Quad Core Intel Xeon 3.0GHz versus Single Core Intel Xeon 3.6GHz:□ SSJ_OPS@100% is ~7.5x, Power@100% less by ~20% □ Overall SSJ_OPS/Watt is ~8.0x
702%-20%654%Improvement698269308,022Quad Core Intel Xeon E5450, 3.0GHz2008338291163,768Dual Core Intel Xeon 5160, 3.0GHz200687.433640,852Single Core Intel Xeon, 3.6GHz 2004
ProcessorAnnouncedOverall
ssj_ops/wattPower
watts@100%Performance
ssj_ops@100%
All data as of 30 Jan, 2008 from SPEC published results at: http://www.spec.org/power_ssj2008/results/power_ssj2008.html
Power and Performance Scaling Across Generations
0
200
400
600
800
1000
1200
1400
0 40,000 80,000 120,000 160,000 200,000 240,000 280,000 320,000 360,000
ssj_ops
ssj_
ops/
wat
t
Single Core Intel Xeon, 3.6GHz Dual Core Intel Xeon 5160, 3.0GHzQuad Core Intel Xeon E5450, 3.0GHz
February 7, 2008 SPEC Workshop January 2008 slide 24
General Observations
CPU Utilization follows the load line (architecture dependent)
% Time in C1 State – Inverse of CPU %□ C1 Transitions per second highest at idle
Memory % Committed – constant across load lineDisk I/O – Regular bursts of ~140K byte writes, □ ~3.3K bytes/sec for all load levels
Network I/O - ~1.5K Bytes/sec, ~constant across load lineBasic system events require more investigationBenchmark metric and other data do effectively show scaling with frequency, cores and across platform generations
February 7, 2008 SPEC Workshop January 2008 slide 25
Summary
Results are specific to the platform and OS measured, etcSPEC FDR contains unprecedented amount of dataSome system resources track graduated loadsBenchmark metric and load level data fairly reflect configuration and OS settings changesNext Steps□ We are just getting started.□ First look, more refinements required
More measurements planned for in-depth characterization
END