Runtime Processor Power Monitoring VICE VERSA 07/18/2003 Talk Canturk ISCI


Page 1: Runtime Processor Power Monitoring

Runtime Processor Power Monitoring

VICE VERSA 07/18/2003 Talk

Canturk ISCI

Page 2: Runtime Processor Power Monitoring

INTRODUCTION
Runtime processor power
  Measurement with HW
  Estimation with performance counters
  CPU unit power breakdowns
  Runtime verification
Processor thermal modeling
Power phase behavior of programs
Mapping between power behavior and program structure

Page 3: Runtime Processor Power Monitoring

THE BIG PICTURE
[Diagram blocks: Performance Monitoring, Real Power Measurement, Power Modeling, Thermal Modeling, Power Phases, Program Profiling, Program Structure]
Bottom line…
To estimate component power & temperature breakdowns for P4 at runtime…
To analyze how power phase behavior relates to program structure

Page 4: Runtime Processor Power Monitoring

Agenda
Performance Monitoring
  P4 Performance Counters
  Performance Reader LKM
Real Power Measurement
  P4 Power Measurement Setup
  Examples
Power Modeling
  P4 Power Model
  Model + Measurement Sync Setup
  Verification
Thermal Modeling
  Brief Thermal Model Intro
  Ppro Thermal Model Results

Page 5: Runtime Processor Power Monitoring

Bonus Material
Power Phase Behavior
  Similarity Based on Power Vectors
  Identifying similar program regions
Profiling Execution Flow
  Sampling process’ execution
  “PCsampler” LKM
Program Structure
  Execution vs. Code space
  Power Phases → Exec. Phases <OR VICE VERSA>

Page 6: Runtime Processor Power Monitoring

Performance Monitoring
Related Work
Performance Monitoring
  P4 Performance Counters
  Performance Reader LKM
Real Power Measurement
  P4 Power Measurement Setup
  Examples
Power Modeling
  P4 Power Model
  Model + Measurement Sync Setup
  Verification
Thermal Modeling
  Refined Thermal Model
  Ex: Ppro Thermal Model

Page 7: Runtime Processor Power Monitoring

Live CPU Performance Monitoring with Hardware Counters
Most CPUs have hardware performance counters
P4 Performance Monitoring HW:
  18 Event Counters
  18 Counter Configuration Control Registers: configure how to count
  45 Event Selection Control Registers: configure what to count
  Additional Control Registers

Page 8: Runtime Processor Power Monitoring

Our Event-Counter: Performance Reader
Performance Reader implemented as a Linux Loadable Kernel Module
Implements 6 syscalls:
  select_events()
  reset_event_counter()
  start_event_counter()
  stop_event_counter()
  get_event_counts()
  set_replay_MSRs()
User Level Interface:
  Defines the events
  Starts counters
  Stops counters
  Reads counters & TSC
Event Types: 59 event classes, 100s of events to count

Page 9: Runtime Processor Power Monitoring

Performance Reader: Example Validation
L1_Dcache benchmark
Controls cache hit behavior
Validated against measured cache events
Vary hit rate from 0-100%

Page 10: Runtime Processor Power Monitoring

Processor Power Measurement

Page 11: Runtime Processor Power Monitoring

P4 Power Measurement Setup
Clamp ammeter on 12V lines on measured CPU
1mV/Adc conversion
DMM reading clamp voltages
Voltage readings via RS232 to logging machine
Serial Reader (PowerMeter) (PowerPlotter)
Convert to Power vs. time window

Page 12: Runtime Processor Power Monitoring

PowerPlotter: Example
[Figure: PowerPlotter trace with annotated regions]
Annotated regions:
  Initialization
  Benchmark Execution
  “Fast”
  “Branch exercise” (Taken rate: 1)
  “High-Low”
  “L1Dcache”, Array Size 1/100 of L1
  “L1Dcache”, Array Size x25 of L1 ~ L2
  “L1Dcache”, Array Size x4 of L2

Page 13: Runtime Processor Power Monitoring

SPEC Power Examples
Different programs show very different power characteristics
Timescale of interest can be huge => inaccessible via simulation
[Figure: Spec GCC (O3) with specrun -a run; power [W], 0 to 80, vs. time (s), 0 to 200]
[Figure: Spec VPR (O3) with specrun -a run; power [W], 0 to 60, vs. time (s), 0 to 500]

Page 14: Runtime Processor Power Monitoring

Processor Power Modeling

Page 15: Runtime Processor Power Monitoring

P4 POWER MODEL
Define Components: define the components (e.g. L1 cache, BPU, Regs, etc.) whose powers we’ll model, from the annotated layout
Define Events: determine the combination of P4 events that best represents component accesses
Performance Monitoring: gather counter info with minimal power overhead and program interruption
Power Modeling: convert counter info into component power breakdowns
Real Power Measurement: verify total power against measured processor power

Page 16: Runtime Processor Power Monitoring

Defining Components

Page 17: Runtime Processor Power Monitoring

Defining Events → Access Rates
We determined 24 events to approximate access rates for 22 components
Used several heuristics to represent each access rate
Examples:
Used 15 counters & 4 rotations to collect all event data

Page 18: Runtime Processor Power Monitoring

Access Rates → Component Powers
“Performance counter based access rate estimations are used as a proxy for max component power weighting, together with microarchitectural details, in order to estimate processor sub-unit powers”
EX: Trace cache delivers 3 uops/cycle in deliver mode and 1 uop/cycle in build mode:
  Power(TC) = [Access-Rate(TC)/3 + Access-Rate(ID)] x MaxPower(TC) + Non-gated TC CLK power
Total power is computed as the sum of all 22 component powers + measured idle power (8W):
  TotalPower = Power(C1) + Power(C2) + ... + Power(C22) + IdlePower
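A minimal sketch of the two formulas on this slide, as one might compute them offline. The function and parameter names are hypothetical, the numbers in the usage note are made up for illustration, and only the 8 W idle power comes from the slide.

```python
# Sketch of the slide's trace-cache power formula and the total-power sum.
# All names are illustrative; only IDLE_POWER_W = 8 W is taken from the slide.

IDLE_POWER_W = 8.0  # measured idle power quoted on the slide


def trace_cache_power(access_rate_tc, access_rate_id, max_power_tc, nongated_clk_power):
    """Power(TC) = [AccessRate(TC)/3 + AccessRate(ID)] x MaxPower(TC) + non-gated clock power.

    The division by 3 reflects the 3 uops/cycle delivered in deliver mode;
    build mode (1 uop/cycle) is represented by the decode access rate.
    """
    scaled_rate = access_rate_tc / 3.0 + access_rate_id
    return scaled_rate * max_power_tc + nongated_clk_power


def total_power(component_powers, idle_power=IDLE_POWER_W):
    """Total power = sum of all modeled component powers + measured idle power."""
    return sum(component_powers) + idle_power
```

For example, with assumed inputs `trace_cache_power(1.5, 0.25, 4.0, 0.5)` the scaled rate is 0.75 and the result is 3.5 W; in the talk's model the list passed to `total_power` would hold all 22 component powers.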

Page 19: Runtime Processor Power Monitoring

Experiment Setup – Recall:
Clamp ammeter on 12V lines on measured CPU
1mV/Adc conversion
DMM reading clamp voltages
Voltage readings via RS232 to logging machine
Serial Reader (PowerMeter) (PowerPlotter)
Convert to Power vs. time window

Page 20: Runtime Processor Power Monitoring

Experiment Setup
1mV/Adc conversion
Voltage readings via RS232 to logging machine

Page 21: Runtime Processor Power Monitoring

Experiment Setup
[Diagram labels: POWERCLIENT, POWERSERVER]
1mV/Adc conversion
Voltage readings via RS232 to logging machine
Component access rates over ethernet
Convert voltage to measured power
Convert access rates to modeled powers
Sync together in time window

Page 22: Runtime Processor Power Monitoring

Tuning Benchmarks
[Figure: Measured vs. Modeled power for the tuning benchmarks]
Benchmarks shown:
  “Fast”
  “Branch exercise” (Taken rate: 1)
  “High-Low”
  “L1Dcache” (Hit Rate: 0.1)

Page 23: Runtime Processor Power Monitoring

Component Breakdowns
Component Breakdowns for “branch_exercise”
Colors for 4 CPU subsystems
[Figure labels: Issue - Retire, Execution]

Page 24: Runtime Processor Power Monitoring

Benchmark Power Breakdowns
High issue, exec. & branch power
High L2 Cache Power
High L1 Cache Power
High Bus Power

Page 25: Runtime Processor Power Monitoring

SPEC2000 Results
VPR Elaboration (Integer benchmark):
  2 runs: 1st Placement, 2nd Route
  1st run has much more stable power, 2nd is more variable
  Placement has a higher miss rate than route <L2 pwr>
  Significant FPE power due to x87_SIMD_moves
Twolf Elaboration (Integer benchmark):
  Several loop computations traversing memory <High Memory Power>
  Although total power is ~const., component powers have slight gradients
Equake Elaboration (FP benchmark):
  Initialization and computation phases
  FP intensive mesh computation phase
  Initialization with highly complex IA32 instructions

Page 26: Runtime Processor Power Monitoring

Average SPEC Total Powers
1st set: Overall, 2nd set: Non-idle power
Average difference between measurement and estimation: 3W
Worst case: Equake (5.8W)

Page 27: Runtime Processor Power Monitoring

Stdev of SPEC Total Powers
1st set: Overall, 2nd set: Non-idle power
Average difference: 2W
Worst case: Vortex (3.5W)

Page 28: Runtime Processor Power Monitoring

Desktop Applications
We aim to track low power utilizations as well.
Desktop applications are usually low power with intermittent power bursts
3 applications, with common operations such as open/close application, web, streaming media, text editing, save to disk, statistical computations.

Page 29: Runtime Processor Power Monitoring

Thermal Model

Page 30: Runtime Processor Power Monitoring

THERMAL MODELING: A Basic Model
Based on lumped R-C model from packaging
Built upon power modeling
Inputs:
  Sampled Component Powers
  Respective component areas
  Physical processor Parameters
  Packaging
  Heat Transfer
[Figure: DIE with blocks Blk i, Blk j, Blk k, Blk l; each block x is an R-C node with temperature Tb,x, thermal resistance Rth,x, thermal capacitance Cth,x and power input Px; the HEATSINK node has Th, Rth,h, Cth,h and power input Pi+Pj+Pk+Pl]

Page 31: Runtime Processor Power Monitoring

Physical Structure vs. Thermal Model
Physical layers: Ambient Airflow (Ambient Temperature), Heatsink, Thermal Grease, Heat Spreader, Package, Die
Thermal model symbols:
  Ambient: TA
  Heatsink: Th, Rh, Ch, R_hXA
  Heat Spreader: Tspr, Rspr, Cspr, R_grXspr
  Package: Tp,i, Rp,i, Cp,i
  Die: Tdie,i, Rdie,i, Cdie,i, Pi
  Ptotal

Page 32: Runtime Processor Power Monitoring

Simulation Outputs
Thermal nodes updated every t ~ 20ms
Component temperatures build up to ~350K in ~5hrs
Theatsink moves very slowly, as expected
[Figure: Pentium Pro Thermal Simulation; bar chart of Temperature (C), 0 to 80, At startup vs. After 5 Hours, for Ambient, Heatsink, Heat Spreader, Decode, Issue, Reorder, DMem, IMem, FUs, Other]
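As a rough illustration of how a lumped R-C thermal node can be updated every ~20 ms, here is a sketch using a forward-Euler step of C·dT/dt = P − (T − T_heatsink)/R. It is an assumption-laden simplification, not the talk's simulator: the heatsink temperature is held fixed, only one block is modeled, and all names and parameter values are hypothetical.

```python
# Forward-Euler update of a single lumped R-C thermal node.
# Assumptions (not from the slides): fixed heatsink temperature, one block,
# illustrative R, C and power values.

def step_block_temperature(t_block, t_heatsink, power_w, r_th, c_th, dt):
    """Advance one block temperature by dt seconds.

    Heat balance: C_th * dT/dt = P - (T_block - T_heatsink) / R_th
    """
    heat_out_w = (t_block - t_heatsink) / r_th
    return t_block + dt * (power_w - heat_out_w) / c_th


def simulate_block(t_start, t_heatsink, power_w, r_th, c_th, dt, steps):
    """Apply the update repeatedly with constant power and return the final temperature."""
    t_block = t_start
    for _ in range(steps):
        t_block = step_block_temperature(t_block, t_heatsink, power_w, r_th, c_th, dt)
    return t_block
```

With constant power the node settles toward T_heatsink + P·R; the step size must stay well below R·C for the explicit update to remain stable, which is one reason a ~20 ms update interval is comfortable for second-scale thermal constants.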

Page 33: Runtime Processor Power Monitoring

Power Phase Behavior

Page 34: Runtime Processor Power Monitoring

Power Vectors for Similarity
Similar to basic block vectors
Use component power vector samples to represent program phases
Consider Manhattan distance between 2 vectors as the ‘measure of dissimilarity’ between the corresponding execution points
Construct a similarity matrix to represent similarity among all pairs of execution points
Each entry in the similarity matrix:
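The slide's formula for a matrix entry is not captured in this transcript. A minimal sketch, assuming each entry simply stores the raw Manhattan distance between two component power vectors (no normalization), could look like this; the function names are hypothetical.

```python
# Similarity matrix from power vectors, assuming entry (i, j) is the raw
# Manhattan (L1) distance between power vector i and power vector j.
# Smaller values mean more similar execution points.

def manhattan_distance(u, v):
    """Sum of absolute per-component power differences."""
    if len(u) != len(v):
        raise ValueError("power vectors must have the same number of components")
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity_matrix(power_vectors):
    """Return an N x N list of lists of pairwise Manhattan distances."""
    n = len(power_vectors)
    return [
        [manhattan_distance(power_vectors[i], power_vectors[j]) for j in range(n)]
        for i in range(n)
    ]
```

The matrix is symmetric with a zero diagonal, so in practice only the upper triangle needs to be computed or plotted.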

Page 35: Runtime Processor Power Monitoring

Gcc & Gzip Similarity Matrices
[Figure: Gcc Component Power Breakdowns; Power [Watts], 0 to 6, vs. Time (s), 0 to 250; series: L1 cache, TraceCache, RETIRE]
[Figure: Gcc Total Power; Power [Watts], 0 to 60, vs. time, 0 to 250; series: MEASURED POWER, COUNTER ESTIMATED POWER]
[Figure: Gzip Total Power; Power [Watts], 0 to 60, vs. Time (s), 44 to 440; series: MEASURED POWER, COUNTER ESTIMATED POWER]
[Figure: Gzip Component Power Breakdowns; Power [Watts], 0 to 6, vs. Time (s), 44 to 444; series: L1 cache, INT Exec, RETIRE]
Gcc Elaboration:
  Highly variable power
  Almost identical power behavior at 30, 50, 180s.
  Although 88s, 110s, 140s, 210s and 230s show similar total power, 88s, 210s and 230s share higher similarity.
Gzip Elaboration:
  Much more regular power behavior
  Spurious similarities such as 100-150s and 200-280s are distinguished by the similarity analysis

Page 36: Runtime Processor Power Monitoring

Generating representative vectors
Gzip has ~1000 power vectors
Cluster vectors based on similarity
Could we represent power behavior with reasonable accuracy, with a small number of ‘signature’ vectors?
Ex: 26 representative vectors with “Thresholding Algorithm”:
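The slides do not spell out the thresholding algorithm, so the following is only one plausible reading, stated as an assumption: a greedy pass in which a vector joins the first existing representative within a Manhattan-distance threshold and otherwise becomes a new representative. The names and the threshold value are hypothetical.

```python
# Greedy threshold clustering of power vectors (an assumed interpretation,
# not necessarily the talk's "Thresholding Algorithm").

def manhattan_distance(u, v):
    """Sum of absolute per-component differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def representative_vectors(power_vectors, threshold):
    """Return (representatives, labels).

    labels[i] is the index of the representative assigned to power_vectors[i].
    A vector becomes a new representative when no existing one lies within
    `threshold` of it.
    """
    representatives = []
    labels = []
    for vector in power_vectors:
        for index, rep in enumerate(representatives):
            if manhattan_distance(vector, rep) <= threshold:
                labels.append(index)
                break
        else:
            representatives.append(list(vector))
            labels.append(len(representatives) - 1)
    return representatives, labels
```

Raising the threshold yields fewer signature vectors at the cost of a coarser approximation of the power behavior, which is the trade-off the slide's question points at.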

Page 37: Runtime Processor Power Monitoring

Program Execution Profile

Page 38: Runtime Processor Power Monitoring

Program Execution Profile
Sample program flow simultaneously with power
  Our LKM implementation: “PCsampler”
Generate code space similarity in parallel with power space similarity
Relative comparisons for:
  Complexity
  Accuracy
  Applicability, etc.

Page 39: Runtime Processor Power Monitoring

39

EOP

Page 40: Runtime Processor Power Monitoring

Where is all this useful?
Measurement/Modeling for microarchitectural details
Compiler level power
SW power profiling
Power Aware OS
Dynamic power/thermal/µarch. configuration
Dynamic memory allocation, process cruise control, etc.
Demonstrates modern processor power
Need for speed! Long timescales, thermal constants
Identify program phases w/o knowledge of code, no basic block info whatsoever
Program signatures for detailed simulation, say: “power points rather than simpoints”

Page 41: Runtime Processor Power Monitoring

41

Page 42: Runtime Processor Power Monitoring

Counter Access Heuristics
1) BUS CONTROL:
No 3rd Level cache => BSQ allocations ~ IOQ allocations
Metric 1: Bus accesses from all agents
  Event: IOQ_allocation
    Counts various types of bus transactions
    Should account for BSQ as well
    Access based rather than duration
  Mask: Default req. type, all read (128B) and write (64B) types, include OWN, OTHER and PREFETCH
Metric 2: Bus Utilization (the % of time the bus is utilized)
  Event: FSB_data_activity
    Counts DataReaDY and DataBuSY events on the bus
  Mask: Count when processor or other agents drive/read/reserve the bus
  Expression: FSB_data_activity x BusRatio / Clocks Elapsed
    To account for clock ratios
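A small sketch of the bus-utilization expression above. The function and argument names are hypothetical; the counter value, bus ratio and elapsed-clock count in the usage note are made-up illustration values, not measurements from the talk.

```python
# Bus utilization = FSB_data_activity x BusRatio / ClocksElapsed.
# The bus ratio converts bus-clock event counts into core-clock units so the
# result is a fraction of elapsed core clocks.

def bus_utilization(fsb_data_activity, bus_ratio, clocks_elapsed):
    """Fraction of elapsed core clocks during which the bus was active."""
    if clocks_elapsed <= 0:
        raise ValueError("clocks_elapsed must be positive")
    return fsb_data_activity * bus_ratio / clocks_elapsed
```

For instance, 250 counted bus events with an assumed ratio of 14 core clocks per bus clock over 14000 elapsed core clocks gives `bus_utilization(250, 14, 14000)`, i.e. 0.25.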

Page 43: Runtime Processor Power Monitoring

Counter Access Heuristics
2) L2 Cache:
Metric: 2nd Level cache references
  Event: BSQ_cache_reference
    Counts cache references as seen by the bus unit
  Mask: All MESI read misses (LD & RFO), 2nd level WR misses
3) 2nd Level BPU:
Metric 1: Instructions fetched from L2 (predict)
  Event: ITLB_Reference
    Counts ITLB translations
  Mask: All hits, misses & UC hits
Metric 2: Branches retired (history update)
  Event: branch_retired
    Counts branches retired
  Mask: Count all Taken/NT/Predicted/MissP

Page 44: Runtime Processor Power Monitoring

Counter Access Heuristics
4) ITLB & I-Fetch:
etc………
10) FP Execution:
Metric: FP instructions executed
  event1: packed_SP_uop: counts packed single precision uops
  event2: packed_DP_uop: counts packed double precision uops
  event3: scalar_SP_uop: counts scalar single precision uops
  event4: scalar_DP_uop: counts scalar double precision uops
  event5: 64bit_MMX_uop: counts MMX uops with 64bit SIMD operands
  event6: 128bit_MMX_uop: counts integer SSE2 uops with 128bit SIMD operands
  event7: x87_FP_UOP: counts x87 FP uops
  event8: x87_SIMD_moves_uop: counts x87, FP, MMX, SSE, SSE2 ld/st/mov uops

Page 45: Runtime Processor Power Monitoring

45

Page 46: Runtime Processor Power Monitoring

P4 Details
Karelian.ee:
  P4 – 1.4GHz
  0.18µ, C4-FC-PGA-423
  Heatsink: Folded Fin
  M6, Al interconnect
  Die Size: 217 mm2
  Package Size: 5.34cm x 5.17cm
  Power: Idle/typ./max = ??/51.8/71W
  D$1 & T$1 / L2: 8K & 12K uops / 256K
  Voltage: 1.7/1.75V

Page 47: Runtime Processor Power Monitoring

P4 Details
1st LKM: <LKM_CPUinfo & UserLevel_CPUinfo>
Implements syscall: getCPUinfo()
Gathers CPU info from:
  /asm/processor.h
  Intel control registers (CR4)
  CPUID instruction
Reveals:
  Debug Store mechanism exists for PEBS
  TSC exists
  MSRs implemented
We can read/write performance counters
EX:
  karelian (P4, Willamette): UserLevel_CPUinfo
  viale (P4, Northwood): UserLevel_CPUinfo

Page 48: Runtime Processor Power Monitoring

P4 Detector - Counter Clusters
[Figure labels: P4 Components, EVENTS, Event Detectors, Event Counters, 4 bit wide bus]

Page 49: Runtime Processor Power Monitoring

Counters, ESCRs & CCCRs
Simplified Recipe:
1. Select Event to count
2. Select a counter (also defines CCCR)
3. Select an ESCR
4. Set ESCR fields
5. Set CCCR fields
6. Enable CCCR

Page 50: Runtime Processor Power Monitoring

Counting Mechanisms
Counting Types:
  Non-retirement: events occur any time during execution
  At-Retirement: events at the retirement of an instruction
    Can count BOGUS vs NBOGUS, tag uops to count, etc.
Terminology
Mechanisms:
  Front end tagging (e.g. LD/ST retired)
  Execution tagging (e.g. packed_DP_retired)
  Replay tagging (e.g. L1 misses)
  No tags (e.g. uops retired)
Also: Event Counting | IEBS | PEBS

Page 51: Runtime Processor Power Monitoring

At Retirement Counting Terminology
BOGUS/NBOGUS (speculative)
Tagging (count uops that encounter event)
Replay (Data speculation)

Page 52: Runtime Processor Power Monitoring

Verifying Counter Reader
1) L1Dcache_exercise:
Uses pointer assignment
L1=8K, L2=256K
Array Size = (L1 Size / Hit Rate)
  e.g. for 10% hit rate: 80K → 20K entries
  Array Size < L2 size
Array elements ← PRBS of array indices
Bench loop: new index ← array[old index]
However, gcc puts 5 LDs in the bench loop:
  4 static → hit rate ~ 100%
  1 our load → our desired hit rate
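A sketch of the sizing rule and the index-chasing loop described above. It is illustrative only: the 4-byte entry size is inferred from the slide's "80K → 20K entries" example, Python's `random` stands in for the PRBS generator, and a single-cycle shuffle (Sattolo's algorithm) is an assumed way to make the chase touch every element; the original benchmark is not reproduced here.

```python
import random

L1_SIZE_BYTES = 8 * 1024   # L1 = 8K, from the slide
ENTRY_BYTES = 4            # assumption inferred from "80K -> 20K entries"


def array_entries(hit_rate):
    """Array size = L1 size / hit rate, expressed in entries."""
    if not 0.0 < hit_rate <= 1.0:
        raise ValueError("hit_rate must be in (0, 1]")
    return round(L1_SIZE_BYTES / hit_rate / ENTRY_BYTES)


def build_index_chain(n, seed=0):
    """Fill an array with a pseudo-random permutation that forms one cycle.

    Sattolo's algorithm: chasing array[index] visits all n entries before
    returning to the start.
    """
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i guarantees a single cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Bench loop: new index <- array[old index], repeated `steps` times."""
    index = start
    for _ in range(steps):
        index = chain[index]
    return index
```

For a 10% target hit rate, `array_entries(0.1)` gives 20480 entries (80K bytes), matching the slide's example.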

Page 53: Runtime Processor Power Monitoring

…Verifying Counter Reader
1) L1Dcache_exercise results:
[Figure: L1Dcache Experiment; Acquired Rates, -20% to 120%, vs. Desired Hit Rate (0.04, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 100, 1000); series: Acquired L1 Hit Rate, Our L1 Hit Rate From L2 Accesses]
Ex: L1Dcache_exercise, Hit Rate = 0.25

Page 54: Runtime Processor Power Monitoring

…Verifying Counter Reader
2) branch_exercise:
Uses random number comparison
Assigns 400K PRBS array outside bench loop
  To avoid rand() instructions in bench loop
Bench loop:
  Compares array index to threshold
  Threshold = RAND_MAX*TakenRate
Repeats 1000 times, reseeding each time
However, gcc adds 2 more branches into the bench loop:
  Loop exit condition (Prediction ~ 100%)
  Unconditional JMP (Prediction ~ 100%)
Our Branch’s Expected Mispredict Rate: ~ (0.5 - |TakenRate – 0.5|)
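A short sketch of the two quantities on this slide: the expected mispredict rate and the threshold comparison that sets the taken rate. The names are hypothetical, and the direction of the comparison (value below threshold means taken) is an assumption, since the slide only gives the threshold formula.

```python
# Expected mispredict rate and threshold-based taken fraction for a
# branch_exercise-style loop. Comparison direction is an assumption.

def expected_mispredict_rate(taken_rate):
    """Slide's approximation: 0.5 - |TakenRate - 0.5|."""
    if not 0.0 <= taken_rate <= 1.0:
        raise ValueError("taken_rate must be in [0, 1]")
    return 0.5 - abs(taken_rate - 0.5)


def taken_fraction(values, taken_rate, rand_max):
    """Fraction of values below Threshold = RAND_MAX * TakenRate."""
    threshold = rand_max * taken_rate
    taken = sum(1 for value in values if value < threshold)
    return taken / len(values)
```

The approximation peaks at a 0.5 taken rate (a coin flip the predictor cannot learn) and falls to zero at taken rates of 0 and 1, where the branch is perfectly predictable.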

Page 55: Runtime Processor Power Monitoring

…Verifying Counter Reader
2) branch_exercise results:
[Figure: Branch Prediction Experiment; Acquired Rates, 0% to 120%, vs. Desired Taken Rate (0, 0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.9, 1); series: Approximated Mispredict Rate, Our Branch's Taken Rate]
Ex: branch_exercise, Taken Rate = 0.5

Page 56: Runtime Processor Power Monitoring

MEASUREMENT Method
Select power lines that reflect CPU power
  P4 uses 12 V lines
Clamp the current probe over the 12V lines
  1mV/Adc conversion
Connect the clamp into DMM
Send voltage reading over serial
Log the voltage readings
Convert to instantaneous power as: 12 x Vsample x 1000
Log power values
Plot power values
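The conversion step above can be sketched as follows. The constant and function names are hypothetical; the arithmetic follows the slide: the clamp outputs 1 mV per ampere, so a reading in volts times 1000 is the current in amperes, and multiplying by the 12 V supply gives watts.

```python
# Convert a DMM voltage sample from the current clamp into CPU power.
# P = 12 x Vsample x 1000, per the slide (1 mV/A clamp on the 12 V lines).

SUPPLY_VOLTAGE_V = 12.0   # the clamped lines are the 12 V lines
AMPS_PER_VOLT = 1000.0    # 1 mV/A  =>  1 V of clamp output = 1000 A


def clamp_voltage_to_power(v_sample):
    """Instantaneous power in watts from a clamp voltage reading in volts."""
    current_a = v_sample * AMPS_PER_VOLT
    return SUPPLY_VOLTAGE_V * current_a
```

For example, a reading of 0.004 V corresponds to 4 A on the 12 V lines, i.e. about 48 W.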

Page 57: Runtime Processor Power Monitoring

MEASUREMENT Tools
Poll serial port every ~20ms
  Quicker: overkill; slower: overlook
Compute running average sample every t you select
  Easier to sync with Power Model
PowerMeter:
  Convert voltage reading to power and log
  P = 12 x Vread x 1000
PowerPlotter:
  Plot power samples over sliding time window
  100 s history with 1000 samples (t = 100ms)
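A minimal sketch of the averaging and sliding-window bookkeeping described above, under stated assumptions: raw samples arrive at a fixed polling interval, each output sample is the mean of a whole number of raw samples (for example 5 raw ~20 ms polls per 100 ms sample), and the plot history is a fixed-length buffer of 1000 samples. Class and function names are hypothetical, not those of PowerMeter or PowerPlotter.

```python
from collections import deque


def window_averages(samples, samples_per_window):
    """Average consecutive groups of raw samples; a trailing partial group is dropped."""
    if samples_per_window <= 0:
        raise ValueError("samples_per_window must be positive")
    full = len(samples) - len(samples) % samples_per_window
    averages = []
    for start in range(0, full, samples_per_window):
        window = samples[start:start + samples_per_window]
        averages.append(sum(window) / samples_per_window)
    return averages


class PowerHistory:
    """Fixed-length sliding history of averaged power samples for plotting."""

    def __init__(self, max_samples=1000):
        # deque with maxlen discards the oldest sample when full
        self._samples = deque(maxlen=max_samples)

    def add(self, power_w):
        self._samples.append(power_w)

    def values(self):
        return list(self._samples)
```

With 100 ms samples, a 1000-entry history covers the 100 s window mentioned on the slide.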

Page 58: Runtime Processor Power Monitoring

Current Probe
Fluke i410
Uses Hall voltage to measure current and convert to voltage: 1mV / Adc
Range: 0.5 – 400A
Accuracy: 3.5% + 0.5A
Generated voltage is fed to DMM
Compared against the Ppro Amoeba shunt setup for verification

Page 59: Runtime Processor Power Monitoring

DMM
Agilent 34401A
Measurement Motive:
  We should sample as quickly as possible (grep case)
Measurement Setup:
  Fast 4 digit, Autozero OFF, Display OFF
  From [8], 1000 readings/s (x150 faster than fast 6 digit)
Serial Interface:
  From [9], 55 ASCII readings/s
  Polling serial port faster than 20ms is overkill

Page 60: Runtime Processor Power Monitoring

P4 Power Lines
Which power lines should we cut / clamp?
[5] shows the power lines: 1-CPU power connector, 13-System power connector (P1 13 & P2 1)
[6],[7] say P4 uses 12V lines for CPU, rather than 5V lines
Both P1 & P2 have 12, 5 and 3.3 V lines
I run branch_exercise (takenRate=1) and gzip_static to obtain the current variation on the lines

Page 61: Runtime Processor Power Monitoring

Current on Power Lines
[Figures: current I [A] vs. time (s), one panel per measured line or line group]
  Current on Connector P1 line 7 (12V)
  Current on Connector P1 lines 1, 3, 6, 18, 19, 20, 22 (5V)
  Current on Connector lines 11, 12, 23 (3.3V)
  Current on Connector P2 line 1 (3.3V)
  Current on Connector P2 line 14 (5V)
  Current on Connector P2 line 3 (12V)
  Current on Connector P2 line 7 (12V)
  Current on Connector P2 line 9 (5V)
Reveals that ALL 3 12V lines’ currents follow CPU activity
All add to CPU Power!

Page 62: Runtime Processor Power Monitoring

About the ripples
Add ripple stuff here…

Page 63: Runtime Processor Power Monitoring

P4 Architecture vs Layout
Components to Model:
1) Bus Control
2) L2 Cache
3) 2nd Level BPU
4) ITLB & Ifetch
5) L1 Cache
6) MOB
7) Mem Control
8) DTLB
9) Int EXE
10) FP EXE
11) Int RF
12) FP RF
13) Decode
14) Trace $
15) 1st Level BPU
16) Microcode ROM
17) Allocation
18) Rename
19) Inst-n Qs
20) Schedule
21) Inst-n Qs
22) Retirement

Page 64: Runtime Processor Power Monitoring

Counter Rotations

Page 65: Runtime Processor Power Monitoring

65

EOP