energy measurement on intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... ·...

35
Energy measurement on Intel architectures Martin Golasowski [email protected] 2nd of March 2018

Upload: others

Post on 22-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Energy measurement on Intel architectures

Martin [email protected]

2nd of March 2018

Page 2: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Agenda

The HardwareIntel Turbo Boost and Enhanced SpeedStepIntel RAPL

The ToolsThe Linux way - powercapx86 adaptlikwidPAPIperftiptopIntel Xeon Phi

Interpreting results

Page 3: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

The Hardware

Page 4: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Intel Turbo BoostThermal Design Power (TDP) - maximum amount of powerrequired to dissipate by the cooling system

Intel R© Xeon R© Processor E5-2680 v3 - TDP: 120Whttps://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2 50-GHz

Page 5: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Turbo Boost & AVX Base Frequencies

With Haswell and later (also includes KNL):

(Image: Intel)

Turbo Boost MAX 3.0 - available from Broadwell-EPAVX base frequency is set per core - better granularity.

Page 6: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Intel Enhanced SpeedStep

Approximate CPU power consumption:

P = CV 2f

where:C is capacitance of the processor circuitry (fixed)V input voltagef frequency

I Exposed through ACPI since Pentium M - intel pstatemodule

I Set of P-states: voltage/frequency pairs of operating pointsI Switching between states has latencyI OS controlled policies - Linux cpufreq governors

Page 7: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Processor P and C states

C-statesI Idle modes - how deep processor sleepsI C0 - Running ⇒ C6 - Maximal voltage reduction

Transistion between states introduces latency - max. C-state canbe forced

Kernel parameter: intel_idle.max_cstate=0

Verification: $ cat /sys/module/intel_idle/parameters/max_cstate9

P-statesI Operating modes - frequency/voltage pairI Controlled by OS (ACPI)I Skylake introduces autonomy

Page 8: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Intel Speed Shift

Page 9: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

cpupower - Linux cpufreq interface

[root@cn11 ˜]# cpupower frequency-infoanalyzing CPU 0:

driver: intel_pstateCPUs which run at the same hardware frequency: 0CPUs which need to have their frequency coordinated by software: 0maximum transition latency: Cannot determine or is not supported.hardware limits: 1.20 GHz - 3.10 GHzavailable cpufreq governors: performance powersavecurrent policy: frequency should be within 1.20 GHz and 3.10 GHz.

The governor "powersave" may decide which speed to usewithin this range.

current CPU frequency: 1.30 GHz (asserted by call to hardware)boost state support:

Supported: noActive: no3000 MHz max turbo 4 active cores3000 MHz max turbo 3 active cores3100 MHz max turbo 2 active cores3100 MHz max turbo 1 active cores

Also available in: /sys/devices/system/cpu ...More information:https://www.kernel.org/doc/html/latest/admin-guide/pm/cpufreq.html

Page 10: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Running Average Power Limit - RAPL

I Software-based power meterin CPU

I Metrics available throughMSRs

I Used by Intel Turbo BoostI Introduced in Sandy Bridge

microarchitecture

Page 11: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

RAPL Domains Hierarchy

I Package - SocketI Power Plane 0 (PP0) - Individual CPU coresI Power Plane 1 (PP1) - Uncore devicesI DRAM - RAM memory

Grain of SaltAlways refer to CPU datasheet, available domains aremodel-specific. For example PP0/1 may not be available on someHaswell CPUs.

Page 12: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

RAPL Interface - Machine Specific Registers

RDMSR/WRMSR - Privileged instructions for accessing MSRs

More info: System Programming Guide, Chapter 14.9.

Page 13: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Measurement units and increments

UnitsValues from MSR XXX ENERGY STATUS are not final.Use following formula to obtain correct values:

xvalue = c · 12m

where:c is value obtained from the STATUS registerm value of multiplier provided by MSR RAPL POWER UNIT

Value overflowThe STATUS register is updated in 1ms interval and wraps in 60s,earlier in case of heavy load.

Page 14: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

The Tools

Page 15: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Linux Power Capping Framework

I Devices exposed through sysfs hierarchyI Using intel rapl kernel module

[root@cn11 ˜]# ls -R1 /sys/devices/virtual/powercap/intel-rapl/.../sys/devices/virtual/powercap/intel-rapl/intel-rapl:0:constraint_0_max_power_uwconstraint_0_nameconstraint_0_power_limit_uwconstraint_0_time_window_usconstraint_1_max_power_uw...

Page 16: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

x86 adapt

I Linux kernel module andlibrary

I Secure access to MSR andPCI registers from userspace

I Useful for building customtools

More info: https://github.com/tud-zih-energy/x86 adapt

Page 17: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Using x86 adapt

Individual MSRs are available as r/w knobs defined in the library.

I API available throughlibx86 adapt.so

I Populates/dev/x86 adapt/*

Available knobs listing:

Item 0: RESET----------------Item 1: Intel_xd_bit_disable----------------Item 2: Intel_PERF_GLOBAL_STATUS----------------Item 3: Intel_RAPL_Pckg_Energy......

Page 18: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

x86 adapt - Using C API

#i n c l u d e <s t d i o . h>#i n c l u d e <x86 adapt . h>. . .

// Lookup RAPL cpu i temconst char∗ i tem name = ” Inte l RAPL Pckg Energy ” ;i n t i t e m i d = x 8 6 a d a p t l o o k u p c i n a m e ( devtype , item name ) ;i f ( i t e m i d < 0){

f p r i n t f ( s t d e r r , ” Could not f i n d %s\n” , item name ) ;}

u i n t 6 4 t r e s u l t ;i n t r e t ;i f ( ( r e t = x 8 6 a d a p t g e t s e t t i n g ( fd cpu , i t e m i d ,& r e s u l t ) ) != 8){

f p r i n t f ( s t d e r r , ” Could not read i tem %d f o r cpu/ d i e %d\n” , i t e m i d ,CPU ) ;r e t u r n −1;

}

p r i n t f ( ”CPU: %d | MSR PKG ENERGY STATUS MSR %l l u \n” ,CPU, r e s u l t ) ;. . .

Page 19: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

likwid

I Set of CLI toolsI C APII Performance and energy

measurementI OpenMP and MPI supportI Benchmarking

Power related tools:I likwid-topologyI likwid-powermeterI likwid-perfscope

Developed by: Regionales RechenZentrum Erlangen (RRZE)

https://github.com/RRZE-HPC/likwid

Page 20: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

likwid-powermeter

Continuous measurement for selected CPU:

[root@cn11 ˜]# likwid-powermeter -c 0 -s 2s--------------------------------------------------------------------------------CPU name: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHzCPU type: Intel Xeon SandyBridge EN/EP processorCPU clock: 2.40 GHz----------------------------------------------------------------------------------------------------------------------------------------------------------------Runtime: 2.0007 sMeasure for socket 0 on CPU 0Domain PKG:Energy consumed: 32.4096 JoulesPower consumed: 16.1991 WattDomain PP0:Energy consumed: 6.12715 JoulesPower consumed: 3.0625 WattDomain DRAM:Energy consumed: 1.58621 JoulesPower consumed: 0.792828 Watt--------------------------------------------------------------------------------

Page 21: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

PAPI - Performance API

I Effort to provide portable API for accessing hardwareperformance counters

I Supports CPUs, network, accelerators, etc.I Many features: custom events, timers, multiplexing, statistics

Developed by: University of Tennessee - Innovative Computing Laboratory

http://icl.cs.utk.edu/projects/papi/wiki/PAPIC:Overview

Page 22: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

PAPI - Basic concepts

Components, Counters, EventsI High Level API

I Easy to use ( 10functions)

I Only preset events onthe CPU

I Low Level APII User-defined groups of

EventsI All PAPI componentsI Including native events

RAPL events available only as native!

Page 23: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

PAPI - Components

Available components on a Sandy Bridge node(papi component avail):

...Name:net Linux network driver statistics

Native: 80, Preset: 0, Counters: 320

Name:rapl Linux SandyBridge RAPL energy measurementsNative: 14, Preset: 0, Counters: 14

Name:stealtime Stealtime filesystem statisticsNative: 33, Preset: 0, Counters: 33

...

Page 24: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

PAPI - RAPL EventsAvailable native RAPL events on a Sandy Bridge node:(papi native avail):

Native Events in Component: raplrapl:::THERMAL_SPEC:PACKAGE0rapl:::THERMAL_SPEC:PACKAGE1rapl:::MINIMUM_POWER:PACKAGE0rapl:::MINIMUM_POWER:PACKAGE1rapl:::MAXIMUM_POWER:PACKAGE0rapl:::MAXIMUM_POWER:PACKAGE1rapl:::MAXIMUM_TIME_WINDOW:PACKAGE0rapl:::MAXIMUM_TIME_WINDOW:PACKAGE1rapl:::PACKAGE_ENERGY:PACKAGE0rapl:::PACKAGE_ENERGY:PACKAGE1rapl:::DRAM_ENERGY:PACKAGE0rapl:::DRAM_ENERGY:PACKAGE1rapl:::PP0_ENERGY:PACKAGE0rapl:::PP0_ENERGY:PACKAGE1

Page 25: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

PAPI - Low level API demo

i n t e v e n t c o d e = −1;PAPI event name to code ( ” r a p l : : : PACKAGE ENERGY :PACKAGE0” , &e v e n t c o d e ) ;

P A P I e v e n t i n f o t e i n f o ;P A P I g e t e v e n t i n f o ( event code , &e i n f o ) ;

p r i n t f ( ” Event symbol : %s\n” , e i n f o . symbol ) ;p r i n t f ( ” D e s c r i p t i o n : %s\n\n” , e i n f o . l o n g d e s c r ) ;

i n t e v e n t s e t = PAPI NULL ;

// Crea te empty even t s e ti f ( ( r e t = P A P I c r e a t e e v e n t s e t (& e v e n t s e t ) ) != PAPI OK) {

h a n d l e e r r ( r e t ) ;}

// Add RAPL even t to even t s e ti f ( ( r e t = PAPI add event ( e v e n t s e t , e v e n t c o d e ) ) != PAPI OK) {

h a n d l e e r r ( r e t ) ;}

// S t a r t c o l l e c t i n g e v e n t si f ( ( r e t = P A P I s t a r t ( e v e n t s e t ) ) != PAPI OK) {

h a n d l e e r r ( r e t ) ;}

p r i n t f ( ” Doing some FLOPs . . . \ n” ) ;

Page 26: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

PAPI - Low level API demo

. . .FLOPS. . . .// Stop c o l l e c t i n g e v e n t sl ong long v a l u e s [ 1 ] ; // For one even ti f ( ( r e t = PAPI stop ( e v e n t s e t , v a l u e s ) ) != PAPI OK) {

h a n d l e e r r ( r e t ) ;}

p r i n t f ( ” Energy consumed on PKG0 : %l l d %s \n” , v a l u e s [ 0 ] , e i n f o . u n i t s ) ;

Page 27: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

perf - Linux profiling tool

I Common profiling tool inLinux

I Measuring, sampling,analysis

I Uses counters exposed bykernel

[root@cn11 ˜]# perf list

List of pre-defined events (to be used in -e):

branch-instructions OR branches [Hardware event]branch-misses [Hardware event]bus-cycles [Hardware event]cache-misses [Hardware event]cache-references [Hardware event]...power/energy-cores/ [Kernel PMU event]power/energy-pkg/ [Kernel PMU event]power/energy-ram/ [Kernel PMU event]

Page 28: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

perf - Measuring energy using RAPL

perf stat -a -e \power/energy-pkg/,\power/energy-ram/,\power/energy-cores/,\cycles [binary-to-measure]

time counts unit events0.087152594 3,24 Joules power/energy-pkg/0.087152594 0,11 Joules power/energy-ram/0.087152594 0,92 Joules power/energy-cores/0.087152594 137 374 362 cycles

Page 29: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

perf - Real time monitoring

Page 30: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

tiptop - Top for Hardware Performance counters

I Real-time diplay of IPC,cache misses, etc.

I ncurses base top-like

Developed by: Inria http://tiptop.gforge.inria.fr/

Page 31: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Power Measurement on Xeon PhiI Host: PAPI 1 & RAPL (Package, PowerPlane02 & DRAM).I Coprocessor:

I On host: Use micsmc -f to get Total PowerI On coprocessor:

I /sys/class/micras/power (∼50 msec updates), e.g.:> cat /sys/ class / micras / power113000000112000000113000000 # Total instantaneous uWatt22100000016000000 # PCIe power uWatt28000000 # 2x3 connector uWatt69000000 # 2x4 connector uWatt28000000 0 96700032000000 0 100000031000000 0 1501000

I Library libmicmgmt provides an API (see man libmicmgmt)More information here

Attention: Reading power values from KNC is”desctructive”(no idle power can be measured, but should be∼17 Watt)!

1For KNC modules micpower and host micpower can be used2All cores - except for HSW (see here )

Page 32: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

micsmc - quite useful utility

Page 33: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Static measurement with MERIC Tool

I Multi-node energymeasurement

I RAPL + x86 adaptI Readex project

$ source meric/intel2017a/set_env$ staticMERICtool/multiNodeStaticMeasureStart.sh --rapl$ ./a.out$ staticMERICtool/multiNodeStaticMeasureStop.sh --rapl

Runtime [s]: 6.59214Overall energy consumption [J]: 1167

The tool needs a special permissions on the cluster.For further info contact Ondrej Vysocky<[email protected]>.

Page 34: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Power vs. wall time tradeoff

I Reduce energy consumption via cpufreq or RAPL powercapping

I Possible to find optimal tradeoffI Highly depends on load type and cluster utilization

I Support infrastructure (DRUPS, storage, monitoring,. . . )consumes a lot of energy

Page 35: Energy measurement on Intel architecturesprace.it4i.cz/sites/prace.it4i.cz/files/files/phi-02... · 2018-03-02 · Values from MSR XXX ENERGY STATUS are not final. Use following

Thank you for your attention.