An evaluation of the Intel Xeon E5 Processor Series
Zurich Launch Event, 8 March 2012
Sverre Jarp, CERN openlab CTO
Technical team: A. Lazzaro, J. Leduc, A. Nowak
Mont Blanc (4,808m)
Lake Geneva (310m deep)
Geneva (pop. 190’000)
Intense data pressure creates strong demand for computing
250’000 IA computing cores
Tens of petabytes stored per year
Raw data: a few petabytes per second
A rigorous selection process enables us to find that one interesting event in 10 trillion (10^13)
The Worldwide LHC Computing Grid
Tier-0 (CERN): data recording, reconstruction and distribution
Tier-1: permanent storage, re-processing, analysis
Tier-2: simulation, end-user analysis
> 1 million jobs/day
~250’000 cores
173 PB of storage
nearly 160 sites
10 Gb links
The CERN openlab
A unique research partnership of CERN and the industry
Objective: The advancement of cutting-edge computing solutions to be used by the worldwide LHC community
• Partners support manpower and equipment in dedicated competence centers
• openlab delivers published research and evaluations based on partners’ solutions – in a very challenging setting
• Created robust hands-on training program in various computing topics, including international computing schools; summer student programme
• Past involvement: Enterasys Networks, IBM, Voltaire, F-secure, Stonesoft, EDS; New contributor: Huawei
• Just started phase IV: 2012-2014
http://cern.ch/openlab
Benchmarking: A complex affair
• In modern servers, at least the following elements need to be controlled:
  – Hardware:
    • Processor generation
    • Socket count
    • Core count
    • CPU frequency
    • Turbo boost
    • SMT
    • Cache sizes
    • Memory size and type
    • Power configuration
  – Software:
    • Operating System version
    • Compiler version and flags
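One way to keep these elements under control is to record them with every benchmark run and compare the records before comparing scores. The sketch below is only an illustration: the class name `BenchmarkConfig`, its field names, and the helper `uncontrolled_fields` are hypothetical, and the example values are the ones quoted later in this talk for the E5-2680 test system. Cache sizes, power configuration and compiler flags are left unset because the slides do not state them.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class BenchmarkConfig:
    """Hypothetical record of the elements listed on the slide."""
    # Hardware elements
    processor_generation: str
    socket_count: int
    cores_per_socket: int
    cpu_frequency_ghz: float
    turbo_boost: bool
    smt: bool
    memory: str
    # Software elements
    os_version: str
    compiler: str
    compiler_flags: str = ""
    # Not stated on the slides, so left unset here
    cache_sizes: Optional[str] = None
    power_configuration: Optional[str] = None


def uncontrolled_fields(a, b):
    """Return {field name: (value in a, value in b)} for every field that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


e5_run = BenchmarkConfig(
    processor_generation="Sandy Bridge-EP (Xeon E5-2680)",
    socket_count=2,
    cores_per_socket=8,
    cpu_frequency_ghz=2.7,
    turbo_boost=False,
    smt=True,
    memory="32 GB, 1333 MHz",
    os_version="SLC 5.7",
    compiler="gcc 4.1.2",
)
smt_off_run = replace(e5_run, smt=False)

# Only the SMT setting differs between the two runs.
print(uncontrolled_fields(e5_run, smt_off_run))
```

If the returned dictionary contains more than the one element under study, the two scores are not directly comparable.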
Xeon E5 in some detail
• Advanced Vector eXtensions (AVX)
  – 256-bit registers which can hold 4 doubles/8 floats
  – AVX instruction set
• More execution units
  – Two load units, for instance
• Enhanced Hyper-threading and Turbo-boost technology
• Larger on-die L3 cache
• Integrated PCI Express 3.0 I/O
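The "4 doubles/8 floats" figure follows directly from the register width. As a small worked example, the sketch below divides the register width by the element size; the helper name `lanes` is hypothetical, and the 128-bit SSE width is added only for comparison with the SSE results shown later.

```python
import struct

AVX_REGISTER_BITS = 256
SSE_REGISTER_BITS = 128  # previous vector width, for comparison


def lanes(register_bits, element_bytes):
    """Number of elements of the given size that fit in one vector register."""
    return register_bits // (8 * element_bytes)


# Standard IEEE 754 sizes: 8 bytes for a double, 4 bytes for a float.
double_bytes = struct.calcsize("<d")
float_bytes = struct.calcsize("<f")

for name, bits in (("SSE", SSE_REGISTER_BITS), ("AVX", AVX_REGISTER_BITS)):
    print(name, "doubles:", lanes(bits, double_bytes),
          "floats:", lanes(bits, float_bytes))
```

Doubling the lane count is an upper bound on the gain from vector width alone; measured gains depend on how much of the code vectorises.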
Our Xeon E5 testing
• System tested:
  – Beta-level white box; dual-socket server
  – Xeon E5-2680 @ 2.7 GHz, 8 cores, 130W TDP
    • 32 GB memory (1333 MHz)
    • C1 stepping
  – Code name: “Sandy Bridge EP”
• Benchmarks used:
  – HEPSPEC
  – HEPSPEC/W
  – MT-Geant4
  – MLfit
HEPSPEC
• Throughput test from SPEC 2006
  – All the C++ jobs (INT as well as FP); as many copies as cores
  – Scientific Linux CERN (SLC) 5.7 / gcc 4.1.2 / 64-bit mode / Turbo off / SMT on
  – Compared to 6-core “Westmere-EP” Xeon X5670 (@ 2.93 GHz)
    • Frequency-scaled
[Chart: HEPSPEC (y-axis) versus #CPUs (x-axis, 0 to 32); series: Sandy Bridge-EP E5-2680 and Westmere-EP X5670 (frequency scaled)]
Using only the “real” cores:
• Speed-up per core: 1.2x
• Core count: 1.33x
• Total: 1.6x
SMT gain (for both): 1.23x
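The total of 1.6x is the product of the per-core speed-up and the core-count ratio (8 cores versus 6). The sketch below reproduces that arithmetic and also shows one plausible reading of "frequency scaled": a linear rescaling of the X5670 score from 2.93 GHz to the E5-2680's 2.7 GHz. The slides do not spell out the scaling method, so the linear model, the function names and the input score of 100.0 are assumptions for illustration only.

```python
def frequency_scaled(score, measured_ghz, target_ghz):
    """Rescale a score linearly with clock frequency (an assumed model)."""
    return score * target_ghz / measured_ghz


def combined_speedup(per_core_gain, new_cores, old_cores):
    """Multiply the per-core gain by the core-count ratio."""
    return per_core_gain * new_cores / old_cores


# Figures from the slide: 1.2x per core, 8 cores versus 6 cores.
total = combined_speedup(1.2, new_cores=8, old_cores=6)
print(f"core-count ratio: {8 / 6:.2f}x, total: {total:.1f}x")

# Hypothetical X5670 score of 100.0 at 2.93 GHz, rescaled to 2.7 GHz.
print(f"frequency-scaled score: {frequency_scaled(100.0, 2.93, 2.7):.2f}")
```

Because the SMT gain is quoted as 1.23x for both processors, it cancels out of the ratio and the same total applies with SMT on.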
Energy efficiency
• For CERN and most W-LCG sites, energy efficiency is paramount
  – Our centres have (more or less) a fixed amount of electric energy
  – Ideally, we would like to double the throughput/watt from generation to generation
  – This was relatively easy when core count increased geometrically:
    • 1 → 2 → 4
  – Recently, however, it has been increasing arithmetically:
    • 4 (Xeon 5500) → 6 (Xeon 5600) → 8 (Xeon E5-2600)
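The difference between the two growth patterns can be made concrete by looking at the generation-to-generation ratios. The sketch below assumes, as a simplification not stated on the slide, that throughput/watt is the product of the core-count ratio and a per-core gain at constant power; under that assumption it computes how much per-core gain is still needed to reach the 2x target. The function names are hypothetical.

```python
def generation_ratios(core_counts):
    """Ratio of each generation's core count to the previous one."""
    return [new / old for old, new in zip(core_counts, core_counts[1:])]


def required_per_core_gain(core_ratio, target=2.0):
    """Per-core throughput/watt gain still needed to reach the target."""
    return target / core_ratio


geometric = [1, 2, 4]   # doubling each generation
arithmetic = [4, 6, 8]  # Xeon 5500, Xeon 5600, Xeon E5-2600

for label, counts in (("geometric", geometric), ("arithmetic", arithmetic)):
    for ratio in generation_ratios(counts):
        need = required_per_core_gain(ratio)
        print(f"{label}: cores x{ratio:.2f}, remaining gain needed x{need:.2f}")
```

With geometric growth the core count alone delivers the doubling; with arithmetic growth the shrinking ratio (1.5x, then 1.33x) leaves an increasing share to per-core and power improvements.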
HEPSPEC/Watt
• Great news: Bigger jump than foreseen in energy efficiency!
  – Now reaching 1 HEPSPEC/W, which is 1.7x compared to Xeon X5670
• Xeon E5 options: SLC 5.7, 64-bit mode, SMT on, Turbo on
• Xeon 5600 options: SLC 5.4
[Chart, Xeon E5-2600: E5-2680 HEP performance per Watt, Turbo-on running SLC5; y-axis: SPEC/W; bars: E5-2680 SMT-off, E5-2680 SMT-on; data labels: 0.925, 1.039]
[Chart, Xeon 5600: X5670 HEP performance per Watt (extrapolated from 12GB to 24GB); y-axis: SPEC/W; bars: X5670 SMT-off, X5670 SMT-on; data labels: 0.5059, 0.611]
Bigger is better!
STOP PRESS: With SLC 6 (gcc 4.4.6) we further lower the power consumption by 5% and increase the HEPSPEC results by 3%: a 1.083x gain in HEPSPEC/W in total!
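The headline ratios can be checked from the chart data labels. The sketch below assumes that the two labels in each chart pair with the bars in the order listed (SMT-off, then SMT-on); that pairing is an inference from the label order, although the SMT-on ratio it yields does match the quoted 1.7x. The STOP PRESS figure is recomputed from the rounded percentages, which gives about 1.084, close to the 1.083x on the slide.

```python
# HEPSPEC/W data labels from the two charts (assumed pairing: SMT-off, SMT-on).
e5_2680 = {"smt_off": 0.925, "smt_on": 1.039}
x5670 = {"smt_off": 0.5059, "smt_on": 0.611}

# Generation-to-generation gain in HEPSPEC/W for each SMT setting.
gain = {mode: e5_2680[mode] / x5670[mode] for mode in e5_2680}
for mode, value in gain.items():
    print(f"{mode}: {value:.2f}x")

# STOP PRESS figures: 3% more HEPSPEC at 5% less power.
slc6_gain = 1.03 / 0.95
print(f"SLC 6 gain in HEPSPEC/W: {slc6_gain:.3f}x")
```

The small difference from 1.083x is consistent with the slide having used unrounded measurements.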
MT Geant4
• Our favourite benchmark for testing weak scaling:
• A threaded version of CERN’s detector simulation program
  – Speed-up compared to previous generation ([email protected]):
    • Both with Turbo-off, SMT-on (L5640 frequency-adjusted): 1.46x
SLC 5.7, gcc 4.3.3, pinning of threads
Xeon E5-2600 SMT speed-up: 1.25x
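In a weak-scaling test each thread is given the same amount of work, so the ideal outcome is that the run time stays constant as threads are added. The sketch below shows the usual way to express this as an efficiency and as a throughput speed-up. The function names and the timings (100 s and 125 s) are hypothetical and are not measurements from this evaluation.

```python
def weak_scaling_efficiency(t_one_thread, t_n_threads):
    """Ideal weak scaling keeps the run time constant, giving 1.0."""
    return t_one_thread / t_n_threads


def throughput_speedup(n_threads, t_one_thread, t_n_threads):
    """Work completed per unit time, relative to a single thread."""
    return n_threads * weak_scaling_efficiency(t_one_thread, t_n_threads)


# Hypothetical timings: every thread simulates the same number of events.
t1 = 100.0   # seconds with 1 thread
t16 = 125.0  # seconds with 16 threads, 16 times the total work

print(f"efficiency: {weak_scaling_efficiency(t1, t16):.2f}")
print(f"throughput speed-up: {throughput_speedup(16, t1, t16):.1f}x")
```

An SMT speed-up such as the one quoted above is the same kind of ratio: throughput with all hardware threads in use divided by throughput with one thread per core.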
MLfit
• Our favourite benchmark for testing strong scaling:
• A threaded/vectorised data analysis program
  – Single core (Turbo off, using SSE): 1.19x
  – Single core, moving to AVX: 1.12x
  – All the “real” cores w/SSE: (1.33 * 1.19) 1.59x
  – All the “real” cores & AVX: (1.59 * 1.12) 1.78x
1.33x
Xeon E5-2600 SMT speed-up: 1.29x
SLC 6.2, icc 12.1.0, pinning of threads
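The MLfit figures compose multiplicatively. The sketch below reproduces the chain under one assumption: that the 1.33x factor is the 8-to-6 core-count ratio used for HEPSPEC. With the unrounded ratio 8/6 the products round to the 1.59x and 1.78x quoted on the slide, whereas 1.33 * 1.19 taken literally gives 1.58.

```python
core_count_ratio = 8 / 6   # assumed origin of the 1.33x factor
single_core_sse = 1.19     # single core, Turbo off, using SSE
avx_over_sse = 1.12        # single core, moving from SSE to AVX

all_cores_sse = core_count_ratio * single_core_sse
all_cores_avx = all_cores_sse * avx_over_sse

print(f"all real cores, SSE: {all_cores_sse:.2f}x")
print(f"all real cores, AVX: {all_cores_avx:.2f}x")
```

Multiplying the factors this way treats them as independent, which is how the slide presents them.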
Conclusion
• The Intel Xeon E5 Processor Series confirms Intel’s desire to improve both absolute performance and performance per watt
• CERN and W-LCG will appreciate both
  – In particular, the HEPSPEC/W value
  – Now reaching 1 HEPSPEC/W, which is 1.7x compared to the previous generation (Xeon X5670)
• A full openlab evaluation report will be published at launch time
  – http://www.cern.ch/openlab
  – The Xeon X5670 report has been available since April 2010