TRANSCRIPT
The HPC Challenge Benchmark: A Candidate for Replacing LINPACK in the TOP500?
Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory
2007 SPEC Benchmark Workshop, January 21, 2007, Radisson Hotel Austin North
Outline - The HPC Challenge Benchmark: A Candidate for Replacing Linpack in the TOP500?
♦ Look at LINPACK
♦ Brief discussion of DARPA HPCS Program
♦ HPC Challenge Benchmark
♦ Answer the Question
What Is LINPACK?
♦ Most people think LINPACK is a benchmark.
♦ LINPACK is a package of mathematical software for solving problems in linear algebra, mainly dense linear systems of linear equations.
♦ The project had its origins in 1974
♦ LINPACK: “LINear algebra PACKage”, written in Fortran 66
Computing in 1974
♦ High Performance Computers:
IBM 370/195, CDC 7600, Univac 1110, DEC PDP-10, Honeywell 6030
♦ Fortran 66
♦ Goal: run efficiently on each of these machines
♦ BLAS (Level 1): vector operations
♦ Trying to achieve software portability
♦ The LINPACK package was released in 1979, about the time of the Cray 1
The Accidental Benchmarker
♦ Appendix B of the Linpack Users’ Guide
Designed to help users extrapolate execution time for Linpack software package
♦ First benchmark report from 1977; Cray 1 to DEC PDP-10
Dense matrices
Linear systems
Least squares problems
Singular values
LINPACK Benchmark?
♦ The LINPACK Benchmark is a measure of a computer’s
floating-point rate of execution for solving Ax=b. It is determined by running a computer program that solves a dense system of linear equations.
♦ Information is collected and available in the LINPACK Benchmark Report.
♦ Over the years the characteristics of the benchmark have changed a bit.
In fact, there are three benchmarks included in the Linpack Benchmark report.
♦ The LINPACK Benchmark since 1977:
Dense linear system solved with LU factorization using partial pivoting
Operation count: 2/3 n^3 + O(n^2)
Benchmark measure: MFlop/s
The original benchmark measures the execution rate for a Fortran program on a matrix of size 100x100.
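The MFlop/s figure follows directly from the operation count. A minimal sketch (the function name and the 1-second timing are illustrative; the 2n^2 term is the triangular-solve step that accompanies the 2/3 n^3 factorization):

```python
def linpack_mflops(n, seconds):
    """MFlop/s for an n-by-n dense solve, counting 2/3 n^3 + 2 n^2 operations."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1.0e6

# For the original n = 100 case, a hypothetical 1-second run corresponds to
# roughly 0.69 MFlop/s.
rate = linpack_mflops(100, 1.0)
```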
For Linpack with n = 100
♦ Not allowed to touch the code.
♦ Only set the optimization in the compiler and run.
♦ Provides a historical look at computing.
♦ Table 1 of the report (52 pages of the 95-page report):
http://www.netlib.org/benchmark/performance.pdf
Linpack Benchmark Over Time
♦ In the beginning there was only the Linpack 100 Benchmark (1977)
n=100 (80 KB); a size that would fit on all the machines
Fortran; 64-bit floating-point arithmetic
No hand optimization (only compiler options); source code available
♦ Linpack 1000 (1986)
n=1000 (8 MB); wanted to see higher performance levels
Any language; 64-bit floating-point arithmetic
Hand optimization OK
♦ Linpack Table 3 (Highly Parallel Computing, 1991) (Top500, 1993)
Any size (n as large as you can; n=10^6 is 8 TB and ~6 hours)
Any language; 64-bit floating-point arithmetic
Hand optimization OK
Strassen’s method not allowed (confuses the operation count and rate)
Reference implementation available
♦ In all cases results are verified by looking at:
The scaled residual: ||Ax - b|| / (||A|| ||x|| n ε) = O(1)
The operation counts: 2/3 n^3 - 1/2 n^2 for the factorization, 2 n^2 for the solve
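The verification step can be sketched in a few lines; `scaled_residual` is an illustrative helper using infinity norms, and a result of O(1) (commonly, below a small fixed threshold) indicates a correct solve:

```python
import sys

def scaled_residual(A, x, b):
    """||Ax - b|| / (||A|| ||x|| n eps), using infinity norms."""
    n = len(A)
    eps = sys.float_info.epsilon
    # Residual vector r = Ax - b.
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    norm_r = max(abs(v) for v in r)
    norm_A = max(sum(abs(a) for a in row) for row in A)
    norm_x = max(abs(v) for v in x)
    return norm_r / (norm_A * norm_x * n * eps)

# An exact solve gives a zero scaled residual.
scaled_residual([[2.0, 0.0], [0.0, 2.0]], [1.0, 2.0], [2.0, 4.0])
```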
Motivation for Additional Benchmarks
♦ From Linpack Benchmark and Top500: “no single number can reflect overall performance”
♦ Clearly need something more than Linpack
♦ HPC Challenge Benchmark
The test suite stresses not only the processors, but the memory system and the interconnect. The real utility of the HPCC benchmarks is that architectures can be described with a wider range of metrics than just Flop/s from Linpack.
Linpack Benchmark
♦ Good
One number
Simple to define and easy to rank
Allows problem size to change with machine and over time
Stresses the system with a run of a few hours
♦ Bad
Emphasizes only “peak” CPU speed and number of CPUs
Does not stress local bandwidth
Does not stress the network
Does not test gather/scatter
Ignores Amdahl’s Law (only does weak scaling)
♦ Ugly
MachoFlops
Benchmarketeering hype
At the Time the Linpack Benchmark Was Created …
♦ If we think about computing in the late 70’s, perhaps the LINPACK benchmark was a reasonable thing to use.
♦ The memory wall was not so much a wall as a step.
♦ In the 70’s, things were more in balance
The memory kept pace with the CPU: n cycles to execute an instruction, n cycles to bring in a word from memory
♦ Showed off compiler optimization
♦ Today it provides a historical base of data
Many Changes
♦ Many changes in our hardware over the past 30 years
Superscalar, Vector, Distributed Memory, Shared Memory, Multicore, …
♦ While there have been some changes to the Linpack Benchmark, not all of them reflect the advances made in the hardware.
♦ Today’s memory hierarchy is much more complicated.
(Chart: Top500 Systems/Architectures, 1993-2006, count of systems from 0 to 500 by architecture class: Single Proc., SIMD, SMP, MPP, Cluster, Const.)
High Productivity Computing Systems
Goal: Provide a generation of economically viable high productivity computing systems for the national security and industrial user community (2010; started in 2002)
Fill the critical technology and capability gap: today (late 80's HPC technology) ... to ... future (quantum/bio computing)
Applications: intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology
HPCS Program Focus Areas: Analysis & Assessment, Performance Characterization & Prediction, System Architecture, Software Technology, Hardware Technology, Programming Models, Industry R&D
Focus on:
Real (not peak) performance of critical national security applications
Programmability: reduce the cost and time of developing applications
Software portability and system robustness
HPCS Roadmap
Phase 1: Concept Study, $20M (2002); 5 vendors
Phase 2: Advanced Design & Prototypes, $170M (2003-2005); 3 vendors
Phase 3: Full Scale Development (2006-2010), ~$250M each; 1+ vendors; petascale systems
MIT Lincoln Laboratory leading the measurement and evaluation team
Evaluation milestones: Test Evaluation Framework, New Evaluation Framework, Validated Procurement Evaluation Methodology
(Chart: Predicted Performance Levels for Top500, Jun-2003 through Jun-2009: Linpack TFlop/s on a log scale for the list Total, #1, #10, and #500, each with a predicted trend line; e.g. predicted totals rising from 3,447 to 6,267 TFlop/s and the predicted #500 entry from 2.86 to 5.46 TFlop/s.)
A PetaFlop Computer by the End of the Decade
♦ At least 10 companies developing a Petaflop system in the next decade:
Cray, IBM, Sun
Dawning, Galactic, Lenovo (Chinese companies)
Hitachi, NEC, Fujitsu (Japanese “Life Simulator”, 10 Pflop/s; Keisoku project, $1B over 7 years)
Bull
♦ Targets: 2+ Pflop/s Linpack, 6.5 PB/s data streaming BW, 3.2 PB/s bisection BW, 64,000 GUPS
PetaFlop Computers in 2 Years!
♦ Oak Ridge National Lab: planned for 4th quarter 2008 (1 Pflop/s peak)
From Cray’s XT family; uses quad-core chips from AMD
23,936 chips; each chip is a quad-core processor (95,744 processors)
Each processor does 4 flops/cycle; cycle time of 2.8 GHz
Hypercube connectivity; interconnect based on Cray XT technology
6 MW, 136 cabinets
♦ Los Alamos National Lab: Roadrunner (2.4 Pflop/s peak)
Uses IBM Cell and AMD processors; 75,000 cores
HPC Challenge Goals
♦ To examine the performance of HPC architectures using kernels with more challenging memory access patterns than the Linpack Benchmark
The Linpack benchmark works well on all architectures, even cache-based, distributed memory multiprocessors, due to:
1. Extensive memory reuse
2. Scalability with respect to the amount of computation
3. Scalability with respect to the communication volume
4. Extensive optimization of the software
♦ To complement the Top500 list
♦ Stress CPU, memory system, interconnect
♦ Allow for optimizations
Record the effort needed for tuning
Base run requires MPI and BLAS
♦ Provide verification & archiving of results
Tests on Single Processor and System
● Local - only a single processor (core) is performing computations.
● Embarrassingly Parallel - each processor (core) in the entire system is performing computations, but they do not communicate with each other explicitly.
● Global - all processors in the system are performing computations and they explicitly communicate with each other.
HPC Challenge Benchmark
Consists of basically 7 benchmarks; think of it as a framework or harness for adding benchmarks of interest.
1. LINPACK (HPL) ― MPI Global (Ax = b)
2. STREAM ― Local (single CPU) and Embarrassingly Parallel
3. PTRANS (A ← A + B^T) ― MPI Global
4. RandomAccess ― Local (single CPU), Embarrassingly Parallel, and MPI Global
5. BW and Latency ― MPI
6. FFT ― Global, single CPU, and EP
7. Matrix Multiply ― single CPU and EP
(Diagram: for RandomAccess, proc i sends a random integer to proc k, which does a read, update, and write.)
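To make the flavor of these kernels concrete, here is a pure-Python sketch of the STREAM triad and its byte accounting (illustrative only; the real STREAM benchmark is compiled C/Fortran running over arrays far larger than cache):

```python
def stream_triad(b, c, s):
    """STREAM triad kernel: a[i] = b[i] + s * c[i]."""
    return [bi + s * ci for bi, ci in zip(b, c)]

def triad_bytes(n):
    """Bytes moved per triad sweep: read b and c, write a, 8 bytes per double."""
    return 3 * 8 * n

# Bandwidth in GB/s for a timed sweep would be triad_bytes(n) / seconds / 1e9.
```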
April 18, 2006 Oak Ridge National Lab, CSM/FT 20
HPCS Performance Targets
● HPCC was developed by HPCS to assist in testing new HEC systems
● Each benchmark focuses on a different part of the memory hierarchy
● HPCS performance targets attempt to:
Flatten the memory hierarchy
Improve real application performance
Make programming easier
(Diagram: the memory hierarchy: Registers, Cache(s), Local Memory, Remote Memory, Disk, Tape, with Instructions/Operands, Lines/Blocks, Messages, and Pages moving between adjacent levels.)
HPC Challenge Performance Targets
● LINPACK: linear system solve, Ax = b ― target 2 Pflop/s (8x over current levels)
● STREAM: vector operations, A = B + s * C ― target 6.5 Pbyte/s (40x)
● FFT: 1D Fast Fourier Transform, Z = fft(X) ― target 0.5 Pflop/s (200x)
● RandomAccess: integer update, T[i] = XOR(T[i], rand) ― target 64,000 GUPS (2000x)
(Diagram: the memory hierarchy as above, with each benchmark probing a different level; the "Max Relative" figures give each target's multiple over current systems.)
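The RandomAccess target above can be illustrated with a toy update loop. The table initialization follows the benchmark convention (T[i] = i), but the generator here is a simple 64-bit LCG standing in for the official pseudo-random stream, so this is a sketch of the access pattern, not the real benchmark:

```python
def random_access(log2_size, n_updates, seed=1):
    """RandomAccess-style table updates: T[idx] ^= r for pseudo-random r."""
    size = 1 << log2_size
    mask = size - 1
    table = list(range(size))  # benchmark initializes T[i] = i
    r = seed
    for _ in range(n_updates):
        # Toy LCG, not the benchmark's official generator.
        r = (r * 6364136223846793005 + 1442695040888963407) % (1 << 64)
        table[r & mask] ^= r
    return table

# GUPS for a timed run would be n_updates / seconds / 1e9.
```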
Computational Resources and HPC Challenge Benchmarks
CPU computational speed: HPL, Matrix Multiply
Memory bandwidth: STREAM
Node interconnect bandwidth: Random & Natural Ring Bandwidth & Latency
Stressing combinations of all three: PTRANS, FFT, RandomAccess
How Does the Benchmarking Work?
♦ Single program to download and run
Simple input file, similar to the HPL input
♦ Base Run and Optimized Run
Base run must be made; the user supplies MPI and the BLAS
Optimized run allowed to replace certain routines; the user specifies what was done
♦ Results uploaded via website (monitored)
♦ html table and Excel spreadsheet generated with performance results
Intentionally we are not providing a single figure of merit (no overall ranking)
♦ Each run generates a record which contains 188 pieces of information from the benchmark run.
♦ Goal: no more than 2 X the time to execute HPL.
http://icl.cs.utk.edu/hpcc/
HPCC Kiviat Chart
http://icl.cs.utk.edu/hpcc/
Different Computers Are Better at Different Things; No “Fastest” Computer for All Apps
HPCC Awards Info and Rules
Class 1 (Objective)
♦ Performance
1. G-HPL $500
2. G-RandomAccess $500
3. EP-STREAM system $500
4. G-FFT $500
♦ Must be full submissions through the HPCC database
Class 2 (Subjective)
♦ Productivity (Elegant Implementation)
Implement at least two tests from Class 1
$1500 (may be split)
Deadline: October 15, 2007
Select 3 as finalists
♦ This award is weighted 50% on performance and 50% on code elegance, clarity, and size.
♦ Submission format flexible
Winners (in both classes) will be announced at the SC07 HPCC BOF
Class 2 Awards
♦ Subjective
♦ Productivity (Elegant Implementation)
Implement at least two tests from Class 1
$1500 (may be split)
Deadline: October 15, 2007
Select 5 as finalists
♦ Most "elegant" implementation, with special emphasis placed on:
Global HPL, Global RandomAccess, EP STREAM (Triad) per system, and Global FFT
♦ This award is weighted 50% on performance and 50% on code elegance, clarity, and size.
5 Finalists for Class 2 - November 2005
♦ Cleve Moler, MathWorks. Environment: Parallel Matlab Prototype; System: 4-processor Opteron
♦ Calin Cascaval, C. Barton, G. Almasi, Y. Zheng, M. Farreras, P. Luk, and R. Mak, IBM. Environment: UPC; System: Blue Gene/L
♦ Bradley Kuszmaul, MIT. Environment: Cilk; System: 4-processor 1.4 GHz AMD Opteron 840 with 16 GiB of memory
♦ Nathan Wichman, Cray. Environment: UPC; System: Cray X1E (ORNL)
♦ Petr Konecny, Simon Kahan, and John Feo, Cray. Environment: C + MTA pragmas; System: Cray MTA-2
Winners!
2006 Competitors
♦ Some Notable Class 1 Competitors
SGI (NASA): Columbia, 10,000 CPUs
NEC (HLRS): SX-8, 512 CPUs
IBM (DOE LLNL): BG/L, 131,072 CPUs; Purple, 10,240 CPUs
Cray (DOE ORNL): X1, 1008 CPUs; Jaguar XT3, 5200 CPUs
DELL (MIT LL): LLGrid, 300 CPUs
Cray (DOD ERDC): Sapphire XT3, 4096 CPUs
Cray (Sandia): Red Storm XT3, 11,648 CPUs
♦ Class 2: 6 Finalists
Calin Cascaval (IBM): UPC on Blue Gene/L [Current Language]
Bradley Kuszmaul (MIT CSAIL): Cilk on SGI Altix [Current Language]
Cleve Moler (MathWorks): Parallel Matlab on a cluster [Current Language]
Brad Chamberlain (Cray): Chapel [Research Language]
Vivek Sarkar (IBM): X10 [Research Language]
Vadim Gurev (St. Petersburg, Russia): MCSharp [Student Submission]
The Following Are the Winners of the 2006 HPC Challenge Class 1 Awards
The Following Are the Winners of the 2006 HPC Challenge Class 2 Awards
2006 Programmability: Speedup vs. Relative Code Size
(Chart: speedup vs. relative code size on log-log axes; an "all too often" region (Java, Matlab, Python, etc.) with small code size but speedup below 1, a Traditional HPC region with large code size and high speedup, and "Ref" marking the C+MPI reference.)
♦ Class 2 Award: 50% performance, 50% elegance
♦ 21 codes submitted by 6 teams
♦ Speedup relative to serial C on a workstation
♦ Code size relative to serial C
2006 Programming Results Summary
♦ 21 of 21 smaller than the C+MPI reference; 20 smaller than serial
♦ 15 of 21 faster than serial; 19 in the HPCS quadrant
Top500 and HPC Challenge Rankings
♦ It should be clear that the HPL (Linpack
Benchmark - Top500) is a relatively poor predictor of overall machine performance.
♦ For a given set of applications, such as:
Calculations on unstructured grids
Effects of strong shock waves
Ab-initio quantum chemistry
Ocean general circulation models
CFD calculations with multi-resolution grids
Weather forecasting
♦ There should be a different mix of components used to help predict the system performance.
Will the Top500 List Go Away?
♦ The Top500 continues to serve a valuable role in high performance computing.
Historical basis
Presents statistics on deployment
Projects where things are going
Impartial view
It’s simple to understand
It’s fun
♦ The Top500 will continue to play a role
No Single Number for HPCC?
♦ Of course everyone wants a single number.
♦ With the HPCC Benchmark you get 188 numbers per system run!
♦ Many have suggested weighting the seven tests in HPCC to come up with a single number:
LINPACK, MatMul, FFT, Stream, RandomAccess, Ptranspose, bandwidth & latency
♦ But your application is different than mine, so the weights are dependent on the application:
Score = W1*LINPACK + W2*MM + W3*FFT + W4*Stream + W5*RA + W6*Ptrans + W7*BW/Lat
♦ The problem is that the weights depend on your job mix.
♦ So it makes sense to have a set of weights for each user or site.
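A per-site weighted score of this form can be sketched as follows; the metric names and weight values are hypothetical, and the component results are assumed to already be normalized to a common scale:

```python
def hpcc_score(results, weights):
    """Weighted sum of (already normalized) HPCC component results."""
    return sum(weights[name] * results[name] for name in weights)

# Hypothetical site weights reflecting one site's job mix.
site_weights = {"hpl": 0.4, "stream": 0.3, "fft": 0.2, "randomaccess": 0.1}
results = {"hpl": 1.0, "stream": 0.5, "fft": 0.8, "randomaccess": 0.2}
score = hpcc_score(results, site_weights)  # approximately 0.73
```

A different site would supply different weights, giving a different ranking from the same 188 recorded numbers.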
Tools Needed to Help With Performance
♦ A tool that analyzes an application, perhaps statically and/or dynamically, and outputs a set of weights for various sections of the application: [W1, W2, W3, W4, W5, W6, W7, W8]
The tool would also point to places where we are missing a benchmarking component for the mapping.
♦ Think of the benchmark components as a basis set for scientific applications.
♦ A specific application then has a set of "coefficients" of the basis set:
Score = W1*HPL + W2*MM + W3*FFT + W4*Stream + W5*RA + W6*Ptrans + W7*BW/Lat + …
Future Directions
♦ Looking at reducing execution time
♦ Constructing a framework for benchmarks
♦ Developing machine signatures
♦ Plans are to expand the benchmark collection:
Sparse matrix operations
I/O
Smith-Waterman (sequence alignment)
♦ Port to new systems
♦ Provide more implementations:
Languages (Fortran, UPC, Co-Array)
Environments
Paradigms
Collaborators
• HPC Challenge
– Piotr Łuszczek, U of Tennessee
– David Bailey, NERSC/LBL
– Jeremy Kepner, MIT Lincoln Lab
– David Koester, MITRE
– Bob Lucas, ISI/USC
– Rusty Lusk, ANL
– John McCalpin, IBM, Austin
– Rolf Rabenseifner, HLRS Stuttgart
– Daisuke Takahashi, Tsukuba, Japan
• Top500
– Hans Meuer, Prometeus
– Erich Strohmaier, LBNL/NERSC
– Horst Simon, LBNL/NERSC
http://icl.cs.utk.edu/hpcc/