The SC14 Test of Time Award



About the Test of Time Award

• The “Test of Time” award recognizes an outstanding paper from a past SC Conference that has deeply influenced the HPC discipline. It recognizes the historical impact of the authors and clear evidence that the paper has changed HPC trends. The award is also an incentive for researchers and students to send their best work to SC, and a tool for understanding why and how results last in the HPC discipline.

• Eligible papers are those published in the SC Proceedings between 10 and 25 years ago.

Brief History of the Award

• The Test of Time Award was established and first awarded at the SC13 Conference, for the conference’s 25th anniversary.

• The first winner was William Pugh, for “The Omega test: a fast and practical integer programming algorithm for dependence analysis,” Proc. SC91.

The ToTA Committee for SC14

• Ewing Lusk, co-chair
• Katherine Yelick, co-chair
• Franck Cappello
• Michael Heroux
• Jeffrey Hollingsworth
• Lennart Johnsson
• Ken Miura
• Leonid Oliker
• Vivek Sarkar
• Rob Schreiber
• Mateo Valero

The SC14 Test of Time Award Winners

• Bruce Hendrickson and Rob Leland, for “A multilevel algorithm for partitioning graphs,” Proc. SC95.

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Graphs and HPC in the '90s: A Distant Mirror or Merely Distant?

Bruce Hendrickson
Senior Manager for Extreme-Scale Computing

Sandia National Laboratories, Albuquerque, NM

University of New Mexico, Computer Science Dept.

Robert Leland
Vice President for Research and CTO

Sandia National Laboratories, Albuquerque, NM

The REAL Test of Time!

Outline

• Context: HPC in the early 90s

• Content: Multilevel graph partitioning

• Thread 1: Direct Impact

• Thread 2: Combinatorial Scientific Computing

• Thread 3: Abstractions

• Conclusions

Merely Distant?

• “Scaled Speedup” concept introduced in ’87
– by Gustafson, Montry & Benner at Sandia

• Eugene Brooks’ “Attack of the Killer Micros” talk at SC’90

• Draft MPI standard presented at SC’93

• First Top 500 list in 1993
– Of the top 10 machines, only one had as many as 1024 processors
– Four were vector machines with fewer than 20 processors

• 1993 Gordon Bell Prize
– Solving Boltzmann’s Equation on a 1024-processor CM-5
– 60 Gflops

A Distant Mirror?

• Parallelism was replacing vectors for HPC, but the details weren’t at all clear

• Technology and economic drivers were understood, but multiple visions were competing for the future
– Remember Thinking Machines? NCube? Kendall Square?

• The community was messily groping toward clarity about the right fundamental perspectives and questions for massive parallelism

HPC in the Early 90s

[Diagram: application areas – Computational Physics, Chemistry, Engineering, Fluid Dynamics]

Parallelism Exposed: New Algorithmic Challenges

• Efficient collective communication operations
• Bulk-synchronous processing
• Load balancing
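The first of these challenges can be made concrete with a toy sketch (ours, not from the talk): recursive doubling is the classic way to build an efficient collective, combining p partial values in log2(p) pairwise-exchange rounds rather than p - 1 sequential steps. The simulation below treats each list slot as one process; the function name `allreduce_sum` is our own.

```python
# Simulated recursive-doubling allreduce: after log2(p) rounds of
# pairwise exchanges, every "process" holds the global sum.
def allreduce_sum(values):
    p = len(values)
    assert p & (p - 1) == 0, "sketch assumes a power-of-two process count"
    vals = list(values)
    step = 1
    while step < p:
        # In each round, process i exchanges with partner i XOR step,
        # and both sides add the other's current partial sum.
        vals = [vals[i] + vals[i ^ step] for i in range(p)]
        step *= 2
    return vals

print(allreduce_sum([1, 2, 3, 4, 5, 6, 7, 8]))  # every rank ends with 36
```

With 8 values the loop runs only 3 rounds, which is the whole point: the cost grows logarithmically with the number of processes.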

• Horst Simon had proposed a graph partitioning model for load balancing for mesh-based computational science problems

– Vertices are computations. Edges encode data dependencies.
– Cut few edges to evenly divide the vertices
– Model is flawed, but broadly applicable
– “All models are wrong, but some are useful.” – George Box

• Building on work with Alex Pothen, Horst championed spectral partitioning – using an eigenvector of a matrix to partition graph

– Intuition from structural analysis
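The spectral idea can be sketched in a few lines of Python (a minimal illustration of ours, not Chaco's or Simon's actual implementation): the Fiedler vector, the eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - A, splits the vertices by sign. For a toy graph, power iteration on c·I - L with the all-ones eigenvector projected out is enough to find it.

```python
def fiedler_partition(adj, iters=2000):
    """Bisect a graph by the sign pattern of its Fiedler vector.

    adj: dict mapping each vertex to a set of neighbors.
    Power iteration on M = c*I - L turns the second-smallest
    eigenvector of the Laplacian L into the dominant one, once the
    all-ones eigenvector (eigenvalue 0 of L) is projected out.
    """
    verts = sorted(adj)
    n = len(verts)
    idx = {v: i for i, v in enumerate(verts)}
    deg = [len(adj[v]) for v in verts]
    c = 2 * max(deg) + 1                        # c > lambda_max(L)
    x = [i - (n - 1) / 2 for i in range(n)]     # deterministic start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]             # deflate the all-ones direction
        y = [(c - deg[i]) * x[i] + sum(x[idx[u]] for u in adj[v])
             for i, v in enumerate(verts)]      # y = (c*I - L) x
        scale = max(abs(t) for t in y) or 1.0
        x = [t / scale for t in y]              # renormalize
    return {v: int(x[idx[v]] > 0) for v in verts}

# Two triangles joined by a single edge: the natural cut is that edge.
bar = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part = fiedler_partition(bar)
```

On this barbell graph the sign split recovers the two triangles, cutting exactly one edge.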

Multilevel Graph Partitioning

• Eigenvectors are expensive to compute, so Horst and Steve Barnard devised a multigrid-based algorithm
– We couldn’t improve on their method

• We hit upon the idea of adapting the multi-level concept to refine partitions rather than refining numerical values
– Used simple graph notions like matching and edge contraction
– Discrete algorithm techniques from computer science
– Popularized by our Chaco software

[Diagram: Contract → Partition → Expand & Refine]
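The contract/partition/expand-and-refine cycle can be sketched end to end (our own simplified Python illustration, not Chaco itself, with one coarsening level and unit weights): coarsen with a greedy maximal matching and edge contraction, bisect the small coarse graph, project the partition back, then refine with simple gain-based vertex moves.

```python
from collections import deque

def edge_cut(adj, part):
    """Number of edges whose endpoints land in different parts."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def coarsen(adj):
    """Contract: greedy maximal matching, then merge matched pairs."""
    cmap, nid = {}, 0
    for u in sorted(adj):
        if u in cmap:
            continue
        mate = next((v for v in sorted(adj[u]) if v not in cmap), None)
        cmap[u] = nid
        if mate is not None:
            cmap[mate] = nid
        nid += 1
    cadj = {i: set() for i in range(nid)}
    cw = {i: 0 for i in range(nid)}            # coarse vertex weights
    for u in adj:
        cw[cmap[u]] += 1
        for v in adj[u]:
            if cmap[u] != cmap[v]:
                cadj[cmap[u]].add(cmap[v])
    return cadj, cmap, cw

def bfs_bisect(adj, w):
    """Partition: grow one side by BFS until it holds half the weight."""
    target = sum(w[v] for v in adj) / 2
    part = {v: 1 for v in adj}
    seen, acc = set(), 0
    q = deque([min(adj)])
    while q and acc < target:                  # assumes a connected graph
        u = q.popleft()
        if u in seen:
            continue
        seen.add(u)
        part[u] = 0
        acc += w[u]
        q.extend(sorted(adj[u]))
    return part

def refine(adj, part, w, passes=4):
    """Refine: KL/FM-flavored greedy moves of positive-gain vertices."""
    tol = 2 * max(w[v] for v in adj)
    size = [sum(w[v] for v in adj if part[v] == s) for s in (0, 1)]
    for _ in range(passes):
        moved = False
        for u in sorted(adj):
            ext = sum(1 for v in adj[u] if part[v] != part[u])
            gain = ext - (len(adj[u]) - ext)   # cut edges removed minus added
            s = part[u]
            if gain > 0 and abs((size[1 - s] + w[u]) - (size[s] - w[u])) <= tol:
                size[s] -= w[u]; size[1 - s] += w[u]
                part[u] = 1 - s
                moved = True
        if not moved:
            break
    return part

def multilevel_bisect(adj):
    cadj, cmap, cw = coarsen(adj)                      # contract
    cpart = refine(cadj, bfs_bisect(cadj, cw), cw)     # partition small graph
    fine = {u: cpart[cmap[u]] for u in adj}            # expand
    return refine(adj, fine, {u: 1 for u in adj})      # refine at fine level

# Two 4-cliques joined by one edge: expect a cut of exactly one edge.
g = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
     4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
p = multilevel_bisect(g)
```

Real implementations recurse until the coarsest graph is tiny and refine at every level on the way back up; the single level here is just to keep the shape of the algorithm visible.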

An Idea Whose Time Had Come

• Graph partitioning is used as an abstraction in multiple domains

– Natural concept in many divide-and-conquer settings

• Researchers in two other communities independently proposed essentially the same algorithm at about the same time.

– Bui and Jones for sparse matrix reordering
– Cong and Smith for VLSI placement

Thread 1: Direct Impact

• Excellent cost/performance tradeoff made multilevel partitioning the algorithm-of-choice. Embraced and enhanced by many others
– Metis (Kumar & Karypis), JOSTLE (Walshaw), SCOTCH (Pellegrini) and nearly all subsequent tools.

• Better load balancing abstraction proposed by Umit Çatalyürek and Cevdet Aykanat in the late 90s using a hypergraph model
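What makes the hypergraph model better can be shown with a toy computation of the connectivity metric it optimizes (our own minimal illustration; the function and variable names are ours): each hyperedge, or "net", groups all vertices that share one data item and contributes λ - 1 to the cost, where λ is the number of parts the net touches. That is exactly the number of remote copies of the item that must be communicated, which a plain edge cut only approximates.

```python
def connectivity_cost(nets, part):
    """Sum over nets of (lambda - 1), lambda = number of parts touched.

    nets: iterable of vertex sets, each net representing one shared
    data item; part: dict mapping vertex -> part id.  The lambda-1
    metric counts precisely the remote copies each item requires.
    """
    return sum(len({part[v] for v in net}) - 1 for net in nets)

# One data item shared by vertices 0, 1, 2, 3:
net = [{0, 1, 2, 3}]
three_way = {0: 0, 1: 1, 2: 2, 3: 0}   # net touches parts {0, 1, 2}
two_way = {0: 0, 1: 0, 2: 1, 3: 1}     # net touches parts {0, 1}
print(connectivity_cost(net, three_way))  # 2: item sent to two remote parts
print(connectivity_cost(net, two_way))    # 1: item sent to one remote part
```

An edge-cut model would charge this net once per cut edge regardless of how many parts actually need the item; the λ - 1 metric charges it once per remote part, matching the true communication volume.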

• Longevity of utility enabled by a remarkable 20 years of stability in the way we design and program HPC machines
– 6 orders of magnitude improvement in HPC performance.

A Virtuous Circle…

[Diagram: a cycle of Architectures → Programming Models → Algorithms → Software, instantiated as Commodity Clusters, Explicit Message Passing, Bulk Synchronous Parallel, and MPI]

… is Coming to an End

• Moore’s Law continues: transistor count still doubles every 24 months

• Dennard scaling stalls – key parameters flatline: voltage, clock speed, power, performance/clock

Thread 2: Combinatorial Scientific Computing

• Graph algorithms play important niche roles in many areas of computational science
– Parallelism, sparse matrices, multigrid, mesh generation, computational biology, chemistry, statistical physics, etc.

• Alex Pothen and I helped stand up a community on this theme – Combinatorial Scientific Computing

• The community is thriving
– 6 SIAM workshops
– Several journal special issues

• Discrete algorithms have become widely recognized as playing an important role in computational science

Thread 3: Abstractions

• Graph partitioning is an abstraction to simplify thinking about algorithm / machine interplay
– Supports performance portability across machines
– Imperfectly but usefully represents both algorithm and machine

• Future machines are vastly more complex (and still unknown)
– Heterogeneous nodes, more prone to errors, complex memory hierarchies, etc.

• How do we shield application developers from this complexity!?
– Need good abstractions at multiple layers!
– Note: these could be software interfaces, but could also be conceptual instead

Needed Abstractions

• Simplified machine model for programmers
– Simple interface for managing (or hiding) node heterogeneity
– Managing resilience
– Dealing with complex memory hierarchies

• Performance portability across diverse architectures

• These need to intersect in a natural way with our application / library / runtime software stack

Promising Abstractions

• Task-based programming models
– E.g. Charm++, Legion, Uintah, PaRSEC
– Create many more tasks than processors
– Schedule tasks at runtime
– Allows for performance portability of high-level code

• Kokkos memory abstractions
– Polymorphic multidimensional arrays
– Decouple array layout and memory space from the algorithm
– Match layout to architecture without modifying the algorithm’s implementation
– Supports performance portability across different architectures
– Employs template metaprogramming (another CS contribution)
– By Carter Edwards and others at Sandia Labs
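The core of the Kokkos idea can be sketched in a few lines (a toy in Python, not the real C++ API; `View2D` and `fill_and_sum` are names of our own, while "right"/"left" echo Kokkos's LayoutRight/LayoutLeft): the algorithm always indexes a 2-D view as view[i, j], while a pluggable layout maps (i, j) to a position in flat storage, so switching between row-major and column-major storage never touches the algorithm.

```python
class View2D:
    """Toy polymorphic 2-D array: the storage layout is a parameter,
    but the algorithm's indexing code never changes."""
    def __init__(self, rows, cols, layout="right"):
        self.rows, self.cols = rows, cols
        self.data = [0.0] * (rows * cols)
        if layout == "right":    # row-major: contiguous rows, CPU-cache friendly
            self.at = lambda i, j: i * cols + j
        else:                    # "left": column-major, GPU-coalescing friendly
            self.at = lambda i, j: j * rows + i

    def __getitem__(self, ij):
        return self.data[self.at(*ij)]

    def __setitem__(self, ij, val):
        self.data[self.at(*ij)] = val

def fill_and_sum(view):
    """The 'algorithm': written once, oblivious to the layout."""
    for i in range(view.rows):
        for j in range(view.cols):
            view[i, j] = i * 10 + j
    return sum(view.data)

a = View2D(2, 3, layout="right")
b = View2D(2, 3, layout="left")
print(fill_and_sum(a) == fill_and_sum(b))  # True: same result...
print(a.data == b.data)                    # False: ...different storage order
```

Kokkos does this with C++ template metaprogramming so the layout choice is resolved at compile time with no indirection cost; the runtime dispatch here is only for readability.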

[Diagram: the circle disrupted by Multicore, Disruptive Programming Models, and More Complex Applications]

What Happens Next?

• Virtuous circle will not survive the coming disruptions in high performance computing

• But existing codes cannot be allowed to die
– Billions of dollars invested in software

• Computer science will have to play an ever larger role

• New programming models, algorithms and abstractions will be needed

Conclusions

• Everything has changed
– C++ instead of Fortran
– Unstructured instead of structured
– Multi-physics instead of single physics
– UQ and optimization instead of forward solution
– Template metaprogramming instead of huh?

• Yet some key things are the same
– Technology changes are forcing a major paradigm shift
– We need to prepare for a future we can’t yet discern

• Computer science will play a pivotal role in the challenges ahead

• Objects in the mirror are closer than they appear!

Thanks

• Fred Howes and the DOE Office of Science Applied Math Program for funding this work

• Ed Barsis, Bill Camp & Dick Allen for a great research environment

• Collaborators from the early 90s not already mentioned:
– Bob Benner, Karen Devine, John Gilbert, Mike Heath
– Scott Hutchinson, John Lewis, Steve Plimpton
– John Shadid, Ray Tuminaro, Courtenay Vaughan
– David Womble