
Page 1:

9/24/12 NSF-ME-08-2012 1

Overview of HPC Computer Architecture:
A Long March Toward Exa-Scale Computing and Beyond

August 16, 2012

Guang R. Gao
ACM Fellow and IEEE Fellow
Distinguished Professor, Dept. of ECE, University of Delaware

Page 2:

Toward A Codelet Based Execution Model and Its Memory Semantics
-- For Future Extreme-Scale Computing Systems

August 16, 2012

Guang R. Gao
ACM Fellow and IEEE Fellow, Distinguished Professor, Dept. of ECE
University of Delaware

Page 3:

Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Memory semantics of the codelet model
•  Conclusions and future directions

Page 4: Overview of HPC Computer Architecture...– Big-compute (performance demand on massively parallelism) – Big-data (massive, irregular, unstructured data need big analytics) – Big

K (“KEI”) Computer •  "K” draws upon the Japanese word "Kei"

for 1016 •  3 times faster than Chinese Tianhe 1A •  8.162 Pflops Rmax, 8.777 Pflops Rpeak •  80,000 8-core 2GHz SPARC64 VIIIfx to

deliver a total of more than 640,000 processing cores

•  1 PB memory •  4th most energy-efficient system in the

500, with a performance-per-watt rating of 825 megaflops per Watt.

•  Tofu : A 6D Mesh/Torus Interconnect NSF-ME-08-2012 4 9/24/12

Page 5:

Tianhe-1A 2.566 Petaflops Rmax

DEPARTMENT OF COMPUTER SCIENCE @ LOUISIANA STATE UNIVERSITY 5

Page 6:

Current Big Themes in Supercomputing
•  Multi-core → Many-core – Exa-scale is on the horizon
•  Heterogeneity and accelerators
•  Data-intensive (big-data)
•  Others?

Page 7:

Challenges
•  Big-compute (performance demands of massive parallelism)
•  Big-data (massive, irregular, unstructured data needs big analytics)
•  Big chips with architecture heterogeneity
•  Energy efficiency and resiliency

Page 8:

A Fundamental Challenge - Parallel Program Execution Models

Page 9:

Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Semantics of the codelet model
•  Conclusions and future directions

Page 10:

A Quiz: Have you heard the following terms?
•  actors (dataflow)?
•  strand?
•  fiber?
•  codelet?

Page 11:

What is a Program Execution Model?
•  User code: application code, software packages, program libraries, compilers, utility applications
•  (API) PXM
•  System: hardware, runtime code and libraries, operating system

Courtesy: J. B. Dennis, PEM-2, 4/7/2011

Page 12:

Coarse-Grain vs. Fine-Grain Multithreading
•  Coarse-grain thread – the "family home" model: a single thread occupies the executor locus (CPU, memory, thread unit)
•  Fine-grain non-preemptive thread – the "hotel" model: a pool of threads shares the executor locus (CPU, memory, thread unit)

[Gao: invited talk at Fran Allen's Retirement Workshop, 07/2002]
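The two models above can be sketched in Python (a minimal illustration with hypothetical names, not the EARTH or SWARM implementation): in the "family home" model each unit of work owns a long-lived thread, while in the "hotel" model many short, non-preemptive tasks check into a small pool of shared thread units, each task running to completion before the next one starts.

```python
import queue
import threading

def run_hotel_model(tasks, num_thread_units=4):
    """Fine-grain 'hotel' model: a pool of short, non-preemptive tasks
    shares a few thread units; each task runs to completion."""
    pool = queue.Queue()
    for t in tasks:
        pool.put(t)
    results, lock = [], threading.Lock()

    def thread_unit():
        while True:
            try:
                task = pool.get_nowait()
            except queue.Empty:
                return                 # no more guests to check in
            r = task()                 # non-preemptive: runs to completion
            with lock:
                results.append(r)

    units = [threading.Thread(target=thread_unit) for _ in range(num_thread_units)]
    for u in units:
        u.start()
    for u in units:
        u.join()
    return results                     # completion order is nondeterministic

def run_family_home_model(tasks):
    """Coarse-grain 'family home' model: each task owns its own thread."""
    results = [None] * len(tasks)

    def runner(i, task):
        results[i] = task()

    threads = [threading.Thread(target=runner, args=(i, t))
               for i, t in enumerate(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both produce the same set of results; the hotel model simply amortizes far fewer thread units over many small tasks.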

Page 13:

Execution Model and Abstract Machines
(diagram) Users → programming models and the programming environment platforms → execution model (API) → abstract machine models

Page 14:

Outline
•  Background and motivation
•  Program execution and abstract machine models
•  Codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Semantics of the codelet model
•  Conclusions and future directions

Page 15:

Execution Model and Abstract Machines
(diagram) Users → programming models and the programming environment platforms → execution model (API) → abstract machine models

Page 16:

Abstract Machine Models May Be Heterogeneous!

Page 17:

Execution Model and Abstract Machines
(diagram) Users → high-level programming API (MPI, OpenMP, CnC, X10, Chapel, etc.) → programming models/environment (software packages, program libraries, utility applications, compilers, tools/SDK) → API → runtime system → abstract machine → hardware architecture

Page 18:

EARTH Architecture
(diagram) A multi-node system: nodes are connected by an interconnection network. Each node contains processing elements (PEs); within a PE, an execution unit (EU) and a synchronization unit (SU) share local memory over a memory bus, with the EU taking enabled work from a ready queue (RQ) and posting events to an event queue (EQ) handled by the SU.

Page 19:

The EARTH Multithreaded Execution Model (1993 – 200x)
Two levels of fine-grain threads:
•  threaded procedures – async function invocation
•  fibers – run within a frame; a sync operation fires a fiber once its arrived signal count reaches its total signal count (signal tokens)
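The fiber sync slot above (a total signal count vs. an arrived signal count) can be sketched in Python. This is a minimal single-threaded illustration with hypothetical names, not the actual EARTH runtime: a fiber enters the ready queue only when all of its signals have arrived, and a running fiber's body may signal successor fibers.

```python
from collections import deque

class Fiber:
    """A fiber guarded by a sync slot: it becomes enabled only after
    'total' signals have arrived (cf. total vs. arrived signal counts)."""
    def __init__(self, name, total, body):
        self.name, self.total, self.arrived, self.body = name, total, 0, body

class SyncScheduler:
    def __init__(self):
        self.ready = deque()   # the RQ: fibers whose signals have all arrived
        self.log = []          # execution order, for inspection

    def spawn(self, fiber):
        """Seed a fiber; one with no pending signals is immediately ready."""
        if fiber.arrived >= fiber.total:
            self.ready.append(fiber)

    def signal(self, fiber):
        """Deliver one signal token; enable the fiber when the count fills."""
        fiber.arrived += 1
        if fiber.arrived == fiber.total:
            self.ready.append(fiber)

    def run(self):
        while self.ready:
            f = self.ready.popleft()
            self.log.append(f.name)
            f.body(self)       # the body may signal successor fibers
```

For example, two producer fibers that each signal a consumer with `total=2` will both run before the consumer fires.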

Page 20:

Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Semantics of the codelet model
•  Conclusions and future directions

Page 21:

Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Semantics of the codelet model
•  Conclusions and future directions

Page 22:

Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
   –  DOE X-Stack (2012 – 2015): Continue the codelet path
•  Semantics of codelet models
•  Conclusions and future directions

Page 23:

The Codelet: A Fine-Grain Piece of Computing
(diagram) Data objects → codelet → result object
Supports massively parallel computation!

Courtesy: Prof. Jack Dennis, 2001

Page 24:

The Codelet: A Fine-Grain Piece of Computing
(diagram) Data objects → codelet → result object
This looks like dataflow!

Courtesy: Prof. Jack Dennis, 2001

Page 25:

Concept of Codelet (Feb. 4th, 2011)
•  Codelets are the principal scheduling quantum under our codelet based execution model. A codelet, once allocated and scheduled, is kept usefully busy, since it is non-preemptive.
•  The underlying hardware architecture and system software (e.g., the compiler) are optimized to ensure that this non-preemption feature can be productively exploited.
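The codelet pictured on the previous pages (data objects in, result object out, firing dataflow-style) can be sketched in Python. This is a minimal sketch with hypothetical names, not the UHPC/SWARM runtime: a codelet fires only when every input data object has arrived, then runs non-preemptively to completion, and its result object is fed to consumer codelets.

```python
class Codelet:
    """Fires only when every input data object has arrived; then runs
    non-preemptively to completion and emits one result object."""
    def __init__(self, name, n_inputs, fn):
        self.name, self.fn = name, fn
        self.slots = [None] * n_inputs   # one slot per input data object
        self.filled = 0
        self.consumers = []              # (codelet, slot_index) fed by our result

    def feeds(self, other, slot):
        self.consumers.append((other, slot))

    def put(self, slot, value, ready):
        """Deliver a data object; enable the codelet when all slots fill."""
        assert self.slots[slot] is None
        self.slots[slot] = value
        self.filled += 1
        if self.filled == len(self.slots):
            ready.append(self)           # all data objects present: enabled

def run(ready):
    """A trivial scheduler: a scheduled codelet runs to completion."""
    results = {}
    while ready:
        c = ready.pop()                  # non-preemptive scheduling quantum
        out = c.fn(*c.slots)             # produce the result object
        results[c.name] = out
        for consumer, slot in c.consumers:
            consumer.put(slot, out, ready)
    return results
```

For example, an `add` codelet with two input slots can feed a `square` codelet with one, so `square` fires only after `add` has produced its result.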

Page 26:

Outline
•  Background and motivation
•  Program execution models
•  Codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  Memory semantics of codelet models
•  Conclusions and future directions

Page 27:

What is a Shared Memory Execution Model?
The Thread Abstract Machine:
•  Thread model – a set of rules for creating, destroying, and managing threads
•  Memory model – dictates the ordering of memory operations
•  Synchronization model – provides a set of mechanisms to protect against data races
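The synchronization model's role can be made concrete with a short Python sketch (an illustration of the general concept, not any system named in this talk): without a lock, the read-modify-write in an increment is a data race; with one, every increment survives.

```python
import threading

class Counter:
    """A shared location whose read-modify-write is made atomic by a
    lock: the synchronization model protecting against a data race."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:       # mutual exclusion: one thread at a time
            self.value += 1    # load, add, store as one indivisible step

def hammer(counter, n):
    for _ in range(n):
        counter.increment()

c = Counter()
threads = [threading.Thread(target=hammer, args=(c, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held around each increment, all 4 * 10_000 updates survive.
```

Dropping the lock would let two threads read the same old value and lose one of the two increments, which is exactly the race the synchronization model exists to rule out.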

Page 28:

"Memory Coherence" – A Basic Assumption of SC-Derived Memory Models

"…All writes to the same location are serialized in some order and are performed in that order with respect to any processor…"

[Gharachorloo et al. 90]
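What the quoted assumption guarantees can be sketched in Python (a deliberately simplified model with hypothetical names, not a real cache protocol): a single memory location serializes its writes into one total order, so every observer agrees on that order and on which write is last.

```python
import threading

class Location:
    """One memory location whose writes are serialized: every store
    appends to a single history, so all observers see the same write
    order (the 'memory coherence' assumption quoted above)."""
    def __init__(self, v0):
        self._lock = threading.Lock()
        self.history = [v0]

    def store(self, v):
        with self._lock:
            self.history.append(v)   # writes serialized in some order

    def load(self):
        with self._lock:
            return self.history[-1]  # value of the last serialized write

loc = Location(0)
writers = [threading.Thread(target=loc.store, args=(v,)) for v in (1, 2, 3)]
for t in writers:
    t.start()
for t in writers:
    t.join()
# The interleaving varies run to run, but there is always exactly one
# total order of the writes, and a load returns the latest of them.
```

Weakening this assumption, as the LC discussion later in the talk asks, would mean different observers could disagree on the write order for one location.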

Page 29:

Can We Break the Memory Coherence Barrier?
No? Yes?

Page 30:

Four Key Questions on Memory Models
•  What happens when two (or more) concurrent load/store operations arrive at the same memory location?
•  Answers?

Page 31:

A Conjecture
The LC (Location Consistency) memory model belongs to the group of memory models that are weakest while still not violating the causality constraint!

Page 32:

Outline
•  Background and motivation
•  Program execution models
•  Evolution of codelet based execution models
   –  The EARTH project (1994 – 2004)
   –  IBM Cyclops-64 project (2004 – 2010+): The TNT experience
   –  Intel-led UHPC/Runnemede (2010 – 2012): The codelet concept and SWARM
•  The memory semantics of codelets
•  Conclusions and future directions

Page 33:

DOE X-Stack Project, July 2012 – June 2015: Traleika Glacier
•  Team lead: Intel
•  Universities: UIUC, UD, UCSD, Rice U
•  Other industry: ETI, Reservoir
•  DOE labs: PNNL, Sandia, ORNL, …

Page 34:

Acknowledgements
•  Our sponsors
•  Members of CAPSL
•  Members of ETI
•  Other collaborators (T. Sterling, V. Sarkar, etc.)
•  My mentor – Prof. Jack B. Dennis
•  My host