SciDAC High-End Computer System Performance: Science and Engineering. Jack Dongarra, Innovative Computing Laboratory, University of Tennessee. http://www.cs.utk.edu/~dongarra/

Post on 17-Jan-2016

Page 1

SciDAC

High-End Computer System Performance:

Science and Engineering

Jack Dongarra
Innovative Computing Laboratory
University of Tennessee

http://www.cs.utk.edu/~dongarra/

Page 2

Four Components for the University of Tennessee’s

• Performance capturing tools: PAPI
• Self-adapting numerical software, automatic performance enhancement: SANS/AEOS/ATLAS
• Performance repository for apps, kernels, machines, etc.: NETLIB, Repository in a Box (RIB)
• Modeling, predictability

Page 3

Tools for Performance Evaluation

• Timing and performance evaluation has been an art
  • Resolution of the clock
  • Issues about cache effects
  • Different systems
  • Can be cumbersome and inefficient with traditional tools
• Situation about to change
  • Today’s processors have internal counters
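The clock-resolution and cache-effect issues listed above can be illustrated with a minimal Python sketch. It is not part of the original slides: the helper names `measured_clock_resolution`, `time_kernel`, and `small_kernel` are hypothetical, and the approach (spin until the clock advances, discard a warm-up run, keep the best of several timings) is one common convention, not a method the talk prescribes.

```python
import time


def measured_clock_resolution(samples: int = 1000) -> float:
    """Smallest positive step observed between back-to-back clock reads, in seconds."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = time.perf_counter()
        t1 = time.perf_counter()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = time.perf_counter()
        smallest = min(smallest, t1 - t0)
    return smallest


def time_kernel(kernel, repeats: int = 5) -> float:
    """Run the kernel once untimed, then return the best of `repeats` timings."""
    kernel()  # warm-up: the first call pays one-time costs such as cold caches
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        best = min(best, time.perf_counter() - start)
    return best


def small_kernel():
    # Hypothetical workload, chosen only so there is something to time.
    return sum(i * i for i in range(10_000))


resolution = measured_clock_resolution()
elapsed = time_kernel(small_kernel)
print(f"clock resolution ~ {resolution:.2e} s, kernel time ~ {elapsed:.2e} s")
```

A timing is only trustworthy when it is many times larger than the measured resolution, which is one reason hardware counters are attractive for short code regions.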

Page 4

Performance Counters

• Almost all high performance processors include hardware performance counters.
• Some are easy to access, others not available to users.
• On most platforms the APIs, if they exist, are neither appropriate for the end user nor well documented.
• Existing performance counter APIs
  • Compaq Alpha EV 6 & 6/7
  • SGI MIPS R10000
  • IBM Power Series
  • CRAY T3E
  • Sun Solaris
  • Pentium Linux and Windows
  • IA-64
  • HP-PA RISC
  • Hitachi
  • Fujitsu
  • NEC

Page 5

Overview of PAPI

Performance Application Programming Interface

The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.

Page 6

Performance Data from PAPI

• Execution Rate (MIPS, Flop/s)
• Bandwidth Utilization
  • Main Memory
  • L2 cache
  • L1 cache
• Cache Miss Statistics: Icache, Dcache, and L2 cache
• TLB misses
• Mispredicted Branches
• Instruction Mix (FP, branch, LD/ST, other)
• Load/store instruction issue rate
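As a rough illustration of how metrics like these are derived from raw counter values, the Python sketch below turns a dictionary of event counts and a wall-clock time into rates and ratios. The dictionary keys follow PAPI preset event naming, but the function `derive_metrics` and all of the numbers are assumptions made for this example; nothing here calls PAPI itself.

```python
def derive_metrics(counts: dict, seconds: float) -> dict:
    """Convert raw event counts plus elapsed time into rates and ratios."""
    return {
        # Execution rates
        "mips": counts["PAPI_TOT_INS"] / seconds / 1e6,
        "mflops": counts["PAPI_FP_OPS"] / seconds / 1e6,
        "ipc": counts["PAPI_TOT_INS"] / counts["PAPI_TOT_CYC"],
        # Cache and branch behaviour as fractions of the relevant accesses
        "l1d_miss_rate": counts["PAPI_L1_DCM"] / counts["PAPI_L1_DCA"],
        "branch_mispredict_rate": counts["PAPI_BR_MSP"] / counts["PAPI_BR_INS"],
    }


# Made-up counts for a hypothetical 2-second run.
sample_counts = {
    "PAPI_TOT_INS": 2_000_000_000,
    "PAPI_FP_OPS": 500_000_000,
    "PAPI_TOT_CYC": 4_000_000_000,
    "PAPI_L1_DCM": 10_000_000,
    "PAPI_L1_DCA": 400_000_000,
    "PAPI_BR_MSP": 2_000_000,
    "PAPI_BR_INS": 100_000_000,
}

metrics = derive_metrics(sample_counts, seconds=2.0)
for name, value in metrics.items():
    print(f"{name}: {value:g}")
```

Which events are actually available, and how many can be counted at once, depends on the processor.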

Page 7

Implementation

Counters exist as a small set of registers that count events.

PAPI provides three interfaces to the underlying counter hardware:

1. The low level interface manages hardware events in user defined groups called EventSets.
2. The high level interface simply provides the ability to start, stop and read the counters for a specified list of events.
3. Graphical tools to visualize information.
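The difference between the low-level and high-level interfaces can be sketched with a toy model in Python. This is not the PAPI API: `SimulatedCounters`, `EventSet`, `count_events`, and the event names are all hypothetical stand-ins, and the "hardware" is just a tick counter. The sketch only shows the usage pattern the slide describes, where a low-level event group is built and controlled explicitly and a high-level helper wraps start, run, and stop for a list of events.

```python
class SimulatedCounters:
    """Stand-in for hardware counters: each event advances by a fixed step per tick."""

    def __init__(self, rates):
        self.rates = dict(rates)
        self.ticks = 0

    def tick(self, n=1):
        self.ticks += n

    def raw(self, event):
        return self.rates[event] * self.ticks


class EventSet:
    """Low-level style: a user-defined group of events, started and stopped together."""

    def __init__(self, hardware):
        self.hardware = hardware
        self.events = []
        self.baseline = None  # raw values captured at start(); None while stopped

    def add_event(self, name):
        if self.baseline is not None:
            raise RuntimeError("cannot add events while counting")
        self.events.append(name)

    def start(self):
        self.baseline = {e: self.hardware.raw(e) for e in self.events}

    def read(self):
        if self.baseline is None:
            raise RuntimeError("event set is not started")
        return [self.hardware.raw(e) - self.baseline[e] for e in self.events]

    def stop(self):
        values = self.read()
        self.baseline = None
        return values


def count_events(hardware, events, work):
    """High-level style: start, run the work, stop, for a given list of events."""
    event_set = EventSet(hardware)
    for name in events:
        event_set.add_event(name)
    event_set.start()
    work()
    return event_set.stop()


hw = SimulatedCounters({"CYCLES": 100, "INSTRUCTIONS": 60})
hw.tick(3)  # activity before counting starts must not be attributed to the region
values = count_events(hw, ["CYCLES", "INSTRUCTIONS"], lambda: hw.tick(5))
print(values)  # [500, 300]
```

The baseline subtraction is the essential step: counts are reported relative to the moment the event set was started, not from power-on.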

Page 8

PAPI - Supported Processors

• Intel Pentium, Pro, II, III, 4
  • Linux 2.4, 2.2, 2.0 and perf kernel patch
• IBM Power 3, 604, 604e
  • For AIX 4.3 and pmtoolkit (available in 4.3.4) ([email protected])
• Sun UltraSparc I, II, & III
  • Solaris 2.8
• MIPS R10K, R12K
• AMD Athlon
  • Linux 2.4 and perf kernel patch
• Cray T3E, SV1, SV2
• Soon: Windows 2K, Compaq Alpha EV6 & 67 and Intel IA-64

Page 9

Go To Demo

Page 10

PAPI’s Parallel Interface

Page 11

PAPI Development

• Extensions to PAPI to support collection and analysis of hardware performance counter data in the context of shared and distributed memory parallel programs
  • Allowing for straightforward instrumentation of multithreaded and multiprocessor applications.
• Tools will include graphical tools extended with dynamic instrumentation capabilities.
  • Framework for using Dyninst with parallel programs, the Free Probe Class Server (FPCS) and IBM’s Dynamic Probe Class Library (DPCL)
• Port PAPI to Compaq Alpha and HP machines
• Summary information on problem spots within applications
• Integration with other tools, SvPablo, Dyninst, etc.
• Help with setting up PAPI at various sites.

Page 12

Repository Development

• Repository of Tools and Data on Performance Evaluation
  • A network-based catalog that will serve as a “road map” to important Performance Evaluation enabling technologies
  • A methodology for evaluation and measurement of the success of the tools.
• SciDAC outreach: Start a community effort for the collection and dissemination of performance data

Page 13

Self-Adapting Numerical Software (SANS)

• Today’s processors can achieve high performance, but this requires extensive machine-specific hand tuning.
  • Simple operations like Matrix-Vector ops require many man-hours / platform
    • Software lags far behind hardware introduction
    • Only done if financial incentive is there
  • Compilers not up to optimization challenge
• Hardware, compilers, and software have a large design space w/many parameters
  • Blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules.
  • Complicated interactions with the increasingly sophisticated micro-architectures of new microprocessors.
• Need for quick/dynamic deployment of optimized routines.
• ATLAS - Automatically Tuned Linear Algebra Software
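The idea of searching a parameter space empirically, rather than hand tuning, can be sketched in a few lines of Python. This is a deliberately tiny stand-in and not how ATLAS is implemented: it varies only one parameter (the blocking size), uses pure-Python lists, and the names `blocked_matmul` and `tune_block_size` are hypothetical.

```python
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square blocks."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size on this machine and return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best
    return min(timings, key=timings.get), timings


best_block, timings = tune_block_size(n=48, candidates=[4, 8, 16, 48])
print(f"fastest block size on this machine: {best_block}")
```

In pure Python the timing differences reflect interpreter overhead more than cache behaviour, so the winning block size says little about the hardware. The point is the search loop: generate variants, time them on the target machine, keep the fastest.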

Page 14

SANS Extensions

• BLAS
• Sparse matrix operations
• Message passing
• Algorithm selection at a higher level

Page 15

Repository In a Box (RIB)

Metadata objects are stored in repositories.

A repository automatically generates a web site for displaying customizable views of its metadata - search, browse, join, etc.

Metadata objects are also made available to network applications via the RIB API.
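To make the repository model more concrete, here is a toy Python sketch of metadata objects held in repositories that can be searched, joined, and rendered as a catalog. It does not use or reproduce the RIB API: the classes `MetadataObject` and `Repository`, their methods, and the sample entries are all hypothetical, and a plain-text listing stands in for the generated web site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataObject:
    name: str
    kind: str
    keywords: tuple = ()


class Repository:
    def __init__(self, title):
        self.title = title
        self.objects = {}  # object name -> MetadataObject

    def add(self, obj):
        self.objects[obj.name] = obj

    def search(self, keyword):
        """Names of objects tagged with the keyword (case-insensitive), sorted."""
        keyword = keyword.lower()
        return sorted(
            obj.name
            for obj in self.objects.values()
            if keyword in (k.lower() for k in obj.keywords)
        )

    def join(self, other, title):
        """Combine two repositories into a new one; the first copy of a name wins."""
        merged = Repository(title)
        for repo in (self, other):
            for obj in repo.objects.values():
                merged.objects.setdefault(obj.name, obj)
        return merged

    def catalog(self):
        """Plain-text listing, standing in for a generated web view."""
        ordered = sorted(self.objects.values(), key=lambda obj: obj.name)
        return "\n".join(f"{obj.name} ({obj.kind})" for obj in ordered)


mine = Repository("My Repository")
mine.add(MetadataObject("PAPI", "tool", ("performance", "counters")))
mine.add(MetadataObject("ATLAS", "library", ("linear algebra", "tuning")))

yours = Repository("Your Repository")
yours.add(MetadataObject("SvPablo", "tool", ("performance", "instrumentation")))
yours.add(MetadataObject("PAPI", "tool", ("performance", "counters")))

virtual = mine.join(yours, "Our Virtual Repository")
print(virtual.search("performance"))
print(virtual.catalog())
```

Keying objects by name is a simplification made for the example; a real catalog would need a stable identifier to reconcile entries that appear in more than one repository.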

Page 16

Repository Interoperation

[Diagram labels: My Repository (metadata objects), Your Repository (metadata objects), Our Virtual Repository, HTML Catalog]

Page 17

Tools Integration

• PAPI, Dyninst, SVPablo
• Intelligent Adaptation
  • Rose and SANS (ATLAS)
• Repository-in-a-Box effort provides a toolkit for building and maintaining meta-data repositories

Page 18

Interaction with Other Efforts

• SciDAC - TOPS
  • David Keyes, ICASE/ODU/LLNL
• SciDAC - Astrophysics
  • Tony Mezzacappa, ORNL
• DOE - Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
  • Bart Miller, U Wisc
  • Jeff H., U of Maryland

Page 19

High-End Computer System Performance: Science and Engineering

Activities for UTennessee

• Performance Capturing Tools: PAPI
• Automatic performance enhancement: SANS/AEOS/ATLAS
• Performance repository for apps, kernels, machines, etc.: NETLIB, RIB
• Modeling, predictability