1
SciDAC
High-End Computer System Performance:
Science and Engineering
Jack Dongarra
Innovative Computing Laboratory
University of Tennessee
http://www.cs.utk.edu/~dongarra/
2
Four Components for the University of Tennessee’s
Performance capturing tools
  PAPI
Self-adapting numerical software: automatic performance enhancement
  SANS/AEOS/ATLAS
Performance repository for apps, kernels, machines, etc.
  NETLIB, Repository in a Box (RIB)
Modeling, predictability
3
Tools for Performance Evaluation
Timing and performance evaluation has been an art
  Resolution of the clock
  Issues about cache effects
  Different systems
  Can be cumbersome and inefficient with traditional tools
Situation about to change
  Today’s processors have internal counters
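The clock-resolution and cache-effect issues listed above can be seen with a small experiment. The following is a minimal sketch in Python, not part of the original talk; the helper names `observed_tick` and `time_call` are hypothetical and chosen only for illustration. It compares the resolution the system reports for a timer with the smallest step actually observed, and it separates the first (cold) run of a piece of work from later (warm) runs.

```python
import time

def observed_tick(clock=time.perf_counter, samples=1000):
    """Return the smallest nonzero step seen between consecutive clock reads."""
    smallest = float("inf")
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:  # spin until the clock visibly advances
            t1 = clock()
        smallest = min(smallest, t1 - t0)
    return smallest

def time_call(fn, repeats=5):
    """Time fn() several times; return (first_run, best_later_run) in seconds.

    repeats must be at least 2 so that there is a later run to compare with.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings[0], min(timings[1:])

if __name__ == "__main__":
    info = time.get_clock_info("perf_counter")
    print("reported resolution (s):", info.resolution)
    print("observed tick (s):", observed_tick())
    data = list(range(200_000))
    cold, warm = time_call(lambda: sum(data))
    print("first run (s):", cold, "best later run (s):", warm)
```

Any region shorter than a few observed ticks cannot be timed reliably with this clock, and the gap between the first and later runs depends on the system, which is part of why wall-clock timing alone has been "an art".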
4
Performance Counters
Almost all high performance processors include hardware performance counters.
  Some are easy to access, others not available to users.
On most platforms the APIs, if they exist, are not appropriate for the end user or well documented.
Existing performance counter APIs
  Compaq Alpha EV 6 & 6/7
  SGI MIPS R10000
  IBM Power Series
  CRAY T3E
  Sun Solaris
  Pentium Linux and Windows
  IA-64
  HP-PA RISC
  Hitachi
  Fujitsu
  NEC
5
Overview of PAPI
Performance Application Programming Interface
The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors
6
Performance Data from PAPI
Execution Rate (MIPS, Flop/s)
Bandwidth Utilization
  Main Memory
  L2 cache
  L1 cache
Cache Miss Statistics: Icache, Dcache, and L2 cache
TLB misses
Mispredicted Branches
Instruction Mix (FP, branch, LD/ST, other)
Load/store instruction issue rate
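Most of the quantities listed above are ratios of raw event counts, or counts divided by elapsed time. The sketch below shows that arithmetic in Python. The `CounterSample` fields and all numbers are hypothetical and do not come from PAPI or from any measurement; the sketch also assumes that one counted floating-point operation corresponds to one floating-point instruction, which is not true on every processor.

```python
from dataclasses import dataclass

@dataclass
class CounterSample:
    """Raw event counts for one timed region (field names are illustrative)."""
    seconds: float
    instructions: int
    fp_ops: int
    l1_dcache_accesses: int
    l1_dcache_misses: int
    branches: int
    branch_mispredictions: int

def derived_metrics(s):
    """Turn raw counts into rates and ratios."""
    return {
        "mips": s.instructions / s.seconds / 1e6,
        "mflops": s.fp_ops / s.seconds / 1e6,
        "l1_dcache_miss_rate": s.l1_dcache_misses / s.l1_dcache_accesses,
        "branch_mispredict_rate": s.branch_mispredictions / s.branches,
        "fp_fraction": s.fp_ops / s.instructions,  # share of FP in the instruction mix
    }

if __name__ == "__main__":
    sample = CounterSample(          # made-up values, for the arithmetic only
        seconds=2.0,
        instructions=4_000_000_000,
        fp_ops=1_000_000_000,
        l1_dcache_accesses=1_500_000_000,
        l1_dcache_misses=30_000_000,
        branches=500_000_000,
        branch_mispredictions=10_000_000,
    )
    for name, value in derived_metrics(sample).items():
        print(f"{name}: {value:g}")
```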
7
Implementation
Counters exist as a small set of registers that count events.
PAPI provides three interfaces to the underlying counter hardware:
1. The low level interface manages hardware events in user defined groups called EventSets.
2. The high level interface simply provides the ability to start, stop and read the counters for a specified list of events.
3. Graphical tools to visualize information.
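The difference between the first two interfaces can be illustrated with a toy model. The Python sketch below is not PAPI and does not reproduce its API: `ToyCounterHardware`, `ToyEventSet`, `count_events`, and the event names are all hypothetical, and the "hardware" is a dictionary that an instrumented loop updates by hand. It only shows the shape of the idea: a low-level object that groups events and is started, read, and stopped explicitly, and a high-level helper that does the same for a plain list of events.

```python
class ToyCounterHardware:
    """Stand-in for a small set of registers that count events (hypothetical)."""
    def __init__(self):
        self.registers = {}

    def record(self, event, amount=1):
        self.registers[event] = self.registers.get(event, 0) + amount

class ToyEventSet:
    """Low-level style: a user-defined group of events managed together."""
    def __init__(self, hardware):
        self.hardware = hardware
        self.events = []
        self._baseline = None

    def add_event(self, event):
        if self._baseline is not None:
            raise RuntimeError("cannot add events while counting")
        self.events.append(event)

    def start(self):
        # Remember the register values at the start of the region.
        self._baseline = {e: self.hardware.registers.get(e, 0) for e in self.events}

    def read(self):
        if self._baseline is None:
            raise RuntimeError("event set is not running")
        return [self.hardware.registers.get(e, 0) - self._baseline[e]
                for e in self.events]

    def stop(self):
        values = self.read()
        self._baseline = None
        return values

def count_events(hardware, events, workload):
    """High-level style: start, run the workload, stop, for a list of events."""
    event_set = ToyEventSet(hardware)
    for event in events:
        event_set.add_event(event)
    event_set.start()
    workload()
    return dict(zip(events, event_set.stop()))

def dot(hardware, x, y):
    """Dot product that reports its own operations to the toy hardware."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
        hardware.record("fp_ops", 2)  # one multiply, one add
        hardware.record("loads", 2)   # one element from each vector
    return total

if __name__ == "__main__":
    hw = ToyCounterHardware()
    counts = count_events(hw, ["fp_ops", "loads"],
                          lambda: dot(hw, [1.0] * 10, [2.0] * 10))
    print(counts)
```

Real counters are updated by the processor itself, so no manual `record` calls are needed; the toy model keeps them only so the example runs anywhere.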
8
PAPI - Supported Processors
Intel Pentium, Pro, II, III, 4
  Linux 2.4, 2.2, 2.0 and perf kernel patch
IBM Power 3, 604, 604e
  For AIX 4.3 and pmtoolkit (in 4.3.4 available) ([email protected])
Sun UltraSparc I, II, & III
  Solaris 2.8
MIPS R10K, R12K
AMD Athlon
  Linux 2.4 and perf kernel patch
Cray T3E, SV1, SV2
Soon: Windows 2K, Compaq Alpha EV6 & 67 and Intel IA-64
9
Go To Demo
10
PAPI’s Parallel Interface
11
PAPI Development
Extensions to PAPI to support collection and analysis of hardware performance counter data in the context of shared and distributed memory parallel programs
Allowing for straightforward instrumentation of multithreaded and multiprocessor applications.
Tools will include graphical tools extended with dynamic instrumentation capabilities.
Framework for using Dyninst with parallel programs, the Free Probe Class Server (FPCS) and IBM’s Dynamic Probe Class Library (DPCL)
Port PAPI to Compaq Alpha and HP machines
Summary information on problem spots within applications
Integration with other tools, SvPablo, Dyninst, etc.
Help with setting up PAPI at various sites.
12
Repository Development
Repository of Tools and Data on Performance Evaluation
A network-based catalog that will serve as a “road map” to important Performance Evaluation enabling technologies
A methodology for evaluation and measurement of the success of the tools.
SciDAC outreach: Start a community effort for the collection and dissemination of performance data
13
Self-Adapting Numerical Software (SANS)
Today’s processors can achieve high performance, but this requires extensive machine-specific hand tuning.
  Simple operations like Matrix-Vector ops require many man-hours / platform
    • Software lags far behind hardware introduction
    • Only done if financial incentive is there
  Compilers not up to optimization challenge
Hardware, compilers, and software have a large design space w/many parameters
  Blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules.
  Complicated interactions with the increasingly sophisticated micro-architectures of new microprocessors.
Need for quick/dynamic deployment of optimized routines.
ATLAS - Automatically Tuned Linear Algebra Software
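The idea of replacing hand tuning with an automated, empirical search can be sketched for one of the parameters named above, the blocking size. The Python code below is a toy illustration only and is not how ATLAS is implemented: it times a blocked matrix multiply for a few candidate block sizes on the current machine and keeps the fastest. The function names, the candidate list, and the matrix size are assumptions made for the example, and in pure Python the timing differences mostly reflect interpreter overhead rather than cache behavior.

```python
import time

def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square blocking."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c

def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size and return (fastest, all_timings)."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings

if __name__ == "__main__":
    best, timings = tune_block_size(n=64, candidates=[4, 8, 16, 32, 64])
    for block, seconds in timings.items():
        print(f"block {block:3d}: {seconds:.4f} s")
    print("selected block size:", best)
```

A real tuner searches many parameters at once (unrolling depth, loop order, and so on) and generates specialized code for the winner; the point here is only that the choice is made by measurement on the target machine rather than by hand.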
14
SANS Extensions
BLAS
Sparse matrix operations
Message passing
Algorithm selection at a higher level
15
Repository In a Box (RIB)
Metadata objects are stored in repositories.
A repository automatically generates a web site for displaying customizable views of its metadata - search, browse, join, etc.
Metadata objects are also made available to network applications via the RIB API.
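The repository model described above (metadata objects, generated views, programmatic access) can be pictured with a small in-memory sketch. The Python code below is hypothetical throughout: `ToyRepository`, its methods, and the sample records are invented for illustration and are not the RIB API or its data model.

```python
from html import escape

class ToyRepository:
    """A hypothetical in-memory stand-in for a metadata repository."""
    def __init__(self, name):
        self.name = name
        self.objects = []

    def add(self, **metadata):
        """Store one metadata object as a dictionary of fields."""
        self.objects.append(dict(metadata))

    def search(self, **criteria):
        """Return objects whose fields equal every given criterion."""
        return [o for o in self.objects
                if all(o.get(k) == v for k, v in criteria.items())]

    def browse(self, field):
        """Return the sorted distinct values of one field."""
        return sorted({o[field] for o in self.objects if field in o})

    def to_html(self, fields):
        """Render a simple catalog page listing the chosen fields."""
        head = "".join(f"<th>{escape(field)}</th>" for field in fields)
        rows = "".join(
            "<tr>"
            + "".join(f"<td>{escape(str(o.get(field, '')))}</td>" for field in fields)
            + "</tr>"
            for o in self.objects
        )
        return f"<h1>{escape(self.name)}</h1><table><tr>{head}</tr>{rows}</table>"

if __name__ == "__main__":
    repo = ToyRepository("Performance tools")
    repo.add(name="PAPI", kind="tool", topic="hardware counters")
    repo.add(name="ATLAS", kind="library", topic="linear algebra")
    print(repo.search(kind="tool"))
    print(repo.browse("kind"))
    print(repo.to_html(["name", "topic"]))
```

Keeping the stored objects separate from the generated page is what lets the same metadata serve both a browsable web view and other network applications.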
16
Repository Interoperation
[Diagram labels: My Repository (metadata objects), Your Repository (metadata objects), Our Virtual Repository, HTML Catalog]
17
Tools Integration
PAPI, Dyninst, SvPablo
Intelligent Adaptation
Rose and SANS (ATLAS)
Repository-in-a-Box effort provides a toolkit for building and maintaining meta-data repositories
18
Interaction with Other Efforts
SciDAC - TOPS
  David Keyes, ICASE/ODU/LLNL
SciDAC - Astrophysics
  Tony Mezzacappa, ORNL
DOE - Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
  Bart Miller, U Wisc
  Jeff H., U of Maryland
19
High-End Computer System Performance:
Science and Engineering
Activities for UTennessee
Performance Capturing Tools
  PAPI
Automatic performance enhancement
  SANS/AEOS/ATLAS
Performance repository for apps, kernels, machines, etc.
  NETLIB, RIB
Modeling, predictability