Towards Scalable Performance Analysis and Visualization through Data Reduction
Chee Wai Lee, Celso Mendes, L. V. Kale
University of Illinois at Urbana-Champaign
Motivation
Event trace-based performance tools help applications scale well.
As applications scale, so must performance tools.
Why?
Nature of Event Traces
Tend to be thread or processor-centric.
Volume of data per thread proportional to number of performance events encountered.
Number of performance events per thread depends on duration of run and frequency of events.
Strong Scaling: more threads working on a fixed problem means more communication events.
Weak Scaling: more threads and a proportionally larger problem mean more communication events and more total work.
More events = more work for Performance Tools.
Reducing the data: Part 1
Baseline: Record events of the entire run.
What are simple ways of reducing the volume of performance data?
Cut inconsequential event blocks (e.g. initialization/shutdown).
Keep important snapshots (e.g. important iteration blocks).
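The trimming idea above can be sketched as a simple time-window filter. The event format, names, and data here are hypothetical illustrations, not the Projections trace format:

```python
# Sketch: trim a trace to "interesting" time windows (hypothetical event
# format: each event is a (timestamp, payload) pair; windows are half-open
# (start, end) intervals to keep).

def trim_trace(events, windows):
    """Keep only events whose timestamp falls inside a kept window."""
    kept = []
    for ts, payload in events:
        if any(start <= ts < end for start, end in windows):
            kept.append((ts, payload))
    return kept

# Example: drop startup (t < 100) and the tail, keep two snapshot windows.
events = [(t, "ev%d" % t) for t in range(0, 500, 50)]
snapshots = [(100, 200), (300, 400)]
print(trim_trace(events, snapshots))
```

A real tool would apply the same filter per thread at event-logging or flush time rather than post hoc, so the dropped events never reach disk.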
[Timeline figure: NAMD startup, the first 300 steps with load balancing, then steps 300-500 with a load refinement.]
Quantifying the Problem

              92k Atoms   327k Atoms   1000k Atoms
512 cores       827 MB     1,800 MB      2,800 MB
1024 cores      938 MB     2,200 MB      3,900 MB
2048 cores    1,200 MB     2,800 MB      4,800 MB
4096 cores                               5,700 MB

Event trace volume generated by the Projections performance tool for NAMD molecular dynamics simulations over 200 "interesting" time steps. Reading down a column (fixed problem size, more cores) corresponds to strong scaling; following the diagonal (problem size growing with core count) corresponds to weak scaling.
Reducing the data: Part 2
Drop “uninteresting” or some specific classes of events.
Compress and/or characterize event patterns.
Our Approach: Drop "uninteresting" processors (threads).
Our Approach
Choose a subset of processors: representatives and outliers.
Employ k-means clustering for equivalence-class discovery.
The chosen processors' performance data are written to disk at the end of the run.
Equivalence Class Discovery
[Scatter plot: processors plotted by Metric X against Metric Y; Euclidean distance in metric space groups them into clusters, with representatives near cluster centers and outliers far from any cluster.]
Things to Consider
Distance measures may require normalization.
Whether certain metrics are strongly correlated with one another.
Number of initial seeds.
Placement of initial seeds.
Number of representatives chosen.
Number of outliers chosen.
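The normalization concern can be illustrated with per-metric z-scoring before computing distances, so no single large-scale metric dominates the Euclidean distance. This is just one possible scheme, sketched with invented data:

```python
import statistics

def zscore_columns(rows):
    """Normalize each metric (column) to zero mean and unit variance.
    A zero-variance column falls back to a divisor of 1.0."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

# Metric X in microseconds, Metric Y in message counts: wildly different
# scales, so raw Euclidean distance would be dominated by Metric X.
rows = [(1_000_000.0, 3.0), (2_000_000.0, 5.0), (3_000_000.0, 4.0)]
print(zscore_columns(rows))
```

After normalization both metrics contribute comparably, which also makes the correlation question visible: two strongly correlated metrics effectively double-count one behavior.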
Experimental Methodology
Use the NAMD (NAnoscale Molecular Dynamics) task grain-size performance problem (2002).
Roll back a performance improvement we made in 2002 to address this problem, re-injecting the known problem into the tuned NAMD.
Experimental Methodology (2)
1 million atom simulation of the Satellite Tobacco Mosaic Virus.
512 to 4,096 processors on PSC's BigBen Cray XT3 supercomputer.
Two criteria for validation:
Amount of data reduced.
Quality of the reduced dataset.
Histogram Quality Measure
Bin the processors of the original dataset (1,000 PEs) by a chosen metric into histogram bins Ho_i, and bin the reduced dataset (100 PEs) into the same bins, giving Hr_i.
Quality measure: how close is Hr_i / Ho_i to 0.100 (the reduction ratio Pr/Po) on average?
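One plausible reading of this measure, sketched with invented bin counts for a 10:1 reduction (target ratio 0.100); the function name and data are illustrative:

```python
def bin_ratio_quality(h_orig, h_reduced, target):
    """Average deviation of the per-bin ratio Hr_i / Ho_i from the target
    reduction ratio, over bins that are non-empty in the original data."""
    devs = [abs(r / o - target) for o, r in zip(h_orig, h_reduced) if o > 0]
    return sum(devs) / len(devs)

h_orig = [200, 500, 250, 50]   # 1,000 original processors binned by a metric
h_reduced = [21, 49, 26, 4]    # the 100 retained processors, same bins
print(bin_ratio_quality(h_orig, h_reduced, target=0.100))
```

A small average deviation means each bin kept roughly its fair share of processors, i.e. the reduced set is a representative sample of the metric's distribution.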
Results: Data Reduction
[Bar chart: data volume (megabytes, 0 to 6,000) of the original versus the reduced dataset at 512, 1,024, 2,048, and 4,096 processor cores.]
Results: Quality

  Po    Pr   Pr/Po    Average H   Std Dev
 512    25   0.0488     0.0641    0.00732
 512    51   0.0996     0.1180    0.00768
 512   102   0.1992     0.2237    0.00732
1024    51   0.0498     0.0511    0.00168
1024   102   0.0996     0.1008    0.00157
1024   204   0.1992     0.1921    0.00264
2048   102   0.0498     0.0487    0.00122
2048   204   0.0996     0.0977    0.00216
2048   408   0.1992     0.1883    0.00575
4096   204   0.0498     0.0501    0.00170
4096   409   0.0998     0.0981    0.00203
4096   818   0.1997     0.1975    0.00163

(Po: processors in the original run; Pr: processors retained after reduction; H: average histogram-bin ratio.)
Conclusion
Our approach offers a practical way of controlling the volume of performance data generated.
The heuristics have captured the performance characteristics of the NAMD grain-size problem reasonably well.
Future Work
Conduct experiments on more problem types and classes for verification.
Find better (more practical) ways for equivalence class discovery.