Page 1: Towards Scalable Performance Analysis and Visualization through Data Reduction

Towards Scalable Performance Analysis and Visualization through Data Reduction

Chee Wai Lee, Celso Mendes, L. V. Kale

University of Illinois at Urbana-Champaign

Page 2

Motivation

Event trace-based performance tools help applications scale well.

As applications scale, so must performance tools.

Why?

Page 3

Nature of Event Traces

Tend to be thread or processor-centric.

Volume of data per thread proportional to number of performance events encountered.

Number of performance events per thread depends on duration of run and frequency of events.

Strong Scaling: More threads, more communication events.

Weak Scaling: More threads, more communication events, more work per thread.

More events = more work for Performance Tools.
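The proportionality described above can be sketched as a back-of-the-envelope estimate. This is an illustration only: the function name and the fixed `bytes_per_event` record size are assumptions, not figures from the talk.

```python
def estimate_trace_volume_mb(threads, events_per_second, duration_s, bytes_per_event=32):
    """Rough trace-size estimate in megabytes (10**6 bytes).

    Assumption: every event is stored as a fixed-size record of
    bytes_per_event bytes; real trace formats vary per event type.
    """
    # Events per thread depend on run duration and event frequency.
    events_per_thread = events_per_second * duration_s
    # Total volume grows with both the thread count and events per thread.
    total_bytes = threads * events_per_thread * bytes_per_event
    return total_bytes / 1e6


if __name__ == "__main__":
    # Doubling the thread count doubles the estimate, before counting
    # any extra communication events that scaling introduces.
    for threads in (512, 1024, 2048, 4096):
        print(threads, estimate_trace_volume_mb(threads, 1000, 60))
```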

Page 4

Reducing the data: Part 1

Baseline: Record events of the entire run.

What are simple ways of reducing the volume of performance data?

Cut inconsequential event blocks (e.g. initialization/end).

Keep important snapshots (e.g. important iteration blocks)

[Figure: NAMD run timeline with labeled segments: startup; first 300 steps with load balancing; steps 300-500 with a load refinement.]
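Keeping only selected iteration blocks can be sketched as a filter over step windows. The `Event` record and `keep_windows` helper below are hypothetical and are not part of Projections or NAMD.

```python
from dataclasses import dataclass


@dataclass
class Event:
    step: int   # application time step in which the event occurred
    pe: int     # processor (or thread) that recorded it
    name: str   # event label


def keep_windows(events, windows):
    """Keep events whose step falls in any half-open window [lo, hi)."""
    return [e for e in events if any(lo <= e.step < hi for lo, hi in windows)]


if __name__ == "__main__":
    trace = [Event(0, 0, "startup"), Event(100, 1, "compute"),
             Event(350, 0, "compute"), Event(600, 1, "shutdown")]
    # Keep only the block of steps 300-500; startup and shutdown are cut.
    print(keep_windows(trace, [(300, 500)]))
```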

Page 5

Quantifying the Problem

              92k Atoms   327k Atoms   1000k Atoms
512 cores     827 MB      1,800 MB     2,800 MB
1024 cores    938 MB      2,200 MB     3,900 MB
2048 cores    1,200 MB    2,800 MB     4,800 MB
4096 cores                             5,700 MB

NAMD molecular dynamics simulations and event trace volume as generated by the Projections performance tool over 200 (“interesting”) time steps.

Annotations on the table: Weak Scaling, Strong Scaling.

Page 6

Reducing the data: Part 2

Drop “uninteresting” or some specific classes of events.

Compress and/or characterize event patterns.

Our Approach: Drop “uninteresting” processors (threads).

Page 7

Our Approach

Choose a subset of processors:

Representatives

Outliers

Employ k-Means Clustering for Equivalence-Class discovery.

Chosen processors’ performance data are written to disk at end of run.

Which?

Why?

How?

Page 8

Equivalence Class Discovery

[Figure: processors plotted by Metric X (horizontal axis) and Metric Y (vertical axis); labels: Euclidean Distance, Outliers, Representatives.]
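The idea of equivalence-class discovery can be sketched with a small k-means over per-processor metric vectors. The slides do not give the exact selection rules, so two assumptions are made here: the representative of a cluster is its member closest to the centroid, and the outliers are the remaining processors farthest from their own centroid. Seeding with the first k points is also a simplification for determinism.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: distance(point, centroids[c]))


def kmeans(points, k, iterations=20):
    # Assumption: the first k points serve as initial seeds.
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment against the last centroid positions.
    return centroids, [nearest(p, centroids) for p in points]


def select_processors(points, k, n_outliers):
    """Return (representatives, outliers) as sorted lists of processor indices."""
    centroids, assignment = kmeans(points, k)
    dist = [distance(p, centroids[a]) for p, a in zip(points, assignment)]
    representatives = set()
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            representatives.add(min(members, key=lambda i: dist[i]))
    rest = [i for i in range(len(points)) if i not in representatives]
    outliers = sorted(rest, key=lambda i: dist[i], reverse=True)[:n_outliers]
    return sorted(representatives), sorted(outliers)
```

Only the processors returned by `select_processors` would have their traces written to disk; the rest would be dropped.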

Page 9

Things to Consider

Distance measures may require normalization.

Whether certain metrics are strongly correlated to one another.

Number of initial seeds.

Placement of initial seeds.

Number of representatives chosen.

Number of outliers chosen.
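The first two considerations can be sketched as follows. Z-score scaling is one possible normalization and the Pearson coefficient is one possible correlation check; neither is confirmed as the choice made in this work.

```python
import math


def zscore_columns(rows):
    """Scale each metric (column) to zero mean and unit variance.

    Without this, a metric measured in large units dominates the
    Euclidean distance used for clustering.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, stds)]
            for row in rows]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either metric is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)
```

A pair of metrics with a coefficient near 1 or -1 carries largely redundant information, so keeping both effectively weights that signal twice in the distance measure.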

Page 10

Experimental Methodology

NAMD (NAnoscale Molecular Dynamics) task grain-size performance problem (2002).

Roll back a performance improvement we made in 2002 to address this problem.

[Figure panels: Tuned NAMD; Problem Injected.]

Page 11

Experimental Methodology (2)

1 million atom simulation of the Satellite Tobacco Mosaic Virus.

512 processors to 4096 processors on PSC’s Bigben Cray XT3 supercomputer.

Two criteria for validation:

Amount of data reduced.

Quality of the reduced dataset.

Page 12

Histogram Quality Measure

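Original data: 1000 pe, with histogram bars $\mathrm{Bar}_i^{orig}$ of height $H_i^o$.

Reduced data: 100 pe, with histogram bars $\mathrm{Bar}_i^{reduced}$ of height $H_i^r$.

How close is $H_i^r / H_i^o$ to 0.100 on average?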
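The measure can be sketched as the mean and spread of the per-bar ratios. Skipping empty original bars and using the population standard deviation are assumptions; the slides do not state either choice.

```python
from statistics import mean, pstdev


def histogram_quality(original, reduced):
    """Mean and standard deviation of reduced/original bar heights.

    original, reduced: bar heights for the same bins, in the same order.
    Assumption: bins that are empty in the original are skipped.
    """
    ratios = [r / o for o, r in zip(original, reduced) if o > 0]
    return mean(ratios), pstdev(ratios)


if __name__ == "__main__":
    # Hypothetical bar heights, not data from the experiments.
    avg, spread = histogram_quality([1000, 500, 0, 200], [100, 50, 0, 20])
    print(avg, spread)
```

An average close to the fraction of processors kept (0.100 for 100 of 1000 pe) with a small spread would suggest that the reduced dataset preserves the shape of the original histogram.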

Page 13

Results: Data Reduction

[Figure: bar chart of data volume (megabytes, axis from 0 to 6,000) against processor cores (512, 1,024, 2,048, 4,096), comparing the original dataset with the reduced dataset.]

Page 14

Results: Quality

(Po = processors in the original dataset, Pr = processors in the reduced dataset.)

Po      Pr     Pr/Po    Average H   Std Dev
512     25     0.0488   0.0641      0.00732
512     51     0.0996   0.1180      0.00768
512     102    0.1992   0.2237      0.00732
1024    51     0.0498   0.0511      0.00168
1024    102    0.0996   0.1008      0.00157
1024    204    0.1992   0.1921      0.00264
2048    102    0.0498   0.0487      0.00122
2048    204    0.0996   0.0977      0.00216
2048    408    0.1992   0.1883      0.00575
4096    204    0.0498   0.0501      0.00170
4096    409    0.0998   0.0981      0.00203
4096    818    0.1997   0.1975      0.00163
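One way to read the table is to compare each Average H with its target Pr/Po. The sketch below computes that relative deviation from the table's own values; the `relative_deviation` helper is illustrative and not a metric defined in the talk.

```python
# (Po, Pr, Average H) rows copied from the table above.
ROWS = [
    (512, 25, 0.0641), (512, 51, 0.1180), (512, 102, 0.2237),
    (1024, 51, 0.0511), (1024, 102, 0.1008), (1024, 204, 0.1921),
    (2048, 102, 0.0487), (2048, 204, 0.0977), (2048, 408, 0.1883),
    (4096, 204, 0.0501), (4096, 409, 0.0981), (4096, 818, 0.1975),
]


def relative_deviation(po, pr, average_h):
    """Signed deviation of Average H from the target ratio Pr/Po."""
    target = pr / po
    return (average_h - target) / target


if __name__ == "__main__":
    for po, pr, h in ROWS:
        dev = relative_deviation(po, pr, h)
        print(f"{po:5d} {pr:4d}  target={pr / po:.4f}  avgH={h:.4f}  dev={dev:+.1%}")
```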

Page 15

Conclusion

Approach offers a potential way of controlling volume of performance data generated.

Heuristics have been reasonably good at capturing performance characteristics of the NAMD grain-size problem.

Page 16

Future Work

Conduct experiments on more problem types and classes for verification.

Find better (more practical) ways for equivalence class discovery.