On-line Automated Performance Diagnosis on Thousands of Processors
Philip C. Roth. Future Technologies Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory. Paradyn Research Group, Computer Sciences Department, University of Wisconsin-Madison.
OAK RIDGE NATIONAL LABORATORY, U. S. DEPARTMENT OF ENERGY
On-line Automated Performance Diagnosis on Thousands of Processors
Philip C. Roth
Future Technologies Group, Computer Science and Mathematics Division, Oak Ridge National Laboratory
Paradyn Research Group, Computer Sciences Department, University of Wisconsin-Madison
High Performance Computing Today
• Large parallel computing resources
  • Tightly coupled systems (Earth Simulator, BlueGene/L, XT3)
  • Clusters (LANL Lightning, LLNL Thunder)
  • Grid
• Large, complex applications
  • ASCI Blue Mountain job sizes (2001)
    • 512 CPUs: 17.8%
    • 1024 CPUs: 34.9%
    • 2048 CPUs: 19.9%
• Small fraction of peak performance is the rule
Achieving Good Performance
• Need to know what and where to tune
• Diagnosis and tuning tools are critical for realizing potential of large-scale systems
• On-line automated tools are especially desirable
  • Manual tuning is difficult
    • Finding interesting data in large data volume
    • Understanding application, OS, hardware interactions
  • Automated tools require minimal user involvement; expertise is built into the tool
  • On-line automated tools can adapt dynamically
    • Dynamic control over data volume
    • Useful results from a single run
• But: tools that work well in small-scale environments often don’t scale
Barriers to Large-Scale Performance Diagnosis
• Managing performance data volume
• Communicating efficiently between distributed tool components
• Making scalable presentation of data and analysis results
[Figure: a single Tool Front End connected directly to Tool Daemons d0 … dP-1, with App Processes a0 … aP-1 below the daemons]
Our Approach for Addressing These Scalability Barriers
MRNet: multicast/reduction infrastructure for scalable tools
Distributed Performance Consultant: strategy for efficiently finding performance bottlenecks in large-scale applications
Sub-Graph Folding Algorithm: algorithm for effectively presenting bottleneck diagnosis results for large-scale applications
Outline
• Performance Consultant
• MRNet
• Distributed Performance Consultant
• Sub-Graph Folding Algorithm
• Evaluation
• Summary
Performance Consultant
• Automated performance diagnosis
• Search for application performance problems
  • Start with global, general experiments (e.g., test CPUbound across all processes)
  • Collect performance data using dynamic instrumentation
    • Collect only the data desired
    • Remove the instrumentation when no longer needed
  • Make decisions about truth of each experiment
  • Refine search: create more specific experiments based on “true” experiments (those whose data is above user-configurable threshold)
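The search loop described on this slide can be sketched roughly as follows. This is a minimal illustration, not Paradyn's implementation: the `search`, `refine`, and `measure` names, the call structure among `main`, `Do_row`, `Do_col`, and `Do_mult`, and all numeric values are assumptions made for the example.

```python
from collections import deque

def search(root, refine, measure, threshold):
    """Top-down search: test general experiments first, refine only the true ones."""
    true_experiments = []
    pending = deque([root])
    while pending:
        experiment = pending.popleft()
        # Stands in for inserting instrumentation and collecting its data.
        value = measure(experiment)
        if value > threshold:
            true_experiments.append(experiment)
            # Only experiments that tested true spawn more specific ones.
            pending.extend(refine(experiment))
        # A real tool would remove this experiment's instrumentation here.
    return true_experiments

# Hypothetical measurements: fraction of time spent, keyed by search path.
CPU_FRACTION = {
    ("CPUbound",): 0.8,
    ("CPUbound", "main"): 0.8,
    ("CPUbound", "main", "Do_row"): 0.1,
    ("CPUbound", "main", "Do_col"): 0.6,
    ("CPUbound", "main", "Do_col", "Do_mult"): 0.5,
}
# Hypothetical refinement structure (which experiments follow from which).
CHILDREN = {
    ("CPUbound",): ["main"],
    ("CPUbound", "main"): ["Do_row", "Do_col"],
    ("CPUbound", "main", "Do_col"): ["Do_mult"],
}

def refine(experiment):
    return [experiment + (child,) for child in CHILDREN.get(experiment, [])]

def measure(experiment):
    return CPU_FRACTION[experiment]

result = search(("CPUbound",), refine, measure, threshold=0.2)
```

In this sketch the `Do_row` experiment falls below the threshold, so it is never refined and no further data is collected for it.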
Performance Consultant
[Figure: application processes myapp367, myapp4287, …, myapp27549 on hosts c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu]
Performance Consultant
[Figure, two builds: search history graph rooted at CPUbound, with function nodes main, Do_row, Do_col, Do_mult; host nodes c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; and process nodes myapp{367}, myapp{4287}, …, myapp{27549}, each with its own main, Do_row, Do_col, Do_mult nodes. The second build also shows host cham.cs.wisc.edu.]
MRNet: Multicast/Reduction Overlay Network
• Parallel tool infrastructure providing:
  • Scalable multicast
  • Scalable data synchronization and transformation
• Network of processes between tool front-end and back-ends
• Useful for parallelizing and distributing tool activities
  • Reduce latency
  • Reduce computation and communication load at tool front-end
• Joint work with Dorian Arnold (University of Wisconsin-Madison)
Typical Parallel Tool Organization
[Figure: Tool Front End connected directly to Tool Daemons d0 … dP-1, with App Processes a0 … aP-1 below the daemons]
MRNet-based Parallel Tool Organization
[Figure: Tool Front End connected to Tool Daemons d0 … dP-1 through a Multicast/Reduction Network of internal processes, each containing a filter; App Processes a0 … aP-1 below the daemons]
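The effect of placing filters in a tree of internal processes can be illustrated with a small simulation. This sketch does not use the MRNet API; `build_tree`, `reduce_up`, the fan-out of 16, and the daemon values are all assumptions chosen for the example.

```python
def build_tree(leaves, fanout):
    """Group nodes under parents of at most `fanout` children until one root remains.

    Requires fanout >= 2. Internal processes are lists; leaves are daemon values.
    """
    level = list(leaves)
    while len(level) > 1:
        level = [level[i:i + fanout] for i in range(0, len(level), fanout)]
    return level[0]

def reduce_up(node, filter_fn, stats):
    """Apply filter_fn at every internal process and pass the result upstream."""
    if not isinstance(node, list):
        return node  # a daemon's own value
    inputs = [reduce_up(child, filter_fn, stats) for child in node]
    # Record the largest number of inputs any single process had to handle.
    stats["max_inputs"] = max(stats["max_inputs"], len(inputs))
    return filter_fn(inputs)

daemon_values = list(range(1024))      # one hypothetical value per daemon
tree = build_tree(daemon_values, fanout=16)
stats = {"max_inputs": 0}
total = reduce_up(tree, sum, stats)
```

With a direct front-end-to-daemon organization the front-end would receive all 1024 values itself; in this tree no process, including the front-end, handles more than 16 inputs.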
Performance Consultant: Scalability Barriers
MRNet can alleviate scalability problem for global performance data (e.g., CPU utilization across all processes)
But front-end still processes local performance data (e.g., utilization of process 5247 on host mcr398.llnl.gov)
Performance Consultant
[Figure: the CPUbound search history graph with function nodes main, Do_row, Do_col, Do_mult; host nodes c001.cs.wisc.edu, c002.cs.wisc.edu, …, c128.cs.wisc.edu; and process nodes myapp{367}, myapp{4287}, …, myapp{27549}; shown with hosts cham.cs.wisc.edu and c001.cs.wisc.edu … c128.cs.wisc.edu and processes myapp367, myapp4287, …, myapp27549]
Distributed Performance Consultant
[Figure: the same CPUbound search history graph, with function, host, and process nodes, shown with hosts cham.cs.wisc.edu and c001.cs.wisc.edu … c128.cs.wisc.edu and processes myapp367, myapp4287, …, myapp27549]
Distributed Performance Consultant: Variants
• Natural steps from traditional centralized approach (CA)
• Partially Distributed Approach (PDA)
  • Distributed local searches, centralized global search
  • Requires complex instrumentation management
• Truly Distributed Approach (TDA)
  • Distributed local searches only
  • Insight into global behavior from combining local search results (e.g., using Sub-Graph Folding Algorithm)
  • Simpler tool design than PDA
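One way to read the three variants is as a table of where each kind of search runs. The sketch below encodes that reading; the table layout, the `front_end_searches` helper, and the simplification of one local search per process are assumptions for illustration, not details from the talk.

```python
# Where the global and local (per-process) searches run in each approach.
SEARCH_PLACEMENT = {
    "CA":  {"global": "front-end", "local": "front-end"},
    "PDA": {"global": "front-end", "local": "daemons"},
    "TDA": {"global": None,        "local": "daemons"},   # no global search at all
}

def front_end_searches(approach, n_processes):
    """Count searches the front-end runs, assuming one local search per process."""
    placement = SEARCH_PLACEMENT[approach]
    count = 0
    if placement["global"] == "front-end":
        count += 1
    if placement["local"] == "front-end":
        count += n_processes
    return count

load = {name: front_end_searches(name, 1024) for name in SEARCH_PLACEMENT}
```

Under these assumptions the front-end's search load grows with the process count only in CA, stays constant in PDA, and is zero in TDA, where global insight has to come from combining the local results.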
Distributed Performance Consultant: PDA
[Figure: the CPUbound search history graph with its global function nodes (main, Do_row, Do_col, Do_mult) and per-host, per-process sub-graphs for c001.cs.wisc.edu / myapp{367}, c002.cs.wisc.edu / myapp{4287}, …, c128.cs.wisc.edu / myapp{27549}; shown with hosts cham.cs.wisc.edu and c001.cs.wisc.edu … c128.cs.wisc.edu]
Distributed Performance Consultant: TDA
[Figure, two builds: only the per-host, per-process search sub-graphs (main, Do_row, Do_col, Do_mult) for c001.cs.wisc.edu / myapp{367}, c002.cs.wisc.edu / myapp{4287}, …, c128.cs.wisc.edu / myapp{27549}, with no global CPUbound root; shown with hosts cham.cs.wisc.edu and c001.cs.wisc.edu … c128.cs.wisc.edu. The second build adds the label “Sub-Graph Folding Algorithm”.]
Search History Graph Example
[Figure: search history graph rooted at CPUbound, with one sub-graph over functions main, A, B, C, D for the whole application and one for each of the processes myapp{1272}, myapp{1273}, myapp{7624}, myapp{7625} on hosts c33.cs.wisc.edu and c34.cs.wisc.edu; one sub-graph also contains function E]
Search History Graphs
Search History Graph is effective for presenting search-based performance diagnosis results…
…but it does not scale to a large number of processes because it shows one sub-graph per process
Sub-Graph Folding Algorithm
Combines host-specific sub-graphs into composite sub-graphs
Each composite sub-graph represents a behavioral category among application processes
Dynamic clustering of processes by qualitative behavior
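A rough sketch of the folding idea: drop the host and process names and group processes whose search sub-graphs look the same. Grouping by exact structural equality is a simplification assumed here and may not match the algorithm's actual folding rule; the edge sets and the choice of which process includes function E are also hypothetical.

```python
from collections import defaultdict

def fold(subgraphs):
    """Group (host, process) pairs whose search sub-graphs have identical structure."""
    categories = defaultdict(list)
    for (host, process), edges in subgraphs.items():
        # The key ignores host and process names, so only behavior distinguishes groups.
        categories[frozenset(edges)].append((host, process))
    return dict(categories)

# Hypothetical sub-graph edges over the functions named in the example slides.
base = {("main", "A"), ("main", "B"), ("A", "C"), ("C", "D")}
with_e = base | {("C", "E")}

subgraphs = {
    ("c33.cs.wisc.edu", "myapp{1272}"): base,
    ("c33.cs.wisc.edu", "myapp{1273}"): with_e,
    ("c34.cs.wisc.edu", "myapp{7624}"): base,
    ("c34.cs.wisc.edu", "myapp{7625}"): base,
}

folded = fold(subgraphs)
category_sizes = sorted(len(members) for members in folded.values())
```

Here four per-process sub-graphs fold into two behavioral categories, so the display grows with the number of distinct behaviors rather than with the number of processes.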
SGFA: Example
[Figure: the CPUbound search history graph with per-process sub-graphs for myapp{1272}, myapp{1273}, myapp{7624}, and myapp{7625} on c33.cs.wisc.edu and c34.cs.wisc.edu, folded into a composite sub-graph labeled c*.cs.wisc.edu and myapp{*} whose lowest level contains D and E]
SGFA: Implementation
• Custom MRNet filter
• Filter in each MRNet process keeps folded graph of search results from all reachable daemons
• Updates periodically sent upstream
• By induction, filter in front-end holds entire folded graph
• Optimization for unchanged graphs
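The filter behavior listed on this slide might look something like the following. This is a sketch in plain Python rather than an MRNet filter: the `FoldFilter` class, the representation of a folded graph as a mapping from sub-graph shape to process count, and the shape strings are all assumptions.

```python
class FoldFilter:
    """Keeps the latest folded graph from each child and merges them on demand."""

    def __init__(self):
        self.latest = {}        # child id -> most recent folded graph from that child
        self.last_sent = None   # what was last forwarded upstream

    def receive(self, child_id, graph):
        # Replace, rather than accumulate, so repeated updates are not double counted.
        self.latest[child_id] = graph

    def update(self):
        """Return the merged graph to send upstream, or None if nothing changed."""
        merged = {}
        for graph in self.latest.values():
            for shape, count in graph.items():
                merged[shape] = merged.get(shape, 0) + count
        if merged == self.last_sent:
            return None         # unchanged graph: skip the upstream message
        self.last_sent = merged
        return merged

# Two internal processes, each covering two daemons, under a front-end filter.
left, right, front_end = FoldFilter(), FoldFilter(), FoldFilter()
left.receive("d0", {"main/A/C/D": 1})
left.receive("d1", {"main/A/C/D": 1})
right.receive("d2", {"main/A/C/D": 1})
right.receive("d3", {"main/A/C/D/E": 1})

front_end.receive("left", left.update())
front_end.receive("right", right.update())
first = front_end.update()
second = front_end.update()   # no new input since the previous update
```

Because every filter holds the folded graph for the daemons beneath it, the front-end's copy covers all daemons, and an update that changes nothing produces no upstream traffic.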
DPC + SGFA: Evaluation
• Modified Paradyn to perform bottleneck searches using CA, PDA, or TDA approach
• Modified instrumentation cost tracking to support PDA
  • Track global, per-process instrumentation cost separately
  • Simple fixed-partition policy for scheduling global and local instrumentation
• Implemented Sub-Graph Folding Algorithm as custom MRNet filter to support TDA (used by all)
• Instrumented front-end, daemons, and MRNet internal processes to collect CPU, I/O load information
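A fixed-partition policy for instrumentation cost could be sketched as below. The slides do not give the partition sizes or cost units, so the `FixedPartitionBudget` class, the 25% global share, and the cost values are purely illustrative assumptions.

```python
class FixedPartitionBudget:
    """Splits a total instrumentation cost limit into fixed global and local shares."""

    def __init__(self, total_cost_limit, global_share):
        self.limits = {
            "global": total_cost_limit * global_share,
            "local": total_cost_limit * (1 - global_share),
        }
        self.used = {"global": 0.0, "local": 0.0}

    def try_insert(self, kind, cost):
        """Admit instrumentation only if its own partition still has room."""
        if self.used[kind] + cost > self.limits[kind]:
            return False
        self.used[kind] += cost
        return True

    def remove(self, kind, cost):
        """Return cost to its partition when instrumentation is removed."""
        self.used[kind] = max(0.0, self.used[kind] - cost)

budget = FixedPartitionBudget(total_cost_limit=100, global_share=0.25)
events = [
    budget.try_insert("global", 20),   # fits in the 25-unit global partition
    budget.try_insert("global", 10),   # would exceed it, so it is deferred
    budget.try_insert("local", 10),    # local partition is tracked separately
]
budget.remove("global", 20)
events.append(budget.try_insert("global", 10))  # room again after removal
```

Tracking the two costs separately means a burst of global instrumentation cannot starve the local searches, at the price of leaving one partition idle when only the other is busy.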
DPC + SGFA: Evaluation
• su3_rmd
  • QCD pure lattice gauge theory code
  • C, MPI
  • Weak scaling scalability study
• LLNL MCR cluster
  • 1152 nodes (1048 compute nodes)
  • Two 2.4 GHz Intel Xeons per node
  • 4 GB memory per node
  • Quadrics Elan3 interconnect (fat tree)
  • Lustre parallel file system
DPC + SGFA: Evaluation
PDA and TDA: bottleneck searches with up to 1024 processes so far, limited by partition size
CA: scalability limit at less than 64 processes
Similar qualitative results from all approaches
DPC: Evaluation
[Figures: nine slides of DPC evaluation results; only the slide titles survive]
SGFA: Evaluation
[Figure: one slide of SGFA evaluation results; only the slide title survives]
Summary
• Tool scalability is critical for effective use of large-scale computing resources
• On-line automated performance tools are especially important at large scale
• Our approach:
  • MRNet
  • Distributed Performance Consultant (TDA) plus Sub-Graph Folding Algorithm
References
P.C. Roth, D.C. Arnold, and B.P. Miller, “MRNet: a Software-Based Multicast/Reduction Network for Scalable Tools,” SC 2003, Phoenix, Arizona, November 2003
P.C. Roth and B.P. Miller, “The Distributed Performance Consultant and the Sub-Graph Folding Algorithm: On-line Automated Performance Diagnosis on Thousands of Processes,” in submission
Publications available from http://www.paradyn.org
MRNet software available from http://www.paradyn.org/mrnet