Scaling to New Heights: Retrospective
IEEE/ACM SC2002 Conference
Baltimore, MD
Introduction
• More than 80 researchers from universities, research centers, and corporations around the country attended the first "Scaling to New Heights" workshop, May 20 and 21, at PSC.
• Sponsored by the NSF leading-edge centers (NCSA, PSC, SDSC) together with the Center for Computational Sciences (ORNL) and NERSC, the workshop included a poster session, invited and contributed talks, and a panel.
• Participants examined issues involved in adapting and developing research software to effectively exploit systems comprising thousands of processors.
The following slides represent a collection of ideas from the workshop.
Basic Concepts
• All application components must scale
• Control granularity; virtualize
• Incorporate latency tolerance
• Reduce dependency on synchronization
• Maintain per-process load; facilitate balance
The only new aspect is the degree to which these things matter.
Issues and Remedies
• Granularity
• Latencies
• Synchronization
• Load Balancing
• Heterogeneous Considerations
Granularity
Define problem in terms of a large number of small objects independent of the process count
• Object design considerations
– Caching and other local effects
– Communication-to-computation ratio
• Control granularity through virtualization
– Maintain per-process load level
– Manage comms within virtual blocks, e.g., Converse
– Facilitate dynamic load balancing
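The over-decomposition idea above can be sketched in a few lines. This is a hypothetical illustration, not the Converse runtime itself: the problem is split into many small objects independent of the process count, and a runtime maps objects onto processes.

```python
# Hypothetical sketch of virtualization via over-decomposition:
# many small objects, a separate (and changeable) object-to-process map.

def map_objects_to_processes(num_objects, num_processes):
    """Simple round-robin mapping of object IDs to process ranks."""
    return {obj: obj % num_processes for obj in range(num_objects)}

# 1,024 small objects on 16 processes: each process holds 64 objects,
# leaving a runtime free to migrate objects later for load balance.
mapping = map_objects_to_processes(1024, 16)
per_process = [sum(1 for r in mapping.values() if r == p) for p in range(16)]
```

Because the decomposition is fixed by the problem, not the machine, the same objects can be remapped when the process count or the load changes.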
Latencies
• Network
– Latency reduction lags improvement in flop rates; it is much easier to grow bandwidth
– Overlap communications and computations; pipeline larger messages
– Don't wait; speculate!
• Software overheads
– Can be more significant than network delays
– NUMA architectures
Scalable designs must accommodate latencies
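A toy cost model (with assumed numbers, not measurements from the workshop) shows why overlapping communication with computation tolerates latency: a blocking step pays the sum of the two costs, while a fully overlapped step pays only the maximum.

```python
# Toy latency-hiding model: blocking vs. overlapped step time.
# All numbers are hypothetical, for illustration only.

def step_time(t_comp, t_comm, overlap=False):
    """Cost of one step: sum if blocking, max if fully overlapped."""
    return max(t_comp, t_comm) if overlap else t_comp + t_comm

t_comp, t_comm = 10.0, 4.0            # milliseconds, assumed values
blocking = step_time(t_comp, t_comm)             # 14.0: comm adds directly
overlapped = step_time(t_comp, t_comm, True)     # 10.0: comm fully hidden
```

As long as communication finishes under the computation, its latency disappears from the critical path; pipelining large messages into chunks is one way to keep it there.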
Synchronization
• Cost increases with the process count
– Synchronization doesn't scale well
– Latencies come into play here too
• Distributed resources exacerbate the problem
– Heterogeneity is another significant obstacle
• Regular communication patterns are often characterized by many synchronizations
– Best suited to homogeneous, co-located clusters
Transition to asynchronous models?
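A small simulation (hypothetical timings, not workshop data) makes the synchronization cost concrete: at a barrier every process waits for the slowest one, so the synchronous step cost tracks the worst case, while an idealized asynchronous model tracks the mean.

```python
# Why barriers scale poorly: the step cost is the MAX over processes,
# and the max of many noisy per-process times grows with process count.
import random

def synchronous_step(times):
    """Bulk-synchronous step: everyone waits for the slowest process."""
    return max(times)

def asynchronous_ideal(times):
    """Idealized asynchronous model: aggregate cost tracks the mean."""
    return sum(times) / len(times)

random.seed(0)
# 1,024 processes, each nominally 1.0 time unit with up to 20% jitter.
times = [1.0 + 0.2 * random.random() for _ in range(1024)]
sync_cost = synchronous_step(times)     # close to the worst case (~1.2)
async_cost = asynchronous_ideal(times)  # close to the mean (~1.1)
```

The gap widens as the process count grows or the jitter increases, which is one argument for message-driven, asynchronous execution models.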
Load Balancing
• Static load balancing
– Reduces to the granularity problem
– Differences between processors and network segments are determined a priori
• Dynamic process management requires distributed monitoring capabilities
– Must be scalable
– The system maps objects to processes
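One simple policy a system could use to map objects to processes is greedy least-loaded assignment. This is a minimal sketch of the idea, not a description of any particular runtime's balancer:

```python
import heapq

# Greedy dynamic load balancing sketch: each object goes to the
# currently least-loaded process, tracked with a min-heap.

def assign(object_costs, num_processes):
    """Place objects on processes; return placement and final loads."""
    heap = [(0.0, p) for p in range(num_processes)]  # (load, rank)
    heapq.heapify(heap)
    loads = [0.0] * num_processes
    placement = []
    for cost in object_costs:
        load, rank = heapq.heappop(heap)        # least-loaded process
        placement.append(rank)
        loads[rank] += cost
        heapq.heappush(heap, (loads[rank], rank))
    return placement, loads

# Uneven object costs (hypothetical) still balance well on 4 processes.
placement, loads = assign([5, 4, 3, 3, 2, 2, 1, 1, 1], 4)
```

The heap operations cost O(log P) per object, which is the kind of per-decision cost a scalable monitoring and placement scheme needs.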
Heterogeneous Considerations
• Similar but different processors or network components configured within a single cluster
– Different clock rates, NICs, etc.
• Distinct processors, network segments, and operating systems operating at a distance
– Grid resources
This elevates the significance of dynamic load balancing; data-driven objects are immediately adaptable.
Poor Scalability?
[Plot: Speedup vs. Processors]
Good Scalability?
[Plot: Speedup vs. Processors]
Performance Comparison
[Plot: Speedup vs. Processors]
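A standard yardstick for reading speedup-versus-processors plots like these (not something the slides derive) is Amdahl's law: with serial fraction s, speedup on p processors is 1 / (s + (1 - s) / p), which flattens toward 1/s no matter how many processors are added.

```python
# Amdahl's law: even a small serial fraction caps achievable speedup.

def amdahl_speedup(p, serial_fraction):
    """Ideal speedup on p processors with the given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# A 1% serial fraction on 1,024 processors yields roughly 91x,
# far below the "perfect" 1024x, and can never exceed 100x.
s1024 = amdahl_speedup(1024, 0.01)
cap = 1.0 / 0.01
```

This is why "all application components must scale": any unparallelized component becomes the serial fraction that bends the curve over.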
Tools
• Automated algorithm selection and performance tuning by empirical means, e.g., ATLAS
– Generate a space of algorithms and search for the fastest implementations by running them
• Scalability prediction, e.g., PMaC Lab
– Develop performance models (machine profiles; application signatures) and trending patterns
Identify/fix bottlenecks; choose new methods?
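The empirical-tuning idea can be sketched in miniature. This toy uses two hypothetical candidate implementations of a trivial kernel, not ATLAS's actual generator: enumerate candidates, check correctness, then keep whichever times fastest on this machine.

```python
import timeit

# ATLAS-style empirical search in miniature: run the candidates,
# keep the fastest one that still produces correct results.

def sum_loop(xs):
    """Naive accumulation loop (candidate 1)."""
    total = 0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    """Built-in reduction (candidate 2)."""
    return sum(xs)

candidates = [sum_loop, sum_builtin]
data = list(range(10_000))
expected = sum_loop(data)

# Filter for correctness first, then pick the empirically fastest.
correct = [f for f in candidates if f(data) == expected]
best = min(correct, key=lambda f: timeit.timeit(lambda: f(data), number=20))
```

Real autotuners search much larger spaces (blocking factors, unrollings, instruction schedules), but the select-by-measurement loop is the same.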
Case Study: NAMD Scalable Molecular Dynamics
• Three-dimensional object-oriented code
• Message-driven execution capability
• Fixed problem sizes determined by biomolecular structures
• Embedded PME electrostatics processor
• Asynchronous communications
[Plot: NAMD benchmark, y-axis 0.001–10 (log scale) vs. processors 4–1024; series: 4 procs/node, perfect]
[Plot: NAMD benchmark, y-axis 0.001–1 (log scale) vs. processors 64–1024; series: 4 procs/node, perfect]
[Plot: NAMD benchmark, y-axis 0.01–10 (log scale) vs. processors 4–2048; series: 4 procs/node, perfect]
[Plot: NAMD benchmark, y-axis 0.001–1 (log scale) vs. processors 128–2048; series: 4 procs/node, perfect]
Case Study: Summary
• As more processes are used to solve the given fixed-size problems, benchmark times decrease to a few milliseconds
– PME communication times and operating system loads are significant in this range
• Scaling to many thousands of processes is almost certainly achievable now, given a large enough problem
– 700 atoms/process × 3,000 processes = 2.1M atoms
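The sizing arithmetic above is a weak-scaling argument: hold the work per process fixed and grow the problem with the machine. As a small helper (the function name is ours, not NAMD's):

```python
# Weak-scaling sizing: keep atoms-per-process constant, grow the
# total problem with the process count.

def atoms_needed(atoms_per_process, num_processes):
    """Total problem size that keeps every process at the target load."""
    return atoms_per_process * num_processes

total = atoms_needed(700, 3000)  # 2,100,000 atoms, as on the slide
```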
Contacts and References
• David O’Neal [email protected]
• John Urbanic [email protected]
• Sergiu Sanielevici [email protected]
Workshop materials: www.psc.edu/training/scaling/workshop.html
Topics for Discussion
• How should large, scalable computational science problems be posed?
• Should existing algorithms and codes be modified or should new ones be developed?
• Should agencies explicitly fund collaborations to develop industrial-strength, efficient, scalable codes?
• What should cyber-infrastructure builders and operators do to help scientists develop and run good applications?