TRANSCRIPT
Supercomputing the Next Century
• Talk to the Max-Planck-Institut für Gravitationsphysik (Albert-Einstein-Institut), Potsdam, Germany
• June 15, 1998
NCSA is the Leading Edge Site for the National Computational Science Alliance
www.ncsa.uiuc.edu
The Alliance Team Structure
• Leading Edge Center
• Enabling Technology
– Parallel Computing
– Distributed Computing
– Data and Collab. Computing
• Partners for Advanced Computational Services
– Communities
– Training
– Technology Deployment
– Comp. Resources & Services
• Strategic Industrial and Technology Partners
• Application Technologies
– Cosmology
– Environmental Hydrology
– Chemical Engineering
– Nanomaterials
– Bioinformatics
– Scientific Instruments
• EOT
– Education
– Evaluation
– Universal Access
– Government
Alliance ‘98 Hosts 1000 Attendees With Hundreds On-Line!
The “Experts” Are Not Always Right! Seek a New Vision and Stick to It
We do not believe that workstation-class systems (even if they offer more processing power than the old Cray Y-MPs) can become the mainstay of a National center. At $500K purchase prices, departments should be able to afford the SGI Power Challenge systems on their own. For these reasons, we wonder whether the role of the proposed SGI systems in NCSA’s plan might be different from that of a “production machine”.
– Program Plan Review Panel, February 1994
The SGI Power Challenge Array as NCSA’s Production Facility for Four Years
[Chart: Number of Users Per Month (0 to 500), Sep 1994 through May 1998, by system: SGI Power Challenge Array (PCA), CM-5 (retired 1/97), Convex C3880 (retired 10/95), HP/Convex SPP-1200, Cray Y-MP (retired 12/94), SGI Origin, HP/Convex SPP-2000]
The NCSA Origin Array Doubles Again This Month
Let’s Blow This Up!
The Growth Rate of the National Capacity is Slowing Down Again
[Chart: Normalized CPU Hours (log scale, 10,000 to 1,000,000,000) vs. Fiscal Year, 1986-2002: Total NU; 70% Annual Growth This Year]
Source: Quantum Research; Lex Lane, NCSA
Major Gap Developing in National Usage at NSF Supercomputer Centers
70% Annual Growth Rate is the Historical Rate of National Usage Growth.
It is Also Slightly Greater Than the Rate of Moore’s Law, So Slower Growth Means Desktops Gain on Supers
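(For reference: Moore’s Law doubling every 18 months works out to 2^(12/18) ≈ 1.59x, or about 59% growth per year, so a 70% annual rate does sit slightly above it.)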
[Chart: NUs Used Per Year, Oct 1995 through Oct 1999 (0 to 3,500,000): Grand Total vs. a 70% Growth Rate trend, with projection]
[Chart: Monthly National Usage at NSF Supercomputer Centers - NUs Per Month, Oct 1995 through Oct 1999 (0 to 2,500,000), FY96-FY99 with FY99 Projection; Capacity Level NSF Proposed at 3/97 NSB Meeting; Transition Period marked]
Accelerated Strategic Computing Initiative is Coupling DOE DP Labs to Universities
• Access to ASCI Leading Edge Supercomputers
• Academic Strategic Alliances Program
• Data and Visualization Corridors
http://www.llnl.gov/asci-alliances/centers.html
Comparison of the DoE ASCI and the NSF PACI Origin Array Scale Through FY99
www.lanl.gov/projects/asci/bluemtn/Hardware/schedule.html
Los Alamos Origin System FY99: 5,000-6,000 processors
NCSA Proposed System FY99: 6x128 and 4x64 = 1,024 processors
• NEC SX-5
– 32 x 16 vector processor SMP
– 512 Processors
– 8 Gigaflop Peak Processor
• IBM SP
– 256 x 16 RISC Processor SMP
– 4,096 Processors
– 1 Gigaflop Peak Processor
• SGI Origin Follow-on
– 32 x 128 RISC Processor DSM
– 4,096 Processors
– 1 Gigaflop Peak Processor
High-End Architecture 2000 - Scalable Clusters of Shared Memory Modules
Each is 4 Teraflops Peak
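In each case the peak follows from processors times per-processor peak: 512 x 8 Gigaflops and 4,096 x 1 Gigaflop both give 4,096 Gigaflops, i.e. about 4 Teraflops.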
Emerging Portable Computing Standards
• HPF
• MPI
• OpenMP
• Hybrids of MPI and OpenMP (see the sketch below)
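To make the hybrid style concrete, here is a minimal sketch (not from the talk) of the pattern these standards enable: MPI distributes blocks of an index space across SMP nodes, and OpenMP threads share each node's block. The problem size and the workload (a simple harmonic sum) are illustrative assumptions.

/* Minimal hybrid MPI + OpenMP sketch: MPI splits a global sum across
   nodes, OpenMP threads split each node's block across its processors. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000   /* global problem size (assumed for the example) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each MPI rank owns a contiguous block of indices [lo, hi). */
    int lo = (int)((long)rank * N / size);
    int hi = (int)((long)(rank + 1) * N / size);

    double local = 0.0;

    /* Within the SMP node, OpenMP threads share the block. */
    #pragma omp parallel for reduction(+:local)
    for (int i = lo; i < hi; i++)
        local += 1.0 / (i + 1.0);

    /* Combine the per-node partial sums across the cluster. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.6f\n", global);

    MPI_Finalize();
    return 0;
}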
Top500 Shared Memory Systems
Vector Processors vs. Microprocessors
TOP500 Reports: http://www.netlib.org/benchmark/top500.html
[Chart: PVP Systems in the TOP500 - Number of Systems (0 to 300), Jun 1993 through Jun 1998: Europe, Japan, USA]
[Chart: SMP + DSM Systems in the TOP500 - Number of Systems (0 to 300), Jun 1993 through Jun 1998: USA]
The Exponential Growth of NCSA’s SGI Shared Memory Supercomputers
[Chart: SGI Processors (log scale, 1 to 10,000), Jan 1994 through Jan 2001, across the Challenge, Power Challenge, Origin, and SN1 generations]
Doubling Every Nine Months!
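A nine-month doubling time compounds to 2^(12/9) ≈ 2.5x per year, roughly twice the pace of Moore’s Law’s 18-month doubling.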
[Chart: CPU-Hours Burned (log scale, 1 to 1,000,000) by PI rank (1 to 181), in bands: 1 to 10, 10 to 100, 100 to 1k, 1k to 10k, 10k to 100k, 100k to 1M]
Extreme and Large PIs Dominate Usage of NCSA Origin
January through April, 1998
Disciplines Using the NCSA Origin 2000 - CPU-Hours in March 1998
Particle Physics
Chemistry
Materials Sciences
Engineering CFD
Astronomy
Physics
Industry
Molecular Biology
Other
[Chart: Speedup vs. Processors (0 to 120) for QMC, Matrix-Vector, ZEUS-MP Blast, RIEMANN-HPF, Laplace-DSM, ZEUS-MP Radn., Freeman-Elec. Struc., and Woodward-PPM, against Perfect Scaling]
Source: Mitas, Hayes, Tafti, Saied, Balsara, NCSA; Wilkins, OSU; Woodward, UMinn; Freeman, NW. NCSA 128-processor Origin, IRIX 6.5
Users, NCSA, SGI, and Alliance Parallel Team Working to Make Better Scaling Routine
[Chart: Gigaflops (0 to 7) vs. Processors (0 to 60) for the systems below]
Origin-DSM
Origin-MPI
NT-MPI
SP2-MPI
T3E-MPI
SPP2000-DSM
Solving 2D Navier-Stokes Kernel - Performance of Scalable Systems
Source: Danesh Tafti, NCSA
Preconditioned Conjugate Gradient Method With Multi-Level Additive Schwarz Richardson Preconditioner
(2D 1024x1024)
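As a concrete illustration of the kernel named above, here is a minimal sketch of preconditioned conjugate gradient on a 2D 5-point Laplacian. It is not the Alliance code: a simple Jacobi (diagonal) preconditioner stands in for the multi-level additive Schwarz Richardson preconditioner, and a 64x64 grid stands in for 1024x1024 so the example stays self-contained.

/* Preconditioned conjugate gradient on the 2D 5-point Laplacian,
   with a Jacobi (diagonal) preconditioner as a stand-in. */
#include <stdio.h>
#include <math.h>

#define NG 64                /* grid is NG x NG (the talk used 1024 x 1024) */
#define N  (NG * NG)

/* y = A x for the 5-point Laplacian with Dirichlet boundaries */
static void apply_A(const double *x, double *y)
{
    for (int j = 0; j < NG; j++)
        for (int i = 0; i < NG; i++) {
            int k = j * NG + i;
            double v = 4.0 * x[k];
            if (i > 0)      v -= x[k - 1];
            if (i < NG - 1) v -= x[k + 1];
            if (j > 0)      v -= x[k - NG];
            if (j < NG - 1) v -= x[k + NG];
            y[k] = v;
        }
}

static double dot(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < N; k++) s += a[k] * b[k];
    return s;
}

int main(void)
{
    static double x[N], b[N], r[N], z[N], p[N], q[N];

    for (int k = 0; k < N; k++) { x[k] = 0.0; b[k] = 1.0; }

    /* r = b - A x reduces to r = b because x = 0 */
    for (int k = 0; k < N; k++) r[k] = b[k];
    for (int k = 0; k < N; k++) z[k] = r[k] / 4.0;  /* Jacobi: z = D^{-1} r */
    for (int k = 0; k < N; k++) p[k] = z[k];

    double rz = dot(r, z);
    for (int it = 0; it < 1000; it++) {
        apply_A(p, q);
        double alpha = rz / dot(p, q);
        for (int k = 0; k < N; k++) { x[k] += alpha * p[k]; r[k] -= alpha * q[k]; }
        if (sqrt(dot(r, r)) < 1e-8) {
            printf("converged in %d iterations\n", it + 1);
            break;
        }
        for (int k = 0; k < N; k++) z[k] = r[k] / 4.0;
        double rz_new = dot(r, z);
        double beta = rz_new / rz;
        rz = rz_new;
        for (int k = 0; k < N; k++) p[k] = z[k] + beta * p[k];
    }
    return 0;
}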
A Variety of Discipline Codes - Single Processor Performance, Origin vs. T3E
[Chart: Single Processor MFLOPS (0 to 160), Origin vs. T3E, for the codes below]
QMC
RIEMANN
Laplace
QCD
PPM
PIMC
ZEUS
Alliance PACS Origin2000 Repository
http://scv.bu.edu/SCV/Origin2000/
Kadin Tseng, BU; Gary Jensen, NCSA; Chuck Swanson, SGI
John Connolly, U Kentucky, Developing Repository for HP Exemplar
Simulation of the Evolution of the Universe on a Massively Parallel Supercomputer
12 Billion Light Years 4 Billion Light Years
Virgo Project - Evolving a Billion Pieces of Cold Dark Matter in a Hubble Volume - 688-processor CRAY T3E at Garching Computing Centre of the Max-Planck-Society
http://www.mpg.de/universe.htm
Limitations of Uniform Grids for Complex Scientific and Engineering Problems
Source: Greg Bryan, Mike Norman, NCSA
512x512x512 Run on 512-node CM-5
Gravitation Causes Continuous Increase in Density Until There is a Large Mass in a Single Grid Zone
Use of Shared Memory Adaptive Grids To Achieve Dynamic Load Balancing
Source: Greg Bryan, Mike Norman, John Shalf, NCSA
64x64x64 Run with Seven Levels of Adaption on SGI Power Challenge, Locally Equivalent to 8192x8192x8192 Resolution
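Each level of adaption doubles the local resolution, so seven levels on the 64³ base grid yield 64 x 2^7 = 8,192 zones per dimension, hence the 8192³ equivalence.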
NCSA Visualization - VRML Viewers
John Shalf on Greg Bryan Cosmology AMR Data
http://infinite-entropy.ncsa.uiuc.edu/Projects/AmrWireframe/
NT Workstation Shipments Rapidly Surpassing UNIX
[Chart: Workstations Shipped (Millions, 0 to 1.4), 1995-1997: UNIX vs. NT]
Source: IDC, Wall Street Journal, 3/6/98
Current Alliance LES NT Cluster Testbed - Compaq Computer and Hewlett-Packard
• Schedule of NT Supercluster Goals
– 1998 Deploy First Production Clusters
– Scientific and Engineering Tuned Cluster
– Andrew Chien, Alliance Parallel Computing Team
– Rob Pennington, NCSA C&C
– Currently 256 Processors of HP and Compaq Pentium II SMPs
– Data Intensive Tuned Cluster
– 1999 Enlarge to 512 Processors in Cluster
– 2000 Move to Merced
– 2002-2005 Achieve Teraflop Performance
• UNIX/RISC & NT/Intel Will Co-exist for 5 Years
– 1998-2000 Move Applications to NT/Intel
– 2000-2005 Convergence Toward NT/Merced
First Scaling Testing of ZEUS-MP on CRAY T3E and Origin vs. NT Supercluster
“Supercomputer performance at mail-order prices” -- Jim Gray, Microsoft
access.ncsa.uiuc.edu/CoverStories/SuperCluster/super.html
ZEUS-MP Hydro Code Running Under MPI
• Alliance Cosmology Team
• Andrew Chien, UIUC
• Rob Pennington, NCSA
[Chart: Single Processor Speed on ZEUS-MP (MFLOPS, 0 to 140): T3E, Origin, NT]
[Chart: GFLOPS (0 to 8) vs. Processors (0 to 200) for the systems below]
T3E
Origin
NT/Intel
NCSA NT Supercluster Solving Navier-Stokes Kernel
Preconditioned Conjugate Gradient Method With Multi-level Additive Schwarz Richardson Pre-conditioner
(2D 1024x1024)
Single Processor Performance: MIPS R10k 117 MFLOPS, Intel Pentium II 80 MFLOPS
Danesh Tafti, Rob Pennington, Andrew Chien, NCSA
[Chart: Speedup vs. Processors (0 to 60): NT MPI, Origin MPI, Origin SM, against Perfect scaling]
[Chart: Gigaflops (0 to 7) vs. Processors (0 to 70): NT MPI, Origin MPI, Origin SM]
Near Perfect Scaling of Cactus - 3D Dynamic Solver for the Einstein GR Equations
[Chart: Scaling vs. Processors (0 to 120): Origin vs. NT SC]
Ratio of GFLOPs: Origin = 2.5x NT SC
Danesh Tafti, Rob Pennington, Andrew Chien, NCSA
Cactus was Developed by Paul Walker, MPI-Potsdam; UIUC; NCSA
NCSA Symbio - A Distributed Object Framework Bringing Scalable Computing to NT Desktops
http://access.ncsa.uiuc.edu/Features/Symbio/Symbio.html
• Parallel Computing on NT Clusters
– Briand Sanderson, NCSA
– Microsoft Co-Funds Development
• Features
– Based on Microsoft DCOM
– Batch or Interactive Modes
– Application Development Wizards
• Current Status & Future Plans
– Symbio Developer Preview 2 Released
– Princeton University Testbed
The Road to Merced
http://developer.intel.com/solutions/archive/issue5/focus.htm#FOUR
vBNS Connected Alliance Site
vBNS Alliance Site Scheduled for Connection
NCSA
FY98 Assembling the Links in the Grid with NSF’s vBNS Connections Program
StarTAP
27 Alliance sites running...
…16 more in progress.
vBNS Backbone Node
1999: Expansion via Abilene - vBNS & Abilene at 2.4 Gbit/s
Source: Charlie Catlett, Randy Butler, NCSA
NCSA Distributed Applications Support Team for vBNS
Globus Ubiquitous Supercomputing Testbed (GUSTO)
• Alliance Middleware for the Grid - Distributed Computing Team
• GII Next Generation Winner
• SF Express -- NPACI / Alliance DoD Mod. Demonstration
– Largest Distributed Interactive Simulation Ever Performed
• The Grid: Blueprint for a New Computing Infrastructure
– Edited by Ian Foster and Carl Kesselman, July 1998
• IEEE Symposium on High Performance Distributed Computing
– July 29-31, 1998, Chicago, Illinois
• NASA IPG Most Recent Funding Addition
Alliance National Technology Grid Workshop and Training Facilities
Powered by Silicon Graphics, Linked by the NSF vBNS
Jason Leigh and Tom DeFanti, EVL; Rick Stevens, ANL
Using NCSA Virtual Director to Explore Structure of Density Isosurfaces of 256³ MHD Star Formation
Simulation by Dinshaw Balsara, NCSA, Alliance Cosmology Team
Visualization by Bob Patterson, NCSA
Red Iso = 4x Mean Density
Yellow Iso = 8x Mean Density
Red Iso = 12x Mean Density
Isosurface models generated by Vis5d
Choreographed with Cave5D/VirDir
Rendered with Wavefront on SGI Onyx
Linking CAVE to Vis5D = CAVE5D Then Use Virtual Director to Analyze Simulations
Donna Cox, Robert Patterson, Stuart Levy, NCSA Virtual Director Team
Java 3D API HPC Application: VisAD
Environ. Hydrology Team (Bill Hibbard, Wisconsin)
Steve Pietrowicz, NCSA Java Team
Standalone or CAVE-to-Laptop Collaborative
Environmental Hydrology Collaboration: From CAVE to Desktop
NASA IPG is Adding Funding To Collaborative Java3D
Caterpillar’s Collaborative Virtual Prototyping Environment
Data courtesy of Valerie Lehner, NCSA
Real Time Linked VR and Audio-Video Between NCSA and Germany
Using SGI Indy/Onyx and HP Workstations