CS 770G - Parallel Algorithms in Scientific Computing
Lecture 1: Introduction
May 7, 2001
Course Information
• Instructor: Justin Wan ([email protected]), DC 3129
• Class homepage: http://www.student.math.uwaterloo.ca/~cs770g
• Office hours:
  – To be determined
• Prerequisites:
  – Familiarity with basic numerical computations, such as the material covered in CS 370.
  – Experience with a programming language such as C, C++, or Fortran.
Why Do We Need Powerful Computers?
• Traditional scientific & engineering paradigm:
  – Do theory or paper design.
  – Perform experiments or build systems.
• Supplement both with numerical experiments:
  – Real phenomena are too complicated to model by hand.
  – Real experiments are:
      Too hard: e.g. building large wind tunnels.
      Too expensive: e.g. building a throw-away passenger jet.
      Too slow: e.g. waiting for a tornado to come.
      Too dangerous: e.g. weapons, drug design.
High Performance Computing
• Units/Notation:
  1 Mflop  =  1 Megaflop  =  10^6 flop/sec
  1 Gflop  =  1 Gigaflop  =  10^9 flop/sec
  1 Tflop  =  1 Teraflop  =  10^12 flop/sec
  1 MB     =  1 Megabyte  =  10^6 bytes
  1 GB     =  1 Gigabyte  =  10^9 bytes
  1 TB     =  1 Terabyte  =  10^12 bytes
  1 PB     =  1 Petabyte  =  10^15 bytes
High Performance Computers
• High end:
  – ASCI White: IBM SP3
  – ASCI Red: Intel Pentium II Xeon
  – ASCI Blue-Pacific: IBM SP
  – ASCI Blue Mountain: SGI
• Powerful:
  – Cray T3E, SGI Origin 2000, IBM SP, HP, Hitachi, Fujitsu, NEC, SUN.
• History:
  – Thinking Machines, MasPar, nCube, Meiko, …
TOP 500 List
• Statistics on high-performance computers.
• The 500 most powerful computer systems.
• Performance measure: the Linpack benchmark (see the sketch below).
• Updated twice a year.
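As a rough, unofficial sketch of what the benchmark measures (this is not the HPL code, and the matrix size n here is an arbitrary placeholder): Linpack times the solution of a dense linear system Ax = b and converts the elapsed time into flop/s using the conventional operation count 2/3 n^3 + 2 n^2.

    import time
    import numpy as np

    # Time a dense solve, imitating what the Linpack benchmark measures.
    n = 2000
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    t0 = time.perf_counter()
    x = np.linalg.solve(A, b)                # LU factorization + triangular solves
    elapsed = time.perf_counter() - t0

    flops = 2.0 / 3.0 * n**3 + 2.0 * n**2    # conventional Linpack flop count
    print(f"n = {n}: {elapsed:.3f} s, roughly {flops / elapsed / 1e9:.2f} Gflop/s")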
Highlights from Top 10
• No. 1: ASCI White, 4.9 TF on the Linpack benchmark.
• DOE ASCI systems hold the first 4 positions.
• 7 systems have Linpack performance above 1 TF.
• 18 systems have peak performance above 1 TF, including 1 commercial system (#15, at Charles Schwab).
• 0.89 TF is the entry point for the Top 10.
Highlights from Top 500
• 231 systems dropped off the TOP500 since last June.
• Accumulated performance: 65.2 TF → 88.1 TF.
• Entry level: 43.8 GF → 55.1 GF.
• Pure SMP systems: 121 → 17.
• 112 systems are clusters of SMPs.
• Networks of workstations: 11 → 28.
More Statistics (TOP500 11/00)
[Pie chart: systems installed, by sector. Industry 49%, Research 24%, Academic 17%, Classified 5%, Vendor 3%, Government 2%. Total: 500.]
More Statistics (TOP500 11/00)
[Chart: performance development, Jun 1993 to Nov 2000, on a log scale from 100 Mflop/s to 100 Tflop/s. SUM over all 500 systems: 1.167 TF/s → 88.0 TF/s. N=1: 59.7 GF/s → 4.94 TF/s, with the top spot held in turn by the Intel XP/S140 (Sandia), Fujitsu 'NWT' (NAL), Hitachi/Tsukuba CP-PACS/2048, Intel ASCI Red (Sandia), and IBM ASCI White (LLNL). N=500: 0.4 GF/s (SNI VP200EX, Uni Dresden) → 55.1 GF/s (IBM SP PC604e, 130 processors, at Alcatel).]
More Statistics (TOP500 11/00)
[Chart: architectures, Jun 1993 to Nov 2000 (number of systems, 0 to 500). Categories: single processor, SMP, MPP, SIMD, constellations, clusters/NOW, and clusters of SMPs (CluMPs). Representative systems marked: Y-MP C90, CM2, CM5, VP500, SX3, Paragon, T3D, T3E, SP2, Sun HPC, cluster of Sun HPC, ASCI Red.]
More Statistics (TOP500 11/00)
[Chart: chip technology, Jun 1993 to Nov 2000 (number of systems, 0 to 500). Categories: Alpha, Power, HP, Intel, MIPS, SUN, other COTS, proprietary.]
More Statistics (TOP500 11/00)
[Chart: manufacturers, Jun 1993 to Nov 2000 (number of systems, 0 to 500). Categories: Cray, SGI, IBM, Sun, Convex/HP, TMC, Intel, Fujitsu, NEC, Hitachi, others.]
More Statistics (TOP500 11/00)
[Bar chart: number of systems installed per manufacturer, as a percentage of the 500 total (axis 0% to 45%). Manufacturers: HP, Compaq, Cray Inc., Fujitsu, Hitachi, IBM, Intel, NEC, SGI, SUN, self-made.]
More Statistics (TOP500 11/00)
[Chart: supercomputer manufacturers active over 1980 to 2000: Cray, Cray Computer, SGI, CDC/ETA, Fujitsu, NEC, Hitachi, Convex/HP, TMC, Intel, nCUBE, Alliant, FPS, Meiko, Parsytec, MasPar, DEC/Compaq, KSR, IBM, Sun.]
ASCI
• Accelerated Strategic Computing Initiative (ASCI).
• An initiative of the defense programs at the US DOE.
  – Shift from nuclear-test-based methods to computation-based methods.
• Design nuclear weapons, analyze their performance, and predict their safety and reliability, etc., without underground nuclear testing → virtual testing / computer simulations.
• Requires higher-resolution, 3D, full-physics, full-system capabilities → high performance computing.
C3
• Canadian high performance computing organization.
• 50 member institutions, 15 resource providers.
Other Grand Challenge Computations
• Global climate modeling.
• Dyna3D crash simulation.
• Astrophysical modeling.
• Earthquake modeling.
• Heart simulation.
• Web search.
• Transaction processing.
• Drug design.
Why Do We Need High Performance Computing?
• Example: 24-hour weather prediction for North America.
• Cover the region (2 × 10^7 km² × 20 km) with grid points, mesh size = 1 km → 4 × 10^15 grid pts.
• Suppose it takes 1 flop to calculate the weather at each grid pt every hour → 10^15 flops.
• On a PC with 1 Gflops → 12 days.
• On a high performance computer with 1 Tflops → 17 min (see the sketch below).
• How about weather prediction for the entire earth?!
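A minimal sketch of the arithmetic (the 10^15 total flop count is taken from the slide as given; only the unit conversions are computed here):

    # Convert the slide's total flop count into wall-clock time.
    total_flops = 1e15      # total work for the 24-hour forecast (from the slide)
    pc_rate = 1e9           # PC at 1 Gflops
    hpc_rate = 1e12         # high performance computer at 1 Tflops

    print(f"PC  (1 Gflops): {total_flops / pc_rate / 86400:.1f} days")    # ~12 days
    print(f"HPC (1 Tflops): {total_flops / hpc_rate / 60:.1f} minutes")   # ~17 min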
Why Parallel Computers?
• Suppose a single CPU can perform 1 Tflops.
• Data must travel at least some distance r from memory to the CPU.
• 1 datum per cycle → 10^12 trips per sec.
• Suppose data travels at the speed of light (3 × 10^8 m/s) ⇒ r < 0.3 mm.
• Suppose 1 TB of data is stored in a 0.3 mm × 0.3 mm square ⇒ each side contains 10^6 data items.
• Each byte then occupies about 3 × 10^-7 mm ≈ the size of an atom! (worked out in the sketch below)
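A minimal numerical sketch of this argument, assuming one memory round trip per cycle and data moving at exactly the speed of light:

    # Speed-of-light limit on memory distance for a 1 Tflops sequential CPU.
    rate = 1e12                  # memory trips per second (one per cycle)
    c = 3e8                      # speed of light, m/s

    r = c / rate                 # farthest data can sit from the CPU
    print(f"r = {r * 1e3:.1f} mm")               # 0.3 mm

    # Pack 1 TB of data into an r x r square: 1e6 bytes along each side.
    per_byte = r / 1e6           # linear space available per byte
    print(f"{per_byte * 1e3:.0e} mm per byte")   # ~3e-07 mm, atomic scale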
Why Parallel Algorithms?
• Fast methods on sequential machines may not be easily parallelized.
• Relatively slow methods on sequential machines may be highly parallel.
• Need to redesign existing algorithms and/or design entirely new approaches (a classic example is sketched below).
• Need new theory to provide a theoretical foundation for the new methods.
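A standard illustration of the first two bullets, sketched on a 1-D Poisson model problem (this example is not from the slides): Gauss-Seidel converges faster than Jacobi on a sequential machine, but a Jacobi sweep updates every grid point independently and so parallelizes trivially, while Gauss-Seidel uses values computed earlier in the same sweep, which serializes the loop.

    import numpy as np

    # 1-D Poisson model problem: 2*u[i] - u[i-1] - u[i+1] = f[i], u[0] = u[-1] = 0.
    n = 100
    f = np.ones(n)               # right-hand side (h^2 factor absorbed)

    def jacobi_sweep(u):
        # Every point is updated from the OLD iterate: all updates are
        # independent, so the sweep is trivially parallel.
        new = u.copy()
        new[1:-1] = 0.5 * (u[:-2] + u[2:] + f[1:-1])
        return new

    def gauss_seidel_sweep(u):
        # Each point uses the value just computed at i-1: the loop carries
        # a dependence and runs sequentially, though it converges roughly
        # twice as fast as Jacobi on this problem.
        u = u.copy()
        for i in range(1, n - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + f[i])
        return u

    def residual(u):
        return np.max(np.abs(2 * u[1:-1] - u[:-2] - u[2:] - f[1:-1]))

    u_j, u_gs = np.zeros(n), np.zeros(n)
    for _ in range(200):
        u_j = jacobi_sweep(u_j)
        u_gs = gauss_seidel_sweep(u_gs)

    print(f"Jacobi residual after 200 sweeps:       {residual(u_j):.2e}")
    print(f"Gauss-Seidel residual after 200 sweeps: {residual(u_gs):.2e}")

Reorderings such as red-black Gauss-Seidel recover the lost parallelism; redesigns of that kind are exactly what the third bullet refers to.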