High Performance Computer
TRANSCRIPT
-
8/10/2019 High Performance Computer
1/43
High Performance Parallel Supercomputer
Dien Taufan Lessy (3011464), MCSCE
Spring Semester 2014
-
Contents
Introduction
Parallelism
Future Research
Conclusion
References
Literature
-
Introduction
The first supercomputer: IBM Naval Ordnance Research Calculator (NORC)
15,000 operations/s; ADD (15 µs), MUL (31 µs), DIV (227 µs)
[1]
-
Introduction
The first supercomputer: Control Data Corporation 6600
1 MFLOPS
[2]
-
Introduction
Today (November 2013): Tianhe-2 (MilkyWay-2)
[3]
Cores: 3,120,000
Rmax: 33,862.7 TFLOP/s
Power: 17,808 kW
-
Introduction
Today's Ranking
[3]
-
Introduction
HPC Vendor
[3]
-
Introduction
Processor Generation
[3]
-
Introduction
User Segments
[3]
-
Introduction
OS
[3]
-
Introduction
[3]
-
Parallelism
History
-
Introduction
[3]
Concept and Terminology
-
Parallelism
The von Neumann Computer
Walk-through: c = a + b
1. Get next instruction
2. Decode: fetch a
3. Fetch a to internal register
4. Get next instruction
5. Decode: fetch b
6. Fetch b to internal register
7. Get next instruction
8. Decode: add a and b (c in register)
9. Do the addition in the ALU
10. Get next instruction
11. Decode: store c in main memory
12. Move c from internal register to main memory
Note: some units are idle while others are working, wasting cycles. Pipelining (modularization) and caching (advance decoding) introduce parallelism.
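The twelve steps above can be mimicked with a toy fetch-decode-execute loop (a sketch for illustration only; the instruction names and register model are invented, not from the slides):

```python
# Toy von Neumann machine: every operation goes through the same
# fetch -> decode -> execute cycle against a single memory.
memory = {"a": 3, "b": 4, "c": None}
registers = {}

program = [
    ("LOAD", "a"),             # fetch a into an internal register
    ("LOAD", "b"),             # fetch b into an internal register
    ("ADD", ("a", "b", "c")),  # c = a + b, result kept in a register
    ("STORE", "c"),            # move c from register to main memory
]

for opcode, operand in program:  # get next instruction
    if opcode == "LOAD":         # decode, then execute: memory -> register
        registers[operand] = memory[operand]
    elif opcode == "ADD":        # the ALU does the addition
        x, y, dst = operand
        registers[dst] = registers[x] + registers[y]
    elif opcode == "STORE":      # register -> main memory
        memory[operand] = registers[operand]

print(memory["c"])  # 7
```

Note how the control unit is busy on every step while the ALU works only once; that idle time is what pipelining recovers.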
-
Parallelism
Increasing Cycle Time: Moore's Law
-
Parallelism
Increasing Cycle Time: Core voltage increases with frequency
[6]
-
Parallelism
Power cost of Frequency
Dynamic power ~ capacitance × voltage² × frequency
Voltage ~ frequency
So power ~ frequency³
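Assuming the proportionalities on this slide (a back-of-the-envelope sketch using only the scaling relations, not measured chip data), the cost of raising the clock can be computed directly:

```python
# Dynamic power scales roughly as capacitance * voltage^2 * frequency.
# If voltage must rise linearly with frequency, power grows as frequency^3.
def relative_power(freq_scale: float) -> float:
    """Power relative to baseline when frequency is scaled by freq_scale,
    assuming voltage scales linearly with frequency (capacitance cancels)."""
    voltage_scale = freq_scale            # assumption: V ~ f
    return voltage_scale ** 2 * freq_scale

# Doubling the clock costs roughly 8x the power...
print(relative_power(2.0))       # 8.0
# ...while two cores at the original clock cost only about 2x.
print(2 * relative_power(1.0))   # 2.0
```

This asymmetry is the core argument for parallelism over ever-higher clock rates.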
-
Parallelism
A high-performance serial processor needs high power
-
Parallelism
Processor-Memory Gap (Bottleneck)
-
Parallelism
Definition
Concurrent vs Parallel
"A parallel computer is a collection of processing elements that communicate and cooperate to solve large problems fast." - Almasi and Gottlieb, 1989
-
Parallelism
Speedup vs Efficiency
For a given problem:
speedup(P processors) = exec. time (1 processor) / exec. time (P processors)
efficiency(P processors) = speedup(P) / P
10 processors with 2x speedup? Efficiency is only 2/10 = 20%.
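These definitions can be written out as a small sketch (the example times are made up for illustration):

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup = execution time on 1 processor / execution time on P processors."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency = speedup divided by the number of processors used."""
    return speedup(t_serial, t_parallel) / p

# The slide's question: 10 processors giving only a 2x speedup.
print(speedup(100.0, 50.0))         # 2.0
print(efficiency(100.0, 50.0, 10))  # 0.2 -> only 20% efficient
```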
-
Parallelism
Serial vs Parallel
[3]
-
Parallelism
Serial vs Parallel
[3]
-
Parallelism
Serial vs Parallel
[3]
-
Parallelism
Serial vs Parallel
[3]
-
Parallelism
Processor Types
Scalar processors
CISC: Complex Instruction Set Computer (Intel 80x86, IA-32)
RISC: Reduced Instruction Set Computer (Sun SPARC, IBM POWER, SGI MIPS)
VLIW: Very Long Instruction Word; Explicitly Parallel Instruction Computing (EPIC); probably dying (Intel IA-64, Itanium)
Vector processors
Cray X1/T90; NEC SX series; Japan Earth Simulator; early Cray machines; Japan Life Simulator (hybrid)
-
Parallelism
CISC vs RISC vs VLIW
[5]
-
Parallelism
[3]
Flynn's Classical Taxonomy
-
Parallelism
[3]
SISD: Single Instruction stream, Single Data stream
-
Parallelism
[3]
SIMD: Single Instruction stream, Multiple Data streams
-
Parallelism
[3]
MISD: Multiple Instruction streams, Single Data stream
-
Parallelism
[3]
MIMD: Multiple Instruction streams, Multiple Data streams
-
Parallelism
Memory Architecture: Shared Memory
Superscalar processors with L2 cache connected to memory modules through a bus or crossbar.
All processors have access to all machine resources, including memory and I/O devices.
SMP (symmetric multiprocessor): the processors are all the same and have equal access to machine resources, i.e. the machine is symmetric.
SMPs are UMA (Uniform Memory Access) machines.
e.g., a node of an IBM SP machine; Sun Ultra Enterprise 10000
-
Parallelism
Memory Architecture: Shared Memory
If a bus: only one processor can access memory at a time; processors contend for the bus to access memory.
If a crossbar: multiple processors can access memory through independent paths; there is contention when different processors access the same memory module; crossbars can be very expensive.
Processor count is limited by memory contention and bandwidth, usually to a maximum of 64 or 128.
-
Parallelism
Memory Architecture: Distributed Memory
Superscalar processors with local memory, connected through a communication network.
Each processor can only work on data in its local memory.
Access to remote memory requires explicit communication.
Present-day large supercomputers are all some sort of distributed-memory machine.
-
Parallelism
Memory Architecture: Hybrid Distributed-Shared Memory
Overall distributed memory, with SMP nodes.
Most modern supercomputers and workstation clusters are of this type.
Programming model: message passing, or hybrid message passing/threading.
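On such machines a node sees only its own memory, so partial results travel as explicit messages rather than shared variables. A minimal sketch of that pattern, with Python threads and queues standing in for processes on separate nodes (e.g. MPI ranks):

```python
import threading
import queue

def worker(chunk, out_q):
    # Each worker computes on its own local chunk and "sends" the
    # partial result back as a message instead of writing shared state.
    out_q.put(sum(chunk))

data = list(range(100))          # global problem: sum of 0..99
out_q = queue.Queue()
chunks = [data[:50], data[50:]]  # distribute the data across workers
threads = [threading.Thread(target=worker, args=(c, out_q)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The "root" combines the partial results it receives.
total = sum(out_q.get() for _ in threads)
print(total)  # 4950
```

In a real cluster the queue operations would be network sends and receives, which is why remote access is so much more expensive than local memory.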
-
Parallelism
[3]
Amdahl's Law
Suppose only part of an application is parallelizable (Amdahl's law).
Let s be the fraction of work done sequentially, so (1 - s) is the fraction parallelizable.
P = number of processors.
Speedup(P) = Time(1)/Time(P) = 1 / (s + (1 - s)/P)
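The formula is easy to explore numerically (a sketch; the 5% serial fraction below is just an example value):

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Amdahl's law: s = serial fraction of the work, p = processor count.
    Speedup(P) = 1 / (s + (1 - s) / P)."""
    return 1.0 / (s + (1.0 - s) / p)

# Even a small serial fraction caps the achievable speedup:
print(amdahl_speedup(0.05, 10))    # about 6.9 on 10 processors
print(amdahl_speedup(0.05, 1000))  # about 19.6; the limit is 1/s = 20
```

As P grows without bound the speedup approaches 1/s, so the sequential fraction, not the processor count, sets the ceiling.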
-
Parallelism
[3]
Amdahl's Law
-
Conclusion
Improvement of single-instruction-stream performance requires a lot of effort for little gain.
Parallel computing is the only way to achieve higher performance in the foreseeable future.
A supercomputer combines all of the parallel computing technologies: parallel CPUs, multicore, scalar, vector, etc.
-
References
[1]: http://www.columbia.edu/cu/computinghistory/norc.html, May 2014
[2]: http://en.wikipedia.org/wiki/File:CDC_6600.jc.jpg
[3]: http://www.top500.org/, May 2014
[4]: https://computing.llnl.gov/tutorials/parallel_comp/
[5]: http://15418.courses.cs.cmu.edu/spring2014/lecture/whyparallelism
[6]: http://www.intel.com
[7]: http://discovermagazine.com/galleries/zen-photo/m/moores-law
[8]: http://people.cs.clemson.edu/~mark/464/acmse_epic.pdf
-
Literature
Parallel Computer Architecture: A Hardware/Software Approach, D.E. Culler, J.P. Singh
Computer Architecture: A Quantitative Approach, J.L. Hennessy, D.A. Patterson
https://computing.llnl.gov/tutorials/parallel_comp/
https://computing.llnl.gov/tutorials/parallel_comp/OverviewRecentSupercomputers.2008.pdf
http://www-users.cs.umn.edu/~karypis/parbook/
http://www.top500.org/
http://15418.courses.cs.cmu.edu