
Page 1: Dr. Wei Chen (陈慰), Professor, Tennessee State University, August 2013, Lectures on Parallel and Distributed Computing

Dr. Wei Chen (陈慰), Professor

Tennessee State University

School of Information, Shanghai Maritime University (上海海事大学信息学院), August 2013

Lectures on Parallel and Distributed Computing 并行及分布式计算系列讲座

Page 2

Outline

Lecture 1: Introduction to parallel computing
Lecture 2: Parallel computational models
Lecture 3: Parallel algorithm design and analysis
Lecture 4: Distributed-memory programming with PVM/MPI
Lecture 5: Shared-memory programming with OpenMP
Lecture 6: Shared-memory programming with GPUs
Lecture 7: Introduction to distributed systems
Lecture 8: Synchronous network algorithms
Lecture 9: Asynchronous shared-memory/network algorithms

Page 3

References:

(1) Lectures 1–3: Joseph JaJa, "An Introduction to Parallel Algorithms," Addison-Wesley, 1992.

(2) Lectures 4–6: Peter S. Pacheco, "An Introduction to Parallel Programming," Morgan Kaufmann Publishers, 2011.

(3) Lectures 7–8: Nancy A. Lynch, "Distributed Algorithms," Morgan Kaufmann Publishers, 1996.

Page 4

Lecture 1: Introduction to Parallel Computing

Page 5

Why Parallel Computing

Problems with large computing complexity:
• Computationally hard problems (NP-complete problems) require exponential computing time.
• Problems with large input sizes: quantum chemistry, statistical mechanics, relativistic physics, cosmology, fluid mechanics, biology, genetic engineering, ...

For example, it costs 1 hour on a current computer to simulate the procedure of a 1-second reaction between a protein molecule and water molecules; it would cost 1.14 × 10^8 years to simulate the procedure of a 1-hour reaction.

Page 6

Why Parallel Computing (cont.)

Physical limitation of CPU computational power:
In the past 50 years, CPU speed doubled roughly every 2.5 years, but there is a physical limitation: the speed of light is 3 × 10^8 m/sec. Therefore, the CPU clock rate is expected to be limited to about 10 GHz.

Ways to solve computationally hard problems:
• Parallel processing
• DNA computers
• Quantum computers
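The light-speed bound on the clock rate can be checked with a quick back-of-the-envelope calculation (a sketch, not from the slides; the helper name is illustrative):

```python
# How far a signal moving at the speed of light can travel in one clock
# cycle. A signal cannot cross the chip more than once per cycle, so this
# distance bounds the usable clock rate for a chip of a given size.
SPEED_OF_LIGHT = 3e8  # m/sec

def distance_per_cycle(clock_hz):
    """Upper bound on signal travel distance during one clock period (m)."""
    return SPEED_OF_LIGHT / clock_hz

# At 10 GHz the signal covers at most 3 cm per cycle, roughly chip scale.
print(distance_per_cycle(10e9))  # 0.03
```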

Page 7

What is parallel computing

Using a number of processors to process one task, speeding up the processing by distributing it among the processors.

[Diagram: one problem divided among several processors]
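A minimal sketch of this idea (not from the slides; `parallel_sum` and the worker count are illustrative), distributing one summation task among several workers:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, num_workers=4):
    """Distribute one task (summing a list) among several workers."""
    chunk = (len(data) + num_workers - 1) // num_workers
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    # Each worker processes its own part; partial results are combined.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(sum, parts))

# Python threads only illustrate the structure (the GIL prevents real CPU
# speedup); actual parallel speedup needs processes, MPI, OpenMP, etc.
print(parallel_sum(list(range(1, 101))))  # 5050
```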

Page 8

Classification of parallel computers

Two kinds of classification:

1. Flynn's classification
• SISD (Single Instruction stream, Single Data stream)
• MISD (Multiple Instruction stream, Single Data stream)
• SIMD (Single Instruction stream, Multiple Data stream)
• MIMD (Multiple Instruction stream, Multiple Data stream)

2. Classification by memory type
• Shared memory
• Distributed memory

Page 9

Flynn's classification (1)

SISD (Single Instruction Single Data) computer: von Neumann's one-processor computer.

[Diagram: a control unit issues the instruction stream to a processor, which exchanges a data stream with memory]

Page 10

Flynn’s classification (2)

MISD (Multiple Instructions Single Data) computer: all processors share a common memory, have their own control units, and execute their own instructions on the same data.

[Diagram: several control units, each issuing its own instruction stream to its own processor; all processors operate on a single data stream from the shared memory]

Page 11

Flynn’s classification (3)

SIMD (Single Instructions Multiple Data) computer
• Processors execute the same instructions on different data.
• Operations of the processors are synchronized by a global clock.

[Diagram: one control unit broadcasts the instruction stream to all processors; each processor has its own data stream to the shared memory or interconnection network]

Page 12

Flynn’s classification (4)

MIMD (Multiple Instructions Multiple Data) computer
• Processors have their own control units and execute different instructions on different data.
• Operations of the processors are executed asynchronously most of the time.
• It is also called a distributed computing system.

[Diagram: each processor has its own control unit and instruction stream, and its own data stream to the shared memory or interconnection network]

Page 13

Classification by memory types (1)

1. Parallel computers with a shared common memory
• Communication is based on the shared memory.

For example, consider the case where processor i sends some data to processor j. First, processor i writes the data to the shared memory; then processor j reads the data from the same address of the shared memory.

[Diagram: several processors connected to one shared memory]
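The write-then-read exchange above can be sketched with two threads standing in for processors i and j (illustrative names, not from the slides; the `Event` plays the role of the needed synchronization):

```python
import threading

shared_memory = {}                # stands in for the shared memory
data_written = threading.Event()  # synchronization flag
result = []

def processor_i():
    shared_memory["addr"] = 42    # i writes the data to an agreed address
    data_written.set()            # and signals that the write is done

def processor_j():
    data_written.wait()           # j waits for i's write to complete
    result.append(shared_memory["addr"])  # then reads the same address

threads = [threading.Thread(target=processor_j),
           threading.Thread(target=processor_i)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result)  # [42]
```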

Page 14

Classification by memory types (2)

Features of parallel computers with shared common memory
• Programming is easy.
• Exclusive control is necessary for access to the same memory cell.
• Realization is difficult when the number of processors is large.
(Reason) The number of processors connected to the shared memory is limited by physical factors, such as the size and voltage of the units and the latency of memory access.

Page 15

Classification by memory types (3)

2. Parallel computers with distributed memory
Communication is one-to-one, based on an interconnection network.

For example, consider the case where processor i sends data to processor j. First, processor i issues a send command such as "send xxx to processor j"; then processor j gets the data with a receive command.

[Diagram: processors, each with its own local memory, connected by an interconnection network]
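The send/receive exchange can be mimicked with one mailbox per processor (a sketch, not from the slides; the mailbox dictionary stands in for the interconnection network):

```python
import queue

# One mailbox per processor stands in for the interconnection network.
mailboxes = {pid: queue.Queue() for pid in range(4)}

def send(src, dst, data):
    """Processor src issues a send command addressed to processor dst."""
    mailboxes[dst].put((src, data))

def receive(pid):
    """Processor pid takes the next (sender, data) pair from its mailbox."""
    return mailboxes[pid].get()

send(0, 2, "xxx")  # processor 0 sends "xxx" to processor 2
print(receive(2))  # (0, 'xxx')
```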

Page 16

Classification by memory types (4)

Features of parallel computers with distributed memory
• There are various architectures of interconnection networks (generally, the degree of connectivity is not large).
• Programming is harder than with shared common memory, since communication is one-to-one.
• It is easy to increase the number of processors.

Page 17

Types of parallel computers with distributed memory

Complete connection type
Any two processors are connected.

Features
• Strong communication ability, but not practical (each processor has to be connected to many processors).

[Diagram: five processors P1–P5 with every pair connected]

Mesh connection type
Processors are connected as a two-dimensional lattice.

Features
• Each processor is connected to only a few others; it is easy to increase the number of processors.
• Large distance between processors: O(√n) for n processors.
• Processor communication bottlenecks exist.

[Diagram: 4 × 4 mesh of processors P0,0–P3,3]
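The mesh distance claim can be checked directly: shortest paths in a two-dimensional lattice follow the Manhattan distance, so the diameter of a √n × √n mesh is 2(√n - 1) (a sketch, not from the slides; function names are illustrative):

```python
def mesh_distance(a, b):
    """Shortest path between processors a=(row, col) and b in a 2-D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mesh_diameter(side):
    """Largest processor-to-processor distance in a side x side mesh:
    attained between opposite corners."""
    return mesh_distance((0, 0), (side - 1, side - 1))

# For n = 16 processors (a 4 x 4 mesh) the diameter is 2 * (4 - 1) = 6,
# which grows as O(sqrt(n)) with the number of processors n.
print(mesh_diameter(4))  # 6
```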

Page 18

Types of parallel computers with distributed memory (cont.)

Hypercube connection type
Processors are connected as a hypercube: each processor has a binary number, and two processors are connected if and only if their numbers differ in exactly one bit.

Features
• Small distance between processors: log n.
• Balanced communication load because of its symmetric structure.
• Easy to increase the number of processors.

[Diagram: 4-dimensional hypercube with 16 nodes labeled 0000–1111]
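The hypercube connection rule (labels differ in exactly one bit) is a one-liner (a sketch, not from the slides; the function name is illustrative):

```python
def hypercube_connected(x, y):
    """True iff processors x and y are neighbors: labels differ in one bit."""
    return bin(x ^ y).count("1") == 1

# Node 0101 is a neighbor of 0100 (last bit flipped) but not of 0110.
print(hypercube_connected(0b0101, 0b0100))  # True
print(hypercube_connected(0b0101, 0b0110))  # False

# In a 4-dimensional hypercube every node has exactly log n = 4 neighbors.
print(sum(hypercube_connected(0b0101, y) for y in range(16)))  # 4
```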

Page 19

Types of parallel computers with distributed memory (cont.)

Other connection types
Tree connection type, butterfly connection type, bus connection type.

Criteria for selecting an interconnection network
• Small diameter (the largest distance between processors), for small communication delay.
• Symmetric structure, for easily increasing the number of processors.

The type of interconnection network depends on the application, the ability of the processors, the upper bound on the number of processors, and other factors.

Page 20

Real parallel processing system (1)

Early parallel computer (ILLIAC IV)
• Built in 1972
• SIMD type with distributed memory, consisting of 64 processors
• Transformed mesh connection type, equipped with a common data bus, a common control bus, and one control unit

[Diagram: 4 × 4 array of processors P0,0–P3,3]

Page 21

Real parallel processing system (2)

Parallel computers of the 1990s
• Shared common memory type: workstations, SGI Origin2000, and others with 2–8 processors.
• Distributed memory type:

Name      Maker        Processors  Processing type  Network type
CM-2      TM           65536       SIMD             hypercube
CM-5      TM           1024        MIMD             fat tree
nCUBE2    nCUBE        8192        MIMD             hypercube
iWarp     CMU, Intel   64          MIMD             2-D torus
Paragon   Intel        4096        MIMD             2-D torus
SP-2      IBM          512         MIMD             HP switch
AP1000    Fujitsu      1024       MIMD             2-D torus
SR2201    Hitachi      1024       MIMD             crossbar

Page 22

Real parallel processing system (3)

Deep Blue
• Developed by IBM for chess only
• Defeated the chess champion
• Based on the general-purpose parallel computer SP-2

[Diagram: 32 RS/6000 nodes connected by a generalized-hypercube interconnection network; each node has its own memory and an RS/6000 bus interface, with 8 Deep Blue chess chips attached via a Microchannel bus]

Page 23

K (京) Computer - Fujitsu

Architecture: 88,128 SPARC64 VIIIfx 2.0 GHz 8-core processors; 864 cabinets, each with 96 computing nodes and 6 I/O nodes; six-dimensional Tofu torus interconnect; Linux-based enhanced operating system; open-source Open MPI library; 12.6 MW; 10.51 petaflops; ranked #1 in 2011.

Page 24

Titan - Oak Ridge Lab

Architecture: 18,688 AMD Opteron 6274 16-core CPUs; Cray Linux; 8.2 MW; 17.59 petaflops; GPU-based; torus topology; ranked #1 in 2012.

Page 25

Tianhe-1A (天河一号) - National Supercomputing Center, Tianjin

Architecture: 14,336 Xeon X5670 processors; 7,168 Nvidia Tesla M2050 GPUs; 2,048 FeiTeng 1000 SPARC-based processors; 4.7 petaflops; 112 computer cabinets and 8 I/O cabinets; 11-D hypercube topology with IB QDR/DDR; Linux; ranked #2 in 2011.

Page 26

Exercises

Compare the top three supercomputers in the world.