
2005 ©Erik F. Dirkx

Limits of Parallel/Distributed Computing

Prof.dr.ir. Erik DIRKX, Vrije Universiteit Brussel

Erik.Dirkx@vub.ac.be, http://parallel.vub.ac.be


Introduction

• (Cluster) computers: a tool for a new way of doing science & engineering (cheap: build your own, BYO!!!)

• “Hardware”?

• “Software”?


Need for “Speed”

• Processing:
  signal: structured (e.g. MP3)
  dynamic: unstructured

• Data: pictures, movies, simulation, …

• Interconnect:
  bandwidth >< latency


A law of nature??

• Fundamental observation (Erik’s law):
  (remember: 20%+ of the Earth’s crust is Si … we are C based …)

• Only general purpose programmable devices will survive in the long term, yet …
  “programmable” = ??

$$\text{cost:}\quad \lim_{n \to \infty} \$(n\text{th copy of chip}) = 0$$

$$\text{profit:}\quad \text{price} \cdot n$$
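The cost argument can be made concrete with a toy volume model. This is a sketch under stated assumptions, not the author’s model: the one-off design cost `NRE`, the first-copy cost, the per-copy decay factor and the price are all hypothetical numbers. It only shows the shape of the argument: once the cost of the n-th copy tends to 0, the fixed cost is amortized away and profit grows roughly as price * n, which is presumably why high-volume, general purpose parts win.

```python
# Hypothetical volume-cost model; every constant below is an assumption.
NRE = 10_000_000.0   # one-off design/mask cost ($), hypothetical
FIRST_COPY = 50.0    # cost of manufacturing copy #1 ($), hypothetical
DECAY = 0.999999     # each copy costs this fraction of the previous one, hypothetical
PRICE = 20.0         # selling price per copy ($), hypothetical


def marginal_cost(k: int) -> float:
    """Cost of the k-th copy; tends to 0 as k grows."""
    return FIRST_COPY * DECAY ** (k - 1)


def total_cost(n: int) -> float:
    """Design cost plus the geometric sum of the costs of copies 1..n."""
    return NRE + FIRST_COPY * (1.0 - DECAY ** n) / (1.0 - DECAY)


def profit(n: int) -> float:
    """Revenue price * n minus everything spent to make n copies."""
    return PRICE * n - total_cost(n)


if __name__ == "__main__":
    for n in (10**3, 10**5, 10**7):
        print(f"n={n:>10,}  cost/copy={total_cost(n) / n:10.2f} $  profit={profit(n):15,.0f} $")
```

With these made-up constants a small run loses money, while at ten million copies the average cost per copy has dropped to a few dollars and profit is positive.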


General Purpose Hardware

• P(rocessor)
  => ALU (compute) + CU (control)

• M(emory)
  => as much as possible
  => as fast as possible

• S(witch)
  => throughput (telecom!)
  => latency (telecom?)


General Purpose “Computer”

• Amplifying elements: transistor (n*1000 atoms + quantum mechanics)

• Connecting elements: wire/fibre/wireless (Maxwell equations)


The original cluster (neo-cortex)

• 10**11 general purpose neurons => compute & memory = “gray” matter

• 10**5 connections/neuron => interconnect = “white” matter

• Switching time > 1 ms (digital PPM)

• Input ~100 Mbps (pre-thalamus)

• Output << input

• Storage: ~10**17 bits (do not drink & think …)

• ~20 W, electro-chemical, carbohydrate powered


General Purpose (neo)Cortex

• General purpose “cellular columns” (e.g. blind musician)

• 6 layers: 1 in, 1 out, 4 compute

• A sheet the size of 4 A4 pages, constant density

• Tuned by the “emotional” subsystem: real time, pre-emptive priorities

• Hierarchy root = “prefrontal cortex” (L = +, R = -)


Where is our “software”?

• Human (general purpose)
  speed -- [12 km/h]
  endurance -- [42.195 km]
  power -- [200 W @ 120 km]
  force -- [52*13 ???]
  accuracy --
  (re?)configurability +++
  => Learning (software?)

• Other predators (special purpose)
  speed ++ (e.g. cheetah)
  endurance ++ (e.g. orca)
  power ++ (e.g. hyena)
  force ++ (e.g. shark)
  accuracy ++ (e.g. eagle)
  ++ at the price of general-purposeness
  => Genetics (hardware?)


Generic (Multi)processor

[Figure: Processors/Memory, Interconnect (1), Front-end Processors/Memory, Interconnect (2), Intra/Internet]

• Interconnect 1: high bandwidth, low latency, deadlock (DL)-free (!!)

• Interconnect 2: off-the-shelf (OTS) TCP/IP


Concrete Examples

• Cluster computers: 2.0 .. 100k CPUs

• “Field Programmable Gate Array”
  Satisfies the description, fundamental limits! “Program”/(re)configure??

• Hybrid:
  1) COW (cluster of workstations) + accelerator in each node,
     e.g. Deep Blue: 32 * (1 + 8) => variable granularity …
  2) “Cell” type of compute engines (“software” = interesting PhD/career topic??)


VUB INFO: BYO

• Design & Test
  ! Experience
  ! Students

• Bottom Up & Top Down
  CE/EE/Photonics/Applied Math >< CS/Deductive
  Engineering <> Specification/Design


Cluster BYO (2)

• The “front-end” =>
  Hierarchical control (yang) (remember the pre-frontal cortex …): bottleneck!!!!
  ><
  Coordination/distributed (yin) (also PFC!!): “order out of chaos” …


Alternative

• Special purpose device
  => temporary
  => point solution
  ??? $$$$ [design, debug, …]
  ??? Dynamic environment
  !? Power (cf. context)
  ?! Patent (cf. EU software patent dispute …)


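Fundamental Bound (to Your Enthusiasm?) (physical)

• Technology:

$$\$_{P,M}(n) = k_1 \cdot n, \qquad \$_{S}(n) = k_2 \cdot n \lg n$$

• Problem:

$$\text{Speedup } S = \frac{T_1}{T_n} \le S_{\max}$$

yet, as n grows:

$$\frac{dS}{dn} > 0, \qquad \frac{dS}{dn} = 0, \qquad \frac{dS}{dn} < 0$$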
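The bound can be explored numerically with a toy model. In this sketch the constants `K1`, `K2`, `T1` and `C` are hypothetical, and the run-time model (perfectly divided work T1/n plus a switch overhead of C per stage, with about lg n stages) is an assumption, not something stated on the slide. Under that assumption the speedup first rises with n, peaks, then falls, while the machine cost keeps growing as n lg n.

```python
import math

# Hypothetical constants; the slide names k1 and k2 but gives no values.
K1, K2 = 1.0, 0.25   # cost per processor+memory, cost factor of the switch
T1 = 1000.0          # sequential run time (arbitrary units), hypothetical
C = 1.0              # communication cost per switch stage, hypothetical


def machine_cost(n: int) -> float:
    """Technology side: k1*n for processors+memory plus k2*n*lg n for the switch."""
    return K1 * n + K2 * n * math.log2(n)


def parallel_time(n: int) -> float:
    """Assumed model: work splits perfectly (T1/n), but communication
    crosses about lg n switch stages, costing C each."""
    return T1 / n + C * math.log2(n)


def speedup(n: int) -> float:
    """Problem side: S = T1 / Tn."""
    return T1 / parallel_time(n)


if __name__ == "__main__":
    best = max(range(1, 4097), key=speedup)
    print(f"best n = {best}, S_max = {speedup(best):.1f}")
    for n in (16, 256, best, 4096):
        print(f"n={n:5d}  S={speedup(n):6.1f}  cost={machine_cost(n):9.0f}  "
              f"cost/S={machine_cost(n) / speedup(n):8.1f}")
```

For this model the continuous optimum is n = T1 ln 2 / C, about 693; beyond it, adding processors both costs more and runs slower.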


• Critical parameter: granularity

$$G = \frac{T_{\text{Compute}}}{T_{\text{Communicate}}}$$

• Informally => hard <> easy problems
  Gray > (compute cap.)
  White < (communication cap.)
  (yang / yin)


Granularity (II)

• Experience: the optimum is situation dependent:
  too coarse => sub-optimal! [US football]
  too fine => communication bottleneck! [EU soccer]

• Tcomp = #instr * CPI * 1/f = Rproblem * Rmachine

• Tcomm = latency + #bits/bandwidth ?=? Cproblem * Cmachine

• Cproblem = #databits

• Cmachine = …
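The two time formulas above combine directly into the granularity G = Tcomp / Tcomm. The sketch below simply evaluates them; the node parameters in the usage example (1 GHz, CPI 1, a 1 Gb/s link with 10 µs latency, 8 Mbit per exchange) are hypothetical.

```python
def t_comp(instructions: float, cpi: float, freq_hz: float) -> float:
    """Tcomp = #instr * CPI * 1/f, i.e. Rproblem (#instr) * Rmachine (CPI/f)."""
    return instructions * cpi / freq_hz


def t_comm(bits: float, latency_s: float, bandwidth_bps: float) -> float:
    """Tcomm = latency + #bits / bandwidth."""
    return latency_s + bits / bandwidth_bps


def granularity(instructions, cpi, freq_hz, bits, latency_s, bandwidth_bps):
    """G = Tcomp / Tcomm: large G is compute dominated, small G is communication bound."""
    return t_comp(instructions, cpi, freq_hz) / t_comm(bits, latency_s, bandwidth_bps)


if __name__ == "__main__":
    # Hypothetical node: 1 GHz, CPI 1, 1 Gb/s link with 10 us latency, 8 Mbit per exchange.
    for instr in (1e6, 1e8):
        g = granularity(instr, 1.0, 1e9, 8e6, 10e-6, 1e9)
        print(f"{instr:.0e} instructions per 8 Mbit exchanged: G = {g:.3f}")
```

Raising the work per exchange by a factor of 100 moves G from about 0.12 (too fine) to about 12.5 for the same machine.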


Granularity (III)

• Cmachine:

$$C_{\text{machine}} = \frac{\text{latency}}{\#\text{bits}} + \frac{1}{\text{bandwidth}}$$

• bandwidth ~ 10**12 b/s => bandwidth^-1 ~ 1 ps

• latency: 10 GHz = 0.1 ns = 3 cm (vacuum); 3 mm (Si) => 1 ps ~ 30 µm (Si)
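The numbers on this slide are easy to check. The sketch below computes how far a signal travels in a given time and evaluates Cmachine = latency/#bits + 1/bandwidth. The silicon propagation speed is taken from the slide’s own figure (3 mm per 0.1 ns); the 1 µs link latency in the example is hypothetical.

```python
C0 = 299_792_458.0              # speed of light in vacuum (m/s)
SI_SPEED = 3e-3 / 0.1e-9        # 3 mm per 0.1 ns, as quoted on the slide (~3e7 m/s)


def c_machine(latency_s: float, bits: float, bandwidth_bps: float) -> float:
    """Per-bit machine cost: latency / #bits + 1 / bandwidth."""
    return latency_s / bits + 1.0 / bandwidth_bps


def flight_distance_m(time_s: float, speed_m_s: float = C0) -> float:
    """Distance a signal covers in time_s at the given propagation speed."""
    return time_s * speed_m_s


if __name__ == "__main__":
    print(f"bit time at 1e12 b/s     : {1 / 1e12:.0e} s")
    print(f"0.1 ns (10 GHz) in vacuum: {flight_distance_m(0.1e-9) * 100:.1f} cm")
    print(f"1 ps in Si (slide speed) : {flight_distance_m(1e-12, SI_SPEED) * 1e6:.0f} um")
    # Cproblem * Cmachine recovers Tcomm, but Cmachine itself depends on #bits.
    lat, bw = 1e-6, 1e12        # hypothetical link: 1 us latency, 10**12 b/s
    for bits in (64, 8e6):
        print(f"#bits={bits:>9.0f}  Cmachine={c_machine(lat, bits, bw):.3e} s/bit  "
              f"Tcomm={bits * c_machine(lat, bits, bw):.3e} s")
```

Because the latency term is divided by #bits, Cmachine is not independent of the problem; this is presumably why the previous slide writes the factorization Tcomm ?=? Cproblem * Cmachine with question marks.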


Granularity (IV)

• Generalization of the “Amdahl section”

• ?? How to construct a “compiler”/computer system with dynamically tunable machine granularity to adapt to dynamically varying demands on R and C from application(s)

• Structured ?! [ad hoc] / Unstructured ??
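The slide presents granularity as a generalization of the “Amdahl section”, the serial part of a program. The generalization itself is not spelled out here, so the sketch below shows only the classic special case being generalized: Amdahl’s law, S(n) = 1 / ((1 - p) + p/n), for a parallel fraction p on n processors.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Classic Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial = 1.0 - parallel_fraction  # the "Amdahl section"
    return 1.0 / (serial + parallel_fraction / n)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9 the speedup can never exceed 1 / (1 - p) = 10, however many processors are added.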


Lookahead Accumulation in Discrete Event Simulation

• Improve Gproblem through compile time aggregation

• A-synchronous synchronization system (!)
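Lookahead is the minimum delay a logical process (LP) guarantees between receiving and emitting events; it is what lets a conservative, asynchronous simulator in the Chandy-Misra-Bryant style advance without a global clock. The sketch below is a generic illustration under assumptions, not the method behind this slide: it computes an LP’s safe time from its input channels and, as one plausible reading of “lookahead accumulation”, the lookahead of a group of components merged at compile time (delays add along each path, the shortest path bounds the group). `InputChannel`, `safe_time` and `accumulated_lookahead` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class InputChannel:
    """Link into a logical process (LP) from an upstream LP."""
    sender_clock: float  # simulation time the sender has reached
    lookahead: float     # minimum delay the sender adds to every event it sends


def safe_time(channels: Iterable[InputChannel]) -> float:
    """Conservative bound: no future message can carry a timestamp below this,
    so local events strictly earlier than it may be processed."""
    return min(ch.sender_clock + ch.lookahead for ch in channels)


def accumulated_lookahead(paths: Sequence[Sequence[float]]) -> float:
    """Lookahead of a merged group of components: per-stage delays add up
    along each input-to-output path, and the shortest path bounds the group."""
    return min(sum(path) for path in paths)


if __name__ == "__main__":
    links = [InputChannel(10.0, 2.0), InputChannel(8.0, 5.0)]
    print("safe to process events with t <", safe_time(links))  # min(10 + 2, 8 + 5)
    # Hypothetical merged component with two internal paths of per-stage delays.
    print("accumulated lookahead:", accumulated_lookahead([[1.0, 2.0, 3.0], [4.0, 1.0]]))
```

A larger accumulated lookahead lets each LP process more events between synchronizations, which is one way such aggregation could raise Gproblem.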


Fine-Grain Parallelism

• FPGA implementation (NOT automatic)

• ATM switch simulation, faster than real time …

• Speed-up depends on the traffic pattern


Application Software

• BYO

• Public domain packages, e.g. ScaLAPACK => granularity!!

• Numerical, well structured >< non-numerical, dynamic, ill-structured databases (e.g. Google)

• How to optimize in a multidimensional space => “optimal” computer system price/performance …


System Software

• Compute => sequential languages, e.g. p-STL
  (?? non-determinism, synchronization)

• Storage => virtual memory, RAID, MRAM, …

• Communication => communication library
  e.g. “Parallel Virtual Machine”: open
  e.g. “Message Passing Interface”: standard

• Fundamental issue:
  Parallel operating system: n*Linux + MPI …
  (21st century Microsoft/Intel?)
  What should be in “generic” system s/w vs. “application specific” user s/w??


Conclusion

• HPC as a tool is here to stay

• Cluster computing is a vehicle for a new way of doing science & engineering (for the masses)

• COW is only one example of compute engines satisfying the fundamental laws => other MOC?

• (Digital) hardware: understood & economically sound

• “Software”: cf. the 1950s: ad hoc, need for language(s), theoretical support, run-time, fault tolerance, …

• VUB: “Advanced Computer Architecture” + “Concurrente Systemen” (NL) / “Parallel Systems” (E)

• http://parallel.vub.ac.be


Vrije Universiteit Brussel: location

• Belgium: an EU experiment avant la lettre??
  [Map: Holland (A’dam), France (Paris)]

• ° [Alamo – 6]

• 3 languages (NL, F, D)

• 5 governments (w/o county, city!)

• NO supercomputer

• (Meta)stable??

• ~Free education

• 60 km coast / 10 M people, 1 2L highway

• No capital gains tax …

• L&H … (Martha??)

• Air force: F16 – ECM

• CEC location & 1 of the capitals …


History

• 1985: B Army
  mainframe + staff: <100 Kops, x00 MB, n*4800 bps
  PC to “fine tune” + 1 temporary mil. service: >1 Mips, 20 MB, 10 Mbps + 1 EE/CS student in search of a PhD topic

• 1990: “A Parallel Simulation Testbed for Computer Networks”: solved 0.1, posed 10 questions …

• 1992: IBM T.J. Watson: Vulcan/Deep Blue

• 1993: ETL, Tsukuba: heterogeneous granularity

• 1999: Xilinx, San Jose: reconfiguration

• 2004: UC Irvine