2005 ©Erik F. Dirkx
Limits of Parallel/Distributed Computing
Prof. dr. ir. Erik Dirkx
Vrije Universiteit Brussel
[email protected]
http://parallel.vub.ac.be
Introduction
• (Cluster) computers: a tool for a new way of doing science & engineering (cheap: BYO!!!)
• "Hardware"?
• "Software"?
Need for "Speed"
• Processing:
  signal: structured (e.g. MP3)
  dynamic: unstructured
• Data: pictures, movies, simulations, …
• Interconnect: bandwidth >< latency
A law of nature??
• Fundamental observation (Erik's law):
  (remember: 20%+ of earth = Si … we are C based …)
• Only general-purpose programmable devices will survive in the long term, yet … "programmable" = ??
cost : lim_{n→∞} $(n-th copy of chip) = 0
profit : price(n) * cost(*) = ??
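Erik's law can be sketched numerically: the average cost of a chip is a one-time design cost spread over n copies plus a marginal cost, so the cost of the n-th copy tends toward the (near-zero) marginal term as volume grows. The figures below are illustrative assumptions, not data from the talk:

```python
def average_cost(n, nre=50e6, marginal=5.0):
    """Average cost of one chip after producing n copies.

    nre      -- one-time design/mask cost (non-recurring engineering), assumed
    marginal -- cost of printing one extra copy, assumed
    """
    return nre / n + marginal

# Average cost collapses toward the marginal cost at high volume:
for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, round(average_cost(n), 2))
```

This is why only devices that amortize their design cost over huge volumes, i.e. general-purpose programmable ones, survive.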
General Purpose Hardware
• P(rocessor) => ALU (compute) + CU (control)
• M(emory) => as much as possible, as fast as possible
• S(witch) => throughput (telecom!) => latency (telecom?)
General Purpose "Computer"
• Amplifying elements: transistor (n*1000 atoms + quantum mechanics)
• Connecting elements: wire/fibre/wireless (Maxwell equations)
The original cluster (neo-cortex)
• 10**11 general-purpose neurons => compute & memory = "gray" matter
• 10**5 connections/neuron => interconnect = "white" matter
• Switching time > 1 ms (digital PPM)
• Input ~100 Mbps (pre-thalamus)
• Output << Input
• Storage: ~10**17 bits (do not drink & think …)
• ~20 W, electro-chemical, carbohydrate powered
General Purpose (neo)Cortex
• General-purpose "cellular columns" (e.g. blind musician)
• 6 layers: 1 in, 1 out, 4 compute
• 4 A4 pages, constant density
• Tuned by "emotional" subsystem: real-time, pre-emptive priorities
• Hierarchy root = "prefrontal cortex" (L=+, R=-)
Where is our "software"?
• Human (general purpose):
  speed: - - [12 km/h]
  endurance: - - [42.195 km]
  power: - - [200 W @ 120 km]
  force: - - [52*13 ???]
  accuracy: - -
  (re?)-configurability: +++
  => Learning (Software?)
• Other predators (special purpose):
  speed: ++ (e.g. cheetah)
  endurance: ++ (e.g. orca)
  power: ++ (e.g. hyena)
  force: ++ (e.g. shark)
  accuracy: ++ (e.g. eagle)
  ++ at the price of general-purposeness
  => Genetics (Hardware?)
Generic (Multi)processor
Processors/Memory
Interconnect (1)
Front-end Processors/Memory
Interconnect (2)
Intra/Internet
• Interconnect 1: high bandwidth, low latency, DL-free (!!)
• Interconnect 2: OTS TCP/IP
Concrete Examples
• Cluster computers: 2 .. 100k CPUs
• "Field Programmable Gate Array": satisfies the description and the fundamental limits! "Program"/(re)configure??
• Hybrid:
  1) COW cluster + accelerator in each node, e.g. Deep Blue: 32 * (1 + 8) => variable granularity …
  2) "Cell"-type compute engines ("software" = interesting PhD/career topic??)
VUB INFO: BYO
• Design & Test
  ! Experience
  ! Students
• Bottom-up & Top-down
  CE/EE/Photonics/Applied Math >< CS/Deductive
  Engineering <> Specification/Design
Cluster BYO (2)
• The "front-end" => hierarchical control (yang) (remember the pre-frontal cortex …)
  Bottleneck!!!!
  >< coordination/distributed (yin) (also PFC!!)
  "order out of chaos" …
Alternative
• Special-purpose device => temporary, point solution
  ??? $$$$ [design, debug, …]
  ??? Dynamic environment
  !? Power (cf. context)
  ?! Patent (cf. EU software patent dispute …)
Fundamental Bound (to your enthusiasm?) (physical)
• problem : Speedup = T_1 / T_S ≤ S_max
• technology : P, M : $(n) = k_1 * n
               S : $(n) = k_2 * n * lg(n)
• lim_{n→∞} dS/dn = 0, yet dS/dn > 0
Granularity
• Critical parameter (informally): G = T_Compute / T_Communicate
  => Hard <> Easy problems
• Gray (compute capacity, yang) >< White (communication capacity, yin)
Granularity (II)
• Experience: situation-dependent optimum:
  too coarse => sub-optimal! [US football]
  too fine => comm bottleneck! [EU soccer]
• T_comp = #instr * CPI * 1/f = R_problem * R_machine
• T_comm = latency + #bits/bandwidth ?=? C_problem * C_machine
• C_problem = #databits
• C_machine = …
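The two timing formulas above can be turned into a small numerical sketch; all parameter values (instruction count, CPI, clock rate, message size, link figures) are illustrative assumptions, not measurements from the talk:

```python
def t_comp(n_instr, cpi, f_hz):
    # T_comp = #instr * CPI * 1/f  (R_problem * R_machine)
    return n_instr * cpi / f_hz

def t_comm(n_bits, latency_s, bandwidth_bps):
    # T_comm = latency + #bits / bandwidth
    return latency_s + n_bits / bandwidth_bps

def granularity(tc, tm):
    # G = T_Compute / T_Communicate; G >> 1 -> "easy", G << 1 -> "hard"
    return tc / tm

# Assumed example: 10**6 instructions at CPI 1 on a 3 GHz core,
# exchanging an 8 kbit message over a 10 Gb/s link with 1 us latency.
tc = t_comp(n_instr=1e6, cpi=1.0, f_hz=3e9)
tm = t_comm(n_bits=8e3, latency_s=1e-6, bandwidth_bps=1e10)
print(granularity(tc, tm))   # coarse enough: compute dominates
```

Shrinking the work per message (smaller n_instr) drives G below 1 and the communication bottleneck of the "too fine" case appears.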
Granularity (III)
• C_machine = latency/#bits + 1/bandwidth
• bandwidth ~ 10**12 b/s => bw^-1 ~ 1 ps
• latency: 10 GHz => 0.1 ns = 3 cm (vacuum), 3 mm (Si) => 1 ps ~ 30 µm
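Plugging the slide's link figures (a ~10**12 b/s link, i.e. 1 ps per bit, and a latency on the order of one 10 GHz clock tick, 0.1 ns) into T_comm = latency + #bits/bandwidth shows the fixed latency term dominating for small messages; the 64-bit message size is an assumed example:

```python
BANDWIDTH = 1e12     # bits/s  -> 1/bandwidth = 1 ps per bit (slide figure)
LATENCY   = 0.1e-9   # s       -> one 10 GHz clock cycle (slide figure)

def comm_time(n_bits):
    # T_comm = latency + #bits / bandwidth
    return LATENCY + n_bits / BANDWIDTH

# A 64-bit message: 0.1 ns of latency vs only 0.064 ns of transfer,
# so latency, not bandwidth, sets the floor for fine-grain communication.
print(comm_time(64))
```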
Granularity (IV)
• Generalization of the "Amdahl section"
• ?? How to construct a "compiler"/computer system with dynamically tunable machine granularity, to adapt to dynamically varying demands on R and C from application(s)?
• Structured?! [ad hoc] / Unstructured??
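For reference, the classical Amdahl bound that the slide generalizes: with serial fraction s, the speedup on n processors never exceeds 1/s, however large n grows. A minimal sketch (the parameter values are illustrative):

```python
def amdahl_speedup(s, n):
    """Speedup for serial fraction s of the work on n processors."""
    return 1.0 / (s + (1.0 - s) / n)

# With 5% serial work, 16 processors give ~9.14x,
# and even a million processors stay below the 1/0.05 = 20x ceiling.
print(amdahl_speedup(0.05, 16))
print(amdahl_speedup(0.05, 10**6))
```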
Lookahead Accumulation in Discrete Event Simulation
• Improve G_problem through compile-time aggregation
• Asynchronous synchronization system (!)
Fine Grain Parallelism
• FPGA implementation (NOT automatic)
• ATM switch simulation @ faster than real time …
• Speed-up is traffic-pattern dependent
Application Software
• BYO
• Public-domain packages, e.g. ScaLAPACK => granularity!!
• Numerical, well structured >< non-numerical, dynamic, ill structured: databases (e.g. Google)
• How to optimize in a multidimensional space => "optimal" computer system price/performance …
System Software
• Compute => sequential languages, e.g. p-STL (?? non-determinism, synchronization)
• Storage => virtual memory, RAID, MRAM, …
• Communication => communication library
  e.g. "Parallel Virtual Machine": open
  e.g. "Message Passing Interface": standard
• Fundamental issue:
  Parallel operating system >< n*Linux + MPI … (21st-century Microsoft/Intel?)
  What should be in "generic" system s/w vs. "application-specific" user s/w??
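The message-passing model named above (PVM/MPI) can be mimicked in a toy single-machine sketch. This uses Python threads and a queue in place of real cluster nodes and an MPI library, so it only illustrates the send/receive-and-reduce programming style, not actual MPI semantics:

```python
import queue
import threading

channel = queue.Queue()   # stands in for the interconnect

def worker(rank):
    # each "rank" computes a partial result and sends it toward rank 0
    channel.put((rank, rank * rank))

# "launch" ranks 1..4 (hypothetical workers, for illustration only)
threads = [threading.Thread(target=worker, args=(r,)) for r in range(1, 5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# rank 0 "receives" the four messages and reduces them (cf. MPI_Reduce, sum)
total = sum(val for _, val in (channel.get() for _ in range(4)))
print(total)   # 1 + 4 + 9 + 16 = 30
```

Real MPI replaces the shared queue with explicit send/recv over Interconnect 1, which is exactly where the granularity trade-off of the previous slides bites.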
Conclusion
• HPC computing as a tool is here to stay
• Cluster computing is a vehicle for a new way of doing science & engineering (for the masses)
• COW is only one example of compute engines satisfying fundamental laws => other MOC?
• (Digital) hardware: understood & economically sound
• "Software": cf. the 1950s: ad hoc, need for language(s), theoretical support, run-time, fault tolerance, …
• VUB: "Advanced Computer Architecture" + "Concurrente Systemen" (NL) / "Parallel Systems" (E)
• http://parallel.vub.ac.be
Vrije Universiteit Brussel: location
• Belgium: an EU experiment avant la lettre??
  Holland (A'dam) >< France (Paris)
• ° [Alamo – 6]
• 3 languages (NL, F, D)
• 5 governments (w/o county, city!)
• NO supercomputer
• (Meta)stable??
• ~Free education
• 60 km coast / 10 M people, 1 2L highway
• No capital gains tax …
• L&H … (Martha??)
• Airforce: F16 – ECM
• CEC location & 1 of the capitals …
history
• 1985: B Army
  mainframe + staff: <100 Kops, x00 MB, n*4800 bps
  PC to "fine-tune" + 1 temporary mil. service: >1 Mips, 20 MB, 10 Mbps
  + 1 EE/CS student in search of a PhD topic
• 1990: "A Parallel Simulation Testbed for Computer Networks": solved 0.1, posed 10 questions …
• 1992: IBM T.J. Watson Vulcan/Deep Blue
• 1993: ETL, Tsukuba: heterogeneous granularity
• 1999: Xilinx, San Jose: reconfiguration
• 2004: UC Irvine