network dilation - university of california, santa barbaraparhami/pres_folder/parh16-network... ·...
TRANSCRIPT
Network Dilation:A Strategy for Building Families of Parallel Processing Architectures
Behrooz ParhamiDept. Electrical & Computer Eng.Univ. of California, Santa Barbara
Parallel Computer Architecture
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 02
Nodes or processors
Interconnects, communication
channels,or links
Parallel computer = Nodes + Interconnects(+ Switches)
B. Parhami,Plenum Press,1999
Interconnection Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 03
Node degreed
(max, min)
DiameterD Bisection
bandwidthB
Longest wire
Other attributes:Regularity
ScalabilityPackageability
Robustness
Numberof nodes
p
Heterogeneous or homogeneous nodes
Four Example Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 04
(a) 2D torus (b) 4D hypercube
(c) Chordal ring (d) Ring of rings
Nodes p = 16Degree d = 4Diameter D
Bisection BLongest wireRegularityScalabilityPackageabilityRobustness
10
Avg. distance
Spectrum of Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 05
P2P0 P1 P3 P4 P5 P6 P7 P8
P2P0 P1 P3 P4 P5 P6 P7 P8
P P P
P P P
P P P
0 1 2
3 4 5
6 7 8
P P P
P P P
P P P
0 1 2
3 4 5
6 7 8
P1
P0
P3
P4
P2P5
P7 P8
P6
Complete network
p/2 log p / log log p p 1 1 p 2
Sublogarithmic diameter Superlogarithmic diameter
PDN Star, pancake
Binary tree, hypercube
Torus Ring Linear array
log p
Sublogarithmic diameter Superlogarithmic diameter
P 1
P 2
P 3
P 4
P 5
P 6
P 7
P 8
P 0
Complete network
p/2 log p / log log p p 1 1 p 2
Sublogarithmic diameter Superlogarithmic diameter
PDN Star, pancake
Binary tree, hypercube
Torus Ring Linear array
log p
Sublogarithmic degree Superlogarithmic degree
Completenetwork
PDNHypercubeLinear array,ring
Direct Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 06
Nodes (or associated routers) directly linked to each other
Processor
Router
Router for a degree-dnode with q processors: d q bidirectional switch
Indirect Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 07
Processor
Router
Nodes (or associated routers) linked via intermediate switches
A Sea of Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 08
A Bit of History: Moving Full Circle
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 09
1960sMesh-based(ILLIAC IV)
1970sButterfly,
other MINs
1980sHypercube,bus-based
1990sFat tree,
LAN-based
Direct to indirect,shared memory
Lower diameter,message passing
Scalability,local wires
Greaterbandwidth
So, only a small portion of the sea of networks has been explored in practical parallel computers
2000s
The (d, D) Graph Problem
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 010
Suppose you have an unlimited supply of degree-d nodesHow many can be connected into a network of diameter D?Example 1: d = 3, D = 2; 10-node Petersen graph
Example 2: d = 7, D = 2; 50-node Hoffman-Singleton graph
Moore bound (undirected graphs)p 1 + d + d(d – 1) + . . . + d(d – 1)D–1
= 1 + d [(d – 1)D – 1]/(d – 2)Only ring with odd p and a few othernetworks match this bound x
d nodesd (d – 1) nodes
Symmetric Network
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 011
Viewed from any node, it looks the same
Asymmetric example
Symmetric example
Implications of Symmetry for Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 012
A degree-4 network Routing algorithm the
same for every node No weak spots
(critical nodes or links) Maximum number of
alternate paths feasible Derivation and proof of
properties easier
We need to prove a particular topological or routing property for only one node
A Necessity for Symmetry
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 013
Uniform node degree:d = 4; din = dout = 2
An asymmetric networkWith uniform node degree
Uniform node degree is necessary but not sufficient for symmetry
Interconnection Network Research• Topologies for connecting processing nodes
Devising and assessing new interconnection schemes• Routing algorithms and their performance
Oblivious / adaptive routing, deadlock avoidance/recovery• Layout and packaging of networks
Routing of links within / between chips, boards, cabinets• Robustness of interconnection networks
Reconfiguration capabilities and fault-tolerant routing • Networks-on-chip (NoC)
Optimal interconnection strategies for systems-on-chip• Data-center communication networks
Optimized for data-center traffic and energy efficiency
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 014
My Personal Research History
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 015
We are here
Grad 19681970
1974
19861988
My children, 22‐31
1969
The Challenge of Comparing Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 016
Liszka et al.: Is an alligator better than an armadillo?
My Pervious Work on Network Families
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 017
Systematic pruning
z
xy
IPL, 1998
IEEE TPDS, 2001
Y
X
Z
Swapped/OTIS networksBiswapped networks
Cluster i
Cluster j
Node j Node i
Node j
Node iJPDC, 2005
Int’l J Comp Math, 2011
My Previous Work on Dilated Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 018
Dilation along a Hamiltonian path of a de Bruijn network(Xiao, Liang, Parhami; IPL, 2012)
Switch Networks Used in Examples
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 019
Small example networks to illustrate the concepts3D hypercube = 3-cube (8 nodes, d = 3, D = 3)K4-connected cycles (12 nodes, d = 3, D = 3)
3-cube K4-connected cycles
Simplest Parallel Architectures
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 020
One processing node per switch/router nodeD = 2 + switch network diameterd = 1 + switch network degreeDegree-1 processing nodes
Alternative Parallel Architectures
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 021
One processing node per switch/router linkD 2 switch network diameterd = switch network degreeDegree-2 processing nodes
3- and 2-Dilated Network Examples
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 022
k processing nodes per switch/router linkD (k + 1) switch network diameterd = switch network degreeDegree-2 processing nodes
Diameter of Dilated Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 023
The diameter of a k-dilated network based on a diameter-Dsswitch network is bounded as (k + 1)Ds D (k + 1) Ds + kBoth bounds are tight, in the sense of equality being possible on both sides for suitably chosen networks.
E
BUB VB
UE
VE
xk+1–x
yk+1–y
Worst case: All four UB UE, UB VE, VB UE, VB VE paths are diametral
Best case: There is a non-diametral switch path (which can be at most one hop shorter than Ds)
Proof details in my forthcoming Scientia Iranica paper
Average Distance and Bisection Width
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 024
The average internode distance of a k-dilated network based on a switch network with average internode hop distance sis = (k + 1)s + k/2 + 1 + (k mod 2)/(2k).Proof details in my forthcoming Scientia Iranica paper
E
BUB VB
UE
VE
xk+1–x
yk+1–y
The bisection (band)width B of a dilated network remains the same as the bisection Bs of the switch network usedProof details in my forthcoming Scientia Iranica paper
Aggregate Bandwidth and Its Scalability
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 025
Network bisection B = Bs shows lack of scalabilitySo, unless traffic is mostly local, performance will suffer
Aggregate bandwidth Bagg = (k + 1)ndb[b is link bandwidth]
BW scalability ratioBSR = Bagg/ ndb/s
BSR is sublinear in the number knd/2 of nodes
For square torus of the same size: BSR = 8(knd / 2)1/2b
For hypercube of the same size: BSR = kndb / 2
Connectivity and Robustness
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 026
Processing node degree of 2 precludes a connectivity > 2
Connectivity of 2 can be achieved with many switch networks
All we need is for 2 of the 4 paths below to be node-disjoint
E
BUB VB
UE
VE
xk+1–x
yk+1–y
Fault diameter D + 2
Wide diameter D + 2
Superimposed Direct & Dilated Networks
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 027
k processing nodes per some switch/router linksD k + switch network diameterd = 2 switch network degreeDegree-2 processing nodes
Conclusions and Future Work
• A strategy for building families of networks– Variation in network size with same switch network– Same node architecture and routing used throughout– Applicable to many existing or proposed networks
• More network-independent / specific results– Improve, assess, and fine-tune the architectures– Use simulation to evaluate with realistic workloads– Derive scalability bounds, given performance goals– Which networks are better for use with dilation?– Full, partial, and hybrid schemes for network dilation
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 028
Questions or [email protected]
http://www.ece.ucsb.edu/~parhami/
B. Parhami Network Dilation: Building Families of Parallel Processing Architectures Slide # 029