multiprocessors and the interconnect. scope ztaxonomy zmetrics ztopologies zcharacteristics ycost...
TRANSCRIPT
![Page 1: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/1.jpg)
Multiprocessors and the Interconnect
![Page 2: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/2.jpg)
Scope
TaxonomyMetricsTopologiesCharacteristics
cost performance
![Page 3: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/3.jpg)
Interconnection
Carry data between processors and to memory
Interconnect components switches links (wires, fiber)
Interconnection network flavors static networks: point-to-point
communication linksAKA direct networks.
dynamic networks: switches and communication links
AKA indirect networks.
![Page 4: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/4.jpg)
Static vs. Dynamic
![Page 5: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/5.jpg)
Dynamic Networks
Switch: maps a fixed number of inputs to outputs
Number of ports on a switch = degree of the switch.
Switch cost grows as the square of switch degree peripheral hardware grows linearly with switch
degree packaging cost grows linearly with the number of
pins Key property: blocking vs. non-blocking
blocking path from p to q may conflict with path from r to s for independent p, q, r, s
non-blocking disjoint paths between each pair of independent sources
and sinks
![Page 6: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/6.jpg)
Network Interface
Processor node’s link to the interconnect Network interface responsibilities
packetizing communication data computing routing information buffering incoming/outgoing data
Network interface connection I/O bus: PCI or PCIx on many modern systems memory bus: e.g. AMD HyperTransport, Intel QuickPath higher bandwidth and tighter coupling than I/O bus
Network performance depends on relative speeds of I/O and memory buses
![Page 7: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/7.jpg)
Topologies
Many network topologiesTradeoff: performance vs. costMachines often implement
hybrids of multiple topologies packaging cost available components
![Page 8: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/8.jpg)
Metrics
Degree number of links per node
Diameter longest distance between two nodes in the
network
Bisection Width min # of wire cuts to divide the network in 2
halves
Cost: # links or switches
![Page 9: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/9.jpg)
Topologies: Bus
All processors access a common bus for exchanging data
Used in simplest and earliest parallel machines
Advantages distance between any two nodes is O(1) provides a convenient broadcast media
Disadvantages bus bandwidth is a performance
bottleneck
![Page 10: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/10.jpg)
Bus Systems
A bus system is a hierarchy of buses connection various system and subsystem components. has a complement of control, signal, and power lines.
a variety of buses in a system: Local bus – (usually integral to a system board)
connects various major system components (chips) Memory bus – used within a memory board to connect
the interface, the controller, and the memory cells Data bus – might be used on an I/O board or VLSI chip to
connect various components Backplane – like a local bus, but with connectors to
which other boards can be attached
![Page 11: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/11.jpg)
Bridges
The term bridge is used to denote a device that is used to connect two (or possibly more) buses.
The interconnected buses may use the same standards, or they may be different (e.g. PCI
in a modern PC).Bridge functions include
Communication protocol conversion Interrupt handling Serving as cache and memory agents
![Page 12: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/12.jpg)
Bus
Since much of the data accessed by processors is local to the processor, cache is critical for the performance of busbased machines
![Page 13: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/13.jpg)
Bus Replacement: Direct Connect
Intel Quickpath interconnect (2009 - present)
![Page 14: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/14.jpg)
Direct Connect: 4 Node Configurations
Figure Credit : The Opteron CMP NorthBridgeArchitecture, Now and in the Future, AMD , PatConway, Bill Hughes , HOT CHIPS 2006
4N SQXFIRE BW 14.9GB/sDiam 2 avg:1
4N FCXFIRE BW 29.9GB/sDiam 1, Avg: 0.75
![Page 15: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/15.jpg)
Direct Connect: 8 Node Configurations
![Page 16: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/16.jpg)
Crossbar Network
A crossbar network uses an p×m grid of switches to connect p inputs to m outputs in a non-blocking manner
A non-blocking crossbar network connecting p processors to b memory banks
Cost of a crossbar: O(p^2) Generally difficult to scale for large values of p Earth Simulator: custom 640-way single-stage crossbar
![Page 17: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/17.jpg)
Assessing Network Alternatives
Buses excellent cost scalability poor performance scalability
Crossbars excellent performance scalability poor cost scalability
Multistage interconnects compromise between these
extremes
![Page 18: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/18.jpg)
Multistage Network
![Page 19: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/19.jpg)
Multistage Omega Network
Organization log p stages p inputs/outputs
At each stage, input i is connected to output j if:
![Page 20: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/20.jpg)
Omega Network Stage
Each Omega stage is connected in a perfect shuffle
![Page 21: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/21.jpg)
Omega Network Switches
2×2 switches connect perfect shuffles
Each switch operates in two modes
![Page 22: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/22.jpg)
Multistage Omega Network
Cost: p/2 × log p switching nodes → O(p log p)
![Page 23: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/23.jpg)
Omega Network Routing
Let s = binary representation of the source processor d = binary representation of the destination
processor or memory The data traverses the link to the first
switching node if the most significant bit of s and d are the same
route data in pass-through mode by the switch else
use crossover path Strip off leftmost bit of s and d Repeat for each of the log p switching
stages
![Page 24: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/24.jpg)
Omega Network Routing
![Page 25: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/25.jpg)
Blocking in an Omega Network
![Page 26: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/26.jpg)
Clos Network (non-blocking)
![Page 27: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/27.jpg)
Star Connected Network
Static counterparts of busesEvery node connected only to a
common node at the centerDistance between any pair of
nodes is O(1)
![Page 28: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/28.jpg)
Completely Connected Network
Each processor is connected to every other processor static counterparts of crossbars number of links in the network
scales as O(p^2)
![Page 29: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/29.jpg)
Linear Array
Each node has two neighbors: left & right
If connection between nodes at ends: 1D torus (ring)
![Page 30: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/30.jpg)
Meshes and k-d Meshes
Mesh: generalization of linear array to 2D nodes have 4 neighbors: north, south, east,
and west.k-d mesh:
d-dimensional mesh node have 2d neighbors
![Page 31: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/31.jpg)
Hypercubes
Special d-dimensional mesh: p nodes, d = log p
![Page 32: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/32.jpg)
Hypercube Properties
Distance between any two nodes is at most log p. Each node has log p neighbors Distance between two nodes = # of
bit positions that differ between node numbers
![Page 33: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/33.jpg)
Trees
![Page 34: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/34.jpg)
Tree Properties
Distance between any two nodes is no more than 2 log p
Trees can be laid out in 2D with no wire crossings
Problem links closer to root carry > traffic than
those at lower levels.Solution: fat tree
widen links as depth gets shallower copes with higher traffic on links near
root
![Page 35: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/35.jpg)
Fat Tree Network
Fat tree network for 16 processing nodes Can judiciously choose “fatness” of links take full advantage of technology and
packaging constraints
![Page 36: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/36.jpg)
Metrics for Interconnection Networks
![Page 37: Multiprocessors and the Interconnect. Scope zTaxonomy zMetrics zTopologies zCharacteristics ycost yperformance](https://reader036.vdocuments.site/reader036/viewer/2022070306/55177a625503460e6e8b5288/html5/thumbnails/37.jpg)
Metrics for Dynamic Interconnection Networks