introduction to parallel computingkarypis/parbook... · invalidate/update protocols the preferred...
TRANSCRIPT
![Page 1: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/1.jpg)
Introduction to Parallel Computing
George KarypisParallel Programming Platforms
![Page 2: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/2.jpg)
Elements of a Parallel ComputerHardware
Multiple ProcessorsMultiple MemoriesInterconnection Network
System SoftwareParallel Operating SystemProgramming Constructs to Express/Orchestrate Concurrency
Application SoftwareParallel Algorithms
Goal: Utilize the Hardware, System, & Application Software to either
Achieve Speedup: Tp = Ts/pSolve problems requiring a large amount of memory.
![Page 3: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/3.jpg)
Parallel Computing PlatformLogical Organization
The user’s view of the machine as it is being presented via its system software
Physical OrganizationThe actual hardware architecture
Physical Architecture is to a large extent independent of the Logical Architecture
![Page 4: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/4.jpg)
Logical Organization ElementsControl Mechanism
SISD/SIMD/MIMD/MISDSingle/Multiple Instruction Stream & Single/Multiple Data Stream
SPMD: Single Program Multiple Data
![Page 5: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/5.jpg)
Logical Organization Elements
Communication ModelShared-Address Space
UMA/NUMA/ccNUMA
Message-Passing
![Page 6: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/6.jpg)
Physical OrganizationIdeal Parallel Computer Architecture
PRAM: Parallel Random Access MachinePRAM Models
EREW/ERCW/CREW/CRCWExclusive/Concurrent Read and/or Write
Concurrent Writes are resolved viaCommon/Arbitrary/Priority/Sum
![Page 7: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/7.jpg)
Physical OrganizationInterconnection Networks (ICNs)
Provide processor-to-processor and processor-to-memory connectionsNetworks are classified as:
DynamicThe network consists of switching elements that the various processors attach to
indirect networkHistorically used to link processors-to-memory
shared-memory systems
StaticConsist of a number of point-to-point links
direct networkHistorically used to link processors-to-processors
distributed-memory system
![Page 8: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/8.jpg)
Static & Dynamic ICNs
![Page 9: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/9.jpg)
Evaluation Metrics for ICNsDiameter
The maximum distance between any two nodesSmaller the better.
ConnectivityThe minimum number of arcs that must be removed to break it into two disconnected networks
Larger the betterMeasures the multiplicity of paths
Bisection widthThe minimum number of arcs that must be removed to partition the network into two equal halves.
Larger the betterBisection bandwidth
Applies to networks with weighted arcs—weights correspond to the link width (how much data it can transfer)The minimum volume of communication allowed between any two halves of a network
Larger the betterCost
The number of links in the networkSmaller the better
![Page 10: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/10.jpg)
Metrics and Dynamic Networks
![Page 11: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/11.jpg)
Network TopologiesBus-Based Networks
Shared mediumInformation is being broadcastedEvaluation:
Diameter: O(1)Connectivity: O(1)Bisection width: O(1)Cost: O(p)
![Page 12: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/12.jpg)
Network TopologiesCrossbar Networks
Switch-based networkSupports simultaneous connectionsEvaluation:
Diameter: O(1)Connectivity: O(1)?Bisection width: O(p)?Cost: O(p2)
![Page 13: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/13.jpg)
Network TopologiesMultistage Interconnection Networks
![Page 14: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/14.jpg)
Multistage Switch Architecture
Pass-through
Cross-over
![Page 15: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/15.jpg)
Connecting the Various Stages
![Page 16: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/16.jpg)
Blocking in a Multistage SwitchRouting is done by comparing the bit-levelrepresentation of source and destination addresses.-match goes via pass-through-mismatch goes via cross-over
![Page 17: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/17.jpg)
Network TopologiesComplete and star-connected networks.
![Page 18: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/18.jpg)
Network TopologiesCartesian Topologies
![Page 19: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/19.jpg)
Network TopologiesHypercubes
![Page 20: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/20.jpg)
Network TopologiesTrees
![Page 21: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/21.jpg)
Summary of Performance Metrics
![Page 22: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/22.jpg)
Physical OrganizationCache Coherence in Shared Memory Systems
A certain level of consistency must be maintained for multiple copies of the same dataRequired to ensure proper semantics and correct program execution
serializabilityTwo general protocols for dealing with it
invalidate & update
![Page 23: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/23.jpg)
Invalidate/Update Protocols
![Page 24: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/24.jpg)
Invalidate/Update ProtocolsThe preferred scheme depends on the characteristics of the underlying application
frequency of reads/writes to shared variablesClassical trade-off between communication overhead (updates) and idling (stalling in invalidates)Additional problems with false sharingExisting schemes are based on the invalidate protocol
A number of approaches have been developed for maintaining the state/ownership of the shared data
![Page 25: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/25.jpg)
Communication Costs in Parallel Systems
Message-Passing SystemsThe communication cost of a data-transfer operation depends on:
start-up time: tsadd headers/trailer, error-correction, execute the routing algorithm, establish the connection between source & destination
per-hop time: thtime to travel between two directly connected nodes.
node latencyper-word transfer time: tw
1/channel-width
![Page 26: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/26.jpg)
Store-and-Forward & Cut-Through Routing
![Page 27: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/27.jpg)
Cut-through Routing Deadlocks
Messages 0, 1, 2, and 3need to go to nodes A, B,C, and D, respectively
![Page 28: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/28.jpg)
Communication Model Used for this Class
We will assume that the cost of sending a message of size m is:
In general true because ts is much larger than th and for most of the algorithms that we will study mtw is much larger than lth
![Page 29: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/29.jpg)
Routing MechanismsRouting:
The algorithm used to determine the path that a message will take to go from the source to destination
Can be classified along different dimensions
minimal vs non-minimaldeterministic vs adaptive
![Page 30: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/30.jpg)
Dimension Ordered RoutingThere is a predefined ordering of the dimensionsMessages are routed along the dimensions in that order until they cannot move any further
X-Y routing for meshesE-cube routine for hypercubes
![Page 31: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/31.jpg)
Topology EmbeddingsMapping between networks
Useful in the early days of parallel computing when topology specific algorithms were being developed.
Embedding quality metricsdilation
maximum number of lines an edge is mapped tocongestion
maximum number of edges mapped on a single link
![Page 32: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/32.jpg)
Mapping a Cartesian Topology onto a Hypercube
Cool things ☺
![Page 33: Introduction to Parallel Computingkarypis/parbook... · Invalidate/Update Protocols The preferred scheme depends on the characteristics of the underlying application frequency of](https://reader030.vdocuments.site/reader030/viewer/2022040114/5e2a7856e5df0548e8279f2e/html5/thumbnails/33.jpg)
Mapping a Cartesian Topology onto a Hypercube