The Cosmic Cube
Charles L. Seitz
Presented By: Jason D. Robey
2 APR 03

Agenda
• Introduction
• Message Passing
• Process Oriented
• Concurrency Paradigm
• Hardware Description
• Software Considerations
• Measurements
• Future Work
• Summary

Introduction
• How do we get a whole bunch of processors to work together on the same problem in a scalable way?
• Test bed developed at Caltech for what was hoped would become a VLSI implementation
• Programmer controls data sharing, rather than relying on cache-coherency mechanisms
• Techniques that give close to linear speed-up for certain problems

Message Passing
• Communication and synchronization primitives seen by the programmer
  – Barrier
  – Blocking sends and receives
  – Broadcasts and node-to-node message passing
  – Explicit sharing of data through sending messages
  – Programmer decides when updates are necessary
• Hardware structure is a memory/processor node
  – Memory and inter-process communication are considered separately
  – Each can be optimized
  – Memory is closer to where it will be used
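The message-passing model above can be sketched in Python as a toy, shared-nothing simulation. This is a minimal sketch under stated assumptions, not the Cosmic Cube's C interface: threads stand in for nodes, a `queue.Queue` per node stands in for its message channel, and `run_toy_machine`, `send`, and `recv` are hypothetical names. Sends are buffered rather than blocking, a simplification; receives do block.

```python
import queue
import threading


def run_toy_machine(n_nodes: int, n: int) -> int:
    """Sum 0..n-1 on n_nodes simulated nodes that share no program data.

    Hypothetical sketch: each node is a thread with a private mailbox, and
    values move between nodes only through explicit send/recv calls.
    """
    mailboxes = [queue.Queue() for _ in range(n_nodes)]
    barrier = threading.Barrier(n_nodes)
    result = {}

    def send(dest: int, payload: int) -> None:
        # Buffered send: deposits the payload in dest's mailbox and returns.
        mailboxes[dest].put(payload)

    def recv(me: int) -> int:
        # Blocking receive: waits until a message arrives for node `me`.
        return mailboxes[me].get()

    def node_main(me: int) -> None:
        if me == 0:
            for dest in range(1, n_nodes):        # broadcast the problem size
                send(dest, n)
            size = n
        else:
            size = recv(me)
        partial = sum(range(me, size, n_nodes))   # this node's slice of the work
        barrier.wait()                            # all nodes finished local work
        if me != 0:
            send(0, partial)                      # explicit sharing of results
        else:
            total = partial
            for _ in range(n_nodes - 1):
                total += recv(0)
            result["total"] = total

    threads = [threading.Thread(target=node_main, args=(i,)) for i in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


print(run_toy_machine(4, 100))  # 4950
```

No variable is written by more than one node; the partial sums reach node 0 only because each node explicitly sends them.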

Message Passing
• Hyper-cube communications
  – Scales well
    • O(n lg n) cost: n nodes, each with lg n links
    • O(lg n) hops worst-case message delivery
  – Simple routing
    • Node addresses are discrete, 2-valued tuples (one bit per dimension)
    • Process address gives routing instructions
  – Clustering
    • Can use “spheres” of nodes for separate problems
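The routing and clustering bullets can be illustrated with a short Python sketch. The slide does not name the routing algorithm; dimension-order ("e-cube") routing is assumed here as a common hyper-cube scheme, and reading "spheres" of nodes as subcubes is likewise an interpretation. Addresses are `dims`-bit integers, one bit per dimension; all function names are hypothetical.

```python
def neighbors(node: int, dims: int) -> list[int]:
    """Nodes one hop away: flip exactly one of the `dims` address bits."""
    return [node ^ (1 << d) for d in range(dims)]


def ecube_route(src: int, dest: int, dims: int) -> list[int]:
    """Dimension-order (e-cube) route: fix differing bits, lowest first.

    The XOR of the two addresses says which dimensions to traverse, so the
    destination address alone supplies the routing instructions.
    """
    path = [src]
    node = src
    diff = src ^ dest
    for d in range(dims):
        if diff & (1 << d):
            node ^= 1 << d
            path.append(node)
    return path


def channel_count(dims: int) -> int:
    """Each of the 2**dims nodes has dims links; each link joins two nodes."""
    return dims * (1 << dims) // 2


def subcube(prefix: int, free_dims: int) -> list[int]:
    """Nodes whose high address bits equal `prefix` form a subcube of
    2**free_dims nodes that could be allocated to a separate problem."""
    return [(prefix << free_dims) | low for low in range(1 << free_dims)]


route = ecube_route(0b000000, 0b101101, dims=6)
print(route)              # [0, 1, 5, 13, 45]
print(len(route) - 1)     # 4 hops, one per differing address bit
print(channel_count(6))   # 192 links for 64 nodes
cluster = subcube(0b11, 4)
print(cluster[0], cluster[-1], len(cluster))  # 48 63 16
```

The worst case is a destination differing in every bit, which takes `dims` = lg n hops (6 on a 64-node cube).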

Process Oriented
• Abstraction from direct hardware targeting
• Processes mapped to nodes
  – Multiple processes interleaved in single nodes
  – Unique addresses
  – Unique message channels
  – Programmer not concerned w/ actual number of nodes and node addresses
• Kernel required on each node
  – Provides routing services
  – Provides process management services
  – Requires processing time
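A per-node kernel that interleaves several processes can be modeled as below. This is a hypothetical single-node sketch in Python: generators stand in for processes, `NodeKernel`, `spawn`, `send`, and `run` are invented names, and only local delivery is modeled (a real kernel would also route messages to other nodes). Each process has a unique ID and its own message queue, and the `dispatches` counter is a rough proxy for the processing time the kernel itself consumes.

```python
from collections import deque


class NodeKernel:
    """Hypothetical toy kernel for a single node.

    Processes (Python generators) are interleaved round-robin; each has a
    unique ID and its own message queue. A process yields whenever it gives
    up the CPU, e.g. while waiting for a message.
    """

    def __init__(self) -> None:
        self.ready = deque()   # IDs of runnable processes, in dispatch order
        self.mailbox = {}      # process ID -> deque of pending messages
        self.procs = {}        # process ID -> generator
        self.dispatches = 0    # how many times a process was switched in
        self.result = None     # filled in by the example consumer below

    def spawn(self, pid, body) -> None:
        self.mailbox[pid] = deque()
        self.procs[pid] = body(pid, self)
        self.ready.append(pid)

    def send(self, dest, msg) -> None:
        # Local delivery only; off-node routing is not modeled here.
        self.mailbox[dest].append(msg)

    def run(self) -> None:
        while self.ready:
            pid = self.ready.popleft()
            self.dispatches += 1
            try:
                next(self.procs[pid])   # run until the process yields
                self.ready.append(pid)  # still alive: back of the queue
            except StopIteration:
                del self.procs[pid]     # process finished


def producer(me, kernel):
    for value in (1, 2, 3):
        kernel.send(1, value)
        yield                           # give up the CPU after each send


def consumer(me, kernel):
    total = 0
    for _ in range(3):
        while not kernel.mailbox[me]:
            yield                       # nothing to receive yet: switch out
        total += kernel.mailbox[me].popleft()
    kernel.result = total


k = NodeKernel()
k.spawn(1, consumer)
k.spawn(0, producer)
k.run()
print(k.result, k.dispatches)  # 6 8
```

Each time the consumer finds its queue empty it yields and forces a switch, which is the interleaving cost raised on the following slide.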

Process Oriented
• Caltech disallows moving processes between nodes
  – Prevents effective run-time load balancing
• Process placement and load balancing are the programmer's responsibility
  – Fixed placement allows the node ID to be included w/ the process ID
    • Can take advantage of hyper-cube routing simplifications
• Issue: interleaving may be bad in certain cases
  – Message passing requires a context switch
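Because processes never migrate, a process ID can embed its node's address. The sketch below assumes a 6-bit node field (64 nodes) and a hypothetical 10-bit local field; the actual Cosmic Cube ID layout is not given in the slides, and all names are illustrative.

```python
NODE_BITS = 6    # assumption: a 64-node (6-cube) machine
LOCAL_BITS = 10  # hypothetical width of the per-node process number


def make_pid(node: int, local: int) -> int:
    """Pack a node address and a node-local process number into one ID."""
    if not (0 <= node < 1 << NODE_BITS and 0 <= local < 1 << LOCAL_BITS):
        raise ValueError("node or local process number out of range")
    return (node << LOCAL_BITS) | local


def node_of(pid: int) -> int:
    return pid >> LOCAL_BITS


def local_of(pid: int) -> int:
    return pid & ((1 << LOCAL_BITS) - 1)


def hops_between(pid_a: int, pid_b: int) -> int:
    """Hyper-cube distance = number of differing node-address bits.
    Since processes never move, no lookup table is needed."""
    return bin(node_of(pid_a) ^ node_of(pid_b)).count("1")


a = make_pid(5, 3)
print(a, node_of(a), local_of(a))                     # 5123 5 3
print(hops_between(make_pid(0, 1), make_pid(63, 7)))  # 6
```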

Concurrency Paradigm
• Programmer must explicitly deal with concurrency
• Different from other approaches, where the compiler or hardware is expected to find the parallelism
• Requires restructuring single-processor ideas
  – Bubble sort becomes a linear-time solution when each process holds one element
  – Many solutions need to be redesigned altogether
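The bubble-sort remark corresponds to odd-even transposition sort, the standard parallel form of bubble sort; it is assumed here to be the technique the slide refers to. The Python below simulates it sequentially, and `odd_even_transposition_sort` is a hypothetical name.

```python
def odd_even_transposition_sort(values):
    """Parallel bubble sort, simulated sequentially.

    Think of element i as owned by process i. In each phase every process
    compare-exchanges with one neighbor; the pairs in a phase are disjoint
    and independent, so a phase is one parallel step. n phases are enough
    to sort n elements.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        first = phase % 2  # even phase: pairs (0,1),(2,3)...; odd: (1,2),(3,4)...
        for i in range(first, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([5, 1, 4, 2, 8, 0, 3]))  # [0, 1, 2, 3, 4, 5, 8]
```

Run sequentially this is still O(n^2) work, but with one element per process each of the n phases is a single parallel step, hence linear time.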

Concurrency Paradigm
• Techniques
  – Exploit outer-loop unrolling (spread outer-loop iterations across processes)
    • Sparse/predictable messaging
• Good for science and engineering problems
  – Regular loops
  – Predictable flow
  – SIMD: the same operation on lots of data
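The outer-loop and sparse-messaging points can be illustrated with a 1-D Jacobi relaxation, a stand-in example chosen here rather than one from the slides. The grid is split into contiguous blocks, one per hypothetical process; per iteration each process exchanges only one edge value with each neighbor, and the partitioned result is checked against a serial reference. Function names are illustrative.

```python
def jacobi_serial(u, iters):
    """Reference version: each interior point becomes the mean of its two
    neighbors; the end points stay fixed. Assumes len(u) >= 3."""
    u = list(u)
    for _ in range(iters):
        u = [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]
    return u


def jacobi_partitioned(u, iters, nprocs):
    """Same computation with the grid split across hypothetical processes.

    Process p owns the contiguous block u[lo[p]:lo[p + 1]]. Each iteration it
    trades one edge value with each neighboring process (a sparse,
    predictable pattern), then updates its block locally.
    Requires 1 <= nprocs <= len(u). Returns (grid, messages_sent).
    """
    n = len(u)
    lo = [p * n // nprocs for p in range(nprocs + 1)]   # block boundaries
    blocks = [list(u[lo[p]:lo[p + 1]]) for p in range(nprocs)]
    messages = 0
    for _ in range(iters):
        # Halo exchange: 0.0 is a placeholder where there is no neighbor;
        # it is never read, because the global end points stay fixed.
        left = [blocks[p - 1][-1] if p > 0 else 0.0 for p in range(nprocs)]
        right = [blocks[p + 1][0] if p < nprocs - 1 else 0.0 for p in range(nprocs)]
        messages += 2 * (nprocs - 1)                     # one each way per boundary
        new_blocks = []
        for p, blk in enumerate(blocks):
            padded = [left[p]] + blk + [right[p]]        # ghost cells at both ends
            new = []
            for j in range(len(blk)):
                g = lo[p] + j                            # global index of blk[j]
                if g == 0 or g == n - 1:
                    new.append(blk[j])                   # fixed boundary value
                else:
                    new.append((padded[j] + padded[j + 2]) / 2)
            new_blocks.append(new)
        blocks = new_blocks
    return [x for blk in blocks for x in blk], messages


grid = [1.0] + [0.0] * 14 + [5.0]
par, msgs = jacobi_partitioned(grid, iters=20, nprocs=4)
print(par == jacobi_serial(grid, 20), msgs)  # True 120
```

The message count grows with the number of block boundaries, not with the grid size, which is what keeps the communication sparse.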

Hardware Description
• 64-node hyper-cube
  – 5 ft., 700 watts, $80,000
  – Linear projection
  – Simulation results led to hyper-cube choice
  – Allowed for slow network links compared to CPU speed
• Node
  – 8086 processor w/ 8087 coprocessor
    • Needed good floating-point performance
    • Clock slowed from 8 MHz to 5 MHz for the 8087
  – 128 KB RAM (money was spent on other things)
  – 8 KB ROM for initialization and power-on self-tests (POSTs)
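The node and machine figures above imply a few per-node numbers. The arithmetic below assumes the power and cost figures cover the whole 64-node machine and reads "128K RAM" as 128 KB per node; the link count follows from the hyper-cube structure (6 links per node in a 6-cube).

```python
NODES = 64
DIMS = 6                  # 2**6 = 64 nodes
RAM_PER_NODE_KB = 128     # assumption: "128K" means 128 KB per node
TOTAL_WATTS = 700         # assumption: whole-machine figure
TOTAL_COST_USD = 80_000   # assumption: whole-machine figure

total_ram_kb = NODES * RAM_PER_NODE_KB     # 8192 KB = 8 MB in total
total_links = NODES * DIMS // 2            # each link has two ends
watts_per_node = TOTAL_WATTS / NODES
cost_per_node = TOTAL_COST_USD / NODES

print(total_ram_kb, total_links, round(watts_per_node, 1), cost_per_node)
# 8192 192 10.9 1250.0
```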

Hardware Description
• Prototype developed as a test bed and to help attract resources
• 1981-1982: first prototype, a 2-cube (4 nodes)
• Summer 1983: expanded to a 6-cube (64 nodes)
• First year: 560,000 node-hours
  – 2 hard errors
  – About 1 soft error every several days
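The first-year reliability figures can be put into rough per-node and per-machine terms. This assumes all 64 nodes of the 6-cube accounted for the 560,000 node-hours; the derived rates are simple averages, not figures reported in the slides.

```python
NODE_HOURS = 560_000   # first-year total from the slide
NODES = 64             # assumption: the full 6-cube ran all year
HARD_ERRORS = 2

hours_per_node = NODE_HOURS / NODES                           # about one year
node_hours_per_hard_error = NODE_HOURS / HARD_ERRORS
machine_hours_per_hard_error = hours_per_node / HARD_ERRORS   # wall-clock hours

print(hours_per_node, node_hours_per_hard_error, machine_hours_per_hard_error)
# 8750.0 280000.0 4375.0
```

Since a year is about 8,760 hours, the 8,750 hours per node is consistent with the machine running almost continuously.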

Software Considerations
• Development and testing done on traditional machines
• Initialization had to deal with node checks in addition to RAM checks
• Extensions to C had to be developed to facilitate the machine’s use by other researchers
• Kernel had to be developed
  – Deals with message-passing constructs
  – Must manage requests from the intermediate host (IH)
  – probe: allows processes access to the message layer
  – spy: allows the IH to examine and modify kernel execution data

Measurements
• Speedup S(N) = T(1) / T(N), where T(N) is the run time on N nodes
• Efficiency E(N) = S(N) / N
  – E = 1 is good (linear speed-up)
  – E <= 1/N is bad (no faster than a single node)
• Only really useful for measuring an algorithm's scalability on problems with many more processes than available nodes
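The two definitions translate directly into code. The timings in the example are hypothetical and chosen only to make the arithmetic easy to check; they are not Cosmic Cube measurements.

```python
def speedup(t1: float, tn: float) -> float:
    """S(N) = T(1) / T(N)."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """E(N) = S(N) / N; 1.0 means perfectly linear speed-up."""
    return speedup(t1, tn) / n


# Hypothetical timings in seconds, not measurements.
t1, t64 = 640.0, 12.5
print(speedup(t1, t64), efficiency(t1, t64, 64))  # 51.2 0.8
```

Here 64 nodes deliver a speed-up of 51.2, i.e. 80% of linear; if T(64) equaled T(1), efficiency would drop to 1/64.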

Measurements
• What affects efficiency? (Overhead)
  – Load balancing problems
  – Message start-up latency
    • Big messages vs. small messages
  – Hop latency
  – Processor time used in message routing functions
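Two of the overheads listed above can be made concrete with simple models, both assumptions rather than the paper's own: a linear message cost (fixed start-up + per-hop + per-byte time), and an efficiency bound from load imbalance (the run lasts as long as the busiest node). Parameters are hypothetical, in arbitrary time units.

```python
def message_time(nbytes, hops, startup, per_hop, per_byte):
    """Assumed linear cost model: start-up latency + hop latency + transfer."""
    return startup + hops * per_hop + nbytes * per_byte


def balance_efficiency(work):
    """Upper bound on efficiency from load imbalance alone: the run lasts as
    long as the busiest node while the others sit idle part of the time."""
    return sum(work) / (len(work) * max(work))


# Hypothetical parameters in arbitrary time units.
params = dict(hops=3, startup=1.0, per_hop=0.1, per_byte=0.01)
one_big = message_time(1000, **params)          # 1 message of 1000 bytes
many_small = 100 * message_time(10, **params)   # 100 messages of 10 bytes
print(one_big, many_small)
print(balance_efficiency([10, 10, 10, 10]), balance_efficiency([10, 10, 10, 20]))  # 1.0 0.625
```

With these parameters one 1000-byte message costs far less than one hundred 10-byte messages, because the start-up latency is paid once instead of a hundred times.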

Measurements
• Performance
  – Some applications achieved a maximum of 3 MIPS in floating-point operations
  – Many other applications reached optimal speed-up compared to the VAX-11/780, with overheads of 0.025 to 0.5
    • Low message frequency?

Future Work
• Move routing functions to network device
• Experiment with hybrid shared-memory approach
• Allow for dynamic load balancing
• Experiment with more programmer control of process-to-node assignments
• Try different problem areas to expand message protocol
• Make interface more programmer-friendly

Summary
• New programming paradigm required
• Offers many advantages for scientific and engineering problems
• May be interesting to apply to other domains
• Achieved what appears to be excellent scalability
• Good success in a limited domain

Questions? Comments?
Snide Remarks?