The Cosmic Cube
Charles L. Seitz
Presented By: Jason D. Robey
2 APR 03





Agenda

• Introduction
• Message Passing
• Process Oriented
• Concurrency Paradigm
• Hardware Description
• Software Considerations
• Measurements
• Future Work
• Summary


Introduction

• How do we get a whole bunch of processors to work together on the same problem in a scalable way?

• Test bed developed at Caltech for what they hoped to be a VLSI implementation

• Programmer controls data sharing, not cache coherency mechanisms

• Techniques for certain problems that give close to linear speed-up


Message Passing

• Communication and synchronization primitives seen by programmer
  – Barrier
  – Blocking sends and receives
  – Broadcasts and node to node message passing
  – Explicit sharing of data through sending messages
  – Programmer decides when updates are necessary

• Hardware structure is memory/processor node
  – Separate consideration for memory vs. inter-process communication
  – Optimize each
  – Memory is closer to where it will be used
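The explicit-sharing idea above can be sketched in a few lines of Python. This is only an illustration, not the Cosmic Cube's actual primitives: the `Node` class, `send`, `recv`, and `parallel_sum` are hypothetical names, threads stand in for nodes, and only the receive blocks here (the send simply queues the message).

```python
import queue
import threading


class Node:
    """Hypothetical processor/memory node: private data plus a message inbox."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.inbox = queue.Queue()

    def send(self, dest, payload):
        # Sharing is explicit: dest only sees the data because we send it.
        dest.inbox.put((self.node_id, payload))

    def recv(self):
        # Blocks until a message arrives; returns (sender_id, payload).
        return self.inbox.get()


def worker(me, host):
    # Each worker waits for its chunk, sums it locally, and reports back.
    _, chunk = me.recv()
    me.send(host, sum(chunk))


def parallel_sum(values, n_workers=4):
    host = Node(-1)
    workers = [Node(i) for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(w, host)) for w in workers]
    for t in threads:
        t.start()
    # The programmer decides what data each node gets and when.
    for i, w in enumerate(workers):
        host.send(w, values[i::n_workers])
    total = sum(host.recv()[1] for _ in workers)
    for t in threads:
        t.join()
    return total
```

No node ever reads another node's memory; every value that crosses a node boundary does so in a message the programmer wrote.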


Message Passing

• Hyper-cube communications
  – Scales well
    • O(n lg n) cost for n nodes
    • O(lg n) worst case message delivery
  – Simple routing
    • Node address is a discrete, 2-valued tuple (one bit per cube dimension)
    • Process address gives routing instructions
  – Clustering
    • Can use “spheres” of nodes for separate problems
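A minimal sketch of why a binary node address "gives routing instructions": XOR-ing source and destination addresses marks the dimensions in which they differ, and each hop flips one of those bits. The function names are hypothetical, and correcting bits lowest-first is an assumed convention, not something the slides specify.

```python
def hop_count(src, dst):
    """Number of hops = number of differing address bits (Hamming distance)."""
    return bin(src ^ dst).count("1")


def route(src, dst, dim):
    """Return the list of nodes visited when fixing differing bits lowest-first."""
    path = [src]
    node = src
    diff = src ^ dst
    for bit in range(dim):
        if diff & (1 << bit):
            node ^= 1 << bit  # cross the link along this dimension
            path.append(node)
    return path


def link_count(dim):
    """A dim-cube has 2**dim nodes with dim links each; every link is shared by two nodes."""
    return dim * 2 ** (dim - 1)
```

For a 6-cube, `route(0, 0b101001, 6)` visits nodes 0, 1, 9, 41; no route needs more than 6 hops, and `link_count(6)` gives 192 links for 64 nodes, which is the n lg n growth in link cost and lg n worst-case delivery listed above.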


Process Oriented

• Abstraction from direct hardware targeting
• Processes mapped to nodes
  – Multiple processes interleaved in single nodes
  – Unique addresses
  – Unique message channels
  – Programmer not concerned w/ actual number of nodes and node addresses

• Kernel required on each node
  – Provides routing services
  – Provides process management services
  – Requires processing time


Process Oriented

• Caltech disallows process node switching
  – Prevents effective run-time load balancing
    • Programmer responsibility
  – Allows node ID to be included w/ process ID
    • Can take advantage of hyper-cube routing simplifications

• Issue: Interleaving may be bad in certain cases
  – Context switch for message passing


Concurrency Paradigm

• Programmer must explicitly deal with concurrency
• Different from other approaches where compiler or hardware is expected to find the parallelism
• Requires a restructuring of single processor ideas
  – Bubble sort becomes a linear solution
  – A lot of solutions need to be redesigned altogether
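One common reading of "bubble sort becomes a linear solution" is odd-even transposition sort: with one value per node, n phases of neighbor compare-exchange sort n values. The slides do not name the algorithm, so treat this as an assumed illustration; the sketch below simulates the phases sequentially in Python.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating compare-exchange phases between neighbors."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # even phases pair (0,1),(2,3)...; odd phases pair (1,2),(3,4)...
        # The pairs within one phase are disjoint, so on a real machine each
        # pair of neighboring nodes could compare-exchange at the same time.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Run sequentially this is still O(n^2) work, but if every phase runs concurrently across n nodes the elapsed time is n phases, which is where the linear figure comes from.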


Concurrency Paradigm

• Techniques
  – Exploit outer loop unrolling
    • Sparse/Predictable messaging

• Good for science and engineering problems
  – Regular loops
  – Predictable flow
  – SIMD: Same thing on a whole lotta data


Hardware Description

• 64-node hyper-cube
  – 5 ft., 700 watts, $80,000
  – Linear projection
  – Simulation results led to hyper-cube choice
  – Allowed for slow network links compared to CPU speed
• Node
  – 8086 processor w/ 8087 coprocessor
    • Needed good floating-point operations
    • Slowed from 8 MHz to 5 MHz for 8087
  – 128K RAM: spend money on other things
  – 8K ROM for initialization and POSTs


Hardware Description

• Developed prototype as test bed and resource raiser

• 1981-1982 for first prototype 2-cube

• Summer of 1983 to 6-cube

• First year: 560,000 node hours
  – 2 hard errors
  – 1 soft error/several days


Software Considerations

• Development and testing done on traditional machines

• Initialization had to deal with node checks in addition to RAM checks

• Extensions to C had to be developed to facilitate the machine’s use by other researchers

• Kernel must be developed
  – Deal with message passing constructs
  – Must manage requests from intermediate host (IH)
  – probe: Allows process access to message layer
  – spy: Allows IH to examine and modify kernel execution data


Measurements

• Speedup = T(1) / T(N), where T(N) is the run time on N nodes

• Efficiency = Speedup / N
  – 1 is good (linear speed-up)
  – <= 1/N is bad (no faster than a single node)

• Only really useful to measure scalability of an algorithm with problems requiring a lot more processes than nodes available
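The two definitions above translate directly into code. The function names and the timings in the note that follows are hypothetical, chosen only to make the arithmetic easy to check.

```python
def speedup(t1, tn):
    """Speedup = T(1) / T(N): single-node time over N-node time."""
    return t1 / tn


def efficiency(t1, tn, n_nodes):
    """Efficiency = Speedup / N; 1.0 means perfectly linear speed-up."""
    return speedup(t1, tn) / n_nodes
```

With made-up timings of T(1) = 640 s and T(64) = 12.5 s, speedup is 51.2 and efficiency is 0.8. An efficiency of 1/64 would mean a speedup of 1, i.e. the 64-node run is no faster than one node.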


Measurements

• What affects efficiency? (Overhead)
  – Load balancing problems
  – Message start-up latency
    • Big messages vs. small messages
  – Hop latency
  – Processor time used in message routing functions
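A rough way to see why start-up latency favors big messages is a linear cost model: a fixed start-up cost, a per-hop cost, and a per-byte cost. This model and every constant in it are assumptions for illustration, not measurements from the Cosmic Cube.

```python
def message_time(n_bytes, hops, startup=1.0e-3, per_hop=1.0e-4, per_byte=1.0e-6):
    """Estimated seconds to deliver one message (all constants hypothetical)."""
    return startup + hops * per_hop + n_bytes * per_byte


def batched_vs_split(total_bytes, pieces, hops):
    """Compare sending total_bytes as one message vs. `pieces` equal messages."""
    one_big = message_time(total_bytes, hops)
    many_small = pieces * message_time(total_bytes / pieces, hops)
    return one_big, many_small
```

Under these assumed constants, splitting the same data into many small messages pays the start-up and hop costs once per piece, so the overhead term grows with the message count even though the bytes moved are identical.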


Measurements

• Performance
  – Some apps achieved max of 3 MIPS in floating-point ops
  – Many other apps reached optimal speed-up compared to VAX-11/780 with overheads of 0.025 to 0.5
    • Low message frequency?


Future Work

• Move routing functions to network device
• Experiment with hybrid shared memory approach
• Allow for dynamic load-balancing
• Experiment with more programmer control of process to node assignments
• Try different problem areas to expand message protocol
• Make interface more programmer friendly


Summary

• New programming paradigm required

• Offers lots of advantages in the scientific and engineering problem set

• May be interesting to apply to other domains

• Achieved what appears to be excellent scalability

• Good success in limited domain


Questions?
Comments?

Snide Remarks?