
Page 1: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting

Bill Dally, Computer Systems Laboratory

Stanford University

December 10, 2002

Page 2: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Overview

• Where we are today
  – First year goal was met: demonstrated feasibility on a single node
  – Feedback from site visit team was very positive
  – Potential for a big impact on scientific computing
  – But still much to do!

• Key FY03 goals
  – Get long-term software infrastructure in place
    • Select approach, implement baseline Brook to SSS compiler
  – Multi-node versions that scale
    • Language, compiler, simulator
  – Tackle hard problems: 3-D, irregular neighborhoods/sparse matrix solve
    • Language support, numerics support, evaluate on simulator
  – Refine architecture
    • Cluster organization, aspect ratio, register organization, memory organization
  – Industrial partner
    • Start serious discussions, outreach to build support, close partner in '04

Page 3: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


But first, let's review our overall goal.

Exploit capabilities of VLSI to realize cost-effective scientific computing.

Page 4: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


The big picture

• VLSI technology enables us to put TeraOPS on a chip
  – Conventional general-purpose architectures cannot exploit this
  – The problem is bandwidth

• Streams expose locality and concurrency
  – Perform operations in record order (not operation order, as with vector machines)
  – Enables compiler optimization at a larger scale than scalar processing

• A stream architecture achieves high arithmetic intensity
  – Intensity = arithmetic rate / bandwidth (see the sketch below)
  – Bandwidth hierarchy, compound stream operations

• A Streaming Supercomputer is feasible
  – 100 GFLOPS (64-bit) on a chip, 1 TFLOPS single-board computer, PFLOPS systems
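As a rough illustration of what "record order" execution and arithmetic intensity mean, here is a minimal C++ sketch. It is not SSS or Brook code; the Particle record, the 6-flop kernel, and the intensity figure are assumptions made up for the example.

```cpp
// Hypothetical sketch (not SSS/Brook code): processing a stream of records,
// one whole record at a time, rather than sweeping one operation across all
// elements as a vector machine would.
#include <cstdio>
#include <vector>

struct Particle {            // one record of the stream
    float x, y, z;           // position
    float vx, vy, vz;        // velocity
};

// "Kernel": all arithmetic for one record runs back to back, so intermediate
// values stay in registers instead of being written to and re-read from memory.
inline void advance(Particle& p, float dt) {
    p.x += p.vx * dt;        // 2 flops
    p.y += p.vy * dt;        // 2 flops
    p.z += p.vz * dt;        // 2 flops
}

int main() {
    std::vector<Particle> stream(1024, Particle{0, 0, 0, 1, 2, 3});
    const float dt = 0.01f;

    for (Particle& p : stream)   // record order: read record, compute, write record
        advance(p, dt);

    // Arithmetic intensity for this kernel (per record):
    //   6 flops / (24 bytes read + 12 bytes written) ≈ 0.17 flops/byte.
    // Fusing several such kernels (a "compound stream operation") raises the
    // flop count while the memory traffic stays the same, which is the point
    // of the bandwidth hierarchy mentioned above.
    std::printf("first particle x = %f\n", stream[0].x);
    return 0;
}
```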

Page 5: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Review – What is the SSS Project About?

• Exploit streams to give 100x improvement in performance/cost for scientific applications vs. 'cluster' supercomputers
  – From 100 GFLOPS PCs to TFLOPS single-board computers to PFLOPS supercomputers

• Use layered programming system to simplify development and tuning of applications
  – Stream languages
  – Streaming virtual machine

• Demonstrated feasibility of streaming scientific computing in year 1

• Refine architecture and programming system in year 2
  – Demonstrate realistic applications (3D, irregular)
  – Build usable compiler
  – Resolve architecture questions – aspect ratio, conditional execution, sparse clusters, reg organization, memory system, etc.

• Build a prototype and demonstrate CITS applications in years 3-6
  – With industrial and government partners
  – Broaden our base of support

Page 6: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Software Infrastructure

• Compiler
  – Decide on flow from Brook->SVM->SSS
  – Select base compiler
    • ORC, Gnu, SUIF, Tendra, others…
  – "Spike" a simple program from Brook->SSS
  – Optimizations

• SVM Simulator

Page 7: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


3-D Applications

• StreamFLO
• StreamFEM
• StreamMD/Gromacs

Page 8: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Irregular Grids

• Need an application
• Brook support for variable degree (see the sketch below)
• Architecture/run-time support
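One common way to flatten "variable degree" data is sketched below in plain C++ (not Brook or SVM code; the CSR-style ptr/nbr layout and the neighbor-sum kernel are assumptions for illustration): an irregular neighborhood becomes one long index stream plus a per-node offset stream, so a regular kernel can still sweep it in order.

```cpp
// Hypothetical sketch: a compressed (CSR-style) layout for variable-degree
// neighborhoods, so an irregular grid can be walked as two regular streams.
// None of this is Brook or SVM API; it only illustrates the data layout.
#include <cstdio>
#include <vector>

int main() {
    // Node i's neighbors are nbr[ptr[i] .. ptr[i+1]-1].
    std::vector<int>   ptr = {0, 2, 5, 6, 9};          // 4 nodes, degrees 2, 3, 1, 3
    std::vector<int>   nbr = {1, 3, 0, 2, 3, 1, 0, 1, 2};
    std::vector<float> val = {1.0f, 2.0f, 3.0f, 4.0f};

    // Gather-style kernel: sum neighbor values per node (one sparse
    // matrix-vector-like step), streaming through nbr[] in order.
    std::vector<float> out(val.size(), 0.0f);
    for (size_t i = 0; i + 1 < ptr.size(); ++i)
        for (int k = ptr[i]; k < ptr[i + 1]; ++k)
            out[i] += val[nbr[k]];

    for (size_t i = 0; i < out.size(); ++i)
        std::printf("node %zu: neighbor sum = %g\n", i, out[i]);
    return 0;
}
```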

Page 9: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Multi-Node Execution

• Brook support
• Manual partitioning for first step (see the sketch below)
• Simple application on SVM simulator
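A minimal sketch of what "manual partitioning" could look like before Brook supports it directly, written in plain C++ with the nodes simulated in one process (nothing here is Brook, SVM, or SSS API; the chunk/ghost-cell layout is an assumption for illustration):

```cpp
// Hypothetical sketch of "manual partitioning": the programmer splits a global
// 1-D stream into per-node chunks with one ghost element on each side and
// copies the halos by hand before each step. The nodes are simulated in one
// process here; nothing below is Brook, SVM, or SSS API.
#include <cstdio>
#include <vector>

int main() {
    const int nodes  = 4;                 // pretend node count
    const int global = 16;                // global stream length
    const int local  = global / nodes;    // elements owned per node

    // Each "node" holds its owned elements plus ghost cells at [0] and [local+1].
    std::vector<std::vector<float>> chunk(nodes, std::vector<float>(local + 2, 0.0f));
    for (int n = 0; n < nodes; ++n)
        for (int i = 0; i < local; ++i)
            chunk[n][i + 1] = float(n * local + i);          // fill owned region

    // Manual halo exchange: copy boundary elements between neighboring nodes.
    for (int n = 0; n < nodes; ++n) {
        if (n > 0)         chunk[n][0]         = chunk[n - 1][local];  // left ghost
        if (n < nodes - 1) chunk[n][local + 1] = chunk[n + 1][1];      // right ghost
    }

    // With halos in place, a 3-point stencil runs on each node using only local data.
    for (int n = 0; n < nodes; ++n) {
        std::vector<float> out(local);
        for (int i = 1; i <= local; ++i)
            out[i - 1] = 0.25f * chunk[n][i - 1] + 0.5f * chunk[n][i] + 0.25f * chunk[n][i + 1];
        for (int i = 0; i < local; ++i)
            chunk[n][i + 1] = out[i];
    }

    std::printf("node 1, first owned element = %g\n", chunk[1][1]);
    return 0;
}
```

Real multi-node runs would replace the in-process copies with messages between nodes; the bookkeeping (ownership plus ghost cells) stays the same.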

Page 10: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Industrial Partner

• Candidates
  – Cray, IBM, Sun, HP, SGI, Intel

• Initial discussion
  – Present SSS project and results to date
  – Discuss collaboration models
  – Identify next steps

Page 11: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Outreach

• National Labs
  – Los Alamos
  – Livermore
  – Sandia

• Other Government
  – NASA
  – DARPA
  – DoD (Charlie Holland)
  – AFOSR

• User communities

Page 12: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Software Fall 02 Goals

• Brook
  – Multi-node issues:
    • Synchronization primitives
    • Data partitioning
  – Variable-length records

• SVM
  – Multi-node simulator
  – Performance numbers for 3 apps

• Compilation
  – Pick new infrastructure & design compiler (Reservoir)
  – Generate SVM code from Brook (StreamC to SVM)
  – SVM to {SMP, graphics, SSS} (SVM is SMP)

• Run-Time (software services)
  – Identify issues

• Issues
  – Variable-length records? With stencils?

Page 13: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Software Win 02 Goals (1 of 3)

• Brook
  – Carefully define the semantics of the operators
  – Work on "views of memory" abstraction
  – Support for partitioning, shared memory, naming, fitting into the stream abstraction
  – Support for irregular neighborhoods
  – Multithreaded version (Christos)
  – Concrete Winter goals [Ian/Frank]
    • Review of the language [Pat]
    • Partitioning (UPC)
    • Multi-node/multithreaded version
    • Irregular support – w/ application
    • PPoPP paper
    • MD on BRT

Page 14: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Software Win 02 Goals (2 of 3)

• SVM
  – Finish prototype single-node implementation [Done]
  – Compiler issue
  – Implement multi-node version
    • w/ multi-node app.
    • Start with one that runs on one processor [Francois]
    • Multithreaded on SMP – on SGI [+]
    • Cluster version [++]
  – SVM to simulator path
    • Mattan – not an intermediate between Brook and SSS

Page 15: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Software Win 02 Goals (3 of 3)

• Start regular meetings

• Compiler
  – Decide on flow from Brook->SVM->SSS [Mattan]
    • Requirements
  – Select base compiler [Jayanth]
    • ORC, Gnu, SUIF, Tendra, others…
  – "Spike" a simple program from Brook->SSS [Mattan/Jayanth ++]
  – Brook to Nvidia
  – Optimizations [Spring]

• Run time
  – Write a white paper

Page 16: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Application Fall 02 Goals

• StreamMD
  – Migrate to Gromacs

• StreamFLO
  – Complete
  – 3D

• StreamFEM
  – 3D
  – Sparse LA

• Scalability – multiple nodes

• Look at Sierra, Purple benchmarks: ppm, sweep3D

Page 17: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Application Win 02 Goals

• StreamFLO [Fatica]
  – Partitioned version; scalable
  – Convert to 3D

• StreamFEM [Barth]
  – Partitioned version; scalable
  – Convert to 3D
  – Sparse LA

• StreamMD [Eric/student]
  – Migrate to GROMACS [Vijay Pande/Michael Levitt groups]
  – Redo inner (force) and outer (neighbor) loops (see the sketch after this list)
  – Partitioned version; scalable
  – Finish port to NV30: build cluster and folding@home

• Model applications [Ron/Frank]
  – Model PDEs with sparse matrix solves

• An irregular application [Ron/Frank]

• Look at Sierra, Purple benchmarks: ppm, sweep3D [delay]
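A hypothetical sketch of the two StreamMD loops named above, in plain C++ (this is not StreamMD or GROMACS code; the particle data and Lennard-Jones constants are made-up assumptions). The outer loop builds a variable-length neighbor list per particle; the inner loop then streams over those lists doing fixed arithmetic per pair, which is the part that maps onto a stream kernel fed by a gather.

```cpp
// Hypothetical sketch (not StreamMD or GROMACS code) of an outer "neighbor"
// loop and an inner "force" loop for a cutoff Lennard-Jones interaction.
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };

int main() {
    const float cutoff = 2.5f, sigma = 1.0f, eps = 1.0f;   // illustrative constants
    std::vector<Vec3> pos = {{0, 0, 0}, {1.2f, 0, 0}, {0, 1.1f, 0}, {5, 5, 5}};
    const int n = int(pos.size());

    // Outer (neighbor) loop: builds a variable-length record per particle.
    std::vector<std::vector<int>> nbr(n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
            if (dx*dx + dy*dy + dz*dz < cutoff * cutoff) nbr[i].push_back(j);
        }

    // Inner (force) loop: fixed arithmetic per (i, j) pair -- the stream kernel.
    std::vector<Vec3> force(n, Vec3{0, 0, 0});
    for (int i = 0; i < n; ++i)
        for (int j : nbr[i]) {
            float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y, dz = pos[i].z - pos[j].z;
            float r2 = dx*dx + dy*dy + dz*dz;
            float s2 = sigma * sigma / r2, s6 = s2 * s2 * s2;
            float f  = 24.0f * eps * s6 * (2.0f * s6 - 1.0f) / r2;   // LJ force magnitude / r
            force[i].x += f * dx; force[i].y += f * dy; force[i].z += f * dz;
        }

    std::printf("force on particle 0: (%g, %g, %g)\n", force[0].x, force[0].y, force[0].z);
    return 0;
}
```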

Page 18: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Architecture Fall 02 Goals

• Simulator
  – Multi-node working
  – Indexable SRF
  – Scalar processor

• Point Studies
  – Conditionals
  – Aspect ratio
  – Indexable SRF
  – Add & Store (remote ops in general)
  – Iterative operations & extended precision
  – Network

• Spec
  – Flesh out I/O

• App studies

Page 19: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Architecture Win 02 Goals

• Single-Node Simulator [Jung-Ho, Knight]
  – 64-bit support, MULADD, scalar processor

• Multi-Node Simulator [Jung-Ho, Abhishek]
  – Network model
  – Multi-node mechanisms

• Point Studies
  – Aspect ratio
    • SSE vs. VLIW
  – Conditional execution [Mattan/Ujval]
  – Sparse clusters
  – SRF organization [Nuwan]
  – Cache alternatives [Jung Ho]
  – Add and store study [Jung Ho]
  – I/O
  – Iterative operations [Francois]

Page 20: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Special Win 02 Goals

• Fix website [Pat]
  – Public and private websites

• Name that computer
  – Mississippi
  – Axios
  – Submit names to Mattan
  – Bill, Pat, Bill to choose

• Project Party

Page 21: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Winter Quarter Meeting Schedule

• 1/7 – Ron – Anything
• 1/14 – Francois/Mattan – What is SVM
• 1/21 – Fatica – 3D Flo
• 1/28 – Pat – RTSL partitioning
• 2/4 – Bill Carlson [Pat] – UPC
• 2/11 – Francois/Ian – Discussion of targets: SSS/CG/MPI
• 2/18 – Tim B. – Irregular grid
• 2/25 – Mattan – Compilation infrastructure
• 3/4 – Jung Ho – Add & Store
• 3/11 – Bill – Wrapup

Page 22: Stanford Streaming Supercomputer (SSS) Fall Quarter 2002 Wrapup Meeting


Papers

• Arch
  – Indexable SRFs (Nuwan)
  – Streaming Supercomputer Overview (Tim K.)
  – Streaming on conventional CPUs (Mattan)
  – Conditionals (Ujval)
  – Remote Ops (Jung Ho)
  – Aspect Ratio (?)
  – Data parallel (SSE) vs. ILP (VLIW)

• Software
  – Design of Brook (Ian)
  – Data parallel programming on graphics HW (Pat)
  – Brook to CG

• Compiler

• Apps
  – Gromacs
  – StreamFEM (Tim2)

• Overview (Bill and Pat)