Stream Programming: Luring Programmers into the Multicore Era
Bill Thies
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Spring 2008
Multicores are Here
[Figure: processors plotted by year (1970 to 2005, then 20??) against # of cores (1 to 512). Processors shown: 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, Itanium 2, Athlon, Raw, Power4, Opteron, Power6, Niagara, Yonah, PExtreme, Tanglewood, Cell, Intel Tflops, Xbox360, Cavium Octeon, Raza XLR, PA-8800, Cisco CSR-1, Picochip PC102, Broadcom 1480, Opteron 4P, Xeon MP, Ambric AM2045, Tilera]
Hardware was responsible for improving performance
Now, performance burden falls on programmers
Is Parallel Programming a New Problem?
• No! Decades of research targeting multiprocessors
– Languages, compilers, architectures, tools…
• What is different today?
1. Multicores vs. multiprocessors. Multicores have:
- New interconnects with non-uniform communication costs
- Faster on-chip communication than off-chip I/O, memory ops
- Limited per-core memory availability
2. Non-expert programmers
- Supercomputers with >2048 processors today: 100 [top500.org]
- Machines with >2048 cores in 2020: >100 million [ITU, Moore]
3. Application trends
- Embedded: 2.7 billion cell phones vs 850 million PCs [ITU 2006]
- Data-centric: YouTube streams 200 TB of video daily
Streaming Application Domain
• For programs based on streams of data
– Audio, video, DSP, networking, and cryptographic processing kernels
– Examples: HDTV editing, radar tracking, microphone arrays, cell phone base stations, graphics
• Properties of stream programs
– Regular and repeating computation
– Independent filters with explicit communication
– Data items have short lifetimes
[Figure: FM radio stream graph: AtoD → FMDemod → Duplicate → three parallel branches (LPF1 → HPF1, LPF2 → HPF2, LPF3 → HPF3) → RoundRobin → Adder → Speaker]
Brief History of Streaming
[Figure: timeline from 1960 to 2000 with three tracks: Models of Computation, Languages / Compilers, Modeling Environments. Entries shown: Petri Nets, Comp. Graphs, Kahn Proc. Networks, Communicating Sequential Processes, Synchronous Dataflow, Sisal, Occam, Lucid, Id, VAL, lazy, LUSTRE, Esterel, C, Erlang, pH, Gabriel, Ptolemy, Grape-II, Matlab/Simulink, etc. A later group labeled “Stream Programming”: StreamIt, Cg, StreamC, Brook]
Strengths
• Elegance
• Generality
Weaknesses
• Unsuitable for static analysis
• Cannot leverage deep results from DSP / modeling community
StreamIt: A Language and Compiler for Stream Programs
• Key idea: design language that enables static analysis
• Goals:
1. Expose and exploit the parallelism in stream programs
2. Improve programmer productivity in the streaming domain
• Project contributions:
– Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
– Automatic parallelization [ASPLOS'02, G.Hardware'05, ASPLOS'06]
– Domain-specific optimizations [PLDI'03, CASES'05, TechRep'07]
– Cache-aware scheduling [LCTES'03, LCTES'05]
– Extracting streams from legacy code [MICRO'07]
– User + application studies [PLDI'05, P-PHEC'05, IPDPS'06]
– 7 years, 25 people, 300 KLOC
– 700 external downloads, 5 external publications
Part 1: Language Design
Joint work with Michael Gordon
William Thies, Michal Karczmarek, Saman Amarasinghe (CC’02)
William Thies, Michal Karczmarek, Janis Sermulins, Rodric Rabbah, Saman Amarasinghe (PPoPP’05)
StreamIt Language Basics
• High-level, architecture-independent language
– Backend support for uniprocessors, multicores (Raw, SMP), cluster of workstations
• Model of computation: synchronous dataflow [Lee & Messerschmitt, 1987]
– Program is a graph of independent filters
– Filters have an atomic execution step with known input / output rates
– Compiler is responsible for scheduling and buffer management
• Extensions to synchronous dataflow
– Dynamic I/O rates
– Support for sliding window operations
– Teleport messaging [PPoPP’05]
[Figure: Input → Decimate → Output. Input pushes 1 item per execution, Decimate pops 10 and pushes 1, Output pops 1; the steady-state schedule runs Input x 10, Decimate x 1, Output x 1]
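Because every filter declares its input and output rates, a compiler can solve for how often each filter must run per steady-state iteration. The Python sketch below illustrates that balance calculation for a simple pipeline only. The function name and the (pop, push) tuple encoding are assumptions made for illustration, not part of StreamIt.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def steady_state_repetitions(rates):
    """Return minimal firing counts for a pipeline of filters.

    rates is a list of (pop, push) pairs, source first. For each channel
    the balance equation reps[producer] * push == reps[consumer] * pop
    must hold so that buffers neither grow nor drain over one iteration.
    """
    reps = [Fraction(1)]
    for (_, push_prev), (pop_next, _) in zip(rates, rates[1:]):
        reps.append(reps[-1] * push_prev / pop_next)
    # Scale the rational solution to the smallest positive integers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in reps), 1)
    counts = [int(r * scale) for r in reps]
    common = reduce(gcd, counts)
    return [c // common for c in counts]


# Input (push 1) -> Decimate (pop 10, push 1) -> Output (pop 1)
schedule = steady_state_repetitions([(0, 1), (10, 1), (1, 0)])
print(schedule)  # [10, 1, 1]
```

The result matches the x 10, x 1, x 1 multiplicities on the slide. General graphs with splitters and joiners need one balance equation per channel rather than a single chain.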
Representing Streams
• Conventional wisdom: stream programs are graphs
– Graphs have no simple textual representation
– Graphs are difficult to analyze and optimize
• Insight: stream programs have structure
[Figure: an unstructured stream graph beside a structured one]
Structured Streams
• Each structure is single-input, single-output
• Hierarchical and composable
[Figure: the four constructs: filter, pipeline, splitjoin (splitter … joiner), and feedback loop (joiner … splitter). Each child box “may be any StreamIt language construct”]
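Single-input, single-output structures compose like functions. The sketch below is a rough Python analogy, not StreamIt semantics: it works on finite lists rather than unbounded streams, omits the feedback loop, and every name in it (map_filter, pipeline, splitjoin_duplicate_roundrobin) is made up for illustration.

```python
def map_filter(fn):
    """A stateless filter that pops one item and pushes one item."""
    def run(items):
        return [fn(x) for x in items]
    return run


def pipeline(*stages):
    """Connect single-input, single-output stages in sequence."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run


def splitjoin_duplicate_roundrobin(*branches):
    """Duplicate splitter feeding parallel branches, then a round-robin joiner."""
    def run(items):
        outputs = [branch(list(items)) for branch in branches]
        joined = []
        for group in zip(*outputs):  # take one item from each branch in turn
            joined.extend(group)
        return joined
    return run


# Any child may itself be a filter, pipeline, or splitjoin.
graph = pipeline(
    map_filter(lambda x: x + 1),
    splitjoin_duplicate_roundrobin(
        map_filter(lambda x: x * 2),
        pipeline(map_filter(lambda x: x * 3)),
    ),
    map_filter(lambda x: -x),
)
result = graph([1, 2])
print(result)  # [-4, -6, -6, -9]
```

Because each constructor returns something with the same shape as a filter, nesting needs no special cases, which is the property the slide calls hierarchical and composable.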
Radar-Array Front End
Filterbank
FFT
Block Matrix Multiply
MP3 Decoder
Bitonic Sort
FM Radio with Equalizer
Ground Moving Target Indicator (GMTI)
99 filters, 3566 filter instances
Example Syntax: FMRadio
void->void pipeline FMRadio(int N, float lo, float hi) {
  add AtoD();
  add FMDemod();
  add splitjoin {
    split duplicate;
    for (int i=0; i<N; i++) {
      add pipeline {
        add LowPassFilter(lo + i*(hi - lo)/N);
        add HighPassFilter(lo + i*(hi - lo)/N);
      }
    }
    join roundrobin();
  }
  add Adder();
  add Speaker();
}
[Figure: the resulting stream graph for N = 3: AtoD → FMDemod → Duplicate → (LPF1 → HPF1, LPF2 → HPF2, LPF3 → HPF3) → RoundRobin → Adder → Speaker]
StreamIt Application Suite
• Software radio
• Frequency hopping radio
• Acoustic beam former
• Vocoder
• FFTs and DCTs
• JPEG Encoder/Decoder
• MPEG-2 Encoder/Decoder
• MPEG-4 (fragments)
• Sorting algorithms
• GMTI (Ground Moving Target Indicator)
• DES and Serpent crypto algorithms
• SSCA#3 (HPCS scalable benchmark for synthetic aperture radar)
• Mosaic imaging using RANSAC algorithm
Total size: 60,000 lines of code
Control Messages
• Occasionally, low-bandwidth control messages are sent between actors
• These often demand precise timing
– Communications: adjust protocol, amplification, compression
– Network router: cancel invalid packet
– Adaptive beamformer: track a target
– Respond to user input, runtime errors
– Frequency hopping radio
• Traditional techniques:
– Direct method call (no timing guarantees)
– Embed message in stream (opaque, slow)
[Figure: stream graph containing AtoD, Decode, a duplicate splitter, three LPF/HPF branches, a roundrobin joiner, Encode, and Transmit]
Idea 2: Teleport Messaging
• Looks like method call, but timed relative to data in the stream
– Exposes dependences to compiler
– Simple and precise for user
- Adjustable latency
- Can send upstream or downstream

void setProtocol(int p) {
  reconfig(p);
}

TargetFilter x;
if (newProtocol(p)) {
  x.setProtocol(p) @ 2;
}
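One way to picture "timed relative to data in the stream" is the deliberately simplified Python sketch below. It assumes a sender and a receiver that each handle one item per execution, a downstream message, and a latency counted in sender executions. The Receiver class, the run function, and the deliver_at bookkeeping are all hypothetical and are not how the StreamIt compiler implements teleport messages.

```python
from collections import deque


class Receiver:
    """Hypothetical target filter: tags each item with the protocol in force."""

    def __init__(self):
        self.protocol = 0
        self.items_seen = 0
        self.pending = []  # (item index at which the message takes effect, value)

    def set_protocol(self, p, deliver_at):
        self.pending.append((deliver_at, p))

    def work(self, item):
        # Deliver every message that is due before this item is processed.
        for deliver_at, p in sorted(self.pending):
            if deliver_at <= self.items_seen:
                self.protocol = p
        self.pending = [m for m in self.pending if m[0] > self.items_seen]
        self.items_seen += 1
        return (item, self.protocol)


def run(items, messages, latency):
    """messages maps a sender execution number to the protocol sent then."""
    receiver = Receiver()
    channel = deque()
    # The sender runs to completion first, so the channel is fully buffered.
    for n, item in enumerate(items):
        if n in messages:
            receiver.set_protocol(messages[n], deliver_at=n + latency)
        channel.append(item)
    # The receiver drains the buffer later; delivery still lines up with the data.
    return [receiver.work(channel.popleft()) for _ in range(len(channel))]


result = run([10, 11, 12, 13, 14], {1: 7}, latency=2)
print(result)  # [(10, 0), (11, 0), (12, 0), (13, 7), (14, 7)]
```

The message sent during sender execution 1 with latency 2 takes effect just before the receiver handles item 3, regardless of how much buffering sits between the two filters. A direct method call would not give that guarantee.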
Part 2: Automatic Parallelization
Joint work with Michael Gordon
Michael I. Gordon, William Thies, Saman Amarasinghe (ASPLOS’06)
Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe (ASPLOS’02)
Streaming is an Implicitly Parallel Model
• Programmer thinks about functionality, not parallelism
• More explicit models may…
– Require knowledge of target [MPI] [cG]
– Require parallelism annotations [OpenMP] [HPF] [Cilk] [Intel TBB]
• Novelty over other implicit models?
[Erlang] [MapReduce] [Sequoia] [pH] [Occam] [Sisal] [Id] [VAL] [LUSTRE] [HAL] [THAL] [SALSA] [Rosette] [ABCL] [APL] [ZPL] [NESL] […]
Exploiting streaming structure for robust performance
Parallelism in Stream Programs
Task parallelism
– Analogous to thread (fork/join) parallelism
Data parallelism
– Peel iterations of filter, place within scatter/gather pair (fission)
– Cannot parallelize filters with state
Pipeline parallelism
– Between producers and consumers
– Stateful filters can be parallelized
[Figure: splitter/joiner pair whose parallel branches are labeled “Task”]
Parallelism in Stream Programs
Task parallelism
– Analogous to thread (fork/join) parallelism
Data parallelism
– Analogous to DOALL loops
Pipeline parallelism
– Analogous to ILP that is exploited in hardware
[Figure: stream graph annotated with “Task” across splitjoin branches, “Pipeline” along a producer/consumer chain, and “Data” across replicas of a stateless filter]
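Data parallelism by fission can be sketched as follows: a stateless filter is replicated, a round-robin scatter hands each replica whole input groups, and a round-robin gather restores the original order. This is a sequential Python illustration under assumed names (run_filter, run_fissed); the replicas are not actually run on separate cores here.

```python
def run_filter(work, items, pop):
    """Apply a stateless work function to consecutive groups of `pop` items."""
    out = []
    for i in range(0, len(items) - pop + 1, pop):
        out.extend(work(items[i:i + pop]))
    return out


def run_fissed(work, items, pop, push, replicas):
    """Same computation, with the filter replicated between a scatter and a gather."""
    groups = [items[i:i + pop] for i in range(0, len(items) - pop + 1, pop)]
    # Scatter: replica r receives groups r, r + replicas, r + 2 * replicas, ...
    per_replica = [[] for _ in range(replicas)]
    for g, group in enumerate(groups):
        per_replica[g % replicas].extend(group)
    # Each replica is independent because the filter keeps no state.
    results = [run_filter(work, chunk, pop) for chunk in per_replica]
    # Gather: take `push` items from each replica in the same round-robin order.
    out = []
    cursors = [0] * replicas
    for g in range(len(groups)):
        r = g % replicas
        out.extend(results[r][cursors[r]:cursors[r] + push])
        cursors[r] += push
    return out


def pair_sum(group):
    """Example stateless filter: pop 2, push 1."""
    return [sum(group)]


data = list(range(12))
sequential = run_filter(pair_sum, data, pop=2)
fissed = run_fissed(pair_sum, data, pop=2, push=1, replicas=4)
print(sequential == fissed)  # True
```

If the work function carried state from one execution to the next, the replicas would each see only part of the history and the outputs would generally differ, which is why the slide restricts fission to stateless filters.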
Baseline: Fine-Grained Data Parallelism
[Figure: stream graph with filters Expand, BandStop, Process, BandPass, Compress, and Adder; in the baseline, each filter is replicated inside its own Splitter/Joiner pair]
Evaluation: Fine-Grained Data Parallelism
[Figure: bar chart of throughput normalized to single-core StreamIt (y-axis 0 to 18) for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2-subset, Vocoder, Radar, and Geometric Mean. Legend: Fine-Grained Data; Coarse-Grained Task + Data]
Raw Microprocessor
16 in-order, single-issue cores with D$ and I$
16 memory banks, each bank with DMA
Cycle-accurate simulator
Good Parallelism! Too Much Synchronization!
Coarsening the Granularity
[Figure: animation in four steps. The filters BandPass, Compress, Process, and Expand in each branch of the splitjoin are fused into one coarse filter (BandPassCompressProcessExpand); the fused filters, the BandStop filters, and the Adder are then each replicated inside Splitter/Joiner pairs]
Evaluation: Coarse-Grained Data Parallelism
[Figure: bar chart of throughput normalized to single-core StreamIt, per benchmark and geometric mean. Legend: Fine-Grained Data; Coarse-Grained Task + Data]
Good Parallelism! Low Synchronization!
Simplified Vocoder
Target a 4-core machine
[Figure: stream graph annotated with work estimates. A splitjoin of two AdaptDFT filters (6 each), RectPolar (20, marked Data Parallel), a splitjoin of two branches each containing Amplify, Diff, UnWrap, and Accum (estimates 2, 1, 1, 1 per branch), and PolarRect (20, marked Data Parallel)]
Data Parallelize
Target a 4-core machine
[Figure: RectPolar and PolarRect are each replicated four ways inside a Splitter/Joiner pair, so their per-replica work estimate drops from 20 to 5; the AdaptDFT filters (6 each) and the Amplify/Diff/UnWrap/Accum branches (2, 1, 1, 1) are unchanged]
Data + Task Parallel Execution
Target a 4-core machine
[Figure: execution timeline (Cores vs. Time) for the data-parallelized vocoder graph; one steady-state iteration takes 21 time units]
We Can Do Better
Target a 4-core machine
[Figure: execution timeline (Cores vs. Time) with the same work packed tightly across the cores; one steady-state iteration takes 16 time units]
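The 21 and 16 on the two timeline slides can be reproduced with a little arithmetic. The Python sketch below assumes the stage breakdown read off the vocoder figure (AdaptDFT at 6 each, four RectPolar and four PolarRect replicas at 5 each, and two branches of 2 + 1 + 1 + 1). That breakdown is an interpretation of the slide's work estimates, not data taken from the talk.

```python
from math import ceil

CORES = 4

# Work estimates per stage, assuming stages run one after another with a
# synchronization point between them.
stages = [
    [6, 6],        # two AdaptDFT filters in a splitjoin
    [5, 5, 5, 5],  # RectPolar replicated four ways (20 / 4)
    [5, 5],        # two branches, each 2 + 1 + 1 + 1
    [5, 5, 5, 5],  # PolarRect replicated four ways (20 / 4)
]

# With a barrier after each stage, every stage costs its slowest member.
barrier_time = sum(max(stage) for stage in stages)

# With perfect packing, total work is spread evenly over the cores.
total_work = sum(sum(stage) for stage in stages)
lower_bound = ceil(total_work / CORES)

print(barrier_time, total_work, lower_bound)  # 21 62 16
```

The gap between 21 and 16 comes from stages that use fewer than four cores. Closing that gap requires overlapping work from different stages rather than running them in lockstep.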
Coarse-Grained Software Pipelining
[Figure: software-pipelined schedule showing the four RectPolar replicas, a Prologue, and a New Steady State]
Evaluation: Coarse-Grained Task + Data + Software Pipelining
[Figure: bar chart of throughput normalized to single-core StreamIt (y-axis 0 to 18) for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2-subset, Vocoder, Radar, and Geometric Mean. Legend: Fine-Grained Data; Coarse-Grained Task + Data; Coarse-Grained Task + Data + Software Pipeline]
Best Parallelism! Lowest Synchronization!
Parallelism: Take Away
• Stream programs have abundant parallelism
– However, parallelism is obfuscated in languages like C
• Stream languages enable new & effective mapping: Coarsen Granularity → Data Parallelize → Software Pipeline
– In C, analogous transformations impossibly complex
– In StreamC or Brook, similar transformations possible [Khailany et al., IEEE Micro’01] [Buck et al., SIGGRAPH’04] [Das et al., PACT’06] […]
• Results should extend to other multicores
– Parameters: local memory, comm.-to-comp. cost
– Preliminary results on Cell are promising [Zhang, dasCMP’07]
Part 3: Domain-Specific Optimizations
Joint work with Andrew Lamb, Sitij Agrawal
Andrew Lamb, William Thies, Saman Amarasinghe (PLDI’03)
Sitij Agrawal, William Thies, Saman Amarasinghe (CASES’05)
DSP Optimization Process
• Given specification of algorithm, minimize the computation cost
[Figure: FM radio stream graph (AtoD, FMDemod, Duplicate, LPF1 to LPF3, HPF1 to HPF3, RoundRobin, Adder, Speaker) with a region marked “Linear”]
![Page 53: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/53.jpg)
DSP Optimization Process
• Given specification of algorithm, minimize the computation cost
– Currently done by hand (MATLAB)
• Can compiler replace DSP expert?
– Library generators limited [Spiral] [FFTW] [ATLAS]
– Enable unified development environment
[Figure: optimized stream graph with nodes AtoD, FMDemod, FFT, Equalizer, IFFT, Speaker]
![Page 54: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/54.jpg)
Focus: Linear State Space Filters
• Properties:
– Outputs are linear function of inputs and states
– New states are linear function of inputs and states
• Most common target of DSP optimizations
– FIR / IIR filters
– Linear difference equations
– Upsamplers / downsamplers
– DCTs
[Diagram: filter block with inputs u, states x, outputs y]
x’ = Ax + Bu
y = Cx + Du
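The two state-space equations can be read as a recipe for one firing of a filter. The following is a minimal Python sketch of that reading, not StreamIt's implementation. The helper names (`mat_vec`, `state_space_step`, `run`) and the running-sum example are hypothetical, chosen only for illustration.

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def state_space_step(A, B, C, D, x, u):
    """One firing: y = Cx + Du, x' = Ax + Bu. Returns (y, x')."""
    y = vec_add(mat_vec(C, x), mat_vec(D, u))
    x_next = vec_add(mat_vec(A, x), mat_vec(B, u))
    return y, x_next


def run(A, B, C, D, x0, inputs):
    """Feed one scalar input per firing and collect the scalar outputs."""
    x, outputs = list(x0), []
    for item in inputs:
        y, x = state_space_step(A, B, C, D, x, [item])
        outputs.append(y[0])
    return outputs


# Hypothetical example (an assumption, not from the talk): a running sum,
# i.e. the linear difference equation y[n] = y[n-1] + u[n].
# The single state holds the previous output, so A = B = C = D = [1].
A, B, C, D = [[1]], [[1]], [[1]], [[1]]
running_sum = run(A, B, C, D, [0], [1, 2, 3, 4])
print(running_sum)  # [1, 3, 6, 10]
```

A filter with no states (the case the following slides focus on) reduces to y = Du, which is the same step with empty A, B, and C.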
![Page 55: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/55.jpg)
![Page 56: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/56.jpg)
Focus: Linear Filters
float->float filter Scale {
    work push 2 pop 1 {
        float u = pop();
        push(u);
        push(2*u);
    }
}
Linear dataflow analysis: y = Du (inputs u, outputs y)
![Page 57: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/57.jpg)
Focus: Linear Filters
Linear dataflow analysis of the Scale filter from the previous slide (input u, outputs y1 and y2):
$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} u$$
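As a sanity check on the matrix that linear dataflow analysis extracts for Scale, the sketch below runs the filter two ways in Python: once as the imperative work function, and once as a generic "multiply by D" filter. It only checks the result of the analysis and does not perform the analysis itself. The function names and the deque-based tape are assumptions made for illustration.

```python
from collections import deque


def scale_work(inp, out):
    """Python transcription of Scale's work function (pop 1, push 2)."""
    u = inp.popleft()
    out.append(u)
    out.append(2 * u)


# Matrix form of Scale: one column per popped item, one row per pushed item.
D = [[1], [2]]


def linear_work(matrix, inp, out):
    """Generic stateless linear filter: pop a vector u, push matrix * u."""
    pop_rate = len(matrix[0])
    u = [inp.popleft() for _ in range(pop_rate)]
    for row in matrix:
        out.append(sum(d * x for d, x in zip(row, u)))


def run(work, items):
    """Fire `work` until the input tape is empty; return the output tape."""
    inp, out = deque(items), []
    while inp:
        work(inp, out)
    return out


samples = [3.0, 5.0, -1.5]
direct = run(scale_work, samples)
via_matrix = run(lambda i, o: linear_work(D, i, o), samples)
print(direct == via_matrix)  # True
```

Once a filter is known to be equivalent to its matrix, the compiler is free to manipulate the matrix instead of the code.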
![Page 58: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/58.jpg)
Combining Adjacent Filters
Filter 1 (input u, output y): y = Du
Filter 2 (input y, output z): z = Ey
Substituting: z = EDu
Combined Filter (input u, output z): z = Gu, where G = ED
![Page 59: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/59.jpg)
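Combination Example
Filter 1: $D = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$
Filter 2: $E = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}$
Combined Filter: $G = ED = \begin{bmatrix} 32 \end{bmatrix}$
Filter 1 followed by Filter 2: 6 mults/output
Combined Filter: 1 mult/output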
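The numbers on this slide can be reproduced with a few lines of Python. This is a sketch under one stated assumption: a filter's cost is counted as one multiplication per matrix entry, which is how the slide's "6 mults" appears to be counted (it includes the multiplication by 1). The helper names are hypothetical.

```python
def mat_mul(X, Y):
    """Plain matrix product of X (m x k) and Y (k x n), as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def mults_per_firing(M):
    """Assumed cost model: one multiplication per matrix entry."""
    return len(M) * len(M[0])


D = [[1], [2], [3]]   # Filter 1: pops 1 item, pushes 3
E = [[4, 5, 6]]       # Filter 2: pops 3 items, pushes 1

G = mat_mul(E, D)     # Combined filter: pops 1 item, pushes 1
pipeline_mults = mults_per_firing(D) + mults_per_firing(E)
combined_mults = mults_per_firing(G)

print(G)               # [[32]]
print(pipeline_mults)  # 6
print(combined_mults)  # 1
```

Each firing of the combined filter produces one output, as does one firing of Filter 1 followed by one firing of Filter 2, so the two counts are directly comparable as multiplications per output.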
![Page 60: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/60.jpg)
The General Case
• If matrix dimensions mis-match?
Matrix expansion:
[Figure: "Original" pipeline ([D] feeding E) next to "Expanded" pipeline in which copies of [D] are placed into a larger matrix; labels U, σ, pop = σ]
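One way to picture matrix expansion is the following Python sketch. It assumes stateless filters that do not peek beyond what they pop, so that firing a filter several times in a row is a block-diagonal copy of its matrix. Both filters are expanded until the first produces exactly what the second consumes, and only then multiplied. The function names are hypothetical, and the general algorithm in the cited papers also handles peeking, which this sketch does not.

```python
from math import gcd


def block_diag(M, copies):
    """Matrix for `copies` consecutive firings of a non-peeking filter M."""
    rows, cols = len(M), len(M[0])
    out = [[0] * (cols * copies) for _ in range(rows * copies)]
    for k in range(copies):
        for i in range(rows):
            for j in range(cols):
                out[k * rows + i][k * cols + j] = M[i][j]
    return out


def mat_mul(X, Y):
    """Plain matrix product of X (m x k) and Y (k x n), as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def combine_mismatched(D, E):
    """Combine y = Du and z = Ey when D's push rate differs from E's pop rate."""
    push1 = len(D)      # rows of D: items pushed per firing of Filter 1
    pop2 = len(E[0])    # columns of E: items popped per firing of Filter 2
    items = push1 * pop2 // gcd(push1, pop2)   # least common multiple
    D_exp = block_diag(D, items // push1)
    E_exp = block_diag(E, items // pop2)
    return mat_mul(E_exp, D_exp)


# Scale (pop 1, push 2) feeding a filter that pops 3 and pushes 1.
D = [[1], [2]]
E = [[4, 5, 6]]
G = combine_mismatched(D, E)
print(G)  # [[14, 6, 0], [0, 8, 17]]
```

Here Scale is expanded to three firings and the second filter to two, so the combined filter pops 3 items and pushes 2 per firing.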
![Page 61: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/61.jpg)
![Page 62: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/62.jpg)
The General Case
Pipelines
Feedback Loops
![Page 63: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/63.jpg)
The General Case
Splitjoins
![Page 64: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/64.jpg)
Floating-Point Operations Reduction
[Figure: bar chart of Flops Removed (%) by Benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); series: linear; y-axis from -40% to 100%; one bar labeled 0.3%]
![Page 65: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/65.jpg)
Floating-Point Operations Reduction
[Figure: bar chart of Flops Removed (%) by Benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); series: linear, freq; y-axis from -40% to 100%; bars labeled -140% and 0.3%]
![Page 66: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/66.jpg)
Radar (Transformation Selection)
[Figure: Radar stream graph. Nodes: Splitter(null); 12 × (Input, Dec, Dec, FIR1, FIR2); RR; Duplicate; 4 × (BeamForm, Filter, Mag, Detect); RR; Splitter; Sink]
![Page 67: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/67.jpg)
Radar (Transformation Selection)
[Figure: Radar stream graph. Nodes: Splitter(null); 12 × Input; RR (× 4); Duplicate; 4 × BeamForm; 4 × (Filter, Mag, Detect); Splitter; Sink]
![Page 68: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/68.jpg)
Radar (Transformation Selection)
[Figure: Radar stream graph. Nodes: Splitter(null); 12 × Input; RR (× 4); Duplicate; 4 × BeamForm; 4 × (Filter, Mag, Detect); Splitter; Sink]
![Page 69: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/69.jpg)
Radar (Transformation Selection)
• Maximal Combination and Shifting to Frequency Domain: 2.4 times as many FLOPS
• Using Transformation Selection: half as many FLOPS
[Figure: the two resulting Radar stream graphs. One has nodes Splitter(null); 12 × Input; RR; 4 × (Filter, Mag, Detect); Splitter; Sink. The other has nodes Splitter(null); 12 × Input; RR; Duplicate; 4 × Mag; Splitter; Sink]
![Page 70: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/70.jpg)
Floating Point Operations Reduction
[Figure: bar chart of Flops Removed (%) by Benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); series: linear, freq, autosel; y-axis from -40% to 100%; bars labeled -140% and 0.3%]
![Page 71: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/71.jpg)
Execution Speedup
On a Pentium IV
[Figure: bar chart of Speedup (%) by Benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); series: linear, freq, autosel; y-axis from -200% to 900%; one bar labeled 5%]
![Page 72: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/72.jpg)
Execution Speedup
On a Pentium IV
[Figure: same bar chart as on the previous slide: Speedup (%) by Benchmark; series: linear, freq, autosel; one bar labeled 5%]
Additional transformations:
1. Eliminating redundant states
2. Eliminating parameters (non-zero, non-unary coefficients)
3. Translation to the compressed domain
![Page 73: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/73.jpg)
StreamIt: Lessons Learned
• In practice, I/O rates of filters are often matched [LCTES’03]
– Over 30 publications study an uncommon case (CD-DAT)
• Multi-phase filters complicate programs, compilers
– Should maintain simplicity of only one atomic step per filter
• Programmers accidentally introduce mutable filter state
[Figure: CD-DAT pipeline of four filters; I/O rates 1 2 | 3 2 | 7 8 | 7 5; steady-state multiplicities x 147, x 98, x 28, x 32]
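The CD-DAT numbers on this slide can be recovered from the filter rates alone. The sketch below reads the slide's eight rates as four (pop, push) pairs, which is an assumption, though the multiplicities on the slide agree with it. It then solves the balance condition that each filter must produce exactly what its successor consumes over one steady-state iteration. The function name `steady_state` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def steady_state(rates):
    """Smallest integer firing counts for a pipeline.

    rates: list of (pop, push) pairs, one per filter, in pipeline order.
    Balance condition per edge: count[i] * push[i] == count[i+1] * pop[i+1].
    """
    counts = [Fraction(1)]
    for (_, push_prev), (pop_next, _) in zip(rates, rates[1:]):
        counts.append(counts[-1] * push_prev / pop_next)
    # Scale the rational solution up to integers.
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                     (c.denominator for c in counts), 1)
    scaled = [int(c * lcm_den) for c in counts]
    # Divide out any common factor to get the minimal schedule.
    common = reduce(gcd, scaled)
    return [s // common for s in scaled]


# Rates as read from the slide (interpretation assumed): (pop, push) per filter.
cd_dat = [(1, 2), (3, 2), (7, 8), (7, 5)]
multiplicities = steady_state(cd_dat)
print(multiplicities)  # [147, 98, 28, 32]
```

When adjacent rates already match, every count is 1, which is the common case the first bullet refers to.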
Stateful:
void->int filter SquareWave() {
    int x = 0;
    work push 1 {
        push(x);
        x = 1 - x;
    }
}
Stateless:
void->int filter SquareWave() {
    work push 2 {
        push(0);
        push(1);
    }
}
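The two SquareWave versions produce the same stream, which a short Python transcription can confirm. This is an illustrative sketch, not StreamIt code; the class and function names are made up for the comparison.

```python
class StatefulSquareWave:
    """Mirrors the stateful version: one item per firing, x carried across firings."""

    def __init__(self):
        self.x = 0

    def work(self, out):
        out.append(self.x)
        self.x = 1 - self.x


def stateless_square_wave_work(out):
    """Mirrors the stateless version: two items per firing, nothing carried over."""
    out.append(0)
    out.append(1)


# Six firings of the stateful filter...
stateful_out = []
stateful_filter = StatefulSquareWave()
for _ in range(6):
    stateful_filter.work(stateful_out)

# ...match three firings of the stateless one.
stateless_out = []
for _ in range(3):
    stateless_square_wave_work(stateless_out)

print(stateful_out)                   # [0, 1, 0, 1, 0, 1]
print(stateful_out == stateless_out)  # True
```

In the stateful version each firing depends on the previous one through `x`, so firings must run in order. The stateless version has no such dependence, which leaves a compiler free to replicate the filter for data parallelism.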
![Page 74: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/74.jpg)
Future of StreamIt
• Goal: influence the next big language
[Figure: Origins of C++. Timeline from 1960 to 1990 showing structural influence and feature influence among Fortran, Algol 60, CPL, BCPL, C, ANSI C, Simula 67, C with Classes, C++, C++arm, C++std, ML, Clu, Algol 68, Ada; annotation: "Academic origin"]
Source: B. Stroustrup, The Design and Evolution of C++
![Page 75: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/75.jpg)
Research Trajectory
• Vision: Make emerging computational substrates universally accessible and useful
1. Languages, compilers, & tools for multicores
– I believe new language / compiler technology can enable scalable and robust performance
– Next inroads: expose & exploit flexibility in programs
2. Programmable microfluidics
– We have developed programming languages, tools, and flexible new devices for microfluidics
– Potential to revolutionize biology experimentation
3. Technologies for the developing world
– TEK: enable Internet experience over email account
– Audio Wiki: publish content from a low-cost phone
– uBox / uPhone: monitor & improve rural healthcare
![Page 76: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/76.jpg)
Conclusions
• A parallel programming model will succeed only by luring programmers, making them do less, not more
• Stream programming lures programmers with:
– Elegant programming primitives
– Domain-specific optimizations
• Meanwhile, streaming is implicitly parallel
– Robust performance via task, data, & pipeline parallelism
• We believe stream programming will play a key role in enabling a transition to multicore processors
Contributions
– Structured streams
– Teleport messaging
– Unified algorithm for task, data, pipeline parallelism
– Software pipelining of whole procedures
– Algebraic simplification of whole procedures
– Translation from time to frequency
– Selection of best DSP transforms
![Page 77: Stream Programming: Luring Programmers into the …people.csail.mit.edu/thies/jobtalk08.pdf · 2009-02-09 · Stream Programming: Luring Programmers into the Multicore Era ... Application](https://reader035.vdocuments.site/reader035/viewer/2022062413/5b305e467f8b9adc6e8e1204/html5/thumbnails/77.jpg)
Acknowledgments
• Project supervisors
– Prof. Saman Amarasinghe
– Dr. Rodric Rabbah
• Contributors to this talk
– Michael I. Gordon (Ph.D. Candidate) – leads StreamIt backend efforts
– Andrew A. Lamb (M.Eng) – led linear optimizations
– Sitij Agrawal (M.Eng) – led statespace optimizations
• Compiler developers
– Kunal Agrawal
– Allyn Dimock
– Qiuyuan Jimmy Li
– Jasper Lin
– Michal Karczmarek
– David Maze
– Janis Sermulins
– Phil Sung
– David Zhang
• Application developers
– Basier Aziz
– Matthew Brown
– Matthew Drake
– Shirley Fung
– Hank Hoffmann
– Chris Leger
– Ali Meli
– Satish Ramaswamy
– Jeremy Wong
• User interface developers
– Kimberly Kuo
– Juan Reyes