
Source: groups.csail.mit.edu/commit/papers/2009/thies-thesis-defense.pdf

Language and Compiler Support for Stream Programs

Bill Thies

Computer Science and Artificial Intelligence Laboratory

Massachusetts Institute of Technology

Thesis Defense

September 11, 2008


Date: Wed, 17 Nov 1999
From: Saman Amarasinghe <saman@lcs.mit.edu>
To: Bill Thies
Subject: UROP Opportunities

Hi Bill,

I have a few UROP opportunities in the RAW project ...

Most of the projects can lead to an MENG thesis and beyond ...



Acknowledgments

• Project supervisors
– Prof. Saman Amarasinghe
– Dr. Rodric Rabbah

• Contributors to this talk
– Michael I. Gordon (Ph.D. student): led development of Raw backend
– Andrew A. Lamb (M.Eng.): led development of linear optimizations
– Sitij Agrawal (M.Eng.): led development of state-space optimizations

• Compiler developers
– Kunal Agrawal, Allyn Dimock, Steve Hall, Qiuyuan Jimmy Li, Jasper Lin, Michal Karczmarek, David Maze, Janis Sermulins, Phil Sung, Ceryen Tan, David Zhang

• Application developers
– Basier Aziz, Matthew Brown, Jiawen Chen, Matthew Drake, Shirley Fung, Hank Hoffmann, Chris Leger, Ali Meli, Mani Narayanan, Satish Ramaswamy, Jeremy Wong

• User interface developers
– Kimberly Kuo, Juan Reyes


Multicores are Here

[Figure: number of cores per chip (1 to 512) versus year (1970 to 20??), plotting processors including the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, P2, P3, P4, Athlon, Itanium, Itanium 2, Power4, Opteron, Opteron 4P, Xeon MP, Power6, Niagara, Yonah, PExtreme, Tanglewood, Cell, Intel Tflops, Xbox360, Cavium Octeon, Raza XLR, PA-8800, Cisco CSR-1, Picochip PC102, Broadcom 1480, Ambric AM2045, Tilera, and Raw.]


Hardware was responsible for improving performance


Now, the performance burden falls on programmers


Is Parallel Programming a New Problem?

• No! Decades of research have targeted multiprocessors
– Languages, compilers, architectures, tools…

• What is different today?
1. Multicores vs. multiprocessors. Multicores have:
- New interconnects with non-uniform communication costs
- Faster on-chip communication than off-chip I/O and memory operations
- Limited per-core memory availability
2. Non-expert programmers
- Supercomputers with >2048 processors today: 100 [top500.org]
- Machines with >2048 cores in 2020: >100 million [ITU, Moore]
3. Application trends
- Embedded: 2.7 billion cell phones vs. 850 million PCs [ITU 2006]
- Data-centric: YouTube streams 200 TB of video daily


Streaming Application Domain

• For programs based on streams of data
– Audio, video, DSP, networking, and cryptographic processing kernels
– Examples: HDTV editing, radar tracking, microphone arrays, cell phone base stations, graphics

[Figure: FM radio stream graph: AtoD → FMDemod → Duplicate splitter → three parallel bands (LPF1 → HPF1, LPF2 → HPF2, LPF3 → HPF3) → RoundRobin joiner → Adder → Speaker.]


• Properties of stream programs
– Regular and repeating computation
– Independent filters with explicit communication
– Data items have short lifetimes



Brief History of Streaming (1960 to 2000)

[Figure: timeline with three tracks (Models of Computation; Languages / Compilers; Modeling Environments). Entries include Petri Nets, Computation Graphs, Kahn Process Networks, Communicating Sequential Processes, Synchronous Dataflow, Sisal, Occam, Lucid, Id, VAL, lazy, LUSTRE, Esterel, C, Erlang, pH, Gabriel, Ptolemy, Grape-II, Matlab/Simulink, etc.]


Strengths
• Elegance
• Generality

Weaknesses
• Unsuitable for static analysis
• Cannot leverage deep results from the DSP / modeling community



[Timeline addition, labeled “Stream Programming”: StreamIt, Cg, StreamC, Brook.]


StreamIt: A Language and Compiler for Stream Programs

• Key idea: design a language that enables static analysis

• Goals:
1. Expose and exploit the parallelism in stream programs
2. Improve programmer productivity in the streaming domain

• Project contributions:
– Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
– Automatic parallelization [ASPLOS'02, G.Hardware'05, ASPLOS'06]
– Domain-specific optimizations [PLDI'03, CASES'05, TechRep'07]
– Cache-aware scheduling [LCTES'03, LCTES'05]
– Extracting streams from legacy code [MICRO'07]
– User + application studies [PLDI'05, P-PHEC'05, IPDPS'06]
– 7 years, 25 people, 300 KLOC
– 700 external downloads, 5 external publications



Part 1: Language Design

Joint work with Michael Gordon

William Thies, Michal Karczmarek, Saman Amarasinghe (CC’02)

William Thies, Michal Karczmarek, Janis Sermulins, Rodric Rabbah, Saman Amarasinghe (PPoPP’05)


StreamIt Language Basics

• High-level, architecture-independent language
– Backend support for uniprocessors, multicores (Raw, SMP), and clusters of workstations

• Model of computation: synchronous dataflow [Lee & Messerschmitt, 1987]
– Program is a graph of independent filters
– Filters have an atomic execution step with known input / output rates
– Compiler is responsible for scheduling and buffer management

• Extensions to synchronous dataflow
– Dynamic I/O rates
– Support for sliding window operations
– Teleport messaging [PPoPP’05]

[Figure: Input → Decimate → Output. Input pushes 1 item per execution, Decimate pops 10 and pushes 1, and Output pops 1, so a steady-state schedule runs Input ×10, Decimate ×1, Output ×1.]
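The balance equations behind the Input → Decimate → Output example can be solved mechanically. Below is a minimal Python sketch (Python 3.9+ for multi-argument `math.lcm`/`math.gcd`). It assumes a straight pipeline of filters with static (pop, push) rates. The function name and the tuple encoding are illustrative only. This is not the StreamIt compiler's scheduler, which also handles splitjoins, feedback loops, and peeking.

```python
from fractions import Fraction
from math import gcd, lcm


def steady_state_repetitions(rates):
    """Solve the SDF balance equations for a linear pipeline of filters.

    rates: list of (pop, push) pairs, one per filter, in pipeline order.
    For each channel i -> i+1 the steady state must satisfy
        reps[i] * push[i] == reps[i + 1] * pop[i + 1],
    so every item produced in one steady state is also consumed in it.
    Returns the smallest positive integer repetition vector.
    """
    reps = [Fraction(1)]
    for (_, push), (pop_next, _) in zip(rates, rates[1:]):
        if push <= 0 or pop_next <= 0:
            raise ValueError("every internal channel needs positive rates")
        reps.append(reps[-1] * push / pop_next)
    # Scale the rational solution up to the smallest all-integer one.
    scale = lcm(*(r.denominator for r in reps))
    ints = [int(r * scale) for r in reps]
    g = gcd(*ints)
    return [x // g for x in ints]


# Input pushes 1; Decimate pops 10 and pushes 1; Output pops 1.
pipeline = [(0, 1), (10, 1), (1, 0)]
print(steady_state_repetitions(pipeline))  # [10, 1, 1]
```

Working with exact fractions avoids rounding. The final `gcd` step keeps the schedule minimal, which in turn keeps the channel buffers small.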


Representing Streams

• Conventional wisdom: stream programs are graphs
– Graphs have no simple textual representation
– Graphs are difficult to analyze and optimize

• Insight: stream programs have structure

[Figure: a structured stream graph contrasted with an unstructured one.]


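Structured Streams

[Figure: the four StreamIt constructs: filter; pipeline; splitjoin (splitter → parallel streams → joiner); and feedback loop (joiner → body → splitter, with a loop stream feeding back to the joiner). Each child stream may be any StreamIt language construct.]

• Each structure is single-input, single-output
• Hierarchical and composable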
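To make the single-input, single-output property concrete, here is a rough Python model in which every construct is a function from an input list to an output list, so constructs nest freely. It is a sketch under several assumptions: finite inputs, no peeking, and only a duplicate splitter paired with a round-robin(1) joiner. The names `filter_`, `pipeline`, and `splitjoin_duplicate_roundrobin` are hypothetical, not StreamIt syntax, and feedback loops are left out.

```python
from typing import Callable, List

# A stream maps an input sequence to an output sequence.
Stream = Callable[[List[float]], List[float]]


def filter_(pop: int, work: Callable[[List[float]], List[float]]) -> Stream:
    """A filter: repeatedly pops `pop` items and pushes whatever `work` returns."""
    def run(items):
        out = []
        for i in range(0, len(items) - pop + 1, pop):
            out.extend(work(items[i:i + pop]))
        return out
    return run


def pipeline(*stages: Stream) -> Stream:
    """Connect stages in series; the result is again single-input, single-output."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run


def splitjoin_duplicate_roundrobin(*branches: Stream) -> Stream:
    """Duplicate splitter, parallel branches, round-robin(1) joiner."""
    def run(items):
        outputs = [branch(list(items)) for branch in branches]
        # On finite input, the joiner stops once any branch has run dry.
        n = min(len(o) for o in outputs)
        return [o[i] for i in range(n) for o in outputs]
    return run


identity = filter_(1, lambda w: [w[0]])
double = filter_(1, lambda w: [2 * w[0]])
adder = filter_(2, lambda w: [w[0] + w[1]])

# Structures nest freely: a splitjoin inside a pipeline.
program = pipeline(splitjoin_duplicate_roundrobin(identity, double), adder)
print(program([1.0, 2.0, 3.0]))  # [3.0, 6.0, 9.0]
```

Because a splitjoin returns the same kind of object as a filter, it can be dropped anywhere a filter is expected. This closure under composition is what the slide means by "hierarchical and composable".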

[Figures: StreamIt stream graphs for the Radar-Array Front End, Filterbank, FFT, Block Matrix Multiply, MP3 Decoder, Bitonic Sort, FM Radio with Equalizer, and Ground Moving Target Indicator (GMTI; 99 filters, 3566 filter instances).]


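Example Syntax: FMRadio

void->void pipeline FMRadio(int N, float lo, float hi) {
    add AtoD();                  // source: analog-to-digital converter
    add FMDemod();
    add splitjoin {              // equalizer with N bands
        split duplicate;         // every band sees the full signal
        for (int i=0; i<N; i++) {
            add pipeline {       // one low-pass / high-pass pipeline per band
                add LowPassFilter(lo + i*(hi - lo)/N);
                add HighPassFilter(lo + i*(hi - lo)/N);
            }
        }
        join roundrobin();       // interleave one item from each band
    }
    add Adder();                 // combine the band outputs
    add Speaker();               // sink
}

[Figure: the resulting stream graph for N = 3: AtoD → FMDemod → Duplicate → (LPF1 → HPF1, LPF2 → HPF2, LPF3 → HPF3) → RoundRobin → Adder → Speaker.]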


StreamIt Application Suite

• Software radio
• Frequency hopping radio
• Acoustic beam former
• Vocoder
• FFTs and DCTs
• JPEG Encoder/Decoder
• MPEG-2 Encoder/Decoder
• MPEG-4 (fragments)
• Sorting algorithms
• GMTI (Ground Moving Target Indicator)
• DES and Serpent crypto algorithms
• SSCA#3 (HPCS scalable benchmark for synthetic aperture radar)
• Mosaic imaging using RANSAC algorithm

Total size: 60,000 lines of code


Control Messages

• Occasionally, low-bandwidth control messages are sent between actors

• These messages often demand precise timing
– Communications: adjust protocol, amplification, compression
– Network router: cancel invalid packet
– Adaptive beamformer: track a target
– Respond to user input, runtime errors
– Frequency hopping radio

• Traditional techniques:
– Direct method call (no timing guarantees)
– Embed message in stream (opaque, slow)

[Figure: stream graph with AtoD, duplicate, LPF1 to LPF3, HPF1 to HPF3, roundrobin, Encode, Transmit, and Decode filters.]


Idea 2: Teleport Messaging

• Looks like a method call, but timed relative to data in the stream
– Exposes dependences to compiler
– Simple and precise for user
– Adjustable latency
– Can send upstream or downstream

// message handler, declared in the receiving filter
void setProtocol(int p) {
    reconfig(p);
}

// in the sending filter: deliver setProtocol with a latency of 2
TargetFilter x;
if (newProtocol(p)) {
    x.setProtocol(p) @ 2;
}
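The timing guarantee can be illustrated with a small simulation. This is a simplified sketch that assumes a single downstream message between two directly connected filters, each consuming and producing one item per execution. The PPoPP'05 semantics also cover arbitrary rates and upstream messages, which this sketch does not model. The function name and the gain-setting handler are hypothetical.

```python
def run_with_teleport(num_items, send_at, latency, new_gain):
    """Simulate a 1:1 Sender -> Receiver pair with one timed message.

    The Sender produces item n during its n-th execution. During execution
    `send_at` it sends a (hypothetical) setGain(new_gain) message with the
    given latency. Under the simplified rule assumed here, the Receiver runs
    the handler immediately before the execution that consumes the item the
    Sender produced `latency` executions after sending.
    """
    # Sender: delivery is defined relative to data, not wall-clock time, so we
    # may run the whole sender first; any interleaving gives the same result.
    stream = []
    pending = {}  # receiver execution index -> message argument
    for n in range(num_items):
        stream.append(float(n))
        if n == send_at:
            pending[n + latency] = new_gain

    # Receiver: scales each item by its current gain.
    gain = 1.0
    out = []
    for m, item in enumerate(stream):
        if m in pending:
            gain = pending[m]  # message handler runs before execution m
        out.append(item * gain)
    return out


print(run_with_teleport(6, send_at=1, latency=2, new_gain=10.0))
# [0.0, 1.0, 2.0, 30.0, 40.0, 50.0]
```

The sender and receiver loops are deliberately decoupled. The output depends only on which data item the message is attached to, which is the property that lets the compiler schedule the filters freely.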


Part 2: Automatic Parallelization

Joint work with Michael Gordon

Michael I. Gordon, William Thies, Saman Amarasinghe (ASPLOS’06)

Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, Saman Amarasinghe (ASPLOS’02)


Streaming is an Implicitly Parallel Model

• Programmer thinks about functionality, not parallelism

• More explicit models may…
– Require knowledge of target [MPI] [cG]
– Require parallelism annotations [OpenMP] [HPF] [Cilk] [Intel TBB]

• Novelty over other implicit models?
[Erlang] [MapReduce] [Sequoia] [pH] [Occam] [Sisal] [Id] [VAL] [LUSTRE] [HAL] [THAL] [SALSA] [Rosette] [ABCL] [APL] [ZPL] [NESL] […]
– Exploiting streaming structure for robust performance


Parallelism in Stream Programs

• Task parallelism
– Analogous to thread (fork/join) parallelism

• Data parallelism
– Peel iterations of a filter and place them within a scatter/gather pair (fission)
– Cannot parallelize filters with state

• Pipeline parallelism
– Between producers and consumers
– Stateful filters can be parallelized

[Figure: splitter/joiner stream graph illustrating task parallelism.]


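• Data parallelism: analogous to DOALL loops
• Pipeline parallelism: analogous to ILP that is exploited in hardware

[Figure: stream graph annotated with the three kinds of parallelism: task (across splitter/joiner branches), data (stateless filters), and pipeline (along the graph).]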
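Fission, and the reason it is restricted to stateless filters, can be checked with a few lines of Python. This is a model built on assumptions, not the compiler's transformation. Each replica is a fresh filter instance created by a factory, and scatter and gather are both round-robin. The thread pool only shows that the replicas are independent; under CPython's GIL it gives no speedup. `make_square` and `make_running_sum` are hypothetical example filters.

```python
from concurrent.futures import ThreadPoolExecutor


def run_filter(make_filter, items):
    """Run one instance of a filter over `items` (one item in, one item out)."""
    work = make_filter()
    return [work(x) for x in items]


def fission(make_filter, items, n):
    """Replicate a filter n ways behind a round-robin scatter/gather pair.

    Each replica is a fresh instance with its own private state, which is
    exactly what makes this transformation unsafe for stateful filters.
    """
    chunks = [items[r::n] for r in range(n)]  # round-robin scatter
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(lambda c: run_filter(make_filter, c), chunks))
    # Round-robin gather: item i came from replica i % n, position i // n.
    return [results[i % n][i // n] for i in range(len(items))]


def make_square():            # stateless
    return lambda x: x * x


def make_running_sum():       # stateful: output depends on all earlier inputs
    total = 0

    def work(x):
        nonlocal total
        total += x
        return total
    return work


items = [1, 2, 3, 4]
print(run_filter(make_square, items), fission(make_square, items, 2))
# [1, 4, 9, 16] [1, 4, 9, 16]
print(run_filter(make_running_sum, items), fission(make_running_sum, items, 2))
# [1, 3, 6, 10] [1, 2, 4, 6]
```

The stateless filter gives identical results either way. The running sum diverges because each replica only sees every other item, which is why stateful filters are left to pipeline parallelism instead.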


Baseline: Fine-Grained Data Parallelism

[Figure: each filter (Expand, Process, BandPass, Compress, BandStop, Adder) is replicated and wrapped in its own splitter/joiner pair.]


Evaluation: Fine-Grained Data Parallelism

[Figure: bar chart of throughput normalized to single-core StreamIt (y-axis 0 to 18) for BitonicSort, ChannelVocoder, DCT, DES, FFT, Filterbank, FMRadio, Serpent, TDE, MPEG2-subset, Vocoder, Radar, and the geometric mean; series: Fine-Grained Data, Coarse-Grained Task + Data.]

Raw Microprocessor
• 16 in-order, single-issue cores with D$ and I$
• 16 memory banks, each bank with DMA
• Cycle-accurate simulator


Good Parallelism! Too Much Synchronization!


Coarsening the Granularity

[Figure: a splitjoin with two branches built from Expand, Process, BandPass, Compress, and BandStop filters, followed by an Adder.]


[Figure: in each branch, Expand, Process, BandPass, and Compress are fused into a single filter (BandPassCompressProcessExpand); the two BandStop filters and the Adder remain separate.]


[Figure: each fused BandPassCompressProcessExpand filter is then data-parallelized behind its own splitter/joiner pair.]


[Figure: the BandStop filters and the Adder are data-parallelized as well, each behind its own splitter/joiner pair.]
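Coarsening relies on fusing adjacent filters into one. The result is fewer, larger units to replicate and fewer channels to synchronize. The sketch below is minimal and assumes stateless filters that each consume and produce exactly one item; a real fusion pass must also handle differing rates and peeking. The lambdas are hypothetical stand-ins for Expand, Process, BandPass, and Compress.

```python
def fuse(*works):
    """Fuse a pipeline of stateless one-in/one-out work functions into one filter."""
    def fused(x):
        for work in works:
            x = work(x)
        return x
    return fused


def run_pipeline(works, items):
    """Run each filter in turn and count the items pushed onto channels."""
    pushes = 0
    data = list(items)
    for work in works:
        data = [work(x) for x in data]
        pushes += len(data)   # each output item crosses one channel
    return data, pushes


# Hypothetical stand-ins for Expand, Process, BandPass and Compress.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x // 2]
print(run_pipeline(stages, [1, 2, 3]))           # ([0, 1, 2], 12)
print(run_pipeline([fuse(*stages)], [1, 2, 3]))  # ([0, 1, 2], 3)
```

The fused filter computes the same outputs while pushing one quarter as many items. When it is then replicated, each replica does four times as much work per synchronization point.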


Evaluation: Coarse-Grained Data Parallelism

[Figure: the same throughput chart (normalized to single-core StreamIt), comparing the Fine-Grained Data and Coarse-Grained Task + Data series across the benchmarks.]

Good Parallelism! Low Synchronization!


Simplified Vocoder

[Figure: stream graph annotated with per-filter work: a splitjoin of two AdaptDFT filters (6 each); RectPolar (20, data parallel); a splitjoin of two branches, each Amplify (2) → Diff (1) → UnWrap (1) → Accum (1); PolarRect (20, data parallel).]

Target a 4-core machine


Data Parallelize

[Figure: RectPolar and PolarRect (work 20 each) are each fissed into four replicas of work 5; the AdaptDFT splitjoin and the Amplify/Diff/UnWrap/Accum branches are unchanged.]


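Data + Task Parallel Execution

[Figure: schedule on 4 cores (time vs. cores): the AdaptDFT filters (6), the four RectPolar replicas (5), the two Amplify → Diff → UnWrap → Accum branches (5), and the four PolarRect replicas (5) run as successive stages; the schedule is marked with a length of 21.]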


We Can Do Better

[Figure: the same schedule, with a shorter length of 16 marked as the target.]


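Coarse-Grained Software Pipelining

[Figure: after a prologue, the new steady state overlaps filters from different iterations, so the RectPolar replicas and the other filters can fill all four cores.]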
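The gap between the 21-unit and 16-unit schedules can be reproduced with simple arithmetic. This sketch rests on assumptions: the numbers on the Simplified Vocoder slides are taken to be per-filter work estimates, and 21 and 16 are read as schedule lengths. A greedy longest-processing-time (LPT) packer stands in for the compiler's partitioner, and the helper name is hypothetical.

```python
import heapq


def lpt_makespan(tasks, cores):
    """Greedy longest-processing-time-first packing; returns the busiest core's load."""
    loads = [0] * cores
    heapq.heapify(loads)
    for t in sorted(tasks, reverse=True):
        heapq.heappush(loads, heapq.heappop(loads) + t)  # give t to the least-loaded core
    return max(loads)


# Assumed per-filter work from the slides (Simplified Vocoder, 4 cores).
branch = [2, 1, 1, 1]                     # Amplify, Diff, UnWrap, Accum
stages = [
    [6, 6],                               # two AdaptDFT filters
    [5, 5, 5, 5],                         # RectPolar fissed 4 ways
    [sum(branch), sum(branch)],           # each branch runs as a pipeline on one core
    [5, 5, 5, 5],                         # PolarRect fissed 4 ways
]

# Data + task parallelism: stages are separated by barriers.
barrier = sum(lpt_makespan(s, 4) for s in stages)

# Software pipelining: filters from different iterations overlap, so any
# filter may run on any core; LPT packing gives an achievable length.
all_filters = [6, 6] + [5] * 8 + branch * 2
pipelined = lpt_makespan(all_filters, 4)
lower_bound = -(-sum(all_filters) // 4)   # ceil(total work / cores)

print(barrier, pipelined, lower_bound)    # 21 16 16
```

With barriers, each stage leaves cores idle (6 + 5 + 5 + 5 = 21). Once software pipelining removes the intra-iteration dependences, the 62 units of work pack into 16, which matches the ceiling of 62/4.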


Evaluation: Coarse-Grained Task + Data + Software Pipelining

[Figure: the throughput chart (normalized to single-core StreamIt) with three series: Fine-Grained Data; Coarse-Grained Task + Data; Coarse-Grained Task + Data + Software Pipeline.]


Best Parallelism! Lowest Synchronization!


Parallelism: Take Away

• Stream programs have abundant parallelism
– However, parallelism is obfuscated in languages like C

• Stream languages enable new & effective mapping
– In C, analogous transformations are impossibly complex
– In StreamC or Brook, similar transformations are possible [Khailany et al., IEEE Micro’01] [Buck et al., SIGGRAPH’04] [Das et al., PACT’06] […]

• Results should extend to other multicores
– Parameters: local memory, comm.-to-comp. cost
– Preliminary results on Cell are promising [Zhang, dasCMP’07]

[Figure: the three-step mapping: Coarsen Granularity → Data Parallelize → Software Pipeline.]


Part 3: Domain-Specific Optimizations

Joint work with Andrew Lamb, Sitij Agrawal

Andrew Lamb, William Thies, Saman Amarasinghe (PLDI’03)

Sitij Agrawal, William Thies, Saman Amarasinghe (CASES’05)


DSP Optimization Process

• Given a specification of the algorithm, minimize the computation cost

[Figure: FM radio stream graph (AtoD, FMDemod, Duplicate, LPF1 to LPF3, HPF1 to HPF3, RoundRobin, Adder, Speaker) with part of the graph labeled "Linear".]


• Currently, this is done by hand (MATLAB)

[Figure: hand-optimized version: AtoD → FMDemod → FFT → Equalizer → IFFT → Speaker.]


• Can a compiler replace the DSP expert?
– Library generators are limited [Spiral] [FFTW] [ATLAS]
– Enable a unified development environment


Focus: Linear State Space Filters

• Properties:
– Outputs are a linear function of inputs and states
– New states are a linear function of inputs and states

• Most common target of DSP optimizations
– FIR / IIR filters
– Linear difference equations
– Upsamplers / downsamplers
– DCTs

[Figure: a filter with inputs u, states x, and outputs y.]

x’ = Ax + Bu
y = Cx + Du
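The two state-space equations translate directly into a step loop. Below is a minimal sketch in plain Python, assuming dense matrices stored as lists of rows. The first-order IIR coefficients are an illustrative choice, not taken from the talk.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def state_space_run(A, B, C, D, x0, inputs):
    """Run a linear state-space filter; each input u is a vector.

    Per execution:  y  = C x + D u   (outputs)
                    x' = A x + B u   (next state)
    """
    x = list(x0)
    outputs = []
    for u in inputs:
        outputs.append(vadd(matvec(C, x), matvec(D, u)))
        x = vadd(matvec(A, x), matvec(B, u))
    return outputs


# First-order IIR filter y[n] = u[n] + 0.5 * y[n-1], with state x = y[n-1].
A, B, C, D = [[0.5]], [[1.0]], [[0.5]], [[1.0]]
impulse = [[1.0], [0.0], [0.0], [0.0]]
print(state_space_run(A, B, C, D, [0.0], impulse))
# [[1.0], [0.5], [0.25], [0.125]]
```

A stateless linear filter is the special case in which the state is empty, leaving only y = Du.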



Focus: Linear Filters

float->float filter Scale {
    work push 2 pop 1 {
        float u = pop();
        push(u);
        push(2*u);
    }
}

[Figure: inputs u, outputs y, with y = Du.]

Linear dataflow analysis


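For Scale, linear dataflow analysis yields:

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} u$$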
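The idea behind linear dataflow analysis can be imitated by running a work function on symbolic values instead of numbers and reading off the coefficients of D. This toy version assumes straight-line work functions written in Python against hypothetical `pop`/`push` callbacks. It models only `+` and `*` and rejects constant offsets for simplicity. A comparison on an input raises a TypeError. The real analysis, by contrast, operates on StreamIt filter code.

```python
class Lin:
    """An affine form: const + sum(coeffs[i] * input_i) over popped inputs."""

    def __init__(self, coeffs=None, const=0.0):
        self.coeffs = dict(coeffs or {})
        self.const = const

    def __add__(self, other):
        o = lift(other)
        keys = set(self.coeffs) | set(o.coeffs)
        return Lin({k: self.coeffs.get(k, 0.0) + o.coeffs.get(k, 0.0) for k in keys},
                   self.const + o.const)

    __radd__ = __add__

    def __mul__(self, other):
        o = lift(other)
        if self.coeffs and o.coeffs:
            raise ValueError("input * input: filter is not linear")
        # At least one side is a pure constant k; scale the other side by it.
        form, k = (self, o.const) if not o.coeffs else (o, self.const)
        return Lin({i: c * k for i, c in form.coeffs.items()}, form.const * k)

    __rmul__ = __mul__


def lift(v):
    return v if isinstance(v, Lin) else Lin(const=float(v))


def extract_matrix(work, pop_rate):
    """Run `work` symbolically and return D such that outputs = D * inputs."""
    popped, pushed = [], []

    def pop():
        popped.append(Lin({len(popped): 1.0}))  # i-th pop is the symbol input_i
        return popped[-1]

    def push(v):
        pushed.append(lift(v))

    work(pop, push)
    if len(popped) != pop_rate:
        raise ValueError("work function popped an unexpected number of items")
    if any(p.const != 0.0 for p in pushed):
        raise ValueError("constant offset: affine, not handled in this sketch")
    return [[p.coeffs.get(j, 0.0) for j in range(pop_rate)] for p in pushed]


def scale_work(pop, push):    # the Scale filter: push 2, pop 1
    u = pop()
    push(u)
    push(2 * u)


print(extract_matrix(scale_work, pop_rate=1))  # [[1.0], [2.0]]
```

Each pushed value comes back as a linear form over the popped inputs, so the rows of D fall out directly. Here that gives the column [1, 2] shown for Scale.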


Combining Adjacent Filters

[Figure: Filter 1 (y = Du) feeds Filter 2 (z = Ey); the pair is replaced by a single combined filter with z = Gu.]

$$z = Ey = E(Du) = (ED)\,u = Gu, \qquad G = ED$$


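Combination Example

[Figure: Filter 1 (u → y) and Filter 2 (y → z) replaced by a combined filter (u → z).]

$$D = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad E = \begin{bmatrix} 4 & 5 & 6 \end{bmatrix}, \qquad G = ED = \begin{bmatrix} 32 \end{bmatrix}$$

Original: 6 multiplications per output. Combined: 1 multiplication per output.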
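The combination rule and the 6-versus-1 multiplication count can be checked numerically. This sketch assumes, as on the slides, that each filter computes its outputs as a matrix times its popped inputs (column-vector orientation). `matmul`, `apply`, and `mults_per_execution` are hypothetical helpers, and only the case where the dimensions already match is handled.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("dimension mismatch: expand a filter first")
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
            for i in range(len(A))]


def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mults_per_execution(M):
    """Multiplications needed to evaluate M * v, skipping zero coefficients."""
    return sum(1 for row in M for m in row if m != 0)


D = [[1], [2], [3]]   # Filter 1: pops 1 item, pushes 3  (y = D u)
E = [[4, 5, 6]]       # Filter 2: pops 3 items, pushes 1 (z = E y)
G = matmul(E, D)      # combined filter: z = G u
print(G)                                                                         # [[32]]
print(apply(E, apply(D, [2])), apply(G, [2]))                                    # [64] [64]
print(mults_per_execution(D) + mults_per_execution(E), mults_per_execution(G))   # 6 1
```

The combined matrix is computed once at compile time, so the runtime saving comes entirely from evaluating G instead of D followed by E.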


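The General Case

• What if the matrix dimensions mismatch?

[Figure: matrix expansion. The original filter's matrix D (pop = σ, with dimensions labeled U and E) is expanded into a larger matrix holding several copies of D, each offset by σ.]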
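Matrix expansion can be sketched as stacking shifted copies of D. This assumes a layout with one row per pushed item and one column per peeked item, which may differ from the orientation used in the original work. `expand` and `run_filter` are hypothetical names. The check compares the expanded matrix against running the original filter several times.

```python
def expand(D, pop, copies):
    """Stack `copies` consecutive executions of a linear filter into one matrix.

    D has one row per pushed item and one column per peeked item (peek = len(D[0])).
    Execution j reads the input window shifted by j * pop items, so its copy of D
    is placed j * push rows down and j * pop columns to the right.
    """
    push, peek = len(D), len(D[0])
    rows, cols = copies * push, peek + (copies - 1) * pop
    big = [[0] * cols for _ in range(rows)]
    for j in range(copies):
        for r in range(push):
            for c in range(peek):
                big[j * push + r][j * pop + c] = D[r][c]
    return big


def apply(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def run_filter(D, pop, inputs, executions):
    """Reference semantics: slide a peek-wide window forward by `pop` each time."""
    peek, out = len(D[0]), []
    for j in range(executions):
        out.extend(apply(D, inputs[j * pop: j * pop + peek]))
    return out


D = [[1, 1]]                      # moving sum: peek 2, pop 1, push 1
big = expand(D, pop=1, copies=3)  # now peek 4, pop 3, push 3
print(big)   # [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
data = [1, 2, 3, 4]
print(apply(big, data), run_filter(D, 1, data, 3))  # [3, 5, 7] [3, 5, 7]
```

After expansion, the filter's push count can be made to line up with its neighbor's pop count, at which point the matrices can be multiplied as in the matching case.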

Page 64: LdCilLanguage and Compiler Support for Stream ...groups.csail.mit.edu/commit/papers/2009/thies-thesis-defense.pdf · Acknowledgments • Project supervisors – Prof. Saman Amarasinghe


Page 65

The General Case

Pipelines

Feedback Loops

Page 66

The General Case

Splitjoins

Page 67

Floating-Point Operations Reduction

[Figure: Flops removed (%) per benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis from -40% to 100%. Series: linear. One bar is annotated 0.3%.]

Page 68

Floating-Point Operations Reduction

[Figure: Flops removed (%) per benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis from -40% to 100%. Series: linear, freq. Annotated values: -140% and 0.3%.]
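The "freq" series reflects translating linear filters from the time domain to the frequency domain. As a hedged illustration of why that can save work, the sketch below computes the same FIR output two ways: direct convolution, which costs about len(x)·len(h) multiplications, and zero-padded FFT multiplication, which costs O(n log n). It uses a textbook recursive radix-2 FFT and processes one finite block only. A streaming implementation would additionally need block overlap handling (e.g., overlap-add or overlap-save), which is not shown. None of these names come from the StreamIt compiler.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fir_direct(x, h):
    """Full linear convolution, as a time-domain FIR filter computes it."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fir_fft(x, h):
    """Same result via the frequency domain: pad, transform, multiply, invert."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:m]]  # normalize the inverse transform


x = [1.0, 2.0, 3.0, 4.0]
h = [1.0, -1.0, 0.5]
print(fir_direct(x, h))  # [1.0, 1.0, 1.5, 2.0, -2.5, 2.0]
print(max(abs(a - b) for a, b in zip(fir_direct(x, h), fir_fft(x, h))))
```

For short filters and small blocks the transforms can cost more than they save, so the frequency-domain version is not automatically the cheaper one.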

Page 69

Radar (Transformation Selection)

[Figure: Radar stream graph. A null splitter feeds twelve parallel channels, each containing Input, two Dec filters, FIR1, and FIR2. Round-robin (RR) nodes and a Duplicate splitter lead to four parallel branches of BeamForm, Filter, Mag, and Detect, which end in a Sink.]

Page 70

Radar (Transformation Selection)

[Figure: Radar stream graph after a transformation step. Components shown: Splitter(null), twelve Input filters, round-robin (RR) nodes, a Duplicate splitter, four BeamForm filters, Filter, Mag, and Detect stages, and a Sink; the per-channel Dec, FIR1, and FIR2 filters no longer appear.]

Page 71

Radar (Transformation Selection)

[Figure: Radar stream graph at a further transformation step. Components shown: Splitter(null), twelve Input filters, round-robin (RR) nodes, a Duplicate splitter, four BeamForm filters, Filter, Mag, and Detect stages, and a Sink.]

Page 72

Radar (Transformation Selection)

[Figure: two transformed versions of the Radar stream graph.
Maximal combination and shifting to the frequency domain: 2.4 times as many FLOPS.
Using transformation selection: half as many FLOPS.]
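The Radar result shows that combining everything is not always best. As a simplified illustration of choosing transformations by cost, the sketch below uses dynamic programming over a single pipeline of linear filters. For each contiguous segment it asks a cost function for the price of collapsing that segment into one filter in the time or the frequency domain, then keeps the cheapest partition. The cost table is invented for illustration, and restricting to one pipeline is an assumption. This is not the thesis's selection algorithm, which operates on the full hierarchical stream graph; it only conveys the flavor of the choice.

```python
def select(n, seg_cost):
    """Cheapest way to partition a pipeline of n linear filters.

    seg_cost(i, j) returns (time_cost, freq_cost) for collapsing filters
    i..j-1 into one filter. Returns (total_cost, plan) where plan is a list
    of (i, j, domain) segments.
    """
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest plan for filters 0..j-1
    choice = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            time_cost, freq_cost = seg_cost(i, j)
            cost, domain = ((time_cost, "time") if time_cost <= freq_cost
                            else (freq_cost, "freq"))
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                choice[j] = (i, domain)
    plan, j = [], n
    while j > 0:  # walk back through the recorded choices
        i, domain = choice[j]
        plan.append((i, j, domain))
        j = i
    return best[n], plan[::-1]


# Hypothetical (time, freq) costs for collapsing filters i..j-1 into one.
COSTS = {
    (0, 1): (10, 12), (1, 2): (10, 12), (2, 3): (8, 9),
    (0, 2): (12, 15), (1, 3): (30, 20), (0, 3): (60, 40),
}
total, plan = select(3, lambda i, j: COSTS[(i, j)])
print(total, plan)  # 20.0 [(0, 2, 'time'), (2, 3, 'time')]
```

With these made-up numbers, collapsing all three filters costs 40 and leaving them separate costs 28, while the selected partition costs 20: neither extreme is optimal.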

Page 73

Floating-Point Operations Reduction

[Figure: Flops removed (%) per benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis from -40% to 100%. Series: linear, freq, autosel. Annotated values: -140% and 0.3%.]

Page 74

Execution Speedup on a Pentium IV

[Figure: Speedup (%) per benchmark (FIR, RateConvert, TargetDetect, FMRadio, Radar, FilterBank, Vocoder, Oversample, DToA); y-axis from -200% to 900%. Series: linear, freq, autosel. One bar is annotated 5%.]

Page 75

Execution Speedup on a Pentium IV

[Figure: the speedup chart from the previous slide (series linear, freq, autosel; one bar annotated 5%).]

Additional transformations:
1. Eliminating redundant states
2. Eliminating parameters (non-zero, non-unary coefficients)
3. Translation to the compressed domain

Page 76

StreamIt: Lessons Learned

• In practice, I/O rates of filters are often matched [LCTES'03]
  – Over 30 publications study an uncommon case (CD-DAT)
• Multi-phase filters complicate programs and compilers
  – Should maintain the simplicity of only one atomic step per filter
• Programmers accidentally introduce mutable filter state

[Figure: a pipeline of four filters with I/O rates 1 2, 3 2, 7 8, 7 5, executed ×147, ×98, ×28, ×32 per steady state.]
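The multiplicities in the pictured pipeline follow from balance equations: on every channel, items pushed per steady state must equal items popped. The sketch below reads each rate pair as (pop, push), a reading that is consistent with the ×147, ×98, ×28, ×32 counts, and solves for the smallest integer repetition counts in a simple pipeline. It is an illustrative solver, not StreamIt's scheduler, and `steady_state` is a hypothetical name.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def steady_state(rates):
    """rates: (pop, push) per filter in a pipeline, upstream first.

    Returns the smallest positive integers reps such that
    reps[i] * push[i] == reps[i + 1] * pop[i + 1] on every channel.
    """
    reps = [Fraction(1)]
    for (_, push), (pop, _) in zip(rates, rates[1:]):
        reps.append(reps[-1] * push / pop)  # balance this channel
    # Scale the rational solution to integers, then divide out common factors.
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                     (r.denominator for r in reps))
    ints = [int(r * lcm_den) for r in reps]
    g = reduce(gcd, ints)
    return [v // g for v in ints]


rates = [(1, 2), (3, 2), (7, 8), (7, 5)]
print(steady_state(rates))  # [147, 98, 28, 32]
```

Mismatched rates like these force large steady states, whereas matched rates (the common case the slide points to) give repetition counts of 1.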

Stateful:

    void->int filter SquareWave() {
      int x = 0;
      work push 1 {
        push(x);
        x = 1 - x;
      }
    }

Stateless:

    void->int filter SquareWave() {
      work push 2 {
        push(0);
        push(1);
      }
    }
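Both SquareWave filters emit the same stream; what differs is whether a firing depends on earlier firings. The Python generators below mirror the two StreamIt versions (a hypothetical transcription, not generated code). The final lines illustrate why the stateless form matters: each firing can be computed independently of the others, so firings could be distributed across cores, whereas the stateful version must run its firings in order to carry `x` forward.

```python
from itertools import islice
from typing import Iterator, List


def square_wave_stateful() -> Iterator[int]:
    """Mirrors the stateful filter: one push per firing, toggling field x."""
    x = 0
    while True:
        yield x    # push(x)
        x = 1 - x  # mutable state carried across firings


def square_wave_stateless() -> Iterator[int]:
    """Mirrors the stateless filter: each firing pushes 0 then 1."""
    while True:
        yield 0
        yield 1


def stateless_firing(_k: int) -> List[int]:
    """Firing k of the stateless filter; its output does not depend on k."""
    return [0, 1]


a = list(islice(square_wave_stateful(), 8))
b = list(islice(square_wave_stateless(), 8))
print(a == b, a)  # True [0, 1, 0, 1, 0, 1, 0, 1]

# Firings computed independently (e.g. in any order or in parallel):
parallel = [v for k in range(4) for v in stateless_firing(k)]
print(parallel == a)  # True
```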

Page 77

Future of StreamIt

• Goal: influence the next big language

[Figure: Origins of C++ (source: B. Stroustrup, The Design and Evolution of C++). A 1960 to 1990 timeline showing structural and feature influences among Fortran, Algol 60, CPL, BCPL, C, ANSI C, Simula 67, C with Classes, C++, C++arm, C++std, ML, Clu, Algol 68, and Ada, with an "Academic origin" annotation.]

Page 78

Research Trajectory

• Vision: Make emerging computational substrates universally accessible and useful
1. Languages, compilers, & tools for multicores
  – I believe new language / compiler technology can enable scalable and robust performance
  – Next inroads: expose & exploit flexibility in programs
2. Programmable microfluidics
  – We have developed programming languages, tools, and flexible new devices for microfluidics
  – Potential to revolutionize biology experimentation
3. Technologies for the developing world
  – TEK: enable an Internet experience over an email account
  – Audio Wiki: publish content from a low-cost phone
  – uBox / uPhone: monitor & improve rural healthcare

Page 79

Conclusions

• A parallel programming model will succeed only by luring programmers, making them do less, not more
• Stream programming lures programmers with:
  – Elegant programming primitives
  – Domain-specific optimizations
• Meanwhile, streaming is implicitly parallel
  – Robust performance via task, data, & pipeline parallelism
• We believe stream programming will play a key role in enabling a transition to multicore processors

Contributions
  – Structured streams
  – Teleport messaging
  – Unified algorithm for task, data, pipeline parallelism
  – Software pipelining of whole procedures
  – Algebraic simplification of whole procedures
  – Translation from time to frequency
  – Selection of best DSP transforms