fractal - mit csailpeople.csail.mit.edu/sanchez/papers/2017.fractal.isca.slides.pdf · fractal: an...


Suvinay Subramanian, Mark C. Jeffrey, Maleen Abeydeera, Hyun Ryong Lee, Victor A. Ying, Joel Emer, Daniel Sanchez

ISCA 2017

Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism

Current speculative systems scale poorly

Speculative parallelization, e.g. transactional memory (TM), simplifies parallel programming

Performs poorly on real-world applications … because applications comprise large atomic tasks

Large atomic tasks limit performance

Database transaction: query X … update Z … query U … update V

◦ Millions of cycles
◦ Prone to aborts
◦ Challenging to track
◦ Serial (misses parallelism)

Large atomic tasks have abundant nested parallelism!

[Figure: a transaction expanded into a tree of nested operations: qry X, qry K, upd Z, qry Y, qry Y, upd J, qry S, qry M, upd L, …, qry U, upd V, …]

How to
◦ extract parallelism?
◦ maintain atomicity?
◦ achieve high performance?

Prior TMs fail to exploit nested parallelism

1. Merging of “nested” speculative state with parent
◦ Large speculative state, prone to aborts

2. Cyclic dependence between parent and nested children
◦ Deadlock and livelock issues

[Figure: timeline of tasks X, A, B, and Y on Cores 1–4]

See the paper for more details!

Ordering tasks to guarantee atomicity

[Figure: timeline of tasks X, A, B, and Y on Cores 1–4. X and Y are ordered 1 and 2; A and B are ordered 1.1 and 1.2.]

Fractal decouples atomicity from parallelism

1. Decouples unit of atomicity from unit of parallelism
◦ Domain: All tasks belonging to a domain appear to execute atomically

2. Implementation guarantees atomicity by ordering tasks
◦ No merging speculative state

Benefits of Fractal
◦ Tiny tasks
◦ Easy to track
◦ Composable speculative parallelism

Fractal Execution Model

Decoupling atomicity from parallelism

Domains to group tasks into atomic units

Fractal programs consist of atomic tasks

Tasks may access arbitrary data

Tasks may create child tasks

Tasks belong to a hierarchy of nested domains

Semantics across domains

Each task:
◦ can create a single subdomain
◦ can enqueue child tasks to subdomain or current domain

(All tasks in domain + creator of domain) → appear to execute as single atomic unit

[Figure: root domain with tasks X and Y; tasks A, B, C, D, E are nested under X and tasks L, M, N, O, P are nested under Y]

Semantics within a domain

Unordered
◦ Arbitrary order while respecting parent-child dependences

Timestamp-ordered
◦ Tasks appear to execute in increasing timestamp order
◦ Children appear to execute after parent

[Figure: the task hierarchy under the root domain (X with A–E, Y with L–P), annotated with example timestamps 1, 2, 3, 10, 12]

Fractal software API

Creating and enqueuing tasks

fractal::enqueue(function_pointer, timestamp, arguments...);

Creating sub-domains

fractal::create_subdomain(<domain_type>);

High-level programming interface, e.g. forall(), callcc(), parallel_reduce()
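To make the two calls above concrete, here is a minimal sequential sketch. It is not the real Fractal runtime: the namespace `toy_fractal`, the `Runtime` struct, and `run_transactions` are hypothetical names invented for illustration. The sketch models only timestamp-ordered domains, performs no speculation, and assumes that once a task has created a subdomain, its later enqueues go to that subdomain.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace toy_fractal {  // hypothetical stand-in, not the real fractal:: runtime

using VT = std::vector<std::uint64_t>;  // one timestamp per nesting level

struct Runtime {
    std::multimap<VT, std::function<void()>> pending;  // tasks sorted by VT
    VT current;                  // VT of the running task; empty at top level
    bool has_subdomain = false;  // has the running task created a subdomain?

    // Modeled on fractal::create_subdomain: at most one subdomain per task.
    void create_subdomain() {
        assert(!current.empty() && !has_subdomain);
        has_subdomain = true;
    }

    // Modeled on fractal::enqueue. Assumption: children go to the subdomain
    // if the running task created one, otherwise to its own domain.
    void enqueue(std::function<void()> fn, std::uint64_t timestamp) {
        VT vt = current;
        if (vt.empty() || has_subdomain) {
            vt.push_back(timestamp);         // one nesting level deeper
        } else {
            assert(timestamp >= vt.back());  // children run after the parent
            vt.back() = timestamp;           // same domain as the parent
        }
        pending.emplace(std::move(vt), std::move(fn));
    }

    // Runs tasks one at a time in lexicographic VT order.
    void run() {
        while (!pending.empty()) {
            auto it = pending.begin();
            current = it->first;
            std::function<void()> fn = std::move(it->second);
            pending.erase(it);
            has_subdomain = false;
            fn();
        }
        current.clear();
        has_subdomain = false;
    }
};

}  // namespace toy_fractal

// Two transactions, each a root-domain task that creates a subdomain and
// enqueues its operations there. Returns the order in which tasks ran.
std::vector<std::string> run_transactions() {
    toy_fractal::Runtime rt;
    std::vector<std::string> log;
    auto op = [&log](std::string name) {
        return [&log, name] { log.push_back(name); };
    };
    auto txn = [&rt, &log, op](std::string name, std::vector<std::string> ops) {
        return [&rt, &log, op, name, ops] {
            log.push_back(name);
            rt.create_subdomain();
            std::uint64_t ts = 1;
            for (const auto& o : ops) rt.enqueue(op(o), ts++);
        };
    };
    rt.enqueue(txn("T1", {"qry X", "qry Z", "upd Z", "qry U", "upd V"}), 1);
    rt.enqueue(txn("T2", {"qry A", "qry B", "upd C", "upd Z", "upd K"}), 2);
    rt.run();
    return log;
}
```

In this sketch every operation of the first transaction runs before the second transaction starts, so each transaction trivially appears atomic. A real implementation would run the tasks speculatively in parallel and only make them appear to execute in this order.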

Example: Database transactions in Fractal

[Figure: the root domain holds tasks T1 (TXN1) and T2 (TXN2). TXN1's subdomain holds qry X, qry Z, upd Z, qry U, upd V with timestamps 1 to 5; TXN2's subdomain holds qry A, qry B, upd C, upd Z, upd K with timestamps 1 to 5.]

Fractal Implementation

Atomicity through ordering

Fractal Virtual Time (VT)

Fractal assigns a fractal virtual time (VT) to each task

Captures the ordering of tasks across domains, within a domain

Fractal VT = 45 23 108 … 9, where each element is a domain VT

Example: Database transactions in Fractal

[Figure: the two transactions labeled with fractal VTs. In the root domain, T1 has VT 1 and T2 has VT 2. TXN1's tasks qry X, qry Z, upd Z, qry U, upd V have VTs 1 1 through 1 5; TXN2's tasks qry A, qry B, upd C, upd Z, upd K have VTs 2 1 through 2 5.]

Fractal VT captures all ordering information
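One way to read these labels is as sequences of domain VTs compared lexicographically, so that a creator (a prefix) is ordered before every task in its subdomain. The sketch below works under that assumption. The names `FractalVT`, `ordered_before`, `in_atomic_unit`, and `sorted_order` are hypothetical, and the tiebreakers that real domain VTs may carry are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a fractal VT is the list of domain VTs from the root
// domain down to the task's own domain.
using FractalVT = std::vector<std::uint32_t>;

// True if task `a` is ordered before task `b`. std::vector's operator< is
// lexicographic, and a proper prefix compares less than any extension of it,
// so a creator precedes the tasks of its subdomain.
bool ordered_before(const FractalVT& a, const FractalVT& b) {
    assert(!a.empty() && !b.empty());
    return a < b;
}

// True if `task` is `creator` itself or lies in a domain nested under it,
// i.e. `creator` is a prefix of `task`.
bool in_atomic_unit(const FractalVT& creator, const FractalVT& task) {
    return creator.size() <= task.size() &&
           std::equal(creator.begin(), creator.end(), task.begin());
}

// Returns the tasks in the order they must appear to execute.
std::vector<FractalVT> sorted_order(std::vector<FractalVT> tasks) {
    std::sort(tasks.begin(), tasks.end());
    return tasks;
}
```

With the VTs from the example, T1 (1) and all of 1 1 through 1 5 sort before T2 (2) and 2 1 through 2 5, so no task of TXN2 can appear between two tasks of TXN1.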

Swarm [MICRO’15]: An efficient substrate for ordered speculation

Swarm executes tasks speculatively and out of order
◦ Large hardware task queues
◦ Scalable ordered commits
◦ Scalable ordered speculation

Efficiently supports tiny speculative tasks

[Figure: 64-tile, 256-core chip with Mem/IO on its edges. Tile organization: four cores, each with L1I/D caches, plus an L2, an L3 slice, a router, and a task unit.]

Fractal features

Fractal VT construction requires no centralized structures

Fractal VT assigns order dynamically

Hardware supports a small number of concurrent depths
◦ “Zooming” operations allow for unbounded nesting
◦ Spill tasks from shallower domains to memory
◦ Parallelism compounds quickly with depth

See the paper for more details!

[Figure (animated build): timeline of TXN1 and TXN2 executing on Cores 1–4. TXN1: query X, query Z, update Z, query U, update V. TXN2: query A, query B, update C, update Z, update K. Tasks are labeled with their fractal VTs: T1 (1), qry X (1 1), qry Z (1 2), upd Z (1 3), qry U (1 4), upd V (1 5), T2 (2), qry A (2 1), qry B (2 2), upd C (2 3), upd Z (2 4), upd K (2 5). One task is marked "Abort task".]

Tracking, conflict detection at level of fine-grain tasks

Selective aborts waste less work

[Figure (animated build): the TXN1 (= X) and TXN2 (= Y) timeline on Cores 1–4, continued]

Task-level tracking

Task-level conflict detection (CD)

Selective aborts

Commit parent before child completes

Fractal unlocks the benefits of fine-grain parallelism
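The idea of task-level conflict detection with selective aborts can be sketched in software as follows. This is a generic illustration, not the hardware mechanism Swarm or Fractal uses: the `Access` struct and `tasks_to_abort` are hypothetical names, and aborts of dependent tasks are not modeled. The rule assumed here is that when a task accesses a location that a later-ordered task has already accessed, and at least one of the two accesses is a write, only the later-ordered task is aborted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

using FractalVT = std::vector<std::uint32_t>;  // hypothetical VT model

struct Access {
    FractalVT task;    // VT of the task performing the access
    std::string addr;  // accessed location (a name stands in for an address)
    bool is_write;
};

// Scans accesses in the order they actually executed. A task is marked for
// abort if a task ordered before it later performs a conflicting access to
// the same location. Tasks that touched other locations are left alone.
std::set<FractalVT> tasks_to_abort(const std::vector<Access>& trace) {
    std::set<FractalVT> aborted;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            const Access& now = trace[i];
            const Access& before = trace[j];
            const bool same_location = now.addr == before.addr;
            const bool one_writes = now.is_write || before.is_write;
            const bool out_of_order = now.task < before.task;
            if (same_location && one_writes && out_of_order) {
                aborted.insert(before.task);
            }
        }
    }
    return aborted;
}
```

For example, if upd Z (2 4) and upd K (2 5) have already run when upd Z (1 3) executes, only upd Z (2 4) is marked for abort. A scheme that tracked whole transactions would instead discard all of TXN2's work.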

Evaluation


Methodology

Event-driven, Pin-based simulator

Target system: 256-core, 64-tile chip
◦ 64 MB shared L3 (1 MB/tile)
◦ 256 KB per-tile L2s
◦ 16 KB per-core L1s
◦ 16K task queue entries (64/core)
◦ 4K commit queue entries (16/core)
◦ In-order, single-issue, scoreboarded cores

[Figure: tile organization with four cores, L1I/D caches, L2, L3 slice, router, and task unit; Mem/IO on the chip edges]

Scalability experiments from 1–256 cores
◦ Scaled-down systems have fewer tiles

Applications
◦ Unordered (STAMP): labyrinth, bayes
◦ Ordered: color, msf, silo, maxflow, mis
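The per-tile and per-core figures above are consistent with the chip-wide totals, as the following small check shows. `ChipConfig`, `kTarget`, and the helper functions are hypothetical names. `with_tiles` additionally assumes that scaled-down systems keep per-tile and per-core resources unchanged, which the slide does not state.

```cpp
#include <cassert>

struct ChipConfig {
    int tiles;
    int cores_per_tile;
    int l3_mb_per_tile;
    int task_queue_entries_per_core;
    int commit_queue_entries_per_core;
};

// Target system from the slide: 64 tiles of 4 cores each.
constexpr ChipConfig kTarget{64, 4, 1, 64, 16};

constexpr int total_cores(const ChipConfig& c) {
    return c.tiles * c.cores_per_tile;
}
constexpr int total_l3_mb(const ChipConfig& c) {
    return c.tiles * c.l3_mb_per_tile;
}
constexpr int total_task_queue_entries(const ChipConfig& c) {
    return total_cores(c) * c.task_queue_entries_per_core;
}
constexpr int total_commit_queue_entries(const ChipConfig& c) {
    return total_cores(c) * c.commit_queue_entries_per_core;
}

static_assert(total_cores(kTarget) == 256, "256 cores");
static_assert(total_l3_mb(kTarget) == 64, "64 MB shared L3");
static_assert(total_task_queue_entries(kTarget) == 16 * 1024, "16K entries");
static_assert(total_commit_queue_entries(kTarget) == 4 * 1024, "4K entries");

// Assumption: a scaled-down system only reduces the tile count.
ChipConfig with_tiles(ChipConfig c, int tiles) {
    assert(tiles > 0 && tiles <= c.tiles);
    c.tiles = tiles;
    return c;
}
```

Under that assumption, `with_tiles(kTarget, 16)` describes a 64-core system with 16 MB of L3 and 4K task queue entries.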

Fractal uncovers abundant nested parallelism

Flat: large atomic tasks

Fractal: nested parallelism exposed through fine-grained tasks

Fractal uncovers abundant nested parallelism

[Figure: speedup vs. core count (1c to 256c) for maxflow, bayes, and labyrinth, comparing Flat and Fractal; maxflow is annotated with 322x]

Speedup: Flat 1x to 4.9x, Fractal 88x to 322x

Average task length (cycles)

          maxflow   bayes   labyrinth
Flat      3260      1.8M    16M
Fractal   373       3590    220

Fractal avoids over-serialization

[Figure: speedup vs. core count (1c to 256c) for mis, color, and msf, comparing Flat, Swarm, and Fractal; mis is annotated with 145x]

Speedup: Flat 26x to 98x, Swarm 21x to 119x, Fractal 40x to 145x

Average task length (cycles)

          mis   color   msf
Flat      162   633     113
Fractal   115   96      49

Conclusion

Speculative systems must extract nested parallelism in order to scale large, complex, real-world applications

Fractal: An execution model for fine-grain nested speculative parallelism
◦ Decouple atomicity from parallelism
◦ Guarantee atomicity by ordering tasks

Fractal unlocks the benefits of fine-grain speculative parallelism
◦ Parallelizes many challenging workloads
◦ Enables composition of speculative parallel algorithms

Thank You! Questions?