Post on 21-Jan-2016

Tongping Liu, Charlie Curtsinger, Emery Berger

DTHREADS: Efficient Deterministic Multithreading

Insanity: Doing the same thing over and over again and expecting different results.

2

In the Beginning…

3

There was the Core.

4

And it was Good.

5

It gave us our Daily Speed.

6

Until the Apocalypse.

7

And the Speed was no Moore.

8

And then came a False Prophet…

9

10

Want speed?

11

I BRING YOU THE GIFT OF PARALLELISM!

12

color = ; row = 0; // globals
void nextStripe(){
  for (c = 0; c < Width; c++)
    drawBox (c,row,color);
  color = (color == )? : ;
  row++;
}
for (n = 0; n < 9; n++)
  pthread_create(t[n], nextStripe);
for (n = 0; n < 9; n++)
  pthread_join(t[n]);

JUST USE THREADS…
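The slide's code is abbreviated: the color values were drawn as swatches and the pthread calls omit arguments. A minimal compilable sketch of the same idea follows. The two colors, the grid size, and a drawBox that records into an in-memory grid are assumptions for illustration, not part of the talk. The globals color and row are left unsynchronized on purpose, so the threaded result depends on scheduling, which is the problem the following slides describe.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Assumptions, not from the slide: the two colors, the grid size, and a
   drawBox that records into an in-memory grid instead of drawing. */
enum { RED = 'R', WHITE = 'W' };
enum { WIDTH = 8, STRIPES = 9 };

char grid[STRIPES][WIDTH];
int color = RED;  /* shared global, deliberately unsynchronized */
int row = 0;      /* shared global, deliberately unsynchronized */

void drawBox(int c, int r, int col) {
    assert(c >= 0 && c < WIDTH);
    if (r >= 0 && r < STRIPES)      /* guard against an unexpected row */
        grid[r][c] = (char)col;
}

/* Same body as the slide, with the signature pthread_create requires. */
void *nextStripe(void *unused) {
    (void)unused;
    for (int c = 0; c < WIDTH; c++)
        drawBox(c, row, color);
    color = (color == RED) ? WHITE : RED;
    row++;
    return NULL;
}

void reset(void) {
    memset(grid, 0, sizeof grid);
    color = RED;
    row = 0;
}

/* Serial reference: always produces nine alternating stripes. */
void drawSerial(void) {
    reset();
    for (int n = 0; n < STRIPES; n++)
        nextStripe(NULL);
}

/* Threaded version: racy reads and writes of color and row mean the
   stripes may repeat, overlap, or go missing from run to run. */
int drawThreaded(void) {
    pthread_t t[STRIPES];
    int started = 0;
    reset();
    while (started < STRIPES &&
           pthread_create(&t[started], NULL, nextStripe, NULL) == 0)
        started++;
    for (int n = 0; n < started; n++)
        pthread_join(t[n], NULL);
    return started;   /* number of threads that actually ran */
}
```

Comparing the grid after drawSerial() with the grid after repeated drawThreaded() calls is one way to observe the nondeterminism; on a lightly loaded machine the threaded run may still look correct most of the time.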

13

14

15

16

17

18

pthreads

race conditions

atomicity violations

deadlock

order violations

19

Salvation?

20

21

pthreads

race conditions

atomicity violations

deadlock

order violations

DTHREADS

deterministic

race conditions

atomicity violations

deadlock

order violations

22

DTHREADS Enables…

Race-free Executions

Replay Debugging w/o Logging

Replicated State Machines

23

DTHREADS: Efficient Determinism

[Bar chart "Overhead with CoreDet": runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Off-scale bar labels: 8.4 and 7.8.]

Usually faster than the state of the art

24

DTHREADS: Efficient Determinism

[Bar chart "Overhead with CoreDet", repeated from the previous slide: runtime relative to pthreads (y-axis, 0 to 6) for CoreDet, dthreads, and pthreads across the PHOENIX and PARSEC benchmarks, with hmean. Off-scale bar labels: 8.4 and 7.8.]

Generally as fast or faster than pthreads

25

DTHREADS: Easy to Use

% g++ myprog.cpp -ldthread (replacing -lpthread)

26

Isolation

shared address space disjoint address spaces
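The contrast between a shared address space and disjoint address spaces can be seen with plain POSIX calls. The sketch below is only an illustration of that difference, not DTHREADS code; the variable and function names are hypothetical. A write made by a thread is visible to its creator, while the same write made by a forked child process stays private to the child.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_value = 0;   /* one global, observed from the creating context */

void *thread_body(void *unused) {
    (void)unused;
    shared_value = 42;  /* same address space: the creator sees this */
    return NULL;
}

/* Returns what the creator sees after a thread wrote 42. */
int write_from_thread(void) {
    pthread_t t;
    shared_value = 0;
    int rc = pthread_create(&t, NULL, thread_body, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return shared_value;
}

/* Returns what the parent sees after a child process wrote 42. */
int write_from_process(void) {
    shared_value = 0;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        shared_value = 42;  /* copy-on-write: only the child's copy changes */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return shared_value;
}
```

write_from_thread() returns 42 and write_from_process() returns 0, which is the isolation property the slide refers to: updates from a process reach others only if something explicitly publishes them.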

27

Performance: Processes vs. Threads

[Line chart: normalized execution time (y-axis, 0.0 to 1.4) against thread execution time in ms (x-axis, 1 to 1024, doubling at each step), for threads and processes.]

28

29

30

“Shared Memory”

31

Snapshot pages before modifications

“Shared Memory”

32

Write back diffs

“Shared Memory”
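The snapshot-then-diff idea on these slides can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the DTHREADS implementation: the page is a tiny hypothetical 16-byte buffer, and the helper names make_twin and commit_diffs are made up for this example. A private copy is snapshotted (the "twin") before it is modified, and at commit time only the bytes that differ from the twin are written back to the shared copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PAGE_BYTES = 16 };   /* hypothetical tiny page, for illustration */

/* Snapshot the private page before its first modification. */
void make_twin(unsigned char *twin, const unsigned char *local) {
    assert(twin && local);
    memcpy(twin, local, PAGE_BYTES);
}

/* Write back only the bytes that changed relative to the twin.
   Returns the number of bytes written to the shared page. */
size_t commit_diffs(unsigned char *shared,
                    const unsigned char *local,
                    const unsigned char *twin) {
    assert(shared && local && twin);
    size_t written = 0;
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        if (local[i] != twin[i]) {
            shared[i] = local[i];
            written++;
        }
    }
    return written;
}
```

Because only changed bytes are written back, two writers that touched different bytes of the same page can both commit without overwriting each other's updates.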

33

“Thread” 1

“Thread” 2

“Thread” 3

Parallel | Serial | Parallel

Update in Deterministic Time & Order

mutex_lock

cond_wait

pthread_create
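One way to picture "update in deterministic order" is a token that is handed from one thread to the next, so that commits happen in a fixed sequence no matter how the threads are scheduled. The sketch below is an assumption-laden simplification built on ordinary pthreads primitives, not the DTHREADS mechanism itself; the names token, commit_log, worker, and run_round are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

enum { NTHREADS = 3 };

pthread_mutex_t token_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t token_moved = PTHREAD_COND_INITIALIZER;
int token = 0;                 /* id of the thread allowed to commit next */
int commit_log[NTHREADS];      /* order in which commits happened */
int commits = 0;

void *worker(void *arg) {
    int id = *(int *)arg;

    /* Parallel phase: private work would happen here, in any order. */

    /* Serial phase: wait for the token, commit, pass the token on. */
    pthread_mutex_lock(&token_lock);
    while (token != id)
        pthread_cond_wait(&token_moved, &token_lock);
    commit_log[commits++] = id;
    token++;
    pthread_cond_broadcast(&token_moved);
    pthread_mutex_unlock(&token_lock);
    return NULL;
}

/* Runs one round and returns the number of commits recorded. */
int run_round(void) {
    pthread_t t[NTHREADS];
    int ids[NTHREADS];
    token = 0;
    commits = 0;
    /* Threads are started in reverse to show start order does not matter. */
    for (int i = NTHREADS - 1; i >= 0; i--) {
        ids[i] = i;
        int rc = pthread_create(&t[i], NULL, worker, &ids[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return commits;
}
```

After every call to run_round(), commit_log holds 0, 1, 2 in that order, even though the threads were created last-to-first and may run in any interleaving.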

34

DTHREADS performance analysis

[Bar chart: runtime relative to pthreads (y-axis, 0 to 4) for dthreads and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]

35

Thread 1

Main Memory

Core 1

Thread 2

Core 2

Invalidate

The Culprit: False Sharing

36

Thread 1 Thread 2

Invalidate

Main Memory

Core 1 Core 2

The Culprit: False Sharing

20x
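False sharing arises when two threads update different variables that happen to sit on the same cache line, so each write invalidates the other core's copy. The sketch below shows the layout difference only; it makes no timing claim. The 64-byte line size is an assumption (common on x86-64, not universal), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { CACHE_LINE = 64 };   /* assumed cache line size in bytes */
enum { ITERATIONS = 100000 };

/* a and b are neighbors, so they normally share one cache line. */
struct adjacent {
    volatile long a;
    volatile long b;
};

/* Padding pushes b a full line away from a, so they never share one. */
struct padded {
    volatile long a;
    char pad[CACHE_LINE - sizeof(long)];
    volatile long b;
};

/* Each thread increments only its own counter. */
void *bump(void *p) {
    volatile long *counter = p;
    for (long i = 0; i < ITERATIONS; i++)
        (*counter)++;
    return NULL;
}

/* Runs two threads, one per counter, and returns the combined total. */
long run_pair(volatile long *x, volatile long *y) {
    pthread_t t1, t2;
    *x = 0;
    *y = 0;
    int rc1 = pthread_create(&t1, NULL, bump, (void *)x);
    int rc2 = pthread_create(&t2, NULL, bump, (void *)y);
    assert(rc1 == 0 && rc2 == 0);
    (void)rc1;
    (void)rc2;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return *x + *y;
}
```

Both layouts compute the same totals, since the threads never touch the same variable. Timing run_pair on the adjacent layout against the padded one is the usual way to see the cost of the invalidation traffic; the size of the gap depends on the hardware.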

37

Process 1 Process 2

Global State

Core 1 Core 2

Process 2

Process 1

DTHREADS: Eliminates False Sharing!

38

DTHREADS: Detailed Analysis

[Bar chart: runtime relative to pthreads (y-axis, 0 to 6) for three configurations: ordering only, isolation only, and dthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]

39

40

41

DTHREADS: Scalable Determinism

[Bar chart "Scalability": speedup of 8 cores over 2 cores (y-axis, 0 to 4) for CoreDet, dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, dedup, ferret, streamcluster, swaptions), and hmean.]

42

43

44

DTHREADS

% g++ myprog.cpp -ldthread (replacing -lpthread)

45

End

46

A a

47

DTHREADS: Without Outliers

[Bar chart "Excluding Outliers": runtime relative to pthreads (y-axis, 0 to 6) for dthreads and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean w/o outliers.]

Just 5% slower than pthreads

48

Commit Protocol

Time

Twin Page

Diff

Global State

Local State

49

a 0

b 0

a 1

b 1

DTHREADS Example Execution

a 0

b 0

a 0

b 0

a 0

b 0

if(a == 0) b = 1;

if(b == 0) a = 1;

Global State

Committed State

a 1

b 1
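The example execution can be traced by hand with a small simulation. This is a sketch of the behavior the slide shows, assuming each "thread" works on a private copy of the global state and commits only the fields it changed; the struct and function names are hypothetical.

```c
#include <assert.h>

struct state { int a; int b; };

/* The two code fragments from the slide. */
void thread1(struct state *s) { if (s->a == 0) s->b = 1; }
void thread2(struct state *s) { if (s->b == 0) s->a = 1; }

/* Commit only the fields that differ from the snapshot (the twin). */
void commit(struct state *global, const struct state *local,
            const struct state *twin) {
    assert(global && local && twin);
    if (local->a != twin->a) global->a = local->a;
    if (local->b != twin->b) global->b = local->b;
}

/* Both threads start from the same global state and run in isolation. */
struct state run_isolated(void) {
    struct state global = {0, 0};
    struct state twin = global;     /* snapshot before any modification */
    struct state local1 = global;   /* private copy for "thread" 1 */
    struct state local2 = global;   /* private copy for "thread" 2 */

    thread1(&local1);               /* sees a == 0, sets its b to 1 */
    thread2(&local2);               /* sees b == 0, sets its a to 1 */

    commit(&global, &local1, &twin);
    commit(&global, &local2, &twin);
    return global;
}
```

run_isolated() ends with a = 1 and b = 1, matching the committed state on the slide. Neither thread can observe the other's write before its own check, so the outcome does not depend on which thread happens to run first.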

50

No Problem

a 0

b 0

if(a == 0) b = 1;

if(b == 0) a = 1;

a 1

b 1

51

That’s Better.

a 0

b 0

lock(); if (a == 0) b = 1; unlock();

lock(); if (b == 0) a = 1; unlock();

b 1

52

a 0

b 0

a 1

lock(); if (a == 0) b = 1; unlock();

lock(); if (b == 0) a = 1; unlock();

Or is it?
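With the lock in place the two critical sections no longer overlap, but the result still depends on which thread acquires the lock first. The sketch below replays both acquisition orders sequentially to make the two outcomes explicit; the names are hypothetical and the lock is implied by running one section to completion before the other.

```c
#include <assert.h>

struct state { int a; int b; };

/* The two critical sections from the slide. */
void section1(struct state *s) { if (s->a == 0) s->b = 1; }
void section2(struct state *s) { if (s->b == 0) s->a = 1; }

/* first == 1: thread 1 wins the lock; first == 2: thread 2 wins it. */
struct state run_locked(int first) {
    assert(first == 1 || first == 2);
    struct state s = {0, 0};
    if (first == 1) {
        section1(&s);   /* a == 0, so b becomes 1 */
        section2(&s);   /* b is now 1, so a stays 0 */
    } else {
        section2(&s);   /* b == 0, so a becomes 1 */
        section1(&s);   /* a is now 1, so b stays 0 */
    }
    return s;
}
```

run_locked(1) yields a = 0, b = 1 and run_locked(2) yields a = 1, b = 0. Locking removes the data race but not the nondeterminism, which is why the order of lock acquisition also has to be fixed.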

53

Determinism

A a

aA

A a

Is this enough?

54

bBA a

C

Robust Determinism

55

External Nondeterminism

?

socket = open_socket(80);
listen(socket);

56

http://www.gnu.org/s/pth/

57

Overhead

[Bar chart: runtime relative to pthreads (y-axis, 0 to 4) for dthreads and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean.]

58

Wrap-Up

A a Determinism

Robust Determinism

Internal Determinism?

59

Wrap-Up

Threads to Processes

Commit Before Synch.

Commit In Token Order

60

Performance: DTHREADS & CoreDet vs. pthreads

[Bar chart "Overhead with CoreDet": runtime relative to pthreads (y-axis, 0 to 6) for CoreDet [ASPLOS 10], dthreads, and pthreads. Benchmarks: PHOENIX (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, canneal, dedup, ferret, streamcluster, swaptions), and hmean. Off-scale bar labels: 8.4 and 7.8.]

61

How DTHREADS Provides Determinism

Isolation

Deterministic Time

Deterministic Order

62

Evaluation

Phoenix: http://mapreduce.stanford.edu

PARSEC: http://parsec.cs.princeton.edu
