Goldilocks: Efficiently Computing the Happens-Before Relation Using Locksets


Tayfun Elmas¹, Shaz Qadeer², Serdar Tasiran¹

¹ Koç University, İstanbul, Turkey

² Microsoft Research, Redmond, WA

FATES/RV’06, August 15-16, Seattle, WA

Page 2: Our goal

• Continuous runtime monitoring of concurrent Java programs

  – Target: Race conditions

  – Criteria

    • Efficiency: Tolerable impact on performance

    • Precision: Prevent false alarms

• The Java Memory Model (JMM) [Manson et al., POPL’05]

  – “Two accesses form a data race in an execution of a program if

    • they conflict,

    • they are from different threads and

    • they are not ordered by happens-before (H-B).”

• Exact H-B computation ⇒ precise race detection

Page 3: Existing dynamic approaches

• Vector-clock algorithms [Mattern, 1989]

  – Vector clock: For each thread and variable, a vector of logical clocks

    • Vector has size T = #threads

  – Vector updated at each synchronization operation

• Precise but inefficient in some cases

  – O(T) computation at each synchronization operation

  – Other algorithms use cheaper checks for well-protected variables

    • Thread-local variables, variables protected by single locks

• Lockset algorithms [Savage et al., 1997]

  – Lockset: A set of locks protecting access to variable d

  – Lockset update rules specific to a synchronization discipline

• Efficient, intuitive, but imprecise

  – False alarms: Synchronization discipline violated but no race occurred

  – Additional mechanisms to reduce false alarms

    • State machines for object initialization, escape, thread-locality
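
The O(T) cost attributed to vector clocks above can be seen in a minimal Python sketch. The helper names (`vc_new`, `vc_tick`, `vc_join`, `vc_leq`) are illustrative and do not come from the slides; each clock is assumed to be a plain list with one entry per thread.

```python
from typing import List


def vc_new(num_threads: int) -> List[int]:
    """A fresh vector clock: one logical clock per thread, all zero."""
    return [0] * num_threads


def vc_tick(clock: List[int], tid: int) -> None:
    """Advance thread tid's own component."""
    clock[tid] += 1


def vc_join(a: List[int], b: List[int]) -> List[int]:
    """Component-wise maximum; visits all T entries, hence O(T)."""
    return [max(x, y) for x, y in zip(a, b)]


def vc_leq(a: List[int], b: List[int]) -> bool:
    """a is ordered before (or equal to) b iff every component of a is <= b's."""
    return all(x <= y for x, y in zip(a, b))


# Two threads: T0 writes a variable and releases a lock that T1 then acquires.
t0, t1 = vc_new(2), vc_new(2)
vc_tick(t0, 0)                # T0's write
write_clock = list(t0)        # clock remembered for the written variable
lock_clock = list(t0)         # release: the lock remembers T0's clock
t1 = vc_join(t1, lock_clock)  # acquire: T1 absorbs the lock's clock
vc_tick(t1, 1)                # T1's access
ordered = vc_leq(write_clock, t1)  # is the write ordered before T1's access?
```

Both `vc_join` and `vc_leq` walk the whole vector, which is the per-synchronization cost the slide refers to.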

Page 4: Our work

• The Goldilocks algorithm

  – Novel lockset-based method that precisely computes H-B

    • As efficient as other lockset algorithms

    • As precise as vector-clocks

    • Uniformly captures all synchronization disciplines

    • Our locksets contain locks, volatile variables, thread ids

• Theorem: When thread t accesses variable d, there is no race iff the lockset of d at that point contains t

• Sound: Detects all apparent races that occur in execution

• Precise: Race reported ⇒ Two accesses not ordered by H-B

  • No false alarms

  • No alarms about potential races in similar executions

Page 5: Outline

• The Goldilocks algorithm

• Implementation

• Evaluation

• Conclusions

Page 6: Example

class IntBox {
    int x;
}

Global variables
    a, b: IntBox
    o1.x, o2.x: int

T1:
    a := IntBox()
    b := IntBox()
    acquire(L1)
    a.x ++
    release(L1)

T2:
    acquire(L1)
    acquire(L2)
    tmp := a
    a := b
    b := tmp
    release(L1)
    release(L2)

T3:
    acquire(L2)
    b.x ++
    release(L2)

Diagram: before T2's swap, a refers to o1 and b refers to o2; after the swap, a refers to o2 and b refers to o1. Locks L1 and L2 appear in both states.

Page 7: Eraser

T1:
    a := IntBox()
    b := IntBox()       LS(o1.x) = {all locks}
    acquire(L1)
    a.x ++              LS(o1.x) = {all locks} ∩ {L1} = {L1}
                        check whether LS(o1.x) ∩ LH(T1) = ∅
    release(L1)

T2:
    acquire(L1)
    acquire(L2)
    tmp := a
    a := b
    b := tmp            No access to o1.x, LS(o1.x) not modified
    release(L1)
    release(L2)

T3:
    acquire(L2)
    b.x ++              LS(o1.x) = {L1} ∩ {L2} = ∅
                        check whether LS(o1.x) ∩ LH(T3) = ∅
                        Race reported!
    release(L2)
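
The false alarm above can be reproduced with a minimal Python sketch of the basic Eraser lockset refinement. This is a simplification under stated assumptions: the class `EraserState` and its method names are hypothetical, and the state machines that Eraser adds for initialization and thread-locality are left out.

```python
from typing import Dict, Optional, Set


class EraserState:
    """Basic lockset refinement: LS(d) starts as 'all locks' (None here)."""

    def __init__(self) -> None:
        self.lockset: Dict[str, Optional[Set[str]]] = {}
        self.held: Dict[str, Set[str]] = {}

    def acquire(self, t: str, lock: str) -> None:
        self.held.setdefault(t, set()).add(lock)

    def release(self, t: str, lock: str) -> None:
        self.held.setdefault(t, set()).discard(lock)

    def access(self, t: str, var: str) -> bool:
        """Intersect LS(var) with the locks held by t; True means a race is reported."""
        held = set(self.held.get(t, set()))
        current = self.lockset.get(var)
        refined = held if current is None else current & held
        self.lockset[var] = refined
        return len(refined) == 0


e = EraserState()
reports = []
e.acquire("T1", "L1")
reports.append(e.access("T1", "o1.x"))  # a.x ++ (a refers to o1)
e.release("T1", "L1")
e.acquire("T2", "L1")
e.acquire("T2", "L2")                   # the swap of a and b never touches o1.x
e.release("T2", "L1")
e.release("T2", "L2")
e.acquire("T3", "L2")
reports.append(e.access("T3", "o1.x"))  # b.x ++ (b now refers to o1)
e.release("T3", "L2")
```

The second access is flagged because {L1} ∩ {L2} is empty, even though the two accesses are ordered through T2's critical sections.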

Page 8: The happens-before relation

Happens-before in JMM: →hb

Transitive closure of

• Program orders of threads: →p

• Synchronizes-with: →sw

  • release(l) →sw acquire(l)

  • vol-write(v) →sw vol-read(v)

• fork(t) →hb (action of t)

• (action of t) →hb join(t)

Diagram: the example program with →p edges inside each thread, →sw edges from release(L1) in T1 to acquire(L1) in T2 and from release(L2) in T2 to acquire(L2) in T3, and the resulting →hb edge from a.x ++ (T1) to b.x ++ (T3).

T1:
    a := IntBox()
    b := IntBox()
    acquire(L1)
    a.x ++
    release(L1)

T2:
    acquire(L1)
    acquire(L2)
    tmp := a
    a := b
    b := tmp
    release(L1)
    release(L2)

T3:
    acquire(L2)
    b.x ++
    release(L2)
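
To make the transitive-closure definition concrete, here is a small Python sketch that builds →p and →sw edges over a recorded trace and closes them. The trace encoding and the function `happens_before` are assumptions made for illustration; only lock release/acquire edges are modeled, and the event order is the one implied by the example (T1's critical section, then T2's, then T3's).

```python
from typing import List, Tuple

Event = Tuple[str, str, str]  # (thread, kind, lock or variable)

trace: List[Event] = [
    ("T1", "acquire", "L1"), ("T1", "access", "o1.x"), ("T1", "release", "L1"),
    ("T2", "acquire", "L1"), ("T2", "acquire", "L2"),
    ("T2", "release", "L1"), ("T2", "release", "L2"),
    ("T3", "acquire", "L2"), ("T3", "access", "o1.x"), ("T3", "release", "L2"),
]


def happens_before(events: List[Event]) -> List[List[bool]]:
    """hb[i][j] is True iff event i happens-before event j."""
    n = len(events)
    hb = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ti, ki, xi = events[i]
            tj, kj, xj = events[j]
            if ti == tj:
                hb[i][j] = True  # program order
            elif ki == "release" and kj == "acquire" and xi == xj:
                hb[i][j] = True  # synchronizes-with
    for k in range(n):           # Warshall-style transitive closure
        for i in range(n):
            if hb[i][k]:
                for j in range(n):
                    if hb[k][j]:
                        hb[i][j] = True
    return hb


hb = happens_before(trace)
ordered = hb[1][8]  # T1's access to o1.x versus T3's access to o1.x

# Without T2's critical sections there is no chain from T1 to T3.
no_t2 = [ev for ev in trace if ev[0] != "T2"]
unordered = not happens_before(no_t2)[1][4]
```

The cubic closure is only meant to mirror the definition; it is not how a runtime monitor would compute the relation.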

Page 9: Goldilocks intuition

• LS: (Variables) → ℘(Threads ∪ Locks ∪ Volatiles)

• Update rules maintain invariants:

  1. Thread t ∈ LS(d) ⇒ t is owner of d

     • Accesses to d by t are race-free

  2. Lock l ∈ LS(d) ⇒ acquire l to become owner of d

  3. Volatile v ∈ LS(d) ⇒ read v to become owner of d

• When t accesses d: Race-free iff (t ∈ LS(d))

• After t accesses d: LS(d) = { t }

  – t is the only owner of d

  – Other threads: Must synchronize with t

    • In order to become an owner of d

Page 10: Lockset update rules

• Ownership transfer between threads

  – LS(d) grows through synchronization actions

• release(l) by t
    For each variable d: if (t ∈ LS(d)) → (add l to LS(d))

• acquire(l) by t
    For each variable d: if (l ∈ LS(d)) → (add t to LS(d))

• volatile-write(v) by t
    For each variable d: if (t ∈ LS(d)) → (add v to LS(d))

• volatile-read(v) by t
    For each variable d: if (v ∈ LS(d)) → (add t to LS(d))

• fork(s) by t
    For each variable d: if (t ∈ LS(d)) → (add s to LS(d))

• join(s) by t
    For each variable d: if (s ∈ LS(d)) → (add t to LS(d))
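
The six rules share one shape: if a source element is in LS(d), add a destination element. A minimal eager Python sketch follows; the class name `Goldilocks` and its methods are hypothetical, and it assumes thread ids, lock names and volatile names are distinct strings.

```python
from typing import Dict, Set


class Goldilocks:
    """Eager form of the rules: every synchronization action visits every lockset."""

    def __init__(self) -> None:
        self.ls: Dict[str, Set[str]] = {}  # LS(d) for each variable d

    def _transfer(self, src: str, dst: str) -> None:
        # Common shape of all six rules: if src in LS(d), add dst to LS(d).
        for lockset in self.ls.values():
            if src in lockset:
                lockset.add(dst)

    def release(self, t: str, l: str) -> None:
        self._transfer(t, l)

    def acquire(self, t: str, l: str) -> None:
        self._transfer(l, t)

    def volatile_write(self, t: str, v: str) -> None:
        self._transfer(t, v)

    def volatile_read(self, t: str, v: str) -> None:
        self._transfer(v, t)

    def fork(self, t: str, s: str) -> None:
        self._transfer(t, s)

    def join(self, t: str, s: str) -> None:
        self._transfer(s, t)

    def access(self, t: str, d: str) -> bool:
        """Return True if the access is race-free, then reset LS(d) to {t}."""
        lockset = self.ls.get(d)
        race_free = lockset is None or t in lockset  # a first access is race-free
        self.ls[d] = {t}
        return race_free


g = Goldilocks()
first = g.access("T1", "o1.x")    # a.x ++ by T1
g.release("T1", "L1")
g.acquire("T2", "L1")
g.acquire("T2", "L2")             # L2 is not in LS(o1.x): no change
g.release("T2", "L1")
g.release("T2", "L2")
g.acquire("T3", "L2")
before_t3 = set(g.ls["o1.x"])
second = g.access("T3", "o1.x")   # b.x ++ by T3
g.release("T3", "L2")
```

Replaying the example this way never reports a race on o1.x, in contrast to the Eraser refinement, while an access by a thread that has not synchronized with the current owner would fail the membership test.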

Page 11: Goldilocks

Thread  Action          LS(o1.x) after the action       Rule applied
T1      a := IntBox()
T1      b := IntBox()   ∅
T1      acquire(L1)
T1      a.x ++          {T1}                            First access
T1      release(L1)     {T1, L1}                        (T1 ∈ LS) → (add L1 to LS)
T2      acquire(L1)     {T1, L1, T2}                    (L1 ∈ LS) → (add T2 to LS)
T2      acquire(L2)     {T1, L1, T2}                    (L2 ∉ LS): rule does not apply, LS unchanged
T2      tmp := a
T2      a := b
T2      b := tmp
T2      release(L1)     {T1, L1, T2}                    (T2 ∈ LS) → (add L1 to LS), already present
T2      release(L2)     {T1, L1, T2, L2}                (T2 ∈ LS) → (add L2 to LS)
T3      acquire(L2)     {T1, L1, T2, L2, T3}            (L2 ∈ LS) → (add T3 to LS)
T3      b.x ++          {T3}                            (T3 ∈ LS) → (No race)
T3      release(L2)     {T3, L2}                        (T3 ∈ LS) → (add L2 to LS)

Page 12: Uniform handling of many scenarios

• Dynamically changing locksets

• Permanent/temporary thread-locality

• Container-protected objects

  – Lockset of contained variable changes although variable is not touched

• Synchronization using wait/notify(All)

  – No additional lockset update rules

• Synchronization using volatile variables

  – Conditional branches on volatile variables

• Classes in java.util.concurrent package

  – Semaphores, barriers, ...

Page 13: Outline

• The Goldilocks algorithm

• Implementation

• Evaluation

• Conclusions

Page 14: Implementation

• Naive implementation too inefficient

    acquire(l) by thread t
        For each variable d: if (l ∈ LS(d)) → (add t to LS(d))

Implementation features

• Short-circuit checks before lockset computation

  – Handle thread-locality, unique protecting lock, ...

• Lazy evaluation of locksets

  – Apply update rules only at variable access

  – Keep synchronization actions in a global event list

    • Order of events consistent with →p and →sw

• Implicit, shared representation of locksets

  – Use temporary locksets only at access

Global event list (the diagram also shows variables x and y next to the list)

    T2, vol-write, v
    T1, release, l
    T1, vol-read, v
    T2, acquire, l
    T1, acquire, l
    T2, release, l

Page 15: Implementation in Kaffe

• In the Kaffe Virtual Machine [http://www.kaffe.org]

  – Clean room implementation of JVM in C

  – Full Java platform functionality

• Instrumented byte-code interpreter

  – Functions executing instructions for synchronization, heap access

• Per thread checking

  – Each thread checks its own actions

  – Communication via global event list

  – Applicable to multiprocessors

Handle-Action(Thread t, Action α)
    IF α is a synchronization action
        Add α to the global event list
    ELSE IF α is an access to variable d
        IF all short-circuit checks fail
            Apply-Lockset-Rules(t, d)

Global event list

    T2, vol-write, v
    T1, release, l
    T1, vol-read, v
    T2, acquire, l
    T1, acquire, l
    T2, release, l

Page 16: Short-circuit checks

• Sufficient, constant time checks for H-B

  – If any of them succeed: No race, no need for lockset computation

• Track owner thread

  – For each variable d, keep the last accessor thread

    • owner-thread(d): Current accessor thread

  – Succeeds when d remains thread-local

• Track single unique lock

  – For each variable d, guess a unique protecting lock

    • single-lock(d): Random lock held by current accessor thread

  – Succeeds as long as d is accessed while holding same lock
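
The two checks can be sketched in a few lines of Python. The names `VarInfo` and `short_circuit` are hypothetical, and the "random" lock guess is replaced here by the alphabetically first held lock so that the example is deterministic. A result of False only means the checks were inconclusive and the full lockset computation is needed; it does not mean a race.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class VarInfo:
    owner_thread: Optional[str] = None  # last accessor thread
    single_lock: Optional[str] = None   # guessed unique protecting lock


def short_circuit(info: VarInfo, t: str, held: Set[str]) -> bool:
    """Return True if a constant-time check shows the access is race-free."""
    same_owner = info.owner_thread == t
    same_lock = info.single_lock is not None and info.single_lock in held
    # Update the bookkeeping for the next access.
    info.owner_thread = t
    if info.single_lock not in held:
        # Assumption: pick the first held lock by name instead of a random one.
        info.single_lock = next(iter(sorted(held)), None)
    return same_owner or same_lock


info = VarInfo()
results = [
    short_circuit(info, "T1", {"L1"}),  # first access: inconclusive
    short_circuit(info, "T2", {"L1"}),  # same lock as the previous accessor
    short_circuit(info, "T2", set()),   # same thread as the previous accessor
    short_circuit(info, "T3", {"L2"}),  # neither check applies: inconclusive
]
```

The single-lock check is sound because the previous accessor held the guessed lock at its access, so it must have released that lock before the current thread could acquire it.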

Page 17: Lazy evaluation of locksets

Program (the running example):

T1: a := IntBox(); b := IntBox(); acquire(L1); a.x ++; release(L1)
T2: acquire(L1); acquire(L2); tmp := a; a := b; b := tmp; release(L1); release(L2)
T3: acquire(L2); b.x ++; release(L2)

Global event list when T1 executes a.x ++ (o1.x is shown pointing into the list):

    T1, alloc, o1
    T1, alloc, o2
    T1, acquire, L1

Global event list when T3 executes b.x ++:

    T1, alloc, o1
    T1, alloc, o2
    T1, acquire, L1
    T1, release, L1
    T2, acquire, L1
    T2, acquire, L2
    T2, release, L1
    T2, release, L2
    T3, acquire, L2

At T3's access to o1.x:

    Initialize LS(o1.x) = { T1 }
    Repeat
        Apply lockset rules on LS(o1.x)
    Until last synchronization action by T3
    Check whether T3 ∈ LS(o1.x)

Later, T3, release, L2 is appended to the list.

Garbage collect unreferenced events
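
A minimal Python sketch of the lazy scheme, under stated assumptions: the class `LazyGoldilocks` is hypothetical, alloc, fork and join events are left out, and each variable simply remembers its last accessor together with the position in the event list at that access.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, str]  # (thread, kind, lock or volatile)


class LazyGoldilocks:
    def __init__(self) -> None:
        self.events: List[Event] = []             # global event list
        self.last: Dict[str, Tuple[str, int]] = {}  # d -> (owner, list position)

    def sync(self, t: str, kind: str, x: str) -> None:
        """Synchronization actions are only recorded, not applied."""
        self.events.append((t, kind, x))

    def access(self, t: str, d: str) -> bool:
        """Replay the events since the last access to d on a temporary lockset."""
        race_free = True
        if d in self.last:
            owner, pos = self.last[d]
            ls = {owner}
            for u, kind, x in self.events[pos:]:
                if kind in ("release", "vol-write") and u in ls:
                    ls.add(x)
                elif kind in ("acquire", "vol-read") and x in ls:
                    ls.add(u)
            race_free = t in ls
        self.last[d] = (t, len(self.events))
        return race_free

    def collect(self) -> int:
        """Drop the list prefix that no variable refers to; return how many events went."""
        keep_from = min((p for _, p in self.last.values()), default=len(self.events))
        del self.events[:keep_from]
        self.last = {d: (o, p - keep_from) for d, (o, p) in self.last.items()}
        return keep_from


lazy = LazyGoldilocks()
lazy.sync("T1", "acquire", "L1")
first = lazy.access("T1", "o1.x")
lazy.sync("T1", "release", "L1")
lazy.sync("T2", "acquire", "L1")
lazy.sync("T2", "acquire", "L2")
lazy.sync("T2", "release", "L1")
lazy.sync("T2", "release", "L2")
lazy.sync("T3", "acquire", "L2")
second = lazy.access("T3", "o1.x")
dropped = lazy.collect()
lazy.sync("T3", "release", "L2")
```

Synchronization actions become constant-time appends, and the replay cost is paid only by accesses that the short-circuit checks could not settle.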

Page 18: Outline

• The Goldilocks algorithm

• Implementation

• Evaluation

• Conclusions

Page 19: Evaluation

• Algorithms evaluated

  – Goldilocks

  – Eraser with state machines

  – Vector-clocks

Benchmarks

• Microbenchmarks: Interesting, artificial programs

  – Multiset: Well-protected insertions, deletions, lookups of integers

  – SharedSpot: Contains variables each protected by a unique lock

  – LocalSpot: Contains thread-local variables

• Larger programs for performance comparison

  – Raja, SciMark, Grande

Page 20: Microbenchmarks

Interesting cases: Thread-locality, variables protected by single unique locks

• Short-circuit checks work

• Per-access cost increases very slowly with # of threads

[Figure: three plots, SHAREDSPOT, MULTISET and LOCALSPOT, each showing per-access checking time against number of threads for Eraser, Vector-clock and Goldilocks. Thread counts range from 1 to 256 for SHAREDSPOT and LOCALSPOT and from 4 to 35 for MULTISET.]

Page 21: Large benchmarks

• Goldilocks much faster than vector clocks

• Performance comparable to Eraser

• Precision comes at little or no extra cost

                                      Eraser                Vector-clock          Goldilocks
Benchmark    #Thr  Uninstrumented     Runtime    Slowdown   Runtime    Slowdown   Runtime    Slowdown
                   runtime (sec.)     (sec.)                (sec.)                (sec.)
Raja           4        8.7            96.9       11.1      114.5       13.2       67.5        7.8
SciMark        3       28.3            49.6        1.8       49.6        1.8       38.6        1.4
moldyn        10       13.0           154.3       11.9      184.6       14.2      105.6        8.1
montecarlo    10        6.5           147.9       22.8      176.3       27.1      128.2       19.7
raytracer     10        1.9            73.1       38.5       87.4       46.0       45.8       24.1
sor            6       21.5           100.0        4.7      135.7        6.3       56.9        2.6

Page 22: Conclusions

• The Goldilocks algorithm: A precise lockset-based characterization of the happens-before relation

  – Sound: Detects all apparent races

  – Precise: No false alarms

  – Efficient: Short-circuit checks + Lazy evaluation

• Handles all synchronization disciplines uniformly

  – Thread-locality, dynamically changing locksets, volatile variable-based synchronization, ...

• Applicable to both model checking & runtime monitoring

• Future work

  – Dynamic & static methods based on Goldilocks

  – Tolerable cost for continuous runtime monitoring

    • Tight integration of static methods and Goldilocks