Pacer: Proportional Detection of Data Races
Michael Bond, Katherine Coons, Kathryn McKinley
University of Texas at Austin
Detecting data races in production
Overhead
FastTrack [Flanagan & Freund ’09]: 80x → 8x slowdown
FastTrack overhead: $c_{\text{reads\&writes}} + c_{\text{sync}}\, n$, where $n$ is the number of threads
  $c_{\text{reads\&writes}}$: problem today
  $c_{\text{sync}}\, n$: problem in future
Pacer overhead: $(c_{\text{reads\&writes}} + c_{\text{sync}}\, n)\, r + c_{\text{non-sampling}}\,(1 - r)$, where $r$ is the sampling rate
  First term: sampling periods
  Second term: non-sampling periods
Probability (detecting any race)
  FastTrack: 1
  Pacer: r
Detect race if first access sampled
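The overhead expressions on this slide can be evaluated directly. The following is a minimal sketch, not the authors' code: the class name `OverheadModel`, the method names, and any constants passed in are hypothetical and chosen only for illustration.

```java
// Hypothetical helper for the overhead model on the slide; inputs are illustrative, not measured.
final class OverheadModel {
    /** FastTrack: per-access cost plus a synchronization cost that grows with thread count n. */
    static double fastTrack(double cReadsWrites, double cSync, int n) {
        return cReadsWrites + cSync * n;
    }

    /** Pacer: FastTrack's cost weighted by r, plus the non-sampling cost weighted by (1 - r). */
    static double pacer(double cReadsWrites, double cSync, double cNonSampling, int n, double r) {
        if (r < 0.0 || r > 1.0) {
            throw new IllegalArgumentException("sampling rate r must be in [0, 1]");
        }
        return fastTrack(cReadsWrites, cSync, n) * r + cNonSampling * (1.0 - r);
    }
}
```

At r = 1 the model reduces to FastTrack's overhead, and at r = 0 only the non-sampling cost remains, which is the sense in which overhead is proportional to the sampling rate.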
[Figure: execution timeline of Thread A and Thread B divided into alternating sampling periods and non-sampling periods, with the accesses write x, read x, read y, and write y]
Insight #1: Stop tracking a variable after a non-sampled access
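One way to picture Insight #1 is a per-variable metadata table that is filled only during sampling periods and cleared by non-sampled accesses. The sketch below is a simplified, single-threaded illustration that tracks writes only; `SampledWriteTracker` and its methods are hypothetical names, and the real analysis also handles reads and runs concurrently with the program.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration (writes only, not thread-safe); names are hypothetical.
final class SampledWriteTracker {
    /** Epoch "clock@thread": the writer's own clock value at the time of the write. */
    private static final class Epoch {
        final int clock;
        final int thread;
        Epoch(int clock, int thread) { this.clock = clock; this.thread = thread; }
    }

    private final Map<String, Epoch> lastWrite = new HashMap<>();
    private boolean sampling;

    void setSampling(boolean sampling) { this.sampling = sampling; }

    /** Returns true if this write is unordered with a previously sampled write. */
    boolean onWrite(String variable, int thread, int[] threadClock) {
        Epoch prev = lastWrite.get(variable);
        // The earlier write happens before this one iff its clock is <= our view of its thread.
        boolean race = prev != null && prev.clock > threadClock[prev.thread];
        if (sampling) {
            lastWrite.put(variable, new Epoch(threadClock[thread], thread)); // start or keep tracking
        } else {
            lastWrite.remove(variable); // non-sampled access: stop tracking this variable
        }
        return race;
    }

    int trackedVariables() { return lastWrite.size(); }
}
```

A non-sampled access is still checked against metadata left by a sampled access, which is what allows a race to be reported when only its first access was sampled. Afterwards the variable costs nothing until it is sampled again.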
Thread A        Thread B
write x
unlock m
                lock m
                write x
read x          Race!
Vector clocks (entries ordered A, B): Thread A = [5, 2], Thread B = [3, 4]

Thread A                                          Thread B
write x    (last write to x: 5@A)
unlock m   (m = [5, 2]; increment clock: A = [6, 2])
                                                  lock m    (join clocks: B = [5, 4])
                                                  write x   (happens before? 5@A vs. [5, 4]: yes)
read x

No work performed at B’s write x: the last write to x stays 5@A. Race uncaught.
With 4@B recorded as the last write to x, A’s read x checks: happens before? 4@B vs. [6, 2]: no. Race!
Insight #2: We only care whether “A happens before B” if A is sampled
Thread A    Thread B
Sampling periods: increment clocks
Non-sampling periods: don’t increment clocks
  Do these events happen before other events? We don’t care!
Vector clocks (entries ordered A, B): Thread A = [5, 2], Thread B = [3, 4]

Thread A                                        Thread B
unlock m1   (m1 = [5, 2]; no clock increment)
…                                               lock m1   (join: B = [5, 4])
unlock m2   (m2 = [5, 2])                       …
                                                lock m2   (join: B = [5, 4]; unnecessary join)

Unnecessary join: O(n) → O(1)
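The m1/m2 example can be replayed in a few lines. This is a sketch under stated assumptions: `PacedClocks`, `join`, and `release` are hypothetical names, vector clocks are plain `int[]` arrays indexed by thread, and a release copies the thread's clock onto the lock as in the slides.

```java
// Hypothetical sketch of release/acquire clock handling; not the Pacer implementation.
final class PacedClocks {
    /** Join: element-wise maximum, O(n) in the number of threads. Returns true if dst changed. */
    static boolean join(int[] dst, int[] src) {
        boolean changed = false;
        for (int i = 0; i < dst.length; i++) {
            if (src[i] > dst[i]) {
                dst[i] = src[i];
                changed = true;
            }
        }
        return changed;
    }

    /** Release: publish a copy of the thread's clock; advance the own entry only while sampling. */
    static int[] release(int[] threadClock, int self, boolean sampling) {
        int[] lockClock = threadClock.clone();
        if (sampling) {
            threadClock[self]++;
        }
        return lockClock;
    }
}
```

With A = [5, 2] and B = [3, 4], two non-sampled releases by A both publish [5, 2]. B's first acquire changes its clock to [5, 4], and the second join changes nothing, which is the redundant O(n) work the slide marks as unnecessary.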
Implementation
http://jikesrvm.org/Research+Archive
Performance
[Figure: slowdown (y-axis, 0 to 20) vs. sampling rate (x-axis, 0% to 100%) for eclipse, hsqldb, xalan, and pseudojbb]
Qualitative improvement in time & space
Accuracy
Probability (detecting any race) = r?
Per-Race Accuracy (eclipse, r = 1%)
[Figure: detection rate (y-axis, ticks at 0%, 1%, and 8%) for distinct races, ordered by detection rate]
Related Work
LiteRace [Marino et al. ’09]
Cold-region hypothesis [Chilimbi & Hauswirth ’04]
Full analysis at synchronization operations
Deployable Race Detection
Accuracy, time, space ∝ sampling rate
Detect race if first access sampled
Qualitative improvement
Help developers fix difficult-to-reproduce bugs
Thank you
Backup
Example: “Timeless” Non-Sampling Periods
Vector clock versions
Vector clocks (entries ordered A, B): Thread A = [5, 2] at version v6, Thread B = [3, 4]

Thread A                              Thread B
unlock m1   (m1 = [5, 2] v6)
…                                     lock m1   (join: B = [5, 4], v6 received)
unlock m2   (m2 = [5, 2] v6)          …
                                      lock m2   (join unnecessary: v6 already received)
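The version labels suggest a constant-time test for skipping a join. The following is a minimal sketch of that idea under several assumptions not stated on the slides: `TimelessThread`, `Snapshot`, and `seenVersion` are hypothetical names, version numbers start at 1, and a lock holds a copy of the releasing thread's clock together with that thread's current version.

```java
// Hypothetical sketch of version-based join skipping; not the Pacer implementation.
final class TimelessThread {
    /** Clock copy stored on a lock at release time, tagged with the releaser's version. */
    static final class Snapshot {
        final int owner;
        final int version;
        final int[] clock;
        Snapshot(int owner, int version, int[] clock) {
            this.owner = owner;
            this.version = version;
            this.clock = clock;
        }
    }

    final int id;
    final int[] clock;
    int version;             // bumped whenever this thread's clock changes
    final int[] seenVersion; // newest version already joined from each thread (0 = none)

    TimelessThread(int id, int[] clock, int version) {
        this.id = id;
        this.clock = clock.clone();
        this.version = version;
        this.seenVersion = new int[clock.length];
    }

    /** Release in a non-sampling period: publish the clock without incrementing it. */
    Snapshot release() {
        return new Snapshot(id, version, clock.clone());
    }

    /** Acquire: returns true if the O(n) join ran, false if the O(1) version check skipped it. */
    boolean acquire(Snapshot s) {
        if (seenVersion[s.owner] >= s.version) {
            return false; // this version of the owner's clock was already joined
        }
        boolean changed = false;
        for (int i = 0; i < clock.length; i++) {
            if (s.clock[i] > clock[i]) {
                clock[i] = s.clock[i];
                changed = true;
            }
        }
        if (changed) {
            version++;
        }
        seenVersion[s.owner] = s.version;
        return true;
    }
}
```

Because A's clock does not change between the two releases, both snapshots carry v6, so B's second acquire is answered by a single comparison. The `clone()` in `release` is still O(n) here; sharing unchanged clocks instead of copying them would be a further refinement that this sketch leaves out.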
Per-Race Accuracy (Eclipse)
[Figure: detection rate (y-axis, log scale, 0% to 100%) for 27 distinct races, one line per sampling rate (r = 25%, 10%, 5%, 3%, 1%), each line sorted by detection rate]
Space Performance (eclipse)
[Figure: live memory in MB (y-axis, 0 to 800) vs. fraction of program execution (x-axis, 0 to 1) for r = 100%, r = 25%, r = 5%, r = 1%, and Base]
Performance (0-10% sampling rate)
[Figure: slowdown (y-axis, 0 to 4) vs. sampling rate (x-axis, 0% to 10%) for eclipse, hsqldb, xalan, and pseudojbb; annotations: 33% base overhead, 52% overhead]
Qualitative improvement
Methodology
Core 2 Quad (4 cores)
Multithreaded benchmarks (DaCapo & SPECjbb2000)
Evaluating sampling-based race detection
  Need 100s of trials to evaluate
  Some races are rare
  Evaluate only frequent races
Data Races
Two accesses to same variable (at least one is a write)
One access doesn’t happen before the other
  Program order
  Synchronization order
    ▪ Acquire-release
    ▪ Wait-notify
    ▪ Fork-join
    ▪ Volatile read-write

Thread A        Thread B
write x
unlock m
                lock m
                write x
read x          Race!
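As a small illustration of one synchronization-order edge from the list above, the fork-join case can be shown with plain Java threads. This is a sketch rather than material from the talk: `ForkJoinOrdering` and its fields are hypothetical, and it relies on the Java memory model's rules that `Thread.start()` and `Thread.join()` create happens-before edges.

```java
// Hypothetical example: plain fields, ordered only by fork (start) and join.
final class ForkJoinOrdering {
    private static int x;
    private static int y;

    static int run() {
        x = 41; // ordered before everything in the child by Thread.start()
        Thread child = new Thread(() -> { y = x + 1; });
        child.start();
        try {
            child.join(); // the child's write to y is ordered before the read below
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
        return y;
    }
}
```

Both fields are accessed by two threads and written by one of them, yet neither pair of accesses is a data race, because each pair is ordered by a fork or join edge.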
Why Do We Care?
Races indicate
  Atomicity violations
  Order violations
Races lead to
  Sequential consistency violations
  No races ⇒ sequential consistency (Java/C++)
  Races ⇒ writes observed out of order
Most races potentially harmful [Flanagan & Freund ’10]
Producer-Consumer Example
class ProducerConsumer {
  boolean ready;
  int x;

  void produce() {   // runs on T1
    x = … ;
    ready = true;
  }

  void consume() {   // runs on T2
    while (!ready) { }
    … = x;
  }
}

Does It Race?
T1 runs produce() while T2 runs consume().

So What?
Can read old value
Legal reordering by compiler or hardware:
  void consume() {
    … = x;
    while (!ready) { }
  }

How to Fix?

Properly Synchronized
class ProducerConsumer {
  volatile boolean ready;
  int x;

  void produce() {   // runs on T1
    x = … ;
    ready = true;
  }

  void consume() {   // runs on T2
    while (!ready) { }
    … = x;
  }
}
Happens-before edge: from the volatile write of ready in produce() to the volatile read of ready in consume()
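A runnable variant of the properly synchronized version is sketched below. The slide code elides the produced and consumed values, so the `value` parameter, the `demo` harness, and the class name `ProducerConsumerDemo` are assumptions added for illustration only.

```java
// Hypothetical runnable variant of the slide's example; the harness is an addition.
final class ProducerConsumerDemo {
    private volatile boolean ready; // volatile write/read pair creates the happens-before edge
    private int x;

    void produce(int value) {
        x = value;
        ready = true; // volatile write: publishes the earlier write to x
    }

    int consume() {
        while (!ready) { // volatile read: once it sees true, the write to x is visible
            Thread.yield();
        }
        return x;
    }

    /** Runs one consumer thread against a producer on the calling thread. */
    static int demo(int value) {
        ProducerConsumerDemo pc = new ProducerConsumerDemo();
        int[] seen = new int[1];
        Thread consumer = new Thread(() -> seen[0] = pc.consume());
        consumer.start();
        pc.produce(value);
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumer", e);
        }
        return seen[0];
    }
}
```

Without `volatile` on `ready`, the spin loop may never observe the update, or `consume` may return a stale `x`, which are the outcomes the preceding slides warn about.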
Example #2
class LibraryBook {
  Set<Person> borrowers;
}

Initialization on Demand
class LibraryBook {
  Set<Person> borrowers;

  void addBorrower(Person p) {
    if (borrowers == null) {
      borrowers = new HashSet<Person>();
    }
    borrowers.add(p);
  }
}

Synchronized but Slow?
  void addBorrower(Person p) {
    synchronized (this) {
      if (borrowers == null) {
        borrowers = new HashSet<Person>();
      }
    }
    borrowers.add(p);
  }

Double-Checked Locking
  void addBorrower(Person p) {
    if (borrowers == null) {
      synchronized (this) {
        if (borrowers == null) {
          borrowers = new HashSet<Person>();
        }
      }
    }
    borrowers.add(p);
  }

Does It Race?
Two threads call addBorrower(p) at the same time. One is inside the synchronized block initializing borrowers; the other evaluates the unsynchronized check (borrowers == null) and then calls borrowers.add(p).

So What?
The initialization expands to three steps:
  HashSet obj = alloc HashSet;
  obj.<init>();
  borrowers = obj;
which may be reordered to:
  HashSet obj = alloc HashSet;
  borrowers = obj;
  obj.<init>();
The second thread can then see a non-null borrowers and call borrowers.add(p) before obj.<init>() has run.
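The slides stop at the problem. One widely used repair for double-checked locking in Java (5 and later) is to declare the field `volatile`; the sketch below shows that repair and is not taken from the talk. `SafeLibraryBook`, the use of `String` in place of `Person`, the concurrent set, and the `stress` helper are all assumptions made for illustration. A concurrent set is used because `add` is called outside the lock, where a plain `HashSet` would itself be raced on.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical repaired variant; String stands in for the slide's Person type.
final class SafeLibraryBook {
    // volatile: the publishing write happens before any later read that sees the reference
    private volatile Set<String> borrowers;

    void addBorrower(String person) {
        Set<String> local = borrowers; // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = borrowers;
                if (local == null) {
                    local = ConcurrentHashMap.<String>newKeySet();
                    borrowers = local; // publish only after construction completes
                }
            }
        }
        local.add(person); // safe without the lock: the set is thread-safe
    }

    int borrowerCount() {
        Set<String> local = borrowers;
        return local == null ? 0 : local.size();
    }

    /** Adds distinct names from several threads and returns the final count. */
    static int stress(int threads, int perThread) {
        SafeLibraryBook book = new SafeLibraryBook();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    book.addBorrower("person-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return book.borrowerCount();
    }
}
```

The volatile write forbids the reordering shown on the last "So What?" slide: a thread that reads a non-null reference is guaranteed to see the fully constructed set.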
Performance vs. Accuracy
[Figure: slowdown (y-axis, 0 to 20) vs. detection rate (sampling rate) (x-axis, 0% to 100%) for eclipse, hsqldb, xalan, and pseudojbb]
Performance vs. Accuracy
[Figure: slowdown (y-axis, 0 to 5) vs. detection rate (sampling rate) (x-axis, up to 15%) for eclipse, hsqldb, xalan, and pseudojbb; annotations: 33% base overhead, ~50% overhead]
Accuracy & Performance
                  Program alone   FastTrack            Pacer
Detection rate    0               occurrence rate      occurrence rate × r
Running time      $t$             $t(c_1 + c_2 n)$     $t[(c_1 + c_2 n)\,r + c_3]$
Evaluate only frequent races
Evaluate scaling with r
Don’t evaluate scaling with n
Northeast Blackout of 2003
50 million people
Energy Management System
  Alarm and Event Processing Routine (1 MLOC)
Post-mortem analysis: 8 weeks
  “This fault was so deeply embedded, it took them weeks of poring through millions of lines of code and data to find it.” –Ralph DiNicola, FirstEnergy
Race condition
  Two threads writing to data structure simultaneously
  Usually occurs without error
  Small window for causing data corruption
http://www.securityfocus.com/news/8412
Vector Clock-Based Race Detection
Tracks happens-before: sound & precise
  80X slowdown
  Each analysis step: O(n) time (n = # of threads)
FastTrack [Flanagan & Freund ’09]
  Reads & writes (97%): O(1) time (problem today)
  Synchronization (3%): O(n) time (problem in future)
  8X slowdown
Vector Clock-Based Race Detection
Vector clocks (entries ordered A, B): Thread A = [5, 2], Thread B = [3, 4]
  Thread A: 5 is Thread A’s logical time; 2 is the last logical time “received” from B
  Thread B: 4 is Thread B’s logical time; 3 is the last logical time “received” from A

Thread A                                          Thread B
write x    (last write to x: 5@A)
unlock m   (m = [5, 2]; increment clock: A = [6, 2])
                                                  lock m    (join clocks: B = [5, 4]; O(n) time, n = # of threads)
                                                  write x   (happens before? 5@A vs. [5, 4]: yes; last write to x: 4@B)
read x     (happens before? 4@B vs. [6, 2]: no. Race!)
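The walkthrough can be replayed mechanically. The sketch below is a minimal illustration with hypothetical names (`VectorClockDemo`, `join`, `happensBefore`, `replay`); clocks are `int[]` arrays indexed by thread, and the last write to x is kept as an epoch (clock value plus thread), as in the 5@A and 4@B labels.

```java
import java.util.Arrays;

// Hypothetical replay of the slide example; not the FastTrack or Pacer implementation.
final class VectorClockDemo {
    static final int A = 0;
    static final int B = 1;

    /** Join: element-wise maximum, O(n) in the number of threads. */
    static void join(int[] dst, int[] src) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = Math.max(dst[i], src[i]);
        }
    }

    /** Epoch c@t happens before the current access iff c <= now[t]; an O(1) check. */
    static boolean happensBefore(int epochClock, int epochThread, int[] now) {
        return epochClock <= now[epochThread];
    }

    static String replay() {
        int[] a = {5, 2};
        int[] b = {3, 4};

        int writeClock = a[A];   // A: write x, last write becomes 5@A
        int writeThread = A;

        int[] m = a.clone();     // A: unlock m, lock clock becomes [5, 2]
        a[A]++;                  //    increment clock, A becomes [6, 2]

        join(b, m);              // B: lock m, join clocks, B becomes [5, 4]

        boolean writeOrdered = happensBefore(writeClock, writeThread, b); // 5@A vs. [5, 4]
        writeClock = b[B];       // B: write x, last write becomes 4@B
        writeThread = B;

        boolean readOrdered = happensBefore(writeClock, writeThread, a);  // 4@B vs. [6, 2]

        return Arrays.toString(a) + " " + Arrays.toString(b) + " " + writeOrdered + " " + readOrdered;
    }
}
```

`replay()` returns `[6, 2] [5, 4] true false`: B's write is ordered after A's write through lock m, while A's later read is not ordered after B's write, which is the reported race.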
Prior Work Isn’t Deployable
(Theoretical) Accuracy & Performance
                  FastTrack [Flanagan & Freund ’09]   Pacer
Detection rate    occurrence rate                     occurrence rate × r
Running time      $t(c_1 + c_2 n)$                    $t[(c_1 + c_2 n)\,r + c_3]$
$r$: sampling rate; $n$: no. of threads
$c_1$: reads & writes (problem today); $c_2 n$: synchronization (problem in future)
$(c_1 + c_2 n)\,r$: overhead in sampling periods; $c_3$: overhead in non-sampling periods (small)
Pacer
Detecting Data Races in Production
Data race occurs extremely rarely
Data race occurs extremely rarely
Data race occurs periodically
Pre-deployment Deployed
“We test exhaustively … we had in excess of three million online operational hours [342 years] in which nothing had ever exercised that bug.” –Mike Unum, manager of commercial solutions, GE Energy
http://www.securityfocus.com/news/8412
Data race buggy execution