![Page 1: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/1.jpg)
Demand-Driven Software Race
Detection using Hardware
Performance Counters
Joseph L. Greathouse†, Zhiqiang Ma‡, Matthew I. Frank‡
Ramesh Peri‡, Todd Austin†
†University of Michigan ‡Intel Corporation
CScADS
Aug 2, 2011
![Page 2: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/2.jpg)
Concurrency Bugs Still Matter
In spite of proposed hardware solutions
[Figure: proposed hardware solutions: Bulk Memory Commits, Transactional Memory, AMD ASF (?), Sun Rock (?)]
![Page 3: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/3.jpg)
Concurrency Bugs Matter NOW
Nov. 2010 OpenSSL Security Flaw
Thread 1 (mylen=small):
if(ptr==NULL)
len1=thread_local->mylen;
ptr=malloc(len1);
memcpy(ptr, data1, len1)
Thread 2 (mylen=large):
if(ptr==NULL)
len2=thread_local->mylen;
ptr=malloc(len2);
memcpy(ptr, data2, len2)
[Figure: the two threads' statements interleaved along a TIME axis; ptr starts as ∅ and one allocation is marked LEAKED]
![Page 4: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/4.jpg)
This Talk in One Sentence
Speed up software race detection
with existing hardware support.
![Page 5: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/5.jpg)
Software Data Race Detection
• Add checks around every memory access
• Find inter-thread sharing events
• Synchronization between write-shared accesses?
• No? Data race.
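The four steps above can be sketched as a toy happens-before checker. This is a minimal illustration, not the detector used in the talk: the class name `RaceDetector`, its methods, and the vector-clock bookkeeping are assumptions chosen for the example.

```python
from collections import defaultdict


class RaceDetector:
    """Toy happens-before race checker (illustrative names, not from the talk)."""

    def __init__(self, n_threads):
        # One vector clock per thread; each thread starts at epoch 1 for itself.
        self.clock = [[0] * n_threads for _ in range(n_threads)]
        for t in range(n_threads):
            self.clock[t][t] = 1
        self.lock_clock = {}                  # lock -> clock at last release
        self.last_write = {}                  # addr -> (tid, epoch)
        self.last_reads = defaultdict(dict)   # addr -> {tid: epoch}
        self.races = []

    def _ordered(self, prev_tid, prev_epoch, tid):
        # The earlier access happens-before the current one if it is in the
        # same thread or the current thread has already seen its epoch.
        return prev_tid == tid or prev_epoch <= self.clock[tid][prev_tid]

    def read(self, tid, addr):
        w = self.last_write.get(addr)
        if w and not self._ordered(w[0], w[1], tid):
            self.races.append(("W->R", addr, w[0], tid))
        self.last_reads[addr][tid] = self.clock[tid][tid]

    def write(self, tid, addr):
        w = self.last_write.get(addr)
        if w and not self._ordered(w[0], w[1], tid):
            self.races.append(("W->W", addr, w[0], tid))
        for r_tid, r_epoch in self.last_reads[addr].items():
            if not self._ordered(r_tid, r_epoch, tid):
                self.races.append(("R->W", addr, r_tid, tid))
        self.last_write[addr] = (tid, self.clock[tid][tid])
        self.last_reads[addr] = {}

    def acquire(self, tid, lock):
        released = self.lock_clock.get(lock)
        if released:
            # Join: everything before the release is now ordered before us.
            self.clock[tid] = [max(a, b) for a, b in zip(self.clock[tid], released)]

    def release(self, tid, lock):
        self.lock_clock[lock] = list(self.clock[tid])
        self.clock[tid][tid] += 1


# The unsynchronized ptr accesses from the OpenSSL-style example.
unsynced = RaceDetector(2)
unsynced.read(0, "ptr")    # Thread 1: if(ptr==NULL)
unsynced.read(1, "ptr")    # Thread 2: if(ptr==NULL)
unsynced.write(1, "ptr")   # Thread 2: ptr=malloc(len2)
unsynced.write(0, "ptr")   # Thread 1: ptr=malloc(len1)
print(unsynced.races)
```

Every call to `read` or `write` pays for the check, even when the address is never touched by a second thread; that per-access cost is the overhead the rest of the talk targets.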
![Page 6: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/6.jpg)
Example of Data Race Detection
Thread 1 (mylen=small):
if(ptr==NULL)
len1=thread_local->mylen;
ptr=malloc(len1);
memcpy(ptr, data1, len1)
Thread 2 (mylen=large):
if(ptr==NULL)
len2=thread_local->mylen;
ptr=malloc(len2);
memcpy(ptr, data2, len2)
ptr write-shared?
Interleaved synchronization?
[Figure: the two threads' statements placed along a TIME axis]
![Page 7: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/7.jpg)
SW Race Detection is Slow
[Figure: bar chart of Race Detector Slowdown (x), y-axis 0 to 300. Phoenix: histogram, kmeans, linear_regression, matrix_multiply, pca, string_match, word_count, GeoMean. PARSEC: blackscholes, bodytrack, facesim, ferret, freqmine, raytrace, swaptions, fluidanimate, vips, x264, canneal, dedup, streamcluster, GeoMean.]
![Page 8: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/8.jpg)
Goal of this Work
Accelerate Software Data Race Detection
Technique #1: Making it Fast
Demand-Driven Data Race Detection
Technique #2: Keeping it Real
Find sharing events with existing HW
![Page 9: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/9.jpg)
Inter-thread Sharing is What’s Important
“Data races ... are failures in programs that access and
update shared data in critical sections” – Netzer & Miller, 1992
if(ptr==NULL)
len1=thread_local->mylen;
ptr=malloc(len1);
memcpy(ptr, data1, len1)
if(ptr==NULL)
len2=thread_local->mylen;
ptr=malloc(len2);
memcpy(ptr, data2, len2)
Thread-local data: NO SHARING
Shared data: NO INTER-THREAD SHARING EVENTS
[Figure: the statements above placed along a TIME axis, with the two annotations attached to them]
![Page 10: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/10.jpg)
Very Little Inter-Thread Sharing
[Figure: two bar charts of % Write-Sharing Events for the Phoenix and PARSEC benchmarks; one y-axis runs from 0 to 100, the other from 0 to 3]
![Page 11: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/11.jpg)
Technique 1: Demand-Driven Analysis
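[Diagram: the Multi-threaded Application runs under an Inter-thread Sharing Monitor, which separates Local Access from Inter-thread sharing; the Software Race Detector is attached to the inter-thread sharing path]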
![Page 12: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/12.jpg)
Inter-thread Sharing Monitor
• Check each memory op. for write-sharing
• Signal software race detector on sharing
• Possible to do in software
+ Can be built now with instrumentation
– Slow. May take as long as race detection
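A software-only monitor of this kind might look like the sketch below. It is an assumption-laden illustration: `SoftwareSharingMonitor`, its `access` method, and the `on_sharing` callback are hypothetical names, and "write-sharing" is taken here to mean that an access touches data last written by another thread, or overwrites data another thread has read.

```python
class SoftwareSharingMonitor:
    """Instrumentation-style sharing monitor (hypothetical sketch)."""

    def __init__(self, on_sharing):
        self.last_writer = {}   # addr -> tid of the most recent writer
        self.readers = {}       # addr -> tids that read since the last write
        self.on_sharing = on_sharing
        self.checked = 0        # how many memory ops paid for a check

    def access(self, tid, addr, is_write):
        self.checked += 1
        writer = self.last_writer.get(addr)
        readers = self.readers.setdefault(addr, set())
        # W->R or W->W: someone else wrote this address last.
        shared = writer is not None and writer != tid
        if is_write:
            # R->W: we overwrite data another thread has read.
            shared = shared or any(r != tid for r in readers)
            self.last_writer[addr] = tid
            readers.clear()
        else:
            readers.add(tid)
        if shared:
            self.on_sharing(tid, addr, is_write)   # signal the race detector
        return shared


signals = []
monitor = SoftwareSharingMonitor(lambda tid, addr, w: signals.append((tid, addr, w)))
for _ in range(3):
    monitor.access(0, "buf0", True)    # thread-local work, never shared
    monitor.access(1, "buf1", True)
monitor.access(0, "ptr", True)         # Thread 0 writes ptr
monitor.access(1, "ptr", False)        # Thread 1 reads it: write-sharing
print(monitor.checked, signals)
```

The counter `checked` makes the drawback on the slide concrete: every memory operation is inspected, so the monitor itself can cost about as much as the analysis it is meant to gate.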
![Page 13: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/13.jpg)
Ideal Hardware Sharing Detector
• Follow read/write sets of threads
• Fast user-level faults
[Diagram: Thread 1 executes WRITE Y, then Thread 2 executes READ Y. The Sharing Monitor starts with T1 R: ∅, W: ∅ and T2 R: ∅, W: ∅. After the write, T1 holds R: ∅, W: {Y}, so the read by Thread 2 is flagged as W->R Sharing. The monitor sits between the Multi-threaded Application and the Software Race Detector as the Inter-thread Sharing Monitor.]
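The read/write-set bookkeeping in the diagram can be modeled in a few lines. This is only a software model of what ideal hardware would do; `IdealSharingDetector` and its fields are hypothetical, and because the slide does not say when sets are cleared, this sketch never clears them.

```python
class IdealSharingDetector:
    """Software model of per-thread read/write sets (hypothetical sketch)."""

    def __init__(self, thread_ids):
        self.read_set = {t: set() for t in thread_ids}
        self.write_set = {t: set() for t in thread_ids}

    def access(self, tid, addr, is_write):
        """Record the access and return the sharing events it causes.

        Each returned tuple stands in for one fast user-level fault.
        """
        faults = []
        for other in self.read_set:
            if other == tid:
                continue
            if addr in self.write_set[other]:
                kind = "W->W" if is_write else "W->R"
                faults.append((kind, other, tid, addr))
            elif is_write and addr in self.read_set[other]:
                faults.append(("R->W", other, tid, addr))
        target = self.write_set if is_write else self.read_set
        target[tid].add(addr)
        return faults


detector = IdealSharingDetector(["T1", "T2"])
print(detector.access("T1", "Y", True))    # Thread 1: WRITE Y
print(detector.access("T2", "Y", False))   # Thread 2: READ Y
```

Read-read sharing is deliberately ignored, since two reads of the same location cannot race.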
![Page 14: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/14.jpg)
Limitations of Existing Hardware
• Fast faults
• Solution: Enable detector for long periods of time
• Read/write sets
• Solution:
[Diagram: Multi-threaded Application, Software Race Detector, and Inter-thread Sharing Monitor, with the labels NO and NO SHARING]
![Page 15: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/15.jpg)
Technique 2: Hardware Sharing Detector
• Hardware Performance Counters
• Interrupt on cache coherency events
• Intel’s HITM event: W→R Data Sharing
[Diagram: cache-line states S, M, and I in two caches, with a HITM event when a line held in M is accessed from the other cache]
• Limitations of this method:
• SMT sharing can’t be counted
• Cache eviction
• Others in paper
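To make the HITM idea and its limitations concrete, here is a deliberately simplified coherence model. It is a toy, not Intel's protocol: `ToyCoherence` is a hypothetical name, only the M, S, and I states are modeled, and a HITM is counted only when a read misses locally and finds the line Modified in another core's cache.

```python
class ToyCoherence:
    """Toy per-core cache model that counts HITM-like events (simplified)."""

    def __init__(self, n_cores):
        # One dict per core: addr -> "M" or "S". A missing entry means Invalid.
        self.cache = [dict() for _ in range(n_cores)]
        self.hitm_count = 0

    def write(self, core, addr):
        for c, lines in enumerate(self.cache):
            if c != core:
                lines.pop(addr, None)        # invalidate other copies
        self.cache[core][addr] = "M"

    def read(self, core, addr):
        if addr in self.cache[core]:
            return False                     # local hit: no coherence event
        hitm = False
        for c, lines in enumerate(self.cache):
            if c != core and lines.get(addr) == "M":
                lines[addr] = "S"            # owner downgrades M -> S
                hitm = True
        if hitm:
            self.hitm_count += 1
        self.cache[core][addr] = "S"
        return hitm

    def evict(self, core, addr):
        # Models a capacity eviction: the line leaves the cache.
        self.cache[core].pop(addr, None)


caches = ToyCoherence(2)
caches.write(0, "Y")
print(caches.read(1, "Y"))     # W->R sharing seen as a HITM
caches.write(0, "Z")
caches.evict(0, "Z")
print(caches.read(1, "Z"))     # same sharing pattern, but no HITM
```

The second case shows the cache-eviction limitation: the write-then-read sharing still happened, but no Modified line was left to hit. Two threads mapped to the same core (the SMT case) behave like the local-hit path and also produce no event.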
![Page 16: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/16.jpg)
Demand-Driven Analysis on Real HW
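[Flowchart: Execute Instruction, then "Analysis Enabled?". If NO: "HITM Interrupt?"; NO returns to Execute Instruction, YES leads to Enable Analysis. If YES: SW Race Detection, then "Sharing Recently?"; YES keeps analysis on, NO leads to Disable Analysis.]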
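The flowchart can be read as a small control loop. The sketch below is one possible rendering under stated assumptions: `run_demand_driven`, the two callbacks, and the `quiet_limit` threshold used to decide "Sharing Recently?" are all hypothetical; the slides do not give the actual threshold or interface.

```python
def run_demand_driven(trace, hitm_interrupt, detect, quiet_limit=3):
    """Run a trace of memory ops, enabling analysis only around sharing.

    trace:          iterable of memory operations (any objects)
    hitm_interrupt: op -> bool, True if the hardware raised a HITM interrupt
    detect:         op -> bool, runs the slow software race detection on op
                    and returns True if the op involved inter-thread sharing
    quiet_limit:    assumed number of consecutive non-sharing ops after
                    which sharing no longer counts as "recent"
    Returns the number of ops that went through software race detection.
    """
    enabled = False
    quiet = 0
    analyzed = 0
    for op in trace:
        if not enabled:
            # Native-speed execution; only the performance counter watches.
            if hitm_interrupt(op):
                enabled = True
                quiet = 0
            continue
        analyzed += 1
        if detect(op):
            quiet = 0              # sharing seen: stay enabled
        else:
            quiet += 1
            if quiet >= quiet_limit:
                enabled = False    # no recent sharing: disable analysis
    return analyzed


# Each op is just a flag saying whether it touches write-shared data.
ops = [False] * 5 + [True, True, False, False, False] + [False] * 5
print(run_demand_driven(ops, hitm_interrupt=lambda op: op, detect=lambda op: op))
```

In this sketch the op that triggers the interrupt is not itself analyzed; analysis starts with the next instruction. Keeping the detector on for a window after each interrupt reflects the earlier point that, without fast faults, the detector has to stay enabled for long periods.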
![Page 17: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/17.jpg)
Experimental Evaluation
• Modified Intel Inspector XE Race Detector
• Linux on 4-core Core i7, no Hyper-Threading
• Performance Tests:
• Phoenix Suite
• PARSEC
• Accuracy Tests:
• Phoenix Suite
• PARSEC
• Pre-release version of RADBench
Simulation
![Page 18: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/18.jpg)
Performance Difference
[Figure: bar chart of Race Detector Slowdown (x), y-axis 0 to 300. Phoenix: histogram, kmeans, linear_regression, matrix_multiply, pca, string_match, word_count, GeoMean. PARSEC: blackscholes, bodytrack, facesim, ferret, freqmine, raytrace, swaptions, fluidanimate, vips, x264, canneal, dedup, streamcluster, GeoMean.]
![Page 19: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/19.jpg)
Performance Increases
[Figure: bar chart of Demand-driven Analysis Speedup (x), y-axis 0 to 20, with one off-scale bar labeled 51x. Phoenix: histogram, kmeans, linear_regression, matrix_multiply, pca, string_match, word_count, GeoMean. PARSEC: blackscholes, bodytrack, facesim, ferret, freqmine, raytrace, swaptions, fluidanimate, vips, x264, canneal, dedup, streamcluster, GeoMean.]
![Page 20: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/20.jpg)
Demand-Driven Analysis Accuracy
[Figure: the Demand-driven Analysis Speedup (x) bar chart, y-axis 0 to 20, for the Phoenix and PARSEC benchmarks (histogram, kmeans, linear_regression, matrix_multiply, pca, string_match, word_count, blackscholes, bodytrack, facesim, ferret, freqmine, raytrace, swaptions, fluidanimate, vips, x264, canneal, dedup, streamcluster, plus a GeoMean per suite), annotated with per-benchmark accuracy labels: 1/1 2/4 3/3 4/4 3/3 4/4 4/4 2/4 4/4 4/4 2/4]
Accuracy vs. Continuous Analysis: 97%
![Page 21: Demand-Driven Software Race Detection using Hardware ...cscads.rice.edu/Peri-RaceDetection-CScADS-2011.pdf · Demand-Driven Software Race Detection using Hardware Performance Counters](https://reader033.vdocuments.site/reader033/viewer/2022060520/604e1510b62fbb70032827d5/html5/thumbnails/21.jpg)
Future Directions
• Better Performance
• Fast user-level faults
• Application specific hardware
• More Accuracy
• Better performance counters
• Inform SW on cache evictions/misses
• Smooth transition to ideal hardware
• Combine sampling & demand-driven analysis