Accurate and Efficient Filtering for the Intel Thread Checker Race Detector

Post on 05-Jan-2016

DESCRIPTION

Accurate and Efficient Filtering for the Intel Thread Checker Race Detector. By Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Petersen, Josep Torrellas. 2014-10-23. OS Lab. Ok-Kyoon Ha. 2006 ACM. Motivation. Debugging data races is a difficult task.

TRANSCRIPT

Accurate and Efficient Filtering for the Intel Thread Checker Race Detector

By Paul Sack, Brian E. Bliss, Zhiqiang Ma, Paul Petersen, Josep Torrellas

04/20/23

OS Lab, Ok-Kyoon Ha. 2006 ACM

SBMP06 2

Motivation

Debugging data races is a difficult task

Race detectors commonly use one of two types of algorithm: lockset-based or vector-clock-based

Other data-race detection tools

- have reasonable overheads (2x slowdowns)

- do not provide much useful information or have limited usage models

Intel Thread Checker

- provides an abundance of useful information and has few usage constraints

- has high performance costs (233x slowdowns)

Overheads of Intel’s Thread Checker

- instrumentation alone: slowdown of 22x

- full algorithm: slowdown of 233x

- memory overhead: 20x
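The two slowdown figures above suggest how much of the cost sits in the detection algorithm itself. The following is a rough back-of-the-envelope sketch, assuming the average slowdowns can be treated as a single representative run (an assumption, since averages over applications do not compose exactly); the function names are hypothetical.

```python
# Slowdowns reported on the slide (relative to an uninstrumented run).
INSTRUMENTATION_SLOWDOWN = 22.0
FULL_SLOWDOWN = 233.0


def algorithm_share(instrumentation: float, full: float) -> float:
    """Fraction of instrumented run time spent beyond instrumentation alone."""
    return (full - instrumentation) / full


def max_speedup_from_filtering(instrumentation: float, full: float) -> float:
    """Upper bound on speedup if the algorithm's work were eliminated entirely
    while the instrumentation cost stayed unchanged (assumption)."""
    return full / instrumentation


share = algorithm_share(INSTRUMENTATION_SLOWDOWN, FULL_SLOWDOWN)
bound = max_speedup_from_filtering(INSTRUMENTATION_SLOWDOWN, FULL_SLOWDOWN)
```

Under these assumptions, roughly 91% of the run time is attributable to work beyond instrumentation, and removing all of it would bound the speedup at about 10.6x.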

Approach

Objective

- to reduce the amount of work done by the algorithm

Filtering useless references

Three Filters (1/3)

Stack Filter

- filters references that fall within the accessing thread's own stack

- cannot cause data races to be lost as long as no thread accesses another's stack, and is very efficient

Implementation Issues of Stack Filter

- is the simplest filter and has the lowest overhead

- compares the memory reference address with the stack base and limit addresses
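The base-and-limit comparison can be sketched as follows. This is a minimal illustration, not Thread Checker's code: it assumes the filter drops a reference only when it lies entirely inside the accessing thread's own stack, and the `ThreadStack` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadStack:
    """Stack bounds of one thread (hypothetical representation)."""
    limit: int  # lowest address of the stack region, inclusive
    base: int   # highest address of the stack region, exclusive


def stack_filter(addr: int, size: int, stack: ThreadStack) -> bool:
    """Return True if the reference can be filtered, i.e. all of its bytes
    lie within the accessing thread's own stack."""
    return stack.limit <= addr and addr + size <= stack.base


own_stack = ThreadStack(limit=0x7000, base=0x8000)
on_stack = stack_filter(0x7FF0, 4, own_stack)    # inside the stack
on_heap = stack_filter(0x1000, 4, own_stack)     # outside the stack
straddling = stack_filter(0x7FFE, 4, own_stack)  # crosses the stack base
```

The check is two integer comparisons per reference, which is consistent with the slide's claim that this is the cheapest of the three filters.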

Three Filters (2/3)

Duplicate Filter

- records the first load and the first store to a variable in each segment

- filters duplicate references within a segment

- can only cause Thread Checker to lose duplicate data races

Implementation Issues of Duplicate Filter

- is slower than the stack filter

- maintains filter tables whose entries are organized into 4 fields: add, size, type, ID

[Figure: filter tables T1 and T2, each with the columns add, size, type, ID]
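A duplicate filter of this kind can be sketched as a per-thread table keyed by the reference. This is a hedged illustration only: it assumes `add` means the address, that a segment ends at a synchronization operation, and that `ID` is an opaque identifier of the first reference (the slide does not say what it identifies). The class and method names are hypothetical.

```python
class DuplicateFilter:
    """Per-thread table of references already seen in the current segment."""

    def __init__(self) -> None:
        # (address, size, access type) -> ID of the first matching reference
        self._table: dict[tuple[int, int, str], int] = {}

    def should_filter(self, addr: int, size: int, access: str, ref_id: int) -> bool:
        """Return True for a duplicate of a reference already recorded in this
        segment; otherwise record it and return False."""
        key = (addr, size, access)
        if key in self._table:
            return True
        self._table[key] = ref_id
        return False

    def new_segment(self) -> None:
        """Clear the table when a new segment starts (assumed to happen at a
        synchronization operation)."""
        self._table.clear()


t1 = DuplicateFilter()
first_load = t1.should_filter(0x1000, 4, "load", ref_id=1)
second_load = t1.should_filter(0x1000, 4, "load", ref_id=2)
first_store = t1.should_filter(0x1000, 4, "store", ref_id=3)
t1.new_segment()
load_after_sync = t1.should_filter(0x1000, 4, "load", ref_id=4)
```

Keeping loads and stores under separate keys matches the slide's "first load and first store" wording: a store after a load to the same variable is still passed to the detector.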

Three Filters (3/3)

FSM Filter

- is based on the Eraser state machine

- filters references in the Private state and in the Shared Read-Only state

- filters the initial references (UNINIT → PRIVATE, PRIVATE → SHR RO)

[Figure: Eraser state machine with the states UNINIT, PRIVATE, SHR RO, and SHR RW; edge labels: R, WR, R1, W1, W, W’, R’]
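The filtering rule can be sketched on top of an Eraser-style state machine. This is an illustrative reconstruction, not the authors' implementation: it assumes state is tracked per address, that R1/W1 denote accesses by the first thread, and that R’/W’ denote accesses by any other thread. The class and method names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    UNINIT = auto()
    PRIVATE = auto()
    SHR_RO = auto()
    SHR_RW = auto()


class FsmFilter:
    """Per-address state machine; should_filter() returns True when the
    reference need not be passed on to the race detector."""

    def __init__(self) -> None:
        self._state: dict[int, State] = {}
        self._owner: dict[int, int] = {}  # thread that first touched the address

    def should_filter(self, addr: int, thread_id: int, is_write: bool) -> bool:
        state = self._state.get(addr, State.UNINIT)
        if state is State.UNINIT:
            # First reference: UNINIT -> PRIVATE, filtered.
            self._state[addr] = State.PRIVATE
            self._owner[addr] = thread_id
            return True
        if state is State.PRIVATE:
            if thread_id == self._owner[addr]:
                return True  # R1/W1: still private, filtered
            if is_write:
                self._state[addr] = State.SHR_RW
                return False  # write by another thread must be checked
            self._state[addr] = State.SHR_RO
            return True  # PRIVATE -> SHR RO reference is filtered
        if state is State.SHR_RO:
            if is_write:
                self._state[addr] = State.SHR_RW
                return False
            return True  # reads of read-only shared data are filtered
        return False  # SHR RW: every reference is checked


fsm = FsmFilter()
trace = [
    fsm.should_filter(0x10, thread_id=1, is_write=True),   # UNINIT -> PRIVATE
    fsm.should_filter(0x10, thread_id=1, is_write=False),  # stays PRIVATE
    fsm.should_filter(0x10, thread_id=2, is_write=False),  # PRIVATE -> SHR RO
    fsm.should_filter(0x10, thread_id=2, is_write=True),   # SHR RO -> SHR RW
    fsm.should_filter(0x10, thread_id=1, is_write=False),  # SHR RW
]
```

Because references made before a variable reaches SHR RW are never shown to the detector, this filter can hide races that involve those early references.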

Experimental Setup

Environment

- 4-way 2.5GHz Pentium 4 workstation

- uses the SPLASH-2 applications

- runs with 4 threads on 4 processors

Measurements

- filtering statistics are collected by running each application three times

- performance results are collected by running each application nine times

- each application is run in Thread Checker with and without the three filters

- the number of data-race bugs reported with and without the filters is compared

Filtering Effectiveness

[Figures: different filter combinations; incremental filtering effectiveness]

Performance

Speedups obtained with filtering

Data-race Detection

Characterizing the impact of the three filters combined

Conclusions and Future Work

Conclusion

- Intel Thread Checker incurs a slowdown of 233x on average

- the vast majority of memory references can be filtered out

- three filters were developed that filter 98% of all memory references

- the filters yield speedups of 3.3x on average

Future Work

- improve the FSM filter

- reduce the other overhead sources in Thread Checker
