MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs
Shan Lu
Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou
University of Illinois
http://opera.cs.uiuc.edu
Bugs are bad!
Software bugs are costly!
  They account for 40% of system failures [Marcus2000]
  They cost the US economy $59.5 billion annually [NIST]
Techniques to improve program correctness are desired
Software bug categories
Memory bugs
  Improper memory accesses and usage
  Well studied, with effective detection tools
Semantic bugs
  Violations of design requirements or programmer intentions
  The biggest part (~80%*) of software bugs
  No silver bullet
Concurrency bugs
  Wrong synchronization in concurrent execution
  Increasingly important as concurrent programs become pervasive
  Hard to detect
* Have Things Changed Now? -- An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID'06]
An important type of semantic information
Software programs contain many variables
Variables are NOT isolated
  Semantic bonds exist among variables
Correct programs consistently access correlated variables
[Figure: variables x, y, z, s, t, u, v, w; label: Variable Access Correlation]
Variable correlation in programs
Semantic correlation widely exists among variables
Linux: different aspects
struct fb_var_screeninfo
{
    …
    int red_msb;
    int blue_msb;
    int green_msb;
    int transp_msb;
}

Linux: different representations
struct net_device_stats
{
    …
    long rv_packets;
    long rv_bytes;
}

MySQL: implementation-demand
struct st_test_file * cur_file;
struct st_test_file * file_stack;

MySQL: constraint specification
class THD
{
    …
    char* db;
    int db_length;
}
[Figure: db contents "M Y B D", db_length 4]
Variable access correlation ( constraint )
Maintaining correlation usually needs consistent access
Variable access correlation: A1 ( x ) ⇒ A2 ( y ), where A1 and A2 are each read, write, or access*
Correlated variables shown:
  db and db_length
  red/…/transp and red/…/transp
  rv_packets and rv_bytes
  file_stack and cur_file
Access patterns shown:
  write ( ) ⇒ write ( )
  write ( ) ⇒ access* ( )
  access ( ) ⇒ access ( )
*access: read or write
Violating the correlations leads to bugs
Programmers may forget to access correlated variables
A type of semantic bug not handled by previous tools
Correlated variables
struct fb_var_screeninfo {
    …
    int red_msb;
    int blue_msb;
    int green_msb;
    int transp_msb;
}

Mostly consistent access --- correct
int imsttfb_check_var ( … ) {
    ...
    var->red_msb = 0;
    var->green_msb = 0;
    var->blue_msb = 0;
    var->transp_msb = 0;
    …
}

Inconsistent access --- BUG!
int neofb_check_var (...) {
    ...
    var->red_msb = 0;
    var->green_msb = 0;
    var->blue_msb = 0;
    /* forget transp_msb!! */
    ...
}
Confirmed by Linux developers

Inconsistent update bugs
More examples of inconsistent update bugs are in our paper.
Violating the correlations leads to bugs (ii)
Programmers may forget to synchronize concurrent accesses to correlated variables
This is NOT a traditional data race bug
  The bug occurs even if accesses to each single variable are well synchronized

Multi-variable concurrency bugs (Mozilla)
struct JSCache {
    …
    JSEntry table[SIZE];
    bool empty;
}

js_FlushPropertyCache ( … ) {
    memset ( cache->table, 0, SIZE);
    …
    cache->empty = TRUE;
}

js_PropertyCacheFill ( … ) {
    cache->table[indx] = obj;
    …
    cache->empty = FALSE;
}

[Figure: Thread 1 and Thread 2 each hold lock ( T ) ... unlock ( T ) around the table access and lock ( E ) ... unlock ( E ) around the empty flag; the interleaving is marked BUG]
Our contribution
A technique to automatically infer variable access correlation
Bug detection based on variable access correlation
  Inconsistent-update semantic bugs
  Multi-variable concurrency bugs
Uncovered correlations and new bugs in real-world applications (Linux device drivers, Mozilla, MySQL, PostgreSQL)
  > 6000 variable correlations
  39 new inconsistent-update semantic bugs
  4 new multi-variable concurrency bugs from Mozilla
Outline
Motivation
What is variable access correlation
MUVI variable access correlation inference
MUVI bug detection
  Inconsistent-update semantic bug detection
  Multi-variable concurrency bug detection
Evaluation
Conclusions
Basic idea of correlation inference
Our target: access correlation A1 ( x ) ⇒ A2 ( y )
Our inference method:
  Statistically infer access correlation based on variable access patterns in source code
  Assumption: mature program, mostly correct
    x and y appear together many times
    x and y seldom appear separately
How to judge "together"?
  Our metric: static code distance within a function scope
  Our paper talks about other potential metrics
How to do this efficiently?
Frequent itemset mining
A common data mining technique
Itemset: a set of items ( no order )
  E.g. (v, w, x, y, z)
Sub-itemset:
  E.g. (w, y)
Itemset database:
  (v, x, m, n)
  (v, w, y, t)
  (v, w, y, z, s)
  (v, w, x, y, z)
Goal: find frequent sub-itemsets in an itemset database
Support: number of appearances
  E.g. support of (w, y) is 3
Frequent: support > threshold
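The support count on this slide can be sketched in a few lines of C. This is only an illustration of the definition, not MUVI's mining code: encoding each itemset as a string of single-letter items and the names `DATABASE`, `contains_all`, `support` and `is_frequent` are assumptions made for the example. A real miner would use a dedicated frequent-itemset algorithm rather than counting one candidate at a time.

```c
#include <string.h>

/* Assumed encoding: each itemset is a string of single-letter items;
 * order does not matter. These are the four itemsets from the slide. */
static const char *const DATABASE[] = { "vxmn", "vwyt", "vwyzs", "vwxyz" };
#define DATABASE_SIZE (sizeof DATABASE / sizeof DATABASE[0])

/* Returns 1 if every item of `sub` occurs in `itemset`, otherwise 0. */
int contains_all(const char *itemset, const char *sub)
{
    for (; *sub != '\0'; ++sub)
        if (strchr(itemset, *sub) == NULL)
            return 0;
    return 1;
}

/* Support: the number of itemsets in the database that contain `sub`. */
int support(const char *sub)
{
    int count = 0;
    for (size_t i = 0; i < DATABASE_SIZE; ++i)
        count += contains_all(DATABASE[i], sub);
    return count;
}

/* Frequent: support strictly greater than the threshold. */
int is_frequent(const char *sub, int threshold)
{
    return support(sub) > threshold;
}
```

With this database, `support("wy")` evaluates to 3, matching the slide, so (w, y) is frequent for any threshold below 3.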
Flowchart of variable correlation inference
Source files → Pre-processing (how?) → Itemset database → Mining → Frequent variable sets → Post-processing (how?) → Variable access correlation
MUVI Inference algorithm (pre-process)
Program source code → ? → Itemset database
What is an item? A variable
What is an itemset? A function
What to put into an itemset?
  Accessed variables
  Access type (read/write)
MUVI Inference algorithm (pre-process)
Input: program
Output: an itemset database
Flow-insensitive, inter-procedural analysis
  Consider global variables and structure-typed variables
  Also consider variables accessed in callee functions

int x;
f1 ( ) {
    read x;
}

f2 ( ) {
    S t;
    write t.y;
}

int z;
f3 ( ) {
    read z;
    f1 ( );
    f2 ( );
}

Database
  f1: {read, x}
  f2: {write, S::y}
  f3: {read, z} {read, x} {write, S::y}
  …
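The callee propagation in this example can be sketched as a union over the call graph. The bitmask encoding and the names `DIRECT`, `CALLS` and `itemset_of` are hypothetical choices for this illustration. The talk does not say how MUVI represents itemsets or how deep it follows calls, so the transitive walk below is an assumption.

```c
#include <stdint.h>

/* Hypothetical encoding: one bit per (access type, variable) item. */
enum {
    READ_X    = 1u << 0,  /* {read, x}     */
    WRITE_S_Y = 1u << 1,  /* {write, S::y} */
    READ_Z    = 1u << 2   /* {read, z}     */
};

enum { F1, F2, F3, NUM_FUNCS };

/* Accesses that appear directly in each function body. */
static const uint32_t DIRECT[NUM_FUNCS] = {
    [F1] = READ_X, [F2] = WRITE_S_Y, [F3] = READ_Z
};

/* CALLS[f] is a bitmask of the functions that f calls directly. */
static const uint32_t CALLS[NUM_FUNCS] = {
    [F1] = 0, [F2] = 0, [F3] = (1u << F1) | (1u << F2)
};

/* Itemset of f: its own accesses plus those of its callees (assumed
 * transitive here). `visiting` marks functions already on the current
 * path so that recursive call graphs terminate. */
uint32_t itemset_of(int f, uint32_t visiting)
{
    uint32_t items = DIRECT[f];
    visiting |= 1u << f;
    for (int g = 0; g < NUM_FUNCS; ++g)
        if (((CALLS[f] >> g) & 1u) && !((visiting >> g) & 1u))
            items |= itemset_of(g, visiting);
    return items;
}
```

Calling `itemset_of(F3, 0)` returns `READ_Z | READ_X | WRITE_S_Y`, the three items listed for f3 in the database above.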
MUVI Inference algorithm (post-process)
Input: frequent variable sets (x, y), which appear together in many functions
Pruning
  What if x and y appear separately many times?
    Prune out low-confidence (conditional probability) pairs
  What if x is too popular, e.g. stderr, stdout?
Categorize based on access type
  write (x) ⇒ write (y)? Or write (x) ⇒ read (y)? etc.
Output: variable correlation A1 ( x ) ⇒ A2 ( y )
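A minimal sketch of the confidence-based pruning, assuming confidence means the conditional probability support(x, y) / support(x). The function names, the two thresholds, and the popularity test (fraction of all functions that access x) are hypothetical. The slides only raise the "too popular" question, so that part is one possible answer and not MUVI's documented rule.

```c
/* Confidence of x => y: the fraction of functions accessing x that also
 * access y, i.e. support(x, y) / support(x). */
double confidence(int support_xy, int support_x)
{
    return support_x > 0 ? (double)support_xy / support_x : 0.0;
}

/* Keep a candidate pair only if it is confident enough and x is not so
 * popular that it co-occurs with almost everything. Both thresholds are
 * hypothetical tuning knobs, not values from the talk. */
int keep_correlation(int support_xy, int support_x, int total_functions,
                     double min_confidence, double max_popularity)
{
    if (total_functions <= 0)
        return 0;
    if (confidence(support_xy, support_x) < min_confidence)
        return 0;
    if ((double)support_x / total_functions > max_popularity)
        return 0;
    return 1;
}
```

For instance, a pair with 11 supporting functions out of 12 that access x has confidence of about 0.92 and survives a 0.8 confidence threshold, while a pair with 3 out of 12 is pruned.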
Inconsistent-update bug detection
Step 1: get all write(x) ⇒ acc(y) correlations
Step 2: get all violations of the above correlations
Step 3: prune out unlikely bugs
  Code analysis to check caller and callee functions

Example: write (fb_var_screeninfo::blue_msb) ⇒ access (fb_var_screeninfo::transp_msb)
  #support = 11
  #violation = 1 (function neofb_check_var)
  → inconsistent-update bug

int neofb_check_var (...) {
    ...
    var->red_msb = 0;
    var->green_msb = 0;
    var->blue_msb = 0;
    /* forget transp_msb!! */
    ...
}
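Steps 1 and 2 can be sketched as a scan over per-function access summaries. The `func_summary` layout, the bitmask encoding of variables, and the name `check_write_access` are assumptions for illustration. Step 3 (checking callers and callees to prune unlikely bugs) is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-function summary: one bit per variable. */
struct func_summary {
    const char *name;
    uint32_t writes;    /* variables written in the function */
    uint32_t accesses;  /* variables read or written         */
};

/* Checks the correlation write(x) => access(y) over `n` functions.
 * Stores the number of supporting functions in *support and the names
 * of up to `max_out` violating functions in `violators`; returns the
 * total number of violations. */
size_t check_write_access(const struct func_summary *funcs, size_t n,
                          uint32_t x, uint32_t y, size_t *support,
                          const char **violators, size_t max_out)
{
    size_t violations = 0;
    *support = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!(funcs[i].writes & x))
            continue;               /* correlation does not apply here */
        if (funcs[i].accesses & y) {
            ++*support;             /* x written and y accessed */
        } else {
            if (violations < max_out)
                violators[violations] = funcs[i].name;
            ++violations;           /* x written, y never touched */
        }
    }
    return violations;
}
```

Given summaries for imsttfb_check_var (writes all four fields) and neofb_check_var (writes red, green and blue only), checking blue_msb against transp_msb reports neofb_check_var as the single violator.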
Multi-variable concurrency bug detection -- MUVI Lock-set algorithm
Original Lock-Set algorithm
  Look for common locks among conflicting accesses to each shared variable
  Conflicting accesses from Thread 1 and Thread 2: A1 ( x ), A2 ( x )
  L (A1) ∩ L (A2) = ∅ ?
MV Lock-Set algorithm
  Look for common locks among conflicting accesses to each shared variable and their correlated accesses
  Conflicting accesses A1 ( x ), A2 ( x ) plus correlated access A3 ( y )
  L (A1) ∩ L (A2) ∩ L (A3) = ∅ ?

Example:
Thread 1                          Thread 2
Lock ( T )                        Lock ( T )
cache->table[indx] = obj;         memset (cache->table, 0, SIZE);
Unlock ( T )                      Unlock ( T )
Lock ( E )                        Lock ( E )
cache->empty = FALSE;             cache->empty = TRUE;
Unlock ( E )                      Unlock ( E )
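The two intersection tests can be written directly with locksets stored as bitmasks. This is a sketch of the pairwise checks shown on the slide only. The encoding and function names are assumptions, and a full lock-set detector would also track and refine candidate locksets across all accesses at run time.

```c
#include <stdint.h>

/* Hypothetical encoding: each lock is one bit of a mask. */
enum { LOCK_T = 1u << 0, LOCK_E = 1u << 1 };

/* Original check for conflicting accesses A1(x) and A2(x): a potential
 * race is reported when no lock is held at both accesses. */
int lockset_violation(uint32_t l_a1, uint32_t l_a2)
{
    return (l_a1 & l_a2) == 0;
}

/* Multi-variable check: the correlated access A3(y) must be protected
 * by a lock common to A1(x) and A2(x) as well. */
int mv_lockset_violation(uint32_t l_a1, uint32_t l_a2, uint32_t l_a3)
{
    return (l_a1 & l_a2 & l_a3) == 0;
}
```

In the example, both table accesses hold T and both empty accesses hold E, so `lockset_violation(LOCK_T, LOCK_T)` and `lockset_violation(LOCK_E, LOCK_E)` are both 0. The multi-variable check `mv_lockset_violation(LOCK_T, LOCK_T, LOCK_E)` is 1, because no single lock covers the correlated pair.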
Multi-variable concurrency bug detection -- Other MUVI extension algorithms
MUVI happens-before algorithm
  Original: check the happens-before relation among conflicting accesses to each single variable
  MV: check the happens-before relation among conflicting accesses to each single variable and correlated accesses
Other extensions
  Extending hybrid race detection
  Extending atomicity violation bug detection
Methodology
For variable correlation and inconsistent-update bug detection:
  Linux (device driver)
  Mozilla
  MySQL
  PostgreSQL
For multi-variable concurrency bug detection:
  Five existing real bugs from Mozilla and MySQL
  All latest versions
  Found four new multi-variable concurrency bugs during the detection process
Results on correlation inference
App.        #Access-Correlations  #Involved Variables  %False Positives  Analysis Time
Mozilla     1431                  1380                 16%               157m
MySQL       726                   703                  13%               19m
Linux       3353                  3038                 19%               175m
PostgreSQL  939                   833                  15%               98m
False positive sources: macros, inline functions; coincidence
Inconsistent-update bug detection results
App.        # of MUVI bug reports  # of new bugs found  # of bad programming  # of false positives
Linux       40                     22 (12)              5                     13
Mozilla     30                     7 (0)                8                     15
MySQL       20                     9 (5)                3                     8
PostgreSQL  10                     1 (0)                4                     5
False positive sources: semantic exceptions; wrong correlations; no future read access
Multi-variable concurrency bug detection results
Bug         MV-Lockset: Detect Bug?  False Positive
Moz-js1     Y                        1
Moz-js2     Y                        2
Moz-imap    Y                        0
MySQL-log   Y                        3
MySQL-blog  N                        0
MySQL-blog: variables are conditionally correlated; the correlation is missed by MUVI
MV-Happens-Before has similar results
Multi-variable concurrency bug detection results
4 new multi-variable concurrency bugs detected!

struct JSRuntime {
    int totalStrings;    /* # of allocated strings */
    double lengthSum;    /* Total length of allocated strings */
}
Mozilla jscntxt.h

Thread 1
js_NewString ( … ) {
    // allocate a new string
    JS_ATOMIC_INCREMENT (&(rt->totalStrings));
    PR_Lock (rtLock);
    rt->lengthSum += length;
    PR_Unlock (rtLock);
}

Thread 2
printJSStringStats ( ... ) {
    count = rt->totalStrings;
    mean = rt->lengthSum / count;
    printf ( …… );
}
Mozilla jsstr.h, Mozilla jsstr.c

Wrong result!
Conclusion
Variable access correlations can be inferred
Variable access correlation is important
  Helps detect two types of bugs
  Other usage
    Provide specifications to ease programming
    Provide hints for assigning locks or TMs
      E.g. AtomicSet, AutoLocker, Colorama
Related works
Program specification inference
  [ErnstICSE00], [EnglerSOSP01], [KremenekOSDI06], [LiblitPLDI03], [WhaleyISSTA02], [YangICSE06], etc.
Code pattern mining
  [LiOSDI04], [LiFSE05], [LivshitsFSE05], etc.
Concurrency bug detection
  [ChoiPLDI02], [EnglerSOSP03], [FlanaganPOPL04], [SavageTOCS97], [Praun01], [XuPLDI05], [YuSOSP05], etc.
Techniques for easing concurrent programming
  [Harris03], [HerlihyISCA93], [McCloskeyPOPL06], [Rajwar02], [Hammond04], [Moore6], [Rossbach07], etc.
Acknowledgement
Prof. Stefan Savage (shepherd)
Anonymous reviewers
Prof. Liviu Iftode
GOOGLE student travel grant
NSF, DOE, Intel research grants
Thanks!
http://opera.cs.uiuc.edu