real world concurrency bug characteristics
TRANSCRIPT
Learning from Mistakes –Real World Concurrency Bug Characteristics
Yuanyuan(YY) ZhouAssociate ProfessorUniversity of Illinois, Urbana-Champaign
ASPLOS 2008
Paper contributions
Concurrency bug detection What types of bugs are un-solved? How bugs are diagnosed in practice?
Concurrent program testing What are the manifestation conditions for concurrency bugs?
Concurrent program language design How many mistakes can be avoided What other support do we need?
All can benefit from a closer look at real world concurrency bugs
105 real-world concurrency bugs from 4 large open source programs Study from 4 dimensions
Bug patterns Manifestation condition Diagnosing strategy Fixing methods
Study methodology
Bug report
8 years10 years7 years6 yearsBug DB history
640.32LOC (M line)
C++C++Mainly CC++/CLanguage
GUIClientServerServerSoftware Type
OpenOfficeMozillaApacheMySQL
Applications
Limitations No scientific computing applications No JAVA programs Never-enough bug samples
2
6
OpenOffice
31
74
Total
1649Deadlock
411314Non-deadlock
MozillaApacheMySQL
Bugs analyzed
Classified based on root causes Categories
Atomicity violation The desired atomicity of certain
code region is violated Order violation
The desired order between two (sets of) accesses is flipped
Others
X
X
Thread 1 Thread 2
Thread 1 Thread 2
Pattern
Bug patterns
Bug patterns: atomicity bug
Bug patterns: order violation bug
Bug patterns: underestimating #threads
Bug patterns: order violation bug
Bug patterns: order violation bug
We should focus on atomicity violation and order violation
Bug detection tools for order violation bugs are desired
*There are 3-bug overlap between Atomicity and Order
Implications
Atomicity Order Other0
10
20
30
40
50
OpenOfficeMozillaApacheMySQL
Bug pattern presence in analyzed softwares
Bug manifestation condition A specific execution order among a smallest set of memory accesses
The bug is guaranteed to manifest, as long as the condition is satisfied
How many threads are involved? How many variables are involved? How many accesses are involved?
Manifestation
Bug manifestation study
Single variables are more common The widely-used simplification is reasonable
Multi-variable concurrency bugs are non-negligible Techniques to detect multi-variable concurrency bugs are needed
1 Variable > 1 Variables0
20
40
60OpenOffice Mozilla Apache MySQL
49 (66%)
25 (34%)
Implications
#bugs
Bug manifestation: how many variables?
Bug patterns: multi-variable
1 acc. 2 acc. 3 acc. 4 acc. >4 acc.0
5
10
15
20
25
30
35
OpenOffice Mozilla Apache MySQL
1 acc. 2 acc. 3 acc. 4 acc. >4 acc.0
5
10
15
20
25
OpenOffice Mozilla Apache MySQL
Concurrent program testing can focus on small groups of accesses The testing target shrinks from exponential to polynomial
Non-deadlock bugs Deadlock bugs
7 (9%)1 (3%)
# of Bugs
Double lock
Bug manifestation: how many accesses?
101 out of 105 (96%) bugs involve at most two threads Most bugs can be reliably disclosed if we check all possible interleaving between each pair of threads
Few bugs cannot Example: Intensive resource competition among many threads causes unexpected delay
Bug manifestation: how many threads?
Bug patterns: underestimating #threads
Adding/changing locks 20 (27%) Condition check 19 (26%) Data-structure change 19 (26%) Code switch 10 (13%) Other 6 ( 8%)
No silver bullet for fixing concurrency bugs. Lock usage information is not enough to fix bugs.
ImplicationsImplications
20 (27%)
Bug fix strategies: non-deadlock bugs
Bug fixes: lock vs. condition variable
Bug fixes: two condition variables
Adding/changing locks 20 (27%) Condition check 19 (26%) Data-structure change 19 (26%) Code switch 10 (13%) Other 6 ( 8%)
No silver bullet for fixing concurrency bugs. Lock usage information is not enough to fix bugs.
ImplicationsImplications
20 (27%)
Bug fix strategies: non-deadlock bugs
Bug fixes: code switch
Give up resource acquisition 19 (61%) Change resource acquisition order 7 (23%) Split the resource to smaller ones 1 ( 3%) Others 4 (13%)
We need to pay attention to the correctness of ``fixed’’ deadlock bugs
Might introduce non-deadlock bugsMight introduce
non-deadlock bugs
Fix
Implications
Bug fix strategies: deadlock bugs
Impact of concurrency bugs ~ 70% leads to program crash or hang
Reproducing bugs are critical to diagnosis Many examples
Programmers lack diagnosis tools No automated diagnosis tools mentioned Most are diagnosed via code review Reproduce bugs are extremely hard and directly determines the diagnosing time
60% 1st-time patches still contain concurrency bugs (old or new) Usefulness and concerns of transactional memory
Summary
Seriousness of concurrency bugs