thomas ball principal researcher microsoft corporation sebastian burckhardt researcher microsoft...
TRANSCRIPT
Concurrency Analysis Platform And Tools For Finding Concurrency Bugs Thomas Ball
Principal ResearcherMicrosoft Corporation
Sebastian BurckhardtResearcherMicrosoft Corporation
Madan MusuvathiResearcherMicrosoft Corporation
Shaz QadeerSenior ResearcherMicrosoft Corporation
TL58
Concurrency Is HARD !
Rare thread interleavings can result in bugs
These bugs are hard to find, reproduce, and debug Heisenbugs: Observing the bug can “fix” it !
A huge productivity problem Developers and testers can spend
weeks chasing a single Heisenbug
Main Takeaways
You can find and reproduce Heisenbugs new automatic tool called CHESS for Win32 and .NET
CHESS used extensively inside Microsoft Parallel Computing Platform (PCP) Singularity Dryad/Cosmos
Releasing via DevLabs
Why Is Concurrency Hard? Madan Musuvathi
ResearcherMicrosoft Corporation
demo
Concurrency Analysis Platform (CAP)
Goal: Drive a program along an interleaving of choice Interleaving decided by user or by a program/tool
Today: Controlling/observing concurrency is difficult Manual and intrusive process
Enables lots of concurrency tools: Test a program along a set of interleavings Reproduce Heisenbugs Program understanding / debugging ...
Taming Concurrency
Madan MusuvathiResearcherMicrosoft Corporation
demo
CAP Architecture
CAP
MemoryModelbugs
Monitors
Coverage
Repro
TestingDataraces
Debugging Visualization
UnmanagedProgram
Windows
ManagedProgram
.NET CLR
• Record the interleaving executed• Drive the program along an interleaving
CAP Specifics
Ability to explore all interleavings Need to understand complex concurrency APIs
(Win32 and System.Threading) Threads, threadpools, locks, semaphores,
async I/O, APCs, timers, …
Does not introduce false behaviors Any interleaving produced by CAP is
possible on the real scheduler
Overview
Concurrency Analysis Platform (CAP)
CHESS : find/reproduce Heisenbugs Integration with Visual Studio Demo on CCR Heisenbug
Future CAP tools FeatherLite: Data-race detection Sober: Memory-model bugs
CHESS: Find And Reproduce Heisenbugs
Kernel: Threads, Scheduler, Synchronization Objects
While(not done) { TestScenario()}
TestScenario() { …}
ProgramCHESS CHESS runs the scenario in a loop
• Every run takes a different interleaving• Every run is repeatable
Win32/.NET
Uses the CAP scheduler • To control and direct interleavings
Detect• Assertion violations• Deadlocks• Dataraces• Livelocks
CAP
x = 1; … … … … … x = k;
CHESS challenge Programs have LOTS of interleavings
Thread 1 Thread n
x = 1; … … … … …x = k;
…
n threads
k steps each
Number of executions: nnk Exponential in both n and k
For n=2, k = 100 > # of atoms in the universe
Limits scalability to large programs
Goal: Scale CHESS to large programs (large k)
x = 1;if (p != 0) { x = p->f;}
x = 1;if (p != 0) {
Preemption Bounding
Focus on executions with small number of preemptions
Unexpected preemptions cause bugs
x = p->f;}
p = 0;
Thread 1 Thread 2
preemption
non-preemption
x = 1; … … … … … x = k;
Managing Astronomical Number Of Interleavings
Thread 1 Thread n
x = 1; … … … … …x = k;
…
n threads
k steps each
Number of interleavings: nnk Interleavings with c preemptions (n2k)c. nn
For n=2, k=100, c=2< 1 million interleavings
Analysis techniques reduce this further
CHESS In VSTS
Thomas BallPrincipal ResearcherMicrosoft Corporation
demo
Real Scenario: Heisenbug In CCR
demo
George Chrysanthakopoulus’ Challenge
CHESS Internal Customers
Parallel Computing Platform PLINQ: Parallel LINQ CDS: Concurrent Data Structures STM: Software Transactional Memory TPL: Task Parallel Library ConcRT: Concurrency RunTime CCR: Concurrency Coordination Runtime
Dryad/Cosmos
Singularity (Research OS from MSR) CHESS can systematically test the boot and shutdown process
CHESS athttp://msdn.microsoft.com/devlabs/
announcing
Overview
Concurrency Analysis Platform (CAP)
CHESS: Find/reproduce Heisenbugs Integration with Visual Studio Demo on CCR Heisenbug
Future CAP tools FeatherLite: Data-race detection Sober: Memory-model bugs
FeatherLiteLightweight data-race detection
Data-races: Access to data with insufficient synchronization
Data-races are a common source of concurrency errors
EnterCS(cs)if (p != 0) { x = p->f;}LeaveCS(cs)
p = 0;
Thread 1 Thread 2
Sampling To Reduce Overhead
Existing data-race detection tools have a large runtime overhead More than 10X slowdown Process every memory access
Intelligent sampling algorithms Process < 5% of the memory accesses
Less than 30% runtime overhead Existing tools > 1000% overhead
FeatherLite + CAP"Active" data-race detection
Programs have “benign” data-races Do not result in program crashes Example: Updating statistical
counters without holding a lock
FeatherLite drives the program along the two outcomes of the data-race
SoberTool for finding memory-model errors
Expert programmers use “lock-free” techniques Use low-level synchronizations, volatile variables, … For performance
Such programs are exposed to memory-model issues Compiler can reorder instructions Hardware can reorder/delay memory accesses
Result: Hard-to-find bugs that are hard-to-understand
Sober Quiz
Can both threads DoWork() at the same time?
flag1 =
true;
if( !flag2 )
DoWork();
flag2 =
true;
if( !flag1 )
DoWork();
Thread 1 Thread 2
// initial statevolatile bool flag1 = false;volatile bool flag2 = false;
Conclusion
Concurrency Analysis Platform (CAP) for controlling thread interleavings Enables lots of concurrency tools
CHESS – Find and reproduce Heisenbugs Don’t stress, use CHESS Look for download on http://msdn.microsoft.com/devlabs/
Future CAP tools FeatherLite: Lightweight data-race detection Sober: Tool for finding memory-model bugs
Evals & Recordings
Please fill
out your
evaluation for
this session at:
This session will be available as a recording at:
www.microsoftpdc.com
Please use the microphones provided
Q&A
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.