Post on 27-Mar-2015
Concurrency Checking with CHESS: Learning from Experience
Tom Ball, Sebastian Burckhardt, Chris Dern, Madan Musuvathi, Shaz Qadeer
Outline
• What is CHESS?
  – a testing tool, plus
  – a test methodology (concurrency unit tests)
  – a platform for research and teaching
• CHESS design decisions
• Learnings from CHESS user forum, champions
What is CHESS?
• CHESS is a user-mode scheduler
• Controls all scheduling nondeterminism
  – “Hijacks” scheduling control from the OS
• Guarantees:
  – Every run takes a different thread schedule
  – The schedule of every run can be reproduced
Concurrency Unit Tests
“Generally, in our test environment, we want to test what we call scenarios. A scenario might be a specific feature or API usage. In my case I am trying to test the scenario of a user canceling a command execution on a different thread.”
Steve Hale, Microsoft
A Concurrency Unit Test Pattern: Fork-Join
void ForkJoinTest() {
    var t1 = new Thread(() => { S1 });
    var t2 = new Thread(() => { S2 });
    t1.Start(); t2.Start();
    t1.Join(); t2.Join();
    Debug.Assert(...);
}
Concurrency Unit Tests
• Small scope hypothesis
  – For most bugs, there exists a short-running scenario with only a few threads that can find it
• Unit tests provide
  – Better coverage of schedules
  – Easier debugging, regression, etc.
CHESS as Research/Teaching Platform
http://research.microsoft.com/chess/
• Source code release
  – chesstool.codeplex.com
• Courseware with CHESS
  – Practical Parallel and Concurrent Programming
  – coming this fall!
• Preemption bounding [PLDI07]
  – speed search for bugs
  – simple counterexamples
• Fair stateless exploration [PLDI08]
  – scales to large programs
• Architecture [OSDI08]
  – Tasks and SyncVars
  – API wrappers
• Store buffer simulation [CAV08]
• Preemption sealing [TACAS10]
  – orthogonal to preemption bounding
  – where (not) to search for bugs
• Best-first search [PPoPP10]
• Automatic linearizability checking [PLDI10]
• More features
  – Data race detection
  – Partial order reduction
  – More monitors…
CHESS Design Decisions
• Stateless state space exploration
• No change to underlying scheduler
• Ability to enumerate all/only feasible schedules
• Schedule points = synchronization points, and use race detection to make up the difference
• Serialize concurrent behavior
• Suite of search/reduction strategies
  – preemption bounding, sealing
  – best-first search
• Monitor API to easily add new checking capability
Stateless model checking [Verisoft]
• Given a program with an acyclic state space
  – Systematically enumerate all paths
• Don’t capture program states
  – Not necessary for termination
  – Precisely capturing states is hard and expensive
• At the cost of potentially revisiting states
  – Partial-order reduction alleviates redundant exploration
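The idea of enumerating all paths without capturing states can be sketched in a few lines. This is a minimal model, not CHESS or Verisoft code: the program under test is assumed to be a function that asks a hypothetical `choose(n)` callback for every nondeterministic decision, and the explorer (`explore`, also a made-up name) remembers only the choice sequence of the current path, re-executing the program from its initial state for each new path.

```python
def explore(program):
    """Enumerate every path of `program` without storing program states.

    `program(choose)` must be deterministic apart from calls to choose(n),
    which returns an int in range(n), and must terminate on every path
    (an acyclic state space).
    """
    results = []
    prefix = []            # choices to replay at the start of the next run
    while True:
        trace = []         # (choice taken, number of options) for this run

        def choose(n):
            depth = len(trace)
            pick = prefix[depth] if depth < len(prefix) else 0
            trace.append((pick, n))
            return pick

        results.append(program(choose))   # re-execute from the initial state

        # Backtrack: drop exhausted choices from the end, then advance the
        # deepest choice that still has an untried alternative.
        while trace and trace[-1][0] == trace[-1][1] - 1:
            trace.pop()
        if not trace:
            return results
        prefix = [pick for pick, _ in trace[:-1]] + [trace[-1][0] + 1]


def toy_program(choose):
    # Illustrative program: the second decision exists on one branch only.
    first = choose(2)
    if first == 0:
        return ("a", choose(3))
    return ("b",)


paths = explore(toy_program)
print(paths)
```

Because only the choice prefix is kept, termination relies on every path being finite, and a state reachable along two different paths is simply visited twice.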
CHESS architecture
[Figure: architecture diagram. Components: CHESS Exploration Engine, CHESS Scheduler, Win32 Wrappers, .NET Wrappers, Unmanaged Program, Managed Program, Windows, CLR]
• Capture scheduling nondeterminism
• Drive the program along an interleaving of choice
Running Example
Thread 1:
    Lock(l); bal += x; Unlock(l);
Thread 2:
    Lock(l); t = bal; Unlock(l);
    Lock(l); bal = t - y; Unlock(l);
Introduce Schedule() points
Thread 1:
    Schedule(); Lock(l);
    bal += x;
    Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l);
    t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l);
    bal = t - y;
    Schedule(); Unlock(l);
Instrument calls to the CHESS scheduler
Each call is a potential preemption point
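One way to picture instrumented schedule points is to model each thread as a Python generator that yields wherever the slide calls Schedule(), so that a driver decides which thread runs next. The sketch below is an illustrative model only: `deposit`, `withdraw`, `run`, and the starting values are assumptions made for the example, and the lock is tracked by the driver rather than by a real mutex.

```python
def deposit(state, x):
    # Thread 1: Lock(l); bal += x; Unlock(l);
    yield "lock"                 # Schedule() before Lock(l)
    state["bal"] += x
    yield "unlock"               # Schedule() before Unlock(l)


def withdraw(state, y):
    # Thread 2: the read and the write sit in separate critical sections.
    yield "lock"
    t = state["bal"]
    yield "unlock"
    yield "lock"
    state["bal"] = t - y
    yield "unlock"


def run(schedule, bal=100, x=50, y=30):
    """Drive both threads along `schedule`, a list of thread ids."""
    state = {"bal": bal}
    threads = {1: deposit(state, x), 2: withdraw(state, y)}
    # Advance each thread to its first schedule point.
    pending = {tid: next(gen) for tid, gen in threads.items()}
    owner = None
    for tid in schedule:
        if pending[tid] == "lock":
            assert owner is None, "schedule asks a blocked thread to run"
            owner = tid
        else:
            owner = None
        try:
            pending[tid] = next(threads[tid])   # run to next schedule point
        except StopIteration:
            del pending[tid]                    # thread finished
    assert not pending, "schedule must run every thread to completion"
    return state["bal"]


print(run([1, 1, 2, 2, 2, 2]))  # 120: deposit, then withdraw
print(run([2, 2, 1, 1, 2, 2]))  # 70: deposit lands between read and write
```

The second schedule loses the deposit, and because the schedule is an explicit list it can be replayed exactly.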
First-cut solution: Random sleeps
• Introduce random sleep at schedule points
• Does not introduce new behaviors
  – Sleep models a possible preemption at each location
• Sleeping for a finite amount guarantees starvation-freedom
Thread 1:
    Sleep(rand()); Lock(l);
    bal += x;
    Sleep(rand()); Unlock(l);
Thread 2:
    Sleep(rand()); Lock(l);
    t = bal;
    Sleep(rand()); Unlock(l);
    Sleep(rand()); Lock(l);
    bal = t - y;
    Sleep(rand()); Unlock(l);
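The random-sleep approach can be tried directly with real threads. The following is a rough sketch under stated assumptions: the function name `run_once`, the starting balance, and the delay range are illustrative, and Python's `threading.Lock` stands in for the slide's Lock/Unlock.

```python
import random
import threading
import time


def run_once(seed, bal=100, x=50, y=30, max_delay=0.002):
    """Run the two-thread example once with random sleeps at schedule points."""
    rng = random.Random(seed)
    d1 = [rng.uniform(0, max_delay) for _ in range(2)]   # Thread 1 delays
    d2 = [rng.uniform(0, max_delay) for _ in range(4)]   # Thread 2 delays
    state = {"bal": bal}
    lock = threading.Lock()

    def deposit():
        time.sleep(d1[0])
        lock.acquire()
        state["bal"] += x
        time.sleep(d1[1])
        lock.release()

    def withdraw():
        time.sleep(d2[0])
        lock.acquire()
        t = state["bal"]
        time.sleep(d2[1])
        lock.release()
        time.sleep(d2[2])
        lock.acquire()
        state["bal"] = t - y
        time.sleep(d2[3])
        lock.release()

    workers = [threading.Thread(target=deposit),
               threading.Thread(target=withdraw)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return state["bal"]


outcomes = {run_once(seed) for seed in range(20)}
print(outcomes)
```

Only two final balances are possible here (120 when the deposit is applied before or after the whole withdrawal, 70 when it lands in the middle), but which ones show up in a given batch depends on the operating system's scheduler. Fixing the seed fixes the delays, not the interleaving, which is one reason a first cut like this does not give reproducible schedules.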
Improvement 1: Capture the “happens-before” graph
Thread 1:
    Schedule(); Lock(l);
    bal += x;
    Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l);
    t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l);
    bal = t - y;
    Schedule(); Unlock(l);
Delays that result in the same “happens-before” graph are equivalent
Avoid exploring equivalent interleavings
[Figure: Thread 2's two critical sections from the running example, shown with Sleep(5) delays inserted]
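To make the equivalence concrete, the sketch below computes a canonical signature for the happens-before graph of one execution. It assumes a simplified model that is not taken from CHESS: each thread is a list of `(operation, variable)` pairs, program order is fixed, and any two operations on the same synchronization variable are treated as ordered. The names `hb_signature`, `independent`, and `shared` are illustrative.

```python
from itertools import permutations


def hb_signature(threads, schedule):
    """Canonical form of the happens-before graph of one execution.

    Program order is fixed by `threads`, so in this model the graph is
    determined by the order in which events touch each variable.
    """
    pcs = {tid: 0 for tid in threads}
    per_var = {}
    for tid in schedule:
        var = threads[tid][pcs[tid]][1]
        per_var.setdefault(var, []).append((tid, pcs[tid]))
        pcs[tid] += 1
    return tuple(sorted((var, tuple(ev)) for var, ev in per_var.items()))


# Two threads that use different locks never order each other's events.
independent = {1: [("lock", "a"), ("unlock", "a")],
               2: [("lock", "b"), ("unlock", "b")]}
classes = {hb_signature(independent, s)
           for s in set(permutations([1, 1, 2, 2]))}
print(len(classes))   # 1: all six interleavings are equivalent

# The running example shares one lock, so the section order matters.
shared = {1: [("lock", "l"), ("unlock", "l")],
          2: [("lock", "l"), ("unlock", "l"),
              ("lock", "l"), ("unlock", "l")]}
print(hb_signature(shared, [1, 1, 2, 2, 2, 2])
      == hb_signature(shared, [2, 2, 1, 1, 2, 2]))   # False
```

An explorer that records the signatures it has already seen can skip any interleaving whose signature repeats.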
Improvement 2: Understand synchronization semantics
Thread 1:
    Schedule(); Lock(l);
    bal += x;
    Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l);
    t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l);
    bal = t - y;
    Schedule(); Unlock(l);
• Avoid exploring delays that are impossible
• Identify when threads can make progress
• CHESS maintains a run queue and a wait queue
  – Mimics OS scheduler state
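A small model shows how knowing the lock semantics prunes impossible delays. In this sketch (an assumption-laden illustration, not the CHESS data structures) each thread is a list of `(operation, lock)` pairs, `split_queues` separates threads that can make progress from those blocked on a held lock, and `feasible_schedules` only ever runs a thread from the run queue.

```python
def split_queues(pending, lock_owner):
    """Partition threads into (run_queue, wait_queue).

    pending maps thread id -> next operation, e.g. ("lock", "l").
    lock_owner maps lock name -> owning thread id (absent if free).
    """
    run_queue, wait_queue = [], []
    for tid, (op, lock) in sorted(pending.items()):
        blocked = op == "lock" and lock in lock_owner
        (wait_queue if blocked else run_queue).append(tid)
    return run_queue, wait_queue


def feasible_schedules(threads):
    """Yield every schedule in which only enabled threads are run."""
    def go(pcs, owners, prefix):
        pending = {t: ops[pcs[t]] for t, ops in threads.items()
                   if pcs[t] < len(ops)}
        if not pending:
            yield prefix
            return
        run_queue, _ = split_queues(pending, owners)
        for tid in run_queue:
            op, lock = pending[tid]
            new_owners = dict(owners)
            if op == "lock":
                new_owners[lock] = tid
            else:
                del new_owners[lock]
            yield from go({**pcs, tid: pcs[tid] + 1},
                          new_owners, prefix + [tid])

    yield from go({t: 0 for t in threads}, {}, [])


# The running example: one critical section in Thread 1, two in Thread 2.
example = {1: [("lock", "l"), ("unlock", "l")],
           2: [("lock", "l"), ("unlock", "l"),
               ("lock", "l"), ("unlock", "l")]}
schedules = list(feasible_schedules(example))
for s in schedules:
    print(s)
```

For the running example this leaves 3 feasible schedules out of the 15 ways to interleave two and four steps, because a thread waiting for the lock is never chosen.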
Emulate execution on a uniprocessor
Thread 1:
    Schedule(); Lock(l);
    bal += x;
    Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l);
    t = bal;
    Schedule(); Unlock(l);
    Schedule(); Lock(l);
    bal = t - y;
    Schedule(); Unlock(l);
Enable only one thread at a time
Linearizes a partial-order into a total-order
Controls the order of data-races
CHESS modes: speed vs coverage
• Fast-mode
  – Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
  – Finds many bugs in practice
• Data-race mode
  – Repeat: find data races, then introduce schedule points before racing memory accesses
  – Captures all sequentially consistent (SC) executions
Capture all sources of nondeterminism? No.
• Scheduling nondeterminism? Yes
• Timing nondeterminism? Yes
  – Controls when and in what order the timers fire
• Nondeterministic system calls? Mostly
  – CHESS uses precise abstractions for many system calls
• Input nondeterminism? No
  – Rely on users to provide inputs (program inputs, files read, packets received, …)
  – Good tradeoff in the short term
  – But can’t find race conditions in error-handling code
CHESS wrappers
• Translate Win32/.NET synchronizations into CHESS scheduler abstractions
• Tasks: schedulable entities
  – Threads, threadpool work items, async. callbacks, timer functions
• SyncVars: resources used by tasks
  – Generate happens-before edges during execution
• Executable specification for complex APIs
  – Most time consuming and error-prone part of CHESS
  – Enables CHESS to handle multiple platforms
Learning from Experience: User forum, Champions
http://msdn.microsoft.com/en-us/devlabs/cc950526.aspx
http://social.msdn.microsoft.com/Forums/en-US/chess/threads/
“CHESS Doesn’t Scale”
Hmm… we just ran CHESS on the Singularity operating system (and found bugs in the bootup/shutdown sequence)
What they usually mean:
• “CHESS isn’t very effective on a long-running test”
• “There are a lot of possible schedules!”
Time for enumerative model checking = (time to execute one test) x (# schedules)
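A quick calculation shows why the number of schedules dominates. Ignoring blocking, threads with k1, k2, … schedule points can be interleaved in (k1 + k2 + …)! / (k1! · k2! · …) ways. The helper names and the 10 ms test time below are assumptions chosen for illustration.

```python
from math import factorial


def interleavings(steps_per_thread):
    """Number of ways to interleave threads with the given step counts."""
    total = factorial(sum(steps_per_thread))
    for k in steps_per_thread:
        total //= factorial(k)
    return total


def estimated_seconds(test_seconds, steps_per_thread):
    """(Time to execute one test) x (# schedules)."""
    return test_seconds * interleavings(steps_per_thread)


print(interleavings([2, 4]))                 # 15, the running example
print(interleavings([10, 10]))               # 184756
print(estimated_seconds(0.01, [10, 10, 10]) / (3600 * 24 * 365))  # years
```

Two threads with ten schedule points each already give 184,756 interleavings, and a third such thread pushes an exhaustive run of a 10 ms test into the range of centuries, which is why short tests and search/reduction strategies matter.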
Find lots of bugs with 2 preemptions
Program          Lines of code   Bugs
Work Stealing Q  4K              4
CDS              6K              1
CCR              9K              3
ConcRT           16K             4
Dryad            18K             7
APE              19K             4
STM              20K             2
TPL              24K             9
PLINQ            24K             1
Singularity      175K            2
Total                            37
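The effect of a preemption bound can be sketched with a small enumerator. This is a simplified model rather than the CHESS search: threads are reduced to step counts with no blocking, a context switch counts as a preemption only when the thread being switched out still has steps left, and `bounded_schedules` is a hypothetical name.

```python
def bounded_schedules(steps, bound):
    """Yield schedules with at most `bound` preemptions.

    steps maps thread id -> number of steps. Switching away from a thread
    that has finished is free; switching away from one that could have
    continued costs one preemption.
    """
    def go(remaining, current, used, prefix):
        if all(r == 0 for r in remaining.values()):
            yield prefix
            return
        for tid, left in remaining.items():
            if left == 0:
                continue
            preempts = (current is not None and tid != current
                        and remaining[current] > 0)
            cost = used + (1 if preempts else 0)
            if cost > bound:
                continue
            nxt = dict(remaining)
            nxt[tid] -= 1
            yield from go(nxt, tid, cost, prefix + [tid])

    yield from go(dict(steps), None, 0, [])


steps = {1: 2, 2: 4}   # schedule points per thread in the running example
for bound in (0, 1, 10):
    print(bound, len(list(bounded_schedules(steps, bound))))
```

With a bound of 0 only the 2 run-to-completion orders remain, a bound of 1 allows 6 schedules, and a large bound recovers all 15. The schedule `[2, 2, 1, 1, 2, 2]` that loses the deposit in the running example needs a single preemption, so it is already found at bound 1.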
“CHESS Isn’t Push Button”
“The more I look at CHESS the more I realize that I could use some general guidance on how to author test code that will actually help CHESS reveal concurrency bugs.”
Daniel Stolt
Challenge -> Opportunity: New “Push button” concurrency tools
• Cuzz [ASPLOS 2010]: Concurrency Fuzzing
  – Attach to any running executable
  – Find concurrency bugs faster through smart fuzzing
• Lineup [PLDI 2010]: Automatic Linearizability Checking
  – Generate “thread-safety” tests for a class automatically
  – Use sequential behavior as oracle for concurrent behavior
  – CHESS underneath
“CHESS Doesn’t Find This Bug”
• RTFM is not helpful
• Instead, generate helpful warning messages
  – “Warning: running CHESS without race detection can miss bugs”
• Or, turn race detection on for a few executions.
void ForkJoinTest() {
    int x = 0;
    var t1 = new Thread(() => { x=x+1; });
    var t2 = new Thread(() => { x=x+1; });
    t1.Start(); t2.Start();
    t1.Join(); t2.Join();
    Debug.Assert(x==2);
}
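A toy model shows why this test passes when schedule points sit only at synchronization operations. The sketch below is not CHESS's race detector: it simply compares treating `x = x + 1` as one atomic step with splitting it into a read and a write that can be scheduled separately. The names `run`, `outcomes`, and `split_accesses` are assumptions made for the example.

```python
from itertools import permutations


def run(schedule, split_accesses):
    """Execute two `x = x + 1` threads along `schedule` and return x."""
    x = 0
    temp = {}
    pc = {1: 0, 2: 0}
    for tid in schedule:
        if not split_accesses:
            x = x + 1                 # whole statement is one atomic step
        elif pc[tid] == 0:
            temp[tid] = x             # schedule point before the read
        else:
            x = temp[tid] + 1         # schedule point before the write
        pc[tid] += 1
    return x


def outcomes(split_accesses):
    """Final values of x over every interleaving in the chosen mode."""
    steps = 2 if split_accesses else 1
    schedules = set(permutations([1] * steps + [2] * steps))
    return {run(s, split_accesses) for s in schedules}


print(outcomes(split_accesses=False))   # {2}: the assertion always holds
print(outcomes(split_accesses=True))    # {1, 2}: the lost update appears
```

With no synchronization inside the thread bodies there is nowhere to interleave them in the first mode, so the assertion can only fail once the racing accesses themselves become schedule points.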
“CHESS Can’t Avoid Finding Bugs”
“Solution is working and found two bug with CHESS . To get the second bug, I had to fix first bug first”
“That liveness bug is such a minor performance problem that I won’t fix it.”
Playing CHESS with George
Sealed Methods            Asserts  Timeouts  Livelocks  Deadlocks  Leaks  Pass
(none)                    5        3         40         0          0      5
+TryDequeue               6        5         0          1          1      40
+WaitForTask              5        5         0          2          1      40
+Reg.Recv.+PostInternal   5        5         0          0          0      43
“CHESS is Confusing Me”
The Nondeterminism Saga: static data, lazily initialized
[Figure: a schedule prefix p with two possible continuations, E and F]
• If replay of p.E fails, yielding p.F, then try again and see if p.F replays
• Report lost coverage
Nondeterminism Junkie: Too much information
“Why does this test pass instead of say ‘Detected nondeterminism’ outside the control of CHESS?”
“Is this good behavior for CHESS to return three different results for the same code?”
“CHESS Time Isn’t Real Time”: It’s a feature, not a bug.
“The call to WaitOne(60000, false) immediately returns false, which isn’t correct. If I use WaitOne() or WaitOne(Timeout.Infinite, false) instead of WaitOne(60000, false), the WaitHandle waits till the Event is set, returns true and everything goes fine. But waiting without a timeout isn't an option in my case.”
The expected: “I can’t play CHESS on”
• x64
• Multi-process programs
• Message passing, distributed systems
• The Boost library
• .NET without the CLR Profiler
• Java
• Unix
• …
Learning from Experience: Forums, Champions
Chris Dern, Steve Hale, Ram Natarajan, Roy Tan
“Congratulations CHESS team!!!!! I have proven outside of CHESS that the issue it is finding in our product on the 106th thread schedule looks like a valid product bug!! I wrote a quick application to launch my CHESS test outside of CHESS and by freezing/thawing threads I was able to reproduce the issue independently. This is incredibly exciting!!! Many thanks for your patience, perseverance, and CHESS bug fixes as I’ve struggled to understand CHESS.”
Steve Hale, Microsoft, 2/12/2009
[Figure: ConcurrentDictionary, ConcurrentBag, SemaphoreSlim, ManualResetEventSlim, Barrier, BlockingCollection, Task, TaskScheduler, PLINQ, Parallel.For]
“As the true value of a test is in its ability to find bugs, let’s take a look at how our CHESS tests did. Over the development cycle to date, the CHESS test found seven bugs, and was used to reproduce another seven for a total of 14, out of the 276 high priority bugs over the same time. While only 14 bugs against 276 appear sadly anemic, it’s important to dig a bit deeper. If we address each of the issues raised, would we find more bugs?”
Chris Dern, PFX_CHESS_Review_Final.docx
“Early on the adoption of CHESS, we made a fatal mistake. Perhaps it was wishful thinking on our part, or perhaps we believed too much in the marketing hype and didn’t read the fine print. We believed early on that CHESS was a turnkey solution capable of using existing tests and test approaches and ‘finding the bugs’.” C. Dern
“The schedule for any product group is always under attack. Over the life cycle of a product, features are in constant flux, with managers always balancing risk and reward. In the face of this pressure, any untried tool, methodology, or approach faces an uphill battle.” C. Dern
“For tool developers, it’s important that once you engage with a customer you help find then drive to some level of success. Finding a single bug is a priceless commodity when arguing to continue the time investment in a specific tool. Take small bites, set modest goals and drive to success. Perfect is the enemy of good, or at least good enough right now.” C. Dern
Dern’s DO’s and DON’Ts
DO NOT expect that CHESS will ‘magically’ find your bugs. CHESS is a tool, mainly focused at enumerating schedules for a given bound. While it can find specific types of concurrency bugs, e.g. deadlocks, for ‘free’ the value and benefit of CHESS comes with deliberate tests.
DO develop an understanding of what properties, invariants, and behaviors your test is testing
DO run your tests. While this may seem a silly tip, but it’s important to remember that CHESS enables the familiar write, run, refactor test experience for concurrent tests, which we enjoy with sequential tests today.
DO NOT add artificial spinning/busy work in the test. CHESS will explore all schedules for your specified bound. Adding busy work, like you may find in a ‘stress’ test to increase coverage, only increases the test runtime when under CHESS.
AVOID blindly converting an existing ‘stress’ style unit test into a CHESS test. The size, scale, and assertions that one tends to find in those types of tests make for a weak CHESS test at best, or a unusable CHESS test at worst.
Stepping Back from the Fray: High-level Learnings
• Proper expectation setting
• Good methodology
• Good default behavior
• Good warnings and messages
• Minimize cognitive dissonance
• Cultivate champions
  – Listen to them and learn!
Three CHESS Learnings
1. If you want
   – deterministic scheduling
   – with ability to explore all schedules
   – without changing the underlying scheduler
   then it’s hard to achieve
   – high API coverage
   – robustness
   Action: we need observable and controllable schedulers!
2. Concurrency unit testing can be effective, but requires careful planning and scoping
3. Search/reduction strategies are absolutely essential
Uplifting Message and Blatant Advertisement for LineUp Talk
“Partnerships and Collaborations
The success of the LineUp work is a perfect example of [the benefits of] an open dialog between the teams along with continual experimentation by both sides. Combining innovations from both research and product testing group, we create[d] a complete solution to one area of concurrency testing.” C. Dern