Concurrency Checking with CHESS: Learning from Experience

Tom Ball, Sebastian Burckhardt, Chris Dern, Madan Musuvathi, Shaz Qadeer

Outline

• What is CHESS?
  – a testing tool, plus
  – a test methodology (concurrency unit tests)
  – a platform for research and teaching

• CHESS design decisions

• Learnings from the CHESS user forum and champions

What is CHESS?

• CHESS is a user-mode scheduler

• Controls all scheduling nondeterminism
  – “Hijacks” scheduling control from the OS

• Guarantees:
  – Every run takes a different thread schedule
  – The schedule of every run can be reproduced
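A minimal Python sketch of the reproducibility guarantee, assuming each thread is modeled as a list of atomic steps and the scheduler is the only source of nondeterminism. All names (`make_program`, `run`) are hypothetical and are not the CHESS API: the scheduler picks which thread runs each step and records the picks, and replaying the recording reproduces the run exactly.

```python
import random


def make_program():
    """Hypothetical two-thread program: every step appends to a shared log."""
    log = []
    threads = {
        "A": [lambda: log.append("A0"), lambda: log.append("A1")],
        "B": [lambda: log.append("B0"), lambda: log.append("B1")],
    }
    return threads, log


def run(schedule=None, seed=0):
    """Drive the program one atomic step at a time from a user-mode scheduler.

    If `schedule` is given it is replayed verbatim; otherwise the scheduler
    picks among enabled threads with a seeded RNG. Returns (picks, log).
    """
    threads, log = make_program()
    pcs = {tid: 0 for tid in threads}  # next step index per thread
    rng = random.Random(seed)
    picks = []
    while True:
        enabled = sorted(t for t in threads if pcs[t] < len(threads[t]))
        if not enabled:
            return picks, log
        if schedule is not None:
            tid = schedule[len(picks)]  # replay a recorded choice
        else:
            tid = rng.choice(enabled)   # make (and record) a new choice
        threads[tid][pcs[tid]]()
        pcs[tid] += 1
        picks.append(tid)


picks, log = run(seed=7)
replayed_picks, replayed_log = run(schedule=picks)
assert replayed_log == log  # same schedule, same behavior
```

In CHESS the choices come from the exploration engine rather than an RNG, which is what lets each run take a schedule it has not tried before.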

Concurrency Unit Tests

“Generally, in our test environment, we want to test what we call scenarios. A scenario might be a specific feature or API usage. In my case I am trying to test the scenario of a user canceling a command execution on a different thread.”

Steve Hale, Microsoft

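A Concurrency Unit Test Pattern: Fork-Join

using System.Diagnostics;
using System.Threading;

void ForkJoinTest() {
    var t1 = new Thread(() => { /* S1 */ });
    var t2 = new Thread(() => { /* S2 */ });
    t1.Start(); t2.Start();  // fork
    t1.Join(); t2.Join();    // join
    Debug.Assert(...);       // check the postcondition
}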

Concurrency Unit Tests

• Small scope hypothesis
  – For most bugs, there exists a short-running scenario with only a few threads that can find it

• Unit tests provide
  – Better coverage of schedules
  – Easier debugging, regression, etc.

CHESS as Research/Teaching Platform
http://research.microsoft.com/chess/

• Source code release: chesstool.codeplex.com

• Courseware with CHESS
  – Practical Parallel and Concurrent Programming
  – coming this fall!

• Preemption bounding [PLDI07]
  – speeds up the search for bugs
  – simple counterexamples

• Fair stateless exploration [PLDI08]
  – scales to large programs

• Architecture [OSDI08]
  – Tasks and SyncVars
  – API wrappers

• Store buffer simulation [CAV08]

• Preemption sealing [TACAS10]
  – orthogonal to preemption bounding
  – where (not) to search for bugs

• Best-first search [PPoPP10]

• Automatic linearizability checking [PLDI10]

• More features
  – Data race detection
  – Partial order reduction
  – More monitors…
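To make preemption bounding concrete, here is a small Python sketch, assuming threads that never block and counting a preemption as a context switch away from a thread that still has steps left; switching after a thread finishes is free. It illustrates the pruning, not the CHESS search itself, and `bounded_schedules` is a hypothetical name.

```python
from math import comb


def bounded_schedules(steps, bound):
    """Enumerate interleavings of threads with steps[t] atomic steps each,
    keeping only schedules with at most `bound` preemptions.

    A preemption is a switch away from a thread that still has steps left;
    switching after a thread has finished is free. Blocking is not modeled.
    """
    results = []

    def go(remaining, last, used, prefix):
        if not any(remaining):
            results.append(tuple(prefix))
            return
        for tid, left in enumerate(remaining):
            if left == 0:
                continue
            preempt = int(last is not None and tid != last and remaining[last] > 0)
            if used + preempt > bound:
                continue  # prune: this branch would exceed the preemption bound
            remaining[tid] -= 1
            prefix.append(tid)
            go(remaining, tid, used + preempt, prefix)
            prefix.pop()
            remaining[tid] += 1

    go(list(steps), None, 0, [])
    return results


# Two threads with three steps each: counts are 2, 6, 14, 18, 20 for c = 0..4.
for c in range(5):
    print(c, len(bounded_schedules([3, 3], c)))
print(comb(6, 3))  # 20 schedules with no bound at all
```

Even in this tiny case a bound of 1 keeps 6 of 20 schedules; the gap grows quickly with more steps, which is why a small bound makes the search tractable while still yielding short, simple counterexamples.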

CHESS Design Decisions

• Stateless state space exploration
• No change to underlying scheduler
• Ability to enumerate all/only feasible schedules
• Schedule points = synchronization points, and use race detection to make up the difference
• Serialize concurrent behavior
• Suite of search/reduction strategies
  – preemption bounding, sealing
  – best-first search

• Monitor API to easily add new checking capability

Stateless model checking [VeriSoft]

• Given a program with an acyclic state space, systematically enumerate all paths

• Don’t capture program states
  – Not necessary for termination
  – Precisely capturing states is hard and expensive

• At the cost of potentially revisiting states
  – Partial-order reduction alleviates redundant exploration
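A minimal Python sketch of stateless search, assuming each thread is a list of atomic steps over a shared dict (a hypothetical encoding, not the CHESS or VeriSoft API). The search never stores program states: it keeps only the current schedule prefix and re-executes the program from the start to reach each branch point. The example program is the classic store-buffering litmus test.

```python
def explore(make_program):
    """Stateless depth-first search over schedules.

    make_program() returns (state, threads): a fresh shared-state dict and a
    dict mapping thread ids to lists of atomic steps (functions of state).
    No intermediate program state is saved: each schedule is produced by
    re-executing from the initial state along a recorded choice prefix.
    """
    prefix = []
    while True:
        state, threads = make_program()
        pcs = {tid: 0 for tid in threads}
        choices, options = [], []
        while True:
            enabled = sorted(t for t in threads if pcs[t] < len(threads[t]))
            if not enabled:
                break
            depth = len(choices)
            tid = prefix[depth] if depth < len(prefix) else enabled[0]
            threads[tid][pcs[tid]](state)  # execute one atomic step
            pcs[tid] += 1
            choices.append(tid)
            options.append(enabled)
        yield tuple(choices), state
        # Backtrack to the deepest choice point with an untried alternative.
        for depth in range(len(choices) - 1, -1, -1):
            untried = [t for t in options[depth] if t > choices[depth]]
            if untried:
                prefix = choices[:depth] + [untried[0]]
                break
        else:
            return  # every schedule has been explored


def make_sb():
    """Hypothetical store-buffering litmus test under sequential consistency."""
    def a1(s): s["x"] = 1
    def a2(s): s["r1"] = s["y"]
    def b1(s): s["y"] = 1
    def b2(s): s["r2"] = s["x"]
    return {"x": 0, "y": 0, "r1": None, "r2": None}, {"A": [a1, a2], "B": [b1, b2]}


runs = list(explore(make_sb))
print(len(runs))                                      # 6 schedules
print(sorted({(s["r1"], s["r2"]) for _, s in runs}))  # [(0, 1), (1, 0), (1, 1)]
```

Memory use stays at one prefix, at the cost of re-running shared prefixes; no partial-order reduction is attempted here. The outcome (0, 0) never appears because this sketch only produces sequentially consistent executions.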

CHESS architecture

[Figure: CHESS architecture. Components: Unmanaged Program, Windows, Managed Program, CLR, Win32 Wrappers, .NET Wrappers, CHESS Scheduler, CHESS Exploration Engine.]

• Capture scheduling nondeterminism
• Drive the program along an interleaving chosen by CHESS

Running Example

Thread 1:
  Lock(l); bal += x; Unlock(l);

Thread 2:
  Lock(l); t = bal; Unlock(l);
  Lock(l); bal = t - y; Unlock(l);
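The running example hides an atomicity violation: Thread 2 reads bal and writes it back in two separate critical sections, so a deposit that lands in between is lost. A minimal Python sketch, assuming each critical section executes atomically (so only three interleavings exist at this granularity) and using illustrative values bal = 100, x = 10, y = 30, enumerates every schedule and checks the expected final balance. The helper names are hypothetical.

```python
def interleavings(a, b):
    """All merges of step lists a and b that keep each thread's own order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)]
            + [[b[0]] + rest for rest in interleavings(a, b[1:])])


# Each critical section Lock(l); ...; Unlock(l) is treated as one atomic step.
def deposit(s): s["bal"] += s["x"]           # Thread 1
def read(s): s["t"] = s["bal"]               # Thread 2, first critical section
def withdraw(s): s["bal"] = s["t"] - s["y"]  # Thread 2, second critical section


thread1 = [("T1: bal += x", deposit)]
thread2 = [("T2: t = bal", read), ("T2: bal = t - y", withdraw)]

results = []
for schedule in interleavings(thread1, thread2):
    s = {"bal": 100, "x": 10, "y": 30, "t": None}
    for _, step in schedule:
        step(s)
    results.append(([name for name, _ in schedule], s["bal"]))

bugs = [r for r in results if r[1] != 100 + 10 - 30]
print(bugs)  # [(['T2: t = bal', 'T1: bal += x', 'T2: bal = t - y'], 70)]
```

Only one of the three schedules exposes the bug, which is why a tester has to control the order rather than hope to hit it.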

Introduce Schedule() points

Thread 1:
  Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);

Thread 2:
  Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
  Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);

• Instrument calls to the CHESS scheduler
• Each call is a potential preemption point

First-cut solution: Random sleeps

• Introduce a random sleep at each schedule point

• Does not introduce new behaviors
  – Sleep models a possible preemption at each location
  – Sleeping for a finite amount guarantees starvation-freedom

Thread 1:
  Sleep(rand()); Lock(l); bal += x; Sleep(rand()); Unlock(l);

Thread 2:
  Sleep(rand()); Lock(l); t = bal; Sleep(rand()); Unlock(l);
  Sleep(rand()); Lock(l); bal = t - y; Sleep(rand()); Unlock(l);
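A Python simulation of the random-sleep approach on the running example, using a virtual clock instead of real sleeping. The assumptions are loud ones: one sleep before each critical section (a sleep while holding the lock does not change the order of critical sections here), sleep lengths uniform on [0, 1), and illustrative values bal = 100, x = 10, y = 30. It shows the weakness: the lost update is hit only probabilistically, and nothing says when all schedules have been covered.

```python
import random


def random_sleep_run(rng, bal=100, x=10, y=30):
    """Simulate one run of the running example with a random sleep before each
    critical section; critical sections then execute in wake-up order."""
    wake_t1 = rng.random()                 # Thread 1: Sleep(rand()); deposit
    wake_read = rng.random()               # Thread 2: Sleep(rand()); read
    wake_write = wake_read + rng.random()  # Thread 2: Sleep(rand()); withdraw
    events = sorted([(wake_t1, "deposit"), (wake_read, "read"), (wake_write, "withdraw")])
    t = None
    for _, step in events:
        if step == "deposit":
            bal += x
        elif step == "read":
            t = bal
        else:
            bal = t - y
    return bal


rng = random.Random(2024)
counts = {}
for _ in range(1000):
    final = random_sleep_run(rng)
    counts[final] = counts.get(final, 0) + 1
print(counts)  # mostly 80; the lost-update value 70 shows up in about one run in three
```

Under this uniform model the bad order (read, deposit, withdraw) has probability 1/2 - 1/6 = 1/3; with realistic timing skew it can be far rarer, and a run that never hits it gives no evidence either way.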

Improvement 1: Capture the “happens-before” graph

• Delays that result in the same “happens-before” graph are equivalent

• Avoid exploring equivalent interleavings

[Figure: timelines for Thread 1 and Thread 2 with a Sleep(5) delay inserted.]
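A Python sketch of the equivalence check, assuming (as a simplification) that each step touches exactly one synchronization variable. With program order fixed, the happens-before graph is determined by the order in which each variable's steps occur, so an explorer needs only one schedule per class. Names and the step encoding are hypothetical.

```python
from itertools import combinations


def interleavings(n1, n2):
    """Yield every interleaving of n1 steps of T1 and n2 steps of T2
    as a list of (thread, step_index) pairs."""
    n = n1 + n2
    for slots in combinations(range(n), n1):  # positions taken by T1
        chosen = set(slots)
        i = j = 0
        order = []
        for k in range(n):
            if k in chosen:
                order.append(("T1", i))
                i += 1
            else:
                order.append(("T2", j))
                j += 1
        yield order


def hb_key(order, program):
    """Canonical form of the happens-before graph, assuming each step touches
    exactly one sync variable: per variable, the order of the steps that used it.
    Program order within a thread is implied by the step indices."""
    per_var = {}
    for tid, idx in order:
        per_var.setdefault(program[tid][idx], []).append((tid, idx))
    return tuple(sorted((var, tuple(steps)) for var, steps in per_var.items()))


def count(program):
    """Return (number of interleavings, number of happens-before classes)."""
    orders = list(interleavings(len(program["T1"]), len(program["T2"])))
    return len(orders), len({hb_key(o, program) for o in orders})


print(count({"T1": ["l", "m"], "T2": ["m", "n"]}))  # (6, 2): only the order on m matters
print(count({"T1": ["l"], "T2": ["l", "l"]}))       # (3, 3): every step uses lock l
```

The second call is the running example at critical-section granularity: every step uses lock l, so no two interleavings are equivalent, whereas threads that share little collapse to very few classes.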

Improvement 2: Understand synchronization semantics

• Avoid exploring delays that are impossible

• Identify when threads can make progress

• CHESS maintains a run queue and a wait queue
  – Mimics OS scheduler state

Thread 1:
  Schedule(); Unlock(l);

Thread 2:
  Schedule(); Lock(l); t = bal;

Emulate execution on a uniprocessor

• Enable only one thread at a time

• Linearizes a partial order into a total order

• Controls the order of data races
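The effect of modeling synchronization can be sketched in Python, using a hypothetical encoding rather than the CHESS data structures: each thread is a list of acquire, body, and release steps, and the explorer schedules only threads on the run queue. A thread whose next step acquires a held lock sits on the wait queue. For the running example at this granularity, 84 interleavings exist if lock semantics are ignored, but only 3 are feasible.

```python
from math import comb


def feasible_schedules(program):
    """Enumerate schedules that respect lock semantics.

    program maps thread ids to lists of (op, lock) steps, where op is
    "acq", "rel", or "op" (an ordinary step; lock ignored). A thread whose
    next step acquires a held lock sits on the wait queue; only threads on
    the run queue are scheduled. Locks are non-reentrant in this sketch.
    """
    results = []

    def go(pcs, held, prefix):
        run_queue = []
        for tid, steps in program.items():
            if pcs[tid] == len(steps):
                continue  # finished
            op, lock = steps[pcs[tid]]
            if op == "acq" and lock in held:
                continue  # blocked: on the wait queue
            run_queue.append(tid)
        if not run_queue:
            if all(pcs[t] == len(s) for t, s in program.items()):
                results.append(tuple(prefix))
            return  # otherwise a deadlock (not reported in this sketch)
        for tid in run_queue:
            op, lock = program[tid][pcs[tid]]
            if op == "acq":
                next_held = held | {lock}
            elif op == "rel":
                next_held = held - {lock}
            else:
                next_held = held
            pcs[tid] += 1
            prefix.append(tid)
            go(pcs, next_held, prefix)
            prefix.pop()
            pcs[tid] -= 1

    go({tid: 0 for tid in program}, frozenset(), [])
    return results


cs = [("acq", "l"), ("op", None), ("rel", "l")]  # Lock(l); <body>; Unlock(l)
program = {"T1": cs, "T2": cs + cs}
print(len(feasible_schedules(program)), comb(9, 3))  # 3 84
```

Running one thread at a time over a run queue is also what "emulate execution on a uniprocessor" amounts to: each explored schedule is a single total order of steps.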

CHESS modes: speed vs. coverage

• Fast mode
  – Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
  – Finds many bugs in practice

• Data-race mode
  – Repeat:
      Find data races
      Introduce schedule points before racing memory accesses
  – Captures all sequentially consistent (SC) executions
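Data-race mode needs a race detector to decide where to add schedule points. A common way to find races is a happens-before detector built on vector clocks; the Python sketch below is such a detector over a recorded trace. The trace format and names are hypothetical, and this is not necessarily how CHESS itself detects races. Each reported pair names memory accesses that would receive schedule points in the next iteration.

```python
def find_races(trace, threads):
    """Report pairs of conflicting accesses not ordered by happens-before.

    trace is a list of (thread, op, target) with op in {"acq", "rel", "rd", "wr"}.
    Happens-before comes from program order plus release -> acquire on the
    same lock, tracked with vector clocks (fork/join edges are not modeled).
    """
    clock = {t: {u: int(u == t) for u in threads} for t in threads}
    lock_clock = {}
    accesses = []  # (thread, op, var, clock snapshot)
    races = []
    for tid, op, target in trace:
        vc = clock[tid]
        if op == "acq":  # join the last releaser's knowledge
            for u, c in lock_clock.get(target, {}).items():
                vc[u] = max(vc[u], c)
        elif op == "rel":  # publish, then start a new epoch
            lock_clock[target] = dict(vc)
            vc[tid] += 1
        else:
            for otid, oop, ovar, ovc in accesses:
                conflicting = ovar == target and otid != tid and "wr" in (op, oop)
                ordered = all(ovc[u] <= vc[u] for u in threads)
                if conflicting and not ordered:
                    races.append(((otid, oop), (tid, op), target))
            accesses.append((tid, op, target, dict(vc)))
    return races


racy = [("T1", "rd", "x"), ("T1", "wr", "x"), ("T2", "rd", "x"), ("T2", "wr", "x")]
locked = [("T1", "acq", "l"), ("T1", "rd", "x"), ("T1", "wr", "x"), ("T1", "rel", "l"),
          ("T2", "acq", "l"), ("T2", "rd", "x"), ("T2", "wr", "x"), ("T2", "rel", "l")]
print(len(find_races(racy, ["T1", "T2"])))  # 3
print(find_races(locked, ["T1", "T2"]))     # []
```

Note that the racy trace is reported even though it ran in a harmless order; a happens-before detector flags the missing ordering, which is exactly the information needed to place new schedule points.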

Capture all sources of nondeterminism? No.

• Scheduling nondeterminism? Yes

• Timing nondeterminism? Yes
  – Controls when and in what order the timers fire

• Nondeterministic system calls? Mostly
  – CHESS uses precise abstractions for many system calls

• Input nondeterminism? No
  – Rely on users to provide inputs: program inputs, files read, packets received, …
  – Good tradeoff in the short term
  – But can’t find race conditions in error-handling code


CHESS wrappers

• Translate Win32/.NET synchronizations into CHESS scheduler abstractions
  – Tasks: schedulable entities (threads, threadpool work items, async callbacks, timer functions)
  – SyncVars: resources used by tasks; generate happens-before edges during execution

• Executable specification for complex APIs
  – Most time-consuming and error-prone part of CHESS

• Enables CHESS to handle multiple platforms

http://msdn.microsoft.com/en-us/devlabs/cc950526.aspx
http://social.msdn.microsoft.com/Forums/en-US/chess/threads/

Learning from Experience: User Forum, Champions

“CHESS Doesn’t Scale”

• Hmm… we just ran CHESS on the Singularity operating system (and found bugs in the bootup/shutdown sequence)

• What they usually mean:
  – “CHESS isn’t very effective on a long-running test”
  – “There are a lot of possible schedules!”

• Time for enumerative model checking = (time to execute one test) × (# schedules)
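A worked instance of the formula, assuming n threads that each execute k atomic steps, never block, and have every interleaving explored (no preemption bound); the number of schedules is then the multinomial coefficient:

$$
\#\text{schedules} = \frac{(nk)!}{(k!)^{n}}, \qquad
n = 2,\ k = 10:\quad \frac{20!}{(10!)^{2}} = 184{,}756
$$

At an assumed 1 second per test execution, that is 184,756 s, or roughly 51 hours, for a two-thread test with only ten steps per thread.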

Find lots of bugs with 2 preemptions

Program            Lines of code   Bugs
Work Stealing Q    4K              4
CDS                6K              1
CCR                9K              3
ConcRT             16K             4
Dryad              18K             7
APE                19K             4
STM                20K             2
TPL                24K             9
PLINQ              24K             1
Singularity        175K            2
Total                              37

“CHESS Isn’t Push Button”

“The more I look at CHESS the more I realize that I could use some general guidance on how to author test code that will actually help CHESS reveal concurrency bugs.”

Daniel Stolt

Challenge -> Opportunity: New “Push button” concurrency tools

• Cuzz [ASPLOS 2010]: Concurrency Fuzzing
  – Attach to any running executable
  – Find concurrency bugs faster through smart fuzzing

• Lineup [PLDI 2010]: Automatic Linearizability Checking
  – Generate “thread-safety” tests for a class automatically
  – Use sequential behavior as oracle for concurrent behavior
  – CHESS underneath

“CHESS Doesn’t Find This Bug”

• RTFM is not helpful

• Instead, generate helpful warning messages
  – “Warning: running CHESS without race detection can miss bugs”
  – Or, turn race detection on for a few executions.

void ForkJoinTest() {
    int x = 0;
    var t1 = new Thread(() => { x = x + 1; });
    var t2 = new Thread(() => { x = x + 1; });
    t1.Start(); t2.Start();
    t1.Join(); t2.Join();
    Debug.Assert(x == 2);  // fails if both threads read x before either writes it
}
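Why race detection matters for this test can be reproduced with a Python model (hypothetical; it abstracts schedule points as boundaries between steps). Assuming, as the slide suggests, that fast mode places no schedule point inside the thread bodies, each `x = x + 1` runs atomically and every schedule gives x == 2; adding schedule points before the racing read and write exposes x == 1.

```python
from itertools import combinations


def final_values(t1, t2):
    """Run every interleaving of two step lists; return the set of final x."""
    finals = set()
    n = len(t1) + len(t2)
    for slots in combinations(range(n), len(t1)):  # positions taken by t1
        state = {"x": 0}
        steps1, steps2 = iter(t1), iter(t2)
        for k in range(n):
            step = next(steps1) if k in slots else next(steps2)
            step(state)
        finals.add(state["x"])
    return finals


# Fast mode: no synchronization inside the thread bodies, so each
# x = x + 1 runs without a schedule point and is effectively atomic.
atomic = [lambda s: s.update(x=s["x"] + 1)]


# Data-race mode: schedule points before the racing read and write.
def split(reg):
    return [lambda s: s.update({reg: s["x"]}),  # read x into a thread-local
            lambda s: s.update(x=s[reg] + 1)]   # write it back


print(final_values(atomic, atomic))            # {2}: the assertion always holds
print(final_values(split("r1"), split("r2")))  # {1, 2}: x == 1 violates the assertion
```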

“CHESS Can’t Avoid Finding Bugs”

“Solution is working and found two bug with CHESS . To get the second bug, I had to fix first bug first”

“That liveness bug is such a minor performance problem that I won’t fix it.”

Playing CHESS with George

Sealed methods            Asserts  Timeouts  Livelocks  Deadlocks  Leaks  Pass
(none)                    5        3         40         0          0      5
+TryDequeue               6        5         0          1          1      40
+WaitForTask              5        5         0          2          1      40
+Reg.Recv.+PostInternal   5        5         0          0          0      43

“CHESS is Confusing Me”

The Nondeterminism Saga: static data, lazily initialized

If replay of p.E fails, yielding p.F, then try again and see if p.F replays

Report lost coverage

Nondeterminism Junkie: Too much information

“Why does this test pass instead of say ‘Detected nondeterminism’ outside the control of CHESS?”

“Is this good behavior for CHESS to return three different results for the same code?”

“CHESS Time Isn’t Real Time”: It’s a feature, not a bug.

“The call to WaitOne(60000, false) immediately returns false, which isn’t correct. If I use WaitOne() or WaitOne(Timeout.Infinite, false) instead of WaitOne(60000, false), the WaitHandle waits till the Event is set, returns true and everything goes fine. But waiting without a timeout isn't an option in my case.”
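One way to read this behavior, sketched in Python under an explicit assumption: if the checker controls time (CHESS “controls when and in what order the timers fire”), a finite timeout becomes one more nondeterministic choice, so the schedule in which the timeout fires first is explored no matter how long the timeout is. All names here are hypothetical and model only the choice, not the .NET API.

```python
def explore_choices(program):
    """Run program(choose) once per distinct sequence of choices.

    Each call to choose(options) is a choice point; untried options are
    queued and reached later by re-executing with a longer prefix.
    """
    results, pending = [], [[]]
    while pending:
        prefix = pending.pop()
        made = []

        def choose(options):
            depth = len(made)
            if depth < len(prefix):
                pick = prefix[depth]  # replaying a queued prefix
            else:
                pick = options[0]
                for alt in options[1:]:  # remember the other branches
                    pending.append(made + [alt])
            made.append(pick)
            return pick

        results.append(program(choose))
    return results


def wait_one_scenario(timeout_ms):
    """Hypothetical model: the main thread waits on an event before a worker
    sets it. Under a controlled clock the timeout value itself is irrelevant;
    only whether a timeout exists matters."""
    def program(choose):
        options = ["event set"]
        if timeout_ms is not None:
            options.append("timeout")
        return choose(options) == "event set"  # the wait's return value
    return program


print(sorted(explore_choices(wait_one_scenario(60000))))  # [False, True]
print(explore_choices(wait_one_scenario(None)))           # [True]
```

With an infinite wait the only way forward is for the event to be set, which matches the user's observation that `WaitOne()` and `WaitOne(Timeout.Infinite, false)` return true.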

The expected: “I can’t play CHESS on”

• x64
• Multi-process programs
• Message passing, distributed systems
• The Boost library
• .NET without the CLR Profiler
• Java
• Unix…

Learning from Experience: Forums, Champions

Chris Dern, Steve Hale, Ram Natarajan, Roy Tan

“Congratulations CHESS team!!!!! I have proven outside of CHESS that the issue it is finding in our product on the 106th thread schedule looks like a valid product bug!! I wrote a quick application to launch my CHESS test outside of CHESS and by freezing/thawing threads I was able to reproduce the issue independently. This is incredibly exciting!!! Many thanks for your patience, perseverance, and CHESS bug fixes as I’ve struggled to understand CHESS.”

Steve Hale, Microsoft , 2/12/2009

[Figure: ConcurrentDictionary, ConcurrentBag, SemaphoreSlim, ManualResetEventSlim, Barrier, BlockingCollection, Task, TaskScheduler, PLINQ, Parallel.For]

“As the true value of a test is in its ability to find bugs, let’s take a look at how our CHESS tests did. Over the development cycle to date, the CHESS test found seven bugs, and was used to reproduce another seven for a total of 14, out of the 276 high priority bugs over the same time. While only 14 bugs against 276 appear sadly anemic, it’s important to dig a bit deeper. If we address each of the issues raised, would we find more bugs?”

Chris Dern, PFX_CHESS_Review_Final.docx

“Early on the adoption of CHESS, we made a fatal mistake. Perhaps it was wishful thinking on our part, or perhaps we believed too much in the marketing hype and didn’t read the fine print. We believed early on that CHESS was a turnkey solution capable of using existing tests and test approaches and ‘finding the bugs’.” C. Dern

“The schedule for any product group is always under attack. Over the life cycle of a product, features are in constant flux, with managers always balancing risk and reward. In the face of this pressure, any untried tool, methodology, or approach faces an uphill battle.” C. Dern

“For tool developers, it’s important that once you engage with a customer you help find then drive to some level of success. Finding a single bug is a priceless commodity when arguing to continue the time investment in a specific tool. Take small bites, set modest goals and drive to success. Perfect is the enemy of good, or at least good enough right now.” C. Dern

Dern’s DO’s and DON’Ts

DO NOT expect that CHESS will ‘magically’ find your bugs. CHESS is a tool, mainly focused on enumerating schedules for a given bound. While it can find specific types of concurrency bugs, e.g. deadlocks, for ‘free’, the value and benefit of CHESS come with deliberate tests.

DO develop an understanding of what properties, invariants, and behaviors your test is testing

DO run your tests. While this may seem a silly tip, it’s important to remember that CHESS enables the familiar write, run, refactor test experience for concurrent tests, which we enjoy with sequential tests today.

DO NOT add artificial spinning/busy work in the test. CHESS will explore all schedules for your specified bound. Adding busy work, like you may find in a ‘stress’ test to increase coverage, only increases the test runtime when under CHESS.

AVOID blindly converting an existing ‘stress’ style unit test into a CHESS test. The size, scale, and assertions that one tends to find in those types of tests make for a weak CHESS test at best, or an unusable CHESS test at worst.

Stepping Back from the Fray: High-level Learnings

• Proper expectation setting
• Good methodology
• Good default behavior
• Good warnings and messages
• Minimize cognitive dissonance
• Cultivate champions
• Listen to them and learn!

Three CHESS Learnings

1. If you want deterministic scheduling, with the ability to explore all schedules, without changing the underlying scheduler, then it’s hard to achieve
   – high API coverage
   – robustness

   Action: we need observable and controllable schedulers!

2. Concurrency unit testing can be effective, but requires careful planning and scoping

3. Search/reduction strategies are absolutely essential

Uplifting Message and Blatant Advertisement for LineUp Talk

“Partnerships and Collaborations
The success of the LineUp work is a perfect example of [the benefits of] an open dialog between the teams along with continual experimentation by both sides. Combining innovations from both research and product testing group, we create[d] a complete solution to one area of concurrency testing.” C. Dern
