Shared Secrets of WDF Part I

Doron Holan, Principal Architect, Device & Storage Technologies

Peter Wieland, Principal SDE, Device & Storage Technologies

Contributors

Kumar Rajeev, Senior SDE, Device & Storage Technologies

Praveen Rao, Senior SDE, Device & Storage Technologies

Agenda

• Overview of WDF

• Design Philosophies Used in WDF

• Use of Formal State Machines

• Object Based Approach

• Development Best Practices Used in WDF

• Improving Verification

• Improving Diagnosability

• Improving Maintainability

Overview of WDF

• Goal is to build a next-generation driver model that:

• Makes it easy to handle complex components such as Plug and Play, power management, and cancellation of asynchronous requests in drivers

• Meets the needs of several device classes

• Works over a range of shipping operating systems and abstracts incompatibility among various OS versions

• Has flexibility to escape out of driver model

• Supports user-mode drivers for device classes that do not need to be in kernel mode—notably USB

• Has better diagnosability with built-in tracing and verification

                                                   KMDF     UMDF
Lines of Code                                      110 K    120 K
Public Interfaces (Methods, Properties, Events)    500      218
Trace Messages                                     1600     1500
Internal Functions                                 4000     4000

Overview of WDF

Abstracted functionalities:

PnP, power management, I/O queues, DMA, I/O targets, USB targets, bus enumeration, WMI, timer, DPC, work item

Design Philosophy used in WDF

• Use of Formal State Machines

• Object Based Approach

Use of Formal State Machines

• Problem description

• Decision to use state machine

• Implementation overview

• A state machine allowed us to write code that …

• Take away

Problem Description

• Drivers have to handle inputs concurrently from many sources that are not synchronized with each other
  • Sources: PnP, system power state changes, power policy changes, user-mode I/O requests, hardware state changes, driver-initiated requests, etc.

• Many transitional states: “starting”, “powering down” (as compared to “started” or “off”). I like to call these gerund states.

• While driver is in a transitional state, another input will inevitably arrive that may force a transition to a new state
  • While powering down due to idle, an I/O request arrives that requires power
  • Murphy’s law: these transitions will execute untested code that will not work as designed nor be easily reproduced by testing

• Rules for handling the inputs are neither clear nor generic

Problem Description

• Every driver implemented its own solution
  • None were entirely correct
  • Informal state machine with many state tracking variables

• End result was a mess of fragile and unmaintainable code
  • This is the code that can easily hang or crash your machine

• Any solution had to meet the following requirements:
  • Correct, complete, and flexible
  • Compatible with existing stacks
  • Simple APIs expose complicated behaviors
  • No state-tracking required in client driver

Decision to Use State Machine

• We created an informal state diagram to capture the expected behavior of a device driver based on the team’s collective driver writing experience

• No one had done this before, and it quickly became an eye-opening exercise

• The diagram made it very clear that a formal state machine would be a huge improvement over the current procedural approach

Why Not Use a Procedural Approach?

• Scores of state variables need to be tracked

• Relationship between state variables becomes too complex and keeps growing as new rules come in

• External contract is not clearly visible. How to figure out all the rules?

• Bug fixes get difficult: a maintenance nightmare

Implementation Overview

• We implemented 3 interlocking state machines that communicated with each other
  • We could have used only one machine, but it would have been gigantic!
  • Each logical domain has its own state machine: PnP, power, power policy

• Event messages (unique integer values) are the only input into each state machine
  • Abstracting the input as an event message allowed us to feed inputs from many different sources into the machine with one processing routine
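The idea of feeding every input through one processing routine can be sketched as a small table-driven machine. This is only an illustration: the states, events, and the `Machine` class below are hypothetical names chosen for the example and are not taken from the WDF sources.

```cpp
#include <cassert>

// Hypothetical states and events; these names are not WDF's.
enum class State { Off, Starting, Started, PoweringDown };
enum class Event { Start, StartDone, Idle, IoArrived, PowerDownDone };

struct Transition {
    State from;
    Event on;
    State to;
};

// The whole machine is data: one row per legal transition.
constexpr Transition kTable[] = {
    {State::Off,          Event::Start,         State::Starting},
    {State::Starting,     Event::StartDone,     State::Started},
    {State::Started,      Event::Idle,          State::PoweringDown},
    {State::PoweringDown, Event::IoArrived,     State::Starting},
    {State::PoweringDown, Event::PowerDownDone, State::Off},
};

class Machine {
public:
    State current() const { return m_state; }

    // Single processing routine: every source posts its event here.
    // Returns false when the event has no row for the current state.
    bool process(Event e) {
        for (const Transition& t : kTable) {
            if (t.from == m_state && t.on == e) {
                m_state = t.to;
                return true;
            }
        }
        return false;
    }

private:
    State m_state = State::Off;
};

// Usage: a power-down caused by idle is interrupted by an I/O request.
inline State DemoInterruptedPowerDown() {
    Machine m;
    bool handled = m.process(Event::Start) && m.process(Event::StartDone) &&
                   m.process(Event::Idle) && m.process(Event::IoArrived);
    assert(handled);
    (void)handled;
    return m.current();
}
```

Because the transitions live in a table, the "powering down, then I/O arrives" case from the problem description is one visible row rather than a combination of flag checks.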

Diagram Walk-through

Placeholder for State Machine UML Diagram

Diagram legend:

• Resting state (note shape and color)

• Atomic state (shape and color)

• Send a message to the power policy state machine

• Error response message from power policy (note color change)

• Response message from power policy

A Picture Is Worth 1000s of Lines of Code

• Getting a sense for the entire state machine is very hard when just looking at an array

• The UML diagram is just as important as the code itself
  • The state machine diagrams should be as descriptive as the code that implements them
  • Use of color to show role and error states
  • Use of shapes and symbols to represent atomicity and properties

A state machine allowed us to write code that …

• Is simple and self-documenting. The UML diagram is a more comprehensible model to understand

• Is easy to maintain. The impact of a change is well understood, usually localized to one function

• Has external contracts which are readily apparent

• Has clear and consistent rules about how to transition from one state to another

• Is testable. We can easily account for code coverage of every single state and state transition

• Is easy to debug. The state machine history, along with the state diagram and log, helps narrow down the root cause quickly

Take Away

• State machines can be a very powerful design pattern to create code that is readable, maintainable and testable
  • The code itself is not enough. Most developers need a visual representation too

• Once you have a hammer, everything looks like a nail
  • It is easy to go overboard and think a state machine is the right answer for every problem…
  • …but it can be the right answer in more cases than you would have previously considered!
  • We have lots of state machines in WDF apart from PnP, Power, and Power Policy: object lifetime, idle timer, self-managed I/O, I/O request queue, DMA

Object Based Approach

• WDF follows an object-based approach for encapsulating data and interfaces

• There is a clear line between public and internal data for an object, and this boundary is drawn at the public API level for WDF
  • For a monolithic driver, this line can easily be drawn internally, separating protocols, I/O management, buffer manipulation, etc. For example: Microsoft’s Bluetooth port driver

• Objects are presented to the device driver through public APIs using a handle-based system

Every Object Can Have a Context

• Once we created a handle system we took a step back and said
  • “How do people use this?”
  • “What are the programming patterns?”
  • “What are developers doing on their own that we should do in the framework and formalize?”

• This resulted in the decision to add context for every object…
  • Context is an arbitrary-sized extension allocated and destroyed by the framework

• …and, to add destroy/cleanup callbacks on each object
  • Sub-allocations in the extension can be destroyed in the destroy callback, akin to a C++ destructor
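A minimal sketch of the context-plus-destroy-callback idea follows. The `Object` class, `DestroyCallback` type, and `DeviceContext` structure are hypothetical stand-ins invented for this example; they do not reproduce the framework's actual types or APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical miniature of a framework object; not the WDF implementation.
using DestroyCallback = void (*)(void* context);

class Object {
public:
    // The "framework" allocates a zeroed context of whatever size is asked for.
    Object(std::size_t contextSize, DestroyCallback onDestroy)
        : m_context(std::calloc(1, contextSize)), m_onDestroy(onDestroy) {
        assert(contextSize > 0);
    }

    // The callback runs first so the client can release sub-allocations,
    // then the "framework" frees the context memory itself.
    ~Object() {
        if (m_context != nullptr && m_onDestroy != nullptr) {
            m_onDestroy(m_context);
        }
        std::free(m_context);
    }

    Object(const Object&) = delete;
    Object& operator=(const Object&) = delete;

    void* context() const { return m_context; }

private:
    void* m_context;
    DestroyCallback m_onDestroy;
};

// Client-side context with one sub-allocation the client owns.
struct DeviceContext {
    int* scratch;
};

int g_destroyCalls = 0;  // only here so the example can be observed

void DeviceContextDestroy(void* context) {
    DeviceContext* ctx = static_cast<DeviceContext*>(context);
    delete[] ctx->scratch;
    g_destroyCalls += 1;
}
```

The client never frees the context itself; it only cleans up what it allocated inside it, which is the division of labor the slide describes.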

Abstracting Allocation and Freeing of Objects

• A common problem is managing allocation and freeing of objects

• This was abstracted in WDF using a reference-counted model
  • Objects are reference-counted by framework
  • Runtime and debugger facilities are built in to track leaked references

• A parent/child object lifetime model manages reference counts on objects

• Each framework object defines its lifetime as to when it is valid
  • Framework objects are built upon WDM objects that have their own rules
  • Example: A WDFREQUEST can exist after WdfRequestComplete is called. The internal WDM IRP is gone, and request object DDIs then fail.
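The parent/child model combined with reference counting can be sketched as below. `FrameworkObject` is a hypothetical class written for this example, and the sketch deliberately ignores one case a real framework must handle: deleting a child directly before its parent.

```cpp
#include <cassert>
#include <vector>

int g_liveObjects = 0;  // only here so the example can be observed

// Hypothetical object: created with one reference, optionally parented.
class FrameworkObject {
public:
    explicit FrameworkObject(FrameworkObject* parent = nullptr) {
        g_liveObjects += 1;
        if (parent != nullptr) {
            parent->m_children.push_back(this);
        }
    }

    void AddRef() { m_refs += 1; }

    void Release() {
        assert(m_refs > 0);
        m_refs -= 1;
        if (m_refs == 0) {
            delete this;
        }
    }

    // Deleting a parent drops the creation reference of every descendant,
    // then its own. An object with extra references outlives the call.
    void Delete() {
        for (FrameworkObject* child : m_children) {
            child->Delete();
        }
        m_children.clear();
        Release();
    }

private:
    ~FrameworkObject() { g_liveObjects -= 1; }

    int m_refs = 1;
    std::vector<FrameworkObject*> m_children;
};
```

A client that parents its objects correctly only deletes the root; an object that something else still references stays valid until that last reference is released.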

Take Away

• Asking questions like, “What are developers doing on their own that we should do in the framework and formalize?” helped us make design decisions

• By having parent-child relationships, we are able to simplify object lifetime management

• Some of the common problems such as memory rundown are eliminated with this approach. Drivers don’t have to worry about deleting a WDF object if it is parented correctly

• Come up with a solution, but be flexible so that it can be useful

• Allowing every object to have its own context memory simplifies code while allowing flexibility in using objects

Development Best Practices Used in WDF

Development Best Practices

• Improving Verification

• Client driver verification

• Built-in WDF Verifier, method input verification

• Internal framework code verification

• Compile-time and run-time verification

• Verification using code analysis tools (SAL, PFD)

• Improving Diagnosability
  • Tracing
  • Debugger support

• Improving Maintainability
  • Coding style
  • Utilizing language support
  • Team processes

Built-in Client Driver Verifier

• KMDF has client driver verification built-in as part of design, not as afterthought

• Verifier provides verification of client driver (catches driver mistakes as opposed to internal framework mistakes)

• Interface entry and exit verification

• Verification of transitions and conditions (compared to Driver Verifier, which provides mainly entry and exit verification)

• Object handle verification

• Pool and reference tracking

• Verifier has been of immense help in finding driver bugs

API Input Verification

• API input is validated at entry. Some of these are always on, while others are on when verifier is enabled

• Validate that all non-optional pointer parameters are non-NULL

• Validate structure parameters for correct size

• Validate IRQL (when verifier is enabled)

• If WDF_OBJECT_ATTRIBUTES is present: validate size, parenting rules, context information, execution level, and synchronization scope rules

• Validate method-specific requirements

Runtime Internal Verification

• ASSERT macro is useful for run-time validation. It is used extensively in WDF for validating assumptions

• Better to use ASSERTMSG instead of ASSERT. It allows specifying a message that is helpful during debugging

• One limitation is that ASSERT macros work only on checked build

• Having a trace message along with ASSERT is helpful so that at least a trace message is available for diagnosis on free builds (trace message is present in log dump)

• Lock order verifier prevents deadlocks due to incorrect lock order

• Pool tracking detects memory leaks of framework-allocated memory (not those that driver allocates using ExAllocatePool…)
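The advice to pair each ASSERT with a trace message can be sketched as follows. The helper `VerifyWithTrace`, the `g_traceLog` vector, and the `CHECKED_BUILD` symbol are assumptions made for this example; WDF's own tracing and assertion macros are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for an always-on, in-memory trace log.
std::vector<std::string> g_traceLog;

inline void Trace(const std::string& message) {
    g_traceLog.push_back(message);
}

// Hypothetical helper: the trace is recorded on every build flavor, while
// the assert fires only when CHECKED_BUILD is defined. Returns the condition
// so callers can still take an error path on free builds.
inline bool VerifyWithTrace(bool condition, const char* message) {
    if (!condition) {
        Trace(message);
#ifdef CHECKED_BUILD
        assert(false && "verification failed; see trace log");
#endif
    }
    return condition;
}
```

On a free build the failure leaves a message behind for later diagnosis instead of disappearing along with the compiled-out assertion.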

Compile-time Internal Verification

• C_ASSERT macro is useful for compile-time validation

• WDF uses compile-time asserts extensively for validating structure size changes across versions

• Convert runtime asserts to compile-time asserts if possible

• Code is always compiled with W4 compiler option enabled
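Converting a runtime check into a compile-time one can be illustrated with standard C++ `static_assert`, used here as a stand-in for the C_ASSERT macro. The enumeration and name table are hypothetical examples, not framework definitions.

```cpp
#include <cassert>

// Hypothetical enumeration with a trailing count member.
enum PowerState { PowerD0, PowerD1, PowerD2, PowerD3, PowerStateCount };

// Lookup table that must stay in step with the enumeration.
constexpr const char* kPowerStateNames[] = {"D0", "D1", "D2", "D3"};

// Compile-time check: adding an enumerator without a name breaks the build
// instead of failing at run time on some rarely exercised path.
static_assert(sizeof(kPowerStateNames) / sizeof(kPowerStateNames[0]) == PowerStateCount,
              "kPowerStateNames must have one entry per PowerState");

inline const char* PowerStateName(PowerState state) {
    // This part cannot move to compile time because the value arrives at
    // run time, so it stays a runtime assert.
    assert(state >= PowerD0 && state < PowerStateCount);
    return kPowerStateNames[state];
}
```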

Verification Using Code Analysis Tools

• WDF uses SAL and PFD annotations extensively

• All public and private KMDF functions have SAL annotation (~500 external and ~4000 internal functions)

• Many public and private functions have PFD annotations

• Several bugs were found as a result of these annotations

• Start early. It is a good idea to annotate from very beginning

• The __checkReturn annotation is very useful in catching code paths that do not check the return status after a function call
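The effect of a check-the-return-value annotation can be approximated in portable code with the standard C++17 `[[nodiscard]]` attribute. This is a swapped-in technique for illustration, not SAL, and the functions below are hypothetical.

```cpp
#include <cassert>

enum class Status { Success, InsufficientResources };

// [[nodiscard]] asks the compiler to warn when a caller drops the result.
[[nodiscard]] Status ReserveSlots(int requested, int available) {
    if (requested > available) {
        return Status::InsufficientResources;
    }
    return Status::Success;
}

Status Configure(int requested) {
    assert(requested >= 0);

    // ReserveSlots(requested, 4);  // ignoring the result would draw a warning

    Status status = ReserveSlots(requested, 4);
    if (status != Status::Success) {
        return status;  // the failure is propagated, not silently dropped
    }
    return Status::Success;
}
```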

Tracing

• Always-on tracing mechanism

• Extensive use of trace messages

• Every error path has a trace message (~1600 traces)

• Traces are informative enough that developers, QA, support, and customers do not need to look at source code for diagnostics

• Messages are outward-facing and comprehensible to outside users

• Messages are continuously added and improved based on feedback

• Crash dumps contain trace log, very helpful for debugging dumps

• WDF trace mechanism has been very useful in improving diagnosability and is being adopted by other feature teams internally in Microsoft

Debugging Support

• WDF provides extensive debugging support

• Debugger extension available for almost every WDF object

• Varying levels of information for many commands

• Once in debugger, it is easy to examine driver state using a series of commands

• For example: the WDF log and state machine histories can both be viewed in debugger

• Debugger output is very user friendly

• Provides hints both for using other commands in the output and for finding errors

Improving Maintainability

• WDF utilizes tools for auto-generating public headers and function jump tables

• Function jump tables are very helpful for separating public and private components with less maintenance headache

• Auto-generating headers makes it easier to control and separate out public and private versions of header files

• Auto-generating tools have been of immense help in improving maintainability

• Tools automatically generate build-time versioning checks for structure and enumeration changes

API Naming Convention

• API naming convention followed by WDF improves readability and maintainability

• Intuitive and descriptive names are easy for people to read and comprehend. For example, WdfRequestCancelSentRequest vs. IoCancelIrp (in WDM)

• Names follow Object–Action pattern e.g. WdfDeviceCreate

• Names and parameters follow consistent patterns

• Length of name does not matter as long as name is intuitive and descriptive

• Failable vs. non-failable interfaces are differentiated in name

• Get/Set interfaces never fail, Assign/Retrieve interfaces can fail

Coding Style

• A coding style was put into place to ensure a consistent code base created by numerous developers.
  • We made compromises in the overall style to accommodate individuals

• WDF has numerous rules:
  • Use descriptive variable and function names (not averse to vowels)
  • No double negatives in program statements
  • Explicit logical checks e.g. if (DeviceExtension != NULL) {
  • Not averse to using goto keyword if it improves clarity
  • Beware of the difference in use of #if and #ifdef
  • Use of increment (++) and decrement (--) operators discouraged

Coding Style

• Write code for next person who will have to maintain it

• Simple rules, such as prefixing a static method name with an underscore to differentiate it from non-static methods, help in the long run

• Personal discipline is important in maintaining the coding style.

• Peer/team code reviews are also useful in flushing out style issues

• Consistent coding style improves maintainability because it is very easy to read each file and have consistent expectations

“Let us change our traditional attitude to the construction of programs. Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.” Donald E. Knuth, Literate Programming, 1984

Utilizing Language Features

• WDF uses C++
  • As a better C with V-table support
  • For better compile-time checks and type-safety

• WDF does not use many other C++ features such as templates, operator overloading (except new and delete), smart pointers etc.

• Use an enum instead of a #define
  • Provides strong typing
  • Values can be seen in debugger since enums are symbols and as such, improve diagnosability

• Use FORCEINLINE instead of macros
  • WDF uses FORCEINLINEs even if they could be macros because type safety without any side-effect was considered more important

• Macros are used only when some preprocessor magic is needed
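The side-effect hazard that motivates inline functions over macros, and the typing benefit of enums, can be shown in a few lines. Plain `inline` stands in for FORCEINLINE here, and every name in the block is hypothetical.

```cpp
#include <cassert>

// Macro version: the winning argument is evaluated twice.
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

// Inline version: each argument is evaluated exactly once and type-checked.
inline int MaxInline(int a, int b) {
    return (a > b) ? a : b;
}

int g_calls = 0;  // counts how often NextValue runs

int NextValue() {
    g_calls += 1;
    return g_calls;
}

// A scoped enum instead of #define constants: the compiler rejects an
// unrelated integer where an IoType is expected, and a debugger can show
// the enumerator name rather than a bare number.
enum class IoType { Read, Write };

inline const char* IoTypeName(IoType type) {
    switch (type) {
    case IoType::Read:
        return "Read";
    case IoType::Write:
        return "Write";
    }
    assert(false && "unknown IoType");
    return "Unknown";
}
```

Calling `MAX_MACRO(NextValue(), 0)` runs `NextValue` twice, while `MaxInline(NextValue(), 0)` runs it once; this is the kind of surprise the "type safety without any side-effect" bullet refers to.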

Miscellaneous

• All public structure types are versioned with a ULONG Size as the first field

• Verification tools ensure that if a new member has been added to a structure, the sizeof(struct) will increase
  • Without the check a new field can be added under the right circumstances and without increasing the size due to compiler padding in the struct

• Every public structure type has at least one INIT macro to initialize it
  • WDF_OBJECT_ATTRIBUTES_INIT initializes a WDF_OBJECT_ATTRIBUTES structure
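The Size-first versioning pattern, its INIT helper, and the matching validation at API entry can be sketched as follows. `WIDGET_CONFIG`, `WIDGET_CONFIG_INIT`, `WidgetCreate`, and the `Status` values are hypothetical names made up for this example; an inline function is used where the framework would provide an INIT macro.

```cpp
#include <cassert>

// Hypothetical public structure as shipped in version 1.
struct WIDGET_CONFIG_V1 {
    unsigned long Size;  // always the first field
    unsigned long Timeout;
};

// Current version: a field was appended. Older clients still pass V1.
struct WIDGET_CONFIG {
    unsigned long Size;
    unsigned long Timeout;
    unsigned long RetryCount;
};

// Build-time versioning check: a new field must make the structure grow,
// otherwise the two versions cannot be told apart by Size.
static_assert(sizeof(WIDGET_CONFIG) > sizeof(WIDGET_CONFIG_V1),
              "adding a field must increase the structure size");

// INIT helper: zeroes the structure and stamps the size the client was
// compiled against.
inline void WIDGET_CONFIG_INIT(WIDGET_CONFIG* config) {
    assert(config != nullptr);
    *config = WIDGET_CONFIG{};
    config->Size = sizeof(WIDGET_CONFIG);
}

enum class Status { Success, InvalidParameter, InfoLengthMismatch };

// Entry validation: non-NULL pointer, then a size that matches a known version.
Status WidgetCreate(const WIDGET_CONFIG* config) {
    if (config == nullptr) {
        return Status::InvalidParameter;
    }
    if (config->Size != sizeof(WIDGET_CONFIG) &&
        config->Size != sizeof(WIDGET_CONFIG_V1)) {
        return Status::InfoLengthMismatch;
    }
    return Status::Success;
}
```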

Development Team Processes

• Every developer maintains a unit test for his feature. This is different from tests owned by QA team

• Every fix is unit-tested by developer before check in. This helps maintain the code quality from the very beginning

• Every bug fix is reviewed by the team for check in

• Most of the code quality and logic issues are flushed out during code reviews

Take Away

• Identify problems early on and code based on that

• State machines and a context on every WDF handle

• Incorporate verification as part of design instead of as afterthought

• Use various mechanisms for compile-time and runtime validation along with code analysis tools. Annotate code early on!

• Consider improving diagnosability by adding WPP traces at least to every failure code path

• Analyze long term cost of maintenance, especially with versioning

• Follow team processes that help maintain code quality from very beginning, e.g. unit tests, code reviews

Additional Resources

• Web Resources
  • White papers: http://www.microsoft.com/whdc/driver/wdf/default.mspx
  • Channel 9 talk: http://channel9.msdn.com/ShowPost.aspx?PostID=226116

• Blogs
  • http://blogs.msdn.com/doronh/default.aspx (A Hole In My Head)
  • http://blogs.msdn.com/peterwie/default.aspx (Pointless Blathering)
  • http://blogs.msdn.com/iliast/default.aspx (driver writing != bus driving)

• Newsgroups and Lists
  • Microsoft.public.device.development.drivers
  • OSR NTDev Mailing List

• Book: Developing Drivers with the Windows Driver Foundation
  • http://www.microsoft.com/MSPress/books/10512.aspx

WDF DDC 2008 Sessions

• Ask the Experts Table

• Panel Discussion

Session                                                               Day / Time
Shared Secrets about Windows Driver Framework: Part 1                 Wed. 8:30-9:30
Shared Secrets about Windows Driver Framework: Part 2                 Mon. 4-5 and Wed. 9:45-10:45
Getting a Logo for your Windows Driver Foundation Driver              Mon. 4-5 and Tues. 2:45-3:45
Using WinDBG to Debug Kernel-Mode Windows Driver Framework Drivers    Mon. 2:45-3:45 and Wed. 11-12
Using Kernel-Mode Driver Framework in Miniport Drivers                Mon. 4-5
Packaging and Deploying KMDF and UMDF Drivers                         Tues. 4-5 and Wed. 8:30-9:30
Exploring a KMDF Storage Driver: Parts 1 and 2                        Tues. 9:45-12:00
What’s New in Windows Driver Framework                                Mon. 8:30-9:30 and Wed. 9:45-10:45
Discussion: Windows Driver Framework                                  Wed. 1:30-2:30
Ask the Experts Table                                                 Tues. evening

APPENDIX

State Machine Developer Design Rules

• Everything is asynchronous
  • The result of an operation is always modeled as an additional input into the machine

• Use the state in the machine itself to track state (no state tracking variables)

• All changes start and end with the state diagram, with implementation in between
  • Entirely discipline based
  • I could not find a tool which could roundtrip from code to Visio

• Do not take shortcuts
  • We tried to and it always backfired in the end

State Machine Debug and Triage

• Ability to debug is paramount
  • Keep a history of N previous states
  • Record all important runtime decisions in a log
  • Extensive debugger extensions to dump history, logs, current state

• Be as paranoid as possible
  • Mark each state with the set of unhandled events it can ignore
  • Simplifies each state’s entry in the table and in the diagram
  • If an unhandled event arrives that is not in the list, break in immediately
  • When adding an event to the can-be-ignored list, a comment is required to describe why it can be ignored
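Keeping a history of the last N states can be done with a small ring buffer, sketched below. The depth of 4, the `State` values, and the `StateHistory` class are assumptions chosen for the example rather than details from WDF.

```cpp
#include <cassert>
#include <cstddef>

enum class State { Off, Starting, Started };

constexpr std::size_t kHistoryDepth = 4;  // the "N" in "N previous states"

// Fixed-size ring buffer: recording never allocates and never fails, so it
// is safe to call on every transition.
class StateHistory {
public:
    void Record(State state) {
        m_entries[m_next % kHistoryDepth] = state;
        m_next += 1;
    }

    std::size_t Count() const {
        return (m_next < kHistoryDepth) ? m_next : kHistoryDepth;
    }

    // index 0 is the most recently recorded state, 1 the one before, etc.
    State FromNewest(std::size_t index) const {
        assert(index < Count());
        return m_entries[(m_next - 1 - index) % kHistoryDepth];
    }

private:
    State m_entries[kHistoryDepth] = {};
    std::size_t m_next = 0;
};
```

A debugger extension or crash-dump tool would walk this buffer from newest to oldest to show how the machine arrived at its current state.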

State Machine Implementation

• Litmus tests for adding a new state machine
  • Exponential growth in states when adding a new input or set of functionality
  • Different implementations of the same abstract interface. We needed 2 implementations for power policy, one that was the policy owner and one that was not
  • Sharing state between 2 existing machines. Instead of directly sharing state, use a new machine to arbitrate between them

Verification: When to break or bugcheck?

• Choice of breaking into debugger or causing bugcheck for an error found during verification

• Causing bugcheck for simple driver error is not right as it hampers development

• Cause bugcheck when error can compromise the system, otherwise break into debugger

Utilizing Language Features

• WDF uses RAII pattern at several places to simplify resource management

For example, a function opens a registry key and stores the handle in an RAII class/struct. The function does not have to worry about closing the key handle: the destructor closes it when the object goes out of scope at the end of the function.

struct FxAutoRegKey {
public:
    HANDLE m_Key;

    FxAutoRegKey() { m_Key = NULL; }

    ~FxAutoRegKey() {
        if (m_Key != NULL) {
            FxRegKey::_Close(m_Key);
        }
    }
};
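How such a wrapper is used on a function with several exit paths can be sketched with stand-in types. `Handle`, `OpenKey`, `CloseKey`, `AutoKey`, and `ReadSetting` are hypothetical substitutes for the real registry handle and `FxRegKey::_Close`, chosen so the example compiles on its own.

```cpp
#include <cassert>

using Handle = int;     // stand-in for HANDLE
int g_openHandles = 0;  // only here so the example can be observed

Handle OpenKey() {
    g_openHandles += 1;
    return 42;  // any non-zero value means "open"
}

void CloseKey(Handle /*key*/) {
    assert(g_openHandles > 0);
    g_openHandles -= 1;
}

// Same shape as the struct above: holds the handle and closes it on scope exit.
struct AutoKey {
    Handle m_Key = 0;

    AutoKey() = default;
    ~AutoKey() {
        if (m_Key != 0) {
            CloseKey(m_Key);
        }
    }

    // Copying would close the same handle twice, so it is disallowed.
    AutoKey(const AutoKey&) = delete;
    AutoKey& operator=(const AutoKey&) = delete;
};

bool ReadSetting(bool failEarly) {
    AutoKey key;
    key.m_Key = OpenKey();

    if (failEarly) {
        return false;  // the key is closed here by the destructor
    }
    return true;       // and here, with no explicit cleanup on either path
}
```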