towards the property-based testing of an l4 microkernel apiceur-ws.org/vol-1431/paper5.pdf ·...

Towards the Property-Based Testing of an L4Microkernel API

Cosmin DragomirFaculty of Automatic Control and Computers

University POLITEHNICA of BucharestSplaiul Independentei nr. 313Sector 6, Bucuresti, [email protected]

Lucian MogosanuFaculty of Automatic Control and Computers

University POLITEHNICA of BucharestSplaiul Independentei nr. 313Sector 6, Bucuresti, [email protected]

Mihai CarabasFaculty of Automatic Control and Computers

University POLITEHNICA of BucharestSplaiul Independentei nr. 313Sector 6, Bucuresti, Romania

[email protected]

Razvan DeaconescuFaculty of Automatic Control and Computers


[email protected]

Nicolae TapusFaculty of Automatic Control and Computers


[email protected]

Software testing has been a significant part of the software development process for the last 30 years and isgaining even more importance with the increasing complexity of software products. As each application hasits own requirements, multiple software testing methodologies exist. It is the decision of the developers tochoose the best suited types of testing methodologies for their product. This paper presents the design andimplementation of a property-based testing framework. Unlike traditional testing methods this methodologyuses the formal specification of the API to automatically generate the input and validate the output. Theframework will be used to test the API of an L4 microkernel (called VMXL4); VMXL4 possesses the constraintsof an embedded environment and of an ongoing development of a stateful system.

Property-Based Testing, L4 Microkernel, API

1. INTRODUCTION

The software industry has been constantly growingin the last decades and the liability and robustness ofthe software products must match their requirementsin order to remain competitive. To obtain a stableproduct, the entire software stack must be reliable.Therefore software testing must be done at eachlayer of the software stack, starting with the lowestlevel: the operating system.

Multiple software testing methodologies are in usenowadays, each of them targeting a degree oftest cases coverage and test writing complexity.Alongside the well known unit testing method,another functional methodology named property-based testing has gained ground among softwaredevelopers. It uses the concept of “tests as

specification”, in which tests are written to covermost of the specification.

Writing a large number of tests for the samespecification implies a sizable effort from thedevelopers. Property-based testing mitigates thisby automaticallly generating the input and creatinggeneral and abstract tests known as properties.Those can be similar to unit tests, except for the wayinput is generated and output is validated.

This paper presents an user space frameworknamed QC that is based on an open source basic im-plementation of a property-based testing frameworkimplemented by Pennebaker (2012). Although thewell-known related released frameworks are writtenin functional programming languages, QC is writtenin C due to the VMXL4 native environment support.

©

• • • •

Porting a new language environment would havemeant a sizable and unnecessary effort.

The QC framework solves issues commonly encoun-tered in unit testing by using properties and testingthose properties on a large number of randomlygenerated inputs and by maintaining the internalstate of the system. As a downside, QC introducesthe problem of formalizing the specification.

The paper is organised as follows: Section 2 isintroducing the concepts necessary to understandthe paper. Section 3 briefly presents the existingsimilar frameworks. Section 4 discusses the designand implementation of QC and Section 5 itsevaluation in comparison with the existing testingmethods for the API of an L4 microkernel, VMXL4.Lastly, in Section 6 we present the overview of thepaper and future work.

2. BACKGROUND

This section presents the basic concepts requiredto understand the next sections of the paper. Itstarts with the overview and importance of softwaretesting, followed by the essential concepts of unittesting and property-based testing. We show thepower of property-based testing in contrast with unittesting. We also do a short introduction of the testingenvironment, describing the VMXL4 microkernel.

2.1. Software Testing

Nowadays the number of different programming lan-guages, hardware platforms and software librariesis increasing. Requirements, for both specificationsand performance, are also more rigorous as timepasses by. As a consequence, software applicationsare becoming more complex and bugs are intro-duced in new software at all levels. As a countermea-sure software testing has gained more importanceand attention from programmers.

It is important that applications and services can bestateful or stateless. In a stateful system, an internalstate is being held and some of the actions have sideeffects which might change the state. The design ofa stateful software system can be modeled usinga finite-state machine and a formal specification ofthe inputs for every possible state. The summarizingdifference between stateful and stateless systemsis that for the first one the output depends on theinput and possibly on the internal state, whereas forthe second one the output always depends only onthe input. A well known example of stateful versusstateless are the TCP and UDP transport layerprotocols from the TCP/IP stack, where TCP createsand maintains a connection between the client andthe server and UDP does not.

At a higher level, software testing can be definedas the process of executing a part or the entireapplication in order to find errors or to evaluate thequality of the user experience. Any moderately sizedapplication has flaws, but finding them is a complexactivity. It is usually unfeasible to do an exhaustivetesting on stateful systems, since the number ofdifferent possibilities for the input values associatedwith the existing internal system state is too large.Even for some stateless applications this testingwould imply a sizable effort. We conclude that it ismore resource and time demanding to test a statefulsystem instead of a stateless system.

There are multiple methodologies in software testingwhich must be used in various steps of thedevelopment process. In this paper we insist onlyon those that can be split in two major categories:functional and nonfunctional testing.

Functional testing verifies the client or designspecifications by testing the system functionality:checking if the program operations and featuresbehave as they should. In summary it is used toensure that the application does not have bugs.There are two categories of functional testing:positive and negative. Positive testing is done usingvalid inputs and comparing the actual output with theexpected output, whereas negative testing is doneby supplying the system functionalities with invalidor unexpected inputs and operations. In the case ofnegative functional testing, usually the system mustnot behave nondeterministically and rather informthe user of the input error.

On the other side, nonfunctional testing is concernedwith the user experience, including tests forperformance, security, availability, usability. Usingthis type of testing, one can measure and comparethe results in different situations and cover the blanksleft by functional testing. For a competitive softwareproduct, developers must test the program usingboth functional and nonfunctional methodologies.

2.2. Unit Testing

One of the most used and successful softwaretesting methodologies is unit testing (Binder (2000);Hunt et al. (2004); Osherove (2010)). It is centeredon the concept of unit of work, meaning a single,invokable, logical and functional use case of thesystem.

Unit testing is composed of a suite of tests thatcan be run anytime during the development cycle totest certain functions, logic and capabilities of thecode. Each test uses a predefined set of inputs,runs a functionality of the system and comparesthe output with the desired output. If the outputs

• • • •

differ, the tested functionality has at least one bug.Although unit tests usually verify only one smallfeature, sometimes it is not easy to find the bug,due to the black-box nature of the methodology. Thismeans that unit testing does not use the internalstructure of the functionality, but only the higher levelinvoked part, and therefore the bug may be in theinternal logic and further debugging must be done.One big advantage of unit testing is its reusablenature. Even if the internal logic changes, usually therequirements of the function remains the same andthe old test can still be used.

Although unit testing is very useful, it has a veryimportant flaw, which may leave hard detectablebugs in the system. Using unit tests a programmeronly verifies a small finite set of inputs and forevery different input set he must write another test.Therefore, using unit test programmers cannot evenget close to an exhaustive testing. In addition tonot finding possible subtle bugs, the work of theprogrammer is hardened by thinking of all the cornercases and writing more code for them. In the end,these drawbacks are less important than the benefitsof unit testing: because it gives good results inpractice, it is very used and every major languagehas frameworks for this testing methodology.

2.3. Property-Based Testing

A software testing methodology which addressesthe problems left by unit testing is property-basedtesting (Fink and Bishop (1997); Fernandez et al.(2004); Machado et al. (2007)). Its main advantageis that it covers substantially more test cases thanunit testing; moreover, it can do exhaustive testingthough the time required to do this is not directlyproportional to the benefits. This is done usingonly one generic test, named property, and somefunctions to generate the inputs, named generators.

A property is the replacement of a unit testand is run multiple times with different automatedgenerated inputs. Every running of the propertygenerates a different test. The input of the propertydepends on the type and domain specified bythe programmer; because the input is generic,the validation conditions of the test must be alsogeneric, following a formal specification. The name“property” comes from the type of test validation,where each test result must pass a general formalproperty. In layman’s terms a property validationcondition specifies in a generic way how the testedfunctionality should behave. A drawback of property-based testing is that the formal specification does notusually exist and the programmer must infer it fromthe business and logic specifications.

A generator is a callback that does random datageneration at each running of the property. Becausecomplicated non-basic data types may be needed, aproperty-based testing tool must allow user definedgenerators. In order to achieve a good test coverage,the generated data must be uniformly distributedacross its domain.

To show the differences between unit testing andproperty-based testing, let’s assume one would wantto test a function named getMax, a function whichreturns the maximum of two integer values. In aunit test he would hard-code two values and testif the output equals to the maximum value. If hewants to test multiple cases, possibly corner cases,then multiple tests need to be written. A pseudocodeimplementation of the unit test is shown in Listing 1.

max = getMax (2 , 5)asser t (max == 5)

Listing 1: Pseudocode for getMax unit testing

Using property-based testing, a pseudocode imple-mentation would be the one from Listing 2.

a = generator ( )b = generator ( )max = getMax ( a , b )asser t ( ( max == a | | max == b ) &&

(max >= a && max >= b ) )

Listing 2: Pseudocode for getMax property-basedtesting

As shown above, the property is generic and morepowerful than the unit test. However, the validationcondition is bigger and must be correctly determinedby the test writer, otherwise the property may givefalse positives or, even worse, false acceptances.

The quality of the automatic testing tool may beimproved by reducing the number of failing testcase inputs in order to obtain the minimum set ofinputs determining a given failure, a method knownas shrinking. This has the advantage of improvingthe debugging process by providing the programmerwith minimal necessary information for debugging;another advantage is that the overhead of thismethod is not significant.

All in all, property-based testing has the benefits ofunit testing and some advantages over it: bigger testcoverage, improved specification completeness andit is easier to maintain because of the reduced codesize, as illustrated by Nilsson (2014).

2.4. VMXL4

VMXL4 is a general purpose, high performanceL4 microkernel (Liedtke (1995)), developed in

Figure 1: VMXL4 testing infrastructure

partnership with VirtualMetrix, Inc1. It providesmechanisms for performance management and aminimal layer of hardware abstraction on whichvirtualized operating systems personalities can bebuilt. Using the VMXL4 API, trust and securitymodels can be implemented. Examples of systemsbuilt using VMXL4 are given in Carabas et al. (2014),Manea et al. (2015) and Mogosanu et al. (2015).

An L4 microkernel was chosen due to the fact thatthe L4 API’s formalization was proven to be feasibleby Kolanski and Klein (2006). The seL4 microkernelis the first operating system kernel to be fully formallyspecified and verified, as shown by Elkaduwe et al.(2008) and Klein et al. (2009). Furthermore, otherimplementations have been proposed for formallyverifiable L4 microkernels (Kauer and Volp (2005)).The property-based testing approach proposed bythis paper is in some respects similar to previouswork, as it also relies on a formal specification.The most important difference between the twoapproaches is that property-based testing is moreefficient in terms of development resources, asopposed a full mathematical refinement proof, whichmay require a significant number of man-months tobe implemented.

Figure 1 shows the architecture of the currentVMXL4 testing infrastructure. The L4 microkernelruns in the privileged CPU mode commonly knownas kernel space, while the tests run as userspace applications. The testing infrastructure isimplemented using support libraries, but the testthemselves call the L4 API directly in order tovalidate it functionally.

Currently most API tests are following unit testingprinciples, so test coverage is not nearly as exten-sive. However, microkernels are stateful systems,some of their core mechanisms being strongly cou-pled. As a result, the current testing framework doesnot employ true unit tests and only partially validatesthe interaction between components.1http://www.virtualmetrix.com/

We propose that the QC testing frameworkpresented in Section 4 use the same testing design,with the addendum that additional support librariesmay be needed, e.g. in order to generate randomnumbers. This converges with our goal to providea POSIX compliant native environment based onVMXL4.

3. FRAMEWORKS FOR PROPERTY-BASEDTESTING

The idea of a property-based testing framework isnot new. Previous frameworks have been developed,but the most successful are for functional languages,due to some of their distinctive features: higher orderprogramming, which is very useful for propertiesand data generators, lack of side effects, time ofdevelopment. Moreover, functional programming fitsbetter for random testing than imperative program-ming because it uses fine-grained properties. Thissection presents an overview of three of the mostinfluential existing frameworks and of some opensource projects.

Haskell QuickCheckHaskell QuickCheck2 is the first well knownframework for property-based testing and futureframeworks were inspired by it. QuickCheck is atool which automates testing for Haskell programs.As shown in Claessen and Hughes (2002, 2011), itdoes this by defining a formal specification language,which is powerful enough to represent commonforms of specifications: algebraic, model-based andpreconditional or postconditional. QuickCheck usescombinators to define properties and test datagenerators and obtain the test generated datadistribution. An important feature of the frameworkis the shrinking of the generated data when a testfails, to give the minimum input which still fails theproperty.

Erlang QuickCheckThe programmers who developed HaskellQuickCheck saw the bigger commercial opportunityoffered by Erlang and developed a new versionof the framework3, which has its specifications inErlang. Linking specification in Erlang to code undertest in other languages is easier than in Haskell.Two very important distinctive features of the Erlangversion are the ability to test stateful systemsby using state machine testing and the ability togenerate and run parallel test cases in order to findrace conditions (Arts and Hughes (2003)).2https://hackage.haskell.org/package/QuickCheck3http://www.quviq.com/products/erlang-quickcheck/

• • • •

ScalaCheckScalaCheck4 is the third main framework used forproperty-based testing and is used for automatedrandomized property-based testing of programsdeveloped in Scala or Java (Odersky (2010)). Itwas inspired by Haskell QuickCheck and implementsmost of its features, but also some in addition to itspredecessor, such as stateful testing. Nilsson (2014)provides a comprehensive guide to ScalaCheck.

Open Source InitiativesDue to the success of Haskell QuickCheck, opensource implementations in most major programminglanguages were started, such as C, C++, Java,Python, but with less features and success. QC wasinspired by one of those open source initiatives,employed by Pennebaker (2012).

4. QC DESIGN AND IMPLEMENTATION

This section presents the design and implementationof the QC framework and the motivation behindit. QC intends to test the L4 microkernel API ina functional manner, following the property-basedtesting methodology. It may be used alongside unittests, because it tries to generalize them, but on along term it may strive to replace unit testing for theVMXL4 microkernel API.

4.1. Implementation Starting Point

The implementation is based on the open sourceproject developed by Pennebaker (2012). It is a basicframework, supporting only two features: randomdata generation and one property per test, whichwas run for a predefined number of times. A partof the implementation is not really usable, becausethe programmer who uses the framework must knowthe size of the generated types and create testsaccordingly, which is error-prone. The only partwhich we partially used is the test data generationcomponent.

The framework is implemented in the C program-ming language because it was the most convenientoption taking into consideration the testing environ-ment. Porting a new language environment can bevery complicated, because the native environmentoffered by a microkernel is very low-level. Moreover,implementing a POSIX environment is equivalentwith the implementation of an entire operating sys-tem. Also, because the API had already been writtenin C, there is no need for further linking betweendifferent languages.4https://www.scalacheck.org/

4.2. General Design

Because a kernel is a stateful system and QC isdeveloped to test the L4 microkernel API, beingevaluated on VMXL4, two more concepts used byQC have to be introduced. Preconditions are a set ofpredicates that must be true prior to the execution ofa property and postconditions are a set of predicatesthat must be true after the execution of an action inthe property. If all the preconditions of a property aretrue, then the property is applicable, otherwise it isnot. If all the postconditions of a property are true,than the property has passed.

Due to the fact that QC is designed for a statefulsystem, it uses tests containing multiple propertiesthat are used as actions with side effects inthe stateful system. Therefore QC borrows someelements from integration testing, a methodologyin which individual software modules are testedtogether. Each test consists of at least one property,randomly generated from the available properties.Each property has a finite number of argumentswith known data types at compile time, a factthat provides the opportunity to use property-basedtesting. When at least one of the postconditionsof a property has failed, then the test fails, theentire generic test completes and the actions takenduring the test are printed alongside their input. Thesecond situation in which a test fails is when its statebecomes inconsistent, meaning that no property hasall of its preconditions passing and therefore nofuture action can be taken. When a test fails, theprogrammer sees all the randomly generated dataused by the test and this facilitates easier debugging.

Properties are divided in two categories: normalproperties and cleanup properties. Normal prop-erties are placeholders for actions that the sys-tem may take anytime, provided the preconditionsare satisfied. Cleanup properties are used to endtests and to free allocated resources. Every testmust have at least one cleanup property. If a testdoes not have allocated resources, QC provides theempty clean property macro for an empty property,whose only purpose is to end the test. Althoughmost of the time only one cleanup property willbe used for a generic test, having multiple cleanupproperties may be useful in some situations. Onecan use fine-grained cleanup properties if the systemcan have many internal states. This makes the codecleaner and in a system with many generic tests, theprobability to reuse cleanup properties is bigger ifthey are fine-grained.

A higher level design of QC is shown in Figure 2. Theprogrammer must call QC with an array of normalproperties and an array of cleanup properties, aspreviously discussed. To generate random input and

Figure 2: QC design

randomly pick properties, QC uses a seed. When atest fails, the programmer will want to reproduce theexact same sequence of properties and inputs to testthe fix for the bug. Because the random generation isdeterministic given the seed, QC shows it for everygeneric test so the programmer can use that seedif he wants to reproduce the tests. Otherwise, QCgenerates a random seed to assure random testsand a good test coverage.

QC will generate a fixed number of tests, previouslygiven by the user. Another available option is theverbosity level for generic tests. The user can seethe sequence used by every test or only by failingtests. Viewing the sequence used by every test canbe useful to improve the generic test and its testcoverage.

There may be cases when tests will end prematurely,after using only a few properties, because QC maychoose and use a cleanup property whenever itspreconditions are satisfied. To mitigate this, the userchooses the minimum number of properties thatmust be used during a test.

The last parameter from Figure 2 represents the typeof statistics shown at the end of the generic test. QCsupports two types of statistics for every generic test:user defined and automatically generated. The usercan choose to see both categories, only one or none.

4.3. Generators

As previously mentioned, generators are thecallbacks that randomly produce the input data.

A QC generator consists of 2 callbacks: one togenerate the data and the other to print thedata, as the C printf function needs differentformats depending on the printed type. Because ofthat, the QC equivalent of a generator is struct

generator printer.

QC offers a set of predefined generators for Cbasic data types, such as int and char, and alsofor bool, string (stored as a char dynamic array)and generic arrays. Moreover, a user can write hisown generators or printers and use them for hisproperties.

In order to generate arrays, only the basic typegenerator is needed, because QC offers a wrapperwhich automatically generates new array types. Thearray type can have fixed or random size in a givenrange, depending on the user needs. All generatedarrays are dynamically allocated and their memoryis freed after their associated property ends. Thisavoids out of memory errors for big tests with manygenerated arrays, but can introduce subtle bugs if theuser forgets to copy the content of the array in casehe needs it after the property ends.

Sometimes basic types may need additional featuressuch as a maximum or minimum value. QC offers twosolutions for this. The first one is that miscellaneousparameters can be added to the generators, inorder to modify the generated value to match therequirements. The second solution is to change andupdate the generated values from the property code.Both solutions are acceptable for code readability,but in general cases the first option should be used,because it’s reusable and only the parameters will bechanged.

All generated data for a property is stored in adynamically allocated array with the size in bytesequal to the maximum number of arguments of aproperty multiplied by the maximum size in bytes ofan argument type. This approach solves the problemwith the variable number of property arguments andtheir different types. The value of the generated datais obtained in the property and the user must onlyknow the data type and the index of the argument,something that he had already defined in the statemachine test specification.

4.4. Statistics

In order to measure various metrics, QC offersthe possibility to attach user defined statistics to aproperty. After a property ends successfully, each ofits statistics callbacks is called and the metrics areupdated. This can help the user to investigate bugsand also measure the test coverage. An example ofthis logging category is shown in Listing 7.

• • • •

By default, QC logs statistics regarding propertiesand their sequence. For every property, QC logs thenumber of total calls, the number of starting testproperty calls and for every property how many timesit followed the current property. An example of thistype of logging is shown in Listing 8. The defaultlogging done by QC can be very useful to detectpreconditions bugs and see if the tests are surfacingthe desired states.

The user decides if he wants to see any of the twostatistics category when the QC framework is calledto generate and validate tests.

4.5. Properties, Preconditions andPostconditions

Since QC supports stateful system testing, usinga property requires the following steps: testingpreconditions, getting the values for the randomlygenerated data, applying actions and testingpostconditions.

The preconditions are optional but if they aremissing, the property can be always chosen by theframework as the next part of the test. Preconditionsare implemented as a callback, differently fromthe property callback. Because the preconditionsdepend only on the internal state, not on thegenerated data, it is better to obtain the newdata only if the property can be applied, avoidinggeneration of useless data, which will be replacedafterwards. Therefore, preconditions must be testedbefore the data generation step and this can beachieved by having another callback, associated witha property. This approach has another benefit: somepreconditions are used by multiple properties andhaving them as functions gives better reusability. Ifa property does not have any preconditions, theircallback will be NULL. To address any possible usage,preconditions can be used from inside the propertytoo, but this is not a good practice, as explainedabove.

Actions are the main content of properties, becausethey change the state of the system and their sideeffects are verified by the postconditions. Actionscan be interleaved with their postconditions, whichare obtained from the formal specification of theAPI. As opposed to preconditions, postconditionsare located in their corresponding property callback,because they depend on the generated data and wedo not obtain a performance improvement if we havethem in separate callbacks. Moreover, if the propertycontains multiple actions, then it is recommended tocheck the postconditions for an action or a group ofactions as soon as possible, in order to have goodcode readability.

Figure 3: Property info callbacks

As can be seen in Figure 3, QC properties arecomposed of multiple callbacks, stored in a structurenamed property info. We need an array of generatorsfor the property input and another array of userdefined statistics to gather various data. On theother side, we need a callback for preconditionsand a callback for the property itself, to make theAPI calls and test the postconditions. Having allof those callbacks, different components of generictests become easier to integrate with each other.

struct p r o p e r t y i n f o {/∗ ca l l back f o r the proper ty ∗ /prop f u n c t i o n ;

/∗ d i s p l a y i n g name ∗ /char const ∗ const name ;

/∗ ar ray o f generators ∗ /struct g e n e r a t o r p r i n t e r ∗gp ;

/∗ number o f generators ∗ /i n t gp s ize ;

/∗ precond i t i ons ca l l back ∗ /p recond i t i on prec ;

/∗ ar ray o f s t a t i s t i c s ca l l backs ∗ /struct u s e r s t a t i s t i c ∗ s t a t s ;

/∗ number o f s t a t i s t i c s ca l l backs ∗ /i n t s t a t s s i z e ;

} ;Listing 3: Struct property info

In QC’s implementation, property descriptions andcallbacks are contained by the property info

structure, as shown in Listing 3. In addition to whatwas previously discussed, the name field assigns a

• • • •

descriptive name to the property and is used for theverbosity option QC INFO or for failing tests.

4.6. QC test logic

Having all the previously discussed elements, QCcan generate and run tests. Figure 4 shows a statemachine with the actions taken by QC during a test.Until a test fails or the required number of tests havebeen run, the framework tries to falsify the generictest by finding a failing test.

Starting a test, QC chooses a property, checksits preconditions (should they exist) and, based onthe result, picks another property or continues withthe current one. If the preconditions have passed,QC generates test data, executes the actions andthe postconditions are checked. If any of thepostconditions fails, the testing is over, because QCfalsified a sequence of properties. Otherwise, thestatistics are updated and if the property is fromthe cleanup group, the current test completes andanother test starts; if the property is from the normalgroup, the test continues and another property ischosen.

4.7. Pseudorandom number generator

QC has a random module which currently supportstwo implementations: the POSIX rand functionand the Mersenne Twister PRNG. The defaultimplementation is Mersenne Twister (presentedin Matsumoto and Nishimura (1998)), because itprovides better data distribution than rand andalways has the same output for a given seed, on32 bits, in contrast with rand, whose result may varydepending on the architecture.

4.8. VMXL4 Influence over QC

Internally QC uses a seed for randomizing thetest data generation and the chosen property atevery step of a test. Because the VMXL4 nativeenvironment is under ongoing development andsome POSIX functions (e.g. rand) are not yetimplemented, inside the testing environment theseed is actually a numerical value obtained froma hardware timer provided by the developmentplatform. However, the framework does not dependon a specific platform and is portable, requiring onlyPOSIX functionality.

The VMXL4 API is currently being tested usingthe Check Unit Testing Framework for C. In orderto be as easy as possible to use and becauseunit tests usually need little changes to becomeproperties, the QC interface has been designed tohave some similarities with the Check framework.For that reason, postconditions can be tested using

prop fail if and prop fail unless, wrapperssimilar to Check’s fail if and fail unless.

5. QC EVALUATION AND TESTING

This section describes evaluation, results andimplications, while the framework is still underdevelopment. To evaluate the performance of the QCframework, its impact on the test coverage and codesize will be detailed.

/∗ normal p r o p e r t i e s ∗ /struct p r o p e r t y i n f o p [ ] = {

{ i n i t p r o p e r t y , ” i n i t ” ,( gp ar ray ){ q c u i n t } , 1 ,q i s n o t i n i t i a l i z e d ,( s t a t s a r r a y ){ queue s i ze s ta t } , 1} ,

{dequeue property , ” dequeue ” ,( gp ar ray ){} , 0 ,q i s i n i t i a l i z e d ,NULL, 0} ,

{enqueue property , ” enqueue ” ,( gp ar ray ){ q c i n t } , 1 ,q i s i n i t i a l i z e d ,( s t a t s a r r a y ){ e lemen t s i gn s ta t } , 1}

} ;

struct proper ty group normal group = {. prop = p , . s i ze = 3

} ;

/∗ cleanup p r o p e r t i e s ∗ /struct p r o p e r t y i n f o c lean p [ ] = {

{ c l ea r p rope r t y , ” c l ea r ” ,( gp ar ray ){} , 0 ,q i s i n i t i a l i z e d ,NULL, 0}

} ;

struct proper ty group clean group = {. prop = clean p , . s i ze = 1

} ;

q c f o r a l l (/∗ proper ty groups ∗ /normal group , clean group ,/∗ minimum p r o p e r t i e s ∗ /5 ,/∗ v e r b o s i t y l e v e l ∗ /QC ERROR,/∗ number o f t e s t s ∗ /1000 ,/∗ s t a t i s t i c s l e v e l ∗ /QC SHOW STATS

) ;

Listing 4: QC initialization and calling to test a circularqueue library API

Figure 4: Test state machine

QC has been tested so far on two modules. Thefirst module is an implementation of a circularqueue, which has been chosen for the followingthree reasons: it is easier to validate new featuresof the framework with a simpler module, it is astateful system, with an internal representation forthe queue, and it is a portable module which canbe used to validate QC against Haskell QuickCheck.The second module is the thread scheduling moduleof VMXL4, currently being tested with unit tests usingthe Check framework.

For the circular queue, the code from Listing 4 wasused to initialize QC in order to test the queuelibrary public API. It can be observed that the codeuses normal properties and cleanup properties, asmentioned in Section 4. With just four properties,similar to unit tests, the framework automaticallygenerates 1000 test cases with a different numberof properties and different sequences of properties,each with automatically generated different inputs.As one can see, there is not much difference in thecode logic complexity for unit testing and property-based testing, but the benefits of property-based aresignificant. When different bugs were introduced onpurpose in the queuing logic, QC detected all ofthem, using only the code from Listing 4 and its fourproperties.

An example of QC finding a bug for the circularqueue is Listing 5. It displays, in order, all theproperties taken during the test and their generatedinput. Judging by the output, it is most likely that

there are problems with the enqueue operation whenthe queue gets full. This is one of the corner caseswhich the programmer should have taken care ofpersonally if he were to use unit testing.

∗∗∗ Test Fa i led ! ∗∗∗Test number 43−−−−−−−−−−−−−−i n i t : 2dequeue :enqueue : −392470180enqueue : −692402

Listing 5: QC failing test

After solving the bug, QC validates the implementa-tion in Listing 6.

+++ Success : passed 1000 t e s t s . +++Listing 6: QC tests passing

For the generic test from Listing 4, the generateduser defined statistics are shown in Listing 7. QCdisplays the total number of statistics for everyproperty. For the init property, we wanted to seein what range is the queue size. For the enqueue

property, we wanted to see the sign of the enqueuednumber. It can be observed that the numbers areshowing a balance for the generated data.

−−−TESTS USER DEFINED STATISTICS−−−

” i n i t ” INFOt o t a l s t a t i s t i c s : 1

• • • •

name : ” queue size ”<20: 201<40: 201<60: 202<80: 203<100: 193

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−” enqueue ” INFOt o t a l s t a t i s t i c s : 1

name : ” e lement s ign ”negat ive : 1457p o s i t i v e : 1423

Listing 7: User defined statistics

Last but not least, for the generic test from Listing 4,the QC default statistics are shown in Listing 8. Forevery property, QC displays the total number of calls,how many times it was the first property of the testand afterwards for every property how many times itfollowed the current property. Note that for cleanupproperties QC doesn’t show the following properties,because the cleanup property will be the last fromthe test. Generally, QC default statistics are usefulnot only to balance the tests, but also to debugpreconditions.

−−−TESTS PROPERTY SEQUENCE STATISTICS−−−

” i n i t ” INFOt o t a l c a l l s : 1000s t a r t i n g c a l l s : 1000

i n i t : 0dequeue : 527enqueue : 473c lea r : 0

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−” dequeue ” INFOt o t a l c a l l s : 3003s t a r t i n g c a l l s : 0


−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−” enqueue ” INFOt o t a l c a l l s : 2880s t a r t i n g c a l l s : 0


+++++++++++++++++++++++++++++++++++++++++” c l ea r ” INFOt o t a l c a l l s : 1000s t a r t i n g c a l l s : 0

Listing 8: QC statistics

When porting some of the unit tests for the VMXL4thread scheduling module to QC properties, theVMXL4 API Reference Manual was needed tounderstand the behavior of tested functions and toinfer the formal specification, which, in the end, is nota sizable effort for a programmer who is accustomedto the design of the module. For some of the portedunit tests the specification was very simple and theircontent remained almost the same.

Although only a few unit tests were ported toQC properties, the framework already found oneinconsistency in the unit test. The faulty unit testwas verifying if two threads with different prioritiesare scheduled accordingly; however on symmetricmultiprocessing (SMP) the validation condition wasalways true. The test would have passed even if thesystem had a bug.

The inconsistency was found after transforming theunit test into a property and using the same wrongspecification. The property was failing, thereforeonly two causes could have been possible: theproperty was wrong or the module had a bug.Fortunately, the first case was true and the unit testwas the cause. QC found the inconsistency usingits random generation feature. This emphasizesthat unit tests are not very reliable compared toproperties, because usually they do not take intoconsideration many test cases, therefore they mayhide system bugs or even test design bugs.

6. CONCLUSIONS AND FURTHER WORK

Every software system needs testing in order to fulfillits business requirements and, as a consequence,be reliable and successful. This paper concentrateson property-based testing, because although it ismore powerful than unit testing, due to its biggerinput coverage, it is used less frequently than unittesting. In order to emphasize the property-basedtesting applicability and importance, the paper givesan overview of the QC framework.

QC is an automated testing tool written in C whichruns in the native environment of an L4 microkerneland whose purpose is to test the microkernel API ina functional manner. Because the microkernel is astateful system, the framework allows the testing ofmultiple controlled series of operations, besides theusage of random generated input. In order to obtaina thorough testing, QC offers support for generatingany data type, using predefined generators whichcan be combined to obtain new test data generators.To test and evaluate the framework, the nativeenvironment of the VMXL4 microkernel is used.

• • • •

As future work, QC failing tests will be shrinkedto a more suggestive failing test, to ease the workof the debugging programmer. Additionally, we aimto analyze QC’s code coverage, compare it to thatof other L4 testing infrastructures and find ways toimprove it.

REFERENCES

T. Arts and J. Hughes. Erlang/quickcheck. In NinthInternational Erlang/OTP User Conference, 2003.

R. Binder. Testing object-oriented systems: models,patterns, and tools. Addison-Wesley Professional,2000.

M. Carabas, L. Mogosanu, R. Deaconescu, L. Ghe-orghe, and N. Tapus. Lightweight display virtual-ization for mobile devices. In Secure Internet ofThings (SIoT), 2014 International Workshop on,pages 18–25. IEEE, 2014.

K. Claessen and J. Hughes. Testing Monadic Codewith QuickCheck. http://www.cs.tufts.edu/

~nr/cs257/archive/john-hughes/quick.pdf,2002.

K. Claessen and J. Hughes. Quickcheck: alightweight tool for random testing of haskellprograms. Acm sigplan notices, 46(4):53–64,2011.

D. Elkaduwe, G. Klein, and K. Elphinstone. Verifiedprotection model of the sel4 microkernel. InVerified Software: Theories, Tools, Experiments,pages 99–114. Springer, 2008.

J.-C. Fernandez, L. Mounier, and C. Pachon.Property oriented test case generation. In FormalApproaches to Software Testing, pages 147–163.Springer, 2004.

G. Fink and M. Bishop. Property-Based Testing; ANew Approach to Testing for Assurance. http:

//citeseerx.ist.psu.edu/viewdoc/download?

doi=10.1.1.93.5559&rep=rep1&type=pdf, 1997.

A. Hunt, D. Thomas, and P. Programmers. Pragmaticunit testing in Java with JUnit. PragmaticBookshelf, 2004.

B. Kauer and M. Volp. L4. sec preliminarymicrokernel reference manual. 2005.

G. Klein, K. Elphinstone, G. Heiser, J. Andronick,D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt,R. Kolanski, M. Norrish, et al. sel4: Formalverification of an os kernel. In Proceedings ofthe ACM SIGOPS 22nd symposium on Operatingsystems principles, pages 207–220. ACM, 2009.

R. Kolanski and G. Klein. Formalising the l4microkernel api. In Proceedings of the TwelfthComputing: The Australasian Theory Symposium-Volume 51, pages 53–68. Australian ComputerSociety, Inc., 2006.

J. Liedtke. On micro-kernel construction, volume 29.ACM, 1995.

P. D. Machado, D. A. Silva, and A. C. Mota. Towardsproperty oriented testing. Electronic Notes inTheoretical Computer Science, 184:3–19, 2007.

V. Manea, M. Carabas, L. Mogosanu, and L. Ghe-orghe. Native runtime environment for internetof things. In Advanced Computational Meth-ods for Knowledge Engineering, pages 381–390.Springer, 2015.

M. Matsumoto and T. Nishimura. Mersennetwister: a 623-dimensionally equidistributed uni-form pseudo-random number generator. ACMTransactions on Modeling and Computer Simula-tion (TOMACS), 8(1):3–30, 1998.

L. Mogosanu, M. Carabas, R. Deaconescu, L. Ghe-orghe, and V. G. Voiculescu. VMXHAL: A VersatileVirtualization Framework for Embedded Systems,2015.

R. Nilsson. ScalaCheck: The Definitive Guide, 2014.

M. Odersky. Contracts for scala. In RuntimeVerification, pages 51–57. Springer, 2010.

R. Osherove. The art of unit testing. mitp, 2010.

A. Pennebaker. A C port of the QuickCheck unittest framework. https://github.com/mcandre/

qc, 2012.

towards the property-based testing of an l4 microkernel apiceur-ws.org/vol-1431/paper5.pdf ·...

Documents