probabilistic model checking of dtmc models of user activity …muffy/papers/qest_2014.pdf · 2014....

Probabilistic Model Checking of DTMC Modelsof User Activity Patterns

Oana Andrei1, Muffy Calder1, Matthew Higgs1, and Mark Girolami2

1 School of Computing Science, University of Glasgow, G12 8RZ, UK2 Department of Statistics, University of Warwick, CV4 7AL, UK

Abstract. Software developers cannot always anticipate how users willactually use their software as it may vary from user to user, and evenfrom use to use for an individual user. In order to address questions raisedby system developers and evaluators about software usage, we define newprobabilistic models that characterise user behaviour, based on activitypatterns inferred from actual logged user traces. We encode these newmodels in a probabilistic model checker and use probabilistic temporallogics to gain insight into software usage. We motivate and illustrate ourapproach by application to the logged user traces of an iOS app.

1 Introduction

Software developers cannot always anticipate how users will actually use theirsoftware, which is sometimes surprising and varies from user to user, and evenfrom use to use, for an individual user. We propose that temporal logic reasoningover formal, probabilistic models of actual logged user traces can aid softwaredevelopers, evaluators, and users by: providing insights into application usage,including differences and similarities between different users, predicting user be-haviours, and recommending future application development.

Our approach is based on systematic and automated logging and reasoningabout users of applications. While this paper is focused on mobile applications(apps), much of our work applies to any software system. A logged user traceis a chronological sequence of in-app actions representing how the user exploresthe app. From logged user traces of a population of users we infer activity pa-tterns, represented each by a Discrete-Time Markov Chain (DTMC), and foreach user we infer a user strategy over the activity patterns. For each user wededuce a meta model based on the set of all activity patterns inferred from thepopulation of users and the user strategy, and we call it the user metamodel.We reason about the user metamodel using probabilistic temporal logic proper-ties to express hypotheses about user behaviours and relationships within andbetween the activity patterns, and to formulate app-specific questions posed bydevelopers and evaluators.

We motivate and illustrate our approach by application to the mobile, mul-tiplayer game Hungry Yoshi [1], which was deployed in 2009 for iPhone devicesand has involved thousands of users worldwide. We collaborate with the Hun-gry Yoshi developers on several mobile apps and we have access to all logged

2 Andrei et al.

user data. We have chosen the Hungry Yoshi app because its functionality isrelatively simple, yet adequate to illustrate how formal analysis can inform appevaluation and development.

The main contributions of the paper are:

– a formal and systematic approach to formal user activity analysis in a pro-babilistic setting,

– inference of user activity patterns represented as DTMCs,– definition of the DTMC user metamodel and guidelines for inferring user

metamodels from logged user data,– encoding of the user metamodel in the PRISM model checker and temporal

logic properties defined over both states and activity patterns as atomicpropositions,

– illustration with a case study of a deployed app with thousands of users andanalysis results that reveal insights into real-life app usage.

The paper is organised as follows. In the next section we give an overviewof the Hungry Yoshi app, which we use to motivate and illustrate our work.We list some example questions that have been posed by the Hungry Yoshidevelopers and evaluators; while these are specific to the Hungry Yoshi app,they are also indicative questions for any app. In Sect. 3 we give backgroundtechnical definitions concerning DTMCs and probabilistic temporal logics. InSect. 4 we define inference of user activity patterns, giving a small example asillustration and some example results for Hungry Yoshi. In Sect. 5 we definethe user metamodel, we illustrate it for Hungry Yoshi and we give an encodingfor the PRISM model checker. In Sect. 6 we consider how to encode some ofthe questions posed in Sect. 2.1 in probabilistic temporal logic, and give someresults for an example Hungry Yoshi user metamodel. In Sect. 7 we reflect uponthe results obtained for Hungry Yoshi and some further issues raised by ourapproach. In Sect. 8 we review related work and we conclude in Sect. 9.

2 Running Example: Hungry Yoshi

The mobile, multiplayer game Hungry Yoshi [1] is based on picking pieces offruit and feeding them to creatures called yoshis. Players’ mobile devices reg-ularly scan the available WiFi access points and display a password-protectednetwork as a yoshi and a non-protected network as a fruit plantation. Planta-tions grow particular types of fruit (possibly from seeds) and yoshis ask playersfor particular types of fruit. Players score points if they pick the fruit from thecorrect plantations, store them in a basket, and give them to yoshis as requested.There is further functionality, but here we concentrate on the key user-initiatedevents, or button taps, which are: see a yoshi, see a plantation, pick fruit andfeed a yoshi. The external environment (as scanned by device), combined withuser choice, determines when yoshis and plantations can be observed. The gamewas instrumented by the developers using the SGLog data logging infrastruc-ture [2], which streams logs of specific user system operations back to servers on

Probabilistic Model Checking of DTMC Models of User Activity Patterns 3

(a) Main menu (b) Yoshi Zoe (c) Newquay plantation

Fig. 1: Hungry Yoshi screenshots: two plantations (Newquay hill and Zielona valley)and two yoshis (Zoe and Taner) are observed. The main menu shows the availableplantations and yoshis with their respective content and required types of fruit. Thecurrent basket contains one orange seed, one apricot and one apple.

the developing site as user traces. The developers specify directly in the sourcecode what method calls or contextual information are to be logged by SGLog.A sample of screenshots from the game is shown in Fig. 1.

2.1 Example Questions from Developers and Evaluators

Key to our formal analysis is suitable hypotheses, or questions, about user be-haviour. For Hungry Yoshi, we interviewed the developers and evaluators of thegame to obtain questions that would provide useful insights for them. Interest-ingly most of their hypotheses were app-specific, and so we focus on these here,and then indicate how each could be generalised. We note that to date, toolsavailable to the developers and evaluators for analysis include only SQL andiPython stats scripts.

1. When a yoshi has been fed n pieces of fruit (which results in extra pointswhen done without interruption for n equal 5), did the user interleave pickfruit and feed a yoshi n times or did the user perform n pick fruit eventsfollowed by n feed a yoshi events? And afterwards, did he/she continue withthat pick-feed strategy or change to another one? Which strategy is morelikely in which activity pattern? More generally, when there are several waysto reach a goal state, does the user always take a particular route and is thisdependent on the activity pattern?

2. If a user in one activity pattern does not feed a yoshi within n button taps,but then changes to another activity pattern, is the user then likely to feeda yoshi within m button taps? More generally, which events cause a changeof activity pattern, and which events follow that change of activity pattern?

4 Andrei et al.

3. What kind of user tries to pick fruit 6 times in a row (a basket can only hold5 pieces of fruit)? More generally, in which activity pattern is a user morelikely to perform an inappropriate event?

4. If a user reads the instructions once, then does that user reach a goal statein fewer steps than a user who does not read the instructions at all? (Thusindicating the instructions are of some utility.) More generally, if a userperforms a given event, then it is more likely that he/she will perform anothergiven event, within n button taps, than users that have not performed thefirst event? Is this affected by the activity pattern?

3 Technical Background

We assume familiarity with Discrete-Time Markov Chains, probabilistic logicsPCTL and PCTL*, and model checking [3, 4]; basic definitions are below.

A discrete-time Markov chain (DTMC) is a tuple D = (S, s,P, L) where:S is a set of states; s ∈ S is the initial state; P : S × S → [0, 1] is the tran-sition probability function (or matrix) such that for all states s ∈ S we have∑

s′∈S P(s, s′) = 1; and L : S → 2AP is a labelling function associating to eachstate s in S a set of valid atomic propositions from a set AP . A path (or execution)of a DTMC is a non-empty sequence s0s1 . . . where si ∈ S and P(si, si+1) > 0for all i ≥ 0. A path can be finite or infinite. Let PathD(s) denote the set of allinfinite paths of D starting in state s.

Probabilistic Computation Tree Logic (PCTL) [3] and its extension PCTL*allow one to express a probability measure of the satisfaction of a temporalproperty. Their syntax is the following:

State formulae Φ ::= true | a | ¬Φ | Φ ∧ Φ | P./ p[Ψ ]PCTL Path formulae Ψ ::= XΦ | ΦU≤n Φ

PCTL* Path formulae Ψ ::= Φ | Ψ ∧ Ψ | ¬Ψ | XΨ | ΨU≤n Ψ

where a ranges over a set of atomic propositions AP , ./∈ {≤, <,≥, >}, p ∈ [0, 1],and n ∈ N ∪ {∞}.

A state s in a DTMC D satisfies an atomic proposition a if a ∈ L(s). Astate s satisfies a state formula P./ p[Ψ ], written s |= P./ p[Ψ ], if the probabilityof taking a path starting from s and satisfying Ψ meets the bound ./ p, i.e.,Prs{ω ∈ PathD(s) | ω |= Ψ} ./ p, where Prs is the probability measure definedover paths from state s. The path formula XΦ is true on a path starting withs if Φ is satisfied in the state following s; Φ1 U≤n Φ2 is true on a path if Φ2

holds in the state at some time step i ≤ n and at all preceding states Φ1 holds.This is a minimal set of operators, the propositional operators false, disjunctionand implication can be derived using basic logical equivalences and a commonderived path operators is the eventually operator F where F≤n Φ ≡ true U≤n Φ.If n = ∞ then superscripts omitted. We assume the following two additionalnotations. Let ϕ denote the state formulae from the propositional logic fragmentof PCTL, i.e., ϕ ::= true | a | ¬ϕ | ϕ ∧ ϕ, where a ∈ AP . Let D|ϕ denote theDTMC obtained from D by restricting the set of states to those satisfying ϕ.


Many of the properties we will examine require PCTL*, because we wantto examine sequences of events: this requires multiple occurrences of a boundeduntil operator. This is not fully implemented in the current version of PRISM(only a single bounded U is permitted3) and so we combine probabilities obtainedfrom PRISM filtered properties to achieve the same result. Filtered probabilitiescheck for properties that hold from sets of states satisfying given propositions.For a DTMC D, we define the filtered probability of taking all paths that startfrom any state satisfying ϕ and satisfy (PCTL) ψ by:

ProbDfilter(ϕ)(ψ)def= filters∈D,s|=ϕPrs{ω ∈ PathD(s) | ω |= ψ}

where filter is an operator on the probabilities of ψ for all the states satisfyingϕ. In the examples illustrated in this paper we always use state as the filteroperator since ϕ uniquely identifies a state.

4 Inferring User Activity Patterns

The role of inference is to construct a representation of the data that is amenableto checking probabilistic temporal logic properties. Developers want to be ableto select a user and explore that user’s model. While this could be achieved byconstructing an independent DTMC for each user, there is much to be gainedfrom sharing information between users. One way to do this is to constructa set of user classes based on attribute information, and to learn a DTMCfor each class. This is the approach taken in [5] for users interacting with webapplications, and is a natural way to aggregate information over users and tocondition user-models on attribute values. One issue with this approach is thatit assumes within-class use to be homogeneous. For example, all users in thesame city using the same browser are modelled using the same DTMC.

In this work we take a different approach to inference. We have found thecommon representations of context - such as location, operating system, or timeof day - to be poor predictors of mobile application use. For this reason weconstruct user models based on the log information alone, without any ad-hocspecification of user classes. By letting the data speak for itself, we hope to un-cover interesting activity patterns and meaningful representations of users.

4.1 Statistical Model and Inference

We extend the standard DTMC model by introducing a strategy for each userover activity patterns. More formally, we assume there exists a finite K numberof activity patterns, each modelled by a DTMC denoted αk = (S, ιinit,Pk, L),for k = 1, . . . ,K. Note only the transition probability Pk varies over the set ofDTMCs, all the other parameters are the same. For some enumeration of usersm = 1, . . . ,M , we represent a user’s strategy by a vector θm such that θm(k)denotes the probability that user m transitions according to αk.3 Because currently the LTL-to-automata translator that PRISM uses does not support

bounded properties.

6 Andrei et al.

Statistical model. The data for each user is assumed to be generated in thefollowing way. We assume all users to be independent and all DTMCs to beavailable to all users at all points in time. A user chooses an initial state accordingto ιinit . When in state s ∈ S, user m selects the kth DTMC with probabilityθm(k). If the user chooses the kth DTMC, then they transition from state sto s′ ∈ S with probability Pk(s, s′). This simple description specifies all theprobabilistic dependencies required to compute the likelihood of the data giventhe parameters of the model. While it is possible to extend the model so θ isstate-dependent, this will require us to either lose the distributed representationof the user population, or to increase the number of parameters in a way thatleads to a high combinatorial degree of complexity.

Inference. Inference is performed by maximising the log-likelihood of the dataover the parameters of the model. This cannot be done analytically and we usea numerical method: the expectation-maximisation (EM) algorithm of [6]. ForK > 1, the log-likelihood has multiple maxima and we restart the algorithmmultiple times and select the output parameters with the highest log-likelihoodover all runs. For the data considered here, restarting the algorithm 1000 timeswas sufficient to reproduce the same output parameters.

4.2 Example Activity Patterns from Hungry Yoshi

In Fig. 2 we give the activity patterns inferred from a dataset of user tracesfor 164 users randomly selected from the user population, for K = 2. A moredetailed overview is given in the work-in-progress paper [7]. For brevity, wedo not include the exact values of P1 and P2, but thicker arcs correspond totransition probabilities greater than 0.1, thinner ones to transition probabilitiesin [0.01, 0.1], and dashed ones to transition probabilities smaller than 10−12.Intuitively, we can see that given the game is essentially about seeing yoshis andfeeding them, α1 looks like a better way for playing the game. For example inα2 it is quite rare to reach feed from seeY and seeP, and also rare to move fromseeP to pick. Hungry Yoshi is a simple app with only two distinctive activitypatterns, in a more complex setting we might not be able to have any intuitionabout the activity patterns.

5 User Metamodel

We define the formal model of the behaviour of a user m with respect to thepopulation of users, which we call the user metamodel (UMM). The UMM foruser m is a DTMC obtained by “flattening” the transition model over statesand strategies. The resulting DTMC describes how the user transitions betweencomposite states of the form (s, k) where s is an observable state and k indicatesthe activity pattern at that time. The UMM can be defined formally in thefollowing way.


seeY seeP

feed pick

α1

seeY seeP

feed pick

α2

Fig. 2: Two user activity patterns α1 and α2 inferred from Hungry Yoshi usage.

Definition 5.1 (User Metamodel). Given K activity patterns α1, . . . , αK

and θm the strategy of user m for choosing activity patterns, the user metamodelfor m is a DTMC M = (SM, ιinit

M ,PM,LM) where:

– SM = S × {1, . . . ,K},– ιinitM (s, k) = θm(k) · ιinit(s),

– PM((s, k), (s′, k′)) = θm(k′) ·Pk′(s, s′),– LM(s, k) = L(s) ∪ {α = k}.

We label each state (s, k) with the atomic proposition α = k to denote that thestate belongs to the activity pattern αk.

5.1 Example UMM from Hungry Yoshi

An intuitive graphical description of the UMM for the Hungry Yoshi game forK = 2 is illustrated in Fig. 3. For example, θm(1) is the probability that userm continues with activity pattern α1, i.e. takes a transition between states inα1. The probability that the user changes the activity pattern and makes atransition according to α2 is proportional to θm(2). Figure 3 is not a directrepresentation of the transition probability matrix of the UMM DTMC, but itillustrates how that matrix is derived from the matrices of the individual useractivity patterns. Note that the activity patterns have the same sets of states.For instance, in the Hungry Yoshi example, consider we are in state seeY withα1; we can move to state feed following the same pattern α1 with the probabilityθm(1) · P1(seeY , feed), or we can change the activity pattern and move to statefeed following α2 with the probability θm(2) · P2(seeY , feed).

5.2 Encoding a UMM in PRISM

We use the probabilistic model checker PRISM [8]. We assume some familiaritywith the modelling language (based on the language of reactive modules), whichincludes global variables, modules with local variables, labelled-commands cor-responding to transitions and multiway synchronisation of modules. Below we

8 Andrei et al.

θm(1) θm(2)

θm(1)

θm(2)seeY seeP

feed pick

α1

seeY seeP

feed pick

α2

Fig. 3: An intuitive view of computing the transition probability matrix of the usermetamodel for the Hungry Yoshi app.

illustrate the PRISM encoding of the UMM for user m, where K is the numberof activity patterns, n is the number of states in each activity pattern αk.

module UserMetamodel ms : [0 .. n] init 0;k : [0 ..K] init 0;

[] (s = 0) −→ θm(1) ∗ ιinit(1) : (s′ = 1) & (k′ = 1) + . . .+θm(K) ∗ ιinit(n) : (s′ = n) & (k′ = K);

[] (s = 1) −→ θm(1) ∗P1(1, 1) : (s′ = 1) & (k′ = 1) + . . .+θm(K) ∗PK(1, n) : (s′ = n) & (k′ = K);

...[] (s = n) −→ θm(1) ∗P1(n, 1) : (s′ = 1) & (k′ = 1) + . . .+

θm(K) ∗PK(n, n) : (s′ = n) & (k′ = K);endmodule

The representation is straightforward, consisting of one module with (n + 1)commands for all n states of any activity pattern and for one initial state. Theinitial state (s = 0, k = 0) is a dummy that encodes the global initial distributionιinit for the user activity patterns. All activity patterns have the same set ofstates and we enumerate them from 1 to n; we can label them convenientlywith atomic propositions. For instance, in a Hungry Yoshi UMM the states(0, k) to (4, k) are labelled by the atomic proposition init , seeY , feed , seeP , pickrespectively. For each state (s, ·), with s > 0, we have a command defining allpossible n ·K probabilistic transitions. Pk(i, j) is the transition probability fromstate i to state j in αk, and θm(k) is the probability of user m to choose theactivity patterns αk, for all i, j ∈ {1, . . . , n}, k ∈ {1, . . . ,K}. If the probabilityof an update is null, then the corresponding transition does not take place.


Fig. 4: Question 1: the probability of feeding a yoshi for the first time within N buttontaps for the activity pattern α1 on the left and for α2 on the right.

6 Analysing the Hungry Yoshi UMM

In this section, we give some example analysis for a UMM. Namely, we encodeand evaluate quantitatively several example questions from Sect. 2.1 for theUMM with the user strategy for transitioning between activity patterns definedby θ = (0.7, 0.3). The PRISM models and property files are freely available4.

Recall that to score highly, a user must feed one or more yoshis (the appro-priate fruit) often. An informal inspection of α1 and α2 indicates that α2 is aless effective strategy for playing the game, since paths from seeP and seeY tofeed are unlikely. Now, by formal inspection of the UMM (encoded in PRISM),we can investigate this hypothesis more rigorously. We consider properties thatare parametrised by a number of button taps (e.g. N , N1, N2) and by activitypattern (e.g. α1, α2), so we use the PRISM experiment facility that allows us toevaluate and then plot graphically results for a range of formulae.

Question 1. How many button taps N does it take to feed a yoshi for the firsttime? We encode this by the probabilistic until formula:

p1(i) = ProbMinit((¬feed) U≤N ((α = i) ∧ feed))

and equivalently in PRISM: P=?[(!"feed") U<=N (alpha=i)&("feed")].For activity pattern α1, Figure 4 shows that within 2 button taps the prob-

ability increases rapidly, and after 5 button taps the probability is more than70%. Contrast this with the results for α2: the probability increases rapidly after3 button taps but soon it reaches the upper bound of 0.003. Comparing the tworesults, α1 is clearly more effective.

Now we consider more complex questions concerning sequences of feedingand picking; recall that a basket can hold at most 5 fruits and extra points aregained by feeding a yoshi its required 5 fruits without any other interruption. In

4 Available from http://dcs.gla.ac.uk/~oandrei/yoshi .

10 Andrei et al.

Question 2 we consider feeding a full basket to a yoshi, without any interruptions;in Question 3 we consider picking a full basket, without being interrupted by afeed , followed by feeding the full basket to a yoshi, which is again defined by fiveconsecutive feeds, without any interruptions. Note that when considering feedingthe full basket to a yoshi, we exclude all interruptions, i.e. any interleavings withpick, seeY, and seeP.

Question 2. What is the probability of feeding the same yoshi a full basket? Wecalculate the probability of reaching the state feed within N button taps andthen visiting it (with the same activity pattern i ∈ {1, 2}) for another four timeswithout visiting any other state:

p2(i) = ProbMinit(F≤N (α = i ∧ feed)) · (ProbM|α=i

feed (X feed))4

We calculate this probability in PRISM using the property:

P=?[F<=N((alpha=i)&"feed")] *

pow(filter(state,P=?[X(alpha=i&"feed")],(alpha=i&"feed")),4)

The results are shown in Fig. 5 for both activity patterns and a range of numberof button taps. While the results for α1 (converging to 0.018) are higher than forα2 (effectively 0); they are both small. There could be several causes for this. Forexample, players are only made aware of the possibility of extra points at the endof the instructions pages, or available fruit depends on the external environment.If designers/evaluators want this investigated further, then we would require torecord and extract more detail from the logs, for example to log numbers ofavailable WiFi access points and scrolls through instruction pages.

Question 3. What is the probability of filling up a basket of fruit without feedinga yoshi, and only after the basket is full feeding the same yoshi the whole basket?We calculate the probability of reaching the state feed only after visiting the statepick five times (without feeding) and then visiting the state feed four more timeswithout visiting any other state, for each activity pattern i ∈ {1, 2}:

p3(i) = ProbMinit [(¬pick) U≤N ((α = i) ∧ pick)]·(ProbM|α=i

pick [X((¬feed ∧ ¬pick) U (pick))])4·ProbM|α=i

pick [(¬feed) U feed ] · (ProbM|α=i

feed [X feed ])4

The corresponding PRISM property is:

P=?[!("pick") U<=N ((alpha=i)&"pick")] * pow(filter(state,

P=?[X ((alpha=i)&(!"feed")&(!"pick") U ((alpha=i)&"pick"))],

((alpha=i)&"pick")),4) * filter(state,

P=?[(alpha=i)&(!"feed")U((alpha=i)&"feed")],((alpha)=i&"pick")) *

pow(filter(state,P=?[X((alpha=i)&"feed")],((alpha=i)&"feed")),4)

The results are presented in Fig. 6. Again, while the probabilities are low(presumably for the reasons outlined above for Question 2) the user that picks


Fig. 5: Question 2: the probability offeeding one yoshi the whole fruit basketwithout interruptions.

Fig. 6: Question 3: the probability ofpicking five pieces of fruit and thenfeeding one yoshi the whole basket.

a full basket and feeds it to a yoshi by following activity pattern α1 does it witharound 0.00019 probability within 20 steps into the game, whereas if they followα2 from the beginning, they almost never empty the basket. So again, α1 provesto be more effective.

Now we turn our attention to a question that involves a change of activitypattern, i.e. a change in the playing strategy.

Question 4. What is the probability of starting with an activity pattern andnot feeding a yoshi within N button taps, then changing to the other activitypattern and eventually first feeding a yoshi within N2 button taps? We computethis probability as follows, where L0 = {feed , pick , seeY , seeP}:

p4(i) =∑

`∈L0ProbMinit((¬(α = i) ∧ ¬feed) U≤N ((α = i) ∧ `))·ProbM|α=i

` ((¬feed) U≤N2 feed)

The corresponding PRISM property is:

P=?[(!(alpha=i)&!("feed")) U<=N (alpha=i&"feed")] *

filter(state,P=?[(alpha=i)&!("feed") U<=N2 (alpha=i&"feed")],

alpha=i&"feed") + P=?[(!(alpha=i)&!("feed")) U<=N (alpha=i&"pick")] *


alpha=i&"pick") + P=?[(!(alpha=i)&!("feed")) U<=N (alpha=i&"seeY")] *


alpha=i&"seeY") + P=?[(!(alpha=i)&!("feed")) U<=N (alpha=i&"seeP")] *

filter(state,P=?[(alpha=i&!"feed")U<=N2(alpha=i&"feed")],alpha=i&"seeP")

Figure 7 shows the results for switching from activity patterns α1 to α2 andvice-versa respectively for less than 10 button taps to feed a yoshi after switchingthe activity pattern, while Figure 8 shows the same but for an unbounded numberof button taps (to feed a yoshi). We can see that success is much more likely byswitching from α2 to α1, than switching from α1 to α2, and a user needs about4-5 button taps to switch from α2 to α1 to maximise their score. This latter

12 Andrei et al.

Fig. 7: Question 4 for N2 ≤ 10 and i = 1 on the left and i = 2 on the right.

Fig. 8: Question 4 for N2 =∞ and i = 1 on the left and i = 2 on the right.

result is not surprising, considering that users might first inspect the game,which would involve visiting the 4 states.

All analyses were performed on a standard laptop. Note that for brevity, themobile app analysed here, and its formal model, are relatively small in size; morecomplex applications will yield more meaningful activity patterns and complexlogic properties that can be analysed on the metamodels. While state-spaceexplosion of the UMM could be an issue, it is important to note that the state-space does not depend on the number of users, but on the granularity of thestates (logged in-app actions) we distinguish.

7 Discussion

We reflect upon the results obtained for the Hungry Yoshi example and furtherissues raised by our approach.

Hungry Yoshi usage. Our analysis has revealed some insight into how users haveactually played the game: α1 corresponds to a more successful game playingstrategy than α2 and a user is much more likely to be effective if they changefrom α2 to α1 (rather than vice-versa), thus we conclude that α1 is expert be-haviour and α2 is ineffective behaviour. (Note that users can, and do, switch


between both behaviours, e.g. a user who exhibits expert behaviour can stillexhibit ineffective behaviour at some later time.) This interpretation of activitypatterns can inform a future redesign that helps users move from ineffective toexpert behaviour, or induces explicitly populations of users to follow selectedcomputation paths to reach certain goal states. We note that the developers hadvery little intuition about how often, or if, users were picking a full basket andthen feeding a yoshi (e.g. Questions 3 and 4 in Sect. 6), and so the results, whichindicate this scenario is quite rare, provided a new and useful insight for them.

Why DTMCs? Our choice of DTMC models is based on the work of of [9] inmodelling web-browsing activity, usage of Microsoft Word commands, and tele-phone usage across populations of individuals. Girolami et al. used probabilisticconvex combinations of DTMCs and demonstrated empirically that such modelwas superior in predictive performance to single DTMCs and mixture (point-mass) of DTMCs. Future work involves developing algorithms for inference ofHierarchical Hidden Markov models, where the first abstract level in the hierar-chy is the activity patterns.

Temporal properties. The properties refer to propositions about user-initiatedevents (e.g. seeY, feed) and activity patterns (e.g. α1, α2). A future improve-ment would be a syntax that parametrises the temporal operators by activitypattern. We note that PCTL properties alone were insufficient for our analysisand we have made extensive use of filtered properties. We also note that for someproperties we have used PRISM rewards, e.g. to compare scores between activitypatterns, but these are omitted in this short paper.

Reasoning about users. Model checking is performed on the UMM resulting fromthe augmentation of the set of K activity patterns with a strategy θm. It is simpleto select a user by selecting a θm and to analyse the resulting UMM. Metrics onthe set {θm | m = 1, . . . ,M} will be used in future work to characterise how theresults of the analysis change depending on the value of one θm, in the hope thatresults of the analysis for one user can be generalised to users close by (underthe given metric).

Formulating hypotheses: domain specific and generic. We have considered do-main specific hypotheses presented by developers and evaluators, but could aformal approach help with hypothesis generation? For example, we could framequestions using the specifications patterns for probabilistic quality properties asdefined in [10] (probabilistic response, probabilistic precedence, etc.). Referringto our questions in Sect. 2.1, we recognise in the first item the probabilistic prece-dence pattern, in the second one the probabilistic response pattern, and in thelast two the probabilistic constrained response pattern. However, these patternsrefer only to the top level structure, whereas all our properties consist of mul-tiple levels of embedded patterns. Perhaps more complex patterns are requiredfor our domain? The patterns of [10] were abstracted from avionic, defence, andautomotive systems, which are typical reactive systems; does the mobile app

14 Andrei et al.

domain, or domains with strong user interaction exhibit different requirements?We remark also that analysis of activity patterns is just one dimension to con-sider: there are many others that are relevant to tailoring software to users, forexample software variability and configuration, and user engagement. These areall topics of further work.

Choosing K activity patterns. What is the most appropriate value for K, can weguide its choice? While we could use model selection or non-parametric methodsto infer it, there might be domain-based reasons for fixingK. For example, we canstart with an estimate value of K and then compare analysis activity patterns: ifproperties for two different activity patterns give very close quantitative resultsthen we only need a smaller K.

What to log? This is a key question and depends upon the propositions we exa-mine in our properties, as well as the overheads of logging (e.g. on system per-formance, battery, etc.) and ethical considerations (e.g. what have users agreed).Formal analysis will often lead to new instrumentation requirements, which inturn will stimulate new analysis. For example, our analysis of Hungry Yoshi hasindicated a need for logged traces to include more information about currentcontext, e.g. the observable access points (yoshis).

8 Related Work

Our work is a contribution to the new software analytics described in [11], focus-ing on local methods and models, and user perspectives. It is also resonates withtheir prediction that by 2020 there will be more use of analytics for mobile appsand games. Recent work in analysis of user behaviours in systems, especiallyXBox games, is focused on understanding how features are used and how todrive users to use desirable features. For example, [12] investigates video gameskills development for over 3 million users based on analysis of users’ TrueSkillrating [13]. Their statistical analysis is based on a single, abstract “skill score”,whereas our approach is based on reasoning about computation paths relating toin-app events and temporal property analysis of activity patterns. Our approachcan be considered a form of run-time quantitative verification (by probabilisticmodel checking) as advocated by Calinescu et al. in [14]. Whereas they considerfunctional behaviour of service-based systems (e.g. power management) and soft-ware evolution triggered by a violation of correctness criteria because softwaredoes not meet the specification, or environment change, we address evolutionbased on behaviours users actually exhibit and how these behaviours relate tosystem requirements, which may include subtle aspects such as user goals andquality of experience. Perhaps of more relevance is the work on off-line runtimeverification of logs in [15] that estimates the probability of a temporal prop-erty being satisfied by program executions (e.g. user traces). Their approachand results could help us determine how logging sampling in-app actions andapp configuration affects analysis of user behaviour. The work of [16] employing


Hidden Markov Chains models (HMMs) is related to our approach, however ourfocus on capturing behavioural characteristics that are shared across a popula-tion forces us to consider a model whose distributed representation cannot becaptured by HMMs. Finally we note the very recent work of [5] on a similar ap-proach and comment the major differences in Sect. 4. In addition they analyseREST architectures (each log entry corresponds to a web page access), whereasthe mobile apps we are analysing are not RESTful, we can include more finegrained and contextual data in the logged user data.

9 Conclusions and Future Work

We have outlined our contribution to software analytics for user interactive sys-tems: a novel approach to probabilistic modelling and reasoning over actual userbehaviours, based on systematic and automated logging and reasoning aboutusers. Logged user traces are computation paths from which we infer activitypatterns, represented each by a DTMC. A user meta model is deduced for eachuser, which represents users as mixtures over DTMCs. We encode the user meta-models in the probabilistic model checker PRISM and reason about the meta-model using probabilistic temporal logic properties to express hypotheses aboutuser behaviours and relationships within and between the activity patterns.

We motivated and illustrated our approach by application to the HungryYoshi mobile iPhone game, which has involved several thousands of users world-wide. We showed how to encode some example questions posed by developersand evaluators in a probabilistic temporal logic, and obtained quantitative re-sults for an example user metamodel. After considering our formal analysis oftwo activity patterns, we conclude the two activity patterns distinguish expertbehaviour from ineffective behaviour and represent different strategies abouthow to play the game. While in this example the individual activity patternDTMCs are small in number and size, in more complex settings it will be im-possible to gain insight into behaviours informally, and in particular to insightsinto relationships between the activity patterns, so automated formal analysisof the user metamodels will be essential.

In this paper we have focused on defining the appropriate statistical and for-mal models, their encoding, and reasoning using model checking. We have notexplored here the types of insights we can gain into user behaviours from ourapproach, nor how we can employ these in system redesign and future system de-sign, especially for specific subpopulations of users. Further, in this short paper,we have not considered the role of prediction from analysis and the possibili-ties afforded by longitudinal analysis. For example, how do the activity patternsand properties compare between users in 2009 and users in 2013? This is ongo-ing work within the A Population Approach to Ubicomp System Design project,where we are working with system developers on the practical application ofour formal analysis in the design and redesign of several new apps. We are alsoinvestigating metrics of user engagement, tool support, and integration of thiswork with statistical and visualisation tools.

16 Andrei et al.

Acknowledgments. This research is supported by EPSRC Programme Grant APopulation Approach to Ubicomp System Design (EP/J007617/1). The authorsthank all members of the project, and Gethin Norman for fruitful discussions.

References

1. McMillan, D., Morrison, A., Brown, O., Hall, M., Chalmers, M.: Further into theWild: Running Worldwide Trials of Mobile Systems. In Floreen, P., Kruger, A.,Spasojevic, M., eds.: Proc. of Pervasive’10. Volume 6030 of LNCS., Springer (2010)210–227

2. Hall, M., Bell, M., Morrison, A., Reeves, S., Sherwood, S., Chalmers, M.: Adaptingubicomp software and its evaluation. In: Proc. of EICS’09, New York, NY, USA,ACM (2009) 143–148

3. Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press (2008)4. Kwiatkowska, M.Z., Norman, G., Parker, D.: Stochastic Model Checking. In

Bernardo, M., Hillston, J., eds.: SFM. Volume 4486 of LNCS., Springer (2007)220–270

5. Ghezzi, C., Pezze, M., Sama, M., Tamburrelli, G.: Mining Behavior Models fromUser-Intensive Web Applications. In Jalote, P., Briand, L.C., van der Hoek, A.,eds.: Proc. of ICSE’14, ACM (2014) 277–287

6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from IncompleteData via the EM Algorithm. Journal of the Royal Statistical Society. Series B(Methodological) 39(1) (1977) pp. 1–38

7. Higgs, M., Morrison, A., Girolami, M., Chalmers, M.: Analysing User BehaviourThrough Dynamic Population Models. In: Proc. of CHI’13, Extended Abstractson Human Factors in Computing Systems. CHI EA’13, ACM (2013) 271–276

8. Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: Verification of Probabi-listic Real-Time Systems. In: Proc. of CAV’11. Volume 6806 of LNCS., Springer(2011) 585–591

9. Girolami, M., Kaban, A.: Simplicial Mixtures of Markov Chains: Distributed Mod-elling of Dynamic User Profiles. In Thrun, S., Saul, L., Scholkopf, B., eds.: Advancesin Neural Information Processing Systems 16. MIT Press, Cambridge, MA (2004)

10. Grunske, L.: Specification patterns for probabilistic quality properties. In Schafer,W., Dwyer, M.B., Gruhn, V., eds.: Proc. of ICSE’08, ACM (2008) 31–40

11. Menzies, T., Zimmermann, T.: Software Analytics: So What? IEEE Software 30(4)(2013) 31–37

12. Huang, J., Zimmermann, T., Nagappan, N., Harrison, C., Phillips, B.: Masteringthe art of war: how patterns of gameplay influence skill in Halo. In Mackay, W.E.,Brewster, S.A., Bødker, S., eds.: Proc. of CHI’13, ACM (2013) 695–704

13. Herbrich, R., Minka, T., Graepel, T.: TrueskillTM: A Bayesian skill rating system.Proc. of NIPS’06 (2006) 569–576

14. Calinescu, R., Ghezzi, C., Kwiatkowska, M.Z., Mirandola, R.: Self-adaptive soft-ware needs quantitative verification at runtime. Commun. ACM 55(9) (2012)69–77

15. Stoller, S.D., Bartocci, E., Seyster, J., Grosu, R., Havelund, K., Smolka, S.A.,Zadok, E.: Runtime Verification with State Estimation. In: Proc. of RV’11. Volume7186 of LNCS., Springer (2011) 193–207

16. Bartocci, E., Grosu, R., Karmarkar, A., Smolka, S.A., Stoller, S.D., Zadok, E.,Seyster, J.: Adaptive Runtime Verification. In Qadeer, S., Tasiran, S., eds.: Proc.of RV’12. Volume 7687 of LNCS., Springer (2012) 168–182

probabilistic model checking of dtmc models of user activity …muffy/papers/qest_2014.pdf · 2014....

Documents