state-merging dfa induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · a...

23
State-merging DFA induction algorithms with mandatory merge constraints From MSM to ASM Bernard Lambeau, Christophe Damas and Pierre Dupont Computing Science and Engineering Department Université catholique de Louvain – Belgium September 23, 2008 (UCL Machine Learning Group) From MSM to ASM September 23, 2008 1.

Upload: others

Post on 04-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

State-merging DFA induction algorithms withmandatory merge constraints

From MSM to ASM

Bernard Lambeau, Christophe Damas and Pierre Dupont

Computing Science and Engineering DepartmentUniversité catholique de Louvain – Belgium

September 23, 2008

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 1.

Page 2: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Motivations

Requirements Engineering (RE)It has been claimed that the hardest part in building a softwaresystem is deciding precisely what the system should doOne can automate parts of this RE process by learning behaviormodels from scenariosScenarios are strings of possible events which can be generalizedto form a language of acceptable behaviorsSuch languages are conveniently represented by finite-statemachines

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 2.

Page 3: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

ScenariosA train system example

Scenarios describe interactions between the software-to-be andits environmentScenarios are typical examples of system usage provided by anend-user involved in the requirements elicitation process

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 3.

Page 4: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Synthesis of behavior models and DFA inductionA win-win situation

Regular languages are considered to be powerful enoughDFAs offer a convenient representation for model checking andcode generationThe typical size of such DFAs is about 20 . . . 100 states

I ⇒ hard to design exactly by a software analystI ⇒ not problematic for state-of-the-art DFA induction algorithm

(RPNI, BlueFringe)

Typical alphabet size ≈ 10 . . . 20The end-user can really be used as an oracle in practice

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 4.

Page 5: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

State-merging induction with membership queries

Our previous work: the QSM algorithm [Damas et al. 05],[Dupont et al. 08]

An extension to RPNI or BlueFringe (also known as redBlue) withmembership queries

The limited amount of positive and negative scenarios provided initiallyby an end-user can be enriched by asking membership queries

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 5.

Page 6: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

A high-level Message Sequence ChartFlow-charting of various scenarios

This information defines Mandatory Merge Constraints between somestates of the prefix tree acceptor (PTA)

x

start

open

x

stop

alarm

xclose

emergency stop emergency open

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 6.

Page 7: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

A high-level Message Sequence ChartFlow-charting of various scenarios

This information defines Mandatory Merge Constraints between somestates of the prefix tree acceptor (PTA)

x

start

open

x

stop

alarm

xclose

emergency stop emergency open

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 6.

Page 8: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

State-merging DFA induction

Prefix-Tree Acceptor (PTA)

0

1a

2

b 3a

4

b

5a

6a

7b

⇓Quotient automaton

0

1a

{2,4}

bb

{3,6}a

5a

7b

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 7.

Page 9: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

State-merging DFA induction algorithm

Algorithm STATE-MERGING DFA INDUCTION ALGORITHMInput: A positive and negative sample (S+, S−)

Output: A DFA A consistent with (S+, S−)

// Compute a PTA, let N denote the number of its statesPTA← Initialize(S+, S−); π ← {{0}, {1}, ..., {N − 1}}

// Main state-merging loopwhile (Bi , Bj )← ChoosePair(π) do

πnew ← Merge(π, Bi , Bj )if Compatible(PTA/πnew , S−) then

π ← πnew

return PTA/π

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 8.

Page 10: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

The Merge function also reduces non-determinism

0

a

2b

a

4b

6a

0

a

2b

a

4b

10a

6

a

0

a

2b

a9b

4

b

10a

6a

0

a

2b

a7a

4

b

9b

6a

10a

0

a

2b

a7a

8b

4

b

9b

6a

10a

0

a

2b

a5a

4

b

7a

8b

6a

9b

10a

0

a

2b

3a

4

b

5a

6a

7a

8

b

9b

10a

Merging 10 and 6

Merging 9 and 4

Merging 7 and 2

Merging 3 and 2

Merging 8 and 4

Merging 5 and 2 (for determinization)

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 9.

Page 11: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Tree invariant

Tree invariant propertyAt least one of the 2 states implied in a merging operation is theroot of a (sub)-treeTrue for RPNI, BlueFringe (= redBlue), etcSimplification of the actual implementation

0

a

2b

3a

4b

5a

6a

7a

8b

9b

10a

Consolidated

Fringe

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 10.

Page 12: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Incompatibility constraints

0

1a

2

b

3b

4a

5

b6

a8

b

7a

Augmented PTA with positively accepting states (= grey) andnegatively accepting states (= black)The Merge function reduces non-determinism and checks suchcoloring constraints

I States having different colors may not be mergedI States having the same color can be merged

Coloring constraints define incompatibility between states frompositive and negative information or additional domain knowledge[Coste et al. 04], [Dupont et al. 08]

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 11.

Page 13: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Mandatory merge constraints

x

start

open

x

stop

alarm

xclose

emergency stop emergency open

Another kind of domain knowledge defines mandatory mergeconstraints between states sharing the same labelsLabeling constraints are the logical counterpart to the coloringconstraints

I States with the same label must be mergedI States with different labels can be merged

A fully labeled PTA does not define a trivial induction problemI Without coloring constraints (such as those provided by the

negative sample) all states will be merged

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 12.

Page 14: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

MSM algorithm

Algorithm MSMInput: A non-empty initial positive and negative sample (S+, S−)

Input: Labeling and coloring constraintsOutput: A DFA A consistent with (S+, S−) and all constraints

// Compute a PTA, let N denote the number of its statesPTA← Initialize(S+, S−); π ← {{0}, {1}, ..., {N − 1}}

// Merge all states according to labeling constraintswhile (Bi , Bj )← FindSameBlocks(π) do

π ← Merge(π, Bk , Bl )

// Main state-merging loopwhile (Bi , Bj )← ChoosePair(π) do

tryπ ← Merge(π, Bi , Bj )

catch avoid// inconsistency between coloring and labeling constraints

return PTA/π

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 13.

Page 15: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

MSM does not satisfy the tree invariant property

MSM is a straightforward extension to standard state-mergingalgorithmsHowever...

Labeling constraints can force to merge states such that theresulting automaton has a general graph structureThe tree invariant property is no longer satisfiedRecursive merging to reduce non-determinism naturally stopseven for general graphs

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 14.

Page 16: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

MSM does not satisfy the tree invariant property

MSM is a straightforward extension to standard state-mergingalgorithmsHowever...

Labeling constraints can force to merge states such that theresulting automaton has a general graph structureThe tree invariant property is no longer satisfiedRecursive merging to reduce non-determinism naturally stopseven for general graphs

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 14.

Page 17: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Experiments on synthetic data

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 15.

Page 18: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Requirements engineering case study

0 1alarm

2open

3

start

open

close

stop4alarm

5

leaving

stop

6alarm

7approaching

8

high

9stop

16stop

atstation

12

alarm

13stop

low

10

alarm

11approaching

start

low

low

18alarm 14

stop

start

15open

close

17open

close

low

Algorithm |lab| AccuracyRPNI - 0.55BlueFringe - 0.83MSM 0 0.55

3 0.716 0.7310 0.8815 0.90

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 16.

Page 19: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

DFA induction from a positive DFA and a negativesample

Algorithm ASMInput: A positive DFA A+ and a negative sample S−Output: A DFA A consistent with (A+, S−)

// Augment the automaton A+ with states

// marked/added from S−

M ← Augment(A+, S−)

// Compute the natural order on M

π ← NatOrder(M)

// Main state-merging loop

π ← Generalize(π)

return M/π

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 17.

Page 20: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

DFA induction from positive and negative DFAs

Algorithm ASM∗

Input: A positive DFA A+ and a negative DFA A− such that L(A+) ∩ L(A−) = ∅Output: A DFA A consistent with (A+, A−)

// Augment the automaton A+ with states

// marked/added from S−

M ← Product(A+, A−)

// Compute the natural order on M

π ← NatOrder(M)

// Main state-merging loop

π ← Generalize(π)

return M/π

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 18.

Page 21: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Product DFA

A+ A− M

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 19.

Page 22: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Take home message

Mandatory merge constraints are introduced to model domainknowledge, for instance, from a Requirements EngineeringperspectiveMandatory merge constraints form the logical counterpart toincompatibility constraintsThe MSM algorithm deals with both types of constraintsMSM is a straightforward extension to RPNI or BlueFringe but

I without satisfying the tree-invariant propertyI using recursive merging extended to general graphs

MSM gives rise to ASM∗ to induce DFAs from prior positive andnegative DFAs

I interesting from a practical viewpointI may require a new theoretical framework

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 20.

Page 23: State-merging DFA induction algorithms with mandatory ...pdupont/pdupont/pdf/icgi08_pres.pdf · A high-level Message Sequence Chart Flow-charting of various scenarios This information

Future work

MSM implementation with the BlueFringe search order (easy)

MSM as an extension to QSM for active learning with queries(somewhat more challenging)

Other applicative contexts where mandatory merge constraintsare natural

Further analyze ASM∗I theoretically: characteristic samples?I practically: experimental protocol?

(UCL Machine Learning Group) From MSM to ASM September 23, 2008 21.