Applicability of Process Mining Techniques in Business Environments

Upload: andreaburattin

Post on 01-Dec-2014

DESCRIPTION

Presentation provided at the annual meeting of the IEEE Task Force on Process Mining, for the Best Dissertation Award, during BPM 2014 (in Eindhoven, the Netherlands, http://bpm2014.haifa.ac.il).

TRANSCRIPT

Page 1: Applicability of Process Mining Techniques in Business Environments

Applicability of Process Mining Techniques in Business Environments

Annual Meeting IEEE Task Force on Process Mining

Andrea Burattin

andreaburattin

September 8, 2014

Page 2: Applicability of Process Mining Techniques in Business Environments

Brief Curriculum Vitæ

2009, M.Sc. in Computer Science (A.I. program), University of Padova

2009-2012, Ph.D.
  - Supervisor: Prof. Alessandro Sperduti
  - Joint school University of Bologna-Padova
  - Thesis defended in April 2013

2013-2014, Postdoc, University of Padova
  - Prompt project (prompt.processmining.it)

Specola, Padova. http://flic.kr/p/cEW5bo

2 of 17

Page 3: Applicability of Process Mining Techniques in Business Environments

Ph.D. Inception

Ph.D. background

Inception during M.Sc. thesis
  - Companies: study on process mining

A company (Siav S.p.A., www.siav.it) funded my Ph.D.
  - Aim: investigate applicability of process mining techniques in business scenarios
  - Interaction with companies: interesting! (but sometimes...)

Outcome
  - "Applicability of Process Mining Techniques in Business Environments"

3 of 17

Page 4: Applicability of Process Mining Techniques in Business Environments

Quick Recap of Process Mining

[Figure: overview of process mining. Three areas: Imagination (operational model, analytical model), Incarnation / Environment (information system, operational incarnation) and Observation (event logs), connected through the three process mining tasks: Discovery, Conformance, Extension. Edge labels: support, control, protocol/audit, augment, compare, analyze, mine, basis, create, (re-)design, implement, describe.]

Source: C. Günther, "Process mining in Flexible Environments". PhD thesis, TU/e, Eindhoven, 2009.

4 of 17

Page 8: Applicability of Process Mining Techniques in Business Environments

Theoretical vs. Industrial-related Open Problems

Some literature open problems
  - Duplicate tasks
  - Exploiting all data available
  - Holistic mining
  - Different perspectives from different sources
  - Noise and incompleteness

Case studies open problems
  - Using process mining tools and configuring algorithms
  - Results interpretation
  - Readable results
  - Computational power and storage capacity required

The two sets do not overlap

5 of 17

Page 11: Applicability of Process Mining Techniques in Business Environments

Possible Industry Scenarios

Four possible industry scenarios

Process aware vs. Process unaware

Process aware software vs. Process unaware software

[Figure: 2x2 matrix. One axis: Process Aware Companies vs. Process Unaware Companies; other axis: Process Aware Information Systems vs. Process Unaware Information Systems. The four quadrants are labeled Company 1, Company 2, Company 3 and Company 4.]

6 of 17

Page 12: Applicability of Process Mining Techniques in Business Environments

Thesis Structure and Organization

[Figure: thesis structure map. Components: Data Preparation, Process Mining Capable Event Logs, Process Mining Capable Event Stream, Control-flow Mining, Stream Control-flow Mining, Process Representation, Model Evaluation, Results Evaluation, Process Extension.]

6 of 17

Page 13: Applicability of Process Mining Techniques in Business Environments

Overview: Data Preparation

[Figure: thesis structure map, as on the "Thesis Structure and Organization" slide.]

6 of 17

Page 14: Applicability of Process Mining Techniques in Business Environments

Problems with Data Preparation

Problems at different complexity and abstraction levels. Examples:
  - Adaptation of existing data (syntax problem, easy)
  - Introduction of new information (difficult)

Typical set of required fields

(case-id; activity; timestamp; [process-name]; [originator])

Our context: company process aware; IS process unaware

Structure of available log

(activity; timestamp; originator; info_1; ...; info_n)

7 of 17

Page 17: Applicability of Process Mining Techniques in Business Environments

Problems with Data Preparation (cont.)

Case-id from info_i fields
  - Candidate case-id fields: a-priori knowledge
  - Events chains: string similarity functions
  - Selection of maximal chain: most activities or simplest chain

Process name is not a problem
  - All events belonging to the same process

Act.  info1  info2
a1    AB-01  BB-01
a2    AA-02  AB-01
a3    AB-01  BB-02
a4    AB-01  BB-03
a1    AA-03  BB-04
a5    AA-03  BB-05
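
The chaining idea can be illustrated on the table above. This is a minimal sketch, not the method from the thesis: it assumes that both info columns are candidate case-id fields and that exact equality of values stands in for the string similarity function. The name `build_chains` and the union-find linking are illustrative choices.

```python
from collections import defaultdict

# Events from the slide: (activity, info1, info2)
events = [
    ("a1", "AB-01", "BB-01"),
    ("a2", "AA-02", "AB-01"),
    ("a3", "AB-01", "BB-02"),
    ("a4", "AB-01", "BB-03"),
    ("a1", "AA-03", "BB-04"),
    ("a5", "AA-03", "BB-05"),
]

def build_chains(events):
    """Link events that share a value in any candidate field (assumption:
    exact match replaces the similarity function) and return the chains,
    longest first."""
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}  # field value -> index of the first event carrying it
    for idx, (_, *infos) in enumerate(events):
        for value in infos:
            if value in first_seen:
                parent[find(idx)] = find(first_seen[value])
            else:
                first_seen[value] = idx

    chains = defaultdict(list)
    for idx, (activity, *_rest) in enumerate(events):
        chains[find(idx)].append(activity)
    return sorted(chains.values(), key=len, reverse=True)

chains = build_chains(events)
print(chains)
```

Under these assumptions the first four events form one chain through the shared value AB-01 and the last two form another through AA-03; each chain is then a candidate case.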

8 of 17

Page 19: Applicability of Process Mining Techniques in Business Environments

Overview: Control-flow Mining

[Figure: thesis structure map, as on the "Thesis Structure and Organization" slide.]

8 of 17

Page 20: Applicability of Process Mining Techniques in Business Environments

Exploiting Data Available

Events with duration instead of instantaneous events

Generalization of Heuristics Miner to exploit this new information

[Figure: a main activity, with Start and End, composed of sub-activities 1 to n along a time axis.]

[Figure: activities A, B, C, D over time, shown once as a process with events as time intervals and once as a process with instantaneous events.]
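
To make the idea of exploiting durations concrete, here is a minimal sketch. It assumes a simplified reading of the relations (overlapping intervals count as parallel; b directly follows a when b starts after a ends and no other event lies entirely in the gap) together with the standard Heuristics Miner dependency measure. The function names and the exact definitions are assumptions for illustration, not necessarily those used in the thesis.

```python
from collections import Counter

def relations(trace):
    """trace: list of (activity, start, end) tuples with start <= end.
    Returns (follows, parallel) counters for one trace."""
    ordered = sorted(trace, key=lambda e: (e[1], e[2]))
    follows, parallel = Counter(), Counter()
    for i, (a, _, a_end) in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            b, b_start, _ = ordered[j]
            if b_start < a_end:
                # b starts before a ends: the two intervals overlap
                parallel[frozenset((a, b))] += 1
            elif not any(a_end <= c_start and c_end <= b_start
                         for k, (_, c_start, c_end) in enumerate(ordered)
                         if k not in (i, j)):
                # no other event fits entirely between a and b
                follows[(a, b)] += 1
    return follows, parallel

def dependency(follows, a, b):
    """Standard Heuristics Miner dependency measure a => b."""
    ab, ba = follows[(a, b)], follows[(b, a)]
    return (ab - ba) / (ab + ba + 1)

trace = [("A", 0, 2), ("B", 3, 6), ("C", 4, 7), ("D", 8, 9)]
follows, parallel = relations(trace)
print(dict(follows), dict(parallel), dependency(follows, "A", "B"))
```

With instantaneous events (start equal to end) no two distinct timestamps overlap, so this sketch falls back to the usual directly-follows counting.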

9 of 17

Page 22: Applicability of Process Mining Techniques in Business Environments

Non-expert Users

Our users: not experts in process mining, with notions of BPM

Observations
  - Process mining algorithms require configuration
  - Typically, algorithm configurations are thresholds on measures

The mining log is finite
  - Only a finite number of configurations is possible

We are able to discretize the parameter values

[Figure: several different models over activities A to F, each obtained from the same log with a different, unknown threshold: τ1 = ?, τ2 = ?, τ3 = ?, τ4 = ?]
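
The discretization argument can be sketched as follows: a finite log yields finitely many distinct values of a measure, so only those values matter as thresholds. This is a minimal sketch assuming the standard Heuristics Miner dependency measure on a toy log; the log and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy log: one list of activities per case
log = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["A", "B", "C", "D"],
    ["A", "B", "D"],
]

def dependency_values(log):
    """Dependency measure for every directly-follows pair seen in the log."""
    df = Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))
    return {(a, b): (n - df[(b, a)]) / (n + df[(b, a)] + 1)
            for (a, b), n in df.items()}

def candidate_thresholds(log):
    """Distinct positive dependency values: any threshold between two
    consecutive candidates selects the same set of dependencies."""
    return sorted({round(v, 6) for v in dependency_values(log).values() if v > 0})

thresholds = candidate_thresholds(log)
print(thresholds)
```

Each candidate corresponds to one distinguishable configuration of the threshold, which turns a continuous parameter into a short list that can be searched or shown to the user.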

10 of 17

Page 25: Applicability of Process Mining Techniques in Business Environments

Model Selection Approaches

User-guided Approach

Hierarchical clustering of models

Average linkage

Any model-to-model metric

[Figure: dendrogram of ten mined models (leaves, in order: Process 1, 10, 9, 8, 5, 6, 4, 7, 2, 3) on a distance axis from 0 to 1; merge heights shown: 0.34, 0.45, 0.49, 0.63, 0.69, 0.71, 0.74, 0.76, 0.84.]

Navigation of the dendrogram
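
The user-guided approach relies on average-linkage hierarchical clustering over any model-to-model distance. A minimal pure-Python sketch is given below; the distance matrix is hypothetical and `average_linkage` is an illustrative name, not code from the thesis.

```python
def average_linkage(dist):
    """Agglomerative clustering with average linkage.
    dist: symmetric matrix of model-to-model distances.
    Returns the list of merges as (cluster_a, cluster_b, height)."""
    clusters = [(i,) for i in range(len(dist))]
    merges = []

    def d(c1, c2):
        # average pairwise distance between the members of two clusters
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = ((x, y) for i, x in enumerate(clusters) for y in clusters[i + 1:])
        a, b = min(pairs, key=lambda p: d(*p))
        merges.append((a, b, d(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical distances between four mined models
dist = [
    [0.0, 0.2, 0.6, 0.8],
    [0.2, 0.0, 0.7, 0.9],
    [0.6, 0.7, 0.0, 0.3],
    [0.8, 0.9, 0.3, 0.0],
]
merges = average_linkage(dist)
for a, b, height in merges:
    print(a, b, round(height, 2))
```

The merge heights are what the dendrogram displays; navigating it from the root downwards lets a user pick a branch without setting any threshold directly.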

Automatic Approach

Hill climbing with
  - Maximum plateau steps
  - Random restarts
  (Local optimum)

$h_{MDL} = \arg\min_{h \in H} \left[ L(h) + L(D \mid h) \right]$

MDL encodings
  - MDL by Calders et al.
  - Simplified heuristics
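
The automatic approach can be sketched as a hill climbing search over the discretized parameter values. The sketch below assumes a single parameter whose candidate values are indexed 0..n-1 and a hypothetical `cost` table standing in for L(h) + L(D|h); a real search would mine a model for each configuration and compute its MDL encoding.

```python
import random

def hill_climb(cost, n_values, restarts=5, max_plateau=3, seed=0):
    """Minimise cost(i) over i in range(n_values) with random restarts.
    Sideways moves on a plateau are allowed up to max_plateau times."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        current = rng.randrange(n_values)
        visited, plateau = {current}, 0
        while True:
            neighbours = [i for i in (current - 1, current + 1)
                          if 0 <= i < n_values and i not in visited]
            if not neighbours:
                break
            candidate = min(neighbours, key=cost)
            if cost(candidate) < cost(current):
                plateau = 0                      # strict improvement
            elif cost(candidate) == cost(current) and plateau < max_plateau:
                plateau += 1                     # walk along the plateau
            else:
                break                            # local optimum reached
            current = candidate
            visited.add(current)
        if best is None or cost(current) < cost(best):
            best = current
    return best

# Hypothetical description lengths for eight discretized threshold values
costs = [9, 7, 7, 7, 4, 6, 3, 8]
best = hill_climb(lambda i: costs[i], len(costs))
print(best, costs[best])
```

Depending on the starting points, the search ends in index 4 or index 6; more restarts raise the chance of reaching the global minimum, which matches the "(Local optimum)" caveat on the slide.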

11 of 17

Page 27: Applicability of Process Mining Techniques in Business Environments

Overview: Results Evaluation

[Figure: thesis structure map, as on the "Thesis Structure and Organization" slide.]

11 of 17

Page 28: Applicability of Process Mining Techniques in Business Environments

Evaluation Metrics

Model-to-model Metric

Complex process into
  - Permitted relations
  - Forbidden relations

Generation rules (based on Alpha alg.)
  $A \rightarrow B \Rightarrow A > B,\ B \not> A$
  $A \parallel B \Rightarrow A > B,\ B > A$
  $A \mathbin{\#} B \Rightarrow A \not> B,\ B \not> A$

Comparison as Jaccard similarity on two sets ($>$ and $\not>$)
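
The generation rules and the Jaccard comparison can be sketched as follows. The relation encoding (`"->"`, `"||"`, `"#"`) and the weighted combination of the two Jaccard values through `alpha` are assumptions made for this illustration; the slide only states that the comparison uses Jaccard similarity on the two sets.

```python
def constraints(relations):
    """Turn Alpha-style relations into permitted (>) and forbidden (not >) pairs."""
    permitted, forbidden = set(), set()
    for (a, b), rel in relations.items():
        if rel == "->":        # A -> B  =>  A > B, B not> A
            permitted.add((a, b))
            forbidden.add((b, a))
        elif rel == "||":      # A || B  =>  A > B, B > A
            permitted.update({(a, b), (b, a)})
        elif rel == "#":       # A # B   =>  A not> B, B not> A
            forbidden.update({(a, b), (b, a)})
        else:
            raise ValueError(f"unknown relation: {rel}")
    return permitted, forbidden

def jaccard(x, y):
    return len(x & y) / len(x | y) if x | y else 1.0

def similarity(m1, m2, alpha=0.5):
    """Assumed combination: weighted mean of the two Jaccard similarities."""
    p1, f1 = constraints(m1)
    p2, f2 = constraints(m2)
    return alpha * jaccard(p1, p2) + (1 - alpha) * jaccard(f1, f2)

m1 = {("A", "B"): "->", ("B", "C"): "->"}
m2 = {("A", "B"): "->", ("B", "C"): "||"}
score = similarity(m1, m2)
print(round(score, 4))
```

Here the two models agree on A -> B and differ on how B and C relate, so the permitted sets overlap in two of three pairs and the forbidden sets in one of two.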

Model-to-log Metric

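Declare constraint $\pi$ and a trace $\sigma$ $\Rightarrow$ healthiness measures

Activation sparsity: $1 - \frac{n_a(\sigma,\pi)}{n(\sigma)}$

Violation ratio: $\frac{n_v(\sigma,\pi)}{n_a(\sigma,\pi)}$

Fulfillment ratio: $\frac{n_f(\sigma,\pi)}{n_a(\sigma,\pi)}$

Conflict ratio: $\frac{n_c(\sigma,\pi)}{n_a(\sigma,\pi)}$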
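
As a worked example of the four ratios, the sketch below evaluates the Declare constraint response(a, b) on a single trace: every occurrence of a is an activation, fulfilled when some b occurs later and violated otherwise. The function name is hypothetical, and setting the number of conflicts to zero for this constraint is an assumption of the sketch.

```python
def response_healthiness(trace, a, b):
    """Healthiness measures of response(a, b) on one trace."""
    activations = [i for i, event in enumerate(trace) if event == a]
    n, n_a = len(trace), len(activations)
    if n == 0 or n_a == 0:
        return None  # ratios are undefined without events or activations
    # An activation is fulfilled when some b occurs later in the trace
    n_f = sum(1 for i in activations if b in trace[i + 1:])
    n_v = n_a - n_f
    n_c = 0  # assumption: response(a, b) yields no conflicting activations
    return {
        "activation_sparsity": 1 - n_a / n,
        "violation_ratio": n_v / n_a,
        "fulfillment_ratio": n_f / n_a,
        "conflict_ratio": n_c / n_a,
    }

h = response_healthiness(["a", "c", "b", "a", "d"], "a", "b")
print(h)
```

In this trace two of five events activate the constraint; the first a is followed by b, the second is not, so fulfillment and violation ratios are both one half.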

12 of 17

Page 30: Applicability of Process Mining Techniques in Business Environments

Overview: Process Extension

[Figure: thesis structure map, as on the "Thesis Structure and Organization" slide.]

12 of 17

Page 31: Applicability of Process Mining Techniques in Business Environments

Multiperspective Mining

Given
  - Log with information on originators
  - Process model

We add roles to the model

Assumption
  - Roles are characterized by a consistent set of originators

1. Dependencies as handover of roles
2. Remove dependencies below threshold; connected components are candidate roles
3. Merge candidate roles if the similarity of their user sets is above threshold
4. Entropy-based metric to tune thresholds
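
Steps 1 to 3 can be sketched as below. The scoring of a dependency by the Jaccard similarity of the originator sets of its two activities is an assumption made for illustration (the slides do not define the handover measure), and all names and data are hypothetical. Step 4, the entropy-based tuning of thresholds, is not covered.

```python
from collections import defaultdict

def jaccard(x, y):
    return len(x & y) / len(x | y) if x | y else 0.0

def users_of(role, originators):
    return set().union(*(originators[a] for a in role))

def candidate_roles(dependencies, originators, keep_threshold=0.5, merge_threshold=0.5):
    # Steps 1-2: score each dependency and drop those below the threshold
    graph = defaultdict(set)
    for a, b in dependencies:
        if jaccard(originators[a], originators[b]) >= keep_threshold:
            graph[a].add(b)
            graph[b].add(a)
    # Connected components of the remaining graph are candidate roles
    roles, seen = [], set()
    for activity in originators:
        if activity in seen:
            continue
        stack, component = [activity], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        roles.append(component)
    # Step 3: merge candidate roles whose user sets are similar enough
    merged = []
    for role in roles:
        for other in merged:
            if jaccard(users_of(role, originators), users_of(other, originators)) >= merge_threshold:
                other |= role
                break
        else:
            merged.append(set(role))
    return merged

originators = {
    "A": {"ann", "bob"}, "B": {"ann", "bob"}, "C": {"carl"},
    "D": {"carl", "dan"}, "E": {"ann", "bob"},
}
dependencies = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
roles = candidate_roles(dependencies, originators)
print(roles)
```

Activity E is disconnected from A and B once the D-E dependency is removed, yet it rejoins their role in the merge step because the same people perform it.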

13 of 17

Page 33: Applicability of Process Mining Techniques in Business Environments

Overview: Stream Control-flow Mining

[Figure: thesis structure map, as on the "Thesis Structure and Organization" slide.]

13 of 17

Page 34: Applicability of Process Mining Techniques in Business Environments

Stream Context

Stream Mining Peculiarities

Cannot store the entire stream
  - Approximation

Backtracking not feasible
  - One pass over data

Variable system condition
  - E.g. fluctuating stream rates

Adapt the model to new data
  - Concept drifts

Completely new problems!

Principle

Recent observations are more important than older ones

3 versions of Heuristics Miner
  - Based on Sliding Window
  - Based on Lossy Counting
  - Based on Budget Lossy Counting
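
To illustrate the Lossy Counting variant, the sketch below applies the classic Lossy Counting algorithm to the directly-follows pairs observed on an event stream. It is a simplified illustration, not the thesis implementation: the `LossyCounter` class and the stream are hypothetical, and the map holding the last activity of each case is left unbounded for brevity.

```python
import math

class LossyCounter:
    """Approximate frequency counts with error at most epsilon * n."""

    def __init__(self, epsilon):
        self.width = math.ceil(1 / epsilon)  # bucket width
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [frequency, max error]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # bucket boundary: forget items that are too infrequent
            self.entries = {k: v for k, v in self.entries.items()
                            if v[0] + v[1] > bucket}

    def count(self, item):
        return self.entries.get(item, [0, 0])[0]

# Hypothetical stream of (case id, activity) events, read in one pass
stream = [("c1", "A"), ("c2", "A"), ("c1", "B"), ("c2", "B"),
          ("c1", "C"), ("c3", "A"), ("c3", "B"), ("c2", "D")]

last_activity = {}
df = LossyCounter(epsilon=0.25)
for case, activity in stream:
    if case in last_activity:
        df.add((last_activity[case], activity))
    last_activity[case] = activity

print(df.count(("A", "B")), df.count(("B", "C")), df.count(("B", "D")))
```

The frequent pair (A, B) survives the cleanup at the bucket boundary while the rare pair (B, C) is dropped, which is how memory stays bounded at the price of approximate counts. The surviving counts can then feed the usual Heuristics Miner dependency measure.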

14 of 17

Page 37: Applicability of Process Mining Techniques in Business Environments

Overview

[Figure: thesis structure map, as on the "Thesis Structure and Organization" slide.]

14 of 17

Page 38: Applicability of Process Mining Techniques in Business Environments

Extra: Processes and Logs Generator

Companies are reluctant to share their data

Researchers need to do tests

(No BPI challenges at that time)

Processes and Logs Generator

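Stochastic context-free grammar generates random processes

Rules to simulate a process and produce an event log

Reference model used for evaluation of control-flow mining algorithms

[Figure: derivation tree of a random process from the grammar. The root P expands to a_start, G, a_end; G expands through productions such as (G ; G), A ; (G ∧ G) ; A and a further binary production whose operator is garbled in the source; the leaves are the activities a to g.]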
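
The generator idea can be sketched with a small stochastic grammar. This is a simplified illustration with assumed productions (single activity, sequence, and AND/XOR blocks wrapped by a split and a join activity) and assumed probabilities; it is not the grammar of the actual Processes and Logs Generator.

```python
import itertools
import random

def generate(rng, names, depth=0, max_depth=3):
    """Randomly expand G into a nested tuple describing a process."""
    if depth >= max_depth or rng.random() < 0.4:
        return ("act", next(names))
    kind = rng.choice(["seq", "and", "xor"])
    if kind == "seq":
        return ("seq", generate(rng, names, depth + 1, max_depth),
                generate(rng, names, depth + 1, max_depth))
    # A ; (G op G) ; A : split and join activities surround the block
    split = ("act", next(names))
    block = (kind, generate(rng, names, depth + 1, max_depth),
             generate(rng, names, depth + 1, max_depth))
    join = ("act", next(names))
    return ("seq", split, ("seq", block, join))

def simulate(process, rng):
    """Produce one trace (list of activity names) from a process."""
    kind = process[0]
    if kind == "act":
        return [process[1]]
    left, right = process[1], process[2]
    if kind == "seq":
        return simulate(left, rng) + simulate(right, rng)
    if kind == "xor":
        return simulate(rng.choice([left, right]), rng)
    # "and": random interleaving that keeps the order inside each branch
    a, b = simulate(left, rng), simulate(right, rng)
    merged = []
    while a or b:
        source = a if (a and (not b or rng.random() < 0.5)) else b
        merged.append(source.pop(0))
    return merged

def activities(process):
    if process[0] == "act":
        return {process[1]}
    return activities(process[1]) | activities(process[2])

rng = random.Random(42)
names = (f"t{i}" for i in itertools.count())
body = generate(rng, names)
model = ("seq", ("act", "a_start"), ("seq", body, ("act", "a_end")))
log = [simulate(model, rng) for _ in range(20)]
print(log[0])
```

Because the generated model is known, it can serve as the reference against which a control-flow mining algorithm run on `log` is evaluated.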

15 of 17

Page 40: Applicability of Process Mining Techniques in Business Environments

Detailed Map of Performed Activities

[Figure: detailed map of the performed activities. Labels:
  - Legacy, Process-unaware Information Systems
  - Data Preparation
  - Process Mining Capable Event Logs
  - Process Mining Capable Event Stream
  - Event Logs Generator
  - Random Process Generator
  - Control-flow Mining Algorithm Exploiting More Data
  - User-guided Discovery Algorithm Configuration
  - Automatic Algorithm Configuration
  - Stream Control-flow Mining Framework
  - Process Representation (e.g. Dependency Graph, Petri Net)
  - Model Evaluation (wrt Log / Original Model)
  - Model-to-model Metric
  - Model-to-log Metric
  - Extension of Process Models with Organizational Roles]

16 of 17

Page 41: Applicability of Process Mining Techniques in Business Environments

Thanks!

Doing the Ph.D. has been amazing!

A huge thank you to:

My supervisor, Alessandro Sperduti

Siav S.p.A. and Roberto Pinelli

My internal examiners: Tullio Vardanega, Paolo Baldan

My external examiners: Barbara Weber, Diogo Ferreira

All the process mining community!

17 of 17