process mining: discovering processes from event logs all truths are easy to understand once they...
TRANSCRIPT
Process Mining:Discovering processes from event
logs
All truths are easy to understand once they are discovered; the point is to discover them. Galileo Galilei (1564 - 1642)
Prof.dr.ir. Wil van der AalstEindhoven University of Technology
Department of Information and TechnologyP.O. Box 513, 5600 MB Eindhoven
Outline • Process Mining
– overview– alpha algorithm– genetic mining
• ProM– Architecture– Convertors (e-mail, Staffware, InConcert, SAP, etc.) – Process mining plug-ins
• Alpha-algorithm• Multi-phase mining• Genetic mining
– Analysis plug-ins– Conformance testing plug-in– LTL checker plug-in– Social network plug-in
• Conclusion
Process Mining
processdesign
implementation/configuration
processenactment
diagnosis
Motivation: Reversing the process
• Process mining can be used for:– Process discovery (What is the process?)
– Delta analysis (Are we doing what was specified?)
– Performance analysis (How can we improve?)
process mining
Registerorder
Prepareshipment
Shipgoods
Receivepayment
(Re)sendbill
Contactcustomer
Archiveorder
Overview
1) basic performance metrics
2) process modelStart
Register order
Prepareshipment
Ship goods
(Re)send bill
Receive paymentContact
customer
Archive order
End
3) organizational model 4) social network
5) performance characteristics
If …then …
6) auditing/security
www.processmining.org
Let us focus on mining process models …
1) basic performance metrics
2) process modelStart
Register order
Prepareshipment
Ship goods
(Re)send bill
Receive paymentContact
customer
Archive order
End
3) organizational model 4) social network
5) performance characteristics
If …then …
6) auditing/security
... and a very simple approach: The alpha algorithm
Alpha algorithm
α
Process log• Minimal information in
log: case id’s and task id’s.
• Additional information: event type, time, resources, and data.
• In this log there are three possible sequences:– ABCD– ACBD– EF
case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
>,,||,# relations
• Direct succession: x>y iff for some case x is directly followed by y.
• Causality: xy iff x>y and not y>x.
• Parallel: x||y iff x>y and y>x
• Choice: x#y iff not x>y and not y>x.
case 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
A>BA>CB>CB>DC>BC>DE>F
AB
AC
BD
CD
EF
B||CC||B
Basic idea (1)
x y
xy
Basic idea (2)
xy, xz, and y||z
x
z
y
Basic idea (3)
xy, xz, and y#z
x
z
y
Basic idea (4)
xz, yz, and x||y
x
y
z
Basic idea (5)
xz, yz, and x#y
x
y
z
It is not that simple: Basic alpha algorithm
Let W be a workflow log over T. (W) is defined as follows.
1. TW = { t T W t },
2. TI = { t T W t = first() },
3. TO = { t T W t = last() },
4. XW = { (A,B) A TW B TW a Ab B a W b a1,a2 A a1#W a2
b1,b2 B b1#W b2 },
5. YW = { (A,B) X (A,B) XA A B B (A,B) = (A,B) },
6. PW = { p(A,B) (A,B) YW } {iW,oW},
7. FW = { (a,p(A,B)) (A,B) YW a A } { (p(A,B),b) (A,B) YW b B
} { (iW,t) t TI} { (t,oW) t TO}, and
8. (W) = (PW,TW,FW). The alpha algorithm has been proven to be correct for a large class of free-choice nets.
Examplecase 1 : task A case 2 : task A case 3 : task A case 3 : task B case 1 : task B case 1 : task C case 2 : task C case 4 : task A case 2 : task B case 2 : task D case 5 : task E case 4 : task C case 1 : task D case 3 : task C case 3 : task D case 4 : task B case 5 : task F case 4 : task D
A
B
C
D
E F
(W)
W
DEMOAlpha algorithm
A
E
G
invitereviewers
D
get review 2
time-out 2
collectreviews
H
decide
I
accept
J
reject
inviteadditionalreviewer
K
M
L
get review X
time-out X
C
B
get review 1
time-out 1
G
F
get review 3
time-out 3
48 cases16 performers
Challenges• Refining existing algorithm for (control-flow/process
perspective)– Hidden tasks– Duplicate tasks– Non-free-choice constructs– Loops– Detecting concurrency (implicit or explicit)– Mining and exploiting time– Dealing with noise– Dealing with incompleteness
• Mining other perspectives (data, resources, roles, …) • Gathering data from heterogeneous sources• Visualization of results• Delta analysis
Genetic mining
Approach
Genetic mining: The two main questions
• How to represent an individual? (Petri net?)• How to define the genetic operators? (e.g.,
crossover)
How to represent an individual?
• Problems with Petri nets:– Places do not exist in log– difficulties defining mutation and crossover– problems describing subtle rules without adding transitions
A
B
C
D
Representation of the goal processtrue A A A D D E^F BvCvG
→ A B C D E F G H
A 0 1 1 1 0 0 0 0 BvCvD
B 0 0 0 0 0 0 0 1 H
C 0 0 0 0 0 0 0 1 H
D 0 0 0 0 1 1 0 0 E^F
E 0 0 0 0 0 0 1 0 G
F 0 0 0 0 0 0 1 0 G
G 0 0 0 0 0 0 0 1 H
H 0 0 0 0 0 0 0 0 true
A
B
D
E
C
F
G
H
A more compact representation
ACTIVITY INPUT OUTPUT
A {} {{B,C,D}}
B {{A}} {{H}}
C {{A}} {{H}}
D {{A}} {{E},{F}}
E {{D}} {{G}}
F {{D}} {{G}}
G {{E},{F}} {{H}}
H {{B,C,G}} {}
A
B
D
E
C
F
G
H
Any Petri net can be mapped onto a causal matrix:
ACTIVITY INPUT OUTPUT
A {...} {{C,D},...}
B {...} {{C,D},...}
C {{A,B},...} {...}
D {{A,B},...} {...}
A
B D
C
but ...
Mapping a causal matrix onto a Petri net?
ACTIVITY INPUT OUTPUT
A {{i11,i12,i13},{i21,i22,i23}} {{o11,o12,o13},{o21,o22,o23}}
A
i11i12i13
i21i22i23
o11o12o13
o21o22o23
Wiring based on input and output sets
A
i11i12i13
i21i22i23
B
o11o12o13
o21o22o23
A
B
?
Using place fusion or silent transitions.
Example: Event logcase id activity id originator timestamp
case 1 activity A John 9-3-2004:15.01
case 2 activity A John 9-3-2004:15.12
case 3 activity A Sue 9-3-2004:16.03
case 3 activity D Carol 9-3-2004:16.07
case 1 activity B Mike 9-3-2004:18.25
case 1 activity H John 10-3-2004:9.23
case 2 activity C Mike 10-3-2004:10.34
case 4 activity A Sue 10-3-2004:10.35
case 2 activity H John 10-3-2004:12.34
case 3 activity E Pete 10-3-2004:12.50
case 3 activity F Carol 11-3-2004:10.12
case 4 activity D Pete 11-3-2004:10.14
case 3 activity G Sue 11-3-2004:10.44
case 3 activity H Pete 11-3-2004:11.03
case 4 activity F Sue 11-3-2004:11.18
case 4 activity E Clare 11-3-2004:12.22
case 4 activity G Mike 11-3-2004:14.34
case 4 activity H Clare 11-3-2004:14.38
A
B
D
E
C
F
G
H
Goal
Example: Starting pointcase id activity id
case 1 activity A
case 2 activity A
case 3 activity A
case 3 activity D
case 1 activity B
case 1 activity H
case 2 activity C
case 4 activity A
case 2 activity H
case 3 activity E
case 3 activity F
case 4 activity D
case 3 activity G
case 3 activity H
case 4 activity F
case 4 activity E
case 4 activity G
case 4 activity H
+ 500 randomly generated initial individuals
Two individuals
ACTIVITY INPUT OUTPUT
A {} {{B,C,D}}
B {{A}} {{H}}
C {{A}} {{H}}
D {{A}} {{E}}
E {{D}} {{G}}
F {} {{G}}
G {{E},{F}} {{H}}
H {{C,B,G}} {}
ACTIVITY INPUT OUTPUT
A {} {{B,C,D}}
B {{A}} {{H}}
C {{A}} {{H}}
D {{A}} {{E,F}}
E {{D}} {{G}}
F {{D}} {{G}}
G {{E},{F}} {{H}}
H {{C},{B},{G}} {}
Crossover
ACTIVITY INPUT OUTPUT
A {} {{B,C,D}}
B {{A}} {{H}}
C {{A}} {{H}}
D {{A}} {{E, F}}
E {{D}} {{G}}
F {{D}} {{G}}
G {{E},{F}} {{H}}
H {{C,B,G}} {}
ACTIVITY INPUT OUTPUT
A {} {{B,C,D}}
B {{A}} {{H}}
C {{A}} {{H}}
D {{A}} {{E}}
E {{D}} {{G}}
F {} {{G}}
G {{E},{F}} {{H}}
H {{C},{B},{G}} {}
Resulting CM with fitness 1.0
true A A A D D E^F BvCvG
→ A B C D E F G H
A 0 1 1 1 0 0 0 0 BvCvD
B 0 0 0 0 0 0 0 1 H
C 0 0 0 0 0 0 0 1 H
D 0 0 0 0 1 1 0 0 E^F
E 0 0 0 0 0 0 1 0 G
F 0 0 0 0 0 0 1 0 G
G 0 0 0 0 0 0 0 1 H
H 0 0 0 0 0 0 0 0 true
Mapping
A
B
D
E
C
F
G
H
true00000000H
H10000000G
G01000000F
G01000000E
E^F00110000D
H10000000C
H10000000B
BvCvD00001110A
H G F E D C B A→
BvCvGE^F D D A A A true
true00000000H
H10000000G
G01000000F
G01000000E
E^F00110000D
H10000000C
H10000000B
BvCvD00001110A
H G F E D C B A→
BvCvGE^F D D A A A true
ProM framework
ProMStaffware
InConcert
MQ Series
workflow management systems
FLOWer
Vectus
Siebel
case handling / CRM systems
SAP R/3
BaaN
Peoplesoft
ERP systems
common XML format for storing/exchanging workflow logs
input/outputCore
Plugins
ProMframework
visualization analysis
alpha algorithmgenetic
algorithmTsinghua alpha
algorithmMulti phasealgorithms
social networkminer
case dataextraction
property verifier
ExternalTools
NetMiner Viscovery ......
...
Converter plug-in: EMailAnalyzer
XML format
ProM architecture
UserInterface
+User
Interaction
StaffwareFlowerSAPInConcert...
Heuristic NetAris Graph Format(Aris AML Format)PNMLTPN...
MiningPlugin
ImportPlugin
ExportPlugin
AnalysisPlugin
ConversionPlugin
Heuristic Net PNMLAris Graph format TPNNetMiner file Agna fileAris PPM Instances DOTComma Seperated Values …...
Log Filter
VisualisationEngine
XML Log
ResultFrame
Mining plug-in: Alpha algorithm
Mining plug-in: Genetic Miner
Mining plug-in: Multi-phase mining
Step 1: Get instances
A BD
CE A B
D
CF
A BD
CG
H
IB
D
CE
Step 2: Project
A BD
CG
1
H
I
E
1 2
2
2
1
11
1
11
11
1
1
1
1
ts 11
tf 11
A BD
CE1
1 1
1
1
1tf 1
1ts 11
A BD
CF1
1 1
1
1
1tf 1
1ts 11
Step 3: Aggregate
A BD
CG
3
H
I
E
3 4
4
4
2
11
1
22
11
1
1
1
1
ts 33
tf 32
F1
1
1
Step 4: Map onto EPCets
ts
B
C D
H I
G
etf
E F
eB
eH eI
eC eD
eG
eE eF
A
eA
Step 5: Map onto Petri net (or other language)
A B
C
D
E
F
G
H
I
ts
Mining plug-in: Social network miner
Cliques
SN based on hand-over of work metric
density of network is 0.225
SN based on working together (and ego network)
Analysis plug-in: LTL checker
Analysis plug-in: Conformance checker
Do they agree?
Fitness is not enough
Screenshot
(Also runs on Mac.)
Other analysis plug-ins
More demos?
Conclusion
• Process mining provides many interesting challenges for scientists, customers, users, managers, consultants, and tool developers.
• Involves multiple perspectives (process, data, resources, etc.)
• Get ProM-ed!• You can contribute by applying ProM and developing
plug-ins
processdesign
implementation/configuration
processenactment
diagnosis
More information
http://www.workflowcourse.com
http://www.workflowpatterns.com
http://www.processmining.org