Process Mining and Predictive Process Monitoring

Marlon Dumas

Posted on 08-Apr-2017



[Diagram: Business Process Monitoring, Dashboards & Reports, Process Mining, Event stream, DB]

Offline Process Mining

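[Diagram: Discovery (event log → discovered model); Conformance (event log + input model → difference diagnostics); Deviance (event log + event log’ → difference diagnostics); Performance (event log + input model → enhanced model)]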

Offline Process Mining: The Apromore Approach

[Same diagram, annotated with the Apromore tools: BPMN Miner (Discovery), Behavioral Alignment (Conformance), Log Delta Analysis (Deviance)]

All integrated into: http://apromore.org

Automated Process Discovery

CID    Task                        Timestamp            …
13219  Enter Loan Application      2007-11-09T11:20:10  -
13219  Retrieve Applicant Data     2007-11-09T11:22:15  -
13220  Enter Loan Application      2007-11-09T11:22:40  -
13219  Compute Installments        2007-11-09T11:22:45  -
13219  Notify Eligibility          2007-11-09T11:23:00  -
13219  Approve Simple Application  2007-11-09T11:24:30  -
13220  Compute Installments        2007-11-09T11:24:35  -
…      …                           …                    …
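Discovery algorithms typically start from the ordering relations visible in a log such as the one above. The following Python sketch, assuming events arrive as (case ID, task, timestamp) tuples (the tuple layout and function names are illustrative, not part of any tool), groups events into cases, orders each case by timestamp and counts directly-follows pairs:

```python
from collections import Counter, defaultdict

# Rows from the log excerpt above: (case ID, task, ISO-8601 timestamp).
events = [
    ("13219", "Enter Loan Application", "2007-11-09T11:20:10"),
    ("13219", "Retrieve Applicant Data", "2007-11-09T11:22:15"),
    ("13220", "Enter Loan Application", "2007-11-09T11:22:40"),
    ("13219", "Compute Installments", "2007-11-09T11:22:45"),
    ("13219", "Notify Eligibility", "2007-11-09T11:23:00"),
    ("13219", "Approve Simple Application", "2007-11-09T11:24:30"),
    ("13220", "Compute Installments", "2007-11-09T11:24:35"),
]


def traces_from_log(rows):
    """Group events by case ID and order each case by timestamp."""
    by_case = defaultdict(list)
    for case_id, task, ts in rows:
        by_case[case_id].append((ts, task))
    # Same-format ISO-8601 strings sort chronologically.
    return {cid: [task for _, task in sorted(evts)] for cid, evts in by_case.items()}


def directly_follows(traces):
    """Count how often task a is immediately followed by task b in one case."""
    dfg = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


traces = traces_from_log(events)
dfg = directly_follows(traces)
```

On this excerpt, case 13219 yields four directly-follows pairs and case 13220 one. Comparing timestamp strings works only because they share one format and time zone; a real log would parse them.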

Automated Process Discovery: Before BPMN Miner (Heuristics Miner)
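Heuristics Miner turns directly-follows counts into a dependency measure, a ⇒ b = (|a > b| − |b > a|) / (|a > b| + |b > a| + 1), and keeps arcs whose value reaches a threshold. A minimal Python sketch of that measure, ignoring the separate treatment of short loops; the counts and the 0.9 threshold below are hypothetical:

```python
from collections import Counter


def dependency(dfg, a, b):
    """Heuristics Miner dependency measure a => b for two different activities."""
    ab = dfg.get((a, b), 0)  # how often a is directly followed by b
    ba = dfg.get((b, a), 0)  # how often b is directly followed by a
    return (ab - ba) / (ab + ba + 1)


def dependency_graph(dfg, threshold=0.9):
    """Arcs (a, b) whose dependency value reaches the threshold."""
    activities = {x for pair in dfg for x in pair}
    graph = {}
    for a in activities:
        for b in activities:
            if a != b:
                value = dependency(dfg, a, b)
                if value >= threshold:
                    graph[(a, b)] = round(value, 3)
    return graph


# Hypothetical directly-follows counts.
dfg = Counter({("A", "B"): 40, ("B", "A"): 2, ("B", "C"): 30})
graph = dependency_graph(dfg)
```

Here A ⇒ B scores 38/43 ≈ 0.884 and is pruned, while B ⇒ C scores 30/31 ≈ 0.968 and is kept: occasional reverse orderings weaken a dependency, and more evidence strengthens it.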

Automated Process Discovery: BPMN Miner

Conformance Checking


Conformance Checking with Trace Alignment

Too low-level!
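A trace alignment pairs each log event with a model step and inserts “log moves” and “model moves” where the two disagree; its output is exactly such a move-by-move list. A minimal Python sketch, assuming the model's behavior is available as a finite list of runs (actual alignment techniques search the model's state space instead) and a cost of 1 per non-synchronous move:

```python
def align(trace, run):
    """Optimal alignment of a log trace with one model run.
    Moves: ('sync', x), ('log', x) = log-only, ('model', x) = model-only."""
    n, m = len(trace), len(run)
    # cost[i][j] = minimal cost of aligning trace[i:] with run[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n and j < m and trace[i] == run[j]:
                options.append(cost[i + 1][j + 1])  # synchronous move, free
            if i < n:
                options.append(cost[i + 1][j] + 1)  # move on log only
            if j < m:
                options.append(cost[i][j + 1] + 1)  # move on model only
            cost[i][j] = min(options)
    # Walk along one optimal path to recover the moves.
    moves, i, j = [], 0, 0
    while i < n or j < m:
        if i < n and j < m and trace[i] == run[j] and cost[i][j] == cost[i + 1][j + 1]:
            moves.append(("sync", trace[i]))
            i, j = i + 1, j + 1
        elif i < n and cost[i][j] == cost[i + 1][j] + 1:
            moves.append(("log", trace[i]))
            i += 1
        else:
            moves.append(("model", run[j]))
            j += 1
    return cost[0][0], moves


def best_alignment(trace, model_runs):
    """Align against every run of a (hypothetical) finite model language."""
    return min((align(trace, run) for run in model_runs), key=lambda r: r[0])


cost, moves = best_alignment("ABDEH", ["ABCDEH", "ACBDEH"])
```

For trace ABDEH the cheapest alignment has a single model move on C. Deviations are reported one event of one trace at a time, which is why this output becomes too low-level when many traces deviate in related ways.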

[Diagram: Input model → unfold → PESM; Event log → merge → PESL (PES: prime event structure); PESM and PESL → Partially Synchronized Product (PSP) → compare, extract differences → difference statements]

Conformance Checking with Behavioral Alignment


Desired conformance output:
• task C is optional in the log
• the cycle including IGDF is not observed in the log

Log traces: ABCDEH, ACBDEH, ABCDFH, ACBDFH, ABDEH, ABDFH

L. Garcia-Banuelos, N.R. van Beest, M. Dumas, M. La Rosa, W. Mertens, Complete and Interpretable Conformance Checking of Business Processes, Technical Report, IEEE Transactions on Software Engineering, in press.
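Statements like those above rest on behavioral relations between tasks. The Python sketch below derives pairwise relations (causal, concurrent, exclusive) from the six example traces by simple set reasoning; it is a simplified illustration, not the event-structure-based method of the cited paper, and the relation labels and helper names are illustrative:

```python
from itertools import combinations

# The six example log traces, one task per character.
LOG = ["ABCDEH", "ACBDEH", "ABCDFH", "ACBDFH", "ABDEH", "ABDFH"]


def relation(log, a, b):
    """Classify the pair (a, b) from the traces in which both occur."""
    both = [t for t in log if a in t and b in t]
    if not both:
        return "exclusive"  # never observed together
    a_first = any(t.index(a) < t.index(b) for t in both)
    b_first = any(t.index(b) < t.index(a) for t in both)
    if a_first and b_first:
        return "concurrent"  # observed in both orders
    return "causal" if a_first else "reverse causal"


def skipped_in(log, task):
    """Number of traces in which the task does not occur."""
    return sum(task not in t for t in log)


tasks = sorted(set("".join(LOG)))
relations = {(a, b): relation(LOG, a, b) for a, b in combinations(tasks, 2)}
```

On this log, B and C occur in both orders, E and F never co-occur, and C is absent from two traces without being exclusive with any other task, which is the kind of evidence behind “task C is optional in the log”. G and I never occur at all.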

Given two logs, find the differences and root causes for variation or deviance between the two logs

[Figure: “Simple claims and quick” vs. “Simple claims and slow”]

Deviance Mining

S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013

Deviance Mining via Sequence Classification

• Apply discriminative sequence mining methods to extract features characteristic of one class

• Build classification models (e.g. decision trees)
• Extract difference diagnostics from the classification model

C. Sun et al. Mining explicit rules for software process evaluation. ICSSP’2013.
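The classification step can be illustrated with the smallest possible decision tree, a single-split stump. A minimal Python sketch, assuming activity-count features (a simple stand-in for mined discriminative sequence features) and 0/1 labels with 1 = deviant; the traces are toy data:

```python
from collections import Counter

ALPHABET = ["A", "B", "C"]


def count_features(trace, alphabet):
    """Encode a trace as the number of occurrences of each activity."""
    counts = Counter(trace)
    return [counts[a] for a in alphabet]


def best_stump(X, y, alphabet):
    """Depth-1 decision tree: the rule 'count(a) > t' with the fewest
    training errors. Returns (errors, activity, threshold, class if above)."""
    best = None
    for f, name in enumerate(alphabet):
        values = sorted({row[f] for row in X})
        # Candidate thresholds: midpoints between consecutive observed counts.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in (0, 1):
                errors = sum((above if row[f] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, name, t, above)
    return best


# Toy traces: 1 = deviant (many B's), 0 = normal.
traces = ["ABBBC", "ABBBBC", "ABC", "AC"]
labels = [1, 1, 0, 0]
X = [count_features(t, ALPHABET) for t in traces]
rule = best_stump(X, labels, ALPHABET)
```

The result reads as a rule of the form IF |B| > 2.0 THEN deviant; deeper trees chain such tests into the difference diagnostics mentioned above.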


Log Delta Analysis


N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa, Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405
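For contrast, the most naive way to difference two logs is to compare how often each directly-follows pair occurs in each. The Python sketch below does only that; it is a crude baseline, not Log Delta Analysis, and both the min_gap threshold and the one-letter activity codes (T = triage, A = assessment) are assumptions:

```python
from collections import Counter


def df_frequencies(log):
    """Fraction of traces in which each directly-follows pair occurs at least once."""
    counts = Counter()
    for trace in log:
        counts.update(set(zip(trace, trace[1:])))
    return {pair: n / len(log) for pair, n in counts.items()}


def df_differences(log1, log2, min_gap=0.2):
    """Pairs whose trace frequency differs between the two logs by at least min_gap."""
    f1, f2 = df_frequencies(log1), df_frequencies(log2)
    diffs = {}
    for pair in f1.keys() | f2.keys():
        gap = f1.get(pair, 0.0) - f2.get(pair, 0.0)
        if abs(gap) >= min_gap:
            diffs[pair] = round(gap, 3)
    return diffs


diffs = df_differences(["TAA", "TA"], ["TA", "TA"])
```

The result says that the repetition A→A occurs in half the traces of the first log and in none of the second. Frequency deltas of this kind say nothing about concurrency or conflict, which is what comparing event structures adds.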

Sequence classification vs. log delta analysis

L1 (short stay): 448 cases, 7329 events
L2 (long stay): 363 cases, 7496 events

Sequence classification: 106-130 statements, e.g.
IF |“Nursing Progress Notes”| > 7.5 THEN L1
IF |“Nursing Progress Notes”| ≤ 7.5 AND |“Nursing Assessment”| > 1.5 THEN L2
…

Log delta analysis: 48 statements, e.g.
In L1, “Nursing Primary Assessment” is repeated after “Medical Assign” and “Triage Request”, while in L2 it is not…


Apromore Process Analytics Platform (apromore.org): open-source, highly scalable, SaaS BPM analytics platform

M. La Rosa, H. Reijers, W. van der Aalst, R. Dijkman, J. Mendling, M. Dumas, L. Garcia-Banuelos: “APROMORE: an advanced process model repository”, Expert Systems with Applications, 2011

How likely is it that a running process will become “deviant”?
• Will it end up in a negative outcome?
• Will it fail to meet its SLAs in the next 24 hours?
• Will it generate abnormal effort, costs or rework?

Beyond Deviance Mining: Predictive Process Monitoring

Deviance Mining and Predictive Monitoring


Debt repayment due → Call the debtor → Send a reminder → Payment received

Predictive Monitoring Example: Debt Recovery Process

[Figure: Debt repayment due → Call the debtor → Send a reminder → Send a warning → Call the debtor → Call the debtor → Send to external debt collection agency; further traces with many repeated “Call the debtor” events]

[Diagram: Event log (traces + attributes) → Classifier → Outcome predictions]

Predictive Process Monitoring: General Approach
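Training the classifier requires turning completed traces into labeled examples that resemble running cases. A minimal Python sketch, assuming a hypothetical labeling function (a debt recovery case counts as deviant if it is sent to an external agency) and two toy traces built from the activities of the example above, emits every prefix of every trace tagged with its case's final outcome:

```python
AGENCY = "Send to external debt collection agency"


def label(trace):
    """Hypothetical labeling function: deviant if the case reaches the agency."""
    return "deviant" if AGENCY in trace else "normal"


def labeled_prefixes(log, min_len=1):
    """Every prefix of every completed trace, tagged with the case's outcome."""
    return [(tuple(trace[:k]), label(trace))
            for trace in log
            for k in range(min_len, len(trace) + 1)]


# Toy completed traces.
log = [
    ["Debt repayment due", "Call the debtor", "Send a reminder", "Payment received"],
    ["Debt repayment due", "Call the debtor", "Send a reminder", "Send a warning", AGENCY],
]
samples = labeled_prefixes(log)
```

Identical prefixes can carry different labels (both toy cases start the same way), which is why early predictions are inherently less certain than mid-trace ones.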

[Diagram: Event log (traces + attributes) → Regressor / structured predictor → Future “paths” prediction]

Predictive Monitoring: Runtime Nearest-Neighbors Approach

[Diagram: Trace Processor: the current trace [Event+] is matched against the event log by kNN extraction (string-edit distance); feature extraction turns the similar execution traces into labeled samples. Predictor: decision tree learning on the labeled samples yields a decision tree, which performs class estimation for the current trace [Data+] and returns a prediction.]

F.M. Maggi, C. Di Francescomarino, M. Dumas, C. Ghidini. Predictive Monitoring of Business Processes. CAiSE'2014
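The kNN extraction step can be sketched in Python with a Levenshtein distance over activity sequences. Comparing the running trace only with the same-length prefix of each historical trace is an assumption of this sketch, as are the function names; the decision-tree step that follows is not shown:

```python
def edit_distance(s, t):
    """Levenshtein distance between two activity sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


def k_nearest(current, log, k):
    """The k historical traces whose same-length prefix is closest to the running trace."""
    n = len(current)
    return sorted(log, key=lambda trace: edit_distance(current, trace[:n]))[:k]


neighbours = k_nearest("ABC", ["ABCD", "XYZW", "ABDD"], 2)
```

Doing this search for every prediction at runtime is one reason the approach is costly, as the results below show.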


• BPI Challenge 2011 dataset
• Healthcare process at a Dutch hospital
• 1141 cases, average length 14 events/case
• Cases split into normal and deviant via 5 predicates: φ1–φ5
• Prediction made at:
  • Start event (initial event)
  • Early event (ca. ¼ of the trace)
  • Middle

Evaluation Setup


• Reasonably accurate at mid-point (AUC 0.78-0.88)

• High runtime overhead 5-10 secs / prediction
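AUC, the accuracy measure reported above, is the probability that a randomly chosen deviant case receives a higher score than a randomly chosen normal one. A short Python sketch of that pairwise (Mann-Whitney) formulation, with ties counted as one half; labels are assumed to be 1 = deviant, 0 = normal:

```python
def auc(scores, labels):
    """Probability that a deviant case (label 1) outscores a normal one (label 0)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 is what random scoring achieves, so the 0.78-0.88 range reflects a real but imperfect ranking. This version is quadratic in the number of cases; a rank-based computation after sorting scales better.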

Evaluation Results


Predictive Process Monitoring: Cluster & Classify

[Diagram: Pre-processing: historical execution traces → prefix extraction → trace prefixes → control-flow encoding → clustering → clusters; trace prefixes → data encoding → encoded data, labeled by a labeling function derived from the prediction problem → supervised learning classifiers (one per cluster). Runtime: running trace → control-flow encoding → cluster(s) identification → data encoding → classification → prediction.]

AUC of 0.6 to 0.85 with a lot of variation
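A minimal Python sketch of the cluster-and-classify idea, assuming frequency-based control-flow encoding, plain k-means seeded with the first k points, and a majority-label stand-in where a real supervised classifier per cluster would go. Data encoding of attributes is omitted, and all names and data are illustrative:

```python
from collections import Counter


def freq_encode(prefix, alphabet):
    """Control-flow encoding: how many times each activity occurs in the prefix."""
    c = Counter(prefix)
    return [c[a] for a in alphabet]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_cluster(point, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=10):
    """Plain k-means, deterministically seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest_cluster(p, centroids)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def train_per_cluster(encoded, labels, centroids):
    """Stand-in for one classifier per cluster: the cluster's majority label."""
    members = {}
    for x, y in zip(encoded, labels):
        members.setdefault(nearest_cluster(x, centroids), []).append(y)
    return {c: Counter(ys).most_common(1)[0][0] for c, ys in members.items()}


ALPHABET = ["Call", "Remind", "Warn"]
prefixes = [["Call"], ["Call", "Remind"],
            ["Call", "Call", "Call", "Warn"], ["Call", "Call", "Call", "Call", "Warn"]]
labels = ["normal", "normal", "deviant", "deviant"]
encoded = [freq_encode(p, ALPHABET) for p in prefixes]
centroids = kmeans(encoded, k=2)
models = train_per_cluster(encoded, labels, centroids)


def predict(prefix):
    """Runtime: encode, find the nearest cluster, apply that cluster's model."""
    return models[nearest_cluster(freq_encode(prefix, ALPHABET), centroids)]
```

The number of clusters, the encoding and the classifier are exactly the kind of hyperparameters discussed next.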

Each technique has its own hyperparameters. Other parameters:
• Trace prefix size
• Voting mechanism
• Interval choice in case of interval time predictions

Predictive Process Monitoring: Cluster & Classify with Hyperparameter Optimization

• Four outcome labellings of a large real-life patient treatment dataset

Experimental Settings

Dataset preparation:
• Training set (70%)
• Validation set (20%)
• Testing set (10%)

Identification of the most suitable configurations (among 160)

Evaluation of the identified configurations (with the testing set)

• No unique best configuration.
• Accuracy is consistently high, and accuracy on the testing set is consistent with the tuning.

Evaluation Results

Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi, Irene Teinemaa. Clustering-Based Predictive Process Monitoring. IEEE Transactions on Services Computing, 2017.

Computation Time!!!


• Idea: one classifier per index
  • Classifier for prefixes of length 1
  • Classifier for prefixes of length 2
  • Etc.
• Traces of length m encoded using an index-based scheme
• At runtime, classify a trace of length m using the corresponding classifier

Index-Based Multi-Classifier

Anna Leontjeva, Raffaele Conforti, Chiara Di Francescomarino, Marlon Dumas, Fabrizio Maria Maggi: Complex Symbolic Sequence Encodings for Predictive Monitoring of Business Processes. Proc. of BPM 2015, pp. 297-313.
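The index-based multi-classifier can be sketched in Python as follows: each prefix is encoded position by position as one-hot blocks, one training set is kept per prefix length, and a running prefix is routed to the model for its length. A 1-nearest-neighbour lookup (Hamming distance) stands in for the per-length classifier, and data attributes at each position are left out; both are simplifications of this sketch:

```python
ALPHABET = ["A", "B", "C"]


def index_encode(prefix, alphabet):
    """One one-hot block per position: block i says which activity was event i."""
    vec = []
    for activity in prefix:
        vec.extend(1 if activity == a else 0 for a in alphabet)
    return vec


def build_training_sets(log, labels, alphabet, max_len):
    """One training set per prefix length m; the model for length m sees only these."""
    sets = {m: [] for m in range(1, max_len + 1)}
    for trace, label in zip(log, labels):
        for m in range(1, min(len(trace), max_len) + 1):
            sets[m].append((index_encode(trace[:m], alphabet), label))
    return sets


def predict(prefix, sets, alphabet):
    """Route the running prefix to the model for its length. A 1-nearest-neighbour
    lookup (Hamming distance) stands in for a trained per-length classifier."""
    x = index_encode(prefix, alphabet)
    training = sets[len(prefix)]
    nearest = min(training, key=lambda s: sum(u != v for u, v in zip(x, s[0])))
    return nearest[1]


log = [["A", "B"], ["A", "C"], ["A", "B", "C"]]
labels = ["normal", "deviant", "normal"]
sets = build_training_sets(log, labels, ALPHABET, max_len=3)
```

Because every vector in a training set has the same length m × |alphabet|, each per-length model works on fixed-size features. A prefix longer than max_len has no model in this sketch and the lookup raises KeyError.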

• Same as before, but the feature vector of a prefix is extended with the log-likelihood ratio of being in the deviant or regular class according to a Hidden Markov Model (HMM)
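The extra HMM feature is a log-likelihood ratio, log P(prefix | deviant HMM) − log P(prefix | regular HMM). The Python sketch below computes it with the scaled forward algorithm; both models and their parameters are hypothetical and fixed by hand, whereas in practice each would be trained on the traces of its class (e.g. with Baum-Welch, not shown):

```python
import math


def log_likelihood(seq, start, trans, emit):
    """Scaled forward algorithm: log P(seq) under a discrete HMM given by
    start[s], trans[s][s2] and emit[s][symbol] probabilities. Every symbol
    must have non-zero probability under some state, or math.log fails."""
    states = list(start)
    log_p = 0.0
    alpha = None
    for t, symbol in enumerate(seq):
        if t == 0:
            alpha = {s: start[s] * emit[s].get(symbol, 0.0) for s in states}
        else:
            alpha = {s2: sum(alpha[s] * trans[s][s2] for s in states)
                     * emit[s2].get(symbol, 0.0)
                     for s2 in states}
        total = sum(alpha.values())  # P(symbol_t | symbols before t)
        log_p += math.log(total)
        # Normalising keeps alpha from underflowing on long prefixes.
        alpha = {s: a / total for s, a in alpha.items()}
    return log_p


# Hypothetical models: (start, transitions, emissions).
REGULAR = (
    {"x": 0.6, "y": 0.4},
    {"x": {"x": 0.7, "y": 0.3}, "y": {"x": 0.4, "y": 0.6}},
    {"x": {"A": 0.9, "B": 0.1}, "y": {"A": 0.2, "B": 0.8}},
)
DEVIANT = ({"z": 1.0}, {"z": {"z": 1.0}}, {"z": {"A": 0.3, "B": 0.7}})


def llr(prefix):
    """Positive values favour the deviant class."""
    return log_likelihood(prefix, *DEVIANT) - log_likelihood(prefix, *REGULAR)
```

The value of llr(prefix) is appended to the index-based feature vector of the prefix, giving the per-length classifier a compact summary of the whole sequence.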

Index-Based Multi-Classifier + HMM

Evaluation Setup


Evaluation Results


Predictive Monitoring with Unstructured Data


Text mining


Text-Extended Index-Based Encoding


• Bag-of-N-grams
• Weighted bag-of-N-grams
• Latent Dirichlet Allocation (LDA)
• Paragraph Vector (PV)
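The first of these encodings is straightforward to sketch. The Python below builds a bag of word n-grams from a free-text note, assuming lower-casing and whitespace tokenization (a real pipeline would likely also lemmatize); the weighted variant, LDA and PV are not shown, and the sample note is hypothetical:

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous windows of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_max=2):
    """Counts of word n-grams for 1 <= n <= n_max, lower-cased and
    whitespace-tokenised."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        bag.update(ngrams(tokens, n))
    return bag


# Hypothetical note attached to a debt recovery event.
bag = bag_of_ngrams("Promised to pay promised")
```

Mapping each n-gram to a fixed column turns the bag into a vector that can be appended to the index-based encoding of the event carrying the text.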

                      Debt Recovery   Lead-to-contract
# normal cases        13608           385
# deviant cases       417             390
Avg # words per doc   11              8
# lemmas              11822           2588

Evaluation Setup


• Data split: 80% train, 20% test (random)
• Handling imbalance: oversampling
• Classifiers: random forest and logistic regression
• Evaluation metrics: F-score and earliness
• Parameter tuning: grid search with 5-fold cross-validation on the training set

Evaluation Results


Ongoing work: LSTM-Based Predictive Process Monitoring

Niek Tax, Ilya Verenich, Marcello La Rosa, Marlon Dumas: Predictive Business Process Monitoring with LSTM Neural Networks. CoRR abs/1612.02130 (2016).

• Accurate, robust techniques to predict case outcome, covering control-flow, structured and textual data

• LSTM-based architecture to predict:
  • Next task + timestamp + resource or other attributes
  • Remaining execution path and time
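Only the data preparation for such a network is sketched here. The Python below turns one case into next-event training samples: each prefix event becomes a one-hot activity vector plus the seconds since the previous event, and the target is the next activity and the time until it. This feature set is an assumption of the sketch, the LSTM itself is not shown, and the case used is 13219 from the loan application log excerpt:

```python
from datetime import datetime

ALPHABET = ["Enter Loan Application", "Retrieve Applicant Data",
            "Compute Installments", "Notify Eligibility", "Approve Simple Application"]


def next_event_samples(trace, alphabet):
    """Build (prefix features, (next activity, seconds until it)) samples
    from one case given as [(activity, ISO timestamp), ...]."""
    times = [datetime.fromisoformat(ts) for _, ts in trace]
    feats = []
    for i, (activity, _) in enumerate(trace):
        delta = (times[i] - times[i - 1]).total_seconds() if i else 0.0
        feats.append([1.0 if activity == a else 0.0 for a in alphabet] + [delta])
    samples = []
    for k in range(1, len(trace)):
        target = (trace[k][0], (times[k] - times[k - 1]).total_seconds())
        samples.append((feats[:k], target))
    return samples


case_13219 = [
    ("Enter Loan Application", "2007-11-09T11:20:10"),
    ("Retrieve Applicant Data", "2007-11-09T11:22:15"),
    ("Compute Installments", "2007-11-09T11:22:45"),
    ("Notify Eligibility", "2007-11-09T11:23:00"),
    ("Approve Simple Application", "2007-11-09T11:24:30"),
]
samples = next_event_samples(case_13219, ALPHABET)
```

Each sample is a variable-length sequence of per-event vectors, which is the shape a recurrent network consumes; predicting the remaining path amounts to feeding predicted events back in one step at a time.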

• All code available:
  • Clustering-based method: http://goo.gl/ykozBf
  • Index-based method: https://goo.gl/BQFk7k
  • Index-based method with textual features: https://goo.gl/a2DoWT
  • LSTM-based method: https://goo.gl/mkQDyy

Online predictive process monitoring
