Complete and Interpretable Conformance Checking of Business Processes

Luciano García-Bañuelos, University of Tartu, Estonia
Nick van Beest, Data61 | CSIRO, Australia
Marlon Dumas, University of Tartu, Estonia
Marcello La Rosa, Queensland University of Technology, Australia
Willem Mertens, Queensland University of Technology, Australia

Upload: marlon-dumas

Post on 11-Jan-2017




Conformance checking
- Compliance auditing: detect deviations with respect to a normative model (unfitting behavior)
- Model maintenance: unfitting behavior, additional model behavior
- Automated process model discovery: iterative model improvement

Compliance auditing: detect deviations in the process execution with respect to the behavior stipulated by a normative model.

Model maintenance: The first situation (unfitting behavior) suggests that the model needs to be extended to capture the unfitting behavior. The second situation (additional model behavior) suggests that some paths in the model have become spurious over time: they are no longer used and should be pruned if they are found to have lost relevance.

Iterative improvement of the process model: first, an initial model is discovered using any discovery algorithm; then conformance is measured and, based on the result, the behavior of the model is refined to better align it with that of the log. For example, the model is restricted by removing paths never observed in the log, in order to avoid over-generalization, and extended with paths observed in the log but not captured in the model.

Given a process model M and an event log L, explain the differences between the process behavior captured in M and the behavior observed in L.

State of the art
Current approaches:
- Are designed to identify the number and exact location of the differences
- Don't provide a high-level diagnosis that easily allows analysts to pinpoint differences:
  - Are unable to identify differences across traces
  - Are unable to fully characterize extra model behavior not present in the log

Related work

MEASURING UNFITTING BEHAVIOR
- Replay fitness and ICS extension: replay a trace on the Petri net to detect missing tokens, which must be added to a place in order to replay the trace, and remaining tokens, which are left in the Petri net once the trace has been fully replayed. The ICS extension trades accuracy for performance.
- vanden Broucke et al. propose a SESE decomposition to improve performance and offer more localized feedback.
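To make the notion of missing and remaining tokens concrete, here is a minimal token-replay sketch. It is not the implementation used by any of the cited tools: the net (`NET`, `INITIAL`, `FINAL`) is a hypothetical four-task sequence with unique labels, and the fitness formula is the commonly used one based on missing, remaining, consumed and produced tokens.

```python
from collections import Counter

# Hypothetical sequential net A -> B -> C -> D: label -> (input places, output places)
NET = {
    "A": (["p0"], ["p1"]),
    "B": (["p1"], ["p2"]),
    "C": (["p2"], ["p3"]),
    "D": (["p3"], ["p4"]),
}
INITIAL = {"p0": 1}
FINAL = {"p4": 1}


def token_replay(trace, net=NET, initial=INITIAL, final=FINAL):
    """Replay a trace and return (missing, remaining, consumed, produced)."""
    marking = Counter(initial)
    produced = sum(initial.values())  # the initial tokens count as produced
    consumed = missing = 0

    def consume(place):
        nonlocal consumed, missing
        if marking[place] == 0:
            missing += 1          # local error recovery: add the missing token
            marking[place] += 1
        marking[place] -= 1
        consumed += 1

    for label in trace:
        inputs, outputs = net[label]
        for place in inputs:
            consume(place)
        for place in outputs:
            marking[place] += 1
            produced += 1

    for place, count in final.items():  # the final marking is consumed at the end
        for _ in range(count):
            consume(place)

    remaining = sum(marking.values())
    return missing, remaining, consumed, produced


def fitness(missing, remaining, consumed, produced):
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)


print(token_replay("ABCD"))            # fitting trace
print(token_replay("ABD"))             # C skipped: one missing, one remaining token
print(fitness(*token_replay("ABD")))
```

Note that the missing token is repaired on the spot, at the place where the error is detected, which is the local error recovery discussed next.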

General limitation: error recovery is performed locally each time an error is detected, so these methods may not identify the minimum number of differences (errors) needed to explain the unfitting behavior. This limitation is solved by trace alignment, which however still provides feedback at the level of individual traces rather than at the level of behavioral relations between events.

MEASURING ADDITIONAL MODEL BEHAVIOR
- Negative events (Negative Events Precision): negative events are added, and if the model can replay any of these negative events when replaying a trace, this is a case of extra model behavior. However, this method is heuristic and cannot guarantee that all extra behavior is detected. There are improvements on performance (the method is exponential because the number of possible negative events is exponentially large), but these still do not guarantee 100% accuracy in detecting extra behavior.
- Trace automata (ETC Conformance): a prefix automaton is built from the log, such that each state in the automaton corresponds to a given trace prefix. For each state, the prefix is replayed on the model to find the corresponding state in the model. If the set of transitions enabled in that model state includes events that cannot be taken at that state in the log, this is recorded as an escaping edge (i.e. a sink) in the automaton, pinpointing the beginning of extra model behavior, and the replay continues. Two limitations: i) it is unable to handle duplicate labels and silent transitions, and ii) it assumes a fully fitting log.
- Alignment-based ETC Conformance solves these two limitations: first, an optimal alignment is calculated, including silent moves on model. Then the model projection of this alignment is used to compute the automaton, after which traces are replayed to detect escaping edges.
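The trace-automaton idea can be sketched in a few lines. This is an illustration only, under the same two assumptions that the technique itself makes (unique labels, fully fitting log). The model is given as a hypothetical deterministic transition system `MODEL` rather than a Petri net, and the function name `escaping_edges` is made up for this sketch.

```python
# Hypothetical model as a transition system: state -> {label: next state}
MODEL = {
    "s0": {"A": "s1"},
    "s1": {"B": "s2", "C": "s2"},
    "s2": {"D": "s3", "B": "s2"},   # B can be repeated in the model
    "s3": {},
}


def escaping_edges(log, model, start):
    """Map each log prefix to the labels the model allows but the log never shows."""
    # Prefix automaton: prefix -> labels observed right after that prefix in the log
    observed = {}
    for trace in log:
        for i, label in enumerate(trace):
            observed.setdefault(trace[:i], set()).add(label)
        observed.setdefault(trace, set())

    result = {}
    for prefix, next_labels in observed.items():
        state = start
        for label in prefix:
            state = model[state][label]          # assumes the log fits the model
        extra = set(model[state]) - next_labels  # enabled in model, unseen in log
        if extra:
            result[prefix] = extra
    return result


print(escaping_edges(["ABD"], MODEL, "s0"))
```

The result only names the first task of each piece of extra behavior (here C after A, and a repeated B after AB); it says nothing about what follows it in the model.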

General limitation of trace automata: they cannot fully characterize the extra behavior in the model, but only pinpoint where this extra behavior takes place and with what task it starts.

An example
Desired conformance output:
- task C is optional in the log
- the cycle including IGDF is not observed in the log

Log traces: ABCDEH, ACBDEH, ABCDFH, ACBDFH, ABDEH, ABDFH

The first statement characterizes the behavior observed in the log but not in the model: in the model, task C is compulsory, while in the log C is skippable. The second statement characterizes the behavior observed in the model but not in the log.

Trace alignment would produce two optimal alignments: one between ABDEH of the log and ABCDEH of the model, the other between ABDFH of the log and ABCDFH of the model. From this one can infer that task C is optional in the log (move on model only).
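As a rough illustration of what such an alignment looks like, the sketch below aligns one log trace against one given model run using a skip-only edit distance. This is a simplification: a real alignment technique searches over all runs of the model, whereas here the model run is supplied by hand. The function name `align` and the `>>` skip symbol are choices made for this sketch.

```python
SKIP = ">>"


def align(log_trace, model_trace):
    """Return a list of (log move, model move) pairs with a minimal number of skips."""
    n, m = len(log_trace), len(model_trace)
    # cost[i][j]: minimal number of skips to align log_trace[i:] with model_trace[j:]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            options = []
            if i < n and j < m and log_trace[i] == model_trace[j]:
                options.append(cost[i + 1][j + 1])   # synchronous move
            if i < n:
                options.append(1 + cost[i + 1][j])   # event only in the log
            if j < m:
                options.append(1 + cost[i][j + 1])   # task only in the model run
            cost[i][j] = min(options)

    moves, i, j = [], 0, 0
    while i < n or j < m:
        if (i < n and j < m and log_trace[i] == model_trace[j]
                and cost[i][j] == cost[i + 1][j + 1]):
            moves.append((log_trace[i], model_trace[j]))
            i, j = i + 1, j + 1
        elif j < m and cost[i][j] == 1 + cost[i][j + 1]:
            moves.append((SKIP, model_trace[j]))
            j += 1
        else:
            moves.append((log_trace[i], SKIP))
            i += 1
    return moves


for pair in align("ABDEH", "ABCDEH"):
    print(pair)
```

The pair `(">>", "C")` says that C occurs in the model run but has no counterpart in the log trace. The same pair appears when aligning ABDFH with ABCDFH, but the two facts are reported separately, one per trace.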

1) However, the number of misaligned traces is often very large, rendering this inference quite hard in practice. Visualizations, e.g. on top of a Petri net and at an aggregate level, can help, but fundamentally the problem is that trace alignment provides feedback at the level of individual traces, not at the level of behavioral relations observed in the log but not captured in the model.

2) Moreover, trace alignment would detect that there is escaping behavior starting with Request additional information at a trace prefix finishing with Notify rejection, but it will not identify that the extra behavior includes tasks I and G, nor that IGDF is behavior that can be repeated in the model but not in the log. For example, task Assess application can be repeated in the model but not in the log.

Our approach
A method for business process conformance checking that:
- Identifies all differences between the behavior in the model and the behavior in the log
- Describes each difference via a natural language statement

How does it work?

Input model in BPMN and event log in XES or MXML

BPMN is converted to Petri nets using Remco, Chun and Marlon's technique (Information and Software Technology)


The Petri net is unfolded using McMillan's complete prefix unfolding technique

From the log, a set of partially ordered runs is first extracted, based on a concurrency relation discovered from the log. These runs are then prefix-merged into a Prime Event Structure (PES).
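The first half of this step can be sketched as follows. The slides do not say how the concurrency relation is discovered, so this sketch assumes a simple alpha-style rule: two tasks are concurrent if they directly follow each other in both orders somewhere in the log. It also assumes that no label repeats within a trace. The prefix-merging into a PES is not shown, and the function names are made up for this sketch.

```python
LOG = ["ABCDEH", "ACBDEH", "ABCDFH", "ACBDFH", "ABDEH", "ABDFH"]


def concurrency(log):
    """Assumed rule: a || b if a is directly followed by b and b by a in the log."""
    follows = {(t[i], t[i + 1]) for t in log for i in range(len(t) - 1)}
    return {(a, b) for (a, b) in follows if a != b and (b, a) in follows}


def partially_ordered_run(trace, conc):
    """Ordering of a trace's tasks once the order between concurrent tasks is dropped."""
    n = len(trace)
    order = {(i, j) for i in range(n) for j in range(i + 1, n)
             if (trace[i], trace[j]) not in conc}
    changed = True
    while changed:                       # transitive closure of the remaining order
        changed = False
        for (i, j) in list(order):
            for (k, l) in list(order):
                if j == k and (i, l) not in order:
                    order.add((i, l))
                    changed = True
    # Report label pairs; valid because labels are assumed unique within a trace
    return {(trace[i], trace[j]) for (i, j) in order}


conc = concurrency(LOG)
print(sorted(conc))
print(partially_ordered_run("ABCDEH", conc) == partially_ordered_run("ACBDEH", conc))
```

Under this rule, B and C come out as concurrent, so the traces ABCDEH and ACBDEH collapse into one and the same partially ordered run.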


The partially synchronized product (PSP) is a representation of a synchronized traversal of the two input PESs, such that when a discrepancy is detected it is explicitly recorded and the traversal resumes from a suitable configuration in each of the two PESs.

If the discrepancy is of type unfitting log behavior, it is recorded by a node of the PSP, so we can enumerate all unfitting behavior.

To expose additional model behavior, we define a notion of coverage of the PES extracted from the model by the PES extracted from the log. The parts of the model PES not covered by the log PES are the additional model behavior.


Each discrepancy falls under one of a set of disjoint patterns. For each pattern, we have a verbalization of the difference.


Prime event structure (PES)
A Prime Event Structure (PES) is a graph of events, where each event e represents the occurrence of a task in the modeled system (e.g. a business process)

As such, multiple occurrences of the same task are represented by different events

Pairs of events in a PES can have one of the following binary relations:
- Causality: event e is a prerequisite for e'
- Conflict: e and e' cannot occur in the same execution
- Concurrency: no order can be established between e and e'
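A minimal sketch of how the three relations fit together, assuming a PES given by its direct causality and direct conflict pairs. The event names and the example structure are hypothetical. The sketch relies on two standard PES properties: causality is transitive, and a conflict between two events extends to everything they cause. Concurrency is then whatever is left.

```python
from itertools import combinations

# Hypothetical PES: two different events (e7, e8) both represent task H
LABELS = {"e1": "A", "e2": "B", "e3": "C", "e4": "D",
          "e5": "E", "e6": "F", "e7": "H", "e8": "H"}
DIRECT_CAUSALITY = {("e1", "e2"), ("e1", "e3"), ("e2", "e4"), ("e3", "e4"),
                    ("e4", "e5"), ("e4", "e6"), ("e5", "e7"), ("e6", "e8")}
DIRECT_CONFLICT = {("e5", "e6")}


def relations(events, direct_causality, direct_conflict):
    """Return (causality, conflict, concurrency) over all pairs of events."""
    causality = set(direct_causality)
    changed = True
    while changed:                        # causality is transitive
        changed = False
        for (a, b) in list(causality):
            for (c, d) in list(causality):
                if b == c and (a, d) not in causality:
                    causality.add((a, d))
                    changed = True

    def with_successors(e):
        return {e} | {y for (x, y) in causality if x == e}

    conflict = set()                      # conflict extends to causal successors
    for (a, b) in direct_conflict:
        for x in with_successors(a):
            for y in with_successors(b):
                conflict.add(frozenset((x, y)))

    concurrency = {frozenset(p) for p in combinations(events, 2)
                   if frozenset(p) not in conflict
                   and p not in causality and p[::-1] not in causality}
    return causality, conflict, concurrency


causality, conflict, concurrency = relations(
    sorted(LABELS), DIRECT_CAUSALITY, DIRECT_CONFLICT)
print(sorted(tuple(sorted(p)) for p in conflict))
print(sorted(tuple(sorted(p)) for p in concurrency))
```

In this example the single direct conflict between e5 and e6 yields four conflicting pairs, and the only concurrent pair is e2 and e3 (tasks B and C).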

An event is an occurrence of a task in the business process, as represented by the model or the log.

Note that conflict is inherited by causality. For example, if e # e and e