Multi-phase process mining
Process Mining: An iterative algorithm using the
Theory of Regions
Kristian Bisgaard Lassen
Boudewijn van Dongen
Wil van der Aalst
http://www.processmining.org/
Overview
1. Introduction to Theory of Regions
2. Introduction to Process Mining
3. Applying Theory of Regions to Process Mining
4. Conclusion
Theory of Regions (for Transition Systems)
A region in a transition system is a set of states such that, for every transition in
the system, the following holds:
1) If the transition enters the region, then all equally labeled transitions enter the
region,
2) If the transition exits the region, then all equally labeled transitions exit the
region,
3) If the transition does not cross the region, then no equally labeled transition
crosses the region.
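The three conditions can be checked mechanically. The following is a minimal sketch, assuming a transition system is given as a list of (source, label, target) triples; the function names and the small diamond-shaped example are illustrative and not taken from the slides.

```python
def crossing(transition, region):
    """Classify one transition relative to a candidate set of states."""
    source, _label, target = transition
    if source not in region and target in region:
        return "enter"
    if source in region and target not in region:
        return "exit"
    return "none"


def is_region(region, transitions):
    """True if all equally labeled transitions cross the set in the same way."""
    kind_per_label = {}
    for transition in transitions:
        kind = crossing(transition, region)
        label = transition[1]
        # The first transition seen for a label fixes the expected behaviour.
        if kind_per_label.setdefault(label, kind) != kind:
            return False
    return True


# Hypothetical example: a and b can occur in either order.
diamond = [
    ("s0", "a", "s1"),
    ("s1", "b", "s2"),
    ("s0", "b", "s3"),
    ("s3", "a", "s2"),
]

print(is_region({"s0", "s3"}, diamond))  # every a exits, no b crosses
print(is_region({"s1"}, diamond))        # one a enters, the other does not cross
```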
Theory of Regions (for Transition Systems)
When all regions are found, a Petri net is built in which these regions correspond to
the places of the net.
The state space of the resulting Petri net is bisimilar to the transition system
that served as input.
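A rough sketch of this construction follows, under several assumptions: regions are enumerated by brute force (exponential in the number of states, so only usable on toy inputs), only minimal non-trivial regions are turned into places (a common simplification in region theory), and a label that enters a region produces a token in the corresponding place while a label that exits it consumes one. All names and the example system are hypothetical.

```python
from itertools import combinations


def crossing_kinds(region, transitions):
    """Map each label to the set of crossing kinds shown by its transitions."""
    kinds = {}
    for source, label, target in transitions:
        if source not in region and target in region:
            kind = "enter"
        elif source in region and target not in region:
            kind = "exit"
        else:
            kind = "none"
        kinds.setdefault(label, set()).add(kind)
    return kinds


def nontrivial_regions(states, transitions):
    """Brute-force search over all proper, non-empty subsets of states."""
    found = []
    for size in range(1, len(states)):
        for subset in combinations(sorted(states), size):
            region = frozenset(subset)
            kinds = crossing_kinds(region, transitions)
            if all(len(k) == 1 for k in kinds.values()):
                found.append(region)
    return found


def synthesize_places(states, transitions, initial):
    """One place per minimal region, with producing and consuming labels."""
    regions = nontrivial_regions(states, transitions)
    minimal = [r for r in regions if not any(other < r for other in regions)]
    places = []
    for region in minimal:
        kinds = crossing_kinds(region, transitions)
        places.append({
            "region": region,
            "producers": sorted(l for l, k in kinds.items() if k == {"enter"}),
            "consumers": sorted(l for l, k in kinds.items() if k == {"exit"}),
            "marked": initial in region,  # token in the initial marking
        })
    return places


# Hypothetical example: a and b can occur in either order.
states = {"s0", "s1", "s2", "s3"}
diamond = [("s0", "a", "s1"), ("s1", "b", "s2"),
           ("s0", "b", "s3"), ("s3", "a", "s2")]

for place in synthesize_places(states, diamond, "s0"):
    print(sorted(place["region"]), place["producers"],
          place["consumers"], place["marked"])
```

For this input the sketch yields four places: one input place and one output place for each of a and b, which is the Petri net of two concurrent transitions.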
Process Mining: an overview
Log Files
Information systems typically log all kinds of events. We use an XML format for
storing event logs. The basic assumption is that the log contains information about
specific tasks executed for specific process instances (cases, event lists, audit
trails). No knowledge of the underlying process is assumed.
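To make the assumption concrete, the sketch below reads a tiny XML log into one list of task names per case. The element and attribute names (log, case, event, id, task) are invented for illustration; the slides do not show the actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical log layout; the real XML format is not shown on the slides.
SAMPLE_LOG = """
<log>
  <case id="case1"><event task="A"/><event task="B"/><event task="C"/><event task="D"/></case>
  <case id="case2"><event task="A"/><event task="C"/><event task="B"/><event task="D"/></case>
  <case id="case5"><event task="A"/><event task="E"/><event task="D"/></case>
</log>
"""


def read_traces(xml_text):
    """Return {case id: [task, task, ...]} in the order the events are logged."""
    root = ET.fromstring(xml_text.strip())
    return {
        case.get("id"): [event.get("task") for event in case.findall("event")]
        for case in root.findall("case")
    }


traces = read_traces(SAMPLE_LOG)
print(traces["case5"])
```

Only the case identifier and the task name are used, which matches the assumption that nothing else is known about the underlying process.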
Process Mining vs. Theory of Regions
Process Mining
- Event logs: big chunks of data, unable to fit in memory.
- Completeness unknown: completeness of information is very unlikely.
- Abstract representation required
Theory of Regions
- State-based models / (regular) languages: the entire model needs to be present in memory.
- Complete information provided: completeness of information is guaranteed by the input model.
- Exact and compact representation required
Main conceptual difference: abstract representation (Process Mining) versus exact and compact representation (Theory of Regions).
Some existing Process Mining approaches
[Figure: flow diagram of existing process mining approaches. Artifacts: event logs, ordering relations, instance graphs, aggregation graphs, Petri nets, EPCs. Steps connecting them: abstraction, partial order generation, aggregation, translation, alpha-algorithm.]
The goal: Applying Theory of Regions in the context of PM
[Figure: the flow diagram of existing process mining approaches (event logs, ordering relations, instance graphs, aggregation graphs, Petri nets, EPCs), extended with a Theory of Regions step.]
Assume an event log is a transition system, such that each trace starts in a global state.
Example Log
Log:
case1: A,B,C,D
case2: A,C,B,D
case3: A,B,C,D
case4: A,C,B,D
case5: A,E,D
Transition systems (one per trace, each with its own initial state (W,-1)):
(W,-1) -A-> (case1,0) -B-> (case1,1) -C-> (case1,2) -D-> (case1,3)
(W,-1) -A-> (case2,0) -C-> (case2,1) -B-> (case2,2) -D-> (case2,3)
(W,-1) -A-> (case3,0) -B-> (case3,1) -C-> (case3,2) -D-> (case3,3)
(W,-1) -A-> (case4,0) -C-> (case4,1) -B-> (case4,2) -D-> (case4,3)
(W,-1) -A-> (case5,0) -E-> (case5,1) -D-> (case5,2)
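The translation from a trace to its transition system can be sketched as follows, assuming the state naming of the example: an initial state (W,-1) followed by one state (case, i) after the i-th event. The function name and the triple representation are illustrative choices.

```python
def trace_to_ts(case_id, trace, initial=("W", -1)):
    """Turn one trace into a chain of (source, label, target) transitions."""
    transitions = []
    state = initial
    for index, task in enumerate(trace):
        next_state = (case_id, index)  # state reached after the index-th event
        transitions.append((state, task, next_state))
        state = next_state
    return transitions


LOG = {
    "case1": ["A", "B", "C", "D"],
    "case2": ["A", "C", "B", "D"],
    "case3": ["A", "B", "C", "D"],
    "case4": ["A", "C", "B", "D"],
    "case5": ["A", "E", "D"],
}

systems = {case: trace_to_ts(case, trace) for case, trace in LOG.items()}
for transition in systems["case5"]:
    print(transition)
```

Each trace is handled on its own, so building a transition system needs only that single trace in memory.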
Merging the initial state
The five initial states are merged into one shared state (W,-1), from which one branch per case starts:
(W,-1) -A-> (case1,0) -B-> (case1,1) -C-> (case1,2) -D-> (case1,3)
(W,-1) -A-> (case2,0) -C-> (case2,1) -B-> (case2,2) -D-> (case2,3)
(W,-1) -A-> (case3,0) -B-> (case3,1) -C-> (case3,2) -D-> (case3,3)
(W,-1) -A-> (case4,0) -C-> (case4,1) -B-> (case4,2) -D-> (case4,3)
(W,-1) -A-> (case5,0) -E-> (case5,1) -D-> (case5,2)
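Merging amounts to renaming every per-trace initial state to one shared state and taking the union of the transitions. A minimal sketch, assuming each system is given as a pair (initial state, transitions); the placeholder names init1 and init5 are hypothetical and only serve to show the renaming.

```python
def merge_initial_states(systems, merged_initial=("W", -1)):
    """Union of all transitions, with every initial state renamed to one state."""
    merged = []
    for initial, transitions in systems:
        for source, label, target in transitions:
            source = merged_initial if source == initial else source
            target = merged_initial if target == initial else target
            merged.append((source, label, target))
    return merged


# Two of the example traces, each with its own (hypothetically named) initial state.
case1 = ("init1", [("init1", "A", ("case1", 0)),
                   (("case1", 0), "B", ("case1", 1)),
                   (("case1", 1), "C", ("case1", 2)),
                   (("case1", 2), "D", ("case1", 3))])
case5 = ("init5", [("init5", "A", ("case5", 0)),
                   (("case5", 0), "E", ("case5", 1)),
                   (("case5", 1), "D", ("case5", 2))])

merged = merge_initial_states([case1, case5])
merged_states = {s for source, _, target in merged for s in (source, target)}
print(len(merged), "transitions,", len(merged_states), "states")
```

All other states stay distinct per case, so the merged system is a tree that branches only at (W,-1).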
Identifying regions
[Figure: the merged transition system of the example log with its regions marked, shown alongside a second diagram whose nodes are labeled A, B, C, D and E (presumably the resulting Petri net).]
Making the algorithm iterative (i.e. linear in the log)
Trace 1 -> TS 1 -> Regions 1
Trace 2 -> TS 2 -> Regions 2
...
Trace n -> TS n -> Regions n
Regions 1 and Regions 2 are combined into Regions 1,2; this continues until Regions 1,2,…,n is obtained, from which the Petri net is built.
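The slides do not spell out how the region sets of two traces are combined. The sketch below shows one possible reading, not necessarily the authors' procedure. It assumes that traces share only the initial state, and that a region is described by a signature: whether the initial state is inside, plus the behaviour (enter, exit, none) of every label. Under that assumption two signatures can be combined exactly when they agree on the initial state and on all shared labels. All names are illustrative.

```python
from itertools import product

KINDS = ("enter", "exit", "none")


def trace_regions(trace):
    """All region signatures of the chain-shaped transition system of one trace."""
    labels = sorted(set(trace))
    regions = set()
    for initial_inside in (False, True):
        for kinds in product(KINDS, repeat=len(labels)):
            behaviour = dict(zip(labels, kinds))
            inside, consistent = initial_inside, True
            for label in trace:  # walk the chain and track membership
                kind = behaviour[label]
                if kind == "enter" and not inside:
                    inside = True
                elif kind == "exit" and inside:
                    inside = False
                elif kind != "none":
                    consistent = False  # e.g. "enter" while already inside
                    break
            if consistent:
                regions.add((initial_inside, frozenset(behaviour.items())))
    return regions


def combine(left, right):
    """Pair up signatures that agree on the initial state and on shared labels."""
    merged = set()
    for init_l, items_l in left:
        for init_r, items_r in right:
            if init_l != init_r:
                continue
            beh_l, beh_r = dict(items_l), dict(items_r)
            if all(beh_r.get(label, kind) == kind for label, kind in beh_l.items()):
                merged.add((init_l, frozenset({**beh_l, **beh_r}.items())))
    return merged


def regions_of_log(log):
    """Single pass over the log; only one trace is expanded at a time."""
    combined = None
    for trace in log:
        current = trace_regions(trace)
        combined = current if combined is None else combine(combined, current)
    return combined


log = [list("ABCD"), list("ACBD"), list("ABCD"), list("ACBD"), list("AED")]
regions = regions_of_log(log)
place_before_b = (False, frozenset(
    {"A": "enter", "B": "exit", "C": "none", "D": "none", "E": "exit"}.items()))
print(place_before_b in regions)
```

The signature checked at the end describes a place that is filled by A and emptied by B or E. The result also contains the two trivial regions (no states, all states). The pass over the log is sequential, but the pairwise combination step in this sketch is quadratic in the number of signatures, so it illustrates the idea and makes no claim about performance.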
Future work, other approaches
Several other approaches are possible:
1) Constructing a transition system for the whole log in a smart way:
Rubin et al. propose 36 ways of doing so, but they require the entire
transition system to be built in memory. Their approach can, however, handle
“incomplete” information.
2) Considering the event log as a regular language and using language-based
regions as proposed by Darondeau et al. and Lorenz et al.
Conclusions
Using our approach, the Theory of Regions can be applied in the context of process
mining in such a way that the approach is linear in the number of cases in the log.
Two downsides remain: the completeness assumption, and the fact that the resulting
model is not an abstraction of the log, which is often required in process mining.