Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491
ICT, STREP
FERARI ICT-FP7-619491
Flexible Event pRocessing for big dAta aRchItectures
Collaborative Project
D4.2
Goal driven model and methodology for specification of event processing applications
01.02.2015 – 31.01.2016 (preparation period)
Contractual Date of Delivery: 31.01.2016
Actual Date of Delivery: 31.01.2016
Author(s): Fabiana Fournier and Inna Skarbovsky
Institution: IBM
Workpackage: Flexible Event Processing
Security: PU
Nature: R
Total number of pages: 42
D4.2 Goal driven model and methodology for specification of EP applications
Project coordinator name Michael Mock Revision: 1
Project coordinator organisation name Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
URL: http://www.iais.fraunhofer.de
Abstract
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way
for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling
business users to express complex analytics tasks through a high-level declarative language that
supports distributed complex event processing as an integral part of the system architecture.
In this report we provide a model and methodology to support this goal. The proposed approach
addresses both the functional and non-functional properties of event processing applications by
supporting non-technical users with a declarative language expressed by a set of diagrams and tables.
The resulting model can then be automatically translated into an event processing network and
eventually into a running application. Our methodology supports the model driven engineering
approach and encompasses the phases of constructing the computation independent model,
translating it into a platform independent model, and translating that into a platform specific model. In this report
we detail the construction of the computation independent model and exemplify it with the mobile
phone fraud use case of the project.
Revision History Administration Status
Project acronym: FERARI ID: ICT-FP7-619491
Document identifier: D4.2 Goal driven model and methodology for specification of event processing applications
(01.02.2015 – 31.01.2016)
Leading Partner: IBM
Report version: 1
Report preparation date: 31.01.2016
Classification: PU
Nature: REPORT
Author(s) and contributors: Fabiana Fournier and Inna Skarbovsky
Status: - Plan
- Draft
- Working
- Final
x Submitted
Copyright This report is © FERARI Consortium 2014. Its duplication is restricted to the personal use within the
consortium and the European Commission.
www.ferari-project.eu
Document History
Version   Date         Author                   Change Description
0.1       15/12/2015   Fabiana Fournier (IBM)   First draft
0.2       1/1/2016     Fabiana Fournier (IBM)   Second draft including section 9
0.3       15/1/2016    Fabiana Fournier (IBM)   First complete version
0.4       17/1/2016    Fabiana Fournier (IBM)   Inclusion of abstract
0.5       25/1/2016    Fabiana Fournier (IBM)   Updates per internal review
1.0       30/1/2016    Fabiana Fournier (IBM)   Final fixes and cleanup
Table of Contents
1 Introduction
  1.1 Purpose and scope of the document
  1.2 Relationship with other documents
2 Preliminaries
  2.1 Event Processing Network (EPN)
  2.2 Pattern Matching Process
  2.3 Pattern Policies
  2.4 Illustrative Example - The Mobile phone fraud use case
3 TEM in a nutshell
  3.1 TEM and Concept Computing
  3.2 TEM Building Blocks
  3.3 TEM Basic Terms
4 TEM Diagrams
5 TEM Logic concepts
  5.1 TEM Event Derivation Tables
    5.1.1 Event Derivation Tables Structure
    5.1.2 Event Derivation Tables Conditions
  5.2 TEM Computation Tables
    5.2.1 Computation Tables Structure
  5.3 TEM Policy Tables
    5.3.1 Policy Tables Structure
6 TEM Glossary concepts
  6.1 TEM Concepts Lexicon Table
  6.2 TEM Fact Types Table
  6.3 TEM Actors Table
  6.4 IT Elements Table
  6.5 TEM Events View Table
7 TEM Methodology
  7.1 Lifecycle Overview and Methodology
  7.2 Construct the Computational Independent Model
  7.3 Transform to the Platform Independent Model
  7.4 Generate the code and create the Platform Specific Model
  7.5 Operate the application and support modifications
8 Extending TEM to non-functional requirements
  8.1 Extending the TEM diagrams to cope with performance requirements
  8.2 Event Derivation Tables for non-functional requirements
  8.3 Extending the TEM methodology to include performance requirements
9 Summary and future steps
10 References
11 Appendix A – TEM Syntax
List of Tables
Table 1: Long call at night EDT
Table 2: Frequent long calls at night EDT
Table 3: Frequent long calls EDT
Table 4: Frequent each long call EDT
Table 5: Expensive calls EDT
Table 6: Example of filter on pattern conditions
Table 7: call_start_dates<Frequent long calls at night> computation table
Table 8: calls_count<Frequent long calls at night> computation table
Table 9: calls_length_sum<Frequent long calls> computation table
Table 10: calls_cost_sum<Expensive calls> computation table
Table 11: Frequent long calls at night policy table
Table 12: Frequent long calls policy table
Table 13: Frequent each call policy table
Table 14: Expensive calls policy table
Table 15: Lexicon table for the mobile phone fraud use case
Table 16: Fact type table for the mobile phone fraud use case
Table 17: Actors table for the mobile phone fraud use case
Table 18: IT elements table for the mobile phone fraud use case
Table 19: Events view for the mobile phone fraud use case
Table 20: New Expensive calls logic EDT
Table 21: CDR Throughput Violation EDT
Table 22: Expensive calls Latency Violation
List of Figures
Figure 1: Illustration of an event processing network
Figure 2: Event recognition process in an EPA
Figure 3. TEM diagram icons
Figure 4. TEM diagram for the mobile phone fraud use case
Figure 5. Structure of TEM logic concepts
Figure 6: Structure of TEM Glossary concepts
Figure 7: TEM Glossary concepts relationships
Figure 8: New TEM diagram for the mobile phone fraud use case
Figure 9: TEM diagram annotated with non-functional requirements
Acronyms
CEP Complex Event Processing
CIM Computation Independent Model
EPA Event Processing Agent
EPN Event Processing Network
FERARI Flexible Event pRocessing for big dAta aRchItectures
PIM Platform Independent Model
PROTON IBM PROactive Technology Online
PSM Platform Specific Model
TDM The Decision Model
TEM The Event Model
WP Work Package
1 Introduction
1.1 Purpose and scope of the document
The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way
for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling
business users to express complex analytics tasks through a high-level declarative language that
supports distributed complex event processing as an integral part of the system architecture. Work
package 4 (WP4) “Flexible Event Processing” deals with all the relevant tasks around event processing
technologies in order to achieve this goal. Specifically, Deliverable 4.2 (D4.2) “Goal driven model and
methodology for specification of event processing applications”, aims at providing a comprehensible
model along with a methodology for event processing applications adequate for business users.
This report presents The Event Model (TEM), a new way to model, develop, validate, maintain, and
implement event-driven applications. TEM is based on a set of well-defined principles and building
blocks, and does not require substantial programming skills, thus making it suitable for business users
and the project goals. A methodology is also described as part of the report.
Note that we use the terms complex event processing and event processing interchangeably
throughout this report, and likewise the terms tool, engine, and system.
This report is structured as follows: Section 2 briefly reviews the basic complex event processing terms required for
understanding this report. Section 3 gives an overview of The Event Model, and its
concepts are elaborated in Sections 4 to 6. In Section 7 we describe our model driven
methodology for creating event-driven applications. In Section 8 we extend The Event Model to cope with
non-functional requirements. We conclude the report with a summary and future steps in Section 9.
1.2 Relationship with other documents
FERARI stands for Flexible Event pRocessing for big dAta aRchItectures; accordingly, there is a tight
connection between the event processing components and the rest of the components that form the
FERARI architecture. Specifically, this deliverable is strongly related to D2.1 - Architecture definition in
WP2. The requirements for the event processing engine are dictated by the use cases in the project;
thus, this report is also strongly related to D1.1 - Application Scenario Description and Requirement
Analysis in WP1. In addition, WP4 interacts with WP5, which addresses algorithms for robust and flexible
stream processing, and this report is therefore related to D5.2 - Algorithms for Robust Distributed Stream Monitoring
and Supporting Data Integrity.
2 Preliminaries
Each complex event processing (CEP) engine uses its own terminology and semantics. We follow the
semantics presented in Etzion and Niblett’s book [4]. In our previous deliverable (D4.1 – Requirements
and state of the art overview on Flexible Event Processing) we covered the main constructs and terms in
complex event processing. For the sake of clarity, we briefly revisit two
main concepts below: the Event Processing Network (EPN) and the pattern matching process.
2.1 Event Processing Network (EPN)
An Event Processing Network (EPN) is a conceptual model describing the event processing flow
execution. An EPN comprises a collection of event processing agents (EPAs), event producers, events
and consumers (Figure 1). The network describes the flow of events originating at event producers and
flowing through various event processing agents to eventually reach event consumers. For example, in
Figure 1, events from Producer 1 are processed by EPA 1. Events derived by EPA 1 are of interest to
Consumer 1 but are also processed by EPA 3 together with events derived from EPA 2.
Figure 1: Illustration of an event processing network
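As an illustration only, the network described for Figure 1 can be sketched as a directed graph. The first four edges come from the description above; how Producer 2 and Consumer 2 are attached is an assumption, and the function names are hypothetical rather than part of any FERARI component.

```python
from collections import defaultdict

# Edges of the example EPN. Links marked "assumed" are not stated in the text.
EDGES = [
    ("Producer 1", "EPA 1"),
    ("EPA 1", "Consumer 1"),
    ("EPA 1", "EPA 3"),
    ("EPA 2", "EPA 3"),
    ("Producer 2", "EPA 2"),   # assumed
    ("EPA 3", "Consumer 2"),   # assumed
]


def build_epn(edges):
    """Build an adjacency list: node -> nodes that receive its events."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def reachable(graph, start):
    """Return every node that events originating at `start` can flow to."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


epn = build_epn(EDGES)
downstream_of_producer_1 = reachable(epn, "Producer 1")
```

Under these assumptions, events from Producer 1 can reach both consumers, while events from Producer 2 reach only Consumer 2.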
2.2 Pattern Matching Process
An EPA performs three logical steps, also known as the pattern matching process or event recognition (see Figure 2):
- The filtering step, in which relevant events from the input events are selected for processing
  according to the filter conditions. The output of this step is a set of participant events.
- The matching step, which takes all events that passed the filtering and looks for matches between
  these events, using an event processing pattern or some other kind of matching criterion. The
  output of this step is the matching set.
- The derivation step, which takes the output from the matching step and uses it to derive the
  output events by applying derivation formulae.
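The three steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the behaviour of any particular engine: the `EPA` class, its field names, and the temperature readings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]


@dataclass
class EPA:
    """Hypothetical event processing agent with one callable per step."""
    filter_cond: Callable[[Event], bool]
    matcher: Callable[[List[Event]], Optional[List[Event]]]
    derive: Callable[[List[Event]], Event]

    def process(self, incoming: List[Event]) -> Optional[Event]:
        # Filtering: keep only the relevant input events (participant events).
        participants = [e for e in incoming if self.filter_cond(e)]
        # Matching: look for a matching set among the participant events.
        matching_set = self.matcher(participants)
        if matching_set is None:
            return None
        # Derivation: build the derived/output event from the matching set.
        return self.derive(matching_set)


# Illustrative configuration: at least two readings above 30 degrees.
epa = EPA(
    filter_cond=lambda e: e["temperature"] > 30,
    matcher=lambda events: events if len(events) >= 2 else None,
    derive=lambda matched: {"type": "Overheat", "count": len(matched)},
)
readings = [{"temperature": t} for t in (25, 35, 40)]
result = epa.process(readings)
```

Keeping each step as a separate callable mirrors the separation of filter conditions, pattern, and derivation formulae described above.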
[Figure 2 elements: incoming/input events enter the Event Processing Agent and, within a context, pass through filtering (yielding the participant events), matching (yielding the matching set), and deriving, which emits the derived/output events.]
Figure 2: Event recognition process in an EPA
An event pattern is a template specifying one or more combinations of events. Given any collection of
events, if it is possible to find one or more subsets of those events that match a particular pattern,
each such subset is said to satisfy the pattern. Some common examples of patterns:
- Sequence: at least one instance of each participating event type must arrive in a
  specified order for the pattern to be matched.
- Count: the number of instances in the participant event set satisfies the pattern’s
  number assertion.
- All: at least one instance of each participating event type must arrive for the pattern
  to be matched; the arrival order in this case is immaterial.
- Trend: the events must satisfy a specific change (increasing or decreasing) over time of some
  observed value; this refers to the value of a specific attribute or attributes.
- Sum: the value of a specific attribute, summed up over all participant events,
  satisfies the sum threshold assertion.
- Average (AVG): the value of a specific attribute, averaged over all participant events,
  satisfies the average threshold assertion.
Note that the first two steps of the pattern matching process (filtering and matching) are optional, but a
derivation must take place (even if it merely copies values from the input events to the derived/output event).
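Some of the patterns listed above can be written as simple predicates over a list of participant events given in arrival order. The sketch below is one possible reading of those definitions; the function names, the `type` key, and the sample events are assumptions made for illustration.

```python
def matches_all(events, required_types):
    """All: every required type occurs at least once, in any order."""
    return set(required_types) <= {e["type"] for e in events}


def matches_sequence(events, ordered_types):
    """Sequence: the types occur in the given order (other events may interleave)."""
    remaining = iter(e["type"] for e in events)
    # `t in remaining` consumes the iterator up to the first occurrence of t,
    # so each later type must appear after the previous one.
    return all(t in remaining for t in ordered_types)


def matches_count(events, assertion):
    """Count: the number of participant events satisfies the assertion."""
    return assertion(len(events))


def matches_sum(events, attribute, assertion):
    """Sum: the attribute summed over all participant events satisfies the assertion."""
    return assertion(sum(e[attribute] for e in events))


events = [
    {"type": "A", "cost": 2},
    {"type": "C", "cost": 1},
    {"type": "B", "cost": 3},
]
in_order = matches_sequence(events, ["A", "B"])
out_of_order = matches_sequence(events, ["B", "A"])
```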
2.3 Pattern Policies
A pattern policy (or simply policy) is a named parameter that disambiguates the semantics of the pattern
and the pattern matching process. Pattern policies fine-tune the way the pattern detection process
works. We distinguish among four types of pattern policies:
Evaluation policy – when are the matching sets produced? The EPA can either generate output
incrementally (in this case the evaluation policy is called Immediate) or at the end of the temporal
context (called Deferred).
Cardinality policy – how many matching sets are produced within a single context partition? The cardinality
policy helps limit the number of matching sets generated, and thus the number of derived events
produced. The policy type can be single, meaning only one matching set is generated; or unrestricted,
meaning there are no restrictions on the number of matching sets generated.
Repeated/Instance Selection type policy – what happens if the matching step encounters multiple
events of the same type? The override repeated policy means that whenever a new event instance is
encountered and the participant set already contains the required number of instances of that type, the
new instance replaces the oldest previous instance of that type. The every repeated policy means that
every instance is kept, meaning all possible matching sets can be produced. First means that every
instance is kept, but only the earliest instance of each type is used for matching. Last is the same as first,
but the latest instance of each type is used for matching.
Consumption policy – what happens to a particular event after it has been included in the matching set?
Possible consumption policies are consume, meaning each event instance can be used in only one
matching set; and reuse, meaning an event instance can participate in an unrestricted number of
matching sets.
Policy relevance can be dictated by the event pattern. For example, the evaluation policy for an absence
pattern is always deferred (as we are testing whether an event instance exists within a specified temporal
context). Also, not all possible policy combinations are meaningful. For example, the choice of
consumption policy is irrelevant if the cardinality policy is single, because this means that the matching
step runs only once.
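The four policy types can be sketched as enumerations, together with a helper showing how the repeated/instance selection policy might narrow the instances of one event type offered to the matching step. All names here are hypothetical, and the handling of override as a sliding window over the latest instances is an interpretation of the description above.

```python
from enum import Enum


class Evaluation(Enum):
    IMMEDIATE = "immediate"
    DEFERRED = "deferred"


class Cardinality(Enum):
    SINGLE = "single"
    UNRESTRICTED = "unrestricted"


class Repeated(Enum):
    OVERRIDE = "override"
    EVERY = "every"
    FIRST = "first"
    LAST = "last"


class Consumption(Enum):
    CONSUME = "consume"
    REUSE = "reuse"


def candidates(instances, policy, required=1):
    """Instances of one event type (in arrival order) used for matching.

    `required` is the number of instances the pattern needs (assumed >= 1).
    """
    if policy is Repeated.FIRST:
        return instances[:1]
    if policy is Repeated.LAST:
        return instances[-1:]
    if policy is Repeated.OVERRIDE:
        # Each new instance replaces the oldest one once `required` are held.
        return instances[-required:]
    return list(instances)  # EVERY: all instances are kept


def consumption_is_relevant(cardinality):
    """With a single matching set, the consumption policy has no effect."""
    return cardinality is not Cardinality.SINGLE


arrivals = ["e1", "e2", "e3"]
kept = candidates(arrivals, Repeated.OVERRIDE, required=2)
```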
2.4 Illustrative Example - The Mobile phone fraud use case
We illustrate the model throughout this report using the mobile phone fraud use case previously
analyzed and implemented in the scope of D4.1. The goal is to identify users who use a network
service without the intention to pay for that use. Many fraud mining systems in telecommunications use
some form of rules, often defined by fraud experts or generated automatically by software, to raise alarms.
These alarms are checked by fraud investigators on a case-by-case basis. It is their duty to decide
whether a suspicious behavior is fraudulent or legal. This depends on the current call, the call history,
the customer history and the subscription plan of the customer.
Recall that in this specific scenario we are seeking to fire alerts in the following situations:
- A long call to premium distance is made during night hours (LongCallAtNight).
- As before, but this time we are looking for at least three of these “long distance calls” at night
  per calling number (FrequentLongCallsAtNight).
- Multiple long distance calls per calling number that last more than a certain threshold value
  (FrequentLongCalls).
- Same as before, but the cost of each occurrence exceeds the threshold (FrequentEachLongCall).
- High usage of a line for long distance calls (Expensivecall).
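To make the first two situations concrete, the sketch below checks them over a list of call detail records. The record fields, the duration threshold, and the definition of night hours are assumptions chosen for illustration; only the "at least three" count comes from the description above, and temporal contexts are ignored for brevity.

```python
from collections import Counter

LONG_CALL_MINUTES = 30          # assumed threshold for a "long" call
NIGHT_HOURS = set(range(0, 6))  # assumed night hours: 00:00 to 05:59


def is_long_call_at_night(cdr):
    """LongCallAtNight: a long premium-distance call started at night."""
    return (
        cdr["premium"]
        and cdr["duration_min"] > LONG_CALL_MINUTES
        and cdr["start_hour"] in NIGHT_HOURS
    )


def frequent_long_calls_at_night(cdrs, min_calls=3):
    """FrequentLongCallsAtNight: calling numbers with at least `min_calls` such calls."""
    counts = Counter(
        c["calling_number"] for c in cdrs if is_long_call_at_night(c)
    )
    return {number for number, n in counts.items() if n >= min_calls}


cdrs = [
    {"calling_number": "555-0100", "duration_min": 45, "start_hour": 2, "premium": True}
    for _ in range(3)
] + [
    {"calling_number": "555-0101", "duration_min": 50, "start_hour": 3, "premium": True},
    {"calling_number": "555-0100", "duration_min": 5, "start_hour": 14, "premium": False},
]
suspects = frequent_long_calls_at_night(cdrs)
```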
3 TEM in a nutshell
3.1 TEM and Concept Computing
As mentioned in the introduction, TEM makes it possible to model, develop, validate, maintain, and
implement event-driven applications. TEM is based on a set of well-defined principles and building
blocks, and does not require substantial programming skills, which makes it appropriate for
business users and FERARI’s mission. At the core of TEM is the event derivation logic, expressed
through a collection of related normalized tables from which code can be generated. This
idea has already been successfully proven in the domain of business rules by The Decision Model
(TDM) [7]. The Decision Model groups the rules into natural logical groups to create a structure that
makes the model relatively simple to understand, communicate, and manage.
The Event Model follows the Model Driven Engineering approach [1][2] and can be classified as a CIM
(Computation Independent Model), providing independence in the physical data representation, and
omitting details which are obvious to the designer. This model can be directly translated to an
execution model (PSM – Platform Specific Model in the Model Driven Architecture terminology) through
an intermediate generic representation (PIM – Platform Independent Model) as described in Section 7.
TEM also follows the paradigm of concept computing (see footnote 1), according to which all model artifacts are
concepts. A concept is a meaningful term within the user’s domain of discourse. The model consists of
concepts and semantic relationships among them. These concepts are based on the user’s cognitive
terms, and are independent of IT terms or any specific implementation. In line with model-driven
engineering, the vision is to have a concept-oriented model and transform it in a mostly automated
fashion into the execution model. While
the concept computing vision aims at simplification, the model still needs to be expressive enough to
allow this automatic transformation.
3.2 TEM Building Blocks
TEM is composed of the following building blocks, described in detail and illustrated in the
subsequent sections.
TEM Concepts: As stated before, TEM follows the concept computing paradigm according to which
anything is defined as a concept. A TEM concept can be either a glossary concept or a logic concept. A
glossary concept is a term in the specific domain which has a meaning. Some of the concepts denote
computational entities, and a logic concept is a description of how such a computational entity is
computed. A TEM model consists of a collection of concepts of various types and the relations among
them. Relations between two concepts are defined only once.
TEM Glossary: The knowledge model that stores all glossary concepts of a specific application.
TEM Diagrams: The set of diagrams that describes the event causality dependencies (and hence the
event flow) in the event-driven application.
TEM Logic: The knowledge model that describes all logic concepts of a specific application. The
knowledge model is represented as a collection of tables.
Footnote 1: http://www.slideshare.net/Mills/understanding-concept-computing
3.3 TEM Basic Terms
In this section we introduce some of the basic terms used in TEM.
Fact Type: A TEM glossary concept type that denotes a named type of a piece of data atomic to the
scope. This is analogous to the attribute of entity or event in most data models.
Fact: A specific instance of a fact type, contained in a specific entity or event.
Event: Something that happened or is thought to have happened in the real world. Examples are: a
temperature sensor is read or a piece of luggage is lost.
Event instance: The computerized entity that denotes a specific instance of an event type. Examples:
temperature of sensor 123 is 40c; luggage lost with tag ID z. The term “event” in common use is also
used as a synonym for “event instance”.
Event Type: A TEM glossary concept type that denotes a set of event instances sharing the same
meaning and structure (associated data). An event type is a container of fact types, and consequently an
event is a container of facts. Event type examples: temperature read; luggage lost.
Actor: A TEM glossary concept type that denotes anyone or anything that plays a role in an event
processing system.
Raw event: An event originating from an external actor. In this case the actor’s role is defined as event
producer.
Derived event: An event that is generated by applying a function to event instances over time. The
instances of a derived event are created by applying a logic concept.
Derivation: The specification of the logic applied to generate a derived event.
Situation: An event of interest that may require a course of action. A Situation is a derived event emitted
outside the event processing system and consumed by an actor of type consumer.
Context: A named specification of conditions to partition the event occurrences so these partitions can
be processed separately.
Partition by: Context partition criterion based on the values of one or more Fact Types contained in
event(s), also known as segmentation context.
When?: Context partition criterion based on the instance time of events, also known as temporal
context.
Conditions: Expressions executed against event instances.
Implementation independent: A model free of references to technical concerns; this means it can be
implemented in any technology that supports TEM principles. The Event Model (TEM) is an
implementation independent model, understandable by business and technical audiences, to depict the
logic of detecting and deriving situations of interest from a stream of open-ended event instances.
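The two context partition criteria defined above, Partition by (segmentation) and When? (temporal), can be combined into a single grouping key. The following is a minimal sketch assuming events are dictionaries with a `time` field and that the temporal context is a calendar day; both are illustrative assumptions, as is the function name.

```python
from collections import defaultdict
from datetime import datetime


def partition(events, fact_type):
    """Group event instances by (value of `fact_type`, calendar day).

    The first key component plays the role of "Partition by"; the second
    approximates a one-day "When?" temporal context.
    """
    partitions = defaultdict(list)
    for event in events:
        key = (event[fact_type], event["time"].date())
        partitions[key].append(event)
    return dict(partitions)


events = [
    {"calling_number": "A", "time": datetime(2016, 1, 30, 23, 50)},
    {"calling_number": "A", "time": datetime(2016, 1, 31, 0, 10)},
    {"calling_number": "B", "time": datetime(2016, 1, 31, 9, 0)},
    {"calling_number": "A", "time": datetime(2016, 1, 31, 22, 0)},
]
parts = partition(events, "calling_number")
```

Each resulting partition can then be processed separately, as the definition of Context requires.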
4 TEM Diagrams
One way to simplify a model is to apply a top-down methodology that provides a high level view and
understanding of the system at hand.
The Event Model diagram is a simple drawing that illustrates the structure of the logic by showing a
situation along with the flow direction of derivations in a top-down manner. At the top of the diagram
there is a goal which is the situation that is required to be derived. This goal is connected with the raw
and derived events that are identified as participants in the situation derivation. This is done in a
recursive way until raw events or facts are encountered as depicted in Figure 4 for our mobile phone
fraud use case example.
TEM diagram employ nine icons that express all the relevant terms (see Figure 3)
Figure 3. TEM diagram icons
For each situation in TEM, there is a corresponding TEM diagram.
Each node in the diagram, except of producers and consumers, is composed of blocks represented as
rectangle shapes and, separated by a black thick line. Each node has a 1:1 mapping to a corresponding
Event Derivation Table (EDT) artifact. EDTs are explained in Section 5.1. The rectangle in the background
of each block represents the context for the block. The contexts can be collapsed or expanded, as in the
case of the Frequent each long call derived event. Solid lines describe events transitions inside the
event-driven system. Dotted lines specify event flows to and from the event-driven system (see Figure 4).
Figure 4 depicts the TEM diagrams for our mobile phone fraud use case example. The situations to be
derived address potential cases of mobile phone fraud, which require alert notifications and human
intervention. As described in Section 2.4, we would like to emit five situations: LongCallAtNight,
FrequentLongCallsAtNight, FrequentLongCalls, FrequentEachLongCall, and Expensivecall. We have one
consumer of the situations (Operator, who gets the system alerts) and one producer, CDR System that
sends records corresponding to calls from mobile phones. The Context part of the Frequent each long
call derived event is expanded in the diagram to show a temporal context temporal window of one day.
[Figure 3 icons: Situation, Fact, Consumer, Producer, Partition by, When?, Raw event, Detected derived event, Derived event]
We partition the events according to the Calling_number ID domain fact type (for the definition and role
of domain fact types, refer to Section 6), since we are looking for fraud attempts per calling number.
Figure 4. TEM diagram for the mobile phone fraud use case
The diagrams serve as a major design tool that provides a top down view. All blocks that describe
situations or derived events require the definition of logic concepts as described next.
5 TEM Logic concepts
Logic concepts are descriptions of concepts that are computed by the application. The Event Model
Logic consists of three logic concept types (Figure 5) which are represented as tables.
Figure 5. Structure of TEM logic concepts
Event Derivation: A single logic artifact (represented as a table) for each derived event. Each derivation
table name is composed of the Derived event type + “Logic” as suffix. The derived event mentioned in
the name is associated with the table in the sense that the table specifies the conditions for generating
new instances of this event type.
Computation: A single logic artifact (represented as a table) for each computed fact type in a derived
event. It specifies the computation of assignments of the values of a fact type (attribute) associated with
a derived event. Each computation table name is composed of the Derived fact type + “Computation”
as suffix. The derived fact type mentioned in the name is associated with the table in the sense that the
table specifies the value assignment for this fact type. Note that if the value of a derived fact type can
be implicitly inferred, then the computation table for this derived fact type can be omitted (see
Section 5.2).
Policy: A single logic artifact (represented as a table) for each derived event. It specifies the fine-tuning
semantics of the derivations. Each policy table name is composed of the Derived event type + “Policy” as suffix.
The derived event type mentioned in the name is associated with the table in the sense that the table
specifies the policy assignments for this event type. Note that TEM uses default policy values for
derived events. Whenever the default policies hold, the corresponding policy tables can be omitted
(see Section 5.3).
Although the names of concepts in TEM can be determined freely by the system designer, we use some
naming conventions in the logic tables for the sake of clarity. For example, domain fact types (see
Section 6) as well as event types and actors start with a capital letter; fact types start with a lowercase
letter. Non-unique fact type names are denoted fact type<Event type> or fact type<Actor>. Fact types
of sub-type lists are denoted by {fact type}. Patterns that are operators are denoted OPERATOR(fact
type) or OPERATOR(Event type), see for example COUNT(CDR) or SUM(conversation_duration<CDR>) in
Table 3 below. Operators’ operands are denoted operand.Operator, e.g., countSum.SUM and
count.COUNT in the Computation tables shown in Section 5.2.
We also underline event types in condition columns that have an Event Derivation Table of their own
(hyperlinks), to stress the fact that these events are themselves derived from another piece of logic, and
to make it possible to follow paths of inference by clicking these links (see for example Long call at night
in Table 2).
As mentioned above, TEM models all artifacts as concepts within the user’s domain of discourse.
Furthermore, the model uses keywords and syntax close to the “business world”, and independent of
any specific event processing language. For example, to define a temporal context that spans a week, we
just use the weekly keyword; every N time units expresses overlapping sliding time windows; and
MIN(fact type) selects the minimal value of the values in the matching set. For a complete dictionary of
TEM terms refer to Appendix A.
The model consists of concepts and semantic relationships among them. These concepts are based on
the user’s cognitive terms, and are independent of the IT terms or specific implementation.
5.1 TEM Event Derivation Tables
An Event Derivation Table (EDT) is a two-dimensional representation of logic leading to a derived event,
based on events and facts. Thus, an EDT designates the circumstances under which a derived event of
interest is reached. In our mobile phone fraud scenario there are five EDTs shown in Tables 1-5.
Table 1: Long call at night EDT
Long call at night Logic
| Row # | When Expression | When Start | When End | Partition by | Filter on event: other_party_tel_number <CDR> | Filter on event: call_direction <CDR> | Filter on event: call_start_date <CDR> | Filter on event: conversation_duration <CDR> | Pattern | Filter on pattern |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | always | | | | member of premium services | = 1 | is between 19:00, 7:00 | > 40 | | |
Table 2: Frequent long calls at night EDT
Frequent long calls at night Logic
| Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event | Pattern: COUNT(Long call at night) | Filter on pattern |
|---|---|---|---|---|---|---|---|
| 1 | for every day | | | same | | > 2 | |
Table 3: Frequent long calls EDT
Frequent long calls Logic
| Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number <CDR> | Filter on event: call_direction <CDR> | Pattern: SUM(conversation_duration<CDR>) | Pattern: COUNT(CDR) | Filter on pattern |
|---|---|---|---|---|---|---|---|---|---|
| 1 | for every day | | | same | member of premium services | = 1 | > 60 | > 9 | |
Table 4: Frequent each long call EDT
Frequent each long call Logic
| Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number <CDR> | Filter on event: call_direction <CDR> | Pattern: COUNT(CDR) | Filter on pattern |
|---|---|---|---|---|---|---|---|---|
| 1 | for every day | | | same | member of premium services | = 1 | > 9 | |
Table 5: Expensive calls EDT
Expensive calls Logic
| Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number <CDR> | Filter on event: call_direction <CDR> | Pattern: SUM(total_call_charge_amount<CDR>) | Filter on pattern |
|---|---|---|---|---|---|---|---|---|
| 1 | every 6 hours | first CDR | | same | member of premium services | = 1 | > 100 | |
Note the following:
- Due to privacy issues, the data has been anonymized in a way that preserves the business logic.
- In addition, due to privacy issues, the values chosen for specific variables and thresholds
are not the real ones. This does not alter the logic of the rules, only the values assigned
to the different variables and thresholds.
- “Premium services” is a closed list of potential geographically distant locations/destinations for
which the rules are relevant (e.g., “Maldives”).
- In this use case, night hours are considered to be between 19:00 and 7:00, and 24 hours are
considered to run from 24:00 to 24:00 of the following day.
- We are only interested in outgoing calls (incoming calls are not relevant to fraud detection),
indicated whenever the call_direction field equals 1.
5.1.1 Event Derivation Tables Structure
The first row in an EDT indicates its name. The EDT name is the derived event name + “Logic”, for
example Long call at night Logic in Table 1. The table consists of two parts, context and conditions,
separated by a red vertical line. The context part consists of two logical sections: the temporal context,
represented by the When expression, When start, and When end columns, and the segmentation context,
represented by the Partition by column.
For example, Table 2 describes a non-overlapping sliding fixed interval temporal context [4] of 24 hours
length (a day) and a segmentation context that partitions the events by Calling number domain (refer to
Section 6 for definition of domain fact types and to Appendix A for TEM syntax and keywords).
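As a minimal sketch of how such a context could be materialised, the Python snippet below groups events into context partitions keyed by calendar day and calling number. The dictionary layout of the events, the sample data, and the helper names `context_key` and `partition_events` are illustrative assumptions and not part of TEM.

```python
from collections import defaultdict
from datetime import datetime


def context_key(event):
    """Return the (day, calling number) context partition of one event.

    Assumption: each event is a dict with a 'call_start_date' datetime and a
    'calling_number' string, and a daily window runs from midnight to midnight.
    """
    return (event["call_start_date"].date(), event["calling_number"])


def partition_events(events):
    """Group events by non-overlapping daily window and by calling number."""
    partitions = defaultdict(list)
    for event in events:
        partitions[context_key(event)].append(event)
    return dict(partitions)


# Hypothetical sample events (not taken from the report's data set).
events = [
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 1, 20, 0)},
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 1, 23, 30)},
    {"calling_number": "B", "call_start_date": datetime(2015, 6, 1, 21, 0)},
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 2, 1, 0)},
]
partitions = partition_events(events)
```

Each key of `partitions` corresponds to one context partition in which the conditions of the EDT row are evaluated independently.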
An event derivation consists of a collection of conditions in disjunctive normal form: all
conditions in a single row are related by conjunction, while the relationship among
multiple rows is a disjunction. In other words, in any EDT, the relationship between any two rows is a
disjunction, whereas the conditions within a single row, both the context conditions (When,
Partition by) and the derivation conditions (filter on event, pattern, filter on pattern), have conjunction
semantics.
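The disjunctive normal form semantics can be sketched in a few lines of Python. This is only an illustration: conditions are modelled as plain predicates over a dictionary of fact values, and the two-row table below is hypothetical (the EDTs of the use case each have a single row).

```python
def row_holds(row, facts):
    """A row holds only if every one of its conditions holds (conjunction)."""
    return all(condition(facts) for condition in row)


def edt_derives(rows, facts):
    """The EDT derives its event if at least one row holds (disjunction)."""
    return any(row_holds(row, facts) for row in rows)


# Hypothetical two-row table: row 1 has two conditions, row 2 has one.
rows = [
    [
        lambda f: f["call_direction"] == 1,
        lambda f: f["conversation_duration"] > 40,
    ],
    [
        lambda f: f["total_call_charge_amount"] > 100,
    ],
]
```

With this structure, adding a row can only widen the set of circumstances that lead to a derivation, while adding a condition to a row can only narrow it.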
In addition, the relationship between the set of derived events in a TEM model and the set of EDTs in
the same model is a bijection (one-to-one mapping both ways); thus the cardinality of the EDT set is
identical to the cardinality of the derived event set. This principle, in essence, forces the designer to
unify the different cases that lead to the derivation of a single event into a single table. Note that in
current event processing models there is no restriction on the number of logic artifacts,
since an event can be derived in an unbounded number of ways.
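The one-to-one rule, together with the “Logic” naming convention, lends itself to a simple consistency check. The sketch below is an assumption about how a modelling tool might verify it; the function name `check_edt_bijection` and the shape of its result are hypothetical.

```python
from collections import Counter


def check_edt_bijection(derived_events, edt_names):
    """Compare derived event names with EDT names of the form '<event> Logic'.

    Returns the EDT names that are missing, unexpected, or duplicated; the
    mapping is a bijection only if all three sets are empty.
    """
    expected = {f"{event} Logic" for event in derived_events}
    counts = Counter(edt_names)
    return {
        "missing": expected - set(counts),
        "unexpected": set(counts) - expected,
        "duplicates": {name for name, n in counts.items() if n > 1},
    }


report = check_edt_bijection(
    ["Long call at night", "Expensive calls"],
    ["Long call at night Logic", "Expensive calls Logic"],
)
```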
5.1.2 Event Derivation Tables Conditions
The conditions part consists of three types of conditions: filter conditions, pattern conditions, and filter
on pattern conditions. Each condition type is composed of a set of conditions, where each condition is
composed of a header (fact types, event types, or functions/operators/patterns) along with two
columns (predicate and object). For TEM syntax refer to Appendix A.
As previously stated, the conditions are interrelated as a disjunctive normal form; conjunction within
the same row and disjunction among rows. The column head can be: event type, fact type, or an
operator (function) over event types and fact types. Each condition cell in an EDT is divided into two: the
predicate and the object. Possible keywords for EDT conditions are specified in Appendix A.
The conditions are logically applied in the following order:
Filter conditions are expressions evaluated against the content of a single event instance. The role of
filter conditions is to determine whether an event instance satisfies the filtering condition and should
participate in the derivation. For example, the Filter on event columns in Tables 2-5 describe one
condition on the fact type other_party_tel_number and one condition on the fact type call_direction; both
fact types belong to the CDR input event. The value of other_party_tel_number must be any value in
the premium services set and call_direction must equal 1, i.e., an outgoing call. Filter conditions relate
to the Filtering step in Figure 2.
Pattern conditions are expressions on the participant events (input events that passed the filtering
conditions) such as Detected, Absent, Thresholds over Aggregations, or Fact Type value changes. The
role of pattern conditions is to detect the specified relationships among participant event instances. For
example, in Table 2, the Pattern condition describes a COUNT pattern over more than two occurrences of
event type Long call at night, which means that we emit a derived event whenever we have more than two
instances of the input event that pass the filter condition within the specified context (for each day
and for each calling number). Pattern conditions relate to the Matching step in Figure 2.
Filter on pattern conditions are expressions on the matching set (i.e., the events that satisfied the
pattern conditions), including comparisons, memberships, and time-relationships among the event
instances in the matching set. The role of the filter on pattern conditions is to filter the pattern result
based on conditions among the different events that make up this pattern. Filter on pattern conditions
relate to the Derivation step in Figure 2.
In order to illustrate the Filter on pattern condition, let’s assume the following in our mobile phone
fraud use case: we want to fire an alert for “Expensive calls” only if we have at least one instance of the
value “Maldives” in the other_party_tel_number fact type in the matching set. Table 6 specifies the
corresponding EDT in this case for Expensive calls.
Table 6: Example of filter on pattern conditions
Expensive calls Logic
| Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number <CDR> | Filter on event: call_direction <CDR> | Pattern: SUM(total_call_charge_amount<CDR>) | Filter on pattern: other_party_tel_number <CDR> |
|---|---|---|---|---|---|---|---|---|
| 1 | every 6 hours | first CDR | | same | member of premium services | = 1 | > 100 | contains Maldives |
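To make the order of evaluation concrete, the following Python sketch applies the three condition types of the Expensive calls example with the Maldives filter to the events of a single context partition. It is a simplified illustration under stated assumptions: the contents of `PREMIUM_SERVICES`, the dictionary layout of the CDR events, and all function names are hypothetical, and the temporal context and partitioning are assumed to have been applied beforehand.

```python
# Hypothetical closed list; the report only names "Maldives" as an example.
PREMIUM_SERVICES = {"Maldives", "Seychelles"}


def filter_on_event(cdr):
    """Filtering step: keep outgoing calls to premium services."""
    return (cdr["other_party_tel_number"] in PREMIUM_SERVICES
            and cdr["call_direction"] == 1)


def pattern(participants):
    """Matching step: SUM(total_call_charge_amount) > 100.

    Returns the matching set, or None if the threshold is not exceeded.
    """
    total = sum(c["total_call_charge_amount"] for c in participants)
    return participants if total > 100 else None


def filter_on_pattern(matching_set):
    """Filter on pattern: the matching set must contain a call to Maldives."""
    return any(c["other_party_tel_number"] == "Maldives" for c in matching_set)


def derive_expensive_calls(cdrs):
    """Apply the three condition types in order to one context partition."""
    participants = [c for c in cdrs if filter_on_event(c)]
    matching_set = pattern(participants)
    if matching_set is None or not filter_on_pattern(matching_set):
        return None
    return {
        "event_type": "Expensive calls",
        "calls_cost_sum": sum(c["total_call_charge_amount"] for c in matching_set),
    }
```

Note that an incoming call is discarded by the first step and therefore never contributes to the sum, regardless of its cost.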
5.2 TEM Computation Tables
A derived event, like any event, is a container that contains facts (attributes), which are instances of the
fact types contained in the derived event’s event type. Part of the derivation is the assignment of values
to these facts. Some of the computed facts are merely copies of values from the input events to the
derived event. In TEM, we only specify the computation details for fact type values which are not
copied, while we omit those that are copied (these are implicitly assigned by the TEM compiler).
A Computation Table is a two-dimensional representation of logic leading to a computed fact type that
needs to be explicitly specified. Tables 7-10 specify the computation tables for our use case.
For example, Frequent long calls at night derived event has four fact types (see Events view Table 19):
calling_number, call_direction, call_start_dates, and calls_count. The first two fact values are copied
from the input event CDR, as all Long call at night input events have the value “1” in the call_direction
fact and they are partitioned by calling_number, therefore the matching set shares the same value in
this fact type (see the corresponding EDTs in Table 1 and Table 2). Accordingly we only compute two
fact types for this derived event as specified in Table 7 and Table 8. The call_start_dates fact type is the
list of all call_start_date fact types’ values of the CDR input events that result in the matching set (Table
7). We denote this by {fact type<Event type>}. The calls_count fact type holds the result of the COUNT
pattern matching, denoted by count.COUNT, namely the number of events in the matching set.
In the same way we compute the call_start_dates values for the other three derived events containing
the fact type call_start_dates (Frequent long calls, Frequent each long call, and Expensive calls), and
the calls_count values (Frequent long calls and Frequent each long call); we therefore omit these
computation tables from the report, as they are identical to Table 7 and Table 8 respectively.
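The assignments described above for the Frequent long calls at night derived event can be sketched as follows. This is an illustration only: the function name, the event dictionaries, and the sample values are assumptions, and the copying of shared values is written out explicitly here even though TEM leaves it implicit.

```python
def compute_frequent_long_calls_at_night(matching_set):
    """Assign the four fact values of the derived event from a matching set.

    Assumption: the matching set is a non-empty list of dicts that share the
    same calling_number (partition) and call_direction (filter) values.
    """
    first = matching_set[0]
    return {
        # Copied facts: identical across the matching set, no table needed.
        "calling_number": first["calling_number"],
        "call_direction": first["call_direction"],
        # {call_start_date<CDR>}: list of values from the matching set.
        "call_start_dates": [e["call_start_date"] for e in matching_set],
        # count.COUNT: number of events in the matching set.
        "calls_count": len(matching_set),
    }


# Hypothetical matching set with three Long call at night events.
matching_set = [
    {"calling_number": "A", "call_direction": 1, "call_start_date": "2015-06-01T20:05"},
    {"calling_number": "A", "call_direction": 1, "call_start_date": "2015-06-01T22:40"},
    {"calling_number": "A", "call_direction": 1, "call_start_date": "2015-06-02T01:15"},
]
derived = compute_frequent_long_calls_at_night(matching_set)
```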
Table 7: call_start_dates<Frequent long calls at night> computation table
call_start_dates<Frequent long calls at night> Computation
| Row # | Expression | Row in Event derivation Table |
|---|---|---|
| 1 | {call_start_date<CDR>} | 1 |
Table 8: calls_count<Frequent long calls at night> computation table
calls_count<Frequent long calls at night> Computation
| Row # | Expression | Row in Event derivation Table |
|---|---|---|
| 1 | count.COUNT | 1 |
Similarly, Table 9 specifies the calculation of the fact type calls_length_sum value, which equals the
countSum variable value of the SUM function (the actual summation of the conversation_duration of
the calls in the matching set; see the corresponding EDT in Table 3). The calls_cost_sum fact type value
in Table 10 also receives the value of the SUM function, meaning it holds the total cost of the calls in
the matching set (see the EDT in Table 5).
Table 9: calls_length_sum<Frequent long calls> computation table
calls_length_sum<Frequent long calls> Computation
| Row # | Expression | Row in Event derivation Table |
|---|---|---|
| 1 | countSum.SUM | 1 |
Table 10: calls_cost_sum<Expensive calls> computation table
calls_cost_sum<Expensive calls> Computation
| Row # | Expression | Row in Event derivation Table |
|---|---|---|
| 1 | countSum.SUM | 1 |
5.2.1 Computation Tables Structure
The first row in a computation table indicates the fact type name + “Computation”. For example, Table 7
is a computation table that describes the logic to compute the call_start_dates fact type associated with
the Frequent long calls at night derived event.
The second row is the headings row. The third and subsequent rows include the row number, the expression
value of the computed fact type, and a reference to the row number in the corresponding EDT.
Following the call_start_dates example, in all derivations the value of this fact type equals the set
{call_start_date}, that is, the values of the call_start_date fact types of the events in the matching set. In
all these cases the relevant derived event is in the first row in the EDT, hence the value “1” in the
“Row in event derivation table” column.
5.3 TEM Policy Tables
As explained before, policies are used to fine-tune the semantics of derivations (see Section 2.3). In TEM,
we define default policies per pattern and only specify the policy details (tables) for those policies
whose values differ from the defaults implicitly assigned by the TEM compiler during the transformation of
the computation independent model to the platform independent model (see Section 7). TEM implicitly
supports the following default policy values (for patterns in TEM refer to Appendix A):
The evaluation policy default: The evaluation of the following patterns is immediate, that is, done when
the derivation conditions are satisfied: is DETECTED (including the AND and OR implicit patterns),
OCCURS BEFORE, OCCURS AFTER, and OCCURS AT THE SAME TIME. The evaluation for the rest of the
patterns is deferred.
The cardinality policy default: The single policy. That is, for each context partition there is at most one
derivation when the temporal context is restricted (not always), and an unrestricted number of derivations
when the temporal context is always. Note that according to this default policy the policy table for the
Long call at night derived event can be omitted as the temporal window is always.
The repeated type policy default: The override policy. If there are multiple events of the same event
type, the newest one overrides the previous one.
The consumption policy default: The consume policy. As the cardinality policy default is single, the
consumption policy is relevant only to the case of the always temporal context, which may have multiple
derivations. In this case, an event in the matching set is consumed and cannot be used again.
A Policy Table is a two-dimensional representation of logic leading to a policy that needs to be explicitly
specified. Tables 11-14 show the policy tables for our use case, as in these situations the policy
values differ from the default ones implicitly assigned. Note that we explicitly assign values only to the
policies which differ from the default values.
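The interplay between implicit defaults and explicit policy rows can be sketched as a simple merge. The code below is an assumption about one possible representation, not the TEM compiler's implementation: pattern names are plain strings, empty cells are modelled as `None`, and the function names are hypothetical.

```python
# Patterns whose default evaluation is immediate, as listed in the text.
IMMEDIATE_PATTERNS = {
    "DETECTED", "OCCURS BEFORE", "OCCURS AFTER", "OCCURS AT THE SAME TIME",
}


def default_policies(pattern):
    """Return the default policy values for a given pattern name."""
    return {
        "evaluation": "immediate" if pattern in IMMEDIATE_PATTERNS else "deferred",
        "cardinality": "single",
        "repeated": "override",
        "consumption": "consume",
    }


def effective_policies(pattern, policy_row=None):
    """Overlay the non-empty cells of an explicit policy row on the defaults."""
    policies = default_policies(pattern)
    for name, value in (policy_row or {}).items():
        if value is not None:
            policies[name] = value
    return policies


# Explicit values the report gives for Frequent long calls at night.
frequent_long_calls_at_night = effective_policies(
    "COUNT",
    {"evaluation": "immediate", "cardinality": "unrestricted",
     "repeated": None, "consumption": "reuse"},
)
```

Here only the repeated type policy falls back to its default (override), since its cell in the policy row is empty.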
Table 11: Frequent long calls at night policy table
Frequent long calls at night Policy
| Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table |
|---|---|---|---|---|---|
| 1 | immediate | unrestricted | | reuse | 1 |
Table 12: Frequent long calls policy table
Frequent long calls Policy
| Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table |
|---|---|---|---|---|---|
| 1 | immediate | | | | 1 |
Table 13: Frequent each long call policy table
Frequent each long call Policy
| Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table |
|---|---|---|---|---|---|
| 1 | immediate | | | | 1 |
Table 14: Expensive calls policy table
Expensive calls Policy
| Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table |
|---|---|---|---|---|---|
| 1 | immediate | | | | 1 |
For example, for Frequent long calls at night, we derive a new event every time (evaluation policy =
immediate) the pattern assertion is satisfied (i.e., COUNT(Long call at night) > 2), while each event in the
matching set is used multiple times (consumption policy = reuse). In simple words, as soon as the
number of Long call at night input events is larger than 2, we start deriving new situations until the
temporal context is closed, and each time the matching set includes the new Long call at night
input event. For the other situations, we derive only one derived event (cardinality policy = single, as in
the default), but as soon as the pattern conditions are satisfied (evaluation policy = immediate).
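The difference between the two behaviours described above can be illustrated with a small simulation of one context partition under immediate evaluation. The function below is a simplified sketch under assumptions of our own: it only tracks the size of the matching set at each derivation, and its name and parameters are hypothetical.

```python
def derivation_points(n_events, threshold=2, cardinality="single"):
    """Return the matching-set sizes at which a derived event is emitted.

    Assumes immediate evaluation of a COUNT > threshold pattern and, for the
    unrestricted case, reuse of the events already in the matching set.
    """
    emitted = []
    for count in range(1, n_events + 1):
        if count > threshold:
            if cardinality == "single" and emitted:
                break  # at most one derivation per context partition
            emitted.append(count)
    return emitted


unrestricted = derivation_points(5, cardinality="unrestricted")
single = derivation_points(5, cardinality="single")
```

With five Long call at night events in the same day for the same calling number, the unrestricted policy yields derivations at the third, fourth, and fifth event, whereas the single policy yields only the first of these.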
5.3.1 Policy Tables Structure
The first row in a policy table indicates the event type name + “Policy”. For example, Table 14 is a policy
table that describes the logic to assign the pattern policies for the Expensive calls derived event.
The second row is the headings row. The third row and on, include the row number, and the policy
assignments for the row number in the corresponding EDT as specified in the last column. In the
Expensive calls EDT there is only one row, therefore the row in the EDT in the policy table shows “1”.
Note that from a table normalization point of view, we could add the policy table columns to the EDT as a
third part in the table (separated by a red vertical line as in the case for context and conditions). The
reason for having a separate logic artifact for the policies is twofold. First, we believe that this simplifies
the tables’ structure: assuming that the default policy values hold in some of the cases, there is no need
to have columns that often remain empty. Second, policy fine-tuning is sometimes done after
the main logic of the application is defined by the business user, therefore we can leave the design of
the policy tables for a later phase.
While the logic artifacts may be defined first, the glossary concepts eventually need to be completed.
The next section discusses the glossary concepts.
6 TEM Glossary concepts
As noted, TEM concepts are partitioned into TEM glossary concepts and TEM logic concepts. In this
section we show the hierarchy of glossary concepts and provide the Glossary tables for our mobile
phone fraud example.
As depicted in Figure 6, there are four concepts in The Event Model Glossary (Glossary for simplicity):
Event, Fact, Actor, and IT element. Each concept has a type. For example, an Event can be either of type
Raw or type Derived. A concept may be further classified into a sub type. For example, a Concrete fact
type can be of sub type Regular. Domain Fact Types serve as abstract fact types to enable segmentation
contexts in the EDTs (e.g., Calling number). A concrete fact type may be mapped to a domain.
Figure 6: Structure of TEM Glossary concepts
[Figure 6 hierarchy: Glossary Concept divides into Fact Type (Domain; Concrete: Regular, List, Composite, Constant), Actor Type (Static), Event Type (Raw; Derived), and IT Element Type (Software: Module, API, App; Hardware: Sensor, Actuator; Data: File, Database, Event, Global Variable)]
The Glossary is composed of the following artifacts, represented as tables and detailed in the subsequent
subsections:
- Concepts Lexicon Table: one entry for each concept (both glossary concepts and logic concepts).
- Fact Types Table: one entry for each concrete Fact Type.
- Actors Table: one entry for each combination of Actor-Role-Event. An actor is identified as a noun in a
natural language sentence. It can denote a person or a computerized artifact in the modelled domain.
- IT Elements Table: one entry for each of the IT Elements. An IT element represents the connection to the
physical world of implementation and provides the pointer to the actual IT element it represents.
Figure 7 shows the relationships among the Glossary concepts.
Figure 7: TEM Glossary concepts relationships
[Figure 7 entities: Fact, Event, Actor, IT Element, Role, linked by 1 and N cardinalities]
6.1 TEM Concepts Lexicon Table
Each entry in the Concepts Lexicon table describes a concept along with its type, sub type, description,
and a reference to its IT element. Table 15 describes the concepts in our use case. For example, CDR is a
raw event, whereas Long call at night is a derived event. Calling number is a domain fact type. The CDR
event refers to the CDR IT element, which is of sub-type Data. More details on this IT element, such as
its URI, can be obtained from the IT Elements table (Table 18) under the corresponding element
name.
Table 15: Lexicon table for the mobile phone fraud use case
| Concept name | Concept Type | Concept Sub type | Description | IT Element Reference |
|---|---|---|---|---|
| CDR | Event | Raw | Call detail records | CDR IT element |
| Long call at night | Event | Derived | Check for “long” calls (defined as more than 40 min) to premium locations during night hours (limited from 19:00 to 7:00) | Long call at night IT element |
| Frequent long calls at night | Event | Derived | Same as Long call at night, but we are seeking at least 3 calls made to premium locations during night hours lasting longer than “40 minutes” per calling number | Frequent long calls at night IT element |
| Frequent long calls | Event | Derived | A situation resulting from at least 10 calls made to a premium location summing up to at least 60 min length in a day | Frequent long calls IT element |
| Frequent each long call | Event | Derived | A situation resulting from at least 10 long (lasting at least 60 min each) calls made to a premium location in a day | Frequent each long call IT element |
| Expensive calls | Event | Derived | A situation in which calls dialed to premium locations sum up to more than a pre-defined cost (e.g. 100 HRK) per calling number | Expensive calls IT element |
| object_id | Fact | Concrete | Sequential number as maintained by internal system | |
| billed_msisdn | Fact | Concrete | The msisdn to be charged. MSISDN is a number uniquely identifying a subscription in a GSM or a UMTS mobile network. Simply put, it is the telephone number to the SIM card in a mobile/cellular. | |
| call_start_date | Fact | Concrete | Date and time for the call | |
| calling_number | Fact | Concrete | The msisdn calling | |
| called_number | Fact | Concrete | The msisdn that is called | |
| other_party_tel_number | Fact | Concrete | Telephone number called | |
| call_direction | Fact | Concrete | "1" for Outgoing and "0" for Inbound | |
| tap_related | Fact | Concrete | Roaming data | |
| conversation_duration | Fact | Concrete | Length of call | |
| total_call_charge_amount | Fact | Concrete | Charge amount for the call | |
| call_start_dates | Fact | Concrete | The call start dates of the events in the matching set | |
| calls_count | Fact | Concrete | Number of events in the matching set | |
| calls_cost_sum | Fact | Concrete | Total cost of calls of events in the matching set | |
| calls_length_sum | Fact | Concrete | Total duration of calls in the matching set | |
| Calling number | Fact | Domain | Universal name for calling number | |
| CDR system | Actor | Static | System that produces the CDR raw events | CDR system IT element |
| Operator | Actor | Static | Manual inspection of potential fraud mobile numbers | Operator IT element |
| CDR IT element | IT Element | Data | Reference to the producer of the CDR raw events | |
| Long call at night IT element | IT Element | Data | Reference to the consumer of the alert | |
| Frequent long calls at night IT element | IT Element | Data | Reference to the consumer of the alert | |
| Frequent long calls IT element | IT Element | Data | Reference to the consumer of the alert | |
| Frequent each long call IT element | IT Element | Data | Reference to the consumer of the alert | |
| Expensive calls IT element | IT Element | Data | Reference to the consumer of the alert | |
| CDR system IT element | IT Element | Data | Reference to the consumer of the alert | |
| Operator IT element | IT Element | Software | Reference to event driven application | |
6.2 TEM Fact Types Table
Each entry in the Fact Type table describes a given concrete fact type. As shown in Figure 6, a fact type
can be either a domain or a concrete fact type. Domain designates an equivalence class of all concrete
fact types that reference it. For example, Calling number is a domain fact type and there are concrete
fact types that refer to Calling number and are associated with various events or actors. A concrete fact
type can be either contained in an event or contained in an actor. A concrete fact type is further sub
typed into a regular (atomic), list (multiple homogeneous instances), composite (multiple heterogeneous
instances, contains a collection of lower level fact types), or constant (a literal that is a substitute for a
constant value). Table 16 describes the fact type table in our scenario. Fact type calling_number is a
regular fact type that is contained in CDR and it also refers to the Calling number domain fact type.
Table 16: Fact type table for the mobile phone fraud use case
| Fact type name | Fact type Sub type | Contained in event type/actor | Data type | Domain fact type | Default value | Units |
|---|---|---|---|---|---|---|
| object_id | Regular | CDR | String | | | |
| billed_msisdn | Regular | CDR | String | | | |
| call_start_date | Regular | CDR, Long call at night | Date | | | |
| calling_number | Regular | CDR, Long call at night, Frequent long calls at night, Frequent long calls, Frequent each long call, Expensive calls | String | Calling number | | |
| called_number | Regular | CDR, Long call at night | String | | | |
| other_party_tel_number | Regular | CDR, Long call at night | String | | | |
| call_direction | Regular | CDR, Long call at night, Frequent long calls at night | String | | | |
| tap_related | Regular | CDR | String | | | |
| conversation_duration | Regular | CDR, Long call at night | Integer | | | minutes |
| total_call_charge_amount | Regular | CDR | Double | | | HRK (Croatian Kuna) |
| call_start_dates | List | Frequent long calls at night, Frequent long calls, Frequent each long call, Expensive calls | Date | | | |
| calls_count | Regular | Frequent long calls at night, Frequent long calls, Frequent each long call | Integer | | | |
| calls_cost_sum | Regular | Expensive calls | Double | | | HRK (Croatian Kuna) |
| calls_length_sum | Regular | Frequent long calls | Long | | | |
6.3 TEM Actors Table
An Actor is described through its roles in the model. Actor roles can be one of these types: producer, an
actor that emits events; consumer, an actor that consumes situations; actuator, similar to consumer, but
with associated actions; event subject, an actor that the event is about; event descriptor, an actor
that one of the fact types associated with the event is about; data provider, an actor that provides fact
values; and data receiver, an actor that receives derived facts. An actor may have multiple roles; each
role may have multiple events in the same role. Table 17 shows the actors of the mobile phone fraud
scenario with their roles and respective events. For example, the Operator actor consumes five events:
Long call at night, Frequent long calls at night, Frequent long calls, Frequent each long call, and
Expensive calls; therefore, it has five distinct entries in the table.
Actor Name | Role | Event Type
CDR System | Producer | CDR
Operator | Consumer | Long call at night
Operator | Consumer | Frequent long calls at night
Operator | Consumer | Frequent long calls
Operator | Consumer | Frequent each long call
Operator | Consumer | Expensive calls
Table 17: Actors table for the mobile phone fraud use case
6.4 IT Elements Table
An IT element represents the connection to the physical world of implementation and provides the
pointer to the actual IT element it represents. Each IT element is referred to by another concept, such
as an actor or an event in the concept lexicon table. The IT elements table defines the sub type of the
element and the physical URI from which its value is obtained. IT element sub types are depicted in
Figure 6. Table 18 contains the IT elements of the mobile phone fraud scenario. It has entries for IT
elements of sub types Event, File, and App, together with their corresponding URIs.
IT Element Name | IT Element Sub type | URI
CDR IT element | Event | CDR file URI
Long call at night IT element | Event | PROTON's dashboard URI
Frequent long calls at night IT element | Event | PROTON's dashboard URI
Frequent long calls IT element | Event | PROTON's dashboard URI
Frequent each long call IT element | Event | PROTON's dashboard URI
Expensive calls IT element | Event | PROTON's dashboard URI
CDR system IT element | File | CDR file URI
Operator IT element | App | PROTON's dashboard URI
Table 18: IT elements table for the mobile phone fraud use case
6.5 TEM Events View Table
An event schema is an example of a useful view that can be obtained from the TEM Glossary tables. It is
inferred from the references to event types in the Fact types table. The events view includes the event
name and its associated fact types. Table 19 includes a view of the mobile phone fraud scenario’s raw
and derived events.
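As a minimal sketch of how the events view can be inferred, the snippet below inverts an excerpt of the Fact types table (fact type to events) into an event-to-fact-types mapping. The dictionary layout and the function name `events_view` are assumptions made for illustration; they are not part of TEM or of any TEM tool.

```python
from collections import defaultdict

# Excerpt of the Fact types table: fact type -> events that refer to it.
FACT_TYPES = {
    "called_number": ["CDR", "Long call at night"],
    "conversation_duration": ["CDR", "Long call at night"],
    "tap_related": ["CDR"],
    "calls_cost_sum": ["Expensive calls"],
    "calls_length_sum": ["Frequent long calls"],
}


def events_view(fact_types):
    """Invert the fact types table into event name -> list of fact types."""
    view = defaultdict(list)
    for fact_type, events in fact_types.items():
        for event in events:
            view[event].append(fact_type)
    return dict(view)


VIEW = events_view(FACT_TYPES)
```

Because the view is derived rather than maintained by hand, it stays consistent with the Fact types table whenever a fact type is added or reassigned.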
Event name | Associated fact types
CDR | object_id, billed_msisdn, call_start_date, calling_number, called_number, other_party_tel_number, call_direction, tap_related, conversation_duration, total_call_charge_amount
Long call at night | calling_number, conversation_duration, other_party_tel_number, called_number, call_direction
Frequent long calls at night | calling_number, call_direction, call_start_dates, calls_count
Frequent long calls | calling_number, call_start_dates, calls_length_sum, calls_count
Frequent each long call | calling_number, call_start_dates, calls_count
Expensive calls | calling_number, call_start_dates, calls_cost_sum
Table 19: Events view for the mobile phone fraud use case
7 TEM Methodology
A TEM model describes an application or a collection of related applications, which are event-driven by
nature. As previously described, the model is represented as a collection of concepts, where the
concepts’ definitions also contain relations to other concepts. The methodology is goal-driven
(situations as goals), so at design time the model may be incomplete in the sense that a concept may be
referred to before being fully defined; eventually, however, a model must be complete to be valid. A TEM
model is functionally complete in the sense that the content of the model is sufficient to generate a
functional execution of the model.
This section discusses the lifecycle of an event-driven application developed with TEM. The lifecycle
starts with the construction of the computation independent model (CIM) and continues with its
transformation into the platform independent model (PIM) and finally into the platform specific model
(PSM). We also discuss the process of modifying an existing model.
7.1 Lifecycle Overview and Methodology
The top-down TEM methodology makes the design goal-oriented and makes the fetching of events and
data a requirement. This goal-oriented approach to design has been used in other areas such as
database design [6] and agent systems design [5]. A lifecycle that supports this type of design
consists of the following phases:
1. Construct the computation independent model (CIM).
2. Transform the CIM to the platform independent model (PIM).
3. Generate the code and create the platform specific model (PSM).
4. Operate the application and support modifications.
Next we discuss each of these phases.
7.2 Construct the Computation Independent Model
The previous sections provided an example of how to construct a computation independent model.
The methodology roughly follows the Zachman top-down framework [8] and consists of the following
phases:
1. Identify the goals in terms of situations that need to be derived from the application and identify a
consumer for each situation (the “WHAT” phase).
2. For each such situation, construct a diagram that drills down to what is needed to be known or
detected in order to derive this situation (the high level “HOW” phase).
3. For each node in the diagram, construct a corresponding EDT and optionally computation and policy
tables that specify the logic for the node. This step is done bottom-up starting from the leaves of the
diagram and finishing with the situations to be detected.
4. For each event or fact type that is referred to in the logic artifacts, locate its origin or create a
requirement to fetch or instrument it. If this is not feasible, refine the requirements.
5. Complete the glossary.
6. Validate the model against the TEM principles.
7. Test the model against test cases (business people can perform phases 6 and 7 if the software
supports them).
Phases 1-3 can be performed by business analysts or business specialists who possess moderate or no
programming skills. Phase 4 can either be done entirely by business analysts or specialists, if the
organization has accessible meta-data repositories and all the required items are available, or
it can serve as a starting point for the handshake between business and IT, if there is a requirement
to disambiguate terms or to create instrumentation to detect events. Phase 5 completes the process.
At the end of Phase 5, the CIM is complete and ready for validation.
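To make the notion of completeness concrete, the following sketch lists the concepts that the logic artifacts refer to but the glossary does not yet define. The function name and the flat set representation of the glossary are hypothetical simplifications, not part of the TEM tooling.

```python
def undefined_concepts(referenced, glossary):
    """Return concepts used in the logic artifacts that the glossary lacks."""
    return sorted(set(referenced) - set(glossary))


# Hypothetical state of a model during Phase 4: one fact type is still missing.
glossary = {"CDR", "Expensive calls", "calling_number", "total_call_charge_amount"}
referenced = ["CDR", "calling_number", "calls_cost_sum", "Expensive calls"]

missing = undefined_concepts(referenced, glossary)
```

An empty result would indicate that every referenced concept has a glossary entry, which is a precondition for the validation in Phase 6.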
Phase 6 deals with the validation of a CIM model. We are currently studying CSP techniques (see for
example [3]) as a possible approach to performing this validation.
A constraint satisfaction problem (CSP), $P = \langle V, D, C \rangle$, involves a set of variables,
$V = \{v_1, \ldots, v_n\}$, which take discrete values from their corresponding finite domains
$D = \{D_{v_1}, \ldots, D_{v_n}\}$, and a set of constraints $C = \{C_1, \ldots, C_m\}$. All sets are
finite. A constraint is an entity that restricts the values of the variables it involves. A solution to
the CSP is an assignment of a single value to each variable such that the value of each
variable belongs to its corresponding domain and all the constraints are satisfied. One particular
application of CSP is verifying that a given solution satisfies all the constraints. The CSP engine
will let us know whether the model is valid or, if not, which constraints are violated.
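The verification use of CSP can be illustrated with a small sketch that checks a given assignment against the domains and the constraints and reports what is violated. The two constraints shown (one EDT per derived event, a consumer for each situation) are taken from statements elsewhere in this report, but encoding them as CSP constraints in this way is an assumption; this is not the validation engine under study.

```python
def violated_constraints(assignment, domains, constraints):
    """Return the names of domain and constraint checks that the assignment fails."""
    violations = [f"domain:{var}" for var, value in assignment.items()
                  if value not in domains[var]]
    violations += [name for name, check in constraints.items()
                   if not check(assignment)]
    return violations


# Illustrative variables for one derived event of a model.
domains = {"edt_count": range(0, 5), "has_consumer": {True, False}}
constraints = {
    "one_edt_per_derived_event": lambda a: a["edt_count"] == 1,
    "situation_has_consumer": lambda a: a["has_consumer"],
}

valid = violated_constraints({"edt_count": 1, "has_consumer": True},
                             domains, constraints)
invalid = violated_constraints({"edt_count": 2, "has_consumer": True},
                               domains, constraints)
```

An empty list means the model (assignment) is valid; otherwise the list names the violated constraints, which is the feedback needed to refine the earlier phases.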
The validation phase may trigger iterative refinement of the previous phases. At the end of this phase a
valid CIM model is ready for the transformation.
Following our illustrative mobile phone fraud use case, the following steps were carried out:
1. Goals identification: the five situations for mobile phone fraud were identified, namely Long call at
night, Frequent long calls at night, Frequent long calls, Frequent each long call, and Expensive calls.
2. Construct the TEM diagrams: the corresponding TEM diagram (Figure 4) was articulated top-down,
starting from the goals stated in the previous step, all the way to the raw events and the event
processing application consumers.
3. Specify the EDTs and computation tables: the corresponding EDTs (Table 1 to Table 5) were
articulated along with their Computation tables (Table 7 to Table 10).
4. Build the Glossary: the corresponding Lexicon table (Table 15), Fact types table (Table 16), Actors
table (Table 17), and IT Elements table (Table 18) were specified.
7.3 Transform to the Platform Independent Model
The platform independent model (PIM) is a generic representation of an event processing application.
The CIM might omit some details that can be implicitly inferred or specified by IT people at a later phase.
Examples of omitted details are the assignment of fact types associated with derived events whose values
are copied, and the physical realization of data elements and the way they are fetched (part of the
original event, enrichment of events, or query of data stores). The implementation details are beyond the
scope of this report.
We adopted the approach of transforming the CIM to a PIM rather than transforming it directly to a
PSM, since the aim of TEM is to be generic and fit multiple implementations. For the PIM, we
use the model described in [4], which is based on the notions of event processing network (EPN) and
event processing agents. It is a comprehensive model that can be mapped to many specific event processing
languages. In the scope of Task 4.2 “Generation of an annotated event processing network”, to be
carried out during year 3 of the project, we will generate an EPN out of the TEM model as described in
this document.
7.4 Generate the code and create the Platform Specific Model
This phase is a mapping between the PIM and the PSM. Assuming that all missing details are obtained at
the PIM level, this is a mere functional transformation. The transformation can be done either with an
existing event processing language (like PROTON’s, see footnote 2), or using a compiler to generate code
in a specific language, e.g. Java. As in the previous step, the code generation will be part of our work
during the third year of the project.
Note that the transformation is based on the functional specification. In some cases, further
refinements and optimizations are required based on non-functional requirements. The modeling of
non-functional requirements is described in Section 8.
7.5 Operate the application and support modifications
One of the main advantages of applying TEM lies in the way changes to existing models are handled. Changes
are made at the CIM level and automatically propagated to lower levels according to our methodology.
This simplifies the process since:
• The new logic is validated along with the entire application, thus avoiding potential
inconsistencies that can result from including new logic.
• Since there exists only one EDT per derived event, the modification is done in one place only,
that is, in the EDT and corresponding Computation table it affects and in the relevant entries
in the Glossary tables.
To illustrate this, let’s assume that Expensive calls are derived either as before, i.e., when calls
dialed to premium locations sum up to more than a pre-defined cost (e.g. 100 HRK) per calling number
every six hours, or also when calls sum up to more than a pre-defined cost (e.g. 100 HRK) every two
hours and each call costs more than 40 HRK. Note that in the second case the temporal sliding
window is two hours long and the derivation includes calls to any location, not just to premium
ones as in the first case.
The TEM diagram will have a new entry for the Expensive calls derived event as shown in Figure 8.
2 IBM PROactive Technology Online (PROTON) is the open source CEP engine applied in the FERARI project. The
engine has been detailed in D4.1. The source code and accompanying documentation can be found at: https://github.com/ishkin/Proton/
[Figure: TEM diagram with the producer CDR system, the raw event CDR, the derived events Long call at night, Frequent long calls at night, Frequent long calls, Frequent each long call, and Expensive calls, the consumer Operator, and the label "two hours"]
Figure 8: New TEM diagram for the mobile phone fraud use case
As any derived event in TEM is specified in exactly one EDT, we add a new row to the Expensive calls EDT
to specify this new requirement. The new EDT is shown in Table 20. The second row expresses the new
conditions for the derivation of instances of Expensive calls.
Expensive calls Logic
Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number<CDR> | Filter on event: call_direction<CDR> | Pattern: SUM(total_call_charge_amount<CDR>) | Filter on pattern
1 | every 6 hours | first CDR | | same | member of premium services | = 1 | > 100 |
2 | every 2 hours | first CDR | | same | | = 1 | > 100 |
Table 20: New Expensive calls logic EDT
As no other artifacts need to be modified (we are not adding new fact types, actors, or IT elements),
no further changes to the model are required: the new model can be validated, and the new EPN can be
generated and converted to a running application automatically, without any programming
intervention.
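The two derivation alternatives for Expensive calls can be sketched as plain predicates over the CDRs of one calling number that fall into one temporal window. This is only an illustration of the logic described in the text: the premium number, the helper `cdr`, and the function names are hypothetical, windowing and partitioning are assumed to be done by the caller, and treating "each call must cost more than 40 HRK" as a filter on the individual events is one possible reading.

```python
# Hypothetical premium-service number; the real list is not given in the report.
PREMIUM_SERVICES = {"0601234567"}


def cdr(other_party, cost, direction=1):
    """Build a minimal CDR holding only the fact types used by this EDT."""
    return {"other_party_tel_number": other_party,
            "call_direction": direction,
            "total_call_charge_amount": cost}


def row1_fires(window_cdrs, threshold=100.0):
    """Row 1: premium-service calls in one six-hour window cost more than 100 HRK."""
    total = sum(c["total_call_charge_amount"] for c in window_cdrs
                if c["call_direction"] == 1
                and c["other_party_tel_number"] in PREMIUM_SERVICES)
    return total > threshold


def row2_fires(window_cdrs, threshold=100.0, min_call_cost=40.0):
    """Row 2: calls above 40 HRK each in one two-hour window cost more than 100 HRK."""
    total = sum(c["total_call_charge_amount"] for c in window_cdrs
                if c["call_direction"] == 1
                and c["total_call_charge_amount"] > min_call_cost)
    return total > threshold


def expensive_calls(six_hour_cdrs, two_hour_cdrs):
    """Either alternative is enough to derive an Expensive calls event."""
    return row1_fires(six_hour_cdrs) or row2_fires(two_hour_cdrs)


calls = [cdr("0601234567", 60.0), cdr("0911111111", 50.0)]
```

With these two calls the first alternative does not fire (only 60 HRK went to a premium number), while the second one does (two calls above 40 HRK summing to 110 HRK).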
8 Extending TEM to non-functional requirements
The design of event processing applications consists of the design of the functional properties as well as
the non-functional properties. Usually, the design of both kinds of requirements is implementation
specific and is done either with current dedicated event processing tools, by skilled IT developers
who are familiar with the event processing engine and with the particular ways to bypass the engine’s
limitations, or in a hand-coded fashion. In both cases, it is rather complex and the actual design is
not accessible to business users. Our goal is to extend TEM to also cope with non-functional
requirements and to make them accessible to non-technical people.
In this section we show how business users can annotate the TEM diagram with throughput and latency
requirements, thus converting them into an integral part of any TEM application. The idea is that the
non-functional requirements are treated as “functional requirements” and become part of the PIM and
PSM models. Each performance violation is interpreted as an alert to a consumer, that is, as a situation to
be consumed by an external actor. During the third year of the project we will extend the non-functional
requirements to include requirements related to scalability, such as number of nodes, latency between
nodes, and communication cost between sites, making TEM also suitable for distributed environments
such as FERARI’s prototypes.
8.1 Extending the TEM diagrams to cope with performance requirements
For latency requirements we add a (blue) rectangle callout to the relevant events in the
TEM diagram. A latency constraint bounds the time from the latest occurrence of the raw events that are
input in the path to the derived event until the derived event is detected. For example, in the mobile
phone use case we can require that the latency of the Expensive calls situation is at most 5 min, adding
a corresponding callout in the TEM diagram as depicted in Figure 9. In this case, we require that no more
than 5 minutes elapse from the last occurrence time of each CDR raw event until any Expensive calls
event is detected. In case of a violation, an alert is emitted to the Operator.
For throughput we add a (red) rounded callout to each event to which we want to attach a throughput
constraint. While latency is very important in any mobile phone fraud detection system, throughput is
determined by the rate of the incoming CDRs and is less relevant in our case. For the sake of illustrating
our approach, let’s assume that we have a throughput constraint of at least 100 CDR raw events per min,
as depicted in Figure 9.
[Figure: the TEM diagram of the mobile phone fraud use case (CDR system, CDR, Long call at night, Frequent long calls at night, Frequent long calls, Frequent each long call, Expensive calls, Operator) with the callouts "Latency <=5 min" on Expensive calls and "Throughput >=100 events/min" on CDR]
Figure 9: TEM diagram annotated with non-functional requirements
8.2 Event Derivation Tables for non-functional requirements
Throughput and latency requirements are automatically transformed to EDTs by the TEM compiler, so
there is no need to define them explicitly: the annotations in the TEM diagram, together with its
corresponding EDTs, include all the information required. Each throughput and latency constraint can be
implicitly specified in a single EDT. The name of the EDT is composed of <Event_name><_latency or
_throughput><Violation>. The following alerts will be generated for our example: Expensive calls
Latency Violation and CDR Throughput Violation.
For each throughput constraint a new EDT is created by applying the COUNT pattern with the opposite
condition on the constrained event, along with a temporal window equal to the time unit of the
throughput rate, as shown in Table 21. Since we require a throughput of at least 100 events per min,
the COUNT condition is < 100, as we want to alert only in cases of violation. As the rate is given per
minute, the temporal window is defined as “for every min”. Furthermore, as we only want to alert once
at the end of the temporal window, we apply the default policies of deferred and single (see Section 5.3)
and there is no need to specify a policy table for throughput EDTs.
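A minimal sketch of the throughput check described above: events are counted per non-overlapping one-minute window, and a window is reported when its count is below the required rate. The function names and the representation of event times as seconds are assumptions for illustration; this is not the code generated by the TEM compiler.

```python
from collections import Counter


def violation_edt_name(event_name, kind):
    """Compose the EDT name, e.g. 'CDR Throughput Violation'."""
    return f"{event_name} {kind.capitalize()} Violation"


def throughput_violations(event_times_s, start_s, end_s,
                          min_per_window=100, window_s=60):
    """Return the start times of the windows whose event count is below the minimum."""
    counts = Counter((t - start_s) // window_s
                     for t in event_times_s if start_s <= t < end_s)
    n_windows = (end_s - start_s) // window_s
    # Alert once per window, evaluated at the end of the window (deferred, single).
    return [start_s + i * window_s
            for i in range(n_windows) if counts[i] < min_per_window]


# 100 CDRs in the first minute, only 2 in the second minute.
times = [i % 60 for i in range(100)] + [60, 75]
alerts = throughput_violations(times, start_s=0, end_s=120)
```

Here only the second window (starting at second 60) violates the constraint, so a single CDR Throughput Violation alert would be raised for it.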
CDR Throughput Violation
Row # | When Expression | When Start | When End | Partition by | Filter on event | Pattern: COUNT(CDR) | Filter on pattern
1 | for every min | | | | | < 100 |
Table 21: CDR Throughput Violation EDT
The TEM diagram is traversed bottom-up, and for each latency constraint a new EDT is
created as follows:
• We add a new attribute to the derived event metadata, which we denote occurrence_time<latest raw
event> or OT_LRE (in our example, occurrence_time<CDR>). This value stores the occurrence time of the
raw event that is part of the derivation in the first node (EDT) in the path to the derived event with the
latency constraint, and is passed from node/derived event to node/derived event along the path.
• If the pattern condition is ABSENCE, then the OT_LRE equals the OT_LRE of the previous derived event
(node in the TEM diagram path). If this is the first node in the path (connected to a producer in the TEM
diagram), then OT_LRE = detection_time<derived event with the ABSENCE pattern condition>.
• If there is more than one input event to a node (EDT) in the path, then
for each row in the EDT the OT_LRE is the max{OT_LRE} among these input events.
• The OT_LRE of the derived event for that node (EDT) is the one that caused the derivation (as the rows
Table 22 shows the Expensive calls Latency Violation EDT. Note that since the latency constraint is
≤ 5 min, the filter on event condition is > 5, as we are again interested in reporting only violations.
Note also that the default policy value (unrestricted) applies; therefore there is no need to define the
corresponding policy table.
Expensive calls Latency Violation
Row # | When Expression | When Start | When End | Partition by | Filter on event: detection_time<Expensive calls> - OT_LRE | Pattern | Filter on pattern
1 | always | | | | > 5 | |
Table 22: Expensive calls Latency Violation
8.3 Extending the TEM methodology to include performance requirements
As we apply only pre-defined TEM building blocks, the modifications required to the methodology steps
(Section 7.2) are minimal. All we need is to add the sentence “Add latency and
throughput requirements to events in the diagram” to step 2: “For each such situation, construct a
diagram that drills down to what is needed to be known or detected in order to derive this situation (the
high level “HOW” phase)”.
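The latency rules of Section 8.2 can be sketched as two small functions: one propagates OT_LRE through a node of the TEM diagram and one applies the "> 5" filter of the latency violation EDT. The function names, the use of plain numbers (minutes) for times, and taking the maximum when a non-first ABSENCE node has several inputs are assumptions made for illustration.

```python
def ot_lre(input_ot_lres, pattern, detection_time, first_node):
    """OT_LRE of a derived event for one EDT row.

    input_ot_lres holds the occurrence times of the raw input events (first
    node) or the OT_LRE values of the input derived events (later nodes).
    """
    if pattern == "ABSENCE" and first_node:
        # No raw event took part in the derivation: use the detection time.
        return detection_time
    # Otherwise carry the latest value among the input events along the path.
    return max(input_ot_lres)


def latency_violation(detection_time, ot_lre_value, limit_min=5):
    """Filter of the latency violation EDT: detection_time - OT_LRE > limit."""
    return (detection_time - ot_lre_value) > limit_min


# Two CDRs occurred at minutes 10 and 12; Expensive calls is detected later.
latest = ot_lre([10, 12], "SUM", detection_time=13, first_node=True)
```

With these values a detection at minute 18 would raise an Expensive calls Latency Violation (6 minutes elapsed), while a detection at minute 17 would not.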
9 Summary and future steps
Our goal in FERARI is to bring event processing much closer to the business world by extending simple
stream processing to the much more powerful realm of complex event processing in a way that is both
consumable by business users and a seamless part of Big Data applications. Our approach in WP4 is to
provide a model for constructing event processing applications that uses a goal-driven declarative
approach to define the requirements for event processing applications and generates complete,
implementable designs out of these requirements.
This report presents The Event Model (TEM) as a means to design, develop, implement, and maintain
event-driven applications. The friendly, yet rigorous, representation of the event logic makes the
model simpler than existing models and accessible to people lacking IT skills. The vision is to
strive for automatic transformation in line with model-driven engineering; this is contrary to the
current state of the practice, in which the transformations between the three levels of models are mostly
done manually.
TEM is suitable for business users as it supports a top-down, goal-oriented approach that takes the TEM
diagrams as its starting point. Furthermore, TEM doesn’t use technical terms. Technical details can be left
for a later phase and be defined just before the application is translated into code. The spreadsheet-like
tables for specifying the application requirements have already been successfully proven in the domain
of business rules by The Decision Model (TDM) [7]. Our vision is to reach the same level of success in the
world of complex event processing by applying a similar approach.
Our methodology supports the model driven engineering approach and encompasses the phases of
constructing the CIM model and its translation into PIM and from it into a PSM model. In this report we
detailed the construction of the CIM model and exemplified it with the mobile phone use case we have
in the project.
During year three of the project we will concentrate on the next phase, that is, the translation of the CIM
into a PIM, in our case an EPN that can be easily consumed by the PROTON CEP engine.
In order to address the Big Data requirements of FERARI, we also plan to extend the model to include:
• Non-functional requirements suitable for distributed systems, such as number of nodes,
communication costs among sites, and latency between nodes. These requirements will serve the
CEP optimizer developed by the Technical University of Crete.
• Uncertainty in input events
10 References
[1]. Bodenstein C., Lohse F., and Zimmermann A. 2010. Executable specifications for model-based
development of automotive software. SMC 2010, 727-732.
[2]. Brambilla M., Cabot J., and Wimmer M. 2012. Model Driven Software Engineering in Practice.
Morgan & Claypool.
[3]. Dechter R. 2003. Constraint Processing. Elsevier.
[4]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.
[5]. Khallouf J. and Winikoff M. 2009. The goal-oriented design of agent systems: a refinement of
Prometheus and its evaluation. IJAOSE 3(1), 88-112.
[6]. Jiang L., Topaloglou T., Borgida A., and Mylopoulos J. 2007. Goal-Oriented Conceptual
Database Design. RE 2.
[7]. Von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.
[8]. Zachman J.A. 1999. A Framework for Information Systems Architecture. IBM Systems Journal
(IBMSJ) 38(2/3), 454-470.
11 Appendix A – TEM Syntax
The semantics of the temporal context is defined in the following table; the first column designates the
temporal context expression (keyword) while the second column describes (informally) the semantics.
Keyword | Details
Always | No temporal restriction on the context
Hourly, daily, weekly, monthly; For every N time units (seconds/minutes/hours…) | Expression for non-overlapping sliding time windows
For every N occurrences of event type | Expression for non-overlapping sliding event windows
Every N time units (seconds/minutes/hours…) | Expression for overlapping sliding time windows
Every N occurrences of event type | Expression for overlapping sliding event windows
Start | Start of event interval, this may consist of: time constant, single event type (possibly with condition), collection of event types, time stamp, (for) every N time units, (for) every N occurrences of event type
End | End of event interval, this may consist of: time constant, single event type (possibly with condition), collection of event types, time stamp, time offset (+ N time units)
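The difference between the non-overlapping ("For every N time units") and overlapping ("Every N time units") time windows can be illustrated as follows. The report does not state when an overlapping window opens; the sketch assumes that one window of length N opens at each event, and both function names are hypothetical.

```python
def tumbling_windows(timestamps, size):
    """'For every N time units': each timestamp falls into exactly one window."""
    windows = {}
    for t in timestamps:
        start = (t // size) * size
        windows.setdefault(start, []).append(t)
    return windows


def sliding_windows(timestamps, size):
    """'Every N time units' (assumed reading): a window opens at every event."""
    ordered = sorted(timestamps)
    return [[u for u in ordered if t <= u < t + size] for t in ordered]


non_overlapping = tumbling_windows([0, 30, 61, 119, 120], 60)
overlapping = sliding_windows([0, 30, 61], 60)
```

In the non-overlapping case every event is counted once, whereas in the overlapping case the event at time 30 takes part in two windows.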
As explained in Section 5.1.2, the conditions part in any EDT consists of three types of conditions: filter
conditions, pattern conditions, and filter on pattern conditions.
For the filter conditions part we describe below the possible semantics. Note that either symbols or
words can be used in a given cell, and both can be mixed within one table or one model. Sometimes values
can be replaced by expressions (e.g., a mathematical function) or keywords (e.g., today).
Column head | Predicate | Object | Meaning
fact type | is or = | value | A fact associated with an input event is equal to a given value
fact type | is not or ≠ | value | A fact associated with an input event is not equal to a given value
fact type | > or greater than | value | A fact associated with an input event is greater than a given value
fact type | < or less than | value | A fact associated with an input event is smaller than a given value
fact type | ≥ or greater or equal than | value | A fact associated with an input event is greater than or equal to a given value
fact type | ≤ or less or equal than | value | A fact associated with an input event is smaller than or equal to a given value
fact type of date data type | occurs earlier than, occurs later than, occurs no earlier than, occurs no later than | value | A fact associated with an input event occurs before, after, not before, or not after a given time stamp
fact type | is member of list | value | A fact associated with an input event is a member of a given list value
fact type | is not member of list | value | A fact associated with an input event is not a member of a given list value
Event type | | | An input event type equals the Event type. This is meaningful when the cardinality of the input events set is larger than 1
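As an illustration of how the filter predicates could be evaluated, the sketch below maps the symbols and keywords of the table onto Python comparison functions. The dictionary `PREDICATES`, the function `passes_filter`, and the representation of an event as a dictionary of fact types are assumptions; the date predicates are omitted.

```python
import operator

# Symbol and keyword forms of each predicate map to the same comparison.
PREDICATES = {
    "is": operator.eq, "=": operator.eq,
    "is not": operator.ne, "≠": operator.ne,
    ">": operator.gt, "greater than": operator.gt,
    "<": operator.lt, "less than": operator.lt,
    "≥": operator.ge, "greater or equal than": operator.ge,
    "≤": operator.le, "less or equal than": operator.le,
    "is member of list": lambda fact, values: fact in values,
    "is not member of list": lambda fact, values: fact not in values,
}


def passes_filter(event, fact_type, predicate, value):
    """Apply one filter-on-event cell to the given fact of an input event."""
    return PREDICATES[predicate](event[fact_type], value)


call = {"conversation_duration": 45, "other_party_tel_number": "0601234567"}
```

For example, `passes_filter(call, "conversation_duration", ">", 40)` holds for the call above, and the keyword form `"greater than"` gives the same result.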
For the pattern conditions part we describe below the possible semantics. Sometimes values can be
replaced by expressions.
Column head | Predicate | Object | Meaning
Event type | is DETECTED | | An event associated with the given event type explicitly triggers this event driven logic
Event type | is ABSENT | | No event associated with the given event type is detected within the given context
fact type | is DECREASING | | A fact value in the participant events is always equal to or smaller than its predecessor in the event stream
fact type | is INCREASING | | A fact value in the participant events is always equal to or bigger than its predecessor in the event stream
fact type | is STRICTLY DECREASING | | A fact value in the participant events is always smaller than its predecessor in the event stream
fact type | is STRICTLY INCREASING | | A fact value in the participant events is always bigger than its predecessor in the event stream
fact type | is STABLE | | A fact value in the participant events is always equal to its predecessor in the event stream
Event type | OCCURS BEFORE, OCCURS AFTER, OCCURS AT THE SAME TIME AS | Event type | Binary SEQUENCE pattern between two events
COUNT(Event type) | >, ≥, =, <, ≤ | value | Counts the number of participant instances of this event type and determines if it satisfies the threshold condition in the predicate
SUM(fact type) | >, ≥, =, <, ≤ | value | Sums all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate
AVG(fact type) | >, ≥, =, <, ≤ | value | Computes the average of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate
MIN(fact type) | >, ≥, =, <, ≤ | value | Selects the minimal value of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate
MAX(fact type) | >, ≥, =, <, ≤ | value | Selects the maximal value of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate
MEDIAN(fact type) | >, ≥, =, <, ≤ | value | Computes the median of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate
STD(fact type) | >, ≥, =, <, ≤ | value | Computes the standard deviation of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate
FOR ALL(fact type) | >, ≥, =, <, ≤ | value | Assertion must be true for all input events
The filter on pattern conditions are similar to the filter on event conditions, except that the events are
associated with the matching set and not with the input events.
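The trend and aggregation patterns of the table above can be sketched over the list of fact values of the participant events, taken in stream order. All names below are hypothetical, and using the population standard deviation for STD is an assumption, since the report does not say which variant is meant.

```python
import operator
import statistics


def is_increasing(values, strict=False):
    """INCREASING / STRICTLY INCREASING over consecutive fact values."""
    compare = operator.lt if strict else operator.le
    return all(compare(a, b) for a, b in zip(values, values[1:]))


def is_decreasing(values, strict=False):
    """DECREASING / STRICTLY DECREASING over consecutive fact values."""
    compare = operator.gt if strict else operator.ge
    return all(compare(a, b) for a, b in zip(values, values[1:]))


def is_stable(values):
    """STABLE: every value equals its predecessor."""
    return all(a == b for a, b in zip(values, values[1:]))


AGGREGATES = {
    "COUNT": len, "SUM": sum, "AVG": statistics.mean, "MIN": min, "MAX": max,
    "MEDIAN": statistics.median,
    "STD": statistics.pstdev,  # assumption: population standard deviation
}
COMPARISONS = {">": operator.gt, "≥": operator.ge, "=": operator.eq,
               "<": operator.lt, "≤": operator.le}


def satisfies(aggregate, values, predicate, threshold):
    """Evaluate e.g. SUM(fact type) > 100 over the participant events."""
    return COMPARISONS[predicate](AGGREGATES[aggregate](values), threshold)
```

For instance, `satisfies("SUM", [60.0, 50.0], ">", 100)` corresponds to the SUM(total_call_charge_amount) > 100 pattern used in the Expensive calls EDT.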