
Project funded by the European Community under the Information and Communication Technologies Programme Contract ICT-FP7-619491

ICT, STREP

FERARI ICT-FP7-619491

Flexible Event pRocessing for big dAta aRchItectures

Collaborative Project

D4.2

Goal driven model and methodology for specification of event processing applications

01.02.2015 – 31.01.2016 (preparation period)

Contractual Date of Delivery: 31.01.2016

Actual Date of Delivery: 31.01.2016

Author(s): Fabiana Fournier and Inna Skarbovsky

Institution: IBM

Workpackage: Flexible Event Processing

Security: PU

Nature: R

Total number of pages: 42


Project coordinator name Michael Mock Revision: 1

Project coordinator organisation name Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)

Schloss Birlinghoven, 53754 Sankt Augustin, Germany

URL: http://www.iais.fraunhofer.de

Abstract

The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling business users to express complex analytics tasks through a high-level declarative language that supports distributed complex event processing as an integral part of the system architecture.

In this report we provide a model and methodology to support this goal. The proposed approach addresses both the functional and non-functional properties of event processing applications by supporting non-technical users with a declarative language expressed as a set of diagrams and tables. The resulting model can then be automatically translated into an event processing network and eventually into a running application. Our methodology supports the model driven engineering approach and encompasses the construction of the computation independent model, its translation into a platform independent model, and the translation of the latter into a platform specific model. In this report we detail the construction of the computation independent model and exemplify it with the project's mobile phone fraud use case.



Revision History Administration Status

Project acronym: FERARI ID: ICT-FP7-619491

Document identifier: D4.2 Goal driven model and methodology for specification of event processing applications

(01.02.2015 – 31.01.2016)

Leading Partner: IBM

Report version: 1

Report preparation date: 31.01.2016

Classification: PU

Nature: REPORT

Author(s) and contributors: Fabiana Fournier and Inna Skarbovsky

Status: - Plan

- Draft

- Working

- Final

x Submitted

Copyright This report is © FERARI Consortium 2014. Its duplication is restricted to the personal use within the

consortium and the European Commission.

www.ferari-project.eu



Document History

Version  Date        Author                  Change Description
0.1      15/12/2015  Fabiana Fournier (IBM)  First draft
0.2      1/1/2016    Fabiana Fournier (IBM)  Second draft including section 9
0.3      15/1/2016   Fabiana Fournier (IBM)  First complete version
0.4      17/1/2016   Fabiana Fournier (IBM)  Inclusion of abstract
0.5      25/1/2016   Fabiana Fournier (IBM)  Updates per internal review
1.0      30/1/2016   Fabiana Fournier (IBM)  Final fixes and cleanup


Table of Contents

1 Introduction
  1.1 Purpose and scope of the document
  1.2 Relationship with other documents
2 Preliminaries
  2.1 Event Processing Network (EPN)
  2.2 Pattern Matching Process
  2.3 Pattern Policies
  2.4 Illustrative Example - The Mobile phone fraud use case
3 TEM in a nutshell
  3.1 TEM and Concept Computing
  3.2 TEM Building Blocks
  3.3 TEM Basic Terms
4 TEM Diagrams
5 TEM Logic concepts
  5.1 TEM Event Derivation Tables
    5.1.1 Event Derivation Tables Structure
    5.1.2 Event Derivation Tables Conditions
  5.2 TEM Computation Tables
    5.2.1 Computation Tables Structure
  5.3 TEM Policy Tables
    5.3.1 Policy Tables Structure
6 TEM Glossary concepts
  6.1 TEM Concepts Lexicon Table
  6.2 TEM Fact Types Table
  6.3 TEM Actors Table
  6.4 IT Elements Table
  6.5 TEM Events View Table
7 TEM Methodology
  7.1 Lifecycle Overview and Methodology
  7.2 Construct the Computational Independent Model
  7.3 Transform to the Platform Independent Model
  7.4 Generate the code and create the Platform Specific Model
  7.5 Operate the application and support modifications
8 Extending TEM to non-functional requirements
  8.1 Extending the TEM diagrams to cope with performance requirements
  8.2 Event Derivation Tables for non-functional requirements
  8.3 Extending the TEM methodology to include performance requirements
9 Summary and future steps
10 References
11 Appendix A – TEM Syntax

List of Tables

Table 1: Long call at night EDT
Table 2: Frequent long calls at night EDT
Table 3: Frequent long calls EDT
Table 4: Frequent each long call EDT
Table 5: Expensive calls EDT
Table 6: Example of filter on pattern conditions
Table 7: call_start_dates<Frequent long calls at night> computation table
Table 8: calls_count<Frequent long calls at night> computation table
Table 9: calls_length_sum<Frequent long calls> computation table
Table 10: calls_cost_sum<Expensive calls> computation table
Table 11: Frequent long calls at night policy table
Table 12: Frequent long calls policy table
Table 13: Frequent each call policy table
Table 14: Expensive calls policy table
Table 15: Lexicon table for the mobile phone fraud use case
Table 16: Fact type table for the mobile phone fraud use case
Table 17: Actors table for the mobile phone fraud use case
Table 18: IT elements table for the mobile phone fraud use case
Table 19: Events view for the mobile phone fraud use case
Table 20: New Expensive calls logic EDT
Table 21: CDR Throughput Violation EDT
Table 22: Expensive calls Latency Violation

List of Figures

Figure 1: Illustration of an event processing network
Figure 2: Event recognition process in an EPA
Figure 3. TEM diagram icons
Figure 4. TEM diagram for the mobile phone fraud use case
Figure 5. Structure of TEM logic concepts
Figure 6: Structure of TEM Glossary concepts
Figure 7: TEM Glossary concepts relationships
Figure 8: New TEM diagram for the mobile phone fraud use case
Figure 9: TEM diagram annotated with non-functional requirements


Acronyms

CEP Complex Event Processing

CIM Computation Independent Model

EPA Event Processing Agent

EPN Event Processing Network

FERARI Flexible Event pRocessing for big dAta aRchItectures

PIM Platform Independent Model

PROTON IBM PROactive Technology Online

PSM Platform Specific Model

TDM The Decision Model

TEM The Event Model

WP Work Package



1 Introduction

1.1 Purpose and scope of the document

The goal of the FERARI (Flexible Event pRocessing for big dAta aRchItectures) project is to pave the way

for efficient real-time Big Data technologies of the future. The proposed framework aims at enabling

business users to express complex analytics tasks through a high-level declarative language that

supports distributed complex event processing as an integral part of the system architecture. Work

package 4 (WP4) “Flexible Event Processing” deals with all the relevant tasks around event processing

technologies in order to achieve this goal. Specifically, Deliverable 4.2 (D4.2) “Goal driven model and

methodology for specification of event processing applications”, aims at providing a comprehensible

model along with a methodology for event processing applications adequate for business users.

This report presents The Event Model (TEM), a new way to model, develop, validate, maintain, and

implement event-driven applications. TEM is based on a set of well-defined principles and building

blocks, and does not require substantial programming skills, thus making it suitable for business users

and the project goals. A methodology is also described as part of the report.

Note that we use the terms complex event processing and event processing, as well as tool, engine, and system, interchangeably throughout this report.

This report is structured as follows: Section 2 briefly reviews the basic complex event processing terms required to understand this report. Section 3 gives an overview of The Event Model, whose concepts are elaborated in Sections 4 to 6. Section 7 describes our model driven methodology for creating event-driven applications. Section 8 extends The Event Model to cope with non-functional requirements. Section 9 concludes the report with a summary and future steps.

1.2 Relationship with other documents

Because FERARI stands for Flexible Event pRocessing for big dAta aRchItectures, there is a tight connection between the event processing components and the rest of the components that form the FERARI architecture. Specifically, this deliverable is strongly related to D2.1 - Architecture definition in WP2. The requirements for the event processing engine are dictated by the use cases in the project; thus, this report is also strongly related to D1.1 - Application Scenario Description and Requirement Analysis in WP1. In addition, WP4 interacts with WP5, which addresses algorithms for robust and flexible stream processing, and this report is therefore also related to D5.2 - Algorithms for Robust Distributed Stream Monitoring and Supporting Data Integrity.

2 Preliminaries

Each complex event processing (CEP) engine uses its own terminology and semantics. We follow the semantics presented in Etzion and Niblett’s book [4]. In our previous deliverable (D4.1 – Requirements and state of the art overview on Flexible Event Processing) we covered the main constructs and terms in complex event processing. For clarity, we briefly recall two main concepts below: the Event Processing Network (EPN) and the pattern matching process.

2.1 Event Processing Network (EPN)

An Event Processing Network (EPN) is a conceptual model that describes the event processing flow execution. An EPN comprises a collection of event processing agents (EPAs), event producers, events, and event consumers (Figure 1). The network describes the flow of events originating at event producers and flowing through various event processing agents to eventually reach event consumers. For example, in Figure 1, events from Producer 1 are processed by EPA 1. Events derived by EPA 1 are of interest to Consumer 1 but are also processed by EPA 3, together with events derived by EPA 2.

Figure 1: Illustration of an event processing network
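The network of Figure 1 can be read as a directed graph. The Python sketch below is only an illustration of that reading: it encodes the connections stated in the text, while the links marked as assumed (for Producer 2, EPA 2, and Consumer 2) and all function names are hypothetical and are not part of FERARI or PROTON.

```python
from collections import defaultdict

# Edges of the network as described in the text. The links for Producer 2,
# the input of EPA 2, and Consumer 2 are not spelled out in the text, so
# they are assumptions made only to complete the sketch.
EDGES = [
    ("Producer 1", "EPA 1"),
    ("EPA 1", "Consumer 1"),
    ("EPA 1", "EPA 3"),
    ("EPA 2", "EPA 3"),
    ("Producer 2", "EPA 2"),   # assumed
    ("EPA 3", "Consumer 2"),   # assumed
]


def build_network(edges):
    """Build an adjacency list: node -> nodes that receive its events."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def reachable_consumers(graph, start):
    """Return the consumers that events emitted by `start` can reach."""
    seen, stack, consumers = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node.startswith("Consumer"):
            consumers.add(node)
        stack.extend(graph.get(node, []))
    return consumers


network = build_network(EDGES)
```

Under these assumptions, `reachable_consumers(network, "Producer 1")` contains both consumers, because events derived by EPA 1 go to Consumer 1 directly and to Consumer 2 through EPA 3.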

2.2 Pattern Matching Process

An EPA performs three logical steps, also known as the pattern matching process or event recognition (see Figure 2):

- The filtering step selects the relevant events from the input events for processing, according to the filter conditions. The output of this step is a set of participant events.
- The matching step takes all events that passed the filtering and looks for matches between these events, using an event processing pattern or some other kind of matching criterion. The output of this step is the matching set.
- The derivation step takes the output of the matching step and uses it to derive the output events by applying derivation formulae.

[Figure 1 labels: Event Producer 1, Event Producer 2, EPA 1, EPA 2, EPA 3, Event Consumer 1, Event Consumer 2]

[Figure 2 labels: Event Processing Agent, within context: incoming/input events, filtering, participant events, matching, matching set, deriving, derived/output events]


Figure 2: Event recognition process in an EPA
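The three steps of Figure 2 can be sketched as a small Python function. This is a minimal illustration and not the PROTON implementation: the function `run_epa`, the temperature readings, and the "Overheating" derived event are all hypothetical names chosen for the example.

```python
def run_epa(events, keep, match, derive):
    """Apply the filtering, matching, and derivation steps in order."""
    participants = [e for e in events if keep(e)]    # filtering step
    matching_sets = match(participants)              # matching step
    return [derive(ms) for ms in matching_sets]      # derivation step


# Hypothetical input events: readings of one temperature sensor.
readings = [
    {"sensor": 123, "temp": 25},
    {"sensor": 123, "temp": 40},
    {"sensor": 123, "temp": 42},
]


def hot(event):
    """Filter condition: keep only readings above 30 degrees."""
    return event["temp"] > 30


def at_least_two(participants):
    """Matching criterion: one matching set if two or more events passed."""
    return [participants] if len(participants) >= 2 else []


def overheating(matching_set):
    """Derivation formula: build the derived event from the matching set."""
    return {
        "type": "Overheating",
        "sensor": matching_set[0]["sensor"],
        "peak": max(e["temp"] for e in matching_set),
    }


derived = run_epa(readings, hot, at_least_two, overheating)
```

Passing `lambda e: True` as `keep` and `lambda p: [p]` as `match` reduces the agent to a pure derivation, which mirrors the case where only the derivation step is used.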

An event pattern is a template specifying one or more combinations of events. Given any collection of

events, if it’s possible to find one or more subsets of those events that match a particular pattern, it can

be said that such a subset satisfies the pattern. Some common examples of patterns:

- Sequence: at least one instance of each participating event type must arrive, in a specified order, for the pattern to be matched.
- Count: the number of instances in the participant event set satisfies the pattern’s number assertion.
- All: at least one instance of each participating event type must arrive for the pattern to be matched; the arrival order is immaterial.
- Trend: the events satisfy a specific change (increasing or decreasing) over time in some observed value, namely the value of a specific attribute or attributes.
- Sum: the value of a specific attribute, summed up over all participant events, satisfies the sum threshold assertion.
- Average (AVG): the value of a specific attribute, averaged over all participant events, satisfies the average threshold assertion.

Note that the first two steps are optional but a derivation must take place (even if it is merely copying

values from the input events to the derived/output event).
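The patterns listed above can be expressed as simple predicates over the participant events. The following Python sketch assumes that events are dictionaries with a "type" key, listed in arrival order; the function names and the representation of assertions as callables are illustrative choices, not the semantics of a specific engine.

```python
def sequence_pattern(events, types):
    """One instance of each type arrives, in the given order."""
    remaining = iter(events)
    # Each `any` consumes the shared iterator, so order is enforced.
    return all(any(e["type"] == t for e in remaining) for t in types)


def all_pattern(events, types):
    """One instance of each type arrives; order is immaterial."""
    return set(types) <= {e["type"] for e in events}


def count_pattern(events, assertion):
    """The number of participant events satisfies the assertion."""
    return assertion(len(events))


def sum_pattern(events, attr, assertion):
    """The sum of `attr` over all participants satisfies the assertion."""
    return assertion(sum(e[attr] for e in events))


def avg_pattern(events, attr, assertion):
    """The average of `attr` over all participants satisfies the assertion."""
    if not events:
        return False
    return assertion(sum(e[attr] for e in events) / len(events))


def trend_pattern(events, attr, increasing=True):
    """`attr` strictly increases (or decreases) over at least two events."""
    values = [e[attr] for e in events]
    pairs = list(zip(values, values[1:]))
    return bool(pairs) and all(
        (a < b) if increasing else (a > b) for a, b in pairs
    )


sample = [
    {"type": "A", "v": 1},
    {"type": "B", "v": 2},
    {"type": "A", "v": 3},
]
```

For the `sample` events, the sequence A then B is matched, and so is B then A, because a second instance of A arrives after B.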

2.3 Pattern Policies

A pattern policy (or simply policy) is a named parameter that disambiguates the semantics of the pattern and the pattern matching process. Pattern policies fine-tune the way the pattern detection process works. We distinguish among four types of pattern policies:

Evaluation policy – when are the matching sets produced? The EPA can either generate output incrementally (in this case the evaluation policy is called Immediate) or at the end of the temporal context (called Deferred).

Cardinality policy – how many matching sets are produced within a single context partition? The cardinality policy helps limit the number of matching sets generated, and thus the number of derived events produced. The policy type can be single, meaning only one matching set is generated, or unrestricted, meaning there are no restrictions on the number of matching sets generated.

Repeated/Instance Selection type policy – what happens if the matching step encounters multiple events of the same type? The override policy means that whenever a new event instance is encountered and the participant set already contains the required number of instances of that type, the new instance replaces the oldest previous instance of that type. The every policy means that every instance is kept, so all possible matching sets can be produced. The first policy means that every instance is kept, but only the earliest instance of each type is used for matching. The last policy is the same as first, except that the latest instance of each type is used for matching.

Consumption policy – what happens to a particular event after it has been included in the matching set?

Possible consumption policies are consume, meaning each event instance can be used in only one

matching set; and reuse, meaning an event instance can participate in an unrestricted number of

matching sets.

Policy relevance can be dictated by the event pattern. For example, the evaluation policy for an absence pattern is always deferred, because whether an event instance occurred can only be established once the specified temporal context has ended. Also, not all possible policy combinations are meaningful. For example, the choice of consumption policy is irrelevant if the cardinality policy is single, because this means that the matching step runs only once.
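As an illustration of how a policy changes the outcome of matching, the following Python sketch models the four repeated/instance selection policies described above. The enum, the function `candidates`, and the `required` parameter are hypothetical names introduced for this example; an actual engine may realize these policies differently.

```python
from enum import Enum


class Repeated(Enum):
    OVERRIDE = "override"
    EVERY = "every"
    FIRST = "first"
    LAST = "last"


def candidates(instances, policy, required=1):
    """Instances of one event type that the matching step may use.

    `instances` is in arrival order; `required` (assumed to be >= 1) is
    the number of instances of this type that the pattern needs.
    """
    if policy is Repeated.OVERRIDE:
        # Each new instance pushes out the oldest one beyond `required`.
        return instances[-required:]
    if policy is Repeated.EVERY:
        # Every instance stays available for matching.
        return list(instances)
    if policy is Repeated.FIRST:
        # Only the earliest instance is used for matching.
        return instances[:1]
    # Repeated.LAST: only the latest instance is used for matching.
    return instances[-1:]


calls = ["call 1", "call 2", "call 3"]
```

With `required=2`, the override policy keeps only the two most recent calls, whereas the every policy leaves all three available, so more matching sets can be produced.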

2.4 Illustrative Example - The Mobile phone fraud use case

We illustrate the model throughout this report using the mobile phone fraud use case previously analyzed and implemented in the scope of D4.1. The goal is to identify users who use a network service without the intention to pay for that use. Many fraud mining systems in telecommunications use some form of rules, often defined by fraud experts or generated automatically by software, to raise alarms. These alarms are checked by fraud investigators on a case-by-case basis. It is their duty to decide whether a suspicious behavior is fraudulent or legal. This depends on the current call, the call history, the customer history, and the subscription plan of the customer.

Recall that in this specific scenario we seek to fire alerts in the following situations:

- A long call to premium distance is made during night hours (LongCallAtNight).
- As before, but this time we are looking for at least three of these “long distance calls” at night per calling number (FrequentLongCallsAtNight).
- Multiple long distance calls per calling number that last more than a certain threshold value (FrequentLongCalls).
- Same as before, but each occurrence cost exceeds the threshold (FrequentEachLongCall).
- High usage of a line for long distance calls (Expensivecall).
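The first two situations above can be sketched in Python as follows. The record fields, the duration threshold, and the definition of night hours are assumptions made for the example, since this section does not state them; only the requirement of at least three calls per calling number comes from the text.

```python
from collections import Counter

# Assumed values, not taken from the report.
LONG_CALL_MINUTES = 30
NIGHT_HOURS = set(range(0, 6)) | {22, 23}


def is_long_call_at_night(cdr):
    """LongCallAtNight: a long premium call placed during night hours."""
    return (
        cdr["premium"]
        and cdr["duration_min"] > LONG_CALL_MINUTES
        and cdr["start_hour"] in NIGHT_HOURS
    )


def frequent_long_calls_at_night(cdrs, minimum=3):
    """FrequentLongCallsAtNight: numbers with at least `minimum` such calls."""
    counts = Counter(
        c["calling_number"] for c in cdrs if is_long_call_at_night(c)
    )
    return sorted(n for n, k in counts.items() if k >= minimum)


# Hypothetical call detail records (CDRs).
cdrs = [
    {"calling_number": "111", "start_hour": 23, "duration_min": 45, "premium": True},
    {"calling_number": "111", "start_hour": 1, "duration_min": 50, "premium": True},
    {"calling_number": "111", "start_hour": 2, "duration_min": 31, "premium": True},
    {"calling_number": "222", "start_hour": 23, "duration_min": 45, "premium": True},
    {"calling_number": "222", "start_hour": 14, "duration_min": 90, "premium": True},
    {"calling_number": "222", "start_hour": 3, "duration_min": 5, "premium": True},
]

suspects = frequent_long_calls_at_night(cdrs)
```

The second function reuses the first as its filter, which reflects how FrequentLongCallsAtNight is defined in terms of LongCallAtNight. The sketch ignores temporal contexts, so it counts calls over the whole input.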

3 TEM in a nutshell

3.1 TEM and Concept Computing

As mentioned in the introductory section, TEM makes it possible to model, develop, validate, maintain, and implement event-driven applications. TEM is based on a set of well-defined principles and building blocks, and does not require substantial programming skills, which makes it appropriate for business users and FERARI’s mission. At the core of TEM is the event derivation logic, expressed through a collection of related normalized tables from which code can be generated. This idea has already been successfully proven in the domain of business rules by The Decision Model (TDM) [7]. The Decision Model groups the rules into natural logical groups to create a structure that makes the model relatively simple to understand, communicate, and manage.

The Event Model follows the Model Driven Engineering approach [1][2] and can be classified as a CIM (Computation Independent Model), providing independence from the physical data representation and omitting details which are obvious to the designer. This model can be directly translated to an execution model (PSM – Platform Specific Model in the Model Driven Architecture terminology) through an intermediate generic representation (PIM – Platform Independent Model) as described in Section 7.

TEM also follows the paradigm of concept computing (see footnote 1), according to which all model artifacts are concepts. A concept is a meaningful term within the user’s domain of discourse. The model consists of concepts and semantic relationships among them. These concepts are based on the user’s cognitive terms, and are independent of IT terms or any specific implementation. In line with model-driven engineering, the vision is to have a concept-oriented model and to transform it in a mostly automated fashion into the execution model. While the concept computing vision aims at simplification, the model still needs to be expressive enough to allow this automatic transformation.

3.2 TEM Building Blocks

TEM is composed of the following building blocks, described in detail and illustrated in the subsequent sections.

TEM Concepts: As stated before, TEM follows the concept computing paradigm according to which

anything is defined as a concept. A TEM concept can be either a glossary concept or a logic concept. A

glossary concept is a term in the specific domain which has a meaning. Some of the concepts denote

computational entities, and a logic concept is a description of how such a computational entity is

computed. A TEM model consists of a collection of concepts of various types and the relations among

them. Relations between two concepts are defined only once.

TEM Glossary: The knowledge model that stores all glossary concepts of a specific application.

TEM Diagrams: The set of diagrams that describes the event causality dependencies (and hence the

event flow) in the event-driven application.

TEM Logic: The knowledge model that describes all logic concepts of a specific application. The

knowledge model is represented as a collection of tables.

3.3 TEM Basic Terms

In this section we introduce some of the basic terms used in TEM.

1 http://www.slideshare.net/Mills/understanding-concept-computing



Fact Type: A TEM glossary concept type that denotes a named type of a piece of data atomic to the

scope. This is analogous to the attribute of entity or event in most data models.

Fact: A specific instance of a fact type, contained in a specific entity or event.

Event: Something that happened or is thought to have happened in the real world. Examples are: a

temperature sensor is read or a piece of luggage is lost.

Event instance: The computerized entity that denotes a specific instance of an event type. Examples: the temperature of sensor 123 is 40 °C; luggage with tag ID z is lost. In common use, the term “event” is also a synonym for “event instance”.

Event Type: A TEM glossary concept type that denotes a set of event instances sharing the same

meaning and structure (associated data). An event type is a container of fact types, and consequently an

event is a container of facts. Event type examples: temperature read; luggage lost.

Actor: A TEM glossary concept type that denotes anyone or anything that plays a role in an event

processing system.

Raw event: An event originating from an external actor. In this case the actor’s role is defined as event

producer.

Derived event: An event that is generated by applying a function to event instances over time. A derived event is an event whose instances are created by applying a logic concept.

Derivation: The specification of the logic applied to generate a derived event.

Situation: An event of interest that may require a course of action. A Situation is a derived event emitted

outside the event processing system and consumed by an actor of type consumer.

Context: A named specification of conditions to partition the event occurrences so these partitions can

be processed separately.

Partition by: Context partition criterion based on the values of one or more Fact Types contained in

event(s), also known as segmentation context.

When?: Context partition criterion based on the instance time of events, also known as temporal

context.

Conditions: Expressions executed against event instances.

Implementation independent: A model free of references to technical concerns; this means it can be

implemented in any technology that supports TEM principles. The Event Model (TEM) is an

implementation independent model, understandable by business and technical audiences, to depict the

logic of detecting and deriving situations of interest from a stream of open-ended event instances.
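To make the Context, Partition by, and When? terms above more concrete, the following Python sketch groups event instances by the value of one fact type (segmentation context) and by calendar day (a simple temporal context). The function name, the "time" key, and the choice of a one-day window are assumptions made for the example only.

```python
from collections import defaultdict
from datetime import datetime


def context_partitions(events, partition_by):
    """Group events by (value of `partition_by`, calendar day).

    The first key component plays the role of "Partition by"; the second
    is a simple stand-in for "When?" with a window of one day.
    """
    partitions = defaultdict(list)
    for event in events:
        key = (event[partition_by], event["time"].date())
        partitions[key].append(event)
    return dict(partitions)


# Hypothetical event instances with a Calling_number fact.
events = [
    {"Calling_number": "111", "time": datetime(2016, 1, 30, 23, 50)},
    {"Calling_number": "111", "time": datetime(2016, 1, 31, 0, 10)},
    {"Calling_number": "222", "time": datetime(2016, 1, 30, 12, 0)},
    {"Calling_number": "111", "time": datetime(2016, 1, 30, 8, 0)},
]

partitions = context_partitions(events, "Calling_number")
```

Each resulting partition can then be processed separately, for example by running a count pattern per calling number and day.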



4 TEM Diagrams

One way to simplify a model is to apply a top-down methodology that provides a high level view and understanding of the system at hand.

The Event Model diagram is a simple drawing that illustrates the structure of the logic by showing a

situation along with the flow direction of derivations in a top-down manner. At the top of the diagram

there is a goal which is the situation that is required to be derived. This goal is connected with the raw

and derived events that are identified as participants in the situation derivation. This is done in a

recursive way until raw events or facts are encountered as depicted in Figure 4 for our mobile phone

fraud use case example.

TEM diagrams employ nine icons that express all the relevant terms (see Figure 3).

Figure 3. TEM diagram icons

For each situation in TEM, there is a corresponding TEM diagram.

Each node in the diagram, except for producers and consumers, is composed of blocks represented as rectangles and separated by a thick black line. Each node has a 1:1 mapping to a corresponding Event Derivation Table (EDT) artifact. EDTs are explained in Section 5.1. The rectangle in the background of each block represents the context for the block. The contexts can be collapsed or expanded, as in the case of the Frequent each long call derived event. Solid lines describe event transitions inside the event-driven system. Dotted lines specify event flows to and from the event-driven system (see Figure 4).

[Figure 3 icon labels: Situation, Fact, Consumer, Producer, Partition by, When?, Raw event, Detected derived event, Derived event]

Figure 4 depicts the TEM diagram for our mobile phone fraud use case example. The situations to be derived address potential cases of mobile phone fraud, which require alert notifications and human intervention. As described in Section 2.4, we would like to emit five situations: LongCallAtNight, FrequentLongCallsAtNight, FrequentLongCalls, FrequentEachLongCall, and Expensivecall. We have one consumer of the situations (Operator, who gets the system alerts) and one producer, CDR System, which sends records corresponding to calls from mobile phones. The Context part of the Frequent each long call derived event is expanded in the diagram to show a temporal context with a window of one day. We partition the events according to the Calling_number ID domain fact type (for the definition and role of domain fact types, refer to Section 6), since we are looking for fraud attempts per calling number.

Figure 4. TEM diagram for the mobile phone fraud use case

The diagrams serve as a major design tool that provides a top down view. All blocks that describe

situations or derived events require the definition of logic concepts as described next.

5 TEM Logic concepts

Logic concepts are descriptions of concepts that are computed by the application. The Event Model

Logic consists of three logic concept types (Figure 5) which are represented as tables.

Figure 5. Structure of TEM logic concepts

Event Derivation: A single logic artifact (represented as a table) for each derived event. Each derivation

table name is composed of the Derived event type + “Logic” as suffix. The derived event mentioned in



the name is associated with the table in the sense that the table specifies the conditions for generating

new instances of this event type.

Computation: A single logic artifact (represented as a table) for each computed fact type in a derived

event. It specifies the computation of assignments of the values of a fact type (attribute) associated with

a derived event. Each computation table name is composed of the Derived fact type + “Computation”

as suffix. The derived fact type mentioned in the name is associated with the table in the sense that the

table specifies the value assignment for this fact type. Note that if the value of a derived fact type can

be implicitly inferred, then the computation table for this derived fact type can be omitted (see

Section ‎5.2).

Policy: A single logic artifact (represented as a table) for each derived event. It specifies the fine tuning

semantics of the derivations. Each policy table name is composed of the Derived event type + “Policy” as suffix.

The derived event type mentioned in the name is associated with the table in the sense that the table

specifies the policy assignments for this event type. Note that TEM uses default policy values for

derived events. Whenever the default policies hold, then the corresponding policy tables can be omitted

(see Section ‎5.3).

Although the names of concepts in TEM can be determined freely by the system designer, we use some

naming conventions in the logic tables for the sake of clarity. For example, domain fact types (see

Section ‎6) as well as event types and actors start with a capital letter; fact types start with a lowercase

letter. Non-unique fact type names are denoted fact type<Event type> or fact type<Actor>. Fact types

of sub-type lists are denoted by {fact type}. Patterns that are operators are denoted OPERATOR(fact

type) or OPERATOR(Event type), see for example COUNT(CDR) or SUM(conversation_duration<CDR>) in

Table 3 below. Operators’ operands are denoted operand.Operator, e.g., countSum.SUM and

count.COUNT in the Computation tables shown in Section ‎5.2.

We also underline event types in condition columns that have an Event Derivation Table of their own

(hyperlinks), to stress that these events are themselves derived by another piece of logic and

to let the reader follow paths of inference by clicking these links (see, for example, Long call at night in

Table 2).

As mentioned earlier, TEM models all artifacts as concepts within the user’s domain of discourse.

Furthermore, the model uses keywords and syntax close to the “business world”, and independent of

any specific event processing language. For example, to define a temporal context that spans a week, we

just use the weekly keyword; every N time units expresses overlapping sliding time windows; and

MIN(fact type) selects the minimal value of the values in the matching set. For a complete dictionary of

TEM terms refer to Appendix A.

The model consists of concepts and semantic relationships among them. These concepts are based on

the user’s cognitive terms, and are independent of the IT terms or specific implementation.



5.1 TEM Event Derivation Tables

An Event Derivation Table (EDT) is a two-dimensional representation of logic leading to a derived event,

based on events and facts. Thus, an EDT designates the circumstances under which a derived event of

interest is reached. In our mobile phone fraud scenario there are five EDTs shown in Tables 1-5.

Table 1: Long call at night EDT

Long call at night Logic
Row # | When Expression | When Start | When End | Partition by | Filter on event: other_party_tel_number<CDR> | Filter on event: call_direction<CDR> | Filter on event: call_start_date<CDR> | Filter on event: conversation_duration<CDR> | Pattern | Filter on pattern
1 | always | | | | member of premium services | = 1 | is between 19:00, 7:00 | > 40 | |

Table 2: Frequent long calls at night EDT

Frequent long calls at night Logic
Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event | Pattern: COUNT(Long call at night) | Filter on pattern
1 | for every day | | | same | | > 2 |

Table 3: Frequent long calls EDT

Frequent long calls Logic
Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number<CDR> | Filter on event: call_direction<CDR> | Pattern: SUM(conversation_duration<CDR>) | Pattern: COUNT(CDR) | Filter on pattern
1 | for every day | | | same | member of premium services | = 1 | > 60 | > 9 |

Table 4: Frequent each long call EDT

Frequent each long call Logic
Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number<CDR> | Filter on event: call_direction<CDR> | Pattern: COUNT(CDR) | Filter on pattern
1 | for every day | | | same | member of premium services | = 1 | > 9 |

Table 5: Expensive calls EDT

Expensive calls Logic
Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number<CDR> | Filter on event: call_direction<CDR> | Pattern: SUM(total_call_charge_amount<CDR>) | Filter on pattern
1 | every 6 hours | first CDR | | same | member of premium services | = 1 | > 100 |


Note the following:

Due to privacy issues, the data has been anonymized in a way that preserves the business logic.

In addition, due to privacy issues, the values chosen for specific variables and thresholds are not the real ones. This does not alter the logic of the rules, only the values assigned to the variables and thresholds.

“Premium services” is a closed list of geographically distant locations/destinations for which the rules are relevant (e.g., “Maldives”).

In this use case, night hours are between 19:00 and 7:00, and a day (24 hours) runs from 24:00 to 24:00 of the following day.

We are only interested in outgoing calls (incoming calls are not relevant to fraud detection), indicated by the call_direction field being equal to 1.

5.1.1 Event Derivation Tables Structure

The first row in an EDT indicates its name. The EDT name is the derived event name + “Logic”. For

example: Long call at night Logic in Table 1. The table consists of two parts, context and conditions,

separated by a red vertical line. The context part consists of two logical sections. The temporal context,

represented by When expression, When start, and When end columns; and the segmentation context

represented by the Partition by column.

For example, Table 2 describes a non-overlapping sliding fixed interval temporal context [4] of 24 hours

length (a day) and a segmentation context that partitions the events by Calling number domain (refer to

Section 6 for definition of domain fact types and to Appendix A for TEM syntax and keywords).

An event derivation consists of a collection of conditions in disjunctive normal form: all conditions in a single row are related by conjunction, while multiple rows are related by disjunction. In other words, in any EDT the relationship between any two rows is a disjunction, and the conditions within a single row, both the context conditions (When, Partition by) and the derivation conditions (filter on event, pattern, filter on pattern), have conjunction semantics.
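The disjunctive normal form semantics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TEM compiler's implementation: the names `Condition`, `Row`, and `edt_fires` are hypothetical, each condition is modeled as a plain predicate over a dictionary of values, and the two example rows are invented for the sketch (the EDTs of the use case have a single row each).

```python
from typing import Any, Callable, Dict, List

# Hypothetical types for illustration: a condition is a predicate over a
# dictionary of values, a row is a list of conditions, an EDT is a list of rows.
Condition = Callable[[Dict[str, Any]], bool]
Row = List[Condition]


def edt_fires(rows: List[Row], values: Dict[str, Any]) -> bool:
    """Disjunction over rows, conjunction over the conditions within each row."""
    return any(all(cond(values) for cond in row) for row in rows)


# Two invented rows: the derived event is reached if either row holds.
rows: List[Row] = [
    [lambda v: v["call_direction"] == 1, lambda v: v["conversation_duration"] > 40],
    [lambda v: v["total_call_charge_amount"] > 100],
]

short_cheap = {"call_direction": 1, "conversation_duration": 5, "total_call_charge_amount": 3.0}
long_call = {"call_direction": 1, "conversation_duration": 55, "total_call_charge_amount": 3.0}
```

Here `edt_fires(rows, long_call)` holds because every condition of the first row is satisfied, whereas `short_cheap` satisfies no row completely.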

In addition, the relationship between the set of derived events in a TEM model and the set of EDTs in the same model is a bijection (a one-to-one mapping both ways); thus the cardinality of the set of EDTs is identical to the cardinality of the set of derived events. This principle, in essence, forces the designer to unify the different cases that lead to the derivation of a single event into a single table. Note that in current event processing models there is no restriction on the number of logic artifacts, since an event can be derived in an unbounded number of ways.

5.1.2 Event Derivation Tables Conditions

The conditions part consists of three types of conditions: filter conditions, pattern conditions, and filter

on pattern conditions. Each condition type is composed of a set of conditions, whereas each condition is



composed of a header (fact types, event types, or functions/operators/patterns) along with two

columns (predicate and object). For TEM syntax refer to Appendix A.

As previously stated, the conditions are interrelated as a disjunctive normal form; conjunction within

the same row and disjunction among rows. The column head can be: event type, fact type, or an

operator (function) over event types and fact types. Each condition cell in an EDT is divided into two: the

predicate and the object. Possible keywords for EDT conditions are specified in Appendix A.

The conditions are logically applied in the following order:

Filter conditions are expressions evaluated against the content of a single event instance. The role of filter conditions is to determine whether an event instance satisfies the filtering condition and should participate in the derivation. For example, the Filter on event columns in Tables 2-5 describe one condition on the fact type other_party_tel_number and one condition on the fact type call_direction; both fact types belong to the CDR input event. The value of other_party_tel_number must be any value in the premium services set and call_direction must equal 1, i.e., an outgoing call. Filter conditions relate to the Filtering step in Figure 2.
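As an illustration, the four filter-on-event conditions of the Long call at night EDT (Table 1) could be evaluated against a single CDR as sketched below. The function names, the dictionary representation of a CDR, the numeric encoding of call_direction, the inclusive window bounds, and the one-element `PREMIUM_SERVICES` set are assumptions made for this sketch; the report does not publish the real list or thresholds.

```python
from datetime import datetime, time

# Assumed stand-in for the closed "premium services" list.
PREMIUM_SERVICES = {"Maldives"}
NIGHT_START = time(19, 0)
NIGHT_END = time(7, 0)


def is_at_night(start: datetime) -> bool:
    """The 19:00-7:00 window wraps around midnight, so the two bounds are OR-ed."""
    t = start.time()
    return t >= NIGHT_START or t <= NIGHT_END


def passes_long_call_at_night_filter(cdr: dict) -> bool:
    """Conjunction of the four filter-on-event conditions of Table 1."""
    return (
        cdr["other_party_tel_number"] in PREMIUM_SERVICES
        and cdr["call_direction"] == 1            # outgoing call
        and is_at_night(cdr["call_start_date"])
        and cdr["conversation_duration"] > 40     # minutes
    )


example_cdr = {
    "other_party_tel_number": "Maldives",
    "call_direction": 1,
    "call_start_date": datetime(2015, 6, 1, 23, 30),
    "conversation_duration": 45,
}
```

Note that a naive `NIGHT_START <= t <= NIGHT_END` comparison would never hold for a window that crosses midnight, which is why the two bounds are combined with `or`.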

Pattern conditions are expressions on the participant events (input events that passed the filtering conditions) such as Detected, Absent, Thresholds over Aggregations, or Fact Type value changes. The role of pattern conditions is to detect the specified relationships among participant event instances. For example, in Table 2, the Pattern condition describes a COUNT pattern over more than two occurrences of event type Long call at night (COUNT > 2), which means that we emit a derived event whenever more than two instances of the input event pass the filter condition within the specified context (for each day and for each calling number). Pattern conditions relate to the Matching step in Figure 2.
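The COUNT pattern of Table 2, evaluated per context partition, can be sketched as follows. This is a simplified batch version that assumes all participant events of the window are already available (i.e., deferred evaluation); the function name `count_pattern_matches` and the dictionary-based event representation are hypothetical, and a "day" is taken to be a calendar day.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Tuple


def count_pattern_matches(
    events: List[dict], threshold: int = 2
) -> Dict[Tuple[str, date], List[dict]]:
    """Group participant events by (calling_number, calendar day) and keep the
    groups that satisfy COUNT > threshold. Each kept group is a matching set."""
    groups: Dict[Tuple[str, date], List[dict]] = defaultdict(list)
    for ev in events:
        key = (ev["calling_number"], ev["call_start_date"].date())
        groups[key].append(ev)
    return {key: evs for key, evs in groups.items() if len(evs) > threshold}


events = [
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 1, 20, 0)},
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 1, 21, 0)},
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 1, 22, 0)},
    {"calling_number": "B", "call_start_date": datetime(2015, 6, 1, 21, 0)},
    {"calling_number": "A", "call_start_date": datetime(2015, 6, 2, 20, 0)},
]
matches = count_pattern_matches(events)
```

The grouping key combines the segmentation context (calling number) with the temporal context (day), so events of different callers or different days never count toward the same threshold.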

Filter on pattern conditions are expressions on the matching set (i.e., the events that satisfied the pattern conditions), including comparisons, memberships, and time relationships among the event instances in the matching set. The role of the filter on pattern conditions is to filter the pattern result based on conditions among the different events that make up this pattern. Filter on pattern conditions relate to the Derivation step in Figure 2.

In order to illustrate the Filter on pattern condition, let’s assume the following in our mobile phone

fraud use case: we want to fire an alert for “Expensive calls” only if we have at least one instance of the

value “Maldives” in the other_party_tel_number fact type in the matching set. Table 6 specifies the

corresponding EDT in this case for Expensive calls.

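Table 6: Example of filter on pattern conditions

Expensive calls Logic
Row # | When Expression | When Start | When End | Partition by: Calling number | Filter on event: other_party_tel_number<CDR> | Filter on event: call_direction<CDR> | Pattern: SUM(total_call_charge_amount<CDR>) | Filter on pattern: other_party_tel_number<CDR>
1 | every 6 hours | first CDR | | same | member of premium services | = 1 | > 100 | contains Maldives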
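The interplay between the pattern condition and the filter on pattern condition of Table 6 can be sketched as below. The function name and parameters are hypothetical, and the matching set is assumed to hold the CDRs of one calling number within one temporal window that already passed the filter on event conditions.

```python
from typing import List


def expensive_calls_derivation(
    matching_set: List[dict],
    cost_threshold: float = 100.0,
    required_destination: str = "Maldives",
) -> bool:
    """Pattern: SUM(total_call_charge_amount) > cost_threshold.
    Filter on pattern: the matching set contains the required destination."""
    total = sum(ev["total_call_charge_amount"] for ev in matching_set)
    if total <= cost_threshold:
        return False  # pattern condition not satisfied
    destinations = {ev["other_party_tel_number"] for ev in matching_set}
    return required_destination in destinations


window = [
    {"total_call_charge_amount": 70.0, "other_party_tel_number": "Maldives"},
    {"total_call_charge_amount": 50.0, "other_party_tel_number": "Other"},
]
```

The filter on pattern is evaluated only after the pattern is satisfied, since it is a condition on the matching set as a whole rather than on any single event.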


5.2 TEM Computation Tables

A derived event, like any event, is a container of facts (attributes), which are instances of the fact types contained in the derived event’s event type. Part of the derivation is the assignment of values to these facts. Some of the computed facts are merely copies of values from the input events to the derived event. In TEM, we only specify the computation details for fact type values that are not copied, and we omit those that are copied (these are implicitly assigned by the TEM compiler).

A Computation Table is a two-dimensional representation of logic leading to a computed fact type that

needs to be explicitly specified. Table 7- Table 10 specify the computation tables for our use case.

For example, Frequent long calls at night derived event has four fact types (see Events view Table 19):

calling_number, call_direction, call_start_dates, and calls_count. The first two fact values are copied

from the input event CDR, as all Long call at night input events have the value “1” in the call_direction

fact and they are partitioned by calling_number, therefore the matching set shares the same value in

this fact type (see the corresponding EDTs in Table 1 and Table 2). Accordingly we only compute two

fact types for this derived event as specified in Table 7 and Table 8. The call_start_dates fact type is the

list of all call_start_date fact types’ values of the CDR input events that result in the matching set (Table

7). We denote this by {fact type<Event type>}. The calls_count fact type holds the result of the COUNT

pattern matching, denoted by count.COUNT, namely the number of events in the matching set.

In the same way we compute the call_start_dates values for the other three derived events containing

the fact type call_start_dates (Frequent long calls, Frequent each long call, and Expensive calls), and

calls_count (Frequent long calls and Frequent each long call), therefore we omit these computation

tables from the report as they are identical to Table 7 and Table 8 respectively.

Table 7: call_start_dates<Frequent long calls at night> computation table

call_start_dates<Frequent long calls at night> Computation
Row # | | Row in Event derivation Table
1 | {call_start_date<CDR>} | 1

Table 8: calls_count<Frequent long calls at night> computation table

calls_count<Frequent long calls at night> Computation
Row # | | Row in Event derivation Table
1 | count.COUNT | 1

Similarly, Table 9 specifies the calculation of the calls_length_sum fact type value, which equals the countSum variable value of the SUM function (the actual sum of the conversation_duration of the calls in the matching set; see the corresponding EDT in Table 3). The calls_cost_sum fact type value in Table 10 also receives the value of the SUM function, meaning that it holds the total cost of the calls in the matching set (see the EDT in Table 5).
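To make the computation tables concrete, the following sketch assembles the fact values of a Frequent long calls derived event from its matching set. The function name and the dictionary representation are assumptions for illustration; the mapping of each fact to its expression follows the computation tables described above ({call_start_date<CDR>}, count.COUNT, and countSum.SUM).

```python
from typing import List


def frequent_long_calls_facts(matching_set: List[dict]) -> dict:
    """Assign the fact values of a Frequent long calls derived event."""
    return {
        # Copied implicitly: all events in the partition share this value.
        "calling_number": matching_set[0]["calling_number"],
        # {call_start_date<CDR>}: list of the values in the matching set.
        "call_start_dates": [ev["call_start_date"] for ev in matching_set],
        # countSum.SUM over conversation_duration<CDR>.
        "calls_length_sum": sum(ev["conversation_duration"] for ev in matching_set),
        # count.COUNT: number of events in the matching set.
        "calls_count": len(matching_set),
    }


matching_set = [
    {"calling_number": "A", "call_start_date": "2015-06-01T20:00", "conversation_duration": 30},
    {"calling_number": "A", "call_start_date": "2015-06-01T22:15", "conversation_duration": 45},
]
facts = frequent_long_calls_facts(matching_set)
```

Only the last three entries would need a computation table in TEM; the copied calling_number is shown here just to make the resulting event complete.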


Table 9: calls_length_sum<Frequent long calls> computation table

calls_length_sum<Frequent long calls> Computation
Row # | | Row in Event derivation Table
1 | countSum.SUM | 1

Table 10: calls_cost_sum<Expensive calls> computation table

calls_cost_sum<Expensive calls> Computation
Row # | | Row in Event derivation Table
1 | countSum.SUM | 1

5.2.1 Computation Tables Structure

The first row in a computation table indicates the fact type name + “Computation”. For example, Table 7 is a computation table that describes the logic to compute the call_start_dates fact type associated with the Frequent long calls at night derived event.

The second row is the headings row. From the third row on, each row includes the row number, the expression value of the computed fact type, and a reference to the row number in the corresponding EDT.

Following the call_start_dates example, in all derivations the value of this fact type equals the set {call_start_date}, that is, the values of the call_start_date fact types of the events in the matching set. In all these cases the relevant derived event is in the first row of the EDT, hence the value “1” in the “Row in event derivation table” column.

5.3 TEM Policy Tables

As explained before, policies are used to fine-tune the semantics of derivations (see Section 2.3). In TEM, we define default policies per pattern and only specify the policy details (tables) for those policies whose values differ from the defaults, which are implicitly assigned by the TEM compiler during the transformation of the computation independent model to the platform independent model (see Section 7). TEM implicitly supports the following default policy values (for patterns in TEM refer to Appendix A):

The evaluation policy default: The evaluation of the following patterns is immediate, that is, done when the derivation conditions are satisfied: is DETECTED (including the AND and OR implicit patterns), OCCURS BEFORE, OCCURS AFTER, and OCCURS AT THE SAME TIME. The evaluation of the rest of the patterns is deferred.

The cardinality policy default: The single policy. That is, for each context partition there is at most one derivation when the temporal context is restricted (not always), and an unrestricted number of derivations when the temporal context is always. Note that according to this default policy the policy table for the Long call at night derived event can be omitted, as the temporal window is always.

The repeated type policy default: The override policy. If there are multiple events of the same event type, the newest one overrides the previous one.


The consumption policy default: The consume policy. As the cardinality policy default is single, the

consumption policy is relevant only to the case of always temporal context which may have multiple

derivations. In this case, an event in the matching set is consumed, and cannot be used again.

A Policy Table is a two-dimensional representation of logic leading to a policy that needs to be explicitly

specified. Tables 11-14 show the policy tables for our use case, since in these situations the policy

values differ from the default ones implicitly assigned. Note that we explicitly assign values only to the

policies which differ from the default values.
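The "defaults plus explicit overrides" mechanism can be sketched as follows. The function names and the dictionary representation of a policy row are hypothetical, and the sketch simplifies the cardinality default by ignoring the special case of the always temporal context.

```python
from typing import Dict, Optional

# Patterns whose default evaluation policy is immediate, per the defaults above.
IMMEDIATE_PATTERNS = {"DETECTED", "OCCURS BEFORE", "OCCURS AFTER", "OCCURS AT THE SAME TIME"}


def default_policies(pattern: str) -> Dict[str, str]:
    """Default policy values implicitly assumed for a pattern."""
    return {
        "evaluation": "immediate" if pattern in IMMEDIATE_PATTERNS else "deferred",
        "cardinality": "single",
        "repeated": "override",
        "consumption": "consume",
    }


def effective_policies(pattern: str, policy_row: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Non-empty cells of a policy table row override the defaults."""
    policies = default_policies(pattern)
    policies.update({name: value for name, value in (policy_row or {}).items() if value})
    return policies


# Policy row of the Frequent long calls at night derived event (empty cell omitted).
row = {"evaluation": "immediate", "cardinality": "unrestricted", "consumption": "reuse"}
resolved = effective_policies("COUNT", row)
```

A derived event without a policy table simply resolves to `default_policies(...)`, which mirrors the rule that policy tables can be omitted whenever the defaults hold.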

Table 11: Frequent long calls at night policy table

Frequent long calls at night Policy
Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table
1 | immediate | unrestricted | | reuse | 1

Table 12: Frequent long calls policy table

Frequent long calls Policy
Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table
1 | immediate | | | | 1

Table 13: Frequent each long call policy table

Frequent each long call Policy
Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table
1 | immediate | | | | 1

Table 14: Expensive calls policy table

Expensive calls Policy
Row # | Evaluation | Cardinality | Repeated | Consumption | Row in Event derivation Table
1 | immediate | | | | 1

For example, for Frequent long calls at night, we derive a new event every time the pattern assertion (i.e., COUNT(Long call at night) > 2) is satisfied (evaluation policy = immediate, cardinality policy = unrestricted), while each event in the matching set can be used multiple times (consumption policy = reuse). Put simply, as soon as the number of Long call at night input events is larger than 2, we derive a new situation for every further input event until the temporal context closes, and the matching set includes, each time, the new Long call at night input event. For the other situations, we derive only one derived event (cardinality policy = single, as in the default), but as soon as the pattern conditions are satisfied (evaluation policy = immediate).
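The difference between the two policy combinations can be illustrated with a small simulation of one context partition within one temporal window. This is a sketch under simplifying assumptions: events are plain labels, evaluation is immediate, consumption is reuse, and the function name `derive_in_window` is hypothetical.

```python
from typing import List


def derive_in_window(
    events: List[str], threshold: int = 2, cardinality: str = "single"
) -> List[List[str]]:
    """Immediate evaluation of COUNT > threshold for one partition and window.
    Returns the matching set of every derivation, in order."""
    derivations: List[List[str]] = []
    seen: List[str] = []
    for ev in events:
        seen.append(ev)
        if len(seen) > threshold:
            if cardinality == "single" and derivations:
                continue  # at most one derivation per window
            derivations.append(list(seen))  # reuse: earlier events stay in the set
    return derivations


calls = ["c1", "c2", "c3", "c4"]
unrestricted = derive_in_window(calls, cardinality="unrestricted")
single = derive_in_window(calls, cardinality="single")
```

With four input events, the unrestricted variant derives twice (on the third and on the fourth event, each time with a larger matching set), while the single variant derives only once, on the third event.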


5.3.1 Policy Tables Structure

The first row in a policy table indicates the event type name + “Policy”. For example, Table 14 is a policy

table that describes the logic to assign the pattern policies for the Expensive calls derived event.

The second row is the headings row. From the third row on, each row includes the row number and the policy

assignments for the row number in the corresponding EDT, as specified in the last column. In the

Expensive calls EDT there is only one row, therefore the row in the EDT in the policy table shows “1”.

Note that from a table normalization point of view, we could add the policy table columns to the EDT as a

third part in the table (separated by a red vertical line as in the case for context and conditions). The

reason for having a separate logic artifact for the policies is twofold: first, we believe that this simplifies

the tables’ structure. Assuming that the default policy values hold in some of the cases, there is no need

of having columns that remain empty many times. Second, policy fine tuning is sometimes done after

the main logic of the application is defined by the business user, therefore we can leave the design of

the policy tables for a later phase.

While the logic artifacts may be defined first, the glossary concepts eventually need to be completed.

The next Section discusses the glossary concepts.

6 TEM Glossary concepts

As noted, TEM concepts are partitioned into TEM glossary concepts and TEM logic concepts. In this

Section we show the hierarchy of glossary concepts and provide the Glossary tables for our mobile

phone fraud example.

As depicted in Figure 6, there are four concepts in The Event Model Glossary (Glossary for simplicity):

Event, Fact, Actor, and IT element. Each concept has a type. For example, an Event can be either of type

Raw or type Derived. A concept may be further classified into a sub type. For example, a Concrete fact

type can be of sub type Regular. Domain Fact Types serve as abstract fact types to enable segmentation

contexts in the EDTs (e.g., Calling number). A concrete fact type may be mapped to a domain.



Figure 6: Structure of TEM Glossary concepts

(Figure 6 hierarchy: Glossary Concept > Fact Type (Domain; Concrete: Regular, List, Composite, Constant), Actor Type (Static), Event Type (Raw, Derived), IT Element Type (Software: Module, API, App; Hardware: Sensor, Actuator; Data: File, Database, Event, Global Variable).)

The Glossary is composed of the following artifacts, represented as tables and detailed in the subsequent subsections:

Concepts Lexicon Table: one entry for each concept (both glossary concepts and logic concepts).

Fact Types Table: one entry for each concrete Fact Type.

Actors Table: one entry for each combination of Actor-Role-Event. An actor is identified as a noun in a natural language sentence. It can denote a person or a computerized artifact in the modelled domain.

IT Elements Table: one entry for each of the IT Elements. An IT element represents the connection to the physical world of implementation and provides the pointer to the actual IT element it represents.

Figure 7 shows the relationships among the Glossary concepts.

Figure 7: TEM Glossary concepts relationships

(Figure 7 entities: Fact, Event, Actor, Role, IT Element.)

6.1 TEM Concepts Lexicon Table

Each entry in the Concepts Lexicon table describes a concept along with its type, sub type, description, and a reference to its IT element. Table 15 describes the concepts in our use case. For example, CDR is a raw event, whereas Long call at night is a derived event. Calling number is a domain fact type. The CDR event refers to the CDR IT element, which is of sub-type Data. More details on this IT element, such as its URI, can be obtained from the table for IT Elements (Table 18), under the corresponding element name.

Table 15: Lexicon table for the mobile phone fraud use case

Concept name | Concept Type | Concept Sub type | Description | IT Element Reference
CDR | Event | Raw | Call detail records | CDR IT element
Long call at night | Event | Derived | Check for “long” calls (defined as more than 40 min) to premium locations during night hours (limited from 19:00 to 7:00) | Long call at night IT element
Frequent long calls at night | Event | Derived | Same as Long call at night, but we are looking for at least 3 calls made to premium locations during night hours lasting longer than “40 minutes” per calling number | Frequent long calls at night IT element
Frequent long calls | Event | Derived | A situation resulting from at least 10 calls made to a premium location summing up to at least 60 min in length in a day | Frequent long calls IT element
Frequent each long call | Event | Derived | A situation resulting from at least 10 long calls (lasting at least 60 min each) made to a premium location in a day | Frequent each long call IT element
Expensive calls | Event | Derived | A situation in which calls dialed to premium locations sum up to more than a pre-defined cost (e.g. 100 HRK) per calling number | Expensive calls IT element
object_id | Fact | Concrete | Sequential number as maintained by internal system |
billed_msisdn | Fact | Concrete | The msisdn to be charged. MSISDN is a number uniquely identifying a subscription in a GSM or a UMTS mobile network. Simply put, it is the telephone number of the SIM card in a mobile/cellular phone. |
call_start_date | Fact | Concrete | Date and time for the call |
calling_number | Fact | Concrete | The msisdn calling |
called_number | Fact | Concrete | The msisdn that is called |
other_party_tel_number | Fact | Concrete | Telephone number called |
call_direction | Fact | Concrete | "1" for Outgoing and "0" for Inbound |
tap_related | Fact | Concrete | Roaming data |
conversation_duration | Fact | Concrete | Length of call |
total_call_charge_amount | Fact | Concrete | Charge amount for the call |
call_start_dates | Fact | Concrete | The call start dates of the events in the matching set |
calls_count | Fact | Concrete | Number of events in the matching set |
calls_cost_sum | Fact | Concrete | Total cost of calls of events in the matching set |
calls_length_sum | Fact | Concrete | Total duration of calls in the matching set |
Calling number | Fact | Domain | Universal name for calling number |
CDR system | Actor | Static | System that produces the CDR raw events | CDR system IT element
Operator | Actor | Static | Manual inspection of potential fraud mobile numbers | Operator IT element
CDR IT element | IT Element | Data | Reference to the producer of the CDR raw events |
Long call at night IT element | IT Element | Data | Reference to the consumer of the alert |
Frequent long calls at night IT element | IT Element | Data | Reference to the consumer of the alert |
Frequent long calls IT element | IT Element | | |
Frequent each long call IT element | IT Element | | |
Expensive calls IT element | IT Element | Data | Reference to the consumer of the alert |
CDR system IT element | IT Element | Data | Reference to the consumer of the alert |
Operator IT element | IT Element | Software | Reference to event driven application |


6.2 TEM Fact Types Table

Each entry in the Fact Type table describes a given concrete fact type. As shown in Figure 6, a fact type

can be either a domain or a concrete fact type. Domain designates an equivalence class of all concrete

fact types that reference it. For example Calling number is a domain fact type and there are concrete

fact types that refer to Calling number and are associated with various events or actors. A concrete fact

type can be either contained in an event or contained in an actor. A concrete fact type is further sub

typed into a regular (atomic), list (multiple homogenous instances), composite (multiple heterogeneous

instances, contains a collection of lower level fact types), or constant (a literal that is a substitute for a

constant value). Table 16 describes the fact type table in our scenario. Fact type calling_number is a

regular fact type that is contained in CDR and it also refers to the Calling number domain fact type.

Table 16: Fact type table for the mobile phone fraud use case

Fact type name | Fact type Sub type | Contained in event type/actor | Data type | Domain fact type | Default value | Units
object_id | Regular | CDR | String | | |
billed_msisdn | Regular | CDR | String | | |
call_start_date | Regular | CDR, Long call at night | Date | | |
calling_number | Regular | CDR, Long call at night, Frequent long calls at night, Frequent long calls, Frequent each long call, Expensive calls | String | Calling number | |
called_number | Regular | CDR, Long call at night | String | | |
other_party_tel_number | Regular | CDR, Long call at night | String | | |
call_direction | Regular | CDR, Long call at night, Frequent long calls at night | String | | |
tap_related | Regular | CDR | String | | |
conversation_duration | Regular | CDR, Long call at night | Integer | | | minutes
total_call_charge_amount | Regular | CDR | Double | | | HRK (Croatian Kuna)
call_start_dates | List | Frequent long calls at night, Frequent long calls, Frequent each long call, Expensive calls | Date | | |
calls_count | Regular | Frequent long calls at night, Frequent long calls, Frequent each long call | Integer | | |
calls_cost_sum | Regular | Expensive calls | Double | | | HRK (Croatian Kuna)
calls_length_sum | Regular | Frequent long calls | Long | | |

6.3 TEM Actors Table

An Actor is described through its roles in the model. Actor roles can be one of these types: producer, an actor that emits events; consumer, an actor that consumes situations; actuator, similar to consumer, but with associated actions; event subject, an actor that the event is about; event descriptor, an actor that one of the fact types associated with the event is about; data provider, an actor that provides fact values; and data receiver, an actor that receives derived facts. An actor may have multiple roles; each role may have multiple events in the same role. Table 17 shows the actors of the mobile phone fraud scenario with their roles and respective events. For example, the Operator actor consumes five events: Long call at night, Frequent long calls at night, Frequent long calls, Frequent each long call, and Expensive calls; therefore, it has five distinct entries in the table.


Table 17: Actors table for the mobile phone fraud use case

Actor Name | Role | Event Type
CDR System | Producer | CDR
Operator | Consumer | Long call at night
Operator | Consumer | Frequent long calls at night
Operator | Consumer | Frequent long calls
Operator | Consumer | Frequent each long call
Operator | Consumer | Expensive calls

6.4 IT Elements Table

An IT element represents the connection to the physical world of implementation and provides the pointer to the actual IT element it represents. Each IT element is referred to by another concept, such as actor and event in the concept lexicon table. The IT elements table defines the sub type of the element and the physical URI to obtain its value. IT elements sub types are depicted in Figure 6. Table 18 contains the IT elements of the mobile phone fraud scenario. It can be seen that there are entries for IT elements that are events, file, and app, and their corresponding URIs.

Table 18: IT elements table for the mobile phone fraud use case

IT Element Name | IT Element Sub type | URI
CDR IT element | Event | CDR file URI
Long call at night IT element | Event | PROTON's dashboard URI
Frequent long calls at night IT element | Event | PROTON's dashboard URI
Frequent long calls IT element | Event | PROTON's dashboard URI
Frequent each long call IT element | Event | PROTON's dashboard URI
Expensive calls IT element | Event | PROTON's dashboard URI
CDR system IT element | File | CDR file URI
Operator IT element | App | PROTON's dashboard URI

6.5 TEM Events View Table

An event schema is an example of a useful view that can be obtained from TEM Glossary tables. It is inferred from the references to event types in the Fact type table. The events view includes the event name and its associated fact types. Table 19 includes a view of the mobile phone fraud scenario’s raw and derived events.
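Inferring the events view amounts to inverting the "Contained in event type/actor" column of the Fact type table. The sketch below shows this on three rows of Table 16; the function name `events_view` and the dictionary representation are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Three rows of the fact type table: fact type -> event types that contain it.
FACT_TYPES: Dict[str, List[str]] = {
    "calling_number": [
        "CDR", "Long call at night", "Frequent long calls at night",
        "Frequent long calls", "Frequent each long call", "Expensive calls",
    ],
    "call_start_dates": [
        "Frequent long calls at night", "Frequent long calls",
        "Frequent each long call", "Expensive calls",
    ],
    "calls_cost_sum": ["Expensive calls"],
}


def events_view(fact_types: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: event type -> fact types associated with it."""
    view: Dict[str, List[str]] = defaultdict(list)
    for fact_type, event_types in fact_types.items():
        for event_type in event_types:
            view[event_type].append(fact_type)
    return dict(view)


view = events_view(FACT_TYPES)
```

For these three rows, the entry for Expensive calls comes out as calling_number, call_start_dates, and calls_cost_sum, which is the schema the events view lists for that derived event.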


Table 19: Events view for the mobile phone fraud use case

7 TEM Methodology

A TEM model describes an application, or a collection of related applications, that is event-driven by nature. As previously described, the model is represented as a collection of concepts whose definitions also contain relations to other concepts. The methodology is goal-driven (situations are the goals), so at design time the model may be incomplete in the sense that a concept may be referred to before it is fully defined; to be valid, however, the model must eventually be complete. A TEM model is functionally complete when its content is sufficient to generate a functional execution of the model.
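The notion of a concept being referred to before it is defined can be sketched as a simple check. The dictionary representation of a model and the function name below are assumptions for illustration only.

```python
def undefined_concepts(definitions):
    """Return the concepts that are referenced but not (yet) defined.

    definitions: dict mapping each defined concept name to the set of
    concept names its definition refers to (an assumed, simplified view
    of a TEM model).
    """
    referenced = set()
    for refs in definitions.values():
        referenced |= set(refs)
    return sorted(referenced - set(definitions))


# Design-time snapshot: "Operator" is referred to but not defined yet.
model = {
    "Expensive calls": {"CDR", "Operator"},
    "CDR": {"CDR system"},
    "CDR system": set(),
}
missing = undefined_concepts(model)
```

In this sketch the model counts as complete once `undefined_concepts` returns an empty list.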

This section discusses the lifecycle of an event-driven application developed with TEM. The lifecycle

starts with the construction of the computation independent model (CIM) and continues to its

transformation into the platform independent model (PIM) and finally to the platform specific model

(PSM). We also discuss the process of modifying an existing model.

| Event name | Associated fact types |
|---|---|
| CDR | object_id, billed_msisdn, call_start_date, calling_number, called_number, other_party_tel_number, call_direction, tap_related, conversation_duration, total_call_charge_amount |
| Long call at night | calling_number, conversation_duration, other_party_tel_number, called_number, call_direction |
| Frequent long calls at night | calling_number, call_direction, call_start_dates, calls_count |
| Frequent long calls | calling_number, call_start_dates, calls_length_sum, calls_count |
| Frequent each long call | calling_number, call_start_dates, calls_count |
| Expensive calls | calling_number, call_start_dates, calls_cost_sum |




7.1 Lifecycle Overview and Methodology

The top-down TEM methodology makes the design goal-oriented and turns the fetching of events and data into a requirement. Goal-oriented design has been used in other areas such as database design [6] and agent systems design [5]. A lifecycle that supports this type of design consists of the following phases:

1. Construct the computational independent model (CIM)

2. Transform the CIM to the platform independent model (PIM)

3. Generate the code and create the platform specific model (PSM).

4. Operate the application and support modifications.

Next we discuss each of these phases.

7.2 Construct the Computational Independent Model

The previous sections provided an example of how to construct a computational independent model. The methodology roughly follows the Zachman top-down framework [8] and consists of the following phases:

1. Identify the goals in terms of situations that need to be derived from the application and identify a

consumer for each situation (the “WHAT” phase).

2. For each such situation, construct a diagram that drills down to what is needed to be known or

detected in order to derive this situation (the high level “HOW” phase).

3. For each node in the diagram, construct a corresponding EDT and optionally computation and policy

tables that specify the logic for the node. This step is done bottom-up starting from the leaves of the

diagram and finishing with the situations to be detected.

4. For each event or fact type that is referred to in the logic artifacts, locate its origin or create a requirement to fetch or instrument it. If this is not feasible, refine the requirements.

5. Complete the glossary.

6. Validate the model against TEM Principles.

7. Test against test cases (business people can do steps 6 and 7 if the software supports them).

Phases 1-3 can be performed by business analysts or business specialists who have moderate or no programming skills. Phase 4 can be done entirely by business analysts or specialists if the organization has accessible meta-data repositories and all the required items are available. Alternatively, it can serve as a starting point for the handshake between business and IT, if there is a requirement to disambiguate terms or to create instrumentation to detect events. Phase 5 completes the process. At the end of Phase 5, the CIM is complete and ready for validation.

Phase 6 deals with the validation of a CIM model. We are currently studying CSP techniques (see, for example, [3]) as a possible approach to performing this validation.

A constraint satisfaction problem (CSP), P = <V, D, C>, involves a set of variables, V = {v_1, ..., v_n}, which take discrete values from their corresponding finite domains, D = {D_v1, ..., D_vn}, and a set of constraints, C = {C_1, ..., C_m}. All sets are finite. A constraint is an entity that restricts the values of the variables it involves. A solution to the CSP is an assignment of a value to each variable such that the value belongs to the variable's domain and all the constraints are satisfied. One particular use of CSP techniques is to verify that a given assignment satisfies all the constraints. The CSP engine will tell us whether the model is valid or, if not, which constraints are violated.
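The verification use of a CSP described above can be sketched as follows. This is a minimal illustration, not a CSP engine and not the validation approach of the project; the variable names and the two toy constraints are hypothetical encodings chosen for the example.

```python
def violated_constraints(assignment, domains, constraints):
    """Return the names of the constraints that a full assignment violates.

    assignment:  dict variable -> value
    domains:     dict variable -> finite set of allowed values
    constraints: dict constraint name -> predicate over the assignment
    """
    violated = []
    # A value outside its domain is reported as a violation of that domain.
    for var, domain in domains.items():
        if assignment.get(var) not in domain:
            violated.append(f"domain({var})")
    for name, predicate in constraints.items():
        if not predicate(assignment):
            violated.append(name)
    return violated


# Toy encoding (hypothetical): two counters describing a model.
domains = {
    "edts_per_derived_event": {0, 1, 2},
    "consumers_per_situation": {0, 1, 2},
}
constraints = {
    "exactly one EDT per derived event": lambda a: a["edts_per_derived_event"] == 1,
    "each situation has a consumer": lambda a: a["consumers_per_situation"] >= 1,
}

ok = violated_constraints(
    {"edts_per_derived_event": 1, "consumers_per_situation": 1}, domains, constraints
)
bad = violated_constraints(
    {"edts_per_derived_event": 2, "consumers_per_situation": 0}, domains, constraints
)
```

An empty result means the assignment is a solution; otherwise the list names the violated constraints, which is the kind of feedback the validation phase needs.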

The validation phase may trigger iterative refinement of the previous phases. At the end of this phase a valid CIM model is ready for the transformation.

In our illustrative mobile phone fraud use case, the following steps were carried out:

1. Goals identification – The five situations for mobile phone fraud were identified – Long call at night,

Frequent long calls at night, Frequent long calls, Frequent each long call, and Expensive calls.

2. Construct the TEM diagrams – The corresponding TEM diagram (Figure 4) was articulated in a top-

down approach, starting from the goals stated in the previous step, all the way to the raw events

and the event processing application consumers.

3. Specify the EDTs and computation tables – The corresponding EDTs (Table 1 to Table 5) were

articulated along with their Computation tables (Table 7 to Table 10).

4. Build the Glossary – the corresponding Lexicon table (Table 15), Fact types table (Table 16), Actors

table (Table 17), and IT Elements table (Table 18) are specified.

7.3 Transform to the Platform Independent Model

The platform independent model (PIM) is a generic representation of an event processing application.

The CIM might omit some details that can be implicitly inferred or specified by IT people at a later phase.

Examples of omitted details are: assignment of fact types associated with derived events whose values

are copied, and the physical realization of data elements and the way they are fetched (part of original

event, enrichment of events, or query of data stores). The implementation details are beyond the scope

of this report.

We adopted the approach of transforming the CIM to a PIM, rather than transforming it directly to a PSM, since the aim of TEM is to be generic and fit multiple implementations. For the PIM, we use the model described in [4], which is based on the notions of event processing network and event processing agents. It is a comprehensive model that can be mapped to many specific event processing languages. In the scope of Task 4.2, “Generation of an annotated event processing network”, to be carried out during year 3 of the project, we will generate an EPN out of the TEM model as described in this document.

7.4 Generate the code and create the Platform Specific Model

This phase is a mapping between the PIM and the PSM. Assuming that all missing details are obtained at the PIM level, this is a mere functional transformation. The transformation can be done either with an existing event processing language (like that of PROTON; see footnote 2) or by using a compiler to generate code in a specific language, e.g. Java. As in the previous step, the code generation will be part of our work during the third year of the project.

Note that the transformation is based on the functional specification. In some cases, further refinements and optimizations are required based on non-functional requirements. The modeling of non-functional requirements is described in Section 8.

7.5 Operate the application and support modifications

One of the main advantages of applying TEM lies in how changes to existing models are handled. Changes are made at the CIM level and automatically propagated to the lower levels according to our methodology. This simplifies the process since:

• The new logic is validated along with the entire application, thus avoiding potential inconsistencies that can result from including new logic.

• Since there exists only one EDT per derived event, the modification is made in one place only, that is, in the EDT and corresponding Computation table it affects and in the relevant entries in the Glossary tables.

To illustrate this, let’s assume that Expensive calls are derived either as before, i.e., when calls dialed to premium locations sum up to more than a pre-defined cost (e.g. 100 HRK) per calling number every six hours, or when dialed calls sum up to more than a pre-defined cost (e.g. 100 HRK) every two hours, where each call must cost more than 40 HRK. Note that in the second case the temporal sliding window is two hours long and the derivation includes calls to any location, not just to premium ones as in the first case.
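The two derivation conditions can be sketched as predicates over the CDRs of one calling number that fall in one temporal window. This is an illustrative reading only: the windowing itself is left to the caller, the premium number list is hypothetical, `call_direction == 1` is assumed to mark a dialed call, and the "more than 40 HRK per call" requirement is assumed to act as a filter on events.

```python
PREMIUM_NUMBERS = {"0601234567"}  # hypothetical premium-service list


def row1_holds(cdrs):
    """Dialed calls to premium numbers sum to more than 100 HRK.

    cdrs: the CDRs of one calling number within one six-hour window.
    """
    charges = [
        c["total_call_charge_amount"]
        for c in cdrs
        if c["call_direction"] == 1 and c["called_number"] in PREMIUM_NUMBERS
    ]
    return sum(charges) > 100


def row2_holds(cdrs):
    """Dialed calls costing more than 40 HRK each sum to more than 100 HRK.

    cdrs: the CDRs of one calling number within one two-hour window.
    """
    charges = [
        c["total_call_charge_amount"]
        for c in cdrs
        if c["call_direction"] == 1 and c["total_call_charge_amount"] > 40
    ]
    return sum(charges) > 100


window = [
    {"called_number": "0911111111", "call_direction": 1, "total_call_charge_amount": 60},
    {"called_number": "0922222222", "call_direction": 1, "total_call_charge_amount": 55},
    {"called_number": "0933333333", "call_direction": 1, "total_call_charge_amount": 10},
]
first_rule = row1_holds(window)   # no premium numbers were dialed
second_rule = row2_holds(window)  # 60 + 55 exceeds 100
```

Under these assumptions, an Expensive calls event would be derived for a window when either predicate holds.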

The TEM diagram will have a new entry for the Expensive calls derived event as shown in Figure 8.

Footnote 2: IBM PROactive Technology Online (PROTON) is the open source CEP engine applied in the FERARI project. The engine has been detailed in D4.1. The source code and accompanying documentation can be found at: https://github.com/ishkin/Proton/



Figure 8: New TEM diagram for the mobile phone fraud use case

As any derived event in TEM is specified in exactly one EDT, we add a new row to the Expensive calls EDT

to specify this new requirement. The new EDT is shown in Table 20. The second row expresses the new

conditions for derivation of instances of Expensive calls.

Table 20: New Expensive calls logic EDT

As no other artifacts need to be modified (we are not adding new fact types, actors, or IT elements), no further changes to the model are required. The new model can be validated, the new EPN can be generated, and the result can be converted to a running application automatically, without any programmatic intervention.

[Figure 8 diagram labels: CDR system, CDR, Long call at night, Frequent long calls, Expensive calls, Operator, two hours]

[Table 20, recoverable cells]
Column headers: Row #; When Expression; When Start; When End; Partition by; Pattern; Filter on pattern; Filter on event
Sub-headers: …mber <CDR>; call_direction <CDR>; …_amount<CDR>)
Partition by: Calling number
Row 1: every 6 hours; first CDR; member of premium services; = 1; > 100
Row 2: every 2 hours; first CDR; = 1; > 100; same; same



8 Extending TEM to non-functional requirements

The design of event processing applications consists of the design of the functional properties as well as the non-functional properties. Usually, the design of both functional and non-functional requirements is implementation specific. It is done either with current dedicated event processing tools, by skilled IT developers who are familiar with the event processing engine and with the particular ways to bypass its limitations, or in hand-coded fashion. In both cases the design is rather complex and is not accessible to business users. Our goal is to extend TEM to cope with non-functional requirements as well, and to make them accessible to non-technical people.

In this section we show how business users can annotate the TEM diagram with throughput and latency requirements, thus making them an integral part of any TEM application. The idea is that the non-functional requirements are treated as “functional requirements” and become part of the PIM and PSM models. Each performance violation is interpreted as an alert, that is, as a situation to be consumed by an external actor. During the third year of the project we will extend the non-functional requirements to include requirements related to scalability, such as number of nodes, latency between nodes, and communication cost between sites, making TEM also suitable for distributed environments such as FERARI’s prototypes.

8.1 Extending the TEM diagrams to cope with performance requirements

For latency requirements we add a (blue) rectangular callout to the relevant events in the TEM diagram. A latency constraint is measured from the latest occurrence of the raw events that are input in the path to the derived event until the derived event is detected. For example, in the mobile phone use case we can require that the latency to the Expensive calls situation is at most 5 min, and add a corresponding callout to the TEM diagram as depicted in Figure 9. In this case we require that no more than 5 minutes elapse from the last occurrence time of each CDR raw event until any Expensive calls event is detected. In case of a violation, an alert is emitted to the Operator.

For throughput we add a (red) rounded callout to each event to which we want to attach a throughput constraint. While latency is very important in any mobile phone fraud detection system, throughput is determined by the rate of the incoming CDRs and is less relevant in our case. For the sake of illustrating our approach, let’s assume that we have a throughput constraint of at least 100 CDR raw events per min, as depicted in Figure 9.



Figure 9: TEM diagram annotated with non-functional requirements

8.2 Event Derivation Tables for non-functional requirements

Throughput and latency requirements are automatically transformed to EDTs by the TEM compiler. There is no need to define them explicitly, as the annotations in the TEM diagram, together with its corresponding EDTs, include all the information required. Each throughput or latency constraint is implicitly specified in a single EDT. The name of the EDT is composed of <Event_name><_latency or _throughput><Violation>. The following alerts will be generated for our example: Expensive calls Latency Violation and CDR Throughput Violation.

For each throughput constraint a new EDT is created by applying the COUNT pattern, with the opposite condition, to the constrained event, along with a temporal window equal to the time unit of the throughput rate, as shown in Table 21. As we require a throughput of at least 100 events per min, the COUNT condition is <100, since we want to alert only in cases of violation. As the time unit of the rate is 1 min, the temporal window is defined as “for every min”. Furthermore, as we only want to alert once, at the end of the temporal window, we apply the default policies of deferred and single (see Section 5.3), and there is no need to specify a policy table for throughput EDTs.
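The derivation of a throughput EDT from a diagram annotation can be sketched as below. This is not the TEM compiler; the function name, the dictionary layout of the resulting EDT, and the `OPPOSITE` table are assumptions made for illustration.

```python
# Opposite comparison for each constraint operator: the EDT fires on violations.
OPPOSITE = {">=": "<", ">": "<=", "<=": ">", "<": ">="}


def throughput_violation_edt(event_name, op, rate, time_unit):
    """Build a simplified throughput-violation EDT from a constraint.

    Example constraint: throughput of CDR >= 100 events per min.
    """
    return {
        "name": f"{event_name} Throughput Violation",
        "when_expression": f"for every {time_unit}",  # non-overlapping window
        "pattern": f"COUNT({event_name})",
        "pattern_condition": f"{OPPOSITE[op]} {rate}",
    }


edt = throughput_violation_edt("CDR", ">=", 100, "min")
```

For the example constraint, the sketch yields the name, temporal window, and COUNT condition described in the paragraph above.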

[Figure 9 diagram labels: CDR system, CDR, Long call at night, Frequent long calls, Expensive calls, Operator; callouts: Latency <=5 min, Throughput >=100 events/min]



Table 21: CDR Throughput Violation EDT

The TEM diagram is traversed bottom-up, and for each latency constraint a new EDT is created as follows:

• We add a new attribute to the derived event metadata, denoted occurrence_time<latest raw event> or OT_LRE (in our example, occurrence_time<CDR>). This value stores the occurrence time of the raw event that is part of the derivation in the first node (EDT) in the path to the derived event with the latency constraint, and it is passed from node/derived event to node/derived event along the path.

• If the pattern condition is ABSENCE, then the OT_LRE equals the OT_LRE of the previous derived event (node in the TEM diagram path). If this is the first node in the path (connected to a producer in the TEM diagram), then OT_LRE = detection_time<derived event with the ABSENCE pattern condition>.

• If there is more than one input event to a node (EDT) in the path, then for each row in the EDT the OT_LRE is the max{OT_LRE} among these input events.

The OT_LRE of the derived event for that node (EDT) is the one that caused the derivation (as the rows

Table 22 shows the Expensive calls Latency Violation EDT. Since the latency constraint is ≤ 5 min, the filter on event condition is >5, as we are again interested in reporting only violations. The default policy value (unrestricted) applies, so there is no need to define the corresponding policy table.
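The OT_LRE rules and the resulting latency check can be sketched as follows. Times are plain numbers in minutes, and the function names and argument layout are assumptions for illustration; they do not reproduce the TEM compiler.

```python
def ot_lre(input_ot_lres, detection_time, is_absence=False):
    """OT_LRE of a derived event.

    input_ot_lres:  OT_LRE values of the input events that took part in the
                    derivation (for a raw event, its occurrence time).
    detection_time: detection time of the derived event.
    is_absence:     True when the pattern condition is ABSENCE.
    """
    if is_absence and not input_ot_lres:
        # First node in the path with an ABSENCE pattern: no event occurred,
        # so the detection time of the derived event is used.
        return detection_time
    # One input: its OT_LRE is passed along; several inputs: take the latest.
    return max(input_ot_lres)


def latency_violated(detection_time, ot_lre_value, limit_minutes):
    """True when the derived event was detected later than allowed."""
    return detection_time - ot_lre_value > limit_minutes


# CDRs occurring at minutes 0, 3 and 7 lead to an Expensive calls event.
lre = ot_lre([0, 3, 7], detection_time=13.5)
late = latency_violated(13.5, lre, 5)     # 6.5 minutes after the last CDR
on_time = latency_violated(10.0, lre, 5)  # 3 minutes after the last CDR
```

The comparison in `latency_violated` mirrors the condition `detection_time<Expensive calls> - OT_LRE > 5` of the latency violation EDT.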

Table 22: Expensive calls Latency Violation

CDR Throughput Violation

| Row # | When: Expression | When: Start | When: End | COUNT(CDR) |
|---|---|---|---|---|
| 1 | for every min | | | < 100 |

Expensive calls Latency Violation

| Row # | When: Expression | When: Start | When: End | detection_time<Expensive calls> - OT_LRE |
|---|---|---|---|---|
| 1 | always | | | > 5 |

8.3 Extending the TEM methodology to include performance requirements

As we apply only pre-defined TEM building blocks, the modifications required to the methodology steps (Section 7.1) are minimal and obvious. All we need is to add the following sentence “Add latency and throughput requirements to events in the diagram” to step 2: “For each such situation, construct a diagram that drills down to what is needed to be known or detected in order to derive this situation (the high level “HOW” phase)”.

9 Summary and future steps

Our goal in FERARI is to bring event processing much closer to the business world by extending simple stream processing to the much more powerful realm of complex event processing in a way that is both consumable by business users and a seamless part of Big Data applications. Our approach in WP4 is to provide a model to construct event processing applications by using a goal-driven declarative approach to define the requirements for event processing applications, and to generate complete, implementable designs out of these requirements.

This report presents The Event Model (TEM) as a means to design, develop, implement, and maintain event-driven applications. The friendly, yet rigorous, representation of the event logic makes the model simpler than existing models and accessible to people lacking IT skills. The vision is to strive for automatic transformation in line with model-driven engineering; this contrasts with the current state of the practice, in which the transformations between the three levels of models are mostly done manually.

TEM is suitable for business users as it supports a top-down, goal-oriented approach that uses the TEM diagrams as a starting point. Furthermore, TEM doesn’t use technical terms. Technical details can be left for a later phase and be defined just before the application is translated into code. Spreadsheet-like tables for specifying application requirements have already proven successful in the domain of business rules through The Decision Model (TDM) [7]. Our vision is to reach the same level of success in the world of complex event processing by applying a similar approach.

Our methodology supports the model driven engineering approach and encompasses the phases of

constructing the CIM model and its translation into PIM and from it into a PSM model. In this report we

detailed the construction of the CIM model and exemplified it with the mobile phone use case we have

in the project.

During year three of the project we will concentrate on the next phase, that is, the translation of the CIM into a PIM, in our case an EPN that can be easily consumed by the PROTON CEP engine.

In order to address the Big Data requirements of FERARI, we also plan to extend the model to include:

• Non-functional requirements suitable for distributed systems, such as number of nodes, communication costs among sites, and latency between nodes. These requirements will serve the CEP optimizer developed by the Technical University of Crete.

• Uncertainty in input events



10 References

[1]. Bodenstein C., Lohse F., and Zimmermann A. 2010. Executable specifications for model-based

development of automotive software. SMC 2010, 727-732.

[2]. Brambilla M., Cabot J., and Wimmer M. 2012. Model Driven Software Engineering in Practice.

Morgan & Claypool.

[3]. Dechter R. 2003. Constraint Processing. Elsevier.

[4]. Etzion O. and Niblett P. 2010. Event Processing in Action. Manning Publications Company.

[5]. Khallouf J. and Winikoff M. 2009. The goal-oriented design of agent systems: a refinement of

Prometheus and its evaluation. IJAOSE 3(1), 88-112.

[6]. Jiang L., Topaloglou T., Borgida A., and Mylopoulos J. 2007. Goal-Oriented Conceptual

Database Design. RE 2.

[7]. Von Halle B. and Goldberg L. 2010. The Decision Model. CRC Press.

[8]. Zachman J.A. 1999. A Framework for Information Systems Architecture. IBM Systems Journal

(IBMSJ) 38(2/3), 454-470.



11 Appendix A – TEM Syntax

The semantics of temporal context is defined in the following table; the first column designates the temporal context expression (keyword) while the second column describes (informally) the semantics.

| Keyword | Details |
|---|---|
| Always | No temporal restriction on the context |
| Hourly, daily, weekly, monthly; For every N time units (seconds/minutes/hours…) | Expression for non-overlapping sliding time windows |
| For every N occurrences of event type | Expression for non-overlapping sliding event windows |
| Every N time units (seconds/minutes/hours…) | Expression for overlapping sliding time windows |
| Every N occurrences of event type | Expression for overlapping sliding event windows |
| Start | Start of event interval, this may consist of: time constant, single event type (possibly with condition), collection of event types, time stamp, (for) every N time units, (for) every N occurrences of event type |
| End | End of event interval, this may consist of: time constant, single event type (possibly with condition), collection of event types, time stamp, time offset (+ N time units) |
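The difference between the non-overlapping ("For every N occurrences") and overlapping ("Every N occurrences") event windows can be sketched as below. The sketch assumes that overlapping windows advance by one event and that incomplete trailing windows are dropped; the table above does not fix either choice.

```python
def non_overlapping_windows(events, n):
    """'For every N occurrences': consecutive, disjoint groups of n events."""
    return [events[i:i + n] for i in range(0, len(events) - n + 1, n)]


def overlapping_windows(events, n):
    """'Every N occurrences': groups of n events, advancing one event at a time
    (the step of one event is an assumption of this sketch)."""
    return [events[i:i + n] for i in range(len(events) - n + 1)]


events = ["e1", "e2", "e3", "e4", "e5"]
disjoint = non_overlapping_windows(events, 2)
sliding = overlapping_windows(events, 2)
```

With five events and N = 2, `disjoint` holds two windows and `sliding` holds four, since each event except the first and last belongs to two overlapping windows.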



As explained in Section 5.1.2, the conditions part in any EDT consists of three types of conditions: filter conditions, pattern conditions, and filter on pattern conditions.

For the filter conditions part we describe the possible semantics below. Note that either symbols or words can be used in a given cell, and the two styles can be mixed within one table or one model. Sometimes values can be replaced by expressions (e.g., a mathematical function) or keywords (e.g., today).

| Column head | Predicate | Object | Meaning |
|---|---|---|---|
| fact type | is or = | value | A fact associated with an input event is equal to a given value |
| fact type | is not or ≠ | value | A fact associated with an input event is not equal to a given value |
| fact type | > or greater than | value | A fact associated with an input event is greater than a given value |
| fact type | < or less than | value | A fact associated with an input event is smaller than a given value |
| fact type | ≥ or greater or equal than | value | A fact associated with an input event is greater than or equal to a given value |
| fact type | ≤ or less or equal than | value | A fact associated with an input event is smaller than or equal to a given value |
| fact type of date data type | occurs earlier than, occurs later than, occurs no earlier than, occurs no later than | value | A fact associated with an input event occurs before, after, not before, or not after a given time stamp |
| fact type | is member of | list value | A fact associated with an input event is a member of a given list value |
| fact type | is not member of | list value | A fact associated with an input event is not a member of a given list value |
| Event type | | | An input event type equals the Event type. This is meaningful when the cardinality of the input events set is larger than 1 |
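One way to read the filter predicates is as a lookup from the symbol or word in a cell to a comparison function. The sketch below is an illustration under that reading; the dictionary representation of an event and the name `passes_filter` are assumptions, and the date predicates are omitted.

```python
import operator

# Symbol and word forms of each predicate map to the same comparison.
FILTER_PREDICATES = {
    "=": operator.eq, "is": operator.eq,
    "≠": operator.ne, "is not": operator.ne,
    ">": operator.gt, "greater than": operator.gt,
    "<": operator.lt, "less than": operator.lt,
    "≥": operator.ge, "greater or equal than": operator.ge,
    "≤": operator.le, "less or equal than": operator.le,
    "is member of": lambda fact, values: fact in values,
    "is not member of": lambda fact, values: fact not in values,
}


def passes_filter(event, fact_type, predicate, obj):
    """Evaluate one filter cell against an input event (a plain dict here)."""
    return FILTER_PREDICATES[predicate](event[fact_type], obj)


cdr = {"call_direction": 1, "conversation_duration": 45}
dialed = passes_filter(cdr, "call_direction", "=", 1)
long_call = passes_filter(cdr, "conversation_duration", "greater than", 60)
known_direction = passes_filter(cdr, "call_direction", "is member of", {1, 2})
```

Keeping both spellings in one table reflects the note that symbols and words are interchangeable within a model.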



For the pattern conditions part we describe below the possible semantics. Sometimes values can be

replaced by expressions.

| Column head | Predicate | Object | Meaning |
|---|---|---|---|
| Event type | is DETECTED | | An event associated with the given event type explicitly triggers this event driven logic |
| Event type | is ABSENT | | No event associated with the given event type is detected within the given context |
| fact type | is DECREASING | | A fact value in the participant events is always equal to or smaller than its predecessor in the event stream |
| fact type | is INCREASING | | A fact value in the participant events is always equal to or bigger than its predecessor in the event stream |
| fact type | is STRICTLY DECREASING | | A fact value in the participant events is always smaller than its predecessor in the event stream |
| fact type | is STRICTLY INCREASING | | A fact value in the participant events is always bigger than its predecessor in the event stream |
| fact type | is STABLE | | A fact value in the participant events is always equal to its predecessor in the event stream |
| Event type | OCCURS BEFORE, OCCURS AFTER, OCCURS AT THE SAME TIME AS | Event type | Binary SEQUENCE pattern between two events |
| COUNT(Event type) | >, ≥, =, <, ≤ | Value | Counts the number of participant instances of this event type and determines if it satisfies the threshold condition in the predicate |
| SUM(fact type) | >, ≥, =, <, ≤ | Value | Sums all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate |
| AVG(fact type) | >, ≥, =, <, ≤ | Value | Computes the average of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate |
| MIN(fact type) | >, ≥, =, <, ≤ | Value | Selects the minimal value of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate |
| MAX(fact type) | >, ≥, =, <, ≤ | Value | Selects the maximal value of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate |
| MEDIAN(fact type) | >, ≥, =, <, ≤ | Value | Computes the median of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate |
| STD(fact type) | >, ≥, =, <, ≤ | Value | Computes the standard deviation of all values of this fact type in the participant events and determines if it satisfies the threshold condition in the predicate |
| FOR ALL(fact type) | >, ≥, =, <, ≤ | Value | Assertion must be true for all input events |
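A few of the pattern conditions above can be sketched over the list of fact values taken from the participant events, in stream order. The function names and the two lookup tables are assumptions for illustration; MEDIAN, STD, and the SEQUENCE patterns are left out to keep the sketch short.

```python
import operator
from statistics import mean


def is_increasing(values):
    """Each value is equal to or bigger than its predecessor."""
    return all(b >= a for a, b in zip(values, values[1:]))


def is_strictly_increasing(values):
    """Each value is bigger than its predecessor."""
    return all(b > a for a, b in zip(values, values[1:]))


def is_stable(values):
    """Each value is equal to its predecessor."""
    return all(b == a for a, b in zip(values, values[1:]))


AGGREGATES = {"COUNT": len, "SUM": sum, "AVG": mean, "MIN": min, "MAX": max}
COMPARATORS = {
    ">": operator.gt, "≥": operator.ge, "=": operator.eq,
    "<": operator.lt, "≤": operator.le,
}


def aggregate_holds(name, values, comparator, threshold):
    """Apply an aggregate to the values and test it against the threshold."""
    return COMPARATORS[comparator](AGGREGATES[name](values), threshold)


durations = [10, 20, 20, 35]
non_decreasing = is_increasing(durations)
strictly = is_strictly_increasing(durations)  # 20 repeats, so not strict
sum_over_80 = aggregate_holds("SUM", durations, ">", 80)
count_below_100 = aggregate_holds("COUNT", durations, "<", 100)
```

The DECREASING variants follow by reversing the comparisons.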

The filter on pattern conditions is similar to the filter on event conditions, except that the events are

associated with the matching set and not the input events.
