
Eindhoven University of Technology

MASTER

Business process analysis with semantic dotted chart: extracting X-Ray usage information from log files using semantic process mining

Bozkaya, M.

Award date: 2011


Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain

https://research.tue.nl/en/studentthesis/business-process-analysis-with-semantic-dotted-chart(024dafbc-bc29-4206-bacf-26dde917e8e2).html

Business Process Analysis with Semantic Dotted Chart

Extracting X-Ray Usage Information from Log Files using Semantic Process Mining

in partial fulfilment of the requirements for the degree of

Master of Science in Business Information Systems

Author: Melike Bozkaya
Student ID: 0641467

University: Eindhoven University of Technology
Department: Mathematics & Computer Science
Master: Business Information Systems
Supervisor: Prof. Dr. Ir. W.M.P. van der Aalst
Tutor: Ir. K.J.F.R. van Uden

Company: Philips Healthcare
Business Line: CardioVascular
Department: Validation & Release
Supervisor: W.L.M. van Rooij

to my parents, Mehmet and Nuray,

whose love and support made my dream a reality..


Acknowledgements

I would like to thank my supervisor, Prof. Wil van der Aalst, for making it possible for me to work with him and, in particular, for his constructive and insightful comments on my thesis and his very supportive and understanding attitude.

I would also like to express my warm gratitude to Wim van Rooij and Kenny van Uden for giving me the opportunity to perform this master thesis at Philips Healthcare. Thank you, Kenny, for creating a relaxed working environment, for your support and for being the best 'boss' ever.

Of course, I want to thank Minseok Song and Ana Karla Alves de Medeiros for their help and inspiration whenever I needed it. Thank you, Minseok, for being on the assessment committee during one of your happiest times. A special thanks to JC Rantham Prabhakara for all his help and friendship.

My warmest thanks to my friends for their constant support throughout my master studies. Thank you Derya, Firat, Onder, Mehmet and Ingmar for being in my life. Thanks to my lovely housemates, Seda and Sandra, for everything, but especially for your patience in the difficult periods of my thesis. My special thanks to you, Jovan, not only for giving your time, patiently reading the text and listening to my frustrations, but also for believing in me more than I do. My life in Eindhoven has been more entertaining with you all.

Last, but definitely not least, I want to thank my family for all their unending love, support and caring. I am forever indebted to them. Thank you, Melik, for listening to my complaints but, most importantly, for being my best friend. Thanks, mom and dad!

Melike Bozkaya
August 2009, Eindhoven, Netherlands


Contents

1 Introduction
1.1 Rationale for the Project
1.2 Objectives of the Project
1.3 The Approach
1.4 Organization of the Thesis

2 Research Environment
2.1 Philips in General
2.2 Background Information for X-Ray Domain
2.3 ProM Research Group
2.4 Process Mining

3 Project Description
3.1 General Problem Definition
3.2 PH Problem Definition
3.3 Description of the Assignment

4 Semantic Process Mining
4.1 Semantic Process Mining
4.2 Semantic Technologies
4.3 Semantic Process Mining in ProM

5 Pre-processing
5.1 ProMimport RunInfo Plug-in
5.2 Event Logs of Philips
5.3 Ontology Construction for Philips
5.4 SAMXML/MXML Mappings
5.5 Implementation Details

6 Semantic Dotted Chart
6.1 The Approach
6.2 The Dotted Chart
6.3 Semantic Dotted Chart
6.4 Case Study
6.5 Problem Encountered: Validation of Ontologies

7 Philips Solution
7.1 Tailored Solution
7.2 Case Studies

8 Conclusion
8.1 Summary
8.2 Future Work

References

Appendices

A X-Ray Generator

B Acquired Run Information

C WSML Language

D X-Ray Run Related Formulas

E Run Ontology

F Log Preparation


Chapter 1

Introduction

Today, organizations use a variety of information systems (Enterprise Resource Planning (ERP) systems, Workflow Management (WFM) systems, etc.) to support the execution of their business processes. As organizations change, these information systems also change, often from simple to complex systems. Hence, it becomes harder to determine whether the system works the way the organization thinks it works. Process Mining (PM), the main research area of this master thesis, has emerged as a way to gain insights, based on event logs, into the business processes supported by these information systems.

In the context of PM, business processes can be analyzed from different views, such as the control flow of the process, the structure of the organization that performs the process, and the data elements of the process. One way to aid organizations in the analysis of their business processes is to visualize them in an easy-to-understand manner. The visualization is combined with the domain knowledge of the business analyst to facilitate gaining insights into business processes. It should also be possible to include this domain knowledge in the analysis techniques of PM in order to bring the analysis of business processes closer to business analysts.

This master project deals with using and visualizing domain knowledge of business processes in one of the PM analysis techniques, namely Dotted Chart Analysis. This chapter provides an introduction to the project. Section 1.1 and Section 1.2 give the reasons for and the objectives of the project, respectively. The general approach used for the realization of this project is discussed in Section 1.3. Section 1.4 gives the organization of this thesis.

1.1 Rationale for the Project

Philips Healthcare (PH) aims to extract the usage behavior of X-Ray machines in order to analyze and monitor the usage of its X-Ray systems and to use this mined information for test purposes. This intent was realized by this master project, which was supported by Eindhoven University of Technology (TU/e) and PH.


The usage of X-Ray machines can be extracted from event logs by using PM techniques. The event logs generated by X-Ray machines mainly contain quantitative data of the X-Ray systems, such as physical and geometrical parameters. This called for a solution for PH that visualizes data in the context of PM. For this purpose, the existing Dotted Chart Analysis technique was chosen to be extended for PH in order to visualize this quantitative data of X-Ray systems.

However, it was also required to analyze this quantitative data based on a set of measurements. These measurements are based on X-Ray domain knowledge, which facilitates the interpretation of analysis results. Therefore, the domain knowledge of X-Ray systems, the so-called semantic layer, also had to be included in the PH solution.

This PH-specific requirement has been generalized into the research question of this master thesis. Including the semantic layer in the analysis and visualization techniques of PM can support business analysts in the interpretation of the results.

The basic question underlying this project is:
- How can domain knowledge (semantic layer) be included in the analysis and visualization of business processes?

1.2 Objectives of the Project

The main objective of this project is to make use of semantic layer information in the analysis and visualization of business processes in the context of PM. The information extracted from log files by PM techniques can be abstracted to the semantic level by using so-called ontologies, i.e. explicit formal specifications of a domain. Since an extended version of Dotted Chart Analysis will be used for the visualization of quantitative data of PH, the semantic layer information should be included in this PM technique.

The objectives of this master project can be summarized as follows:

• Extension of the existing Dotted Chart Analysis technique to analyze and visualize the semantic layer of business processes (Semantic Dotted Chart).

• Modification of Semantic Dotted Chart Analysis for PH in order to analyze and visualize quantitative data of X-Ray systems.

• Addition of semantic layer information to PH data to facilitate analysis and visualization of the X-Ray domain.


1.3 The Approach

Figure 1.1 illustrates the general approach followed in this project. Dotted Chart Analysis will be extended to Semantic Dotted Chart Analysis in order to make use of the semantic layer for the analysis and visualization of business processes. This to-be-implemented Semantic Dotted Chart technique will work with a log file that contains semantic layer information. Adding the semantic layer to the log file (i.e. pre-processing) and utilizing this semantic information within the analysis and visualization will be achieved by using ontologies. The pre-processing step will be explained in detail for PH data.

Figure 1.1: Illustration of the general approach used in this project
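The role of the ontology in this approach can be sketched in a few lines of code. The sketch below is only an illustration under stated assumptions: the event labels, the concept names and the two-level hierarchy are hypothetical and are not taken from the PH ontology, and the functions are not part of ProM. It shows the two steps named above: annotating raw log labels with concepts (pre-processing) and then analyzing the log at a more general concept level.

```python
from collections import Counter

# Hypothetical concept hierarchy: child concept -> parent concept.
SUPER_CONCEPT = {
    "Fluoroscopy": "ImageAcquisition",
    "Exposure": "ImageAcquisition",
    "TableMove": "Positioning",
    "BeamMove": "Positioning",
}

# Hypothetical mapping from raw event labels in a log to ontology concepts.
LABEL_TO_CONCEPT = {
    "FLUORO_START": "Fluoroscopy",
    "EXPOSURE_START": "Exposure",
    "TABLE_MOVE": "TableMove",
    "BEAM_MOVE": "BeamMove",
}


def annotate(label):
    """Pre-processing step: return the concept for a raw label, or None."""
    return LABEL_TO_CONCEPT.get(label)


def generalize(concept, levels=1):
    """Walk up the concept hierarchy by at most `levels` steps."""
    for _ in range(levels):
        parent = SUPER_CONCEPT.get(concept)
        if parent is None:
            break
        concept = parent
    return concept


def count_by_concept(labels, levels=1):
    """Analysis step: count events per (generalized) concept.

    Labels without a concept are skipped.
    """
    counts = Counter()
    for label in labels:
        concept = annotate(label)
        if concept is not None:
            counts[generalize(concept, levels)] += 1
    return counts


log = ["FLUORO_START", "TABLE_MOVE", "EXPOSURE_START", "FLUORO_START", "UNKNOWN"]
counts = count_by_concept(log)
```

With `levels=0` the same log is counted per leaf concept, which mirrors the idea of viewing one log at different levels of abstraction.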

1.4 Organization of the Thesis

Figure 1.2 illustrates the organization of the thesis. The remainder of this thesis is structured as follows:

• Chapters 2 and 3 give the project description of this master thesis. Chapter 2 introduces the parties involved in this master project and provides background information about the project. Chapter 3 gives both the general problem definition and the PH-specific problem definition.

• Chapter 4 elaborates on the research area, the technologies and the related work done in this research area.

• Chapters 5, 6 and 7 discuss the realized solutions, which are the outputs of this project. Chapter 5 describes the data preparation phase. Chapter 6 explains the Semantic Dotted Chart technique in detail. Chapter 7 discusses the realized solution for PH.

• Chapter 8 concludes this thesis.


Figure 1.2: Organization of the thesis


Chapter 2

Research Environment

This chapter provides general information about the parties involved in this master project. Section 2.1 gives a general description of the Philips organization in which this master project was performed and of the X-Ray systems that were analyzed. Background information about the X-Ray domain, which is the problem domain of PH, is given in Section 2.2. Section 2.3 describes the Process Mining research group of TU/e, and Section 2.4 gives a general description of the PM research area.

2.1 Philips in General

This section aims to provide general information about Philips and detailed information about the department in which this master project has been performed.

2.1.1 Philips Healthcare

The foundations for what was to become one of the world's biggest healthcare, lifestyle and technology companies were laid in Eindhoven, the Netherlands, in 1891. Philips Electronics N.V. has developed into one of the largest global diversified industrial companies, with sales of € 26,385 million in 2008 and a multinational workforce of 116,000 employees. Philips strives to bring 'Sense & Simplicity' to consumers by designing products mainly in three business divisions, namely Lighting, Consumer Lifestyle and Healthcare [9]. A general overview of the business divisions of Philips can be seen in Figure 2.1.

The healthcare activities of Philips date back to 1918, when it first introduced a medical X-ray tube. By 1933, the company was manufacturing medical X-ray equipment in Europe and the United States. Today, PH is among the top three in the market, offering diagnostic imaging systems, healthcare information technology solutions, patient monitoring and cardiac devices.

PH has around 27,440 employees spread over 63 countries. The two headquarters of the division are located in Best, The Netherlands, and in Andover, United States of America. Total sales amounted to € 6,742 million in 2006 and € 3,106 million in 2007 [8].

2.1.2 CardioVascular Business Unit

As shown in Figure 2.1, the Business Unit (BU) CardioVascular (CV) X-Ray is, together with General X-Ray, Magnetic Resonance, Computed Tomography and Nuclear Medicine, part of the Imaging Systems group of PH. CV X-Ray, being one of the larger BUs, was responsible for the delivery of more than 800 systems and a total revenue of € 864 million in 2008.

CV achieved these results with a multinational workforce, of which:

• 620 are located in Best (R&D, Customer Services, Marketing, Clinical Science)

• 180 at IPC in Melbourne, Florida, USA (Manufacturing, Marketing and Customer Services)

• 60 at Alpha in Mumbai, India (Manufacturing, Marketing and Customer Services)

The mission of CV is to be the recognized leader of X-Ray centered, minimally invasive solutions, designed around patient and clinician, excelling in clinical outcome and workflow efficiency [1]. The BU CV makes products for two main clinical areas, namely Cardio and Vascular, and, as the name implies, the Validation and Release department is responsible for the validation and release of these CV systems.

2.1.3 Validation and Release Department

This master thesis was carried out in CV, a BU of Imaging Systems of PH, and more specifically in the Validation and Release (VR) department. The main responsibility of the VR department is the validation and release of newly designed CV products (systems) that are going to be introduced to the market. For these tasks, the VR department must ensure sufficient and proven quality of the system for the release. To do so, it is necessary to assure that the system performs according to the pre-specified requirements and meets customer expectations.

For this purpose, the VR department develops test and release strategies, which are mainly based on product risk analysis and on the organization and planning of the test process. Verification and validation are done for pre-defined quality characteristics: verification is mainly done early in the test process, based on the requirements in the product design, whereas validation is done at the end of the test process.

The VR department uses the ProM framework, which will be explained in Section 2.3, to gain more insight into the actual usage of CV systems. The framework has been tailored to PH needs and is in use under the name ProMise. The use of the ProMise framework supports and improves the test process of the VR department.

Figure 2.1: The business structure of Philips

2.1.4 Allura Xper Productline

The CV BU makes products for two main clinical areas, Cardio and Vascular. CV divides these main clinical areas even further into branches. These main clinical areas and their branches are described below:



• Cardio: all medical issues concerning the heart

– Cardiology: the branch of medicine pertaining to the heart

– Pediatric: the branch of medicine that deals with the medical care of infants, children, and adolescents

– Electro Physiology: the study of the electrical properties of biological cells and tissues. It involves measurements of voltage change or electrical current flow on a wide variety of scales, from single ion channel proteins to whole tissues like the heart.

• Vascular: all medical issues concerning blood vessels

– Neurology: the branch of medicine dealing with disorders of the nervous system

– Radiology: the medical specialty directing medical imaging technologies to diagnose and sometimes treat diseases not concerning any of the areas listed above.

These clinical areas are targeted by the Allura Xper productline. The Allura Xper productline consists of X-ray systems designed to diagnose, and possibly assist in the treatment of, all kinds of diseases, such as heart and lung diseases, by generating images of the inside of the body. The Allura Xper productline is optimized for cardiac and vascular procedures, offering excellent image quality at the lowest possible radiation dose. The Allura Xper enables clinicians to see and work in the smallest vessels of the heart and head. In addition, Xper technology allows clinicians to personalize settings and access an intuitive user interface while ensuring that it is integrated with the hospital's IT network [29].

The Allura Xper productline mainly consists of two types of systems. Mono-plane systems have one arm and are capable of acquiring images from one side, whereas bi-plane systems have two arms and can acquire images from two sides simultaneously. The prefix FD stands for Flat Detector. The suffix 10 or 20 refers to the size, in inches, of the installed detector; a larger detector can scan a wider area at once. The bi-plane Allura Xper FD10/10 system is shown in Figure 2.2. Just like other Philips X-Ray systems, Allura Xper products are provided with an X-ray generator (i.e. tube), which enables optimal image quality at the lowest possible dose ratio in order to protect people from the X-Ray environment. The tube uses special devices, techniques and practices in fluoroscopy and exposures to maintain image quality with a lower patient dose and gathered radiation beams. It is also worth mentioning that this tube is the most expensive component within such an X-Ray system.

The Allura Xper systems can operate in two different modes: Application and Field Service. Application mode refers to use by doctors and assistants, i.e. for the treatment of patients. Field Service mode is used to install, update, configure and adjust the systems, and to perform other administrative activities to maintain them.


Figure 2.2: Allura Xper FD10/10

2.1.5 Philips Remote Services

The Philips Remote Services (PRS) infrastructure is a vehicle for active monitoring of all connected systems and for the supply of services via the intranet. Philips' guaranteed, secure broadband connection delivers remote technical support, monitoring, diagnostics, applications assistance and many value-added services. Allura Xper systems throughout the world are connected to PRS. Via the PRS network, event logs are downloaded to PH. Using the Remote Analysis, Diagnostics And Reporting (RADAR) system, the event logs are converted to an XML format and stored in an internal database of RADAR. RADAR monitors PH systems and gathers service data via the PRS network. RADAR is not only used to convert the data; it also acts as a platform for other tools, such as ProClarity (an analysis tool that shows all kinds of information about the connected systems, such as the number of startups and the type of the systems).

Figure 2.3 illustrates the RADAR system in detail. The left part of the figure shows the environment of all these networks and systems within PMS. Collected data are sent to the RADAR framework in TXE format. These files are preprocessed and converted to XML and finally to CDF file formats. It is also possible to analyze this data in the RADAR data warehouse with analysis tools like ProClarity. The lower part of the figure shows the connection between this environment and this master project, and the global steps further explained in this thesis. In summary, the XML event logs are loaded into a conversion framework, called ProMimport, which translates them to MXML log files that can be analyzed by another software framework, called ProM.
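The conversion step summarized above can be sketched as follows. This is a minimal illustration, not the ProMimport plug-in: the input element and attribute names (`events`, `event`, `system`, `name`, `time`) are hypothetical, since the RADAR XML schema is not shown here, and treating each system as one process instance is likewise an assumption. The output uses the core MXML elements (WorkflowLog, Process, ProcessInstance, AuditTrailEntry, WorkflowModelElement, EventType, Timestamp).

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the real RADAR XML schema is not shown in this thesis.
SOURCE_XML = """
<events>
  <event system="sys-1" name="StartFluoroscopy" time="2009-03-01T10:00:00"/>
  <event system="sys-1" name="StopFluoroscopy" time="2009-03-01T10:00:12"/>
  <event system="sys-2" name="StartExposure" time="2009-03-01T11:30:00"/>
</events>
"""


def to_mxml(source_xml, process_id="XRayUsage"):
    """Translate the hypothetical source XML into an MXML element tree."""
    root = ET.fromstring(source_xml)
    log = ET.Element("WorkflowLog")
    process = ET.SubElement(log, "Process", id=process_id)
    instances = {}  # case id -> ProcessInstance element
    for event in root.iter("event"):
        case_id = event.get("system")  # assumption: one case per system
        if case_id not in instances:
            instances[case_id] = ET.SubElement(
                process, "ProcessInstance", id=case_id
            )
        entry = ET.SubElement(instances[case_id], "AuditTrailEntry")
        ET.SubElement(entry, "WorkflowModelElement").text = event.get("name")
        ET.SubElement(entry, "EventType").text = "complete"
        ET.SubElement(entry, "Timestamp").text = event.get("time")
    return log


mxml = to_mxml(SOURCE_XML)
print(ET.tostring(mxml, encoding="unicode"))
```

The choice of case identifier determines what a process instance means in later analysis, so in practice it would follow from the analysis question rather than be fixed as it is here.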

Table 2.1 shows the number of Allura Xper systems that are connected to PRS and RADAR, categorized by system type and region. An Allura Xper system can be placed in one of the following regions: North America (NA); Europe, Middle East, and Africa (EMEA); Asia Pacific (APAC); and Latin America (LATAM).

System type     NA   EMEA   APAC   LATAM   Total
FD10           390    228    116       7     741
FD20           581    109    111      15     816
FD10/10         57     31     47       2     137
FD20/10        105     69     51       2     227
FD20/20         59     46     32       1     138
Total         1195    486    361      29    2071

Table 2.1: Systems connected to PRS and RADAR

Figure 2.3: Systems connected to PRS and RADAR

2.2 Background Information for X-Ray Domain

In order to discuss the PH problem that will be tackled in this master project, general information about the problem domain is given in this section.


2.2.1 X-Ray in General

Wilhelm Conrad Röntgen (27 March 1845 – 10 February 1923) was a German physicist who, on 8 November 1895, detected by accident electromagnetic radiation in a wavelength range known today as X-rays [25]. In order to signify an unknown type of radiation, he called the phenomenon X-radiation, though it also became known as Röntgen radiation. In 1901, he was awarded the very first Nobel Prize in Physics [6].

The discovery of X-rays changed humanity's whole conception of the structure of matter, paving the way for quantum physics and the theories of relativity. In addition to their ability to penetrate the human body, X-rays provide new insights into the structure of the atom and the composition of stars. Radiology, described in Section 2.1.4, is the most common use of X-ray technology and deals with the study and application of imaging technology to diagnose and treat diseases [35]. Experience gained with X-rays in diagnostic imaging has also led to new medical imaging techniques such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), as well as techniques currently under development [28]. X-rays are especially useful in the detection of pathology of the skeletal system, but are also useful for detecting some disease processes in soft tissues such as the brain or muscle.

2.2.2 X-ray Properties

X-rays belong to the group of electromagnetic radiation, which travels as waves in space, and behave according to the inverse square law, meaning that the X-ray intensity decreases with the square of the distance from the source [4].
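Written as a formula, with the symbols introduced here only for illustration ($I(d)$ is the intensity at distance $d$ from the source and $d_0$ is a reference distance), the relation for an idealized point source reads:

```latex
\[
  I(d) = I(d_0) \left( \frac{d_0}{d} \right)^{2}
\]
```

For example, doubling the distance ($d = 2 d_0$) reduces the intensity to one quarter of its value at $d_0$.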

X-rays are able to penetrate materials to a degree that depends on:

• The X-ray hardness (i.e. wavelength); the longer the wavelength (soft radiation), the more X-rays are absorbed in the material.

• The nature of the material; the higher the atomic numbers of the elements in the material, the more X-rays are absorbed in the material.

• The material density.

• The material thickness.

The attenuation of the material determines the reduction of X-ray intensity.
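The dependence on material and thickness can be illustrated with the standard exponential attenuation (Beer-Lambert) law, which is not stated in the text above and strictly holds only for a narrow beam of a single photon energy. In the sketch below, the effects of wavelength, atomic number and density are all folded into one linear attenuation coefficient, and the coefficient values are illustrative placeholders, not measured data.

```python
import math


def transmitted_fraction(mu_per_cm, thickness_cm):
    """Fraction of X-ray intensity left after passing through a material.

    mu_per_cm    -- linear attenuation coefficient (1/cm); larger for softer
                    radiation, higher atomic numbers and denser materials
    thickness_cm -- thickness of the material along the beam (cm)
    """
    return math.exp(-mu_per_cm * thickness_cm)


# Illustrative placeholder coefficients, not measured values.
soft = transmitted_fraction(0.2, 10.0)   # weakly absorbing material
dense = transmitted_fraction(0.5, 10.0)  # strongly absorbing material
```

Under these assumptions, a larger coefficient or a thicker layer always lowers the transmitted fraction, which matches the qualitative list above.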

The human body, when exposed to X-rays, absorbs part of the X-rays (as mentioned above, depending on the different materials, with different densities and different thicknesses, in the body). Only the X-rays that leave the patient contribute to the image generated in image detection. This results in a shadow image that contains the superposition of all penetrated objects, each with its own absorption.



X-rays affect living tissue: they hamper growth, destroy tissue and provoke inflammation. Therefore, the effect of X-rays is not a curative or constructive one, but a damaging one.

2.2.3 X-Ray Generator

An X-ray machine is a device used by radiographers to acquire an X-ray image. This machine mainly consists of an X-ray source or generator (X-ray tube) and an image detection system, which can consist of either film (analog technology) or a digital capture system (such as a picture archiving and communication system).

An X-ray generator is a high vacuum tube in which X-rays are generated by accelerating electrons to a high velocity with a high-voltage field and causing them to collide with a target, the anode plate [4]. For a more detailed explanation of X-Ray generators, please refer to Appendix A. It is also worth noting that the X-Ray generator is the most expensive part of an X-Ray machine.

Three parameters determine the generation of X-rays:

• Tube Anode-Cathode Voltage (kV): the high voltage, or tube voltage, measured in kV between the anode and the cathode (the anode is positive and the cathode is negative). This high voltage determines the X-ray spectrum or hardness of the X-ray radiation, and as such the contrast of the image.

• Tube Current (mA): the tube current, measured in mA, determines the X-ray intensity.

• Time (ms): the duration of X-ray production.
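To make the three parameters concrete, the sketch below groups them in a small record and derives two commonly used quantities: the tube charge in mAs (current times time) and the electrical energy delivered to the tube in joules (voltage times current times time, assuming a constant voltage and current during the run). The class and field names are hypothetical, and these are generic quantities, not the run formulas of this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TubeSettings:
    """Hypothetical record for the three tube parameters of one X-ray run."""

    voltage_kv: float  # tube anode-cathode voltage in kV
    current_ma: float  # tube current in mA
    time_ms: float     # duration of X-ray production in ms

    @property
    def charge_mas(self):
        """Tube charge in mAs: current (mA) times time (s)."""
        return self.current_ma * self.time_ms / 1000.0

    @property
    def energy_j(self):
        """Electrical energy in J: kV * mA gives W, times time in s.

        Assumes constant voltage and current for the whole run.
        """
        return self.voltage_kv * self.current_ma * self.time_ms / 1000.0


run = TubeSettings(voltage_kv=80.0, current_ma=500.0, time_ms=10.0)
print(run.charge_mas, run.energy_j)
```

For the example run, 500 mA for 10 ms gives 5 mAs, and 80 kV at 500 mA is 40 kW, which over 10 ms amounts to 400 J.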

2.2.4 Image Acquisition

As mentioned in Section 2.2.3, an X-ray generator is responsible for the X-ray generation function, which produces the X-ray beam. The Image Acquisition Controlling unit is responsible for the image acquisition function, which can be described as: moving the patient and the X-ray beam into a suitable position, and generating electronic images of the patient by exposing the patient to X-rays and converting the X-ray image behind the patient into electronic images. A general overview of image acquisition is illustrated in Figure 2.4.

In X-ray applications, a physician looks at images generated by X-Ray machines. The quality of the image (average grey-level) is very important, since it is the image that the physician uses for diagnosis. During the actual X-ray acquisition run, a combination of special filtering mechanisms, bright contrast, high resolution and similar attributes determines the quality of the acquired images.

There are two different types of image acquisition modes:

• Fluoroscopy: the common name for the production of real-time images, directly visible to the eye during an X-ray examination. Fluoroscopy is mainly used as a viewing and positioning aid, like the viewfinder of a photo camera, before the final exposures are made. Since the generated images only guide positioning and are not used for diagnosis, fluoroscopy generates low-quality images with a low X-Ray dose.

• Exposure: the second image acquisition mode, in which the images are stored for later viewing. Since the acquired images are used for diagnosis, exposure mode generates high-quality images with a high X-Ray dose.

Figure 2.4: Schematic Overview of Image Acquisition [4]

An actual image acquisition run (also known as an X-Ray run) is initiated by the user, and the following functions start in a synchronized way:

• X-ray Generation

• Image Detection

• Special patient and beam movements, if desired.



2.2.5 Acquired Run Information

When an X-Ray run is initiated, the functions described above generate events in log files with the corresponding parameters and the values used. Examples of parameters generated and logged when an X-Ray run occurs are the tube parameters (e.g. tube anode-cathode voltage (kV), tube current (mA), time (ms)), X-Ray dosage parameters and the applied technique. For a detailed explanation of the run parameters, please refer to Appendix B. It should be noted that most of these parameters are not insightful enough by themselves. Therefore, the analyst is mostly interested in interpreting the results of formulas or metrics applied to these parameters.

These run parameters are mainly used for the analysis of the following:

• Safety Monitoring

• X-Ray Dose Control

• Image Quality

• Image Processing (e.g. sharpness, focus, etc.)

The results of the analysis of these run parameters can be used to derive optimizations for X-Ray systems. For example, the combination of image quality and X-Ray dose control can be optimized and applied to X-Ray systems in the long term.

As described in Section 2.1.5, event log files are collected for X-ray systems that are currently being used in hospitals throughout the world. This logged data can be used to monitor and analyze with what values these functions of the systems are being used in the field (hospitals). So, the general aim of the project is to convert this quantitative data, currently stored in log files, into information.
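A very small example of turning logged values into information is a summary per acquisition mode. The sketch below assumes that each run has already been parsed into a record; the field names (`mode`, `kv`, `ma`) and the values are hypothetical and do not come from PH logs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, already-parsed run records; values are made up.
runs = [
    {"mode": "fluoroscopy", "kv": 70.0, "ma": 10.0},
    {"mode": "fluoroscopy", "kv": 74.0, "ma": 12.0},
    {"mode": "exposure", "kv": 80.0, "ma": 500.0},
]


def summarize(run_records, field):
    """Return count, mean and maximum of `field` per acquisition mode."""
    by_mode = defaultdict(list)
    for record in run_records:
        by_mode[record["mode"]].append(record[field])
    return {
        mode: {"n": len(values), "mean": mean(values), "max": max(values)}
        for mode, values in by_mode.items()
    }


kv_summary = summarize(runs, "kv")
```

The same function can be applied to any numeric run parameter, or to a derived metric once it has been added to the records.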

2.2.6 X-Ray Generator Activities

PH should provide a certain level of service to its customers (hospitals) with regard to the X-Ray tube and its usage. Since the results of this master project would also be used to improve PH customer services in the long run, three main X-Ray tube related activities are briefly described here. The relation of these activities to the results of this project will be explained in the next section.

• Service Contracts: As the name implies, this is a contract in which the level of service for the X-Ray machine is formally defined; it is signed by two parties, PH and the hospital. The agreed technical support that is provided by PH and received by hospitals is the subject of this legal contract.

• Calibration Activities: One of the agreements in Service Contracts concerns calibration and correction activities, which should be performed by PH technical support once every half year. The aim of these activities is to guarantee the lifespan of the X-Ray tube. However, when hospitals do not have any complaints about the X-Ray machine, they can simply ignore these activities.

• Loadability Tests: This is one of the tests performed by the VR department before the release of the systems. The tube load is a combination of X-Ray tube capabilities, the required image quality and the working speed within a clinical procedure. For the best image quality, the requirement is to use all the available power. However, using the X-ray tube at maximum power heats up the tube rapidly, which leads to tube overload and results in unwanted cool-down waiting time for the tube. For this reason, the VR department applies Loadability Tests to verify the tube load and the tube load indication under worst-case (high-load) and best-case (low-load) scenarios.

2.3 ProM Research Group

The Process Mining (PM) research group of the Mathematics and Computer Science Department of TU/e was initiated by Prof. Wil van der Aalst. This group mainly works on PM research, which is concerned with the extraction of knowledge about a (business) process from its process execution logs [42]. Examples of logs include process data generated by administrative services, health care data about patient handling, and logs of workflow tools. PM is an emerging research area that fills the gap between the systems available for supporting the execution of (business) processes and the current research areas for monitoring and analyzing this process execution in the organization. PM strives to provide insights into various perspectives, such as the process (or control flow) perspective and the performance, data, and organizational perspectives. Many machine learning and data mining techniques have successfully been applied in this field by the PM research group and have been implemented in the open-source tool ProM (Process Mining Toolkit).

ProM is a generic open-source framework for implementing process mining techniques in a platform-independent environment. Combining efforts from researchers all over the world, the framework currently consists of more than 230 plug-ins for mining, analysis and monitoring. The import of (and the conversion between) several process modeling languages, such as Petri nets and Event-driven Process Chains (EPCs), is supported as well.

Some examples of mining plugins are as follows:

• Plugins supporting control-flow mining techniques and the mining of less-structured, flexible processes (such as the Alpha algorithm, Fuzzy Miner, Heuristic Miner, Genetic mining, Multi-phase mining, etc.)

• Plugins analyzing the organizational perspective (such as the Social Network miner, the Staff Assignment miner, etc.)

• Plugins dealing with the data perspective (such as the Decision miner, etc.)

• Elaborate data visualization plugins (such as the Cloud Chamber Miner)

Furthermore, the following are examples of analysis plugins:

• The verification of process models (e.g., Woflan analysis)

• Verification of Linear Temporal Logic (LTL) formulas on a log

• Checking the conformance between a given process model and a log

• Performance analysis (Basic statistical analysis, and Performance Analysis with a given process model)

• The analysis of the semantic layer of Semantic Business Process Management Systems (SBPMS) with plugins like Semantic LTL Checker, Performance Metrics in Ontologies, etc.

The ProM framework receives input logs in the Mining XML (MXML) format. Logs from all kinds of popular information systems can easily be converted to MXML formatted logs by the open-source ProMimport framework, which has also been implemented by the PM research group [10].

2.4 Process Mining

As the information systems used by organizations evolve from simple to complex systems, managing the business processes that are linked to these information systems has become crucial. In the context of the Business Process Management (BPM) life-cycle, the need for analyzing and monitoring business processes has created new technologies, such as Business Process Analysis (BPA), Business Intelligence (BI), etc. [41]. PM, an emerging research area that was initiated by the ProM research group, is used to discover, monitor, analyze, diagnose and improve real processes by extracting knowledge from event logs [16], [17]. PM is useful in several phases of the BPM life-cycle: the design, analysis, monitoring and redesign of business processes.

Event logs can differ in structure. For example, the audit trail entries of a WFM system or the transaction logs of an ERP system record executions of business processes in a well-structured and detailed way. On the other hand, the electronic patient records kept by the different departments visited by a patient in a hospital can be less structured and more complicated. However, all so-called event logs have one thing in common: they show occurrences of events at specific moments in time, where each event refers to a specific process and an instance of this process, which is also called a case [43]. Until now, PM has been applied to high-end copiers, web services, wafer steppers, careflows in hospitals, etc. Therefore, PM is not limited to information systems; it can also be used to mine all kinds of business processes and systems [15].
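
The common structure of event logs described above (events that occur at a moment in time and belong to a case) can be illustrated with a minimal Python sketch. The case identifiers, activity labels and timestamps are hypothetical and are not taken from any PH log; the sketch only shows how events could be grouped into one time-ordered trace per case.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (case id, activity label, timestamp).
events = [
    ("case-2", "register", "2010-03-01T09:05:00"),
    ("case-1", "register", "2010-03-01T09:00:00"),
    ("case-1", "examine", "2010-03-01T09:30:00"),
    ("case-2", "examine", "2010-03-01T10:00:00"),
    ("case-1", "discharge", "2010-03-01T11:00:00"),
]


def build_traces(events):
    """Group events by case and order the events of each case by time."""
    by_case = defaultdict(list)
    for case_id, activity, stamp in events:
        by_case[case_id].append((datetime.fromisoformat(stamp), activity))
    # Sorting the (timestamp, activity) pairs orders each case chronologically.
    return {case: [activity for _, activity in sorted(entries)]
            for case, entries in by_case.items()}


traces = build_traces(events)
```

Each resulting trace is the sequence of activity labels of one case, which is the kind of input that the mining techniques discussed in this chapter typically start from.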

Figure 2.5: Overview of Process Mining [15]

Figure 2.5 gives an overview of PM. As already mentioned, the basic idea of PM is to learn from observed executions of a process. Information systems control and/or support “real” world processes and record the execution of these processes. Depending on the existence of an a-priori model, three different classes of PM techniques can be applied to event logs. These three classes are [37]:

• Process Discovery: Traditional PM techniques have been focusing on discovery, i.e. deriving information about the original process model, the organizational context, and execution properties from enactment logs. There is no a-priori model, i.e. based on an event log some model is constructed.

• Conformance Checking: Techniques of this class compare an a-priori model with the observed behavior as recorded in the log. In this case, there is an a-priori model. This model is used to check if reality conforms to the model.

• Extension: There are different ways to extend a given process model with additional perspectives based on event logs. There is an a-priori model. This model is extended with a new aspect or perspective, i.e. the goal is not to check conformance but to enrich the model with the data in the event log.
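
The difference between discovery and conformance checking can be made concrete with a deliberately simple sketch. It uses the directly-follows relation between activities as the "model", which is an assumption made for illustration only and is much weaker than the algorithms implemented in ProM (such as the Alpha algorithm). The activity labels are hypothetical.

```python
def directly_follows(traces):
    """Discovery: collect every pair (a, b) where b directly follows a."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs


def deviations(trace, allowed):
    """Conformance: list the steps of a trace that the model does not allow."""
    return [pair for pair in zip(trace, trace[1:]) if pair not in allowed]


# Hypothetical log with two observed traces.
log = [["a", "b", "c"], ["a", "c"]]
model = directly_follows(log)          # no a-priori model: built from the log
problems = deviations(["a", "b", "a"], model)  # a-priori model: check a new trace
```

In this sketch the model is first constructed from the log (discovery) and then reused as an a-priori model to check another trace (conformance checking); the step from "b" back to "a" is reported because it was never observed.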

In the context of PM, by using techniques belonging to one of these three classes, a process can be mined and analyzed from at least three different perspectives [43]:

• Process Perspective: The process perspective focuses on the control flow, i.e. the ordering of tasks/activities. The goal of mining this perspective is to find a good model describing the process under consideration.

• Organizational Perspective: The organizational perspective focuses on the originator, i.e. which performers are involved and how they are related. The goal is either to structure the organization by classifying people in terms of roles and organizational units or to show the relations between individual performers.

• Case Perspective: The case perspective focuses on properties of cases (in particular data elements of processes). Cases can be characterized by their path in the process, by the values of the corresponding data elements or by the originators working on a case.

Chapter 3

Project Description

This chapter describes the problem definition that is tackled in this master project. First the general problem description and then the PH-specific problem description are given in Section 3.1 and Section 3.2, respectively. In each section, the main motivations for the project are discussed. Finally, a detailed description of the assignment is given.

3.1 General Problem Definition

As mentioned in Section 2.4, the basic idea of PM is to discover, monitor, and improve real processes (i.e. not assumed processes) by extracting knowledge from event logs [42], [37], [15]. The main purpose of PM is to learn from observed executions of a process. It can be used to (1) discover new models (e.g. constructing a Petri net that is able to reproduce the observed behavior), (2) check the conformance of a model by checking whether the modeled behavior matches the observed behavior, and (3) extend an existing model by projecting information extracted from the logs onto some initial model (e.g., show bottlenecks in a process model by analyzing the event log) [29]. All these PM techniques have in common that they provide a graphical representation of the extracted information. For a PM technique to be effective, this graphical representation should be simple enough to be easily understood, but complete enough to reveal all the information present in the model.

The Dotted Chart plug-in, which will be explained in detail in Section 6.2, is one of the most robust and useful ProM plug-ins, and it provides a graphical representation from which new insights about the execution of processes can be derived. The plug-in shows process events in a graphical way such that the analyst gets a “helicopter view” of the process and is able to immediately spot opportunities for process improvement. Although this plug-in is able to show multiple views of a process (e.g. events, originators, etc.), it cannot visualize the data values of processes. This limitation of the Dotted Chart Analysis plug-in necessitates an extension in order to represent the data values of processes.

When organizational processes and resources that meet a specific definition need to be analyzed, implicit knowledge or internal documents have to be taken into account. As an example, when a manager wants to know the performance of a particular department, the employees of that department must first be identified and analyzed by PM techniques, and then the gathered data must be combined and aggregated in order to see the bigger picture. Therefore, syntactical analysis (i.e. analysis based on labels in log files) depends on implicit knowledge and human labor, which makes the analysis imperfect.

Although current PM algorithms and techniques are mature enough, the analysis they support is somewhat limited since it is purely based on labels in logs. This means that these techniques cannot benefit from the actual semantics behind these labels [20]. If PM techniques could use the real semantics of these labels and reason over these semantics, then the implicit knowledge in the head of the business analyst and the internal documents would no longer be needed during the analysis. This would yield a faster and more advanced analysis process and more detailed and accurate results.

These two limitations form the main requirements of the new Semantic Dotted Chart Analysis plug-in, which will be one of the deliverables of this master project.

3.1.1 Motivations for the Semantic Dotted Chart plug-in

The main motivation for the Semantic Dotted Chart plug-in is to mine the semantic layer of log files for better analysis results. In a nutshell, the semantic layer can be defined as the conceptualized representation of the domain, which is embedded in ontologies. This domain knowledge should be used for more accurate mining results. The existing Dotted Chart plug-in is also limited to the labels in the log files. Therefore, the Dotted Chart plug-in should be extended in order to analyze processes at the semantic level.

Moreover, as already mentioned, the existing Dotted Chart plug-in does not use data values. The Semantic Dotted Chart plug-in should be able to visualize the data values of processes in order to gain insight into process data and its patterns and relationships. This would yield a more robust analysis plug-in as well as more accurate analysis results.

3.2 PH Problem Definition

This section provides the motivations and general requirements of the PH assignment.

3.2.1 Motivation for the PH Plug-in

As already mentioned, PH uses the ProM framework and some of the PM techniques. A new need for PH in the existing ProMise framework is an analysis and visualization plug-in to analyze the quantitative data of the X-Ray generator. One of the deliverables of this master project will be a new analysis plug-in for PH, which will be a tailored version of the Semantic Dotted Chart plug-in.

The main motivations to apply PM techniques to PH event logs and to have such an analysis plug-in for X-Ray tubes can be summarized as follows:

• From the testing point of view, the results of the analysis can simplify the testing phase of the system and can improve the quality of tests. Currently, “Loadibility Tests”, previously described in Section 2.2.6, are applied to the system in the assumed ranges from the worst case scenario (high-load) to light usage of the tube (low-load). After the analysis of the real field data, these tests can be simplified to smaller ranges or to different scenarios.

• X-Ray user profiles and information on the usage of the tube in the field can be extracted. This information would make it easy to visualize and compare tube usage per hospital and/or region. Furthermore, this comparison would help to optimize the “Service Contracts”, described in Section 2.2.6, within that hospital/region.

• As already mentioned, this tube is one of the most expensive devices within an X-Ray system. When a problem occurs within the guarantee period of the system, the PH analyst can use the analysis results of this plug-in to determine whether the problem was caused by extensive use of the tube and hence whether the repair should be covered by the guarantee or not.

• In combination with the previous item, tube repair and replacement predictions can easily be made with the results of this analysis.

• This plug-in would also make it possible to check the “Calibration Activities”, described in Section 2.2.6; if the results show a violation, the legal consequences of this violation can be considered.

• In the longer term, the results of the analysis can help to protect the tube by changing settings of the system.

3.2.2 General Requirements for the PH Plug-in

Together with the given motivations, the following general requirements make up the description of the PH assignment.

• The first and one of the most important requirements of the plug-in is to have a general purpose mechanism for X-Ray parameters, which should be applicable to all kinds of newly added parameters. As an example, if new hardware is added to the X-Ray machine and this component also generates parameters that are logged in the PH event log, then it should be possible to add the parameters of the new component to the analysis without any programming effort.

• In order to cater for a general purpose mechanism, the plug-in should also support users in defining their own metrics in a simple yet operational way. This is also required since users are only interested in the outcome of the metrics definitions but not in the parameter definitions themselves. These user-defined formulas and metrics definitions are not fixed and are subject to change.

• The plug-in should support user-defined specifications (e.g. formulas, metrics, etc.) combined with statistical information about radiation runs.

• The extracted information should be presented in a simple yet meaningful way, better supporting analysts in the interpretation of the usage of the tube.

• Performance or computation time is also one of the important requirements of the plug-in. Although the execution time will depend on the selected number of systems, the selected time interval and the number of radiation runs, the plug-in should be able to handle one month of data of one system within ten seconds as the minimum requirement.
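
The requirement that metrics are defined by users as data, rather than programmed into the plug-in, can be sketched as follows. This is only an illustration in Python under stated assumptions: the parameter names (voltage_kv, current_ma, duration_ms), the metric name and the formula are hypothetical, and the plug-in described in this thesis relies on ontologies rather than on this mechanism.

```python
import ast
import operator

# Arithmetic operators that a user-defined formula may contain.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(formula, params):
    """Evaluate an arithmetic formula over named, logged parameters."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return params[node.id]          # look up a logged parameter
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(formula, mode="eval"))


# Hypothetical metric definition and one hypothetical radiation run.
metrics = {"energy_joule": "voltage_kv * current_ma * duration_ms / 1000"}
run = {"voltage_kv": 80, "current_ma": 10, "duration_ms": 500}
results = {name: evaluate(formula, run) for name, formula in metrics.items()}
```

Because the formulas are plain text, a newly logged parameter or a changed metric only requires editing the `metrics` table, not the evaluation code, which is the property that the first two requirements ask for.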

3.3 Description of the Assignment

Given the problem definition and general requirements, the output of this master thesis will be the new Semantic Dotted Chart plug-in for the ProM framework and a tailored version of it for PH. Since the existing Dotted Chart focuses on the occurrence of events rather than on data values, this new plug-in should visualize the data values of processes. Moreover, this new plug-in should incorporate semantic layer analysis based on ontologies for better analysis results.

The tailored version of this plug-in for PH will focus on usage information of the X-Ray generator of the Allura Xper product line. In order to acquire insight into how the X-Ray generators are being used by actual customers in hospitals, the event logs provided by the PRS and RADAR frameworks can be analyzed by PM, which targets the automatic discovery of information from event logs [42].

Parameters that are recorded in log files by X-ray machines are not insightful by themselves. Rather than directly analyzing the values of those parameters, this logged data needs to be analyzed through its semantic information. The use of semantic information yields a general purpose analysis technique that allows analysts to use their own metrics definitions, measurements, domain knowledge and terminology. Thus, making use of the semantic information of these parameters is an important part of this project.

To summarize, the assignment that is tackled in this master thesis is as follows:

“This assignment intends to provide visualization and semantic analysis of event log files. The output should provide the visualization of quantitative data and ontological basis, which allows for the easy selection of data and maintenance of metrics.”

The assignment can be divided into two main sub-assignments: the first sub-assignment aims to develop a new generic plug-in for the ProM framework that analyzes any kind of log file. The second sub-assignment provides a new plug-in to be delivered to PH. Each of these plug-ins requires a data pre-processing part in order to add semantic information to log files and to convert these log files to a format that can be read by the ProM framework. This pre-processing phase will be explained in detail in Chapter 5 for PH event log files. In order to explain how semantic information is added to log files, the related semantic work (Semantic Process Mining, Semantic Web technologies) will be explained in detail before the pre-processing phase. Figure 3.1 gives an overview of the structure of the remainder of this thesis.

Figure 3.1: Outline of the thesis

Chapter 4

Semantic Process Mining

This chapter gives the foundation for this master thesis. Research areas and related work done in these areas are presented to give the technical background for this project. Semantic Process Mining (SPM) and related semantic technologies are discussed in Section 4.1 and Section 4.2, respectively. The last section provides information about the existing semantic plug-ins that are used in this master project.

4.1 Semantic Process Mining

Semantic Business Process Management (SBPM) is a recent and promising research area that aims to overcome the Business-IT gap and to achieve a higher degree of automation in BPM by making use of semantic technologies. Similar to how Semantic Web Services achieve more automation in discovery and mediation compared to conventional Web services, SBPM should achieve more automation in the process modeling, implementation, execution and monitoring phases by using Semantic Web technologies and so-called ontologies, i.e. explicit formal specifications of a domain, which are explained in detail in Section 4.2.1 [44].

In a nutshell, the use of Semantic Web technologies, in particular ontologies, offers a suitable framework to improve the BPM life-cycle for the following reasons [33]:

• The fundamental approach is to represent both the business perspective and the IT perspective of enterprises using a set of ontologies, and to use machine reasoning for carrying out or supporting the translation tasks between the two worlds.

• Ontologies are particularly well-suited for defining shared conceptualizations in order to support the integration of heterogeneous and interorganisational sources of information.

• Ontologies are not simply used to represent conceptualizations in the style of a data model, but to embody relationships and inferences over such conceptualizations, an important task to relate business and IT views of processes ontologically.

• Having a formal definition, ontologies are amenable to automated reasoning, providing the flexibility required for navigating through different levels of abstraction and querying the overall body of knowledge about the business processes.

• Ontologies make it possible to properly correlate the data between the business and IT worlds at runtime. Automating this, as necessary for what is commonly referred to as Business Process Intelligence, requires semantic information that spans these layers of abstraction and that should be easily retrievable from audit trails.

• The ontologies are part of an extensive formalization of the BPM domain and therefore allow accessing the whole body of knowledge about processes, organizations or IT systems in order to support making queries at different levels of abstraction.

As already mentioned in Section 3.1, the idea of using semantic technologies to support all phases of the BPM life-cycle was developed in the context of the European project SUPER [3]. SUPER “aims to provide a semantic-based and context-aware framework, based on Semantic Web Services technology that acquires, organizes, shares and uses the knowledge embedded in business processes within existing IT systems and software, and within employees heads, in order to make companies more adaptive” [3].

The use of semantic technologies does not affect the main phases of the BPM life-cycle, but the degree of automation within the phases is increased and existing functionalities can be extended. Semantic Process Mining (SPM), which is one of the features of the Semantic Business Process Analysis (SBPA) phase, applies the existing process mining approach while using and reasoning over the semantics of event log files. The basic elements of SPM are ontologies (explained in Section 4.2.1) and ontology reasoners (Section 4.2.5). An ontology, in short, can be described as an explicit formal specification of a domain [24]. In this domain, the set of shared concepts necessary for the analysis can be defined and their relationships and properties can be formalized. The ontology reasoner infers “new” (i.e. not explicitly stated) information over the ontologies [18].

Figure 4.1 gives an overview of SPM. As previously stated in Section 2.4, the current PM techniques in the discovery, conformance and extension classes are already quite powerful. However, the analysis they support is somewhat limited because it is purely syntactic, i.e. based on labels in logs [20]. In other words, existing PM techniques are “unable to reason over the concepts behind the labels in the log, thus the actual semantics behind these labels remain in the head of the business analyst which has to interpret them” [19]. By using concepts in ontologies to reference the elements of log files and by deriving new knowledge from these ontologies with reasoners, the level of abstraction in SPM techniques is raised from the syntactical level to the semantical level.

Figure 4.1: Overview of Semantic Process Mining [19]

The use of semantics can improve existing PM techniques in the following manner [19]:

• Process Discovery: Currently, these techniques mainly discover a flat model showing all the tasks encountered in the log. Consequently, a single large model is shown without any hierarchy or structuring. However, if the tasks in these instances would link to concepts in ontologies, subsumption relations over these ontologies could be used to aggregate tasks and, therefore, mine hierarchical process models supporting different levels of abstraction.

• Conformance Checking: Techniques of this class require an exact match between the elements (or strings) in the log and the corresponding elements in the models. As a consequence, many defined models cannot be reused over different logs because these logs do not contain the same strings as the elements in the models. When ontologies are used, these models can be defined over concepts and, as far as the elements in different logs link to the same concepts (or super/sub concepts of these concepts), the conformance can be assessed without requiring any modification of the models or the logs.

• Extension: These techniques enhance models based on information mined from event logs. Like the conformance checking techniques, the enhancements are only possible with an exact match between elements in models and logs. Thus, the use of ontologies would bring this match to the concept level and, therefore, models could also be extended based on different logs.
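
The aggregation of tasks through subsumption relations, mentioned under Process Discovery above, can be sketched in a few lines of Python. The concept names and the hierarchy are hypothetical, and a dictionary lookup stands in for the ontology reasoner that the ProM semantic plug-ins actually use.

```python
# Hypothetical ontology: each concept maps to its direct super-concept.
SUPER = {
    "Fluoroscopy": "XRayRun",
    "Exposure": "XRayRun",
    "XRayRun": "SystemActivity",
    "TableMove": "SystemActivity",
}


def ancestors(concept):
    """Return the concept followed by all of its super-concepts."""
    chain = [concept]
    while chain[-1] in SUPER:
        chain.append(SUPER[chain[-1]])
    return chain


def is_a(concept, target):
    """Subsumption check: is `concept` equal to or below `target`?"""
    return target in ancestors(concept)


def lift(trace, level):
    """Replace every task that is subsumed by `level` with `level` itself."""
    return [level if is_a(task, level) else task for task in trace]


trace = ["Fluoroscopy", "TableMove", "Exposure"]
coarse = lift(trace, "XRayRun")
```

Lifting the same trace to "SystemActivity" would collapse all three tasks into one concept, which corresponds to viewing the process at a higher level of abstraction.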

Some of the ideas given above have already been implemented and are currently available in ProM. Semantic plug-ins can be found in ProM 5.0 or higher versions, which can be downloaded from the ProM website [10]. These plug-ins are explained in detail in deliverable D6.5 of the SUPER project, which can be accessed from the SUPER website [3].

4.1.1 Semantic Process Mining for PH

This master project is the first attempt to use SPM techniques for PH data. An overview of SPM for PH processes is given in Figure 4.2. X-ray machines of PH are being used in hospitals all around the world to support people. The usage of the Allura Xper systems is represented by the world box, the Allura Xper systems themselves are represented by the information system box, and the event logs that are generated by the systems are represented by the event logs box. Through PRS and RADAR, the event logs can be downloaded from the Allura Xper systems. These event logs are the enablers for PM techniques for PH, and PM techniques can be used to analyze the usage information of Allura Xper systems. The new plug-in that will be used for the analysis is represented by the (process) model box in Figure 4.2.

In this project, the current PM situation at PH is extended by using Semantic Web technologies. Semantic information about the X-Ray domain is added to the event logs in the pre-processing phase (described in Chapter 5) by using ontologies and ontology reasoners. For the discovery and analysis of X-ray runs, a tailored version of the Semantic Dotted Chart is used at PH. This tailored plug-in also uses ontologies and ontology reasoners to mine the semantic information of the X-ray domain.

4.2 Semantic Technologies

The Semantic Web is a collaborative effort led by the World Wide Web Consortium (W3C) with participation from a large number of researchers and industrial partners over the last few years. The initiative was inspired by the vision of its founder, Tim Berners-Lee, of a more flexible, integrated, automatic and self-adapting Web, providing a richer and more interactive experience for users. It is envisioned to be an extension of the current Web, in which information is annotated with meaning, in order to provide better machine-processability and automated reasoning. Such provision would enable better cooperation between computers and people [31]. As this is a novel research field, technologies and standards are still being developed to realize the Semantic Web.

These emerging technologies for the Semantic Web are discussed in this section. It is important to stress that it is not possible within the scope of this thesis to explain these technologies from every aspect. Thus, in the following sections, only the technologies that were used in this master project are discussed in detail.

Figure 4.2: An overview of SPM for PH

4.2.1 Ontologies

Although ontologies can be seen as one of the pillars of the Semantic Web, the term has a longer history than the Semantic Web. The term “ontology” originally comes from the field of philosophy, in which it refers to the subject of existence. In the last two decades, this term has been borrowed by, and has gained significant popularity in, computer science and information systems research.

According to the most popular definition, an ontology is a formal specification of a shared conceptualization [7]. A conceptualization is an abstract, simplified view of the world that we wish to represent for some purpose. Every knowledge base, knowledge-based system, or knowledge-level agent is committed to some conceptualization, explicitly or implicitly [26].

The popular definition of ontology has evolved into “an explicit, machine readable specification of a shared conceptualization” [39] in computer related research areas. One of the main intents of using ontologies in computer science and information systems research is to specify an abstract model (i.e. a conceptualization) of some domain, real or imagined. Conceptualizing a domain independently of its particular form brings interoperability between multiple representations of reality. This reality can be data, system and/or business process models residing inside computer systems.

Ontologies became an important part of the W3C standards stack for the Semantic Web, in which they are used to specify standard conceptual vocabularies in which to exchange data among systems, provide services for answering queries, publish reusable knowledge bases, and offer services to facilitate interoperability across multiple, heterogeneous systems and databases.

4.2.2 WSML Ontology Language

As described in the previous section, an ontology is an explicit specification of a conceptualization for a shared domain (i.e. definitions of classes, relations, functions, and other objects). This specification should be formulated in some representation language to allow the encoding of knowledge about that domain and reasoning that supports the processing of that knowledge. These representation languages, which are used to construct ontologies, are called ontology languages.

Several ontology languages have been developed during the last few years. Resource Description Framework (RDF) and Web Ontology Language (OWL) are just two examples of currently available ontology languages [2]. Any of these ontology languages can be used in the context of the Semantic Web to describe ontologies. However, the analysis and explanation of these languages are out of the scope of this master thesis. Therefore, only the ontology language of this master project, WSML, and its components used in this project will be explained briefly. More detailed information about the WSML ontology language can be found in Appendix C.

WSML is a concrete formal language based on the conceptual model of WSMO, an ontology-based framework that supports the deployment and interoperability of Semantic Web Services [38]. In a nutshell, WSMO has four main components [18]:

• Ontologies: provide formal and explicit specifications of the vocabulary used by the other modeling elements in WSMO.

• Goals: a mechanism for describing the requirements that a given service requester has when searching for services that meet these requirements.

• Mediators: are used as connectors between components to provide interoperability among them. Four kinds of mediators are built: ontology mediators, mediators between Web services, mediators between goals and mediators between Web services and goals.

• Web Services: are semantic descriptions of Web Services (i.e. functionalities accessible over the Web). They may include functional (Capability) and usage (Interface) descriptions.

4.2.3 Ontologies in WSML

An ontology in WSML consists of the elements concept, relation, instance and axiom. A namespace declaration can appear at the beginning of each WSML file. Such a declaration may comprise the default namespace and abbreviations for other namespaces. Additionally, an ontology can import other ontologies.

• Concept: The notion of concepts (sometimes also called classes) plays a central role in ontologies. Concepts can be abstract or concrete, elementary or composite, real or fictitious; in short, a concept can be anything about which something is said, and, therefore, could also be the description of a task, function, action, strategy, reasoning process, and so on. Taxonomies and hierarchies of the domain are created by using the keyword subConceptOf followed by one or more concept identifiers. A concept can have instances and can have a number of attributes associated with it. Each attribute definition can have a number of associated features, namely, transitivity, symmetry, reflexivity, and the inverse of an attribute, as well as minimal and maximal cardinality constraints. WSML allows inheritance of attribute definitions, which means that a concept inherits all attribute definitions of its superconcepts.

• Relation: Relations in WSML can be used to model interdependencies between several concepts (or between instances of these concepts). WSML allows the specification of relations with arbitrary arity, and their organization in a hierarchy using subRelationOf. As with concepts, the exact meaning of a relation can be defined using axioms, and it is recommended that related axioms are indicated using the annotation dc#relation.

• Instance: Instances represent elements in the domain attached to a specific concept. A concept may have a number of instances associated with it. The memberOf keyword identifies the concept to which the instance belongs. Instances explicitly specified in an ontology are those which are shared together as part of the ontology. However, most instance data exists outside the ontology in private data stores. It is not recommended to link an instance store to a WSML ontology from within the ontology itself. Such a link should be made outside the ontology definition, since an ontology is shared and can thus be used in combination with different instance stores.

• Axiom: Axioms provide a means to add arbitrary logical expressions to an ontology. Such logical expressions can be used for several purposes, such as constraining information, verifying correctness, or deducing new information. The need for axioms is application-dependent. Although they are not currently widely used, they are expected to become an important factor in Semantic Web applications, because new knowledge will be deduced when looking for information, inconsistencies will be detected when processing millions of Web pages, and so on [22].
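The interplay of concepts, instances and axioms described above can be illustrated with a minimal Python sketch. It is not WSML and does not use the WSML2Reasoner framework; it only mimics subConceptOf (attribute inheritance), memberOf, and one constraint-style axiom. Relations are left out for brevity. All concept and attribute names (Run, FluoroRun, hasDuration, hasPulseRate) are hypothetical and are not taken from the RunOntology.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept with its own attribute definitions and direct superconcepts."""
    name: str
    attributes: dict = field(default_factory=dict)       # attribute name -> range type
    super_concepts: list = field(default_factory=list)   # mirrors subConceptOf

    def all_attributes(self) -> dict:
        """Own attribute definitions plus those inherited from all superconcepts."""
        merged = {}
        for parent in self.super_concepts:
            merged.update(parent.all_attributes())
        merged.update(self.attributes)
        return merged


@dataclass
class Instance:
    """An instance attached to one concept (mirrors memberOf)."""
    name: str
    member_of: Concept
    values: dict = field(default_factory=dict)


def undeclared_attributes(instance: Instance) -> list:
    """Toy constraint axiom: every value must belong to a declared attribute."""
    declared = instance.member_of.all_attributes()
    return [attr for attr in instance.values if attr not in declared]


# Hypothetical concepts and instances, for illustration only.
run = Concept("Run", {"hasDuration": "decimal"})
fluoro_run = Concept("FluoroRun", {"hasPulseRate": "decimal"}, [run])
r1 = Instance("run1", fluoro_run, {"hasDuration": 4.2, "hasPulseRate": 15.0})
bad = Instance("run2", fluoro_run, {"hasColour": "red"})

print(sorted(fluoro_run.all_attributes()))  # ['hasDuration', 'hasPulseRate']
print(undeclared_attributes(r1))            # []
print(undeclared_attributes(bad))           # ['hasColour']
```

The sketch shows why inheritance matters for checking instances: run1 is valid only because FluoroRun inherits hasDuration from Run.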

4.2.4 Why Use WSML?

As mentioned previously, one of the deliverables of this project is a new plug-in for ProM. All other semantic plug-ins developed in ProM are based on WSML ontologies and, consequently, on the WSML2Reasoner framework, which is used to perform all the necessary reasoning over the ontologies. To avoid compatibility issues, a language from the WSML family therefore had to be chosen for this master project. WSML-Flight is the language chosen for representing ontologies in this master project, mainly because of its expressiveness and its efficient reasoning.

4.2.5 Reasoning

In computer science, reasoning is commonly understood as the process of inferring "new" (i.e. not explicitly stated) information about some domain of discourse from a given (formal) model of that domain [18]. When the model of an unknown domain is sufficiently rich, it is possible to act in the domain as if it were already known. Therefore, the reasoning mechanism allows computers to act in an informed manner in unknown domains and to be more flexible when they face situations that are not covered literally in the control program.

Using a formal language for the modeling of the domain is the key to the reasoning mechanism. Just as the Semantic Web is heterogeneous in its formal languages (as described in Section 4.2.2), it is also heterogeneous in its forms of reasoning. These diverse reasoning mechanisms make it possible to discover currently available Web services that are able to achieve a certain client goal, to automatically compose a complex Web service from a set of simpler Web services which are known to a Web service repository, or to check whether two or more Web services can successfully interact with each other (given knowledge about their behavior or communication model).

Since WSML is the chosen ontology language for this master project, the following subsection explains ontology reasoning with WSML, particularly with the WSML-Flight language, in detail.

4.2.5.1 Ontology Reasoning with WSML

Ontologies represent two different types of knowledge about a domain under consideration. The first one reflects general knowledge of schema and/or terminology (by making use of concept and relation elements) for the domain, whereas the second one describes knowledge based on facts for a specific situation (by using instance elements). By analogy with database systems, the schema knowledge of an ontology is similar to a database schema, while the specific knowledge corresponds to the actual data in a database. In principle, ontologies are generally used for the representation of terminological knowledge. Specific knowledge about data sources is added in an application-dependent manner. This makes it possible to use the same ontology for different applications.

Therefore, ontology reasoning can be used for two different purposes:

1. to derive terminological and schematic knowledge about a domain and find human mistakes (consistency checking).

2. to check a specific situation regarding a domain, which is similar to querying a database.

In this master project, both of these purposes of ontology reasoning are used. As the rule-based WSML-Flight language is chosen as the formal ontology language, the reasoning mechanism is layered on a Datalog engine [13]. The WSML2Reasoner framework [14] implements reasoning with WSML ontologies. The Integrated Rule Inference System (IRIS) [5] is a Datalog engine which works together with the WSML2Reasoner framework in order to support query answering for WSML-Flight. The WSML2Reasoner framework can translate an ontology description in WSML to predicates and Datalog rules. The reasoning mechanism of IRIS is based on deductive database algorithms such as semi-naive evaluation, dynamic filtering, and well-founded evaluation with alternating fixed point computation [18].
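To make the idea of semi-naive evaluation more concrete, the following Python sketch computes the transitive closure of a sub-concept relation and then answers a membership query against it. It is only an illustration of the general technique, not the IRIS implementation or its API. The function names and the concept names (PulsedFluo, Fluoroscopy, Exposure, Radiation) are hypothetical.

```python
def semi_naive_closure(direct):
    """Transitive closure of a binary relation, evaluated semi-naively.

    Datalog rules being evaluated:
        sub(X, Y) :- direct(X, Y).
        sub(X, Z) :- sub(X, Y), direct(Y, Z).
    In each round only the facts derived in the previous round (delta)
    are joined again, instead of re-joining all known facts.
    """
    total = set(direct)
    delta = set(direct)
    while delta:
        derived = {(x, z) for (x, y) in delta for (y2, z) in direct if y == y2}
        delta = derived - total   # keep only genuinely new facts
        total |= delta
    return total


def members(concept, sub, member_of):
    """Query: instances of a concept, including instances of its sub-concepts."""
    return {i for (i, c) in member_of if c == concept or (c, concept) in sub}


# Hypothetical schema knowledge (sub-concept facts) and instance knowledge.
direct = {
    ("PulsedFluo", "Fluoroscopy"),
    ("Fluoroscopy", "Radiation"),
    ("Exposure", "Radiation"),
}
member_of = {("run1", "PulsedFluo"), ("run2", "Exposure")}

sub = semi_naive_closure(direct)
print(("PulsedFluo", "Radiation") in sub)               # True
print(sorted(members("Radiation", sub, member_of)))     # ['run1', 'run2']
print(sorted(members("Fluoroscopy", sub, member_of)))   # ['run1']
```

The closure corresponds to the first purpose of reasoning (deriving schematic knowledge), while the members query corresponds to the second (checking a specific situation, similar to a database query).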

4.3 Semantic Process Mining in ProM

As indicated in deliverable D6.5 of the SUPER project [3], seven SPM algorithms have been added to the ProM tool so far. To support the use of semantic information within the ProM framework, the input log file format of ProM has been extended with semantic annotations; this log file format is explained in Section 4.3.1. Section 4.3.2 discusses the semantic plug-ins that are relevant to the Semantic Dotted Chart.

4.3.1 Input Log File

As already mentioned, the pre-processing phase consists of preparing the data for the rest of the SPM process. After the preparation of the ontology, which is used for the semantic linking of data, the raw data should be converted to a format to which the SPM techniques can be applied. The ProM tool has a common format, called Mining XML (MXML), which allows the representation of the different elements in event logs. The MXML format started as an initiative to share a common input format among different mining tools. This way, event logs could be shared among different mining tools [20].

Figure 4.3: The visual description of the schema for the MXML format [20]

As shown in Figure 4.3, an event log (element WorkflowLog) contains the execution of one or more processes (element Process), and optional information about the source program that generated the log (element Source) and additional data elements (element Data). Every process (element Process) has zero or more cases or process instances (element ProcessInstance). Similarly, every process instance has zero or more tasks (element AuditTrailEntry). Every task or audit trail entry must have at least a name (element WorkflowModelElement) and an event type (element EventType). The event type determines the state of the corresponding task. There are 13 supported event types: schedule, assign, reassign, start, resume, suspend, autoskip, manualskip, withdraw, complete, ate_abort, pi_abort and unknown. The other task elements are optional. The Timestamp element supports the logging of time for the task. The Originator element records the person/system that performed the task. The Data element allows for the logging of additional information [20].
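The nesting of MXML elements described above can be sketched with Python's standard xml.etree.ElementTree module. This is a simplified illustration and not a schema-complete MXML writer: the helper names audit_trail_entry and build_log, the use of an id attribute on Process and ProcessInstance, and the sample task name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The 13 event types supported by MXML.
EVENT_TYPES = {
    "schedule", "assign", "reassign", "start", "resume", "suspend",
    "autoskip", "manualskip", "withdraw", "complete",
    "ate_abort", "pi_abort", "unknown",
}


def audit_trail_entry(name, event_type, timestamp=None, originator=None):
    """Build one AuditTrailEntry; name and event type are mandatory."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")
    entry = ET.Element("AuditTrailEntry")
    ET.SubElement(entry, "WorkflowModelElement").text = name
    ET.SubElement(entry, "EventType").text = event_type
    if timestamp is not None:                      # optional element
        ET.SubElement(entry, "Timestamp").text = timestamp
    if originator is not None:                     # optional element
        ET.SubElement(entry, "Originator").text = originator
    return entry


def build_log(process_id, cases):
    """cases maps a case id to a list of AuditTrailEntry elements."""
    log = ET.Element("WorkflowLog")
    process = ET.SubElement(log, "Process", id=process_id)
    for case_id, entries in cases.items():
        instance = ET.SubElement(process, "ProcessInstance", id=case_id)
        instance.extend(entries)
    return log


# Hypothetical case with a single task that is started and completed.
log = build_log("XRayUsage", {
    "case-1": [
        audit_trail_entry("Fluoroscopy", "start", "2010-05-17T09:15:10"),
        audit_trail_entry("Fluoroscopy", "complete", "2010-05-17T09:15:42", "system"),
    ],
})
print(ET.tostring(log, encoding="unicode"))
```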

However, as described above, the MXML format is purely syntactic. It is not possible to reason over the concepts behind the labels in the MXML format because this semantic notion is simply not there [21]. In order to link the semantic information in ontologies to log files, a new format was needed. The MXML format has been extended in such a way that elements in the log files can be linked to and annotated with concepts in ontologies. This new format is called Semantically Annotated MXML (SA-MXML). Figure 4.4 illustrates an example of an SA-MXML log file. In this format, all elements (except for AuditTrailEntry and Timestamp) have an optional extra attribute called modelReference. This attribute links to a list of concepts in ontologies and, therefore, supports the necessary model references for SPM. The concepts are expressed as URIs and the elements in the list are separated by blank spaces. SA-MXML thus provides the necessary support to capture the correspondence between labels in logs and concepts in ontologies. Furthermore, because the SA-MXML format is backwards compatible with the MXML format, PM techniques that do not yet support semantic annotations can also be applied directly to SA-MXML logs [20].
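As a small illustration of the modelReference attribute, the following Python sketch reads the attribute from each element of an audit trail entry and splits it on blank spaces into a list of concept URIs. The sample fragment and the example.org URIs are invented for this example; only the attribute name and the blank-separated URI list follow the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical SA-MXML fragment; the ontology URIs are placeholders.
SAMPLE = """
<AuditTrailEntry>
  <WorkflowModelElement modelReference="http://example.org/RunOntology#Fluoroscopy http://example.org/RunOntology#Radiation">Pulsed Fluo</WorkflowModelElement>
  <EventType>complete</EventType>
  <Originator modelReference="http://example.org/RunOntology#System">system</Originator>
</AuditTrailEntry>
"""


def model_references(element):
    """Return the concept URIs an element is annotated with (possibly none)."""
    return element.get("modelReference", "").split()


entry = ET.fromstring(SAMPLE.strip())
for child in entry:
    print(child.tag, model_references(child))
```

Elements without the attribute simply yield an empty list, which reflects the backwards compatibility with plain MXML: a tool that ignores modelReference still sees a valid audit trail entry.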

As already mentioned in Section 2.3, logs from all kinds of popular information systems can easily be converted to MXML-formatted logs by the open-source ProMimport framework. Thanks to its pluggable environment, plug-ins can easily be added to and/or removed from the framework. In order to convert an event log file to the described SA-MXML format for further SPM analysis, it is possible to implement a new plug-in and add it to the ProMimport framework.

Figure 4.4: The visualization of an SA-MXML file

4.3.2 Semantic Plug-ins of ProM

Although there are seven plug-ins making use of semantic information in ProM, the following three plug-ins are particularly helpful for this master project.

4.3.2.1 Ontology Summary

As explained in SUPER deliverable D6.5, the Ontology Summary plug-in shows a projected view of an ontology [3]. This view consists of the concepts and corresponding instances within an SA-MXML log file. Hierarchical relations between concepts (i.e. sub/super-concepts) are also derived by using the reasoner.

Figure 4.5: Screenshot of the Ontology Summary plug-in

Figure 4.5 illustrates a portion of the ontology built for PH and discussed in Chapter 5. The left side of the screen shows all the ontology names occurring in the log file, and the ontology view can be changed via the "Update Graph" button. The center of the figure shows the projected view of the selected ontology. The rectangle contains the ontology name, i.e. RunOntology. The ellipses contain the names of concepts in the ontology. A black arrow shows a relation from a sub-concept to a matching super-concept. Black arrows pointing directly to the ontology name (i.e. the rectangle) indicate the concepts that are not sub-concepts of any other concept in the ontology. A gray arrow points from a direct instance to its concept. Please note that, for readability purposes, this screenshot is taken from a log file containing only one ProcessInstance. Hence, only one direct instance exists for each concept.

This plug-in has been modified and integrated into the Semantic Dotted Chart in order to visualize the ontology. More detailed information is given in Section 6.3.

4.3.2.2 Ontology Abstraction Filter

The Ontology Abstraction Filter plug-in supports the mining of process models at different levels of abstraction. The desired level of abstraction is determined by selecting or deselecting concepts linked to events (the actual instances of these concepts) in logs. The selection is based on a view of the ontology that shows only the concepts (and all of their super-concepts) that are referenced in a log.
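The idea of abstraction by concept selection can be sketched in Python as follows. This is an illustrative approximation and not the code of the ProM plug-in: each event is represented only by the concept it is linked to, and it is replaced by the nearest selected concept among itself and its super-concepts. Dropping events that have no selected concept is an assumption of this sketch, and all concept names are hypothetical.

```python
from collections import deque


def lift(concept, selected, parents):
    """Return the nearest selected concept at or above `concept`, else None."""
    queue = deque([concept])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in selected:
            return current
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))   # walk up to the super-concepts
    return None


def abstract_trace(trace, selected, parents):
    """Map every event concept to its selected ancestor; drop unmatched events."""
    lifted = (lift(concept, selected, parents) for concept in trace)
    return [concept for concept in lifted if concept is not None]


# Hypothetical hierarchy: concept -> list of direct super-concepts.
parents = {
    "PulsedFluo": ["Fluoroscopy"],
    "Fluoroscopy": ["Radiation"],
    "SingleShot": ["Exposure"],
    "Exposure": ["Radiation"],
}
trace = ["PulsedFluo", "SingleShot", "TableMove"]

print(abstract_trace(trace, {"Fluoroscopy", "Exposure"}, parents))
# ['Fluoroscopy', 'Exposure']
print(abstract_trace(trace, {"Radiation"}, parents))
# ['Radiation', 'Radiation']
```

Selecting concepts higher in the hierarchy yields a coarser trace, which is the effect the plug-in uses to mine process models at different levels of abstraction.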

Figure 4.6: Screenshot of the Ontology Abstraction Filter plug-in

Figure 4.6 shows a screenshot of this plug-in. The idea of showing different levels of abstraction is implemented in the Semantic Dotted Chart plug-in.

4.3.2.3 Performance Metrics in Ontologies

The Performance Metrics in Ontologies plug-in provides feedback about (i) the processing times of events and (ii) throughput times of process executions. Moreover, this plug-in also shows how frequently instances of a given concept have been performed.
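A rough Python sketch of such metrics is given below. It is not the plug-in's implementation: it assumes that the processing time of a task is the time between its start and complete events within the same case, and that each event has already been resolved to a single concept. The tuple layout, the function name concept_metrics, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def concept_metrics(events):
    """Frequency and mean processing time (in seconds) per concept.

    `events` is an iterable of (case_id, concept, event_type, iso_timestamp)
    tuples in log order. Processing time is assumed to be the time between
    a start event and the next complete event of the same case and concept.
    """
    open_starts = {}
    durations = defaultdict(list)
    for case_id, concept, event_type, stamp in events:
        moment = datetime.fromisoformat(stamp)
        key = (case_id, concept)
        if event_type == "start":
            open_starts[key] = moment
        elif event_type == "complete" and key in open_starts:
            elapsed = moment - open_starts.pop(key)
            durations[concept].append(elapsed.total_seconds())
    return {
        concept: {"frequency": len(values), "mean_seconds": sum(values) / len(values)}
        for concept, values in durations.items()
    }


# Hypothetical events for two cases.
events = [
    ("case-1", "Fluoroscopy", "start", "2010-05-17T10:00:00"),
    ("case-1", "Fluoroscopy", "complete", "2010-05-17T10:00:30"),
    ("case-1", "Exposure", "start", "2010-05-17T10:01:00"),
    ("case-1", "Exposure", "complete", "2010-05-17T10:01:10"),
    ("case-2", "Fluoroscopy", "start", "2010-05-17T11:00:00"),
    ("case-2", "Fluoroscopy", "complete", "2010-05-17T11:00:10"),
]
print(concept_metrics(events))
# {'Fluoroscopy': {'frequency': 2, 'mean_seconds': 20.0},
#  'Exposure': {'frequency': 1, 'mean_seconds': 10.0}}
```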

Figure 4.7: Screenshot of the Performance Metrics in Ontologies plug-in

Figure 4.7 shows a screenshot of the plug-in. This figure illustrates the processing times of users for selected task concepts. The right-side panel shows the time metrics and how frequently a given concept has been involved in the process. In the middle of the screen, concepts are colored according to these processing times. This coloring mechanism is applied in the PH plug-in to indicate the values of concepts.

Chapter 5

Pre-processing

The pre-processing phase focuses on checking, converting and preparing the data for the rest of the SPM process. One of the deliverables of this master thesis is a new ProMimport plug-in that converts PH event log files to the SA-MXML format for further analysis. In this chapter, this deliverable is discussed in detail.

5.1 ProMimport RunInfo Plug-in

As already shown in Figure 2.3, PM techniques are already applied to PH processes by means of the ProMise tool at PH. For the X-ray data logged by the RADAR system, a pre-processing and data conversion phase is required, which is realized by means of a custom-built converter plug-in for the ProMimport framework. Figure 5.1 shows a screenshot of the framework. The left side of the figure shows the different conversion plug-ins available (e.g. PeopleSoft, CPN Tools, etc.). The center of the figure shows a dialog with the properties of the selected plug-in. Note that Figure 5.1 shows the properties of the import plug-in developed in the context of this thesis.

The first two properties are required inputs for the plug-in. The LogDirectory property is used to set the location of the event log files that will be converted. The second property is used to specify the ontology file that will be used for the conversion. The other properties are used for filtering the event logs. The following two sections describe the required inputs of this plug-in in detail. Section 5.4 and Section 5.5 give, respectively, the mappings of PH event logs to the MXML structure and the implementation details of the plug-in.

Figure 5.1: Screenshot of ProMimport RunInfo Plug-in

5.2 Event Logs of Philips

As already discussed in Section 2.1.5, Allura Xper systems throughout the whole world are connected to the PRS. Via the PRS network, event logs are downloaded to PH. Using the RADAR system, the event logs are converted to an XML format and stored in an internal database of RADAR. These XML event logs are one of the inputs of the new ProMimport plug-in, which converts Philips data to the MXML format for further analysis within the ProM tool.

All systems that are connected to the PRS network produce log files, which record every operation of a system and are reported on a daily basis. Each monitored system has a unique PRS identification number. Each system has a directory with a log file, Log.xml, for each day on which the system logged any operations. Figure 5.2 shows a Unified Modeling Language (UML) class diagram of the XML structure of an event log file. Each system can create one or more XML log files, which in turn consist of one or more LogEntry XML elements. Each log file has the same XML data structure. This is shown in the UML model as the classes System, XMLFile, LogEntry, and their relationships.

Each LogEntry has common elements such as Unit, Date, Time, SystemMode, Index, LogID, EventID, Severity, Module, SourceFile, LineNumber, and Thread. Except for the Date and Time elements, these elements do not contain information about the event itself; they are included for other reasons, for example for debugging purposes. Therefore, these elements are not included in Figure 5.2 and are not used in the conversion and analysis in the remainder of this thesis.

Figure 5.2: UML Model of XML input [29]

A LogEntry element can be one of three different types:

• Information LogEntry: This entry consists of all mentioned common elements with the addition of the Description element. The Description element gives a natural language description of the logged event. An Information LogEntry can contain various types of information, such as the temperature of a system.

• Command LogEntry: The Command LogEntry is logged when a command is given to the system, either by the user or by the system itself. Figure 5.3 shows an example of this type of LogEntry. A command always has a Name element, which represents the name of that command. In some cases, a command also has a Params element, which gives some more information about the logged command.

• AcquiredRunInfo LogEntry: The AcquiredRunInfo LogEntry is logged after radiation (i.e. fluoroscopy or exposure) is done. An example is shown in Figure 5.4. This entry gives information about the last radiation run. The AcquiredRunInfo LogEntry consists of two elements: CVXRayRecord, which consists of specific X-Ray data about the last run, and CVGeometry, which consists of specific geometry data about the last run.
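The distinction between the three LogEntry types can be sketched as a simple classification over the elements named above. The Log.xml fragment below is invented for illustration: the actual nesting, wrapper elements, and values in the Philips log files are not reproduced here and are assumptions of this sketch, as is the classify helper itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical Log.xml fragment; the real element nesting is assumed.
SAMPLE_LOG = """
<Log>
  <LogEntry>
    <Date>2010-05-17</Date><Time>09:15:02</Time>
    <Description>Tube temperature 41 C</Description>
  </LogEntry>
  <LogEntry>
    <Date>2010-05-17</Date><Time>09:15:10</Time>
    <Command><Name>StartFluoroscopy</Name><Params>mode=low</Params></Command>
  </LogEntry>
  <LogEntry>
    <Date>2010-05-17</Date><Time>09:15:42</Time>
    <AcquiredRunInfo><CVXRayRecord/><CVGeometry/></AcquiredRunInfo>
  </LogEntry>
</Log>
"""


def classify(entry):
    """Classify a LogEntry by the type-specific elements it contains."""
    has_run_data = (entry.find(".//CVXRayRecord") is not None
                    or entry.find(".//CVGeometry") is not None)
    if has_run_data:
        return "AcquiredRunInfo"
    if entry.find(".//Name") is not None:
        return "Command"
    if entry.find(".//Description") is not None:
        return "Information"
    return "Unknown"


root = ET.fromstring(SAMPLE_LOG.strip())
kinds = [classify(entry) for entry in root.iter("LogEntry")]
print(kinds)  # ['Information', 'Command', 'AcquiredRunInfo']
```

A converter would typically dispatch on such a classification, since only AcquiredRunInfo entries carry the X-Ray run data, while Information and Command entries describe the flow around a run.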

It is important to note that, depending on the radiation technique, as described in Appendix B, different hardware and software components of the X-Ray machine are used. This yields changes in the AcquiredRunInfo LogEntry elements. As an example, the Pulsed Fluo technique (see Figure 5.4) generates Last kV parameters, whereas the TestShot Lockin Multiphase technique (see Figure 5.5) generates both Test kV and Result kV parameters, and no Last kV parameters.

Figure 5.3: An example of an Information LogEntry

The usage behaviour of X-Ray runs can be extracted from AcquiredRunInfo entries. Information and Command entries are used to construct the flow of radiation runs by using the description tags and name tags of the given commands. The mappings are discussed in detail in Section 5.4.

5.3 Ontology Construction for Philips

In order to specify the semantic information of event logs, an ontology, namely RunOntology, is created with the purpose of improving data quality and enabling SPM. Ontological engineering is a new field of study concerning the ontology development process, the ontology life cycle, the methods and methodologies for building ontologies, and the tool suites and languages that support them [23]. Many methodologies and good practices have emerged from the experiences of researchers and practitioners in order to facilitate this process. Examples of ontology development methodologies are the Uschold and King Methodology, the Grüninger and Fox Methodology, METHONTOLOGY, the On-To-Knowledge Methodology, DILIGENT, TOVE and Ontology Development 101 [23], [40], [32], [34], [27].

Figure 5.4: An example of an AcquiredRunInfo LogEntry with Pulsed Fluo technique

Figure 5.5: An example of an AcquiredRunInfo LogEntry with TestShot Lockin Multiphase technique

The RunOntology is built following the iterative steps that are defined in the Uschold and King Methodology for building ontologies [40]:

• Identify Purpose and Scope: In this phase, the purpose and coverage of the RunOntology were determined.

– Domain of the ontology: After several meetings with domain experts, the scope of the project as well as the scope of the ontology was limited to the CVXRayRecord elements and the ApplicationName and ProcedureName elements of the AcquiredRunInfo LogEntry. The descriptions of these elements can be found in Appendix B. The elements of the AcquiredRunInfo tag that are used can be seen in Figure 5.4.

– Intended Usages: The constructed RunOntology will first be used within PH by the new analysis and visualization tool, which is the output of this master project.

• Building the Ontology: In this phase, the actual construction of the ontology took place. Since this ontology is the first attempt of PH to use Semantic Technologies in its software tools, no existing ontology regarding the run information could be found. Therefore, the RunOntology is built from scratch, although existing WSML ontologies have been reused. Since this is the main phase of the ontology construction methodology, the related steps are explained in detail in Subsection 5.3.1.

– Ontology Capture: Key concepts and their relationships were identified with the domain experts in this phase.

– Ontology Coding: The explicit representation of the conceptualization in the WSML language took place in this phase.

– Integrating Existing Ontologies: Existing WSML ontologies were found and merged with the RunOntology.

• Evaluation: The validation of the RunOntology was done with automated tools and expert opinions.

• Documentation: The natural text definitions of each concept and relation of the RunOntology were given in this phase.

5.3.1 Building the RunOntology

The initial step in building an ontology is the Ontology Capture step, in which concepts and their relationships are identified. As already mentioned above and shown in Figure 5.4, the CVXRayRecord elements and the ApplicationName and ProcedureName elements of the CVGeometry tag form the key concepts of the RunOntology. These key concepts are created with the same names as used in the Log.xml files. However, since these elements are not insightful enough and the analysis tool should be able to support the analysts of PH in interpreting the usage of the tube, the need arose to create new concepts from the AcquiredRunInfo LogEntry elements. Therefore, new concepts and relations are added to the RunOntology.

These new concepts, whose corresponding elements do not occur in event logs, are created from other key concepts in relation to user-defined metrics. These metrics are implemented as logical expressions in axioms in the RunOntology, and a new ontology-specific rule is introduced for the implementation.

Rule I: ”If the
