Using i* modeling for the multidimensional design of data warehouses

LUCENTIA Research Group
Department of Software and Computing Systems

Jose-Norberto Mazón
Juan Trujillo

Toronto, 17th July 2008

Contents

• Introduction
• Current research
  • Requirements for DWs
  • Reconciling with data sources
  • Deriving logical representations
• Conclusions and short term research


Introduction: Research problem

• Data warehouse
  • Integrated collection of historical data in support of the decision making process
• Multidimensional (MD) modeling
  • Fact
    • Contains interesting measures of a business process
  • Dimension
    • Represents context of analysis
• Resembles traditional method for database design
  • Model at conceptual level
    • Abstracting details related to specific technologies
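The fact and dimension concepts can be made concrete with a minimal sketch. The class names `Fact` and `Dimension` mirror the slide's terms, but the fields, the `describe` helper and the sales example are assumptions introduced here for illustration; they are not the authors' metamodel.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Context of analysis, e.g. where or when something happened."""
    name: str
    levels: list[str] = field(default_factory=list)  # finest level first


@dataclass
class Fact:
    """Business process under analysis, with its measures."""
    name: str
    measures: list[str]
    dimensions: list[Dimension]


def describe(fact: Fact) -> str:
    """Render a fact as 'Name(measures) by dimensions'."""
    dims = ", ".join(d.name for d in fact.dimensions)
    return f"{fact.name}({', '.join(fact.measures)}) by {dims}"


# Hypothetical example: sales analysed by store and by date.
store = Dimension("Store", ["store", "city", "state"])
date = Dimension("Date", ["day", "month", "year"])
sales = Fact("Sales", ["quantity", "amount"], [store, date])

print(describe(sales))
```

The model stays at the conceptual level: nothing in it says whether the data ends up in relational tables or in a multidimensional engine.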

Introduction: Research problem

[Figure: DW architecture with the labels DATA SOURCES (INTERNAL, EXTERNAL), ETL, DATA WAREHOUSE, CUBES, OLAP, DATA MINING, REPORTS, WHAT-IF ANALYSIS and DECISION MAKERS; built up over five slides.]

- Integrated collection of historical data in support of decision makers
- Information needs cannot be understood by only analyzing data sources
- Decision making processes must be understood by designers

Introduction: Drawbacks of the state-of-the-art

• Only data sources are analyzed to define the conceptual MD model
  • Incorrect information needs may be modeled
• Requirements are specified once the conceptual MD model is defined (even after the deployment of the DW)
  • Incorrect MD elements may be modeled
• Requirements and data sources are not reconciled
  • Complex ETL processes to populate the DW
• Thus, the DW is not viewed as a valuable resource

Introduction: Novelty of our proposal

• 1. Explicit requirement analysis stage
  • Focus on decision making processes
  • Information requirements
• 2. Transformation to a conceptual MD model
  • Model Driven approach
  • MD model agrees with decision makers’ expectations
• 3. Reconcile requirement model with data sources
  • MD model satisfies decision makers’ needs
  • MD model agrees with data sources
    • Completeness
    • Faithfulness

Introduction: Objectives of our proposal

• Defining a goal-oriented approach for DWs
  • Based on i*
  • Model decision processes
    • Decision makers are concerned about GOALS, not directly DATA
• Traceability to a conceptual MD model
  • Align with MDA
• Integrate requirements and data sources

MDA

• Model Driven Architecture (MDA)
  • Object Management Group (OMG) standard
  • Using models in software development
    • Computation Independent Model (CIM)
    • Platform Independent Model (PIM)
    • Platform Specific Model (PSM)
  • Transformations between models
    • Query/View/Transformation language (QVT)
  • The code is obtained from PSMs

MDA

• Model Driven Architecture (MDA)

[Figure: MDA transformation chain. A CIM is transformed (T) into a PIM, the PIM into PSM 1 ... PSM N, and each PSM into code (CODE 1 ... CODE N).]

• CIM: describes user requirements
• PIM: contains information about functionality and structure of the system without taking into account the technology used to implement it
• PSM: includes information about the specific technology that is used in the implementation of the system on a specific platform
• Code: every PSM is transformed into code to be executed, obtaining the final software product

MDA

• Query/View/Transformation language (QVT)
  • Declarative part of QVT
  • Transformation: set of relations
  • Relations between metamodels formally defined and automatically performed
  • Relations applied to models

MDA

[Figure: graphical notation of a QVT relation R between two domains (MODEL 1 and MODEL 2), annotated with the candidate model, the metamodel name, the kind of relation, and the when and where clauses.]

Declarative approach of QVT specifies relationships that must hold between candidate models
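The idea of a relation that must hold between two candidate models can be sketched in Python. This is not QVT syntax and not the authors' relation set: the `Relation` class, the dictionary-based models, the element kinds and the three example relations are all assumptions made for illustration, and only a check-only reading of a relation is shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Relation:
    """A named constraint between elements of a source and a target model."""
    name: str
    source_kind: str   # kind of element matched in the source domain
    target_kind: str   # kind of element required in the target domain
    when: Callable[[dict], bool] = lambda element: True  # precondition (assumed form)

    def holds(self, source: list[dict], target: list[dict]) -> bool:
        """True if every matching source element has a same-named target element."""
        targets = {e["name"] for e in target if e["kind"] == self.target_kind}
        return all(
            e["name"] in targets
            for e in source
            if e["kind"] == self.source_kind and self.when(e)
        )


# Hypothetical candidate models: requirements on one side, MD model on the other.
requirements = [
    {"kind": "BusinessProcess", "name": "Sales"},
    {"kind": "Measure", "name": "quantity"},
    {"kind": "Context", "name": "Store"},
]
md_model = [
    {"kind": "Fact", "name": "Sales"},
    {"kind": "FactAttribute", "name": "quantity"},
    {"kind": "Dimension", "name": "Store"},
]
relations = [
    Relation("ProcessToFact", "BusinessProcess", "Fact"),
    Relation("MeasureToFactAttribute", "Measure", "FactAttribute"),
    Relation("ContextToDimension", "Context", "Dimension"),
]

violated = [r.name for r in relations if not r.holds(requirements, md_model)]
print(violated)
```

A QVT engine can also enforce a relation by creating the missing target elements; the sketch only reports which relations are violated.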

Introduction: Our proposal

[Figure: overview of the proposal, annotated with the related publications [REBNITA 2005], [RIGIM 2007], [ER 2006], [ER 2007], [DKE 2007], [DOLAP 2005], [DaWaK 2006] and [DSS 2008].]


Requirements for DWs

• Goal Oriented Requirement Engineering
  • DW supports the decision making process to fulfill goals of an organization
  • Decision makers are concerned about goals
    • Information requirements are obtained by refining decision makers’ goals
• MDA approach
  • Information requirements must be derived into a conceptual MD model

Requirements for DWs

• CIM
  • Goals and information requirements
• PIM
  • Conceptual MD model
• QVT
  • Transformation between models

Requirements for DWs: Defining a CIM

• Classification of DW goals
  • Strategic goals
    • Change to a better situation
  • Decision goals
    • Take appropriate actions
  • Information goals
    • Related to required information
• Information requirements
  • Interesting measures of business process
  • Context of analysis
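The refinement from strategic goals down to information requirements can be sketched as a small goal tree. The `Goal` class, its fields and the sales-manager example are hypothetical; they only illustrate the classification and are not the i* profile itself.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    kind: str   # "strategic", "decision" or "information"
    text: str
    subgoals: list["Goal"] = field(default_factory=list)
    measures: list[str] = field(default_factory=list)   # filled on information goals
    contexts: list[str] = field(default_factory=list)   # filled on information goals


def information_requirements(goal: Goal) -> tuple[list[str], list[str]]:
    """Collect the measures and contexts reachable from a goal, sorted and deduplicated."""
    measures, contexts = set(goal.measures), set(goal.contexts)
    for sub in goal.subgoals:
        sub_measures, sub_contexts = information_requirements(sub)
        measures.update(sub_measures)
        contexts.update(sub_contexts)
    return sorted(measures), sorted(contexts)


# Hypothetical goal tree for a sales manager.
tree = Goal("strategic", "Increase sales", subgoals=[
    Goal("decision", "Decide which promotions to run", subgoals=[
        Goal("information", "Analyse quantity sold per store and date",
             measures=["quantity"], contexts=["Store", "Date"]),
    ]),
])

print(information_requirements(tree))
```

The collected measures and contexts are the inputs a later transformation would turn into facts and dimensions.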

Requirements for DWs: Defining a CIM

• i* framework
  • Modeling goals of decision makers and the required tasks and resources to fulfil them
  • Several decision makers with different goals
• Two extensions of UML
  • Profile for i*
  • Profile for adapting i* to the DW domain

Requirements for DWs: Defining a CIM

[Figure: defining a CIM]

Requirements for DWs: Sample CIM

[Figures: sample CIM, built up over six slides]

Conceptual MD model

• UML Profile for MD modeling
  • Luján, Trujillo, Song. A UML profile for Multidimensional Modeling in Data Warehouses. Data and Knowledge Engineering. 2006.
• Class diagram

Stereotype            Icon
Fact                  [icon]
Dimension             [icon]
Base                  [icon]
FactAttribute         [icon]
DimensionAttribute    [icon]
Rolls-UpTo            <<Rolls-UpTo>>

Conceptual MD model

[Figure: conceptual MD model]

Conceptual MD model: Obtaining an initial PIM

[Figures: obtaining an initial PIM, two slides]

Conceptual MD model: Sample initial PIM

[Figure: sample initial PIM]

Reconciling with data sources

[Figure: reconciliation process, with the labels USER REQUIREMENTS, INITIAL PIM, DATA SOURCES, RECONCILIATION, PIM and PSM.]

Reconciling with data sources

• The MD conceptual model is reconciled with the available data sources
  • The DW will be properly populated from data sources
  • The analysis potential provided by the data sources is captured by the MD conceptual model
  • Redundancies are avoided
  • Optional dimension levels are controlled to enable summarizability and to avoid inconsistent queries
• Reconciliation process is automatically performed
  • QVT relations based on Multidimensional Normal Forms
    • Lechtenbörger and Vossen. Multidimensional normal forms for data warehouse design. Information Systems 28 (2003)

Reconciling with data sources

[Figure: QVT relation example with n_t1=district and n_t2=state: a <<Rolls-upTo>> association between the two levels, with multiplicity 1..n at role +d and 1 at role +r.]
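The district/state example suggests how a roll-up can be checked against the data sources. The sketch below is a strong simplification and not the multidimensional normal forms themselves: it assumes that the sources are summarised as single-attribute functional dependencies, that a roll-up is supported when the dependency follows from them, and that the names `closure`, `reconcile`, `source_fds` and `pim_rollups` are hypothetical.

```python
def closure(fds: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive closure of single-attribute functional dependencies."""
    closed = set(fds)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and a != d and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed


def reconcile(pim_rollups, source_fds):
    """Return (roll-ups the sources cannot support, source dependencies the PIM lacks)."""
    closed = closure(source_fds)
    unsupported = sorted(r for r in pim_rollups if r not in closed)
    levels_in_pim = {level for rollup in pim_rollups for level in rollup}
    candidates = sorted(
        (a, b) for a, b in source_fds
        if (a in levels_in_pim or b in levels_in_pim) and (a, b) not in pim_rollups
    )
    return unsupported, candidates


# Hypothetical input: dependencies found in the sources, roll-ups in the initial PIM.
source_fds = {("store", "district"), ("district", "state")}
pim_rollups = {("store", "state"), ("store", "region")}

unsupported, candidates = reconcile(pim_rollups, source_fds)
print(unsupported)
print(candidates)
```

Unsupported roll-ups point at requirements the sources cannot populate, while the candidate list hints at analysis potential in the sources that the initial PIM does not capture yet.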

Deriving logical representations

• PIM
  • UML profile for MD modeling [Luján et al. DKE 2006]
• PSM
  • Common Warehouse Metamodel (CWM)
• From PIM to each PSM
  • QVT transformation

Deriving logical representations

• Common Warehouse Metamodel (CWM)
  • Resource layer
    • Standard to represent the structure of data according to certain technologies
  • Relational metamodel
    • Tables, columns, primary keys, and so on
  • Multidimensional metamodel
    • Generic data structures
    • Vendor specific extension
      • Oracle Express extension
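A relational logical representation can be illustrated by deriving star-schema tables from a conceptual model. The sketch uses a plain dictionary as a stand-in for the PIM and a hand-written Python function instead of a QVT transformation over CWM; the function name `pim_to_relational`, the column types and the sample model are assumptions. SQLite is used only to confirm that the generated statements are valid.

```python
import sqlite3


def pim_to_relational(pim: dict) -> list[str]:
    """Derive star-schema DDL: one table per dimension plus one fact table."""
    statements = []
    for dim, attributes in pim["dimensions"].items():
        table = dim.lower()
        columns = [f"{table}_id INTEGER PRIMARY KEY"] + [f"{a} TEXT" for a in attributes]
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)})")
    # The fact table references every dimension and stores the measures.
    columns = [
        f"{d.lower()}_id INTEGER REFERENCES {d.lower()}({d.lower()}_id)"
        for d in pim["dimensions"]
    ]
    columns += [f"{m} NUMERIC" for m in pim["measures"]]
    statements.append(f"CREATE TABLE {pim['fact'].lower()} ({', '.join(columns)})")
    return statements


# Hypothetical conceptual model.
pim = {
    "fact": "Sales",
    "measures": ["quantity", "amount"],
    "dimensions": {"Store": ["city", "state"], "Customer": ["name"]},
}

ddl = pim_to_relational(pim)
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

A multidimensional PSM would be derived from the same conceptual model by a different mapping, which is the point of keeping the PIM technology independent.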


Conclusions: Objectives

• DW projects fail to support the decision making process
  • Requirement analysis stage is overlooked when defining a conceptual MD model
• Using i* framework together with MDA

Conclusions: Scientific contributions

• MDA framework
  • UML profile for i*
    • Extension for using i* in the DW domain
  • Transformations to obtain a conceptual MD model
  • Several kinds of logical representations
• Multidimensional normal forms
  • Reconciling data sources and requirements in a hybrid approach
• Eclipse-based prototype

Eclipse-based prototype

[Figure: Eclipse-based prototype]

Conclusions: Related work at LUCENTIA research group

[Figure: related work placed around the MDA layers CIM, PIM and PSM]

• UML Profile for MD Modeling at DKE 2006
• UML for Physical Modeling at JCIS 2006
• Common Warehouse Metamodel
• MDA [DKE 2007 & DSS 2008]
• Requirements for DWs [RIGiM 2007]
• Security [DSS 2006 & IS 2007]
• UML profile for Data mining [DKE 2007]
• Data sources analysis [ER 2007]

Short term research

• Studying unstructured decision processes in depth to model them in i* diagrams
• Taking advantage of every i* feature
• Considering complex mechanisms to reason about goals and structure decision processes
• Prioritization of goals
