a framework for building resilient data warehouses a

1
A Framework for Building Resilient Data Warehouses A Framework for Building Resilient Data Warehouses A Framework for Building Resilient Data Warehouses using a Mandala Topology Architecture using a Mandala Topology Architecture IA using a Mandala Topology Architecture IA Michel JACQUES Michel JACQUES Information Assembly SPRL, Brussels, Belgium Introduction 1.Users Perspectives & Interfaces 2.The Four Major Streams of Data Introduction 1.Users Perspectives & Interfaces 2.The Four Major Streams of Data Data Application: DW Platform is Usage-Driven Data Application: DW Platform is Usage-Driven Constructing ETL work for a DW project is a complex affair, one that Each user type has a different Source data is in an unrefined state (to Data Application: DW Platform is Usage-Driven Data Application: DW Platform is Usage-Driven Constructing ETL work for a DW project is a complex affair, one that requires planning. This framework uses a DW architecture based on a perspective and role towards the DW. Existing users include: data stewards, various degrees) that must have its data elements differentiated into a DW core Integration Integration requires planning. This framework uses a DW architecture based on a Mandala Topology . At its core is an agile, step-by-step approach to Existing users include: data stewards, analytical end-users, quality control elements differentiated into a DW core data model (IN) in order to able to META Integration META Integration Mandala Topology . At its core is an agile, step-by-step approach to identify ETL work units: analytical end-users, quality control masters, DW administrators & integration data model (IN) in order to able to recombine them effectively afterwards for META META identify ETL work units: 1.Identifying the external users masters, DW administrators & integration managers. Each user has his own set of information requirements and accesses recombine them effectively afterwards for analytics (OUT). During this transition, STAR STAR 1.Identifying the external users 2.Positioning main data flows, information requirements and accesses the DW via a specific interface (STAG, two other types of data elements are produced: metadata (UP) and erroneous STAR EDM STAG MART Data STAR EDM STAG MART Data 2.Positioning main data flows, 3.Decomposing a flow internal layers the DW via a specific interface (STAG, META, MART, or SINK). The STAR can produced: metadata (UP) and erroneous data (DOWN) streams. No data should Analytics Data Stewards Analytics Data Stewards 3.Decomposing a flow internal layers 4.Incorporating them within the DW application platform (Mandala) META, MART, or SINK). The STAR can only be accessed indirectly via the other data (DOWN) streams. No data should be lost in this closed system. Thus the SINK Analytics SINK Analytics 4.Incorporating them within the DW application platform (Mandala) 5.Extending conceptual EDM with a functional level of detail interfaces to ensure data integrity & security and prevents dependencies position & direction of a data stream determines its purpose, the means, and 5.Extending conceptual EDM with a functional level of detail 6.Classifying data model entities and their dependencies security and prevents dependencies caused by different user demands. The determines its purpose, the means, and its destination. This naturally leads to Quality Quality 6.Classifying data model entities and their dependencies 7.Combining the Functional ER model with Topic Areas & Tiers caused by different user demands. The users are part of an iterative feedback its destination. This naturally leads to asymmetries between streams, which Management: Quality Control Quality Control 7.Combining the Functional ER model with Topic Areas & Tiers 8.Enumerate the ETL work units in matrix format users are part of an iterative feedback loop improving future data content and asymmetries between streams, which must be accounted for in the ETL design.. Management: Intravenous Dextrose infusion with rates up to 250mls/hr of 20% Dextrose 8.Enumerate the ETL work units in matrix format The method focuses on the topological relationship between all the DW quality. Hence a multi-perspectives/faceted data warehouse architecture acting as a crossway between “Just about every [DW] process has side effects; but they can be deliberate and sustaining instead of unintentional and pernicious…and we can also be inspired by it to design some Intravenous Dextrose infusion with rates up to 250mls/hr of 20% Dextrose The method focuses on the topological relationship between all the DW artifacts, in order to comprehensive improve planning & design of DW . Hence a multi-perspectives/faceted data warehouse architecture acting as a crossway between these end-users is comprehensive and non-discriminatory at an organizational level. instead of unintentional and pernicious…and we can also be inspired by it to design some positive side effects to our own enterprises instead of focusing exclusively on a single end.” (p.80) Dietary intervention with frequent meals and corn starch artifacts, in order to comprehensive improve planning & design of DW . these end-users is comprehensive and non-discriminatory at an organizational level. positive side effects to our own enterprises instead of focusing exclusively on a single end.” (p.80) Ref: W. McDonough and M. Braungart, Craddle to Craddle, North Point Press, NY, 2002 Dietary intervention with frequent meals and corn starch Diazoxide intolerant leading to hyponatraemia, oedema and nausea 5.Functional ER Data Model 3.The Five ETL Layers & Chirality 4.DW Mandala Topology Architecture Octreotide/glucagon intravenously in order to replace counter-regulatory hormones 5.Functional ER Data Model 3.The Five ETL Layers & Chirality 4.DW Mandala Topology Architecture Data Application: Platform for 5 Architectural Layers Data Application: Platform for 5 Architectural Layers Enterprise Data Model: Conceptual Model Enterprise Data Model: Conceptual Model The Buddhist Mandala metaphor The conceptual model is The purpose of an ETL is to increase the Octreotide/glucagon intravenously in order to replace counter-regulatory hormones Data Application: Platform for 5 Architectural Layers Data Application: Platform for 5 Architectural Layers Enterprise Data Model: Conceptual Model Enterprise Data Model: Conceptual Model The Buddhist Mandala metaphor helps visualize an architecture The conceptual model is business-oriented, while the The purpose of an ETL is to increase the integration & analytical capability of Subcutaneous Octreotide - hypoglycaemia worsened 1 META META where topological relationships between DW components business-oriented, while the logical model is focused on integration & analytical capability of sourced data. An ETL data path consists of 5 Subcutaneous Octreotide - hypoglycaemia worsened 1 Authoring Authoring between DW components including: data modeling, ETL logical model is focused on content and application. The layers, each conducting a different set of data transformations. The first two “IN” stages Prednisolone developed fluid retention including: data modeling, ETL data flows, and surrounding functional data model, proposed herein, places itself in the gap transformations. The first two “IN” stages integrate the data to enable multiple Prednisolone developed fluid retention STAR STAR Trn Maps Trn Maps data flows, and surrounding actors & applications; functionally herein, places itself in the gap between the two. It extends the integrate the data to enable multiple interpretations, while the last two “OUT” Hepatic Arterial Embolisation (HAE) performed twice with initial improvement (post-procedure insulin 29 pmol/l), but relapsed after 4 weeks. STAR EDM STAG MART IN OUT STAR EDM STAG MART IN OUT interact. The architecture is similar to a road crossway that between the two. It extends the number of entity types from the interpretations, while the last two “OUT” stages specialise the data such that it Hepatic Arterial Embolisation (HAE) performed twice with initial improvement (post-procedure insulin 29 pmol/l), but relapsed after 4 weeks. EDM Extraction:::Staging:::::Integration::::Publishing:User Access EDM Extraction:::Staging:::::Integration::::Publishing:User Access Profiles Attendance Profiles Attendance similar to a road crossway that gives essential context and number of entity types from the initial fact and dimension with becomes fit-for-purpose. This mirror-effect is referred to as ETL chirality. Extraction:::Staging:::::Integration::::Publishing:User Access Extraction:::Staging:::::Integration::::Publishing:User Access Profiles Roster Profiles Roster gives essential context and movement to data and new types for holding hierarchies, dim. ids, details & associations. referred to as ETL chirality. The following are important elements when selecting the most appropriate data modelling SINK SINK Trainings Trainings operations. The context is composed of 5 distinct locations dim. ids, details & associations. This improves history-keeping The following are important elements when selecting the most appropriate data modelling technique to use: a) the degree of convergence built into the data during ETL; b) the number of SINK SINK Trainings Trainings composed of 5 distinct locations (STAG, META, SINK, MART, & This improves history-keeping and makes the core data model unique pathways in the dimensional model; c) increased data flow resilience by decreased data reliance; and d) ability for decoupling of model components. Erroneous Data Repo Erroneous Data Repo (STAG, META, SINK, MART, & STAR) that provide a clear more resilient while decoupling the associated data flows. reliance; and d) ability for decoupling of model components. Data disturbances will occur either from external sources in an unexpected, subtle or extreme Data Repo Data Repo STAR) that provide a clear logical structure determining There is a need to extend our “vocabulary of forms” when modelling. This involves adding the associated data flows. Data disturbances will occur either from external sources in an unexpected, subtle or extreme manner, which requires a faster data recovery by minimising data reload to only what is relevant. data, flows, modeling patterns, access and security, user groups, and integration methods. Moreover, locations enable data persistence facilitating data recovery and transformations. There is a need to extend our “vocabulary of forms” when modelling. This involves adding functional features so as to harmonise “form with function” and thus achieve a greater manner, which requires a faster data recovery by minimising data reload to only what is relevant. Resilience also involves cyclic transfer of information across data applications reinforcing each Moreover, locations enable data persistence facilitating data recovery and transformations. functional features so as to harmonise “form with function” and thus achieve a greater decoupling of DW artefacts, whilst maintaining data cohesion. system data quality and monitoring usage. decoupling of DW artefacts, whilst maintaining data cohesion. 8.Work Matrix from Method & Conclusion 7. Agile DW Meets Functional ER Model 6.Volatility of Functional Entities 8.Work Matrix from Method & Conclusion 7. Agile DW Meets Functional ER Model 6.Volatility of Functional Entities ETL Decomposition Process: Classes as Tiers Revisited ETL Decomposition Process: Classes as Tiers Revisited ETL Decomposition Process: Work Units and Modules ETL Decomposition Process: Work Units and Modules ETL Decomposition Process: Topic Areas & Tiers for EDM ETL Decomposition Process: Topic Areas & Tiers for EDM Development progress follows an The ETL decomposition process This functional data model ETL Decomposition Process: Classes as Tiers Revisited DEPENDENCY LAYERS as TIERS ETL Decomposition Process: Classes as Tiers Revisited DEPENDENCY LAYERS as TIERS DEPENDENCY LAYERS as TIERS ETL Decomposition Process: Work Units and Modules ETL Decomposition Process: Work Units and Modules ETL Decomposition Process: Topic Areas & Tiers for EDM EVALUATION_QUESTIONS3 TYPES_OF_NEEDS3 OTHER_CATALOG3 EVALUATION_ANSWERS3 ETL Decomposition Process: Topic Areas & Tiers for EDM EVALUATION_QUESTIONS3 TYPES_OF_NEEDS3 OTHER_CATALOG3 EVALUATION_ANSWERS3 Development progress follows an iterative and mostly top-down The ETL decomposition process requires an entity having both a This functional data model applies additional data entity DEPENDENCY LAYERS as TIERS TIERS DATA CLASSES DEPENDENCY LAYERS as TIERS TIERS DATA CLASSES DEPENDENCY LAYERS as TIERS TIERS DATA CLASSES 1,n 1,n 0,n 1,n TRAINING_NEEDS_ACTUAL EVALUATION_QUESTIONS3 PRIORITY SOLUTION <Undefined> <Undefined> OTHER_CATALOG3 EVALUATION_ANSWERS3 1,n 1,n 0,n 1,n TRAINING_NEEDS_ACTUAL EVALUATION_QUESTIONS3 PRIORITY SOLUTION <Undefined> <Undefined> OTHER_CATALOG3 EVALUATION_ANSWERS3 iterative and mostly top-down approach: one topic area, one class and a theme. For each entity the process allocates: a) a applies additional data entity classes giving it increased ETL flexibility. The classes follow a T00 Metadata (static) T00 Metadata (static) T00 Metadata (static) EVA-4 TNA-5 1,n TRAINING_NEEDS_ACTUAL TNA_NO_NEEDS_VAALUE TNA_SUGGESTIONS_DESC TNA_OBJECTIVE_DESC ... <Undefine <Undefine <Undefine TRAININGS_EVALUATIONS EVA-4 TNA-5 1,n TRAINING_NEEDS_ACTUAL TNA_NO_NEEDS_VAALUE TNA_SUGGESTIONS_DESC TNA_OBJECTIVE_DESC ... <Undefine <Undefine <Undefine TRAININGS_EVALUATIONS module, one tier at a time, although units within same tier can be entity the process allocates: a) a tier corresponding to a class; b) flexibility. The classes follow a step-wise approach whereby they T01 Dimensions T01 Dimensions T01 Dimensions TMA-2 EVA-4 ORGANI WORKFLOW TMA-2 EVA-4 ORGANI WORKFLOW units within same tier can be developed in parallel. Once there is tier corresponding to a class; b) a topic area corresponding to a step-wise approach whereby they become increasingly volatile T01 base, details & struct) Facts T01 base, details & struct) Facts T01 base, details & struct) Facts TRA-3 T1 SUB_TIME SUP_TIME 0,n 1,n SUB SUP sup sub TRAINING_MAP_ACTUAL TMA_DOSSIER_ID TMA_STATUS_DATE <U <U TIMES3 TIMES_H WORKFLOW_STATUS5 WFS_SID WFS_CODE <Undefined> <Undefined> ORGANI ROOMS3 TRA-3 T1 SUB_TIME SUP_TIME 0,n 1,n SUB SUP sup sub TRAINING_MAP_ACTUAL TMA_DOSSIER_ID TMA_STATUS_DATE <U <U TIMES3 TIMES_H WORKFLOW_STATUS5 WFS_SID WFS_CODE <Undefined> <Undefined> ORGANI ROOMS3 developed in parallel. Once there is a Reference Model (built prototype) a topic area corresponding to a theme; and c) a priority become increasingly volatile (data susceptibility to change). Facts (events & profiles) T02 Facts (events & profiles) T02 Facts (events & profiles) T02 T0 T1 T2 1,n 0,n 0,n 0,n TMA_STATUS_DATE TMA_VERSION_NO TMA_VERSION_START_DATE TMA_VERSION_END_DATE ... <U <U <U <U TIME_SID TIME_DATE <Undefin <Undefin TRAINING_MAP_EXERCISE3 TME_SID TME_NAME <Undefined> <Undefined> ORGANISATIONS3 ORG_SID ORG_CODE ... <Undefin <Undefin COURSE_DETAILS3 T0 T1 T2 1,n 0,n 0,n 0,n TMA_STATUS_DATE TMA_VERSION_NO TMA_VERSION_START_DATE TMA_VERSION_END_DATE ... <U <U <U <U TIME_SID TIME_DATE <Undefin <Undefin TRAINING_MAP_EXERCISE3 TME_SID TME_NAME <Undefined> <Undefined> ORGANISATIONS3 ORG_SID ORG_CODE ... <Undefin <Undefin COURSE_DETAILS3 a Reference Model (built prototype) for a Tier work unit, estimates for all corresponding to prevailing business needs. An entity defines Since data in lower classes is used to derive new data in upper T03 Grids (associations) T03 Grids (associations) T03 Grids (associations) T4 0,n 0,n 0,n Owner 1,n 0,n Details 0,n 1,n 1,n ... STATUTORIES3 COURSES3 COURSE_SID <Undefine TRAININGS_ATTENDANCE_PAG T4 0,n 0,n 0,n Owner 1,n 0,n Details 0,n 1,n 1,n ... STATUTORIES3 COURSES3 COURSE_SID <Undefine TRAININGS_ATTENDANCE_PAG units can be based on the reference model and adjusted according to business needs. An entity defines the work unit in which it is used to derive new data in upper classes, the data volatility in T04 Densification & Conformations T04 Densification & Conformations T04 Densification & Conformations Details 0,n Participant 1,n 1,n 1,n 1,n 1,n PARTICIPANTS3 PTCD_SID PTC_LAST_NAME ... <U <U PARTICIPANTS_PROFILES_ACTUAL PPA_ONE_VALUE PPA_VALID_START_DATE PPA_VALID_END_DATE PPA_ACTUAL_FLAG <Undefine <Undefine <Undefine <Undefine ADMIN_POSITIONS3 COURSE_SID COURSE_CODE <Undefine <Undefine PARTICIPANT_STATUS3 SESSION_DURATION SESSION_EQUIVALENT_VALUE SESSION_EVENT_DATE SESSION_VIRTUAL_FLAG ... < < < < Details 0,n Participant 1,n 1,n 1,n 1,n 1,n PARTICIPANTS3 PTCD_SID PTC_LAST_NAME ... <U <U PARTICIPANTS_PROFILES_ACTUAL PPA_ONE_VALUE PPA_VALID_START_DATE PPA_VALID_END_DATE PPA_ACTUAL_FLAG <Undefine <Undefine <Undefine <Undefine ADMIN_POSITIONS3 COURSE_SID COURSE_CODE <Undefine <Undefine PARTICIPANT_STATUS3 SESSION_DURATION SESSION_EQUIVALENT_VALUE SESSION_EVENT_DATE SESSION_VIRTUAL_FLAG ... < < < < model and adjusted according to variable difficulty factors. Once the work unit in which it is contained. A work unit is the classes, the data volatility in lower class will be less than that Data Marts T05 T04 (ballasting & mappings) Data Marts T05 T04 (ballasting & mappings) Data Marts T05 T04 (ballasting & mappings) T2 T1 T3 0,n 0,n Details 0,n 0,n 1,n 1,n PARTICIPANTS_DETAIL PPA_ACTUAL_FLAG <Undefine JOBS3 CATEGORIES_GRADES TRAININGS_ROSTER_PTG FINAL_EXAM_MARK INSCRIPTION_DATE ... <Undefined> <Undefined> T2 T1 T3 0,n 0,n Details 0,n 0,n 1,n 1,n PARTICIPANTS_DETAIL PPA_ACTUAL_FLAG <Undefine JOBS3 CATEGORIES_GRADES TRAININGS_ROSTER_PTG FINAL_EXAM_MARK INSCRIPTION_DATE ... <Undefined> <Undefined> variable difficulty factors. Once completed & tested, work units contained. A work unit is the smallest functional artefact lower class will be less than that in upper classes. Volatility Data Marts (lov, hier & msr) T06 T07 Data Marts (lov, hier & msr) T06 T07 Data Marts (lov, hier & msr) T06 T07 T1 0,n 0,n 1,n 1,n 1,n RESOURCE_CENTERS3 TRAININGS_ACTUAL COST NO_SESSION_VALUE <Undefined> <Undefined> CATEGORIES_GRADES CGR_SID CGR_CODE ... <Undefine <Undefine DOMAINS3 T1 0,n 0,n 1,n 1,n 1,n RESOURCE_CENTERS3 TRAININGS_ACTUAL COST NO_SESSION_VALUE <Undefined> <Undefined> CATEGORIES_GRADES CGR_SID CGR_CODE ... <Undefine <Undefine DOMAINS3 completed & tested, work units modules are then promoted to next determining granularity of resource allocation. Regrouping determines which data flows are performed in parallel or in T08 T09 Segments & KPI (compositions & formula) T08 T09 Segments & KPI (compositions & formula) T08 T09 Segments & KPI (compositions & formula) PPA-1 T1 T1 T2 0,n 1,n 1,n 1,n 1,n 0,n DURATION_CALCUL_VALUE DURATION_MANUAL_VALUE NO_MAX_PARTICIPANTS ... <Undefined> <Undefined> <Undefined> ... CITIES3 COURSE_TYPES3 PPA-1 T1 T1 T2 0,n 1,n 1,n 1,n 1,n 0,n DURATION_CALCUL_VALUE DURATION_MANUAL_VALUE NO_MAX_PARTICIPANTS ... <Undefined> <Undefined> <Undefined> ... CITIES3 COURSE_TYPES3 dev. environment. resource allocation. Regrouping work units is as follows: Work performed in parallel or in sequence. This principle T99 Topic Area Promotion T99 Topic Area Promotion T99 Topic Area Promotion TOPIC PPA-1 T1 T1 TIERS TOPIC TIERS COURSE_STATUS3 TRAINERS3 TOPIC PPA-1 T1 T1 TIERS TOPIC TIERS COURSE_STATUS3 TRAINERS3 work units is as follows: Work unit >> Module >> Topic Area >> sequence. This principle drastically reduces ETL execution Conclusion: The advantages of implementing such a topological architecture include: greater Promotion Promotion Promotion Enterprise Data Model (EDM). drastically reduces ETL execution and development time. scalability of additional data themes, enhanced performance of data flows, increased resilience of decoupled artifacts, sturdier quality control, and lower operating and development costs. The Concept of Tiers and Topic Areas is borrowed from R. Hughes’ book: The Agile Data Warehousing, iUniverse Inc, Bloomington, 2008 decoupled artifacts, sturdier quality control, and lower operating and development costs. The framework provides a comprehensive, reproducible, and proven DW architecture solution. The 6 classes are mapped to 9 tiers that maintain volatility. Warehousing, iUniverse Inc, Bloomington, 2008 framework provides a comprehensive, reproducible, and proven DW architecture solution. Ecole Centrale Paris, Châtenay-Malabry Ecole Centrale Paris, Châtenay-Malabry

Upload: others

Post on 01-Oct-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Framework for Building Resilient Data Warehouses A

A Framework for Building Resilient Data Warehouses A Framework for Building Resilient Data Warehouses A Framework for Building Resilient Data Warehouses using a Mandala Topology Architectureusing a Mandala Topology ArchitectureIA using a Mandala Topology ArchitectureIA using a Mandala Topology Architecture

Michel JACQUES

IA

Michel JACQUESInformation Assembly SPRL, Brussels, BelgiumInformation Assembly SPRL, Brussels, Belgium

Introduction 1.Users Perspectives & Interfaces 2.The Four Major Streams of DataIntroduction 1.Users Perspectives & Interfaces 2.The Four Major Streams of DataIntroduction 2.The Four Major Streams of Data

Data Application: DW Platform is Usage-Driven Data Application: DW Platform is Usage-Driven Constructing ETL work for a DW project is a complex affair, one that Each user type has a different Source data is in an unrefined state (to Data Application: DW Platform is Usage-Driven Data Application: DW Platform is Usage-Driven Constructing ETL work for a DW project is a complex affair, one that

requires planning. This framework uses a DW architecture based on aperspective and role towards the DW.

Existing users include: data stewards,

Source data is in an unrefined state (to various degrees) that must have its data

elements differentiated into a DW core IntegrationIntegration

requires planning. This framework uses a DW architecture based on a

Mandala Topology. At its core is an agile, step-by-step approach toExisting users include: data stewards,

analytical end-users, quality control elements differentiated into a DW core

data model (IN) in order to able to

META

Integration

META

IntegrationMandala Topology. At its core is an agile, step-by-step approach toidentify ETL work units:

analytical end-users, quality control

masters, DW administrators & integration data model (IN) in order to able to

recombine them effectively afterwards for METAMETA

identify ETL work units:1.Identifying the external users

masters, DW administrators & integration

managers. Each user has his own set of

information requirements and accesses

recombine them effectively afterwards for

analytics (OUT). During this transition,

STARSTAR

1.Identifying the external users

2.Positioning main data flows,information requirements and accesses

the DW via a specific interface (STAG, two other types of data elements are

produced: metadata (UP) and erroneous STAR

EDMSTAG MART

Data

STAR

EDMSTAG MART

Data

2.Positioning main data flows,3.Decomposing a flow internal layers

the DW via a specific interface (STAG,

META, MART, or SINK). The STAR can produced: metadata (UP) and erroneous

data (DOWN) streams. No data should EDM

Analytics

Data Stewards

EDM

Analytics

Data Stewards

3.Decomposing a flow internal layers4.Incorporating them within the DW application platform (Mandala)

META, MART, or SINK). The STAR can

only be accessed indirectly via the other data (DOWN) streams. No data should

be lost in this closed system. Thus the

SINKAnalytics

SINKAnalytics4.Incorporating them within the DW application platform (Mandala)

5.Extending conceptual EDM with a functional level of detail

only be accessed indirectly via the other

interfaces to ensure data integrity & security and prevents dependencies

be lost in this closed system. Thus the

position & direction of a data stream determines its purpose, the means, and 5.Extending conceptual EDM with a functional level of detail

6.Classifying data model entities and their dependenciessecurity and prevents dependencies

caused by different user demands. The determines its purpose, the means, and

its destination. This naturally leads to

Quality Quality

6.Classifying data model entities and their dependencies7.Combining the Functional ER model with Topic Areas & Tiers

caused by different user demands. The

users are part of an iterative feedback its destination. This naturally leads to

asymmetries between streams, which

Management:Quality

ControlQuality

Control7.Combining the Functional ER model with Topic Areas & Tiers

8.Enumerate the ETL work units in matrix format

users are part of an iterative feedback

loop improving future data content and asymmetries between streams, which

must be accounted for in the ETL design..Management:

•Intravenous Dextrose infusion with rates up to 250mls/hr of 20% Dextrose

8.Enumerate the ETL work units in matrix format

The method focuses on the topological relationship between all the DW

loop improving future data content and

quality.

Hence a multi-perspectives/faceted data warehouse architecture acting as a crossway between

must be accounted for in the ETL design..“Just about every [DW] process has side effects; but they can be deliberate and sustaining instead of unintentional and pernicious…and we can also be inspired by it to design some •Intravenous Dextrose infusion with rates up to 250mls/hr of 20% DextroseThe method focuses on the topological relationship between all the DW

artifacts, in order to comprehensive improve planning & design of DW.

Hence a multi-perspectives/faceted data warehouse architecture acting as a crossway between these end-users is comprehensive and non-discriminatory at an organizational level.

instead of unintentional and pernicious…and we can also be inspired by it to design some

positive side effects to our own enterprises instead of focusing exclusively on a single end.” (p.80)

•Dietary intervention with frequent meals and corn starchartifacts, in order to comprehensive improve planning & design of DW. these end-users is comprehensive and non-discriminatory at an organizational level. positive side effects to our own enterprises instead of focusing exclusively on a single end.” (p.80)

Ref: W. McDonough and M. Braungart, Craddle to Craddle, North Point Press, NY, 2002•Dietary intervention with frequent meals and corn starch Ref: W. McDonough and M. Braungart, Craddle to Craddle, North Point Press, NY, 2002

•Diazoxide – intolerant leading to hyponatraemia, oedema and nausea 5.Functional ER Data Model 3.The Five ETL Layers & Chirality 4.DW Mandala Topology Architecture•Diazoxide – intolerant leading to hyponatraemia, oedema and nausea

•Octreotide/glucagon intravenously – in order to replace counter-regulatory hormones

5.Functional ER Data Model 3.The Five ETL Layers & Chirality 4.DW Mandala Topology ArchitectureData Application: Platform for 5 Architectural LayersData Application: Platform for 5 Architectural Layers Enterprise Data Model: Conceptual ModelEnterprise Data Model: Conceptual ModelThe Buddhist Mandala metaphor

The conceptual model is The purpose of an ETL is to increase the •Octreotide/glucagon intravenously – in order to replace counter-regulatory hormonesData Application: Platform for 5 Architectural LayersData Application: Platform for 5 Architectural Layers Enterprise Data Model: Conceptual ModelEnterprise Data Model: Conceptual ModelThe Buddhist Mandala metaphor

helps visualize an architectureThe conceptual model is business-oriented, while the

The purpose of an ETL is to increase the integration & analytical capability of

•Subcutaneous Octreotide - hypoglycaemia worsened1METAMETA

helps visualize an architecture

where topological relationships

between DW components

business-oriented, while the

logical model is focused on

integration & analytical capability of

sourced data. An ETL data path consists of 5 •Subcutaneous Octreotide - hypoglycaemia worsened1

AuthoringAuthoringbetween DW components

including: data modeling, ETL

logical model is focused on

content and application. The

sourced data. An ETL data path consists of 5

layers, each conducting a different set of data

transformations. The first two “IN” stages

•Prednisolone – developed fluid retention

including: data modeling, ETL

data flows, and surrounding

content and application. The

functional data model, proposed

herein, places itself in the gap

transformations. The first two “IN” stages

integrate the data to enable multiple •Prednisolone – developed fluid retention

STARSTAR

Trn MapsTrn Maps

data flows, and surrounding actors & applications; functionally

herein, places itself in the gap between the two. It extends the

integrate the data to enable multiple interpretations, while the last two “OUT”

•Hepatic Arterial Embolisation (HAE) performed twice with initial improvement (post-procedure insulin 29 pmol/l), but relapsed after 4 weeks.STAR

EDMSTAG MARTIN OUT

STAR

EDMSTAG MARTIN OUT

interact. The architecture is

similar to a road crossway that

between the two. It extends the

number of entity types from the

interpretations, while the last two “OUT”

stages specialise the data such that it •Hepatic Arterial Embolisation (HAE) performed twice with initial improvement (post-procedure insulin 29 pmol/l), but relapsed after 4 weeks.EDMIN OUT

Extraction:::Staging:::::Integration::::Publishing:User Access

EDMIN OUT

Extraction:::Staging:::::Integration::::Publishing:User Access ProfilesAttendance

ProfilesAttendancesimilar to a road crossway that

gives essential context and

number of entity types from the

initial fact and dimension with becomes fit-for-purpose. This mirror-effect is

referred to as ETL chirality. Extraction:::Staging:::::Integration::::Publishing:User AccessExtraction:::Staging:::::Integration::::Publishing:User Access Profiles

Roster

Profiles

Roster

gives essential context and

movement to data and new types for holding hierarchies,

dim. ids, details & associations.

referred to as ETL chirality.The following are important elements when selecting the most appropriate data modelling

SINKSINK

Roster

Trainings

Roster

Trainings

movement to data and operations. The context is

composed of 5 distinct locations

dim. ids, details & associations. This improves history-keeping

The following are important elements when selecting the most appropriate data modelling technique to use: a) the degree of convergence built into the data during ETL; b) the number of

SINKSINKTrainingsTrainings

composed of 5 distinct locations

(STAG, META, SINK, MART, &

This improves history-keeping

and makes the core data model

technique to use: a) the degree of convergence built into the data during ETL; b) the number of

unique pathways in the dimensional model; c) increased data flow resilience by decreased data

reliance; and d) ability for decoupling of model components. Erroneous

Data Repo

Erroneous

Data Repo

(STAG, META, SINK, MART, &

STAR) that provide a clear

and makes the core data model

more resilient while decoupling

the associated data flows.

reliance; and d) ability for decoupling of model components.

Data disturbances will occur either from external sources in an unexpected, subtle or extreme Data RepoData RepoSTAR) that provide a clear

logical structure determiningThere is a need to extend our “vocabulary of forms” when modelling. This involves adding

the associated data flows.Data disturbances will occur either from external sources in an unexpected, subtle or extreme

manner, which requires a faster data recovery by minimising data reload to only what is relevant. logical structure determining

data, flows, modeling patterns, access and security, user groups, and integration methods. Moreover, locations enable data persistence facilitating data recovery and transformations.

There is a need to extend our “vocabulary of forms” when modelling. This involves adding functional features so as to harmonise “form with function” and thus achieve a greater

manner, which requires a faster data recovery by minimising data reload to only what is relevant. Resilience also involves cyclic transfer of information across data applications reinforcing each

Moreover, locations enable data persistence facilitating data recovery and transformations. functional features so as to harmonise “form with function” and thus achieve a greater

decoupling of DW artefacts, whilst maintaining data cohesion. system data quality and monitoring usage.

decoupling of DW artefacts, whilst maintaining data cohesion.

8.Work Matrix from Method & Conclusion7. Agile DW Meets Functional ER Model6.Volatility of Functional Entities 8.Work Matrix from Method & Conclusion7. Agile DW Meets Functional ER Model6.Volatility of Functional Entities

ETL Decomposition Process: Classes as Tiers RevisitedETL Decomposition Process: Classes as Tiers Revisited ETL Decomposition Process: Work Units and ModulesETL Decomposition Process: Work Units and ModulesETL Decomposition Process: Topic Areas & Tiers for EDMETL Decomposition Process: Topic Areas & Tiers for EDM Development progress follows an The ETL decomposition process This functional data model ETL Decomposition Process: Classes as Tiers Revisited DEPENDENCY LAYERS as TIERS

ETL Decomposition Process: Classes as Tiers Revisited DEPENDENCY LAYERS as TIERS DEPENDENCY LAYERS as TIERS

ETL Decomposition Process: Work Units and ModulesETL Decomposition Process: Work Units and ModulesETL Decomposition Process: Topic Areas & Tiers for EDM

EVALUATION_QUESTIONS3TYPES_OF_NEEDS3 OTHER_CATALOG3

EVALUATION_ANSWERS3

ETL Decomposition Process: Topic Areas & Tiers for EDM

EVALUATION_QUESTIONS3TYPES_OF_NEEDS3 OTHER_CATALOG3

EVALUATION_ANSWERS3

Development progress follows an iterative and mostly top-down

The ETL decomposition process requires an entity having both a

This functional data model applies additional data entity DEPENDENCY LAYERS as TIERS

TIERSDATA CLASSES

DEPENDENCY LAYERS as TIERS

TIERSDATA CLASSES

DEPENDENCY LAYERS as TIERS

TIERSDATA CLASSES

1,n

1,n 0,n

1,n TRAINING_NEEDS_ACTUAL

EVALUATION_QUESTIONS3TYPES_OF_NEEDS3

PRIORITY

SOLUTION

<Undefined>

<Undefined>

OTHER_CATALOG3EVALUATION_ANSWERS3

1,n

1,n 0,n

1,n TRAINING_NEEDS_ACTUAL

EVALUATION_QUESTIONS3TYPES_OF_NEEDS3

PRIORITY

SOLUTION

<Undefined>

<Undefined>

OTHER_CATALOG3EVALUATION_ANSWERS3 iterative and mostly top-down

approach: one topic area, one

requires an entity having both a

class and a theme. For each

entity the process allocates: a) a

applies additional data entity

classes giving it increased ETL

flexibility. The classes follow a T00

Metadata(static) T00

Metadata(static) T00

Metadata(static) EVA-4 TNA-5

1,n1,n TRAINING_NEEDS_ACTUAL

TNA_NO_NEEDS_VAALUE

TNA_SUGGESTIONS_DESC

TNA_OBJECTIVE_DESC

...

<Undefined>

<Undefined>

<Undefined>

TRAININGS_EVALUATIONS

EVA-4 TNA-5

1,n1,n TRAINING_NEEDS_ACTUAL

TNA_NO_NEEDS_VAALUE

TNA_SUGGESTIONS_DESC

TNA_OBJECTIVE_DESC

...

<Undefined>

<Undefined>

<Undefined>

TRAININGS_EVALUATIONS

approach: one topic area, one

module, one tier at a time, although

units within same tier can be

entity the process allocates: a) a

tier corresponding to a class; b) flexibility. The classes follow a

step-wise approach whereby they T01

DimensionsT01

DimensionsT01

Dimensions

TMA-2

EVA-4 TNA-5

ORGANISATION_HIERARCHY3

WORKFLOW_STATUS6TMA-2

EVA-4 TNA-5

ORGANISATION_HIERARCHY3

WORKFLOW_STATUS6 units within same tier can be

developed in parallel. Once there is

tier corresponding to a class; b)

a topic area corresponding to a step-wise approach whereby they

become increasingly volatile T01base, details & struct)

Facts

T01base, details & struct)

Facts

T01base, details & struct)

Facts

TRA-3

T1SUB_TIME SUP_TIME

0,n1,nSUB SUP

sup sub

TRAINING_MAP_ACTUAL

TMA_DOSSIER_ID

TMA_STATUS_DATE

<Undefined>

<Undefined>TIMES3

TIMES_HIERARCHY3

WORKFLOW_STATUS5

WFS_SID

WFS_CODE

<Undefined>

<Undefined>

ORGANISATION_HIERARCHY3

ROOMS3 TRA-3

T1SUB_TIME SUP_TIME

0,n1,nSUB SUP

sup sub

TRAINING_MAP_ACTUAL

TMA_DOSSIER_ID

TMA_STATUS_DATE

<Undefined>

<Undefined>TIMES3

TIMES_HIERARCHY3

WORKFLOW_STATUS5

WFS_SID

WFS_CODE

<Undefined>

<Undefined>

ORGANISATION_HIERARCHY3

ROOMS3developed in parallel. Once there is a Reference Model (built prototype)

a topic area corresponding to a theme; and c) a priority

become increasingly volatile(data susceptibility to change).

Facts(events & profiles) T02

Facts(events & profiles) T02

Facts(events & profiles) T02

T0

T1

T2 1,n0,n

0,n

0,n

TMA_STATUS_DATE

TMA_VERSION_NO

TMA_VERSION_START_DATE

TMA_VERSION_END_DATE...

<Undefined>

<Undefined>

<Undefined>

<Undefined>

TIME_SID

TIME_DATE

<Undefined>

<Undefined>TRAINING_MAP_EXERCISE3

TME_SID

TME_NAME

<Undefined>

<Undefined>

ORGANISATIONS3

ORG_SID

ORG_CODE

...

<Undefined>

<Undefined>COURSE_DETAILS3T0

T1

T2 1,n0,n

0,n

0,n

TMA_STATUS_DATE

TMA_VERSION_NO

TMA_VERSION_START_DATE

TMA_VERSION_END_DATE...

<Undefined>

<Undefined>

<Undefined>

<Undefined>

TIME_SID

TIME_DATE

<Undefined>

<Undefined>TRAINING_MAP_EXERCISE3

TME_SID

TME_NAME

<Undefined>

<Undefined>

ORGANISATIONS3

ORG_SID

ORG_CODE

...

<Undefined>

<Undefined>COURSE_DETAILS3

a Reference Model (built prototype)

for a Tier work unit, estimates for all

theme; and c) a priority

corresponding to prevailing

business needs. An entity defines

(data susceptibility to change).

Since data in lower classes is

used to derive new data in upper T03

Grids(associations) T03

Grids(associations) T03

Grids(associations)

T0

T40,n

0,n0,n

Owner

1,n

0,nDetails0,n1,n1,n

TME_NAME

...

<Undefined>

STATUTORIES3

COURSES3

COURSE_SID <Undefined>TRAININGS_ATTENDANCE_PAG

T0

T40,n

0,n0,n

Owner

1,n

0,nDetails0,n1,n1,n

TME_NAME

...

<Undefined>

STATUTORIES3

COURSES3

COURSE_SID <Undefined>TRAININGS_ATTENDANCE_PAG

for a Tier work unit, estimates for all

units can be based on the reference

model and adjusted according to

business needs. An entity defines

the work unit in which it is used to derive new data in upper

classes, the data volatility in T04

Densification &Conformations T04Densification &Conformations T04Densification &Conformations Details

0,n Participant

1,n 1,n 1,n

1,n

1,n

PARTICIPANTS3

PTCD_SID

PTC_LAST_NAME

...

<Undefined>

<Undefined>

PARTICIPANTS_PROFILES_ACTUAL

PPA_ONE_VALUE

PPA_VALID_START_DATE

PPA_VALID_END_DATE

PPA_ACTUAL_FLAG

<Undefined>

<Undefined>

<Undefined>

<Undefined>

ADMIN_POSITIONS3

COURSE_SID

COURSE_CODE

<Undefined>

<Undefined>

PARTICIPANT_STATUS3

SESSION_DURATION

SESSION_EQUIVALENT_VALUE

SESSION_EVENT_DATE

SESSION_VIRTUAL_FLAG...

<Undefined>

<Undefined>

<Undefined>

<Undefined>

Details

0,n Participant

1,n 1,n 1,n

1,n

1,n

PARTICIPANTS3

PTCD_SID

PTC_LAST_NAME

...

<Undefined>

<Undefined>

PARTICIPANTS_PROFILES_ACTUAL

PPA_ONE_VALUE

PPA_VALID_START_DATE

PPA_VALID_END_DATE

PPA_ACTUAL_FLAG

<Undefined>

<Undefined>

<Undefined>

<Undefined>

ADMIN_POSITIONS3

COURSE_SID

COURSE_CODE

<Undefined>

<Undefined>

PARTICIPANT_STATUS3

SESSION_DURATION

SESSION_EQUIVALENT_VALUE

SESSION_EVENT_DATE

SESSION_VIRTUAL_FLAG...

<Undefined>

<Undefined>

<Undefined>

<Undefined>

model and adjusted according to

variable difficulty factors. Once

the work unit in which it is

contained. A work unit is the classes, the data volatility in

lower class will be less than that

Data MartsT05

T04Conformations(ballasting & mappings)

Data MartsT05

T04Conformations(ballasting & mappings)

Data MartsT05

T04Conformations(ballasting & mappings)

T2T1T3

0,n 0,n

Details

0,n

0,n

1,n

1,n

PARTICIPANTS_DETAIL3

PPA_ACTUAL_FLAG <Undefined>JOBS3

CATEGORIES_GRADES3

TRAININGS_ROSTER_PTG

FINAL_EXAM_MARK

INSCRIPTION_DATE...

<Undefined>

<Undefined>T2

T1T30,n 0,n

Details

0,n

0,n

1,n

1,n

PARTICIPANTS_DETAIL3

PPA_ACTUAL_FLAG <Undefined>JOBS3

CATEGORIES_GRADES3

TRAININGS_ROSTER_PTG

FINAL_EXAM_MARK

INSCRIPTION_DATE...

<Undefined>

<Undefined>

variable difficulty factors. Once completed & tested, work units

contained. A work unit is the smallest functional artefact

lower class will be less than that in upper classes. Volatility

Data Marts(lov, hier & msr)

T05

T06T07

Data Marts(lov, hier & msr)

T05

T06T07

Data Marts(lov, hier & msr)

T05

T06T07

T1

T30,n 0,n

1,n

1,n

1,n

RESOURCE_CENTERS3 TRAININGS_ACTUAL

COST

NO_SESSION_VALUE

<Undefined>

<Undefined>

CATEGORIES_GRADES3

CGR_SID

CGR_CODE

...

<Undefined>

<Undefined>

DOMAINS3

T1

T30,n 0,n

1,n

1,n

1,n

RESOURCE_CENTERS3 TRAININGS_ACTUAL

COST

NO_SESSION_VALUE

<Undefined>

<Undefined>

CATEGORIES_GRADES3

CGR_SID

CGR_CODE

...

<Undefined>

<Undefined>

DOMAINS3

completed & tested, work units

modules are then promoted to next determining granularity of

resource allocation. Regrouping

in upper classes. Volatility

determines which data flows are

performed in parallel or in T08T09

Segments & KPI

(compositions & formula)

T08T09

Segments & KPI

(compositions & formula)

T08T09

Segments & KPI

(compositions & formula)

PPA-1

T1

T1T2

0,n 1,n

1,n

1,n

1,n0,n

NO_SESSION_VALUE

DURATION_CALCUL_VALUE

DURATION_MANUAL_VALUE

NO_MAX_PARTICIPANTS

...

<Undefined>

<Undefined>

<Undefined>

<Undefined>

...

CITIES3

COURSE_TYPES3

PPA-1

T1

T1T2

0,n 1,n

1,n

1,n

1,n0,n

NO_SESSION_VALUE

DURATION_CALCUL_VALUE

DURATION_MANUAL_VALUE

NO_MAX_PARTICIPANTS

...

<Undefined>

<Undefined>

<Undefined>

<Undefined>

...

CITIES3

COURSE_TYPES3dev. environment.resource allocation. Regrouping

work units is as follows: Work performed in parallel or in

sequence. This principle T99

Topic Area

PromotionT99

Topic Area

PromotionT99

Topic Area

PromotionTOPIC

PPA-1 T1

T1TIERS TOPICTIERS

1,n

COURSE_STATUS3TRAINERS3

TOPIC

PPA-1 T1

T1TIERS TOPICTIERS

1,n

COURSE_STATUS3TRAINERS3

work units is as follows: Work

unit >> Module >> Topic Area >> sequence. This principle

drastically reduces ETL execution Conclusion: The advantages of implementing such a topological architecture include: greater T99Promotion

T99Promotion

T99Promotion

T1T1unit >> Module >> Topic Area >> Enterprise Data Model (EDM).

drastically reduces ETL execution and development time.

Conclusion: The advantages of implementing such a topological architecture include: greater scalability of additional data themes, enhanced performance of data flows, increased resilience of

decoupled artifacts, sturdier quality control, and lower operating and development costs. The Concept of Tiers and Topic Areas is borrowed from R. Hughes’ book: The Agile Data Warehousing, iUniverse Inc, Bloomington, 2008

decoupled artifacts, sturdier quality control, and lower operating and development costs. The

framework provides a comprehensive, reproducible, and proven DW architecture solution.The 6 classes are mapped to 9 tiers that maintain volatility.

Warehousing, iUniverse Inc, Bloomington, 2008 framework provides a comprehensive, reproducible, and proven DW architecture solution.

Ecole Centrale Paris, Châtenay-MalabryEcole Centrale Paris, Châtenay-Malabry