Data Warehouse Design @ UOW
Techniques, Tips, Mistakes & Lessons Learnt
Agenda
Background of the Performance Indicators Team
Techniques & Technical Design Process
Mistakes & Lessons Learnt
Background
The Performance Indicators Project (PIP) was formed in October 2006
Vision:
A transformed University of Wollongong
that gives all decision-makers access to
accurate, relevant
and shared information,
in a quick and secure manner,
that allows them to
plan, monitor, analyse & manage
the performance of the university
Techniques & Technical Design Process
Technical Design Process
Techniques adopted at UOW
Technical Design Process
Review Business Requirements
Gain Access to source systems
Research source system
Document Source ER Diagram
Document Logical Dimensional Model
Document Physical Dimensional Model
Document Cube (Transformer) Model
Create Metrics Dictionary
Create Business Glossary
Start Development!!!
Now that all the paperwork is done we can finally start development
Techniques Adopted at UOW
Dimensions: Conformed Dimensions, Outrigger tables
Facts: Types of Fact tables, Grain, Bridge Tables
Dimensions
Represent characteristics of an object
Surrogate Key
Business Keys
Descriptors
Hierarchies & Rollups
Other Attributes
Student Key (PK)
Student Number (BK)
Student Surname
Student First Name
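The Student example above can be sketched as a small table load. This is only an illustration using Python's built-in sqlite3 module; the table name, column names and sample students are assumptions, not the UOW implementation. It shows the warehouse generating the surrogate key while the business key from the source system is used to detect rows that already exist.

```python
import sqlite3

# Hypothetical table and column names based on the Student example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student_dim (
        student_key        INTEGER PRIMARY KEY,   -- surrogate key (PK), assigned by the warehouse
        student_number     TEXT NOT NULL UNIQUE,  -- business key (BK) from the source system
        student_surname    TEXT,
        student_first_name TEXT
    )
    """
)


def load_student(number, surname, first_name):
    """Insert a student if the business key is new; return the surrogate key."""
    row = conn.execute(
        "SELECT student_key FROM student_dim WHERE student_number = ?", (number,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO student_dim (student_number, student_surname, student_first_name) "
        "VALUES (?, ?, ?)",
        (number, surname, first_name),
    )
    return cur.lastrowid


# Made-up sample data: loading the same business key twice yields one row.
first = load_student("S1001", "Citizen", "Jane")
again = load_student("S1001", "Citizen", "Jane")
other = load_student("S1002", "Doe", "John")
row_count = conn.execute("SELECT COUNT(*) FROM student_dim").fetchone()[0]
```

Fact tables would then store `student_key`, never `student_number`, so a change in the source system's numbering does not ripple into the facts.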
Conformed Dimensions
One copy of a dimension shared across subject areas
Organisational Structure Dimension, shared by:
Finance Facts
Research Facts
Staff Facts
Student Facts
Conformed Dimensions
Challenge: Multiple source systems using different keys to represent the same thing
Solution:
Wear the pain – Mapping files
Business buy in – Sell the advantages
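A mapping file of the kind mentioned above could look like the sketch below. The file layout, the source system names and the source keys are all hypothetical; only the idea (translate each source system's key to one conformed key) comes from the slide.

```python
import csv
import io

# Hypothetical mapping file: one row per (source system, source key) pair.
MAPPING_CSV = """source_system,source_key,conformed_org_code
finance,CC-EDU,WFACEDU100
research,EDU01,WFACEDU100
hr,0042,WFACEDU100
"""


def load_mapping(text):
    """Read the mapping file into a dictionary keyed by (system, key)."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (row["source_system"], row["source_key"]): row["conformed_org_code"]
        for row in reader
    }


def conform(mapping, source_system, source_key):
    """Return the conformed key, or None so unmapped keys can be reported."""
    return mapping.get((source_system, source_key))


mapping = load_mapping(MAPPING_CSV)
finance_code = conform(mapping, "finance", "CC-EDU")
research_code = conform(mapping, "research", "EDU01")
unmapped = conform(mapping, "hr", "9999")
```

Returning `None` for unmapped keys, rather than raising, is one possible design choice: it lets the load collect every missing mapping and hand the list back to the business in one go.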
How to Avoid Snowflaking Dimensions
At times there will be a logical relationship between dimensions that may cause a star schema to snowflake
At UOW we had the example of the relation between Organisational Structure Dimension and Cost Centre Dimension
The Solution - Outrigger Tables
Cost Centre Dimension
Organisation Structure Dimension
Financial Facts
Fact Tables
Fact tables (in star schemas) typically hold a set of surrogate keys joining back to dimensions, together with numerical data representing some type of measurement
Date (FK)
Student key (FK)
Subject key (FK)
EFTSL
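As a rough illustration of the layout above, the sketch below builds a tiny fact table and rolls the EFTSL measure up through one dimension. Table names, subject codes and values are made up for the example, and sqlite3 stands in for the real warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical subject dimension with a surrogate key.
    CREATE TABLE subject_dim (
        subject_key  INTEGER PRIMARY KEY,
        subject_code TEXT
    );
    -- Fact table: foreign keys back to dimensions plus one numeric measure.
    CREATE TABLE enrolment_fact (
        date        TEXT,
        student_key INTEGER,
        subject_key INTEGER REFERENCES subject_dim (subject_key),
        eftsl       REAL
    );
    INSERT INTO subject_dim VALUES (1, 'MATH101'), (2, 'STAT100');
    INSERT INTO enrolment_fact VALUES
        ('2008-03-01', 10, 1, 0.125),
        ('2008-03-01', 11, 1, 0.125),
        ('2008-03-01', 10, 2, 0.125);
    """
)

# Join the fact to the dimension on the surrogate key and sum the measure.
eftsl_by_subject = conn.execute(
    """
    SELECT d.subject_code, SUM(f.eftsl)
    FROM enrolment_fact AS f
    JOIN subject_dim AS d ON d.subject_key = f.subject_key
    GROUP BY d.subject_code
    ORDER BY d.subject_code
    """
).fetchall()
```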
Types of Fact tables
Transaction: Number of publications
Periodic Snapshot: Monthly FTE
Accumulating Snapshot: Not currently used at UOW
Grain of Fact tables
Need to clearly identify the level of detail within the fact table
One grain per fact table at the lowest level of detail for flexibility
The problem with grain?
Scenario: The lowest atomic level of detail needs to be broken up even further
Example: A single publication can be broken down even further to author percentages for a publication
Methods to Resolve Grain
Create a bridging table
Ratio the Fact
Bridging Tables
Publications Fact: Date, Publication Key, Organisational Key, Number of Publications
Bridge Table: Publication Key, Author Key, Author Percentage
Author Dimension: Author Key, Author Name, Author DOB
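The bridging approach can be sketched as follows. The fact stays at one row per publication, and the author split is applied only when a query goes through the bridge. Table names are assumptions; the sample values are borrowed from the grain example elsewhere in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Fact table keeps its original grain: one row per publication.
    CREATE TABLE publications_fact (
        date                   TEXT,
        publication_key        INTEGER,
        organisational_key     TEXT,
        number_of_publications INTEGER
    );
    -- Bridge table: one row per publication per author, with a weighting.
    CREATE TABLE author_bridge (
        publication_key   INTEGER,
        author_key        INTEGER,
        author_percentage INTEGER
    );
    INSERT INTO publications_fact VALUES ('01/01/08', 128, 'WFACEDU100', 1);
    INSERT INTO author_bridge VALUES
        (128, 103456, 30), (128, 9421, 50), (128, 89632, 20);
    """
)

# Querying the fact alone still counts the publication exactly once.
total = conn.execute(
    "SELECT SUM(number_of_publications) FROM publications_fact"
).fetchone()[0]

# Going through the bridge allocates the count to authors by percentage.
by_author = dict(
    conn.execute(
        """
        SELECT b.author_key,
               SUM(f.number_of_publications * b.author_percentage / 100.0)
        FROM publications_fact AS f
        JOIN author_bridge AS b ON b.publication_key = f.publication_key
        GROUP BY b.author_key
        """
    )
)
```

Joining through the bridge without applying the percentage would count the publication three times, which is the usual pitfall with this design.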
Ratio Facts
Publications Fact: Date, Publication Key, Organisational Key, Author Key, Author Percentage, Number of Publications
Author Dimension: Author Key, Author Name, Author DOB
How Does it Change the Grain?
Date      Publication key  Organisation key  Number of Publications
01/01/08  128              WFACEDU100        1

Date      Publication key  Organisation key  Author key  Author Percent  Number of Publications
01/01/08  128              WFACEDU100        103456      30              .3
01/01/08  128              WFACEDU100        9421        50              .5
01/01/08  128              WFACEDU100        89632       20              .2
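Rationing the fact amounts to a small piece of arithmetic, sketched here with the sample row above. The function name and dictionary layout are illustrative only; exact fractions are used so that the split rows can be checked to add back up to the original measure.

```python
from fractions import Fraction


def ratio_fact(fact_row, author_percentages):
    """Split one fact row into one row per author, weighted by percentage."""
    rows = []
    for author_key, percent in author_percentages:
        share = Fraction(percent, 100)
        rows.append(
            {
                **fact_row,
                "author_key": author_key,
                "author_percent": percent,
                "number_of_publications": fact_row["number_of_publications"] * share,
            }
        )
    return rows


publication = {
    "date": "01/01/08",
    "publication_key": 128,
    "organisation_key": "WFACEDU100",
    "number_of_publications": 1,
}
rows = ratio_fact(publication, [(103456, 30), (9421, 50), (89632, 20)])

# The rationed measures still sum to the original single publication.
total = sum(row["number_of_publications"] for row in rows)
shares = [float(row["number_of_publications"]) for row in rows]
```

Checking that the percentages for each publication total 100 before loading is a sensible guard, since any shortfall would silently under-count publications.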
Date Dimension
An important dimension to get correct – it will be used in every data mart in the data warehouse
Almost everything we want to measure or record will have a date associated with it
What is The Date Dimension?
The Date Dimension is a database table in the data warehouse that allows you to “Roll Up” Facts via a Date Hierarchy
Date Dimension Usage
Date Dimension (1..1) joins to Publications Fact (1..n)
Publications Fact:
Date        Publication Skey / Staff Skey / Org Unit Skey  Publication Count  Dest Points Count
01/01/2008  139                                            1                  0
01/01/2008  239                                            1                  0
01/01/2008  349                                            1                  1
The Problem
Publications Fact:
Date        Publication Skey / Staff Skey / Org Unit Skey  Publication Count  Dest Points Count
01/01/2008  139                                            1                  0
01/01/2008  239                                            1                  0
?           349                                            1                  1
• Not all fact data always has a valid date
• Sometimes we want to report on facts that have no date
UOW’s Mistake
We designed our date dimension and fact tables with a DATE data type as the joining key
Date Dimension (1..1) joins to Publications Fact (1..n)
We have no way of representing unknown dates within our date hierarchy
Since our date dimension must join via a valid date, and we must report on records with no valid date, we are forced to pick a date with business logic to represent these situations
For example, 01/01/1950 is the date we have used for Unknown Dates
The Solution
Design the dimension with the flexibility to allow non-date records for representing the exceptions
Every dimension should be able to represent:
Unknown
Not Yet Determined
Not Applicable
Date Dimension (1..1) joins to Publications Fact (1..n)
Date Skey  Date      Year                Quarter             Month
INTEGER    INTEGER   VARCHAR2            VARCHAR2            VARCHAR2
4          20080101  2008                Quarter 1           January
1          -1        Unknown             Unknown             Unknown
2          -2        Not Yet Determined  Not Yet Determined  Not Yet Determined
3          -3        Not Applicable      Not Applicable      Not Applicable
Lesson Learnt
Design the dimension with the flexibility to represent the exceptions
Every dimension should be able to represent:
Unknown
Not Yet Determined
Not Applicable
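A date dimension keyed on an integer surrogate, with reserved rows for the three exceptions, might be sketched like this. The table and function names are assumptions, sqlite3 and TEXT stand in for the warehouse database and its VARCHAR2 columns, and defaulting missing dates to the Unknown row is just one possible policy.

```python
import sqlite3
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE date_dim (
        date_skey INTEGER PRIMARY KEY,      -- surrogate key used by fact tables
        date_id   INTEGER NOT NULL UNIQUE,  -- YYYYMMDD, or a negative code for exceptions
        year      TEXT,
        quarter   TEXT,
        month     TEXT
    )
    """
)

# Reserved rows so facts without a valid date still join to the dimension.
for skey, date_id, label in [
    (1, -1, "Unknown"),
    (2, -2, "Not Yet Determined"),
    (3, -3, "Not Applicable"),
]:
    conn.execute(
        "INSERT INTO date_dim VALUES (?, ?, ?, ?, ?)",
        (skey, date_id, label, label, label),
    )


def to_date_id(d):
    """Encode a date as an integer of the form YYYYMMDD."""
    return d.year * 10000 + d.month * 100 + d.day


def add_date(d):
    """Add a real calendar date and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO date_dim (date_id, year, quarter, month) VALUES (?, ?, ?, ?)",
        (to_date_id(d), str(d.year), f"Quarter {(d.month - 1) // 3 + 1}", MONTHS[d.month - 1]),
    )
    return cur.lastrowid


def lookup_skey(d):
    """Return the surrogate key for a date, falling back to the Unknown row."""
    if d is None:
        return 1
    row = conn.execute(
        "SELECT date_skey FROM date_dim WHERE date_id = ?", (to_date_id(d),)
    ).fetchone()
    return row[0] if row else 1


jan1_skey = add_date(date(2008, 1, 1))
jan1_row = conn.execute(
    "SELECT * FROM date_dim WHERE date_skey = ?", (jan1_skey,)
).fetchone()
missing_skey = lookup_skey(None)
```

Because the exception labels are repeated at every level of the hierarchy, facts with no date roll up under "Unknown" at year, quarter and month rather than disappearing from reports.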
Issues With Hierarchies
Many issues to resolve in creating conformed dimensions with clean hierarchies
Unbalanced Hierarchies
Hierarchies with infinite possible levels
Fact data at different levels of hierarchy
Fact data which could be at any level of hierarchy
The Problem
Different source systems don’t always implement the same hierarchy exactly the same way
The Problem
Fact data at different levels of a hierarchy
(Diagram: Publications Fact Data joined to RFCD Dimension)
The Problem
Fact data that could be at any level of a hierarchy
(Diagram: RFCD Dimension)
UOW’s Mistake
Designed RFCD Dimension for Publications data mart without considering how it could be used in the future, and how different systems could implement this hierarchy
Forced to totally redesign and reimplement the dimension in order to be conformed across new data marts
The Solution
RFCD Skey  Division Code  Division Desc  Discipline Code  Discipline Desc  Subject Code  Subject Desc
1          270000         Bio Sciences   270200           Genetics         270201        Gene Expression
• “Dense” Balance the hierarchy
Division Code  Division Desc  Discipline Code  Discipline Desc       Subject Code  Subject Desc
L1_270000      Bio Sciences   L2_270200        Genetics              270201        Gene Expression
L1_270000      Bio Sciences   L2_270200        Genetics              270200        Unknown Genetics
L1_270000      Bio Sciences   L2_270000        Unknown Bio Sciences  270000        Unknown Bio Sciences
RFCD Skey: 1 for Gene Expression; 2 and 3 for the two “Unknown” rows
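The dense balancing shown above can be sketched as a function that pads any RFCD code out to a full three-level row. The six-digit code layout (two digits for division, four for discipline) is inferred from the sample codes, and the function name and description lookup are hypothetical.

```python
# Descriptions taken from the sample rows; a real load would read these
# from the source system.
DESCRIPTIONS = {
    "270000": "Bio Sciences",
    "270200": "Genetics",
    "270201": "Gene Expression",
}


def balance(code):
    """Return a full Division/Discipline/Subject row for a code at any level."""
    division = code[:2] + "0000"
    discipline = code[:4] + "00"
    if code == division:
        level = 1  # the code itself is a division
    elif code == discipline:
        level = 2  # the code itself is a discipline
    else:
        level = 3  # the code is a subject, the lowest level

    division_desc = DESCRIPTIONS[division]
    # Missing lower levels are filled with "Unknown <parent description>".
    if level >= 2:
        discipline_desc = DESCRIPTIONS[discipline]
    else:
        discipline_desc = f"Unknown {division_desc}"
    if level == 3:
        subject_desc = DESCRIPTIONS[code]
    else:
        subject_desc = f"Unknown {DESCRIPTIONS[code]}"

    return {
        "division_code": f"L1_{division}",
        "division_desc": division_desc,
        "discipline_code": f"L2_{discipline}",
        "discipline_desc": discipline_desc,
        "subject_code": code,
        "subject_desc": subject_desc,
    }


subject_row = balance("270201")
discipline_row = balance("270200")
division_row = balance("270000")
```

With every node padded to the lowest level, a fact recorded against a division or discipline can join to the dimension on the same key as a fact recorded against a subject.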
The Solution
Facts joining at different levels of a hierarchy
(Diagram: Publications Fact Data joined to RFCD Dimension)
The Solution
Facts joining at any level of a hierarchy
(Diagram: RFCD Dimension)
Lesson Learnt
Don’t design a dimension without considering how it might be used by other data marts in the future
Know your data - anticipate issues with systems implementing common hierarchies in different ways
Questions?
Any Questions?
Contact information:
Brad Dixon – [email protected] Thomas – [email protected]