Book of Lars Frank, Chapter 10, SCD (Slowly Changing Dimensions):
Post on 31-Dec-2015
The hidden slides of this slideshow may be important. However, I will focus on learning by exercises, and therefore new concepts are often only listed in hidden slides.
Introduction to Slowly Changing Dimensions (SCD)
Fact table: Bank accounts
- Account# - Interest-last-year - Cost-last-year - Branch#
Dimension: Branch-offices
- Branch# - Branch-name - Branch-size
If the attributes of a dimension are dynamic (i.e. they may be updated), we say that they are slowly changing.
May the Branch-size of a Branch-office change, e.g. after a renovation?
May the Branch-name of a Branch-office change?
Exercise in SCD:
Suppose the attribute Branch-size is dynamic and aggregations are made to the level (Branch-size, Year) or (Branch-size, Month).
Does this aggregation make sense and how would you solve possible problems?
Exercise in SCD:
Suppose the attribute Branch-name is dynamic and aggregations are made to the level (Branch-name, Year).
Does this aggregation make sense and how would you solve possible problems?
Problems with slowly changing dimensions:
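Fact table
- TimeID - Branch Office - ProductID - … - Amount - Price
Product dimension
- ProductID - Product name - Product group - Price category
Branch office dimension
- Branch Office - Address - City - District - Size group - Value group
Time dimension
- TimeID - Dayname - Week - Month - Quarter - Year - Day no - Working day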
• If you do not update a dynamic attribute, the data warehouse becomes stale.
• If you update a dynamic attribute, the old measures may be aggregated to the wrong value of that attribute level, e.g. the wrong Branch office size!
Which dimension attributes and relationships may be slowly changing and which of these give aggregation problems?
Evaluation criteria per response type: is historical information preserved, aggregation performance, and storage consumption.
Response 1, where dimension records are overwritten
- Historical information preserved: No
- Aggregation performance: In the evaluation, we define this solution to have average performance
- Storage consumption: Only the current dimension record version is stored. No redundant data is stored
Response 2, where new versions are created
- Historical information preserved: Yes
- Aggregation performance: Version records make performance slower in proportion to the number of changes
- Storage consumption: All old versions of dimension records are stored, often with redundant attributes
Response 3, where only one historical version is saved
- Historical information preserved: The current version and a single history destroying version are saved
- Aggregation performance: No performance degradation occurs if either the current or the historical version is used in a query
- Storage consumption: Normally, only a single extra attribute version is stored
Response 4, which uses the top of a dynamic dimension hierarchy as a new static dimension
- Historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether both dimension tables are used in a query
- Storage consumption: The relatively large fact table must have an extra foreign key attribute
Response 5, with dimension data as fact data
- Historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether the new fact data are used in a query
- Storage consumption: The relatively large fact table must have an extra attribute for each dynamic dimension attribute
Response 6, which uses fine granularity in combination with response 1 or 3
- Historical information preserved: The finer the granularity, the more historical state information is preserved
- Aggregation performance: The finer the granularity, the slower the performance
- Storage consumption: The finer the granularity, the more storage consumption
Response 7, which stores dynamic dimension data as static facts in another data mart
- Historical information preserved: Yes
- Aggregation performance: Better or worse depending on whether both fact tables are used in a drill-across query
- Storage consumption: This is the most storage-consuming solution, as at least a new fact and a foreign key are stored in the new fact table
Kimball’s type 1 response:
Overwrite the old value:
Bank account Fact
- Account-ID - Time-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension
- Branch-ID - Branchname
Time Dimension
- Time-ID - Monthname
Figure 3.2
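Response 1 used with dimension attribute change:
Step 1, before the change
Sales fact table
Bran-ID … Quantity …
001 2000
Branch office dimension
Bran-ID … Br-Name …
001 Centre
Step 2, the branch name is overwritten
Sales fact table
Bran-ID … Quantity …
001 2000
Branch office dimension
Bran-ID … Br-Name …
001 West
Step 3, a new sale is recorded
Sales fact table
Bran-ID … Quantity …
001 2000
001 3500
Branch office dimension
Bran-ID … Br-Name …
001 West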
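The effect of the overwrite can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the names `branch_dim`, `sales_facts` and `quantity_by_name` are hypothetical, and the rows simply mirror the Bran-ID, Br-Name and Quantity values of the figure.

```python
from collections import defaultdict

# Hypothetical in-memory tables mirroring the figure.
branch_dim = {"001": {"Br-Name": "Centre"}}
sales_facts = [{"Bran-ID": "001", "Quantity": 2000}]


def quantity_by_name(facts, dim):
    """Aggregate Quantity to the Br-Name level by joining on Bran-ID."""
    totals = defaultdict(int)
    for row in facts:
        totals[dim[row["Bran-ID"]]["Br-Name"]] += row["Quantity"]
    return dict(totals)


before = quantity_by_name(sales_facts, branch_dim)

# Response 1: overwrite the old value in place, then load a new sale.
branch_dim["001"]["Br-Name"] = "West"
sales_facts.append({"Bran-ID": "001", "Quantity": 3500})

after = quantity_by_name(sales_facts, branch_dim)
print(before)  # {'Centre': 2000}
print(after)   # {'West': 5500}
```

After the overwrite, the old quantity of 2000 is reported under the new name, and the name Centre no longer appears in any aggregation.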
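In response 2 you create a new version of the changed record:
Step 1, before the change
Sales fact table
Bran-ID … Quantity …
001 2000
Branch office dimension
Bran-ID … Bran-Size …
001 250
Step 2, a new dimension version is created
Sales fact table
Bran-ID … Quantity …
001 2000
Branch office dimension
Bran-ID … Bran-Size …
001 250
002 450
Step 3, a new sale refers to the new version
Sales fact table
Bran-ID … Quantity …
001 2000
002 3500
Branch office dimension
Bran-ID … Bran-Size …
001 250
002 450
How is it possible to aggregate to the physical Branch office level?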
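One possible answer to the question can be sketched in Python. The sketch assumes an extra attribute, here called `Branch-no`, that identifies the physical branch across all versions. That attribute and the helper `aggregate` are assumptions made for illustration and do not appear in the figure.

```python
from collections import defaultdict

# Each dimension row is one version; Bran-ID is the version key, as in the figure.
# "Branch-no" is an assumed extra attribute identifying the physical branch.
branch_dim = {
    "001": {"Branch-no": "B1", "Bran-Size": 250},
    "002": {"Branch-no": "B1", "Bran-Size": 450},
}
sales_facts = [
    {"Bran-ID": "001", "Quantity": 2000},
    {"Bran-ID": "002", "Quantity": 3500},
]


def aggregate(facts, dim, attribute):
    """Sum Quantity per value of the chosen dimension attribute."""
    totals = defaultdict(int)
    for row in facts:
        totals[dim[row["Bran-ID"]][attribute]] += row["Quantity"]
    return dict(totals)


by_size = aggregate(sales_facts, branch_dim, "Bran-Size")
by_branch = aggregate(sales_facts, branch_dim, "Branch-no")
print(by_size)    # {250: 2000, 450: 3500}
print(by_branch)  # {'B1': 5500}
```

Each quantity stays with the size that was valid when it was sold, while the shared `Branch-no` value ties the versions back to one physical branch.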
Exercise in SCD:
Suppose the attributes Branch-name and Branch-size use response types 1 and 2, respectively, and are changed at the same time.
In this situation, how is it possible to avoid preserving the historic Branch-name information, since preserving it gives wrong aggregations at the name level?
Fact table: Bank accounts
- Account# - Interest-last-year - Cost-last-year - Branch#
Dimension: Branch-offices
- Branch# - Branch-name - Branch-size
Exercise: Which SCD responses would you recommend for the data warehouses designed in the car rental case of slideshow 1?
Customers
Car types
Reservations
Orders
Branch offices
Cars
Garages
Garage services
Pick up
Contracts
Car return
Kimball’s 3 responses to slowly changing dimensions:
1. Overwrite the old value.
2. Create a new dimension record with the new value.
3. Create an extra attribute for the changed dimension value.
Kimball’s type 3 response: Create an extra attribute for the changed dimension relationship.
Suppose the product group of a product may be changed.
Does this solution make meaningful aggregations to the two group levels?
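In response 3, you create a new version attribute:
Step 1, before the change
Order-line fact table
Bran-ID … Quantity …
001 2000
Branch office dimension
Bran-ID … Old-Size New-Size …
001 250 250
Step 2, the new size is stored in the extra attribute
Order-line fact table
Bran-ID … Quantity …
001 2000
Branch office dimension
Bran-ID … Old-Size New-Size …
001 250 450
Step 3, a new order line is recorded
Order-line fact table
Bran-ID … Quantity …
001 2000
001 3500
Branch office dimension
Bran-ID … Old-Size New-Size …
001 250 450
Does this solution make meaningful aggregations to the two Size levels?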
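The aggregations asked about here can be tried out with a small Python sketch. It is only an illustration under the assumption that the tables look like the final state of the figure. The names `branch_dim`, `order_line_facts` and `quantity_by` are hypothetical.

```python
from collections import defaultdict

# Final state of the figure: one dimension row with both size attributes.
branch_dim = {"001": {"Old-Size": 250, "New-Size": 450}}
order_line_facts = [
    {"Bran-ID": "001", "Quantity": 2000},  # sold while the size was 250
    {"Bran-ID": "001", "Quantity": 3500},  # sold after the size became 450
]


def quantity_by(attribute):
    """Sum Quantity per value of Old-Size or New-Size."""
    totals = defaultdict(int)
    for row in order_line_facts:
        totals[branch_dim[row["Bran-ID"]][attribute]] += row["Quantity"]
    return dict(totals)


by_old_size = quantity_by("Old-Size")
by_new_size = quantity_by("New-Size")
print(by_old_size)  # {250: 5500}
print(by_new_size)  # {450: 5500}
```

In both aggregations all 5500 units land on a single size value, so neither result separates the quantity sold at size 250 from the quantity sold at size 450.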
Response 3 should only be used for a new grouping criterion:
Step 1, before the change
Order-line fact table
Prod-ID … Quantity …
001 2000
Product dimension
Prod-ID … Old-group New-group …
001 A
Step 2, the new group is stored in the extra attribute
Order-line fact table
Prod-ID … Quantity …
001 2000
Product dimension
Prod-ID … Old-group New-group …
001 A B
Step 3, a new order line is recorded
Order-line fact table
Prod-ID … Quantity …
001 2000
001 3500
Product dimension
Prod-ID … Old-group New-group …
001 A B
What is the difference between the Grouping update and the previous Branch-size update, given that the Grouping aggregations work well while the Branch-size aggregations do not give any meaning?
Suppose the product group of a product may be changed.
Product dimension
- Product-ID - Group-ID - Product-name
Orderdetail fact
- Order-ID - Product-ID - Qty - Price
Productgroup dimension
- Group-ID - Group-name
How would you implement SCD response 2 in this example?
Will SCD response 2 make meaningful aggregations if you want to compare product group sales over time?
Will SCD response 3 make meaningful aggregations?
Exercise in when to preserve historic information.
Product dimension
- Product-ID - Group-ID - Product-name
Orderdetail fact
- Order-ID - Product-ID - Qty - Price
Productgroup dimension
- Group-ID - Group-name
In this example, exchange the Product dimension with a Branch office dimension and the Productgroup dimension with a Branch-Size dimension!
Will SCD response 2 make meaningful aggregations if you want to compare the sales per Branch-Size over time?
Will SCD response 3 make meaningful aggregations?
Notice! It may depend on both the attribute and the business whether you want to preserve historic information or not.
Suppose the product group of a product may be updated.
Product dimension
- Product-ID - Product-name - Group-ID - Group-name
Orderdetail fact
- Order-ID - Product-ID - Qty - Price
Will response type 1 give correct aggregations to the group level if you want to compare product group sales over time?
Suppose the product group of a product may be changed.
Product dimension
- Product-ID - Product-name
Orderdetail fact
- Order-ID - Product-ID - Group-ID - Qty - Price
Productgroup dimension
- Group-ID - Group-name
Will this solution give correct aggregations to the group level if you want to compare product group sales over time?
SCD Type 4 may be used in dynamic dimension hierarchies:
Order Dimension
- Order-ID - Ordertype . . .
Orderdetails Fact
- Product-ID - Order-ID - Date-ID - Salesman-ID - Qty - Price
Time Dimension
- Date-ID - Date - Month - Year - Holiday indication
Product Dimension
- Product-ID - Product-name - Product-group-name
Salesman
- Salesman-ID - Salesman-name - Salary-group-ID
Salary-Group
- Salary-group-ID - Salary-name - Salary . . .
Dimension Hierarchy
Figure 2.1
Suppose both salary group and product group are dynamic. Does this cause SCD problems?
The Type 4 Response: Dynamic relationships in a dimension hierarchy may be related directly to the fact table
Order Dimension
- Order-ID - Ordertype . . .
Orderdetails Fact
- Product-ID - Order-ID - Date-ID - Salesman-ID - Salary-group-ID - Product-group-ID - Qty - Price
Time Dimension
- Date-ID - Date - Month - Year - Holiday indication
Product Dimension
- Product-ID - Product-name
Salesman Dimension
- Salesman-ID - Salesman-name . . .
Salary-Group Dimension
- Salary-group-ID - Salary-name - Salary
Product-group Dimension
- Product-group-ID - Product-group-name
Figure 3.1
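How the extra foreign key preserves history can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's implementation: the lookup `current_salary_group` stands in for the operational source, the salary names Junior and Senior are invented sample values, and `load_orderdetail` and `qty_by_salary_group` are hypothetical helpers.

```python
from collections import defaultdict

# Salary-Group as its own static dimension (sample values are invented).
salary_group_dim = {
    "G1": {"Salary-name": "Junior"},
    "G2": {"Salary-name": "Senior"},
}

# Assumed operational lookup: the salary group a salesman belongs to right now.
current_salary_group = {"S1": "G1"}

orderdetails_facts = []


def load_orderdetail(salesman_id, qty, price):
    """Append a fact row, freezing the salesman's salary group at load time."""
    orderdetails_facts.append({
        "Salesman-ID": salesman_id,
        "Salary-group-ID": current_salary_group[salesman_id],
        "Qty": qty,
        "Price": price,
    })


load_orderdetail("S1", 10, 5.0)
current_salary_group["S1"] = "G2"  # the dynamic relationship changes
load_orderdetail("S1", 4, 5.0)


def qty_by_salary_group(facts):
    """Aggregate Qty to the salary group level via the fact's own foreign key."""
    totals = defaultdict(int)
    for row in facts:
        name = salary_group_dim[row["Salary-group-ID"]]["Salary-name"]
        totals[name] += row["Qty"]
    return dict(totals)


result = qty_by_salary_group(orderdetails_facts)
print(result)  # {'Junior': 10, 'Senior': 4}
```

Because each fact row carries its own Salary-group-ID, a later change of the salesman's group does not move the older sales to the new group.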
SCD Type 5 stores dynamic attributes in the fact table:
Fact table: Orderdetails
- Product# - Order# - Qty - Date# - Salesman#
Dimension: Products
- Product# - Product-name - Price
Dimension: Orders
- Order# - Ordertype
Dimension: Salesmen
- Salesman# - Salesman-name
Dimension: Time
- Date# - Date-Name
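As a sketch of the idea, the dynamic attribute Price can be copied into each fact row when the row is loaded. The Python below is illustrative only: the product P1, its prices and the helper `load_orderdetail` are invented for the example, and the dimension itself is assumed to be overwritten as in response 1.

```python
# Products dimension with a dynamic Price attribute (sample values are invented).
products_dim = {"P1": {"Product-name": "Widget", "Price": 10.0}}
orderdetails_facts = []


def load_orderdetail(product_no, qty):
    """Append a fact row that carries a copy of the price valid at load time."""
    orderdetails_facts.append({
        "Product#": product_no,
        "Qty": qty,
        "Price": products_dim[product_no]["Price"],  # dimension data stored as fact data
    })


load_orderdetail("P1", 3)
products_dim["P1"]["Price"] = 12.0  # the dimension value is overwritten
load_orderdetail("P1", 2)

# Revenue from the prices stored in the facts versus the current dimension price.
revenue_from_facts = sum(r["Qty"] * r["Price"] for r in orderdetails_facts)
revenue_from_dim = sum(
    r["Qty"] * products_dim[r["Product#"]]["Price"] for r in orderdetails_facts
)
print(revenue_from_facts)  # 54.0
print(revenue_from_dim)    # 60.0
```

Only the figure computed from the fact rows reflects the prices that were actually valid at the time of each sale.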
SCD Type 6 Response:
Use fine granularity:
Bank account Fact
- Account-ID - Time-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension
- Branch-ID - Branchname
Time Dimension
- Time-ID - Monthname
Figure 3.2
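Why a finer grain preserves more history can be sketched in Python. The scenario is an assumption made for illustration: an account moves from branch B1 to branch B2 on day 16 of a 30-day month and earns one interest unit per day. The names `branch_on`, `daily_facts` and `monthly_fact` are hypothetical.

```python
from collections import defaultdict


def branch_on(day):
    """Assumed history: the account belongs to B1 until day 15, then to B2."""
    return "B1" if day <= 15 else "B2"


# Fine grain: one fact row per day, tagged with the branch valid on that day.
daily_facts = [
    {"Day": d, "Branch-ID": branch_on(d), "Interest": 1.0} for d in range(1, 31)
]

# Coarse grain: one row per month, tagged with the branch valid at month end.
monthly_fact = {
    "Branch-ID": branch_on(30),
    "Interest": sum(r["Interest"] for r in daily_facts),
}


def interest_by_branch(rows):
    """Sum Interest per Branch-ID."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["Branch-ID"]] += r["Interest"]
    return dict(totals)


fine = interest_by_branch(daily_facts)
coarse = interest_by_branch([monthly_fact])
print(fine)    # {'B1': 15.0, 'B2': 15.0}
print(coarse)  # {'B2': 30.0}
```

The daily grain splits the interest between the two branches, while the monthly grain attributes the whole month to the last branch. The price is thirty fact rows instead of one.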
The Type 7 Response: Store the Dynamic Dimension Data as Static Facts in another Mart.
Example: Let us suppose a fact table stores the sale of products in a department store. In this example, the department records may have an attribute with the number of salesmen as well as an attribute with the monthly costs of the departments. These attributes are dynamic!
Which response type would you recommend?
Time sheets per day per salesman per department
Fact table: Orderdetails
- Product# - Order# - Qty
Products
- Product# - Product-name - Price - Group#
Product groups
- Group# - Group-name - Department#
Departments
- Department# - Department name - No. of employees - Department costs
Salesmen
- Salesman# - Salesman-name
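A response 7 design for this example might look like the following Python sketch. It assumes that the dynamic department attributes are stored once per month as facts in a second mart, and that the sales have already been aggregated per department and month. All values, the month labels and the helper `drill_across` are invented for illustration.

```python
# Mart 1 (assumed already aggregated): sales amount per (Department#, month).
sales_by_dept_month = {
    ("D1", "2015-01"): 1000.0,
    ("D1", "2015-02"): 1500.0,
}

# Mart 2: the dynamic department attributes stored as static monthly facts.
department_facts = [
    {"Department#": "D1", "Month": "2015-01",
     "No-of-employees": 4, "Department-costs": 800.0},
    {"Department#": "D1", "Month": "2015-02",
     "No-of-employees": 5, "Department-costs": 900.0},
]


def drill_across(sales, dept_facts):
    """Join the two marts on the shared keys (Department#, Month)."""
    result = {}
    for row in dept_facts:
        key = (row["Department#"], row["Month"])
        amount = sales.get(key, 0.0)
        result[key] = {
            "sales": amount,
            "sales_per_employee": amount / row["No-of-employees"],
            "margin": amount - row["Department-costs"],
        }
    return result


report = drill_across(sales_by_dept_month, department_facts)
print(report[("D1", "2015-02")])
```

Each month keeps the employee count and costs that were valid in that month, so changing them later does not alter earlier results. The drill-across join is needed only when a query combines both fact tables.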
Exercise: Select responses to SCD for the Airline DW.
Flight routes
Subroutes
Departures
Airports
Tickets
Travel arrangements
Customers
Airline companies
Exercise: Select responses to SCD for the Hotel DW.
Hotels
Rooms
Room reservations
Services/tours/car rentals
Check-in periods
Customers
Customer groups
Hotel chains
Exercise. Select responses to SCD for the travel agency.
Customers
Reservations
Orders
Departures/Hotel rooms/Car rentals/etc.
Flight routes/Room types/Car types/service types
Buyer
Bookings
Traveler
Product owners
Exercise: Design a data warehouse for a promotion company.
Customers
Presentation blocks/types
Order lines
Orders
Logical promotions
Physical promotions
Promotion media
How is it possible to measure the results of promotions and where should these measures be stored in the data warehouse?
Exercise:
Design a DW for a commercial TV channel
HRM exercise:
Make some requirements for an HRM system and try to group them into OLTP and OLAP requirements.
Make an ER diagram for an OLTP database and one or more OLAP data marts that can fulfill the requirements.
Design a data warehouse for a bank:
It should be possible to analyze both costs and revenue for customers, households, branch offices, regions, account managers etc.
Exercise:
Design a data warehouse for a housing association that lets out flats, shops and office areas.
It is possible to sign up on waiting lists for these.
Exercise:
Design a data warehouse for DSB in order to diminish train delays.
Exercise:
Design a data warehouse for stock exchange dealers in a bank.
Kimball’s type 2 response:
Suppose an account changes its Branch relationship in the middle of the month. Will the aggregations be correct, and how would you solve possible problems?
Bank account Fact
- Account-ID - Date-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension
- Branch-ID - Branchname
Time Dimension
- Date-ID - Monthname
Figure 3.2
Can you find more solutions?
Kimball’s type 2 response:
Suppose both the Branch relationship and the Branch-size are dynamic.
How can aggregations be correct?
Bank account Fact
- Account-ID - Date-ID - Branch-ID - Interest-last-month - Cost-last-month
Branch-office Dimension
- Branch-ID - Branchname
Time Dimension
- Date-ID - Monthname
End of session
Thank you!