Fact Table Design
DESCRIPTION
This document explains data warehouse tables, focusing on how to design fact tables. It is intended for data warehouse architects.
TRANSCRIPT
Degenerate Dimensions
• Situation:
  – in many cases line items group into "containers" (invoice, receipt, order, bill of lading, ...)
• Problem:
  – facts are stored at the level of line items
  – attributes of the container dimension are spread all over the design (customer, time, ...)
  – what to do with the container key?
• Solution:
  – put it as an attribute into the fact table
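A minimal sketch of the solution, using Python's built-in sqlite3 module. The table and column names (order_line_fact, order_number, product_dim) and the sample rows are illustrative assumptions, not taken from the slides. The order number is stored directly in the fact table and has no dimension table of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);

    -- Grain: one row per order line item.
    -- order_number is the degenerate dimension: a key without a dimension table.
    CREATE TABLE order_line_fact (
        order_number TEXT    NOT NULL,
        product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
        quantity     INTEGER NOT NULL,
        net_amount   REAL    NOT NULL
    );

    INSERT INTO product_dim VALUES (1, 'apples'), (2, 'bread');
    INSERT INTO order_line_fact VALUES
        ('A-100', 1, 3, 6.0),
        ('A-100', 2, 1, 2.5),
        ('A-101', 1, 2, 4.0);
""")

# The container key still lets us regroup the line items into their orders.
orders = conn.execute("""
    SELECT order_number, COUNT(*) AS line_items, SUM(net_amount) AS order_total
    FROM order_line_fact
    GROUP BY order_number
    ORDER BY order_number
""").fetchall()
print(orders)
```

The other container attributes (customer, date) would live in ordinary dimensions; only the key itself stays in the fact table.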
Four steps to fact table design
• choose the data mart
• declare the fact table grain
• choose the dimensions
• choose the facts
Choosing the data mart
• Consider the subject area and the available data sources:
  – single-source data marts
    • purchase orders
    • shipments
    • sales
  – multiple-source data marts
    • customer profitability
• Start your portfolio of data marts with a single-source data mart!
Declaring the fact table grain
• In general, the grain should be chosen as low (as detailed) as possible
• Typical choices
  – transactions
    • each sales transaction
    • each ATM transaction
  – line items
    • each line item on each order
    • each line item on each shipment's invoice
Choosing the Dimensions
• A minimal set of dimensions follows directly from the grain
  – e.g. dimensions for order line items:
    • order date, customer, product
    • order number (degenerate dimension)
• Others may be added if data is available
  – e.g. further dimensions for order line items:
    • delivery date
    • order status
    • delivery mode
Choosing the Facts
• Again, the grain gives some hints
  – transaction fact tables often have just one fact
    • e.g. the amount of money withdrawn at an ATM
  – line item fact tables often store several facts
    • e.g. quantity, gross amount, discount, net amount and tax
• Facts should always be specific to the grain
• Aggregates are stored in separate tables
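The grain, dimension and fact choices for order line items can be put together roughly as follows. This is a sketch under assumptions: all table names, column names and sample values are hypothetical, and the relation net amount = gross amount minus discount is assumed for the sample data only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim     (date_key     INTEGER PRIMARY KEY, iso_date TEXT);
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, name TEXT);

    -- Grain: one row per line item on each order.
    CREATE TABLE order_line_fact (
        order_date_key INTEGER NOT NULL REFERENCES date_dim(date_key),
        customer_key   INTEGER NOT NULL REFERENCES customer_dim(customer_key),
        product_key    INTEGER NOT NULL REFERENCES product_dim(product_key),
        order_number   TEXT    NOT NULL,  -- degenerate dimension
        -- Facts, all specific to the line item grain:
        quantity       INTEGER NOT NULL,
        gross_amount   REAL    NOT NULL,
        discount       REAL    NOT NULL,
        net_amount     REAL    NOT NULL,
        tax            REAL    NOT NULL
    );

    INSERT INTO date_dim VALUES (20240105, '2024-01-05'), (20240106, '2024-01-06');
    INSERT INTO customer_dim VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO product_dim VALUES (1, 'apples'), (2, 'bread');
    INSERT INTO order_line_fact VALUES
        (20240105, 1, 1, 'A-100', 3, 6.0, 0.5, 5.5, 0.5),
        (20240105, 1, 2, 'A-100', 1, 2.5, 0.0, 2.5, 0.25),
        (20240106, 2, 1, 'A-101', 2, 4.0, 0.0, 4.0, 0.5);
""")

# Because the facts sit at the lowest grain, any roll-up can be computed on demand.
per_customer = conn.execute("""
    SELECT c.name, SUM(f.quantity), SUM(f.net_amount)
    FROM order_line_fact f
    JOIN customer_dim c ON c.customer_key = f.customer_key
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(per_customer)
```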
Core Fact Table of a Bank
Source: Kimball, Ralph: The Data Warehouse Lifecycle Toolkit, New York, 1998, p. 204
Storing Facts for Checking Accounts
Source: Kimball, Ralph (1998), p. 205
Low Level Facts and Aggregation
• The most important facts are low level data (transactions & line items)
• Aggregations are only stored (in addition) to improve performance
  – aggregates: correspond to the results of roll-up operations (not known to the user)
  – snapshots: are concerned with status information at certain points in time (known to the user)
    • What was the average number of transactions last month?
    • How do the costs of this October compare to the costs of October 2001?
A Companion ATM Snapshot Schema to the ATM Transaction Schema
Source: Kimball, Ralph (1998), p. 210
A Frequent Use of Snapshots
• Snapshots are often kept in a rolling-horizon manner (e.g. monthly snapshots for 36 months)
• A 37th snapshot is built incrementally by adding the effect of each day's transactions (at least for the additive facts)
• At the end of the month, the semi-additive facts are computed, and the oldest snapshot is replaced by the new one
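The rolling horizon described above might be sketched in Python like this. The fact names (transaction_count, total_withdrawn, ending_balance), the month labels and the helper functions are hypothetical; the slides only fix the 36-month horizon and the split between additive and semi-additive facts.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    month: str
    transaction_count: int = 0    # additive fact, accumulated day by day
    total_withdrawn: float = 0.0  # additive fact, accumulated day by day
    ending_balance: float = 0.0   # semi-additive fact, set at month end


HORIZON = 36
# A full deque with maxlen drops its oldest entry when a new one is appended.
history = deque(maxlen=HORIZON)


def apply_day(snapshot, day_transactions):
    """Add the effect of one day's transactions to the additive facts."""
    for amount in day_transactions:
        snapshot.transaction_count += 1
        snapshot.total_withdrawn += amount


def close_month(snapshot, balance_at_month_end):
    """Compute the semi-additive fact, then roll the horizon forward."""
    snapshot.ending_balance = balance_at_month_end
    history.append(snapshot)


# Pretend 36 monthly snapshots already exist.
for i in range(HORIZON):
    history.append(MonthlySnapshot(month=f"m{i:02d}"))

# Build the 37th snapshot incrementally, then swap it in for the oldest one.
current = MonthlySnapshot(month="m36")
apply_day(current, [100.0, 50.0])
apply_day(current, [20.0])
close_month(current, balance_at_month_end=830.0)

print(len(history), history[0].month, history[-1].month)
```

The balance is not summed over days because it is only additive across accounts, not across time, which is why it is set once at month end.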
The Aggregate Navigator
Source: Kimball, Ralph (1998), p. 384
Design Goals for Aggregates
• Aggregates must be stored in their own fact tables, separate from the low level data; each aggregation level must occupy its own unique fact table
• The dimension tables attached to the aggregates should be shrunken versions of the base level dimension tables wherever possible
• The low level schema and the derived aggregate schemas must be associated together as a family of schemas so that the navigator knows which tables are related to each other
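As an illustration of the first two goals, the following sketch derives a category-level aggregate from a base sales fact table. The table names sales_fact, product, category and sales_fact_by_category follow the SQL example in this deck; the column category_key, the column lists and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT,
                          category TEXT, department TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER,
                             dollar_sales REAL);
    INSERT INTO product VALUES
        (1, 'apple', 'fruit',  'food'),
        (2, 'pear',  'fruit',  'food'),
        (3, 'bread', 'bakery', 'food');
    INSERT INTO sales_fact VALUES (1, 1, 2.0), (2, 1, 3.0), (3, 1, 4.0), (1, 2, 1.0);

    -- Shrunken dimension: one row per category, keeping only the attributes
    -- that are still meaningful at that level.
    CREATE TABLE category (category_key INTEGER PRIMARY KEY,
                           category TEXT, department TEXT);
    INSERT INTO category (category, department)
        SELECT DISTINCT category, department FROM product ORDER BY category;

    -- The aggregate occupies its own fact table, separate from sales_fact.
    CREATE TABLE sales_fact_by_category AS
        SELECT c.category_key, f.time_key, SUM(f.dollar_sales) AS dollar_sales
        FROM sales_fact f
        JOIN product  p ON p.product_key = f.product_key
        JOIN category c ON c.category = p.category
        GROUP BY c.category_key, f.time_key;
""")

base = conn.execute("""
    SELECT p.category, SUM(f.dollar_sales)
    FROM sales_fact f JOIN product p ON p.product_key = f.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()

aggregate = conn.execute("""
    SELECT c.category, SUM(f.dollar_sales)
    FROM sales_fact_by_category f JOIN category c ON c.category_key = f.category_key
    GROUP BY c.category ORDER BY c.category
""").fetchall()

# Both schemas give the same answer; the aggregate simply has fewer rows to scan.
print(base, aggregate)
```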
Low Level Sales Schema
Source: Kimball, Ralph (1998), p. 558
Aggregate along the Product Dimension
Source: Kimball, Ralph (1998), p. 559
Further Aggregation
Source: Kimball, Ralph (1998), p. 562
Aggregate Navigation Algorithm
• Sort all schemas of the family by the row count of their fact tables, from smallest to largest, and consider them in this order
• Compare the attributes in the SQL statement to the table fields (headers) of the fact and dimension tables. If all attributes can be found, alter the table names in the original SQL statement; otherwise consider the schema with the next larger fact table
• Run the altered SQL statement
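The three steps can be sketched in Python as below. This is a simplified illustration, not the navigator from the cited book: the Schema class, the navigate function, the row counts and the column lists are all assumptions, the set of columns a query needs is passed in by hand instead of being parsed from the SQL, and table names are swapped by plain whole-word text replacement.

```python
import re
from dataclasses import dataclass


@dataclass
class Schema:
    fact_rows: int  # row count of this schema's fact table
    tables: dict    # base table name -> table name in this schema
    columns: dict   # base table name -> columns available in this schema


def navigate(sql, needed, family):
    """Rewrite sql against the smallest schema that offers every needed column.

    needed maps each base table name to the set of columns the query uses.
    """
    for schema in sorted(family, key=lambda s: s.fact_rows):
        if all(cols <= schema.columns.get(table, set())
               for table, cols in needed.items()):
            names = "|".join(re.escape(name) for name in schema.tables)
            # One pass over whole words only, so product_key stays untouched.
            return re.sub(rf"\b({names})\b",
                          lambda m: schema.tables[m.group(1)], sql)
    raise LookupError("no schema in the family can answer the query")


base = Schema(
    fact_rows=1_000_000,  # made-up row count
    tables={"sales_fact": "sales_fact", "product": "product"},
    columns={"sales_fact": {"product_key", "dollar_sales", "dollar_cost"},
             "product": {"product_key", "name", "category", "department"}},
)
by_category = Schema(
    fact_rows=10_000,  # made-up row count
    tables={"sales_fact": "sales_fact_by_category", "product": "category"},
    columns={"sales_fact": {"product_key", "dollar_sales", "dollar_cost"},
             "product": {"product_key", "category", "department"}},
)

sql = ("select p.category, sum(f.dollar_sales) "
       "from sales_fact f, product p "
       "where f.product_key = p.product_key and p.department = 'food' "
       "group by p.category")
needed = {"sales_fact": {"product_key", "dollar_sales"},
          "product": {"product_key", "category", "department"}}

rewritten = navigate(sql, needed, [base, by_category])
print(rewritten)
```

A query that also needs the product name would not fit the category-level schema and would fall through to the base schema unchanged.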
Example: Original SQL Statement
    select p.category, sum(f.dollar_sales), sum(f.dollar_cost)
    from sales_fact f, product p, time t, store s
    where f.product_key = p.product_key
      and f.time_key = t.time_key
      and f.store_key = s.store_key
      and p.department = 'food'
      and t.day_of_week = 'Saturday'
      and s.floor_plan_type = 'Super Market'
    group by p.category
Example: Modified SQL Statement
    select p.category, sum(f.dollar_sales), sum(f.dollar_cost)
    from sales_fact_by_category f, category p, time t, store s
    where f.product_key = p.product_key
      and f.time_key = t.time_key
      and f.store_key = s.store_key
      and p.department = 'food'
      and t.day_of_week = 'Saturday'
      and s.floor_plan_type = 'Super Market'
    group by p.category
Modeling Events in a Factless Fact Table
Source: Kimball, Ralph (1998), p. 213
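The cited figure is not reproduced in this transcript. As a rough illustration of the idea only, a factless fact table records that an event happened by storing nothing but dimension keys; analysis then works by counting rows. The attendance scenario and all table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course_dim (course_key INTEGER PRIMARY KEY, title TEXT);

    -- Factless fact table: only dimension keys, no measured facts.
    -- Each row states that a student attended a course on a given day.
    CREATE TABLE attendance_fact (
        date_key    INTEGER NOT NULL,
        student_key INTEGER NOT NULL,
        course_key  INTEGER NOT NULL REFERENCES course_dim(course_key)
    );

    INSERT INTO course_dim VALUES (1, 'Databases'), (2, 'Statistics');
    INSERT INTO attendance_fact VALUES
        (20240105, 10, 1),
        (20240105, 11, 1),
        (20240105, 10, 2);
""")

# With no numeric fact to sum, events are analysed by counting rows.
attendance = conn.execute("""
    SELECT c.title, COUNT(*) AS attendances
    FROM attendance_fact f
    JOIN course_dim c ON c.course_key = f.course_key
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()
print(attendance)
```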
Another example: promotion events
Source: Kimball, Ralph (1998), p. 215
Levels in Database Design
• The semantic model (e.g. ERM)
  – formal description of the information used in an enterprise, independent of all physical considerations
• The logical model (e.g. relational schema)
  – description of the information used in an enterprise based on a specific data model, but still independent of physical considerations
• The physical model
  – implementation of the database on secondary storage; file organization, indexes for efficient access, integrity constraints, security measures
ADAPT (Application Design for Analytical Processing Technology)
• Rich notation to represent multidimensional structures
• Four-step design process (labels from the process diagram):
  – problem analysis and definition
  – definition of hypercubes
  – analysis of indicators and derived data
  – aggregation operators
  – data sources
  – modeling of dimensions
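ADAPT: high level model
• Objects (diagram symbols): hypercube with dimension 1, dimension 2, ..., dimension N; f (); aggregation operator; hierarchy; data source
• Example (diagram): the hypercubes unit_price and items_sold, each with the dimensions time, product and geography, feed f () unit_price * items_sold, which yields the hypercube sales (dimensions time, product, geography)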
ADAPT: Types of Dimensions
• aggregating dimension (e.g. product)
• sequential dimension (e.g. time)
• feature dimension (e.g. location of a store in the geography dimension)
• slowly changing dimension
• indicator dimension (e.g. units of measurement)
• tuple dimension (factless fact table)
• Problems:
  – I feel that a feature dimension is rather an attribute outside the hierarchy
  – These types are not disjoint. The product dimension, for example, is very often a slowly changing dimension.
ADAPT: Elements of Dimensions
{ } attribute value
{ } dimensional attribute
{ } subset of a dimension
non-dimensional attribute
subset of a hypercube
ADAPT: More Detailed Model
• hypercube sales with the dimensions time, product, geography
• time hierarchy: { } year, { } month, { } day
• product hierarchy: { } department, { } product
• geography hierarchy: { } country, { } district, { } store
• location: { } downtown, { } suburb, { } out of town
• # parking lots