Fact Table Design


Upload: amol-palkar

Post on 29-Jan-2016


DESCRIPTION

This document explains data warehouse tables, focusing on how to design fact tables. It is intended for data warehouse architects.

TRANSCRIPT

Page 1: Fact Table Design

48

Degenerate Dimensions

• Situation:
  – in many cases line items group into "containers" (invoice, receipt, order, bill of lading, ...)
• Problem:
  – facts are stored at the level of line items
  – attributes of the container dimension are spread all over the design (customer, time, ...)
  – what to do with the container key?
• Solution:
  – put it as an attribute into the fact table (a degenerate dimension: a dimension key without a dimension table of its own)
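The solution above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names (order_line_fact, order_number, and so on) and the sample rows are assumptions made for illustration; they do not come from the slides. The point is only that the order number sits in the fact table without any order dimension table, and can still be used to regroup line items into their containers.

```python
import sqlite3

def build_order_facts():
    """Create a tiny in-memory schema with a degenerate dimension (hypothetical names)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE order_line_fact (
            product_key  INTEGER REFERENCES product(product_key),
            order_number TEXT,      -- degenerate dimension: there is no order table
            quantity     INTEGER,
            net_amount   REAL
        );
        INSERT INTO product VALUES (1, 'bolt'), (2, 'nut');
        INSERT INTO order_line_fact VALUES
            (1, 'A-100', 10, 5.0),
            (2, 'A-100', 20, 4.0),
            (1, 'A-101',  5, 2.5);
    """)
    return con

def lines_per_order(con):
    """Regroup line items into their containers using only the fact table."""
    return con.execute("""
        SELECT order_number, COUNT(*), SUM(net_amount)
        FROM order_line_fact
        GROUP BY order_number
        ORDER BY order_number
    """).fetchall()
```

Calling lines_per_order(build_order_facts()) returns one row per order with its line count and total net amount, although no order table was ever created.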

Page 2: Fact Table Design

49

Four steps to fact table design

• Choose the data mart
• Declare the fact table grain
• Choose the dimensions
• Choose the facts

Page 3: Fact Table Design

50

Choosing the data mart

• Consider the subject area and the available data sources:
  – single-source data marts
    • purchase orders
    • shipments
    • sales
  – multiple-source data marts
    • customer profitability
• Start your portfolio of data marts with a single-source data mart!

Page 4: Fact Table Design

51

Declaring the fact table grain

• In general, the grain should be chosen as low (as fine-grained) as possible
• Typical choices:
  – transactions
    • each sales transaction
    • each ATM transaction
  – line items
    • each line item on each order
    • each line item on each shipment's invoice

Page 5: Fact Table Design

52

Choosing the Dimensions

• A minimal set of dimensions follows straightforwardly from the grain
  – e.g. dimensions for order line items:
    • order date, customer, product
    • order number (degenerate dimension)
• Others may be added if data is available
  – e.g. further dimensions for order line items:
    • delivery date
    • order status
    • delivery mode

Page 6: Fact Table Design

53

Choosing the Facts

• Again, the grain gives some hints
  – transaction fact tables often have just one fact
    • e.g. the amount of money withdrawn at an ATM
  – line item fact tables often store several facts
    • e.g. quantity, gross amount, discount, net amount and tax
• Facts should always be specific to the grain
• Aggregates are stored in separate tables
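As a rough illustration of facts that are specific to the line-item grain, the following Python sketch models one fact row with the facts listed on the slide. The class name OrderLineFact, the helper order_totals and the field order_number are hypothetical. Order-level totals are not stored on the line item; they are derived by a roll-up, which is what a separate aggregate table would hold.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OrderLineFact:
    """One row at line-item grain (hypothetical structure)."""
    order_number: str     # degenerate dimension
    quantity: int
    gross_amount: float
    discount: float
    net_amount: float
    tax: float

def order_totals(lines):
    """Roll line-item net amounts up to order level; the result is an aggregate,
    so it would live in a separate table rather than on the line items."""
    totals = {}
    for line in lines:
        totals[line.order_number] = totals.get(line.order_number, 0.0) + line.net_amount
    return totals
```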

Page 7: Fact Table Design

54

Core Fact Table of a Bank

Source: Kimball, Ralph: The Data Warehouse Lifecycle Toolkit, New York, 1998, p. 204

Page 8: Fact Table Design

55

Storing Facts for Checking Accounts

Source: Kimball, Ralph (1998), p. 205

Page 9: Fact Table Design

56

Low Level Facts and Aggregation

• The most important facts are low level data (transactions & line items)
• Aggregations are only stored (in addition) to improve performance
  – aggregates: correspond to the results of roll-up operations (not known to the user)
  – snapshots: are concerned with status information at certain points in time (known to the user)
    • What was the average number of transactions last month?
    • How do the costs of this October compare to the costs of October 2001?
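A roll-up of the kind an aggregate stores can be sketched in a few lines of Python. The function name roll_up_to_month and the input format (ISO date string plus amount) are assumptions for illustration. The low-level transactions remain the source of truth; the monthly figures are derived from them and kept only to answer questions like the two above faster.

```python
from collections import defaultdict

def roll_up_to_month(transactions):
    """Roll daily transactions up to month level.

    transactions: iterable of (iso_date, amount) pairs, iso_date as 'YYYY-MM-DD'.
    Returns {'YYYY-MM': (transaction_count, total_amount)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for iso_date, amount in transactions:
        month = iso_date[:7]          # 'YYYY-MM' prefix identifies the month
        counts[month] += 1
        totals[month] += amount
    return {month: (counts[month], totals[month]) for month in sorted(counts)}
```

Comparing the entries for "2002-10" and "2001-10" in the result answers a year-over-year question without rescanning the transaction rows.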

Page 10: Fact Table Design

57

A Companion ATM Snapshot Schema to the ATM Transaction Schema

Source: Kimball, Ralph (1998), p. 210

Page 11: Fact Table Design

58

A Frequent Use of Snapshots

• Snapshots are often kept in a rolling-horizon manner (e.g. monthly snapshots for 36 months)
• A 37th snapshot is built incrementally by adding the effect of each day's transactions (at least for the additive facts)
• At the end of the month, the semi-additive facts are computed, and the oldest snapshot is replaced by the new one
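The rolling-horizon procedure can be sketched as follows. This is a minimal in-memory model, not how a warehouse would store snapshots: the class RollingSnapshots, its method names and the chosen facts (transaction count and total amount as additive facts, ending balance as the semi-additive fact) are assumptions for illustration.

```python
from collections import deque

class RollingSnapshots:
    """Keeps the last `horizon` monthly snapshots plus one snapshot in progress."""

    def __init__(self, horizon=36):
        # Closed snapshots, oldest first; maxlen makes the deque drop the oldest one.
        self.history = deque(maxlen=horizon)
        self.current = self._empty()

    @staticmethod
    def _empty():
        return {"transaction_count": 0, "total_amount": 0.0}

    def add_day(self, amounts):
        """Add the effect of one day's transactions to the additive facts."""
        self.current["transaction_count"] += len(amounts)
        self.current["total_amount"] += sum(amounts)

    def close_month(self, ending_balance):
        """Attach the semi-additive fact and rotate the snapshot into the history."""
        snapshot = dict(self.current, ending_balance=ending_balance)
        self.history.append(snapshot)
        self.current = self._empty()
        return snapshot
```

The balance is only set at month end because it cannot be summed across days, whereas counts and amounts can be accumulated day by day.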

Page 12: Fact Table Design

59

The Aggregate Navigator

Source: Kimball, Ralph (1998), p. 384

Page 13: Fact Table Design

60

Design Goals for Aggregates

• Aggregates must be stored in their own fact tables, separate from the low level data; each aggregation level must occupy its own unique fact table

• The dimension tables attached to the aggregates should be shrunken versions of the base level dimension tables wherever possible

• The low level schema and the derived aggregate schemas must be associated together as a family of schemas so that the navigator knows which tables are related to each other
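These goals can be made concrete with a small sqlite3 sketch. The table names sales_fact, product, category and sales_fact_by_category follow the SQL examples later in the deck; the columns, sample rows and the choice of MIN(product_key) as the surrogate key of the shrunken dimension are assumptions for illustration. The key column deliberately keeps the name product_key so that a query only needs different table names to run against the aggregate.

```python
import sqlite3

def build_family():
    """Base schema plus one aggregate schema with a shrunken product dimension."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE product (product_key INTEGER PRIMARY KEY, sku TEXT,
                              category TEXT, department TEXT);
        CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER,
                                 store_key INTEGER, dollar_sales REAL, dollar_cost REAL);
        INSERT INTO product VALUES (1, 'milk 1l', 'dairy', 'food'),
                                   (2, 'butter',  'dairy', 'food'),
                                   (3, 'bread',   'bakery', 'food');
        INSERT INTO sales_fact VALUES (1, 1, 1, 10.0, 6.0), (2, 1, 1, 4.0, 3.0),
                                      (3, 1, 1,  5.0, 2.0), (1, 2, 1, 8.0, 5.0);

        -- Shrunken dimension: one row per category, product-level columns dropped.
        CREATE TABLE category AS
            SELECT MIN(product_key) AS product_key, category, department
            FROM product GROUP BY category, department;

        -- Aggregate fact table in its own table, separate from the low level data.
        CREATE TABLE sales_fact_by_category AS
            SELECT c.product_key, f.time_key, f.store_key,
                   SUM(f.dollar_sales) AS dollar_sales,
                   SUM(f.dollar_cost)  AS dollar_cost
            FROM sales_fact f
            JOIN product p  ON f.product_key = p.product_key
            JOIN category c ON c.category = p.category AND c.department = p.department
            GROUP BY c.product_key, f.time_key, f.store_key;
    """)
    return con

QUERY = """
    SELECT p.category, SUM(f.dollar_sales), SUM(f.dollar_cost)
    FROM {fact} f, {dim} p
    WHERE f.product_key = p.product_key AND p.department = 'food'
    GROUP BY p.category ORDER BY p.category
"""

def run(con, fact, dim):
    """Run the same category-level query against a chosen member of the family."""
    return con.execute(QUERY.format(fact=fact, dim=dim)).fetchall()
```

Running the query with ("sales_fact", "product") and with ("sales_fact_by_category", "category") gives the same rows, while the aggregate fact table holds fewer rows than the base table.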

Page 14: Fact Table Design

61

Low Level Sales Schema

Source: Kimball, Ralph (1998), p. 558

Page 15: Fact Table Design

62

Aggregate along the Product Dimension

Source: Kimball, Ralph (1998), p. 559

Page 16: Fact Table Design

63

Further Aggregation

Source: Kimball, Ralph (1998), p. 562

Page 17: Fact Table Design

64

Aggregate Navigation Algorithm

• Sort all schemas of the family from smallest to largest row count of the fact tables and consider them in this order

• Compare the attributes in the SQL statement to the table fields (headers) in the fact and dimension tables. If all attributes can be found, alter the table names in the original SQL statement; otherwise consider the schema with the next larger fact table

• Run the altered SQL statement
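The three steps can be sketched in Python as below. This is a simplified reading of the algorithm, under several assumptions: each schema of the family is described by a hypothetical dictionary (row count, union of column names, mapping from base table names to the schema's own table names), attributes are extracted with a naive alias.column regular expression instead of a SQL parser, and columns are checked against the schema as a whole rather than per table. Running the altered statement is left to the caller.

```python
import re

# Naive attribute extraction: every "alias.column" reference in the statement.
ATTRIBUTE = re.compile(r"\b[A-Za-z_]\w*\.([A-Za-z_]\w*)")

def navigate(sql, family):
    """Return sql rewritten for the smallest schema that contains all its attributes."""
    needed = set(ATTRIBUTE.findall(sql))
    # Step 1: consider schemas from smallest to largest fact table.
    for schema in sorted(family, key=lambda s: s["row_count"]):
        # Step 2: all attributes must be found in this schema's tables.
        if needed <= schema["columns"]:
            mapping = schema["tables"]
            if not mapping:
                return sql
            # Replace whole table names only, in a single pass.
            names = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
            return names.sub(lambda m: mapping[m.group(1)], sql)
    raise LookupError("no schema in the family can answer this statement")

# Hypothetical family: row counts and column lists are made up for illustration.
COMMON = {"product_key", "time_key", "store_key", "dollar_sales", "dollar_cost",
          "category", "department", "day_of_week", "floor_plan_type"}
FAMILY = [
    {"row_count": 1_000_000, "columns": COMMON | {"brand"},
     "tables": {"sales_fact": "sales_fact", "product": "product"}},
    {"row_count": 5_000, "columns": COMMON,
     "tables": {"sales_fact": "sales_fact_by_category", "product": "category"}},
]
```

A statement that only touches category-level attributes is redirected to sales_fact_by_category and category; one that mentions p.brand falls through to the base schema and comes back unchanged.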

Page 18: Fact Table Design

65

Example: Original SQL Statement

  select p.category, sum(f.dollar_sales), sum(f.dollar_cost)
  from sales_fact f, product p, time t, store s
  where f.product_key = p.product_key
    and f.time_key = t.time_key
    and f.store_key = s.store_key
    and p.department = 'food'
    and t.day_of_week = 'Saturday'
    and s.floor_plan_type = 'Super Market'
  group by p.category

Page 19: Fact Table Design

66

Example: Modified SQL Statement

  select p.category, sum(f.dollar_sales), sum(f.dollar_cost)
  from sales_fact_by_category f, category p, time t, store s
  where f.product_key = p.product_key
    and f.time_key = t.time_key
    and f.store_key = s.store_key
    and p.department = 'food'
    and t.day_of_week = 'Saturday'
    and s.floor_plan_type = 'Super Market'
  group by p.category

Page 20: Fact Table Design

67

Modeling Events in a Factless Fact Table

Source: Kimball, Ralph (1998), p. 213

Page 21: Fact Table Design

68

Another example: promotion events

Source: Kimball, Ralph (1998), p. 215

Page 22: Fact Table Design

69

Levels in Database Design

• The semantic model (e.g. ERM)
  – formal description of the information used in an enterprise, independent of all physical considerations
• The logical model (e.g. relational schema)
  – description of the information used in an enterprise based on a specific data model, but still independent of physical considerations
• The physical model
  – implementation of the database on secondary storage; file organization, indexes for efficient access, integrity constraints, security measures

Page 23: Fact Table Design

70

ADAPT (Application Design for Analytical Processing Technology)

• Rich notation to represent multidimensional structures
• Four-step design process (labels from the process diagram):
  – problem analysis and definition
  – definition of hypercubes
  – analysis of indicators and derived data
  – aggregation operators
  – data sources
  – modeling of dimensions

Page 24: Fact Table Design

71

ADAPT: high level model

• Objects (diagram symbols): hypercube with dimension 1, dimension 2, ..., dimension N; f () aggregation operator; hierarchy; data source
• Example (diagram): the hypercubes unit_price, items_sold and sales, each with the dimensions time, product and geography; f () unit_price * items_sold

Page 25: Fact Table Design

72

ADAPT: Types of Dimensions

• Types (diagram labels):
  – aggregating dimension (e.g. product)
  – sequential dimension (e.g. time)
  – feature dimension (e.g. location of a store in the geography dimension)
  – slowly changing dimension
  – indicator dimension (e.g. units of measurement)
  – tuple dimension (factless fact table)
• Problems:
  – I feel that a feature dimension is rather an attribute outside the hierarchy
  – These types are not disjoint. The product dimension, for example, is very often a slowly changing dimension.

Page 26: Fact Table Design

73

ADAPT: Elements of Dimensions

{ } attribute value

{ } dimensional attribute

{ } subset of a dimension

non-dimensional attribute

subset of a hypercube

Page 27: Fact Table Design

74

ADAPT: More Detailed Model

• Hypercube: sales, with the dimensions time, product, geography
• time hierarchy: { } year, { } month, { } day
• product hierarchy: { } department, { } product
• geography hierarchy: { } country, { } district, { } store
• location: { } downtown, { } suburb, { } out of town
• # parking lots