Competitive (Business) Intelligence Systems The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)

Upload: scott-mitchell

Post on 17-Jan-2016


Page 1: Competitive (Business) Intelligence Systems The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)

Competitive (Business) Intelligence Systems

The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)

Page 2

The Road to Denormalization

Before transactional data can be loaded into a Data Warehouse, the data must be denormalized.

[Diagram: three "Transx Data" (transactional data) sources and the Data Warehouse]

Page 3

Normalization

But before you can understand Denormalization, you must understand Normalization . . .

And to understand Normalization, you must understand Relational Databases.

I’ve been Denormalized!

Page 4

Relational Databases

Collection of linked tables

Tables linked by Primary Key / Foreign Key relationships (Referential Integrity)

Primary Key – column whose values make each record unique in a parent table (e.g., Customer Number)

Foreign Key – column in child table that links to the Primary Key in the parent table

Page 5

Relational DB Example

CUSTOMER TABLE (“Parent” table; Cust # is the Primary Key)

Cust #   Cust Name
100      Moe
101      Larry
102      Curly

ORDER TABLE (“Child” table; Cust# is the Foreign Key)

Order #  Prod#  Qty  Cust#
1        QR22   1    100
2        QR22   25   100
3        SB56   3    102
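The parent/child link above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the snake_case table and column names are assumptions made for the example, and customer 999 is a made-up value used to show what referential integrity rejects.

```python
import sqlite3

# In-memory database; table and column names are illustrative, not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT)")
conn.execute(
    """CREATE TABLE orders (
           order_no INTEGER PRIMARY KEY,
           prod_no  TEXT,
           qty      INTEGER,
           cust_no  INTEGER REFERENCES customer(cust_no)
       )"""
)

conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(100, "Moe"), (101, "Larry"), (102, "Curly")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                 [(1, "QR22", 1, 100), (2, "QR22", 25, 100), (3, "SB56", 3, 102)])

# Referential integrity: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (4, 'QR22', 1, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Follow the foreign key from the child table back to the parent table.
rows = conn.execute(
    """SELECT o.order_no, c.cust_name
       FROM orders o JOIN customer c ON c.cust_no = o.cust_no
       ORDER BY o.order_no"""
).fetchall()
print(rejected, rows)
```

The join at the end is the price of the split: every question that needs both the order and the customer name has to cross the key relationship.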

Page 6

Database Structure & Design

2 Approaches:

1. Optimize for Data Capture (i.e., Capturing Transactions)

2. Optimize for Data Access (i.e., Queries & Reporting)

Conflict

I love conflict!

Page 7

Approach #1: Optimize for Data Capture

To optimize for data capture, you must:

• Eliminate redundancy of data (or else wasted space & processing occurs)

• Ensure data integrity (or else data anomalies)

• Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place

Normalization – process by which a database is optimized for data capture

• All data “redundancy” is removed from Database

• Has multiple forms (0, 1st, 2nd, 3rd, et al.)
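The "changes happen in one place" point can be seen by counting how many rows one name change touches. The sketch below uses sqlite3 with assumed table names (flat_orders, customer); the new name 'Moses' is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Redundant layout: the customer name is repeated on every order row.
CREATE TABLE flat_orders (order_no INTEGER PRIMARY KEY, cust_no INTEGER, cust_name TEXT);
INSERT INTO flat_orders VALUES (1, 100, 'Moe'), (2, 100, 'Moe'), (3, 102, 'Curly');

-- Normalized layout: each customer name is stored exactly once.
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
INSERT INTO customer VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
""")

# Renaming customer 100 touches every order row in the redundant layout...
flat_rows_touched = conn.execute(
    "UPDATE flat_orders SET cust_name = 'Moses' WHERE cust_no = 100"
).rowcount

# ...but only a single row in the normalized layout.
normalized_rows_touched = conn.execute(
    "UPDATE customer SET cust_name = 'Moses' WHERE cust_no = 100"
).rowcount

print(flat_rows_touched, normalized_rows_touched)  # 2 1
```

If one of the redundant rows were missed, the same customer would carry two names, which is the kind of data anomaly normalization is meant to prevent.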

Page 8

Moving from 0NF to 1NF

Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., cannot be broken apart any further)

0NF: CUSTOMER DATA

Cust #          CustName
100, 101, 102   Moe Howard, Larry Fine, Curly Howard

1NF: CUSTOMER TABLE

Cust #  FName  LName
100     Moe    Howard
101     Larry  Fine
102     Curly  Howard

I’M NOT MOVING!
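As a rough illustration of the 0NF to 1NF step, the packed customer record can be split into one row per customer with atomic fields. The function name to_1nf and the dictionary keys are assumptions for this sketch, and it relies on each name being exactly "first last".

```python
# 0NF record as on the slide: several values packed into each field.
unnormalized = {
    "cust_no": "100, 101, 102",
    "cust_name": "Moe Howard, Larry Fine, Curly Howard",
}

def to_1nf(record):
    """Split the packed fields into one row per customer with atomic columns."""
    numbers = [n.strip() for n in record["cust_no"].split(",")]
    names = [n.strip() for n in record["cust_name"].split(",")]
    rows = []
    for number, full_name in zip(numbers, names):
        first, last = full_name.split(" ", 1)  # assumes "first last" names
        rows.append({"cust_no": int(number), "fname": first, "lname": last})
    return rows

customer_table = to_1nf(unnormalized)
for row in customer_table:
    print(row)
```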

Page 9

Moving from 1NF to 2NF

Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign). Formally, every non-key column must depend on the whole key, not on just part of it.

1NF: TABLE X

Cust #  FName  Order#
100     Moe    1
100     Moe    2
101     Larry  3

Repeating values (100 Moe, 100 Moe): Dependency on Primary Key

2NF: CUSTOMER TABLE

Cust #  FName
100     Moe
101     Larry
102     Curly

2NF: ORDER TABLE

Order #  Cust#
1        100
2        100
3        101
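A small sketch of the 1NF to 2NF split, using TABLE X from the slide. The variable names are assumptions. Note that customer 102 (Curly) does not appear in the result because TABLE X holds no order for him; the slide's CUSTOMER TABLE lists him separately.

```python
# TABLE X (1NF) from the slide: (cust_no, fname, order_no)
table_x = [(100, "Moe", 1), (100, "Moe", 2), (101, "Larry", 3)]

# FName depends only on Cust #, so it moves to its own table, stored once per customer.
customer = sorted({(cust_no, fname) for cust_no, fname, _ in table_x})

# The order table keeps only the key that points back to the customer.
orders = [(order_no, cust_no) for cust_no, _, order_no in table_x]

# The split loses nothing: joining the two tables rebuilds TABLE X.
names = dict(customer)
rejoined = [(cust_no, names[cust_no], order_no) for order_no, cust_no in orders]

print(customer)  # [(100, 'Moe'), (101, 'Larry')]
print(orders)    # [(1, 100), (2, 100), (3, 101)]
```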

Page 10

Moving from 2NF to 3NF

Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column). Formally, no non-key column may depend on another non-key column.

2NF: TABLE X

Cust #  City  Order#  ShipTime
100     NY    1       2 days
101     NY    2       2 days
102     LA    3       5 days

Repeating values (NY 2 days, NY 2 days): Dependency b/t 2 non-key columns

3NF: SHIP TIME TABLE

City #  City  ShipTime
10      NY    2 days
20      LA    5 days

3NF: CUSTOMER TABLE

Cust #  City#
100     10
101     10
102     20
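The 2NF to 3NF split can be sketched the same way. ShipTime depends on City rather than on the customer, so the City/ShipTime pair moves to its own table. The variable names are assumptions, and the City # values are generated as 10, 20, ... only to match the numbers shown on the slide.

```python
# TABLE X (2NF) from the slide: (cust_no, city, order_no, ship_time)
table_x = [
    (100, "NY", 1, "2 days"),
    (101, "NY", 2, "2 days"),
    (102, "LA", 3, "5 days"),
]

# Give each distinct city a surrogate key in order of first appearance (10, 20, ...).
city_ids = {}
for _, city, _, _ in table_x:
    if city not in city_ids:
        city_ids[city] = 10 * (len(city_ids) + 1)

# ShipTime is now stored once per city instead of once per row.
ship_time_table = sorted({(city_ids[city], city, ship) for _, city, _, ship in table_x})

# The customer table keeps only a key pointing at the city row.
customer_table = [(cust_no, city_ids[city]) for cust_no, city, _, _ in table_x]

print(ship_time_table)  # [(10, 'NY', '2 days'), (20, 'LA', '5 days')]
print(customer_table)   # [(100, 10), (101, 10), (102, 20)]
```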

Page 11

Normalized DB Example

MANY database tables ensure against redundant data (and help prevent data integrity issues)

Page 12

Database Structure & Design

2 Approaches:

1. Optimize for Data Capture (i.e., Capturing Transactions)

2. Optimize for Data Access (i.e., Queries & Reporting)

Conflict

Page 13

Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)

To optimize for data access, you must:

• Change the data layout to a different structure

• Allow data redundancy

• Reduce the number of table joins (i.e., reduce links among tables by combining tables)

Denormalizing – Adding redundancy & reducing joins in a relational database
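One way to picture denormalizing is to run the join once and store its result, so that later reports read a single wide table. The sketch below reuses the customer and order data from the earlier relational example; the table name order_report and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (data from the Relational DB Example slide).
CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE orders (order_no INTEGER PRIMARY KEY, prod_no TEXT, qty INTEGER, cust_no INTEGER);
INSERT INTO customer VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
INSERT INTO orders VALUES (1, 'QR22', 1, 100), (2, 'QR22', 25, 100), (3, 'SB56', 3, 102);

-- Denormalized copy: the join is done once and the customer name is stored redundantly.
CREATE TABLE order_report AS
    SELECT o.order_no, o.prod_no, o.qty, c.cust_no, c.cust_name
    FROM orders o JOIN customer c ON c.cust_no = o.cust_no;
""")

# Reporting queries now need no join at all.
report = conn.execute(
    "SELECT order_no, cust_name, qty FROM order_report ORDER BY order_no"
).fetchall()
print(report)  # [(1, 'Moe', 1), (2, 'Moe', 25), (3, 'Curly', 3)]
```

The trade-off is the one the slides describe: 'Moe' is now stored twice, which is acceptable in a read-only warehouse but would invite update anomalies in a transaction system.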

Page 14

Denormalizing – Most Common Approach

Star Schema (Clustering)

• Fact (core or transaction) Tables in middle of star

• Dimensional (structural or “lookup”) Tables around “points” of star

SALES ORDER (FACT) TABLE

Order #  Date      Cust#  Prod#  Loc#
1        06/15/XX  100    QR22   1000
2        07/19/XX  100    QR22   1000
3        08/30/XX  101    SR56   2000

CUSTOMER DIMENSION TABLE

Cust #  CustName
100     Moe
101     Larry
102     Curly

PRODUCT DIMENSION TABLE

Prod #  ProdName
QR22    Rake
SR56    Spade
TW43    Mulch

LOC DIMENSION TABLE

Loc #  LocName
1000   NY
2000   LA
3000   PGH

DATE DIMENSION TABLE

Date      Quarter
06/29/XX  2
06/30/XX  2
07/01/XX  3
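A typical star-schema query starts at the fact table and joins outward to whichever dimensions the report needs. The sketch below loads the slide's sales fact rows and two of its dimensions into sqlite3; the dim_ and fact_ table names are naming assumptions, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product  (prod_no TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE dim_location (loc_no INTEGER PRIMARY KEY, loc_name TEXT);
CREATE TABLE fact_sales_order (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT,
    cust_no    INTEGER,
    prod_no    TEXT    REFERENCES dim_product(prod_no),
    loc_no     INTEGER REFERENCES dim_location(loc_no)
);
INSERT INTO dim_product  VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
INSERT INTO dim_location VALUES (1000, 'NY'), (2000, 'LA'), (3000, 'PGH');
INSERT INTO fact_sales_order VALUES
    (1, '06/15/XX', 100, 'QR22', 1000),
    (2, '07/19/XX', 100, 'QR22', 1000),
    (3, '08/30/XX', 101, 'SR56', 2000);
""")

# Orders per location and product: one hop from the fact table to each dimension.
rows = conn.execute("""
    SELECT l.loc_name, p.prod_name, COUNT(*) AS orders
    FROM fact_sales_order f
    JOIN dim_product  p ON p.prod_no = f.prod_no
    JOIN dim_location l ON l.loc_no  = f.loc_no
    GROUP BY l.loc_name, p.prod_name
    ORDER BY l.loc_name, p.prod_name
""").fetchall()
print(rows)  # [('LA', 'Spade', 1), ('NY', 'Rake', 2)]
```

Every dimension is exactly one join away from the fact table, which is what keeps reporting queries short compared with the many-table normalized design.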

Page 15

These 2 tables become the “SALES FACT” table in the Data Warehouse

These 3 tables become the “Customer Dimension”

These 5 tables become the “Product Dimension”

This Date Field helps build the “Date Dimension”

Page 16

Resulting Star Schema Data Warehouse

SALES ORDER (FACT) TABLE

Order #  Date      Cust#  Prod#  Rep#
1        06/15/XX  100    QR22   1000
2        07/19/XX  100    QR22   1000
3        08/30/XX  101    SR56   2000

CUSTOMER DIMENSION

Cust #  CustName
100     Moe
101     Larry
102     Curly

PRODUCT DIMENSION

Prod #  ProdName
QR22    Rake
SR56    Spade
TW43    Mulch

DATE DIMENSION

Date      Quarter
06/29/XX  2
06/30/XX  2
07/01/XX  3

Hey, hot stuff!

Page 17

Common (Conformed) Dimensions

Denormalizing (continued)

Stars are linked via common (i.e., Conformed) Dimensions to form Data Warehouse

INVENTORY (FACT) TABLE

Prod#  ProdName  Stock Date  Units
QR22   Rake      03/23/XX    150
TW43   Mulch     04/15/XX    1452
SR56   Spade     05/01/XX    997

SALES ORDER (FACT) TABLE

Order #  Date      Cust#  Prod#  Loc#
1        06/15/XX  100    QR22   1000
2        07/19/XX  100    QR22   1000
3        08/30/XX  101    SR56   2000

CUSTOMER DIMENSION

Cust #  CustName
100     Moe
101     Larry
102     Curly

PRODUCT DIMENSION

Prod #  ProdName
QR22    Rake
SR56    Spade
TW43    Mulch

LOC DIMENSION

Loc #  LocName
1000   NY
2000   LA
3000   PGH

DATE DIMENSION

Date      Quarter
06/29/XX  2
06/30/XX  2
07/01/XX  3
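Because both fact tables carry Prod#, the product dimension can act as the shared (conformed) link between the two stars. The sketch below summarizes each fact table by product and lines the results up through that one dimension. It uses plain Python, and the variable names are assumptions.

```python
from collections import Counter

# Shared (conformed) product dimension: Prod# -> ProdName, as on the slide.
dim_product = {"QR22": "Rake", "SR56": "Spade", "TW43": "Mulch"}

# Sales fact rows: (order_no, date, cust_no, prod_no, loc_no)
fact_sales = [
    (1, "06/15/XX", 100, "QR22", 1000),
    (2, "07/19/XX", 100, "QR22", 1000),
    (3, "08/30/XX", 101, "SR56", 2000),
]

# Inventory fact rows: (prod_no, stock_date, units)
fact_inventory = [
    ("QR22", "03/23/XX", 150),
    ("TW43", "04/15/XX", 1452),
    ("SR56", "05/01/XX", 997),
]

# Summarize each star separately on the shared key...
orders_per_product = Counter(row[3] for row in fact_sales)
units_per_product = Counter()
for prod_no, _, units in fact_inventory:
    units_per_product[prod_no] += units

# ...then line the two results up through the conformed dimension.
combined = {
    name: (orders_per_product[prod_no], units_per_product[prod_no])
    for prod_no, name in dim_product.items()
}
print(combined)  # {'Rake': (2, 150), 'Spade': (1, 997), 'Mulch': (0, 1452)}
```

This only works because both stars agree on what Prod# means; if each fact table used its own product codes, the two summaries could not be matched.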

Page 18

Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like MS-SSIS)

EXTRACT: These are 2 Normalized Transaction Tables

TRANSFORM: The data are “Transformed” in these steps

LOAD: This is the resulting, Denormalized Product Dimension
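The three ETL stages can be imitated in a few lines of Python with sqlite3. This is not how MS-SSIS works internally; it is only a hand-written sketch of the same extract, transform, load flow. The slide does not say which two normalized tables feed the Product Dimension, so the product and product_category tables and the 'Tools' and 'Supplies' category values below are hypothetical.

```python
import sqlite3

# Hypothetical normalized source system (transaction database).
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE product_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product (prod_no TEXT PRIMARY KEY, prod_name TEXT, category_id INTEGER);
INSERT INTO product_category VALUES (1, 'Tools'), (2, 'Supplies');
INSERT INTO product VALUES ('QR22', 'Rake', 1), ('SR56', 'Spade', 1), ('TW43', 'Mulch', 2);
""")

# Separate warehouse database holding the denormalized dimension.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_product (prod_no TEXT PRIMARY KEY, prod_name TEXT, category_name TEXT)"
)

def extract(conn):
    """EXTRACT: read the two normalized tables as they are."""
    products = conn.execute("SELECT prod_no, prod_name, category_id FROM product").fetchall()
    categories = conn.execute("SELECT category_id, category_name FROM product_category").fetchall()
    return products, categories

def transform(products, categories):
    """TRANSFORM: replace each category key with its name, giving one wide row per product."""
    category_names = dict(categories)
    return [(prod_no, name, category_names[cat_id]) for prod_no, name, cat_id in products]

def load(conn, rows):
    """LOAD: write the denormalized rows into the warehouse dimension."""
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", rows)
    conn.commit()

load(warehouse, transform(*extract(source)))
dim_rows = warehouse.execute("SELECT * FROM dim_product ORDER BY prod_no").fetchall()
print(dim_rows)
```

Keeping the three stages as separate functions mirrors the separate boxes in an ETL tool's data flow, and makes it easy to test the transform step without touching either database.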

Page 19

The End

That’s all! Bye, bye!