MIS2502: Data Analytics

Dimensional Data Modeling

JaeHwuen Jung

http://community.mis.temple.edu/jaejung

Mar 09, 2018

Where we are…

Transactional Database: stores real-time transactional data

Analytical Data Store: stores historical transactional and summary data

Data entry → Data transformation → Data analysis

Now we’re here…

Comparing Transactional and Analytical Data Stores

Transactional Database | Analytical Data Store

Based on relational paradigm | Based on dimensional paradigm

Storage of real-time transactional data | Storage of historical transactional data

Optimized for storage efficiency and data integrity | Optimized for data retrieval and summarization

Supports day-to-day operations | Supports periodic and on-demand analysis

Some terminology

Data Warehouse

• Takes many forms

• Really is just a repository for historical data

Data Mart

• Subset of the Data Warehouse

• Designed for specific analysis

Data Cube

• Organization of data as a “multidimensional matrix”

• Implementation of a Data Mart

The Actual Process

[Diagram: Transactional Database 1, Transactional Database 2, and Other Sources each pass through ETL into the Analytical Data Store, which holds the Data Warehouse, the Data Marts (Finance, Sales, Inventory), and the Data Cube.]

How they all relate

The data in the transactional database…

…is put into a data warehouse…

…which feeds the data mart…

…and is analyzed as a cube.

We’ll start here.

The Data Cube

• Core component of Analytical Data Store and Multidimensional Data Analysis

• Made up of “facts” and “dimensions”

[Figure: a data cube in which every cell holds quantity and total price. Axes: Product (M&Ms, Diet Coke, Doritos, Cheetos), Store (Ardmore, PA; Temple Main; Cherry Hill, NJ; King of Prussia, PA), and time (Jan. 2013, Feb. 2013, Mar. 2013).]

Quantity sold and total price are measured facts. Why isn’t product price a measured fact?

The Data Cube

• Fact (What happened)
– Quantity
– Total price
– Number of promotions
– Etc.

• Dimension (Attributes that describe what happened)
– When/where/what product was sold

[Figure: data cube with Product, Store, and time axes; each cell holds quantity and total price.]
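As a rough illustration of facts and dimensions, the cube can be sketched in Python as a mapping from one (product, store, month) combination to its measured facts. The dimension values come from the slide; the `Facts` type and every number below are invented for illustration.

```python
from typing import NamedTuple

class Facts(NamedTuple):
    """Measured facts stored in one cell of the cube."""
    quantity: int
    total_price: float

# Dimension values taken from the example cube.
PRODUCTS = ["M&Ms", "Diet Coke", "Doritos", "Cheetos"]
STORES = ["Ardmore, PA", "Temple Main", "Cherry Hill, NJ", "King of Prussia, PA"]
MONTHS = ["Jan. 2013", "Feb. 2013", "Mar. 2013"]

# Each cell is addressed by one value from every dimension.
# The fact values are made up; a real cube would be loaded from the data mart.
cube: dict[tuple[str, str, str], Facts] = {}
for p, product in enumerate(PRODUCTS):
    for s, store in enumerate(STORES):
        for m, month in enumerate(MONTHS):
            quantity = 10 + p + s + m  # hypothetical quantity sold
            cube[(product, store, month)] = Facts(quantity, quantity * 1.5)

# One cell is one summary record: monthly sales of a product at a store.
cell = cube[("M&Ms", "Ardmore, PA", "Jan. 2013")]
print(cell.quantity, cell.total_price)
```

The dimensions appear only in the key, and the facts appear only in the value, which mirrors the split between "what happened" and "attributes that describe what happened."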

The Data Cube

[Figure: data cube with Product, Store, and time axes; each cell holds a quantity, and one cell is highlighted.]

The highlighted element represents the quantity of M&Ms sold in Ardmore, PA in January 2013.

A single summary record representing a business event (monthly sales).

Slicing the Data

[Figure: data cube with Product, Store, and time axes; each cell holds quantity and total price, and a row of cells is highlighted.]

The highlighted elements represent Cheetos sold on Temple’s Main campus from January to March 2013.

This is called “slicing the data.”
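Slicing can be sketched as a filter over the cube's keys: fix a value for some dimensions and keep every cell that matches. This is only an illustration; the `slice_cube` helper and the quantities are hypothetical.

```python
# Hypothetical quantities sold, keyed by (product, store, month).
cube = {
    ("Cheetos", "Temple Main", "Jan. 2013"): 40,
    ("Cheetos", "Temple Main", "Feb. 2013"): 35,
    ("Cheetos", "Temple Main", "Mar. 2013"): 50,
    ("Cheetos", "Ardmore, PA", "Jan. 2013"): 20,
    ("M&Ms", "Temple Main", "Jan. 2013"): 60,
}

def slice_cube(cube, product=None, store=None, month=None):
    """Keep only the cells that match every dimension value that was given."""
    wanted = (product, store, month)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }

# Cheetos at Temple Main across all months, as in the highlighted cells.
cheetos_main = slice_cube(cube, product="Cheetos", store="Temple Main")
total_quantity = sum(cheetos_main.values())
```

Leaving a dimension as `None` means "all values," so the same helper covers slices along any axis.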

Slicing the Data

[Figure: data cube with Product, Store, and time axes; one set of cells is highlighted in orange and another in purple.]

What do the orange highlighted elements represent?

What do the purple highlighted elements represent?

Could you have a data mart with five dimensions?

Then why does our example (and most others) only have three?

Modeling a data cube:

Designing the Star Schema

Kimball’s Four Step Process for Data Cube Design (Kimball et al., 2008)

1. Choose the business process

2. Decide on the level of granularity

3. Identify the dimensions

4. Identify the fact

http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/four-4-step-design-process/

Choose the business process

• Business processes are the operational activities performed by your organization

• Determined by the questions you want to answer about your organization

Question | Business Process

Who is my best customer? | Sales

What are my highest selling products? | Sales

Which supplier is offering us the best deals? | Purchasing

Which types of promotions work the best? | Marketing

Note that a “business process” is not always about business.

Decide on the level of granularity

• Level of detail for each business process event

• Will determine the data in the dimensions

• Example: What are the highest selling products?

– Choices for time: yearly, quarterly, monthly, daily

– Choices for store: store, city, state

– Choices for product: product, category

How would you select the right granularity?
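One way to get a feel for the trade-off is to roll the same records up at different time granularities. The sketch below is illustrative only: the `roll_up` helper and the sales records are hypothetical, and only the time dimension is varied.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records: (sale date, store, product, quantity).
daily_sales = [
    (date(2013, 1, 5), "Temple Main", "M&Ms", 3),
    (date(2013, 1, 20), "Temple Main", "M&Ms", 4),
    (date(2013, 2, 2), "Temple Main", "M&Ms", 5),
    (date(2013, 2, 9), "Ardmore, PA", "M&Ms", 2),
]

def roll_up(sales, time_level):
    """Sum quantity per (period, store, product) at the chosen time granularity."""
    totals = defaultdict(int)
    for day, store, product, quantity in sales:
        if time_level == "daily":
            period = day.isoformat()
        elif time_level == "monthly":
            period = f"{day.year}-{day.month:02d}"
        elif time_level == "yearly":
            period = str(day.year)
        else:
            raise ValueError(f"unknown time level: {time_level}")
        totals[(period, store, product)] += quantity
    return dict(totals)

monthly = roll_up(daily_sales, "monthly")
yearly = roll_up(daily_sales, "yearly")
```

Coarser levels produce fewer summary records, but a monthly total cannot be split back into days, so the chosen granularity bounds the questions the data mart can answer.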

Identify the dimensions

• Description of the context of the business process

• The key elements of the process needed to answer the question (“fact”)
– who, what, where, when, why, and how

• Example: Sales transaction
– A “sale” is the fact
– Dimensions: product, store, and time
– Could this data mart tell you
   • What is the best selling product?
   • Who is the best customer?

Identify the fact

The fact table contains data called facts associated with the business process event

Keys

• Primary key for each event

• Foreign keys for the associated dimensions

• Example: Sales has Sales_ID as primary key, and Product_ID, Store_ID, and Time_ID as foreign keys

Facts: Measured, numeric data

• Facts: Quantifiable information for each business event – almost always numeric

• Describes a particular combination of dimensional data

• Example: Sales has quantity_sold and total_price.

Modeling a data cube: The Star Schema

Transactional databases aren’t built around dimensions

• They don’t map well to cubes

• They aren’t set up for summarization

So we build a star schema

• Built around “dimensions” and “facts”

• Simplified relational model

The star schema facilitates

• Aggregating individual transactions

• Creation of cubes

Sales (Fact): Sales_ID, Product_ID, Store_ID, Time_ID, Quantity Sold, Total Price

Product (Dimension): Product_ID, Product_Name, Product_Price, Product_Weight

Store (Dimension): Store_ID, Store_Address, Store_City, Store_State, Store_Type

Time (Dimension): Time_ID, Day, Month, Year
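The star schema above could be declared roughly as follows, using Python's built-in `sqlite3` module. The table and column names follow the slide (with underscores in place of spaces); the column types and the choice of SQLite are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    Product_ID     INTEGER PRIMARY KEY,
    Product_Name   TEXT,
    Product_Price  REAL,
    Product_Weight REAL
);
CREATE TABLE Store (
    Store_ID      INTEGER PRIMARY KEY,
    Store_Address TEXT,
    Store_City    TEXT,
    Store_State   TEXT,
    Store_Type    TEXT
);
CREATE TABLE Time (
    Time_ID INTEGER PRIMARY KEY,
    Day     INTEGER,
    Month   INTEGER,
    Year    INTEGER
);
-- The fact table sits at the center of the star: one foreign key per dimension.
CREATE TABLE Sales (
    Sales_ID      INTEGER PRIMARY KEY,
    Product_ID    INTEGER REFERENCES Product(Product_ID),
    Store_ID      INTEGER REFERENCES Store(Store_ID),
    Time_ID       INTEGER REFERENCES Time(Time_ID),
    Quantity_Sold INTEGER,
    Total_Price   REAL
);
""")

# List the tables that were created, in alphabetical order.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Only the fact table carries foreign keys; the dimension tables do not reference each other, which is what gives the schema its star shape.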

Fact Table

• Contain facts (numeric measurements) associated with a specific business process

• Contain foreign keys that refer to dimension tables

Sales (Fact): Sales_ID, Product_ID, Store_ID, Time_ID, Quantity Sold, Total Price

Dimension Tables

• Contain text and descriptive information

• Provide the “who, what, where, when, why, and how” context surrounding a business process event

Product (Dimension): Product_ID, Product_Name, Product_Price, Product_Weight

Store (Dimension): Store_ID, Store_Address, Store_City, Store_State, Store_Type

Time (Dimension): Time_ID, Day, Month, Year

Sales (Fact): Sales_ID, Product_ID, Store_ID, Time_ID, Quantity Sold, Total Price

From Star Schema to Data Cube

A Cube typically uses a Star Schema as its source and stores precomputed summarized (aggregated) data

Much more efficient, but can’t be changed (non-volatile)

[Figure: a data cube in which every cell holds quantity and total price. Axes: Product (M&Ms, Diet Coke, Doritos, Famous Amos), Store (Ardmore, PA; Temple Main; Cherry Hill, NJ; King of Prussia, PA), and time (Jan. 2012, Feb. 2012, Mar. 2012).]
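A minimal sketch of going from star schema to cube: join the fact table to its dimensions, aggregate once, and keep the result for later lookups. It again uses `sqlite3`; the tables are trimmed to a few columns and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE Store   (Store_ID INTEGER PRIMARY KEY, Store_City TEXT);
CREATE TABLE Time    (Time_ID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
CREATE TABLE Sales   (Sales_ID INTEGER PRIMARY KEY, Product_ID INTEGER, Store_ID INTEGER,
                      Time_ID INTEGER, Quantity_Sold INTEGER, Total_Price REAL);

INSERT INTO Product VALUES (1, 'M&Ms'), (2, 'Doritos');
INSERT INTO Store   VALUES (1, 'Ardmore'), (2, 'Cherry Hill');
INSERT INTO Time    VALUES (1, 3, 1, 2012), (2, 17, 1, 2012), (3, 8, 2, 2012);
INSERT INTO Sales   VALUES
    (1, 1, 1, 1, 2, 3.0),
    (2, 1, 1, 2, 1, 1.5),
    (3, 2, 1, 2, 4, 8.0),
    (4, 1, 2, 3, 5, 7.5);
""")

# Aggregate individual sales to one row per product, store, and month.
rows = conn.execute("""
    SELECT p.Product_Name, s.Store_City, t.Year, t.Month,
           SUM(f.Quantity_Sold), SUM(f.Total_Price)
    FROM Sales AS f
    JOIN Product AS p ON p.Product_ID = f.Product_ID
    JOIN Store   AS s ON s.Store_ID   = f.Store_ID
    JOIN Time    AS t ON t.Time_ID    = f.Time_ID
    GROUP BY p.Product_Name, s.Store_City, t.Year, t.Month
""").fetchall()

# The precomputed cube: dimension values -> (quantity, total price).
cube = {(name, city, year, month): (qty, total)
        for name, city, year, month, qty, total in rows}
```

Once `cube` is built, answering a question is a dictionary lookup rather than a join, which reflects the efficiency point above; the price is that the day-level detail is no longer available from the cube.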

Advantages of data cube

Speed

• Fast response for the information you previously designed into the cube

Analysis

• The multidimensional data structure allows the data to be analyzed in the most logical way.

Data cube caveats

• The cube is “non volatile,” so you’re locked in

– Measured facts

– Dimensions

– Granularity

• So choose wisely!

– For example: You can’t track daily sales if “date” is monthly

– So why not include every single sale and do no aggregation?

Pivot tables in Excel

• PivotTable is a data summarization tool in Excel

– the easiest way to learn multidimensional data and generate simple reports

• Data cubes can act as the data source for a PivotTable in Excel

Summary

• Data warehouse vs. data mart vs. data cube

• Star schema

• Kimball’s four step process for data mart design

• Pivot tables in Excel

ICA #9

• In ICA #9, we learned how to create a pivot table in Excel

– Identify which fields are assigned as VALUES and which ones are assigned as ROWS

– Identify the correct function for aggregation: e.g., SUM, COUNT, AVERAGE, MAX, MIN

The star schema in ICA #9

• Measured Fact: Order amount

• Three dimensions: Salesperson, Country, and Time.

[Figure: a data cube in which every cell holds an order amount. Axes: Country (USA, UK), Salesperson (Bob Buchanan, Martha Suyama, Paul Peacock, Susan Leverling, …more salespeople), and time (7/10/2008, 7/11/2008, 7/12/2008, 7/15/2008, 7/16/2008, 7/17/2008, …more time periods).]

Pivot Table and Data Cube

• The fields in the ROWS box correspond to dimensions in a data cube

• The fields in the VALUES box correspond to measured facts in a data cube
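The ROWS/VALUES correspondence can be sketched outside Excel as a small group-and-aggregate function. This is not how Excel implements PivotTables; the `pivot` helper, the order records, and the pairing of salespeople with countries are all hypothetical.

```python
from collections import defaultdict

# Hypothetical order records in the spirit of ICA #9.
orders = [
    {"Country": "USA", "Salesperson": "Susan Leverling", "Order Amount": 100.0},
    {"Country": "USA", "Salesperson": "Susan Leverling", "Order Amount": 250.0},
    {"Country": "UK", "Salesperson": "Bob Buchanan", "Order Amount": 400.0},
    {"Country": "UK", "Salesperson": "Martha Suyama", "Order Amount": 50.0},
]

def pivot(records, rows, value, aggregate=sum):
    """Group records by the ROWS fields (dimensions) and aggregate the VALUES field (fact)."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in rows)
        groups[key].append(record[value])
    return {key: aggregate(values) for key, values in groups.items()}

# One dimension in ROWS, SUM of the measured fact in VALUES.
by_country = pivot(orders, rows=["Country"], value="Order Amount")

# Two dimensions in ROWS, COUNT as the aggregation function.
by_person_count = pivot(orders, rows=["Country", "Salesperson"],
                        value="Order Amount", aggregate=len)
```

Swapping `aggregate` for `max`, `min`, or `len` plays the role of choosing MAX, MIN, or COUNT in the PivotTable's value field settings.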

Example 1

Dimension Measured Fact

Example 2

Dimensions Measured Fact