
Data Warehouse – Part 02

Based on Chapter 06, "The Data Warehouse," in Data Mining: A Tutorial-Based Primer by Roiger and Geatz

Data Warehouse Purpose

House data for decision support

Support organizational decision making, so that it can be fact-based instead of ad hoc

Decision Support Categories

Reporting

Analyzing

Knowledge Discovery

Sample of Credit Card Promotion Data (from Table 2.3)

Income Range  Magazine Promo  Watch Promo  Life Ins Promo  CC Ins  Sex     Age
40-50K        Yes             No           No              No      Male    45
30-40K        Yes             Yes          Yes             No      Female  40
40-50K        No              No           No              No      Male    42
30-40K        Yes             Yes          Yes             Yes     Male    43
50-60K        Yes             No           Yes             No      Female  38
20-30K        No              No           No              No      Female  55
30-40K        Yes             No           Yes             Yes     Male    35
20-30K        No              Yes          No              No      Male    27
30-40K        Yes             No           No              No      Male    43
30-40K        Yes             Yes          Yes             No      Female  41

Credit Card Purchases and Promotions Constellation Design

Online Analytical Processing (OLAP)

Query-based methodology that supports data analysis

OLAP engine structures data as a cube

A cube can have more than three dimensions, as the term cube is used in business intelligence/data warehousing

Find the Total Sales by Product by Year and by Region

Figure: data cube with axes Region, Product, and Year (labels visible in the figure: South, Central, Mythic World, 2005)
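The query named on this slide can be sketched as a simple aggregation. This is a minimal illustration, not the method used by any particular OLAP engine. The labels South, Central, Mythic World, and 2005 are borrowed from the figure, and treating "Mythic World" as a product name is an assumption. The sales amounts and the function name `total_sales_cube` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, year, sales amount).
# Labels echo the figure; the amounts are invented.
sales_rows = [
    ("South", "Mythic World", 2005, 120.0),
    ("South", "Mythic World", 2005, 80.0),
    ("Central", "Mythic World", 2005, 50.0),
    ("Central", "Mythic World", 2006, 70.0),
]


def total_sales_cube(rows):
    """Sum sales into cube cells keyed by (product, year, region)."""
    cube = defaultdict(float)
    for region, product, year, amount in rows:
        cube[(product, year, region)] += amount
    return dict(cube)


cube = total_sales_cube(sales_rows)
print(cube[("Mythic World", 2005, "South")])  # 200.0
```

Each key of `cube` is one cell of the three-dimensional cube, and the value is the total sales for that product, year, and region.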

Data Cubes

http://www.info-source.us/data_warehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html

http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics

Designed for a specific purpose

For four dimensions, visualize multiple cubes with the same three dimensions, but each cube represents a particular value of the fourth dimension

Extrapolate to n dimensions

http://zeesql.wordpress.com/2008/05/21/data-cubes/
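The "multiple cubes" picture for four dimensions can be sketched in a few lines. The dimension names (product, year, region, channel), the cell values, and the function name `split_by_fourth_dimension` are all hypothetical, chosen only to make the idea concrete.

```python
# Hypothetical 4-D cells keyed by (product, year, region, channel).
cells_4d = {
    ("Widget", 2005, "South", "Online"): 10,
    ("Widget", 2005, "South", "Store"): 4,
    ("Gadget", 2006, "Central", "Online"): 7,
}


def split_by_fourth_dimension(cells):
    """Group 4-D cells into one 3-D cube per value of the fourth dimension."""
    cubes = {}
    for (d1, d2, d3, d4), value in cells.items():
        cubes.setdefault(d4, {})[(d1, d2, d3)] = value
    return cubes


cubes_3d = split_by_fourth_dimension(cells_4d)
print(sorted(cubes_3d))  # ['Online', 'Store']
```

Every 3-D cube in `cubes_3d` shares the same three dimensions. The dictionary key selects the particular value of the fourth dimension that the cube represents.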

Data Cube Characteristics

Cubes with many empty cells are not as useful

Thus, a cube with two time dimensions is not a good design, because the intersection of quarter and month would often be empty

http://www.info-source.us/data_warehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html
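The empty-cell argument can be checked with a little arithmetic. The sketch below assumes calendar quarters (Q1 is January through March, and so on). It counts how many quarter-by-month cells could ever hold data.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def quarter_of(month_index):
    """Return the calendar quarter for a zero-based month index."""
    return QUARTERS[month_index // 3]


# Every (quarter, month) pair the two time dimensions would define.
possible_cells = [(q, m) for q in QUARTERS for m in MONTHS]

# Only pairs where the month actually falls in the quarter can hold data.
populated_cells = {(quarter_of(i), m) for i, m in enumerate(MONTHS)}

empty_fraction = 1 - len(populated_cells) / len(possible_cells)
print(len(possible_cells), len(populated_cells), empty_fraction)  # 48 12 0.75
```

Under this assumption, three quarters of the cells in the quarter-by-month plane are always empty, which is the design problem the slide describes.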

Data Store Behind Data Cube

Relational

  Star schema

  Advantage: user can view data at detail level defined by star schema

Multidimensional

  Arrays

  Advantage: query speed
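The contrast between the two kinds of store can be sketched with plain Python structures. This is only an illustration of the idea, not how any real OLAP product stores data. The table names, keys, and amounts are hypothetical.

```python
# Relational (star schema) sketch: a fact table whose rows reference
# small dimension tables by key. One fact row per purchase (detail level).
dim_region = {1: "One", 2: "Two"}
dim_month = {1: "Nov.", 2: "Dec."}
fact_purchases = [
    # (region_key, month_key, amount)
    (2, 2, 300.0),
    (2, 2, 150.0),
    (1, 2, 75.0),
]


def relational_total(region_name, month_name):
    """Scan the fact table, resolving dimension keys for each row."""
    return sum(
        amount
        for region_key, month_key, amount in fact_purchases
        if dim_region[region_key] == region_name
        and dim_month[month_key] == month_name
    )


# Multidimensional sketch: totals precomputed into an array indexed by
# position, so a query is a direct lookup instead of a scan.
regions = ["One", "Two"]
months = ["Nov.", "Dec."]
totals = [[0.0 for _ in months] for _ in regions]
for region_key, month_key, amount in fact_purchases:
    totals[region_key - 1][month_key - 1] += amount


def array_total(region_name, month_name):
    """Look up a precomputed total by array position."""
    return totals[regions.index(region_name)][months.index(month_name)]


print(relational_total("Two", "Dec."), array_total("Two", "Dec."))  # 450.0 450.0
```

The fact table keeps every individual purchase, so detail remains available. The array holds only aggregated totals, trading detail for a faster lookup.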

OLAP Interfaces

Many are emerging, especially interfaces designed for visual exploration

Default interface is a spreadsheet workbook format

OLAP useful functionality

  Different views of data

  Statistical calculations

  Drill-down and reverse drill-down (or roll-up)

    Look at data at a more granular (detail) level or vice versa

Short video in right panel demoing OLAP interface: http://www.softwarefx.com/Extensions/featuresOlap.aspx
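Roll-up and drill-down can be sketched on a single time dimension. The monthly amounts, the month-to-quarter mapping, and the function names `roll_up` and `drill_down` are hypothetical and exist only to show the two directions of movement.

```python
from collections import defaultdict

# Hypothetical mapping from the detail level (month) to a coarser level (quarter).
MONTH_TO_QUARTER = {"Jan.": "Q1", "Feb.": "Q1", "Mar.": "Q1", "Apr.": "Q2"}

# Hypothetical monthly purchase amounts.
monthly_amounts = {"Jan.": 100, "Feb.": 150, "Mar.": 50, "Apr.": 200}


def roll_up(monthly):
    """Aggregate month-level values to the quarter level."""
    quarterly = defaultdict(int)
    for month, amount in monthly.items():
        quarterly[MONTH_TO_QUARTER[month]] += amount
    return dict(quarterly)


def drill_down(monthly, quarter):
    """Return the month-level detail behind one quarter."""
    return {m: a for m, a in monthly.items() if MONTH_TO_QUARTER[m] == quarter}


print(roll_up(monthly_amounts))           # {'Q1': 300, 'Q2': 200}
print(drill_down(monthly_amounts, "Q1"))  # {'Jan.': 100, 'Feb.': 150, 'Mar.': 50}
```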

Slice

A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

http://www.practicaldb.com/blog/cubes/
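A slice can be sketched by fixing one dimension of a small cube to a single value. The dimension order (month, region, category) and the function name `slice_cube` are assumptions. The count of 110 matches the credit card example later in these slides, and the other counts are invented.

```python
# Cells keyed by (month, region, category); values are purchase counts.
# 110 comes from the slides; the other counts are hypothetical.
DIMENSIONS = ("month", "region", "category")
cube = {
    ("Dec.", "Two", "Vehicle"): 110,
    ("Dec.", "Two", "Travel"): 40,
    ("Nov.", "Two", "Vehicle"): 95,
    ("Dec.", "One", "Vehicle"): 60,
}


def slice_cube(cube, dimension, value):
    """Fix one dimension to a single value and drop it from the cell keys."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: cell
        for key, cell in cube.items()
        if key[axis] == value
    }


december = slice_cube(cube, "month", "Dec.")
print(december[("Two", "Vehicle")])  # 110
```

The result has one fewer dimension than the cube: `december` is a two-dimensional region-by-category table for the single month that was fixed.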

Dice

The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices)
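One common reading of dice is that it restricts several dimensions at once to chosen sets of values, producing a smaller sub-cube. The sketch below follows that reading, which is an assumption rather than the slide's exact wording. The cube contents and the function name `dice_cube` are hypothetical.

```python
# Cells keyed by (month, region, category); counts are hypothetical.
DIMENSIONS = ("month", "region", "category")
cube = {
    ("Dec.", "Two", "Vehicle"): 110,
    ("Dec.", "Two", "Travel"): 40,
    ("Nov.", "Two", "Vehicle"): 95,
    ("Dec.", "One", "Vehicle"): 60,
    ("Oct.", "Three", "Retail"): 25,
}


def dice_cube(cube, allowed):
    """Keep cells whose coordinate on every listed dimension is in its allowed set."""
    return {
        key: cell
        for key, cell in cube.items()
        if all(
            key[DIMENSIONS.index(dimension)] in values
            for dimension, values in allowed.items()
        )
    }


sub_cube = dice_cube(
    cube,
    {"month": {"Nov.", "Dec."}, "region": {"Two"}, "category": {"Vehicle"}},
)
print(sub_cube)  # {('Dec.', 'Two', 'Vehicle'): 110, ('Nov.', 'Two', 'Vehicle'): 95}
```

Unlike the slice, the dice keeps all three dimensions in the keys. It only narrows the range of values allowed along each one.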

OLAP Concept Example

Credit card purchase data

Total amount and total number of vehicle purchases in region two for the month of December

Month = Dec., Region = Two, Category = Vehicle

Count = 110, Amount = 6,720

Figure 6.6 A multidimensional cube for credit card purchases (axes: Month, Jan. through Dec.; Category: Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle; Region: One, Two, Three, Four)
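The highlighted cell in Figure 6.6 can be sketched as a lookup into a cube whose cells hold two measures, a count and an amount. Only the December, Region Two, Vehicle cell is taken from the slide. The second cell and the names `Cell` and `lookup` are hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """The two measures stored in each cube cell."""
    count: int
    amount: float


# Cells keyed by (month, region, category).
purchases = {
    ("Dec.", "Two", "Vehicle"): Cell(count=110, amount=6720.0),  # from the slide
    ("Nov.", "Two", "Vehicle"): Cell(count=90, amount=5100.0),   # hypothetical
}


def lookup(month, region, category):
    """Return the cell for one combination, or an empty cell if none exists."""
    return purchases.get((month, region, category), Cell(count=0, amount=0.0))


cell = lookup("Dec.", "Two", "Vehicle")
print(cell.count, cell.amount)  # 110 6720.0
```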

Attributes May Be Based on Concept Hierarchy

Location
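A concept hierarchy for a location attribute can be sketched as a chain of mappings from finer to coarser levels. The levels (city, state, region), the place names, and the sales figures are all hypothetical. The slide itself shows only the attribute name.

```python
from collections import defaultdict

# Hypothetical hierarchy: city -> state -> region.
CITY_TO_STATE = {"Houston": "Texas", "Dallas": "Texas", "Chicago": "Illinois"}
STATE_TO_REGION = {"Texas": "South", "Illinois": "Central"}

# Hypothetical sales recorded at the most detailed level (city).
city_sales = {"Houston": 10, "Dallas": 5, "Chicago": 7}


def generalize(sales_by_city, level):
    """Aggregate city-level sales to the 'city', 'state', or 'region' level."""
    totals = defaultdict(int)
    for city, amount in sales_by_city.items():
        if level == "city":
            key = city
        elif level == "state":
            key = CITY_TO_STATE[city]
        elif level == "region":
            key = STATE_TO_REGION[CITY_TO_STATE[city]]
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += amount
    return dict(totals)


print(generalize(city_sales, "state"))   # {'Texas': 15, 'Illinois': 7}
print(generalize(city_sales, "region"))  # {'South': 15, 'Central': 7}
```

Moving up the hierarchy is a roll-up along the location dimension. Moving back down recovers the finer-grained values.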

Excel Pivot Tables

Accomplish the cube concept

  aggregate your information

  show a new perspective

http://www.timeatlas.com/5_minute_tips/chunkers/learn_to_use_pivot_tables_in_excel_2007_to_organize_data

Excel Pivot Table – Example – p. 1

Open CreditCardPromotion.xlsx

Copy the original data to a new worksheet

  In order to preserve the original data

Remove any blank columns or rows

Each column must have a heading

Cells should be properly formatted for the data type

Highlight the data

Excel Pivot Table – Example – p. 2

Click Insert

Select Pivot Table

Select Pivot Table to open the Create Pivot Table dialog box

Select Table/Range to make sure you selected the correct range

Select New Worksheet button

Click OK

Excel Pivot Table – Example – p. 3

Select Income Range for row labels

Select Income Range for values

Click on Count of Income Range

Go to Field Settings

Choose % of column setting
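The pivot table built in these steps (count of Income Range, shown as a percentage of the column) can be reproduced outside Excel. The sketch below uses only the ten sample rows shown earlier in these slides, so its percentages may differ from those of the full CreditCardPromotion.xlsx workbook.

```python
from collections import Counter

# Income Range column of the ten sample rows from the slides.
income_ranges = [
    "40-50K", "30-40K", "40-50K", "30-40K", "50-60K",
    "20-30K", "30-40K", "20-30K", "30-40K", "30-40K",
]

# Row labels = Income Range, values = Count of Income Range.
counts = Counter(income_ranges)

# Show each count as a percentage of the column total.
total = sum(counts.values())
percent_of_column = {
    income_range: 100 * count / total
    for income_range, count in sorted(counts.items())
}

print(percent_of_column)
# {'20-30K': 20.0, '30-40K': 50.0, '40-50K': 20.0, '50-60K': 10.0}
```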

Excel Pivot Table – Example – p. 4

Highlight the percentages

Select Insert Pie Chart

Figure: pie chart titled Total, with legend entries 20-30K, 30-40K, 40-50K, 50-60K

Excel Pivot Table – Example – p. 5

Check out the drill-down functionality

Double-click in the pivot table on the % value for a particular income range

The detail values for that income range are displayed in a new worksheet
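The drill-down step amounts to filtering the detail rows behind one pivot cell. The sketch below uses the ten sample rows shown earlier in these slides. The field names and the function name `drill_down` are chosen here for illustration and are not taken from the workbook.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    income_range: str
    magazine_promo: str
    watch_promo: str
    life_ins_promo: str
    cc_ins: str
    sex: str
    age: int


# The ten sample rows from the slides.
customers = [
    Customer("40-50K", "Yes", "No", "No", "No", "Male", 45),
    Customer("30-40K", "Yes", "Yes", "Yes", "No", "Female", 40),
    Customer("40-50K", "No", "No", "No", "No", "Male", 42),
    Customer("30-40K", "Yes", "Yes", "Yes", "Yes", "Male", 43),
    Customer("50-60K", "Yes", "No", "Yes", "No", "Female", 38),
    Customer("20-30K", "No", "No", "No", "No", "Female", 55),
    Customer("30-40K", "Yes", "No", "Yes", "Yes", "Male", 35),
    Customer("20-30K", "No", "Yes", "No", "No", "Male", 27),
    Customer("30-40K", "Yes", "No", "No", "No", "Male", 43),
    Customer("30-40K", "Yes", "Yes", "Yes", "No", "Female", 41),
]


def drill_down(rows, income_range):
    """Return the detail rows that make up one income-range cell."""
    return [row for row in rows if row.income_range == income_range]


for row in drill_down(customers, "20-30K"):
    print(row.sex, row.age)
# Female 55
# Male 27
```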

Continue – With Page 205

Creating a Multidimensional Pivot Table
