data warehouse part 02 - university of houstonsmiertsc/4397cis/data_warehouse_part… · ·...
TRANSCRIPT
Data Warehouse – Part 02
Based on Chapter 06, The Data Warehouse, in Data Mining: A Tutorial-Based Primer by Roiger and Geatz
Data Warehouse Purpose
House data for decision support
Support organizational decision making – so that it can be fact-based instead of ad-hoc
Sample of Credit Card Promotion Data (from Table 2.3)
Income Range  Magazine Promo  Watch Promo  Life Ins Promo  CC Ins  Sex     Age
40-50K        Yes             No           No              No      Male    45
30-40K        Yes             Yes          Yes             No      Female  40
40-50K        No              No           No              No      Male    42
30-40K        Yes             Yes          Yes             Yes     Male    43
50-60K        Yes             No           Yes             No      Female  38
20-30K        No              No           No              No      Female  55
30-40K        Yes             No           Yes             Yes     Male    35
20-30K        No              Yes          No              No      Male    27
30-40K        Yes             No           No              No      Male    43
30-40K        Yes             Yes          Yes             No      Female  41
Online Analytical Processing (OLAP)
Query-based methodology that supports data analysis
OLAP engine structures data as a cube
A cube can have more than three dimensions – as the term cube is used in business intelligence/data warehousing
Find the Total Sales by Product, by Year, and by Region
[Figure: three-dimensional sales cube with axes Region, Product, and Year; visible labels include South, Central, Mythic World, and 2005]
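The query on this slide can be sketched in plain Python. This is only an illustration: the sales amounts and the second product are invented, and treating "Mythic World" as a product name is an assumption based on the figure labels.

```python
from collections import defaultdict

# Hypothetical fact records: (product, year, region, sales amount).
# All amounts are invented; "Other Product" is a made-up second product.
sales = [
    ("Mythic World", 2005, "South", 120.0),
    ("Mythic World", 2005, "Central", 80.0),
    ("Mythic World", 2005, "South", 30.0),
    ("Other Product", 2005, "Central", 50.0),
]

# Each cube cell is keyed by one value from every dimension
# and holds the total sales for that combination.
cube = defaultdict(float)
for product, year, region, amount in sales:
    cube[(product, year, region)] += amount

for cell, total in sorted(cube.items()):
    print(cell, total)
```

Each distinct (product, year, region) combination becomes one cell of the cube, which is the structure an OLAP engine would query.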
Data Cubes
http://www.info-source.us/data_warehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html
http://zeesql.wordpress.com/2008/05/21/data-cubes/
Data Cube Characteristics
Designed for a specific purpose
For four dimensions, visualize multiple cubes with the same three dimensions, where each cube represents a particular value of the fourth dimension
Extrapolate to n dimensions
http://zeesql.wordpress.com/2008/05/21/data-cubes/
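The "multiple cubes" picture of a fourth dimension can be sketched as a dictionary of three-dimensional cubes. All dimension names and values here (products, years, regions, and the sales channel used as the fourth dimension) are hypothetical and chosen only for illustration.

```python
# Hypothetical dimension values, not taken from the slides.
products = ["P1", "P2"]
years = [2004, 2005]
regions = ["South", "Central"]
channels = ["Store", "Online"]  # the assumed fourth dimension

# A 4-D cube stored as one 3-D cube (keyed by product, year, region)
# for each value of the fourth dimension.
cube_4d = {
    channel: {
        (p, y, r): 0.0 for p in products for y in years for r in regions
    }
    for channel in channels
}

# Record a sale in one cell of the "Online" cube.
cube_4d["Online"][("P1", 2005, "South")] += 99.0

# Fixing the fourth dimension leaves an ordinary three-dimensional cube.
online_cube = cube_4d["Online"]
print(len(cube_4d), len(online_cube))
```

Adding a fifth dimension would wrap this structure in one more level of the same kind, which is the sense in which the idea extrapolates to n dimensions.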
Data Cube Characteristics
Cubes with many empty cells are not as useful
Thus, a cube with two time dimensions is not a good design, because the intersection of quarter and month would often be empty
http://www.info-source.us/data_warehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html
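A small sketch shows why quarter and month make a poor pair of dimensions. It assumes standard calendar quarters (months 1-3 in quarter 1, and so on) and simply counts which (quarter, month) cells could ever hold data.

```python
months = list(range(1, 13))
quarters = [1, 2, 3, 4]


def quarter_of(month):
    """Return the calendar quarter (1-4) that a month (1-12) belongs to."""
    return (month - 1) // 3 + 1


# A (quarter, month) cell can hold data only if the month lies in that quarter.
cells = [(q, m) for q in quarters for m in months]
non_empty = [(q, m) for (q, m) in cells if quarter_of(m) == q]
empty_fraction = 1 - len(non_empty) / len(cells)

print(len(cells), len(non_empty), empty_fraction)
```

Under this assumption only 12 of the 48 cells can ever be filled, so at least three quarters of the quarter-by-month plane is always empty.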
Data Store Behind Data Cube
Relational
Star schema
Advantage: user can view data at detail level defined by star schema
Multidimensional
Arrays
Advantage: query speed
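The two storage options can be contrasted with a minimal sketch. The table names, column names, and amounts below are all hypothetical; the relational side uses an in-memory SQLite database as a stand-in for a star schema, and the multidimensional side uses a plain nested list as a stand-in for a pre-aggregated array.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_month  (month_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_purchase (region_id INTEGER, month_id INTEGER, amount REAL);
    INSERT INTO dim_region VALUES (1, 'One'), (2, 'Two');
    INSERT INTO dim_month  VALUES (1, 'Jan.'), (2, 'Feb.');
    INSERT INTO fact_purchase VALUES
        (1, 1, 10.0), (2, 1, 20.0), (2, 1, 5.0), (2, 2, 7.0);
""")

# Relational store: aggregate by joining the fact table to its dimensions.
relational_total = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_purchase f
    JOIN dim_region r ON r.region_id = f.region_id
    JOIN dim_month  m ON m.month_id  = f.month_id
    WHERE r.name = 'Two' AND m.name = 'Jan.'
""").fetchone()[0]

# The individual purchases behind that total are still available.
detail = conn.execute("""
    SELECT amount FROM fact_purchase
    WHERE region_id = 2 AND month_id = 1
    ORDER BY amount
""").fetchall()

# Multidimensional store: the same totals pre-aggregated into an array,
# indexed as array[region][month]. A lookup needs no join.
regions = ["One", "Two"]
months = ["Jan.", "Feb."]
array = [[10.0, 0.0],
         [25.0, 7.0]]
array_total = array[regions.index("Two")][months.index("Jan.")]

print(relational_total, detail, array_total)
```

The join-based query can reach the individual fact rows, while the array answers the aggregate question with a direct index lookup, which mirrors the two advantages listed on the slide.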
OLAP Interfaces
Many are emerging – especially interfaces designed for visual exploration
Default interface is a spreadsheet workbook format
OLAP useful functionality
Different views of data
Statistical calculations
Drill-down and reverse drill-down (or roll-up)
Look at data at a more granular (detail) level or vice versa
Short video in right panel demoing OLAP interface: http://www.softwarefx.com/Extensions/featuresOlap.aspx
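Drill-down and roll-up amount to aggregating the same facts at different levels of detail. The sketch below uses invented purchase records and a hypothetical `aggregate` helper to show both directions.

```python
from collections import defaultdict

# Hypothetical purchases: (year, month, amount). Values are invented.
purchases = [
    (2005, "Jan.", 10.0),
    (2005, "Jan.", 15.0),
    (2005, "Feb.", 20.0),
    (2006, "Jan.", 5.0),
]


def aggregate(rows, key):
    """Sum the amount (third field) of each row, grouped by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Roll-up: view the data at the coarser level (year only).
by_year = aggregate(purchases, key=lambda r: r[0])

# Drill-down: view the same data at the finer level (year and month).
by_year_month = aggregate(purchases, key=lambda r: (r[0], r[1]))

print(by_year)
print(by_year_month)
```

Moving from `by_year` to `by_year_month` is a drill-down; moving back is a roll-up.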
Slice
A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
http://www.practicaldb.com/blog/cubes/
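A slice can be sketched by fixing one dimension to a single value and keeping the remaining dimensions. The cube contents and the `slice_cube` helper below are hypothetical, written only to illustrate the definition.

```python
# Hypothetical cube: cell values keyed by (month, region, category).
cube = {
    ("Nov.", "One", "Travel"): 3,
    ("Dec.", "One", "Vehicle"): 4,
    ("Dec.", "Two", "Vehicle"): 7,
    ("Dec.", "Two", "Retail"): 2,
}


def slice_cube(cube, axis, value):
    """Fix the dimension at position `axis` to `value` and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: cell
        for key, cell in cube.items()
        if key[axis] == value
    }


# Slice on Month = Dec.: the result is a two-dimensional (region, category) table.
december = slice_cube(cube, axis=0, value="Dec.")
print(december)
```

The result has one fewer dimension than the original cube, since the sliced dimension now holds a single value.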
Dice
The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices)
Figure 6.6 A multidimensional cube for credit card purchases
Axes: Month (Jan. through Dec.), Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle), Region (One, Two, Three, Four)
Highlighted cell: Month = Dec., Region = Two, Category = Vehicle; Count = 110, Amount = 6,720
This cell holds the total amount and total number of vehicle purchases in region two for the month of December
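The dice operation on the Figure 6.6 cube can be sketched as restricting several dimensions at once. Only the (Dec., Two, Vehicle) cell with Count = 110 and Amount = 6,720 comes from the figure; the other cells and the `dice` helper are invented for illustration.

```python
# Cells keyed by (month, region, category) -> (count, amount).
# Only the (Dec., Two, Vehicle) cell is from Figure 6.6; the rest are made up.
cube = {
    ("Dec.", "Two", "Vehicle"): (110, 6720),
    ("Dec.", "Two", "Retail"): (40, 900),
    ("Dec.", "One", "Vehicle"): (25, 1500),
    ("Nov.", "Two", "Vehicle"): (30, 2000),
}

DIMENSIONS = ("month", "region", "category")


def dice(cube, **fixed):
    """Keep only the cells whose dimension values match every restriction given."""
    return {
        key: cell
        for key, cell in cube.items()
        if all(key[DIMENSIONS.index(dim)] == value for dim, value in fixed.items())
    }


# Restrict all three dimensions, as in the highlighted cell of the figure.
result = dice(cube, month="Dec.", region="Two", category="Vehicle")
count, amount = result[("Dec.", "Two", "Vehicle")]
print(count, amount)
```

Restricting fewer dimensions, for example `dice(cube, month="Dec.", region="Two")`, returns a larger sub-cube that still includes the highlighted cell.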
Excel Pivot Tables
Accomplish the cube concept
Aggregate your information
Show a new perspective
http://www.timeatlas.com/5_minute_tips/chunkers/learn_to_use_pivot_tables_in_excel_2007_to_organize_data
Excel Pivot Table – Example – p. 1
Open CreditCardPromotion.xlsx
Copy the original data to a new worksheet, in order to preserve the original data
Remove any blank columns or rows
Each column must have a heading
Cells should be properly formatted for the data type
Highlight the data
Excel Pivot Table – Example – p. 2
Click Insert
Select Pivot Table to open the Create Pivot Table dialog box
Check Table/Range to make sure you selected the correct range
Select the New Worksheet button
Click OK
Excel Pivot Table – Example – p. 3
Select Income Range for row labels
Select Income Range for values
Click on Count of Income Range
Go to Field Settings
Choose the % of column setting
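The count and % of column values that the pivot table produces can be reproduced in a few lines. This sketch uses only the Income Range column of the ten sample rows from Table 2.3; the actual CreditCardPromotion.xlsx workbook may contain more rows, in which case its percentages would differ.

```python
from collections import Counter

# Income Range column of the ten sample rows from Table 2.3.
income_range = [
    "40-50K", "30-40K", "40-50K", "30-40K", "50-60K",
    "20-30K", "30-40K", "20-30K", "30-40K", "30-40K",
]

# "Count of Income Range": number of rows per income range.
counts = Counter(income_range)

# "% of column": each count as a percentage of the column total.
total = sum(counts.values())
percent_of_column = {
    label: 100 * count / total for label, count in sorted(counts.items())
}

for label, pct in percent_of_column.items():
    print(label, counts[label], pct)
```

The row labels play the role of the pivot table's row field, and the counter plays the role of the values field.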
Excel Pivot Table – Example – p. 4
Highlight the percentages
Select Insert Pie Chart
[Pie chart: Total, with slices for 20-30K, 30-40K, 40-50K, and 50-60K]
Excel Pivot Table – Example – p. 5
Check out the drill-down functionality
Double-click in the pivot table on the % value for a particular income range
The detail values for that income range are displayed in a new worksheet
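The pivot table drill-down corresponds to filtering the original rows by the selected income range. The sketch below uses three columns of the ten sample rows from Table 2.3; the `drill_down` helper is hypothetical and only mimics what Excel shows in the new worksheet.

```python
# The ten sample rows from Table 2.3, reduced to (Income Range, Sex, Age).
rows = [
    ("40-50K", "Male", 45),
    ("30-40K", "Female", 40),
    ("40-50K", "Male", 42),
    ("30-40K", "Male", 43),
    ("50-60K", "Female", 38),
    ("20-30K", "Female", 55),
    ("30-40K", "Male", 35),
    ("20-30K", "Male", 27),
    ("30-40K", "Male", 43),
    ("30-40K", "Female", 41),
]


def drill_down(rows, income_range):
    """Return the detail rows behind one income-range cell of the pivot table."""
    return [row for row in rows if row[0] == income_range]


detail = drill_down(rows, "20-30K")
for row in detail:
    print(row)
```

The number of detail rows returned matches the count shown for that income range in the pivot table.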