Marlon Dumas · marlon.dumas ät ut . ee · 2009-10-15


MTAT.03.183 Data Mining

Week 7: Online Analytical Processing and Data Warehouses

Marlon Dumas

marlon.dumas ät ut . ee

Acknowledgment

• This slide deck is a “mashup” of the following publicly available slide decks:
  – http://www.postech.ac.kr/~swhwang/grass/DataCube.ppt
  – http://classweb.gmu.edu/kersch/inft864/Readings/Shoshani/DataCube/CubeNotesKerschberg.ppt
  – http://ohr.gsfc.nasa.gov/wfstatistics/Data_Cube_Training.ppt
  – http://www.cs.uiuc.edu/homes/hanj/bk2/03.ppt

Outline

• The “data cube” abstraction
• Multidimensional data models
• Data warehouses

Typical Data Analysis Process

• Formulate a query to extract relevant information
• Extract aggregated data from the database
• Visualize the result to look for patterns
• Analyze the result and formulate new queries
• Online Analytical Processing (OLAP) is about supporting such processes
• OLAP characteristics: no updates, lots of aggregation, need to visualize and to interact
• Let’s first talk about aggregation…

Relational Aggregation Operators

• SQL has several aggregate operators:
  – SUM(), MIN(), MAX(), COUNT(), AVG()
• The basic idea is:
  – Combine all values in a column into a single scalar value
• Syntax
  – SELECT AVG(Temp) FROM Weather;

[Figure: AVG() collapsing a column of values into a single scalar value]

IDSLab.
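To make the "column in, scalar out" idea concrete, here is a minimal sketch that runs the slide's `SELECT AVG(Temp) FROM Weather` query against an in-memory SQLite database from Python. The helper name `average_temperature`, the one-column table layout and the sample readings are assumptions made for illustration only.

```python
import sqlite3


def average_temperature(temps):
    """Return AVG(Temp) over a one-column Weather table, or None if it is empty."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Weather (Temp REAL)")
    conn.executemany("INSERT INTO Weather (Temp) VALUES (?)", [(t,) for t in temps])
    # The aggregate collapses the whole column into one scalar value.
    (avg,) = conn.execute("SELECT AVG(Temp) FROM Weather").fetchone()
    conn.close()
    return avg


# Illustrative readings, not taken from a real data set.
print(average_temperature([24, 22, 17, 19, 21]))
```

Swapping `AVG` for `SUM`, `MIN`, `MAX` or `COUNT` in the query string gives the other aggregates listed on the slide.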

The Relational GROUP BY Operator

• GROUP BY allows aggregates over table sub-groups
  – SELECT Time, Altitude, AVG(Temp)
    FROM Weather
    GROUP BY Time, Altitude;

Weather
Time         Latitude  Longitude  Altitude(m)  Temp
07/9/5:1500  …         …           20          24
07/9/5:1500  …         …           20          22
07/9/5:1500  …         …          100          17
07/9/9:1500  …         …           50          19
07/9/9:1500  …         …           50          21

Result
Time         Altitude(m)  AVG(Temp)
07/9/5:1500   20          23
07/9/5:1500  100          17
07/9/9:1500   50          20

IDSLab.
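The query and tables above can be reproduced with a short sketch using Python's built-in `sqlite3` module. The Latitude and Longitude columns are left out because their values are elided on the slide; the function name and the added `ORDER BY` (used only to make the output order deterministic) are assumptions of this sketch.

```python
import sqlite3

# Rows from the slide's Weather table: (Time, Altitude in metres, Temp).
ROWS = [
    ("07/9/5:1500", 20, 24),
    ("07/9/5:1500", 20, 22),
    ("07/9/5:1500", 100, 17),
    ("07/9/9:1500", 50, 19),
    ("07/9/9:1500", 50, 21),
]


def avg_temp_by_time_and_altitude(rows):
    """Run the slide's GROUP BY query and return one row per (Time, Altitude) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Weather (Time TEXT, Altitude INTEGER, Temp REAL)")
    conn.executemany("INSERT INTO Weather VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT Time, Altitude, AVG(Temp) FROM Weather "
        "GROUP BY Time, Altitude ORDER BY Time, Altitude"
    ).fetchall()
    conn.close()
    return result


for row in avg_temp_by_time_and_altitude(ROWS):
    print(row)
```

Each distinct (Time, Altitude) pair yields exactly one output row, which is the behaviour the next slide calls one-dimensional.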

Limitations of the GROUP BY

• Group-by is one-dimensional: one group per combination of the selected attribute values
• Does not give sub-totals

Model  Year  Color  Sales
Chevy  1994  Black     50
Chevy  1995  Black     85
Chevy  1994  White     40
Chevy  1995  White    115

1. Calculate total sales per year
2. Compute total sales per year and per color
3. Calculate sales per year, per color and per model
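The three tasks above each need their own GROUP BY query, because a single GROUP BY returns only one level of aggregation. The sketch below runs them over the four-row table from this slide; `total_sales_by` is a hypothetical helper, and the column whitelist exists only because the column names are spliced into the SQL text.

```python
import sqlite3

# The four rows from the slide: (Model, Year, Color, Sales).
SALES = [
    ("Chevy", 1994, "Black", 50),
    ("Chevy", 1995, "Black", 85),
    ("Chevy", 1994, "White", 40),
    ("Chevy", 1995, "White", 115),
]


def total_sales_by(columns, rows=SALES):
    """Return SUM(Sales) grouped by the given columns (one aggregation level only)."""
    allowed = ("Model", "Year", "Color")
    if not columns or any(c not in allowed for c in columns):
        raise ValueError(f"columns must be a non-empty subset of {allowed}")
    cols = ", ".join(columns)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)")
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        f"SELECT {cols}, SUM(Sales) FROM Sales GROUP BY {cols} ORDER BY {cols}"
    ).fetchall()
    conn.close()
    return result


print(total_sales_by(["Year"]))                    # task 1
print(total_sales_by(["Year", "Color"]))           # task 2
print(total_sales_by(["Year", "Color", "Model"]))  # task 3
```

None of the three results contains the sub-totals of the others, so a report with sub-totals has to combine several such queries.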

Grouping with Sub-Totals (Pivot table)

• Sales by Model by Year by Color
• Note that sub-totals by color are missing; if added, it becomes a cross-tabulation

Grouping with Sub-Totals (Cross-Tab)

Grouping with Sub-Totals (Relational version)

Sub-totals by color are still missing…

IDSLab.

SQL Query

Adding the colors…
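One way to obtain a relational table with sub-totals in plain SQL is to UNION several GROUP BY queries and put the literal 'ALL' in the columns that have been aggregated away. The sketch below is an assumption about the kind of query these slides refer to, not a copy of it; it uses the four-row Chevy table from the "Limitations of the GROUP BY" slide, and the last branch of the union is the one that adds the sub-totals by color.

```python
import sqlite3

SALES = [
    ("Chevy", 1994, "Black", 50),
    ("Chevy", 1995, "Black", 85),
    ("Chevy", 1994, "White", 40),
    ("Chevy", 1995, "White", 115),
]

# One GROUP BY per aggregation level; 'ALL' marks a collapsed column.
SUBTOTALS_SQL = """
SELECT Model, CAST(Year AS TEXT), Color, SUM(Sales) FROM Sales GROUP BY Model, Year, Color
UNION ALL
SELECT Model, CAST(Year AS TEXT), 'ALL', SUM(Sales) FROM Sales GROUP BY Model, Year
UNION ALL
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales GROUP BY Model
UNION ALL
SELECT 'ALL', 'ALL', 'ALL', SUM(Sales) FROM Sales
UNION ALL
SELECT Model, 'ALL', Color, SUM(Sales) FROM Sales GROUP BY Model, Color
"""


def sales_with_subtotals(rows=SALES):
    """Return base rows plus sub-totals as a set of (Model, Year, Color, Sales) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER)")
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", rows)
    result = set(conn.execute(SUBTOTALS_SQL).fetchall())
    conn.close()
    return result


for row in sorted(sales_with_subtotals()):
    print(row)
```

The query grows by one branch per aggregation level and scans the table once per branch, which is the inconvenience the CUBE and ROLLUP operators are meant to remove.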

CUBE and Roll Up Operators

[Figure: The Data Cube and The Sub-Space Aggregates. Panels: Aggregate (Sum); Group By (with total), By Color (RED, WHITE, BLUE, Sum); Cross Tab, By Color and By Make (Chevy, Ford; RED, WHITE, BLUE; Sum); the data cube, with aggregates By Make & Year, By Make & Color, By Color & Year, By Make, By Year, By Color, and Sum]

The Cube

• An Example of 3D Data Cube

[Figure: 3D data cube with aggregates By Year, By Make, By Color, and Sum]

IDSLab.

Cube: Each Attribute is a Dimension

• N-dimensional Aggregate (sum(), max(), ...)
  – Fits relational model exactly:
    • a1, a2, ...., aN, f()
• Super-aggregate over N-1 Dimensional sub-cubes
    • ALL, a2, ...., aN, f()
    • a1, ALL, a3, ...., aN, f()
    • ...
    • a1, a2, ...., ALL, f()
  – This is the N-1 Dimensional cross-tab.
• Super-aggregate over N-2 Dimensional sub-cubes
    • ALL, ALL, a3, ...., aN, f()
    • ...
    • a1, a2, ...., ALL, ALL, f()

The Data Cube Concept

[Figure: data cube with axes MAKE (Chevy, Ford), YEAR (1994, 1995) and COLOR (Black, White)]

Sub-cube Derivation

                <M,Y,C>

    <M,Y,*>     <M,*,C>     <*,Y,C>

    <M,*,*>     <*,Y,*>     <*,*,C>

                <*,*,*>

• Dimension collapse, * denotes ALL
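The lattice above can be generated mechanically: level k contains every way of collapsing k of the N dimensions to `*`. The following sketch (the function name `cuboid_lattice` is an assumption) enumerates the levels for any list of dimension names, giving 2^N group-bys in total.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Return the lattice as a list of levels; level k has k dimensions collapsed to '*'."""
    n = len(dimensions)
    levels = []
    for k in range(n + 1):
        level = []
        for collapsed in combinations(range(n), k):
            # Replace the chosen positions by '*' (ALL) and keep the others.
            level.append(
                tuple("*" if i in collapsed else d for i, d in enumerate(dimensions))
            )
        levels.append(level)
    return levels


for level in cuboid_lattice(["M", "Y", "C"]):
    print("  ".join("<" + ",".join(cell) + ">" for cell in level))
```

For three dimensions this prints four levels with 1, 3, 3 and 1 entries, matching the diagram (the order within a level may differ).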

CUBE Operator: Possible syntax

• Proposed syntax example:
  – SELECT Model, Make, Year, SUM(Sales)
    FROM Sales
    WHERE Model IN {“Chevy”, “Ford”}
    AND Year BETWEEN 1990 AND 1994
    GROUP BY CUBE Model, Make, Year
    HAVING SUM(Sales) > 0;
  – Note: GROUP BY operator repeats aggregate list
    • in select list
    • in group by list

IDSLab.
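SQLite has no CUBE operator, so the sketch below emulates its semantics in plain Python rather than executing the proposed syntax: every base row contributes its measure to all 2^N cells obtained by replacing any subset of its dimension values with ALL. The function name `cube` and the use of SUM as the only aggregate are assumptions of this sketch; the data is the four-row Chevy table from the "Limitations of the GROUP BY" slide.

```python
from collections import defaultdict
from itertools import product

# (Model, Year, Color, Sales)
SALES = [
    ("Chevy", 1994, "Black", 50),
    ("Chevy", 1995, "Black", 85),
    ("Chevy", 1994, "White", 40),
    ("Chevy", 1995, "White", 115),
]


def cube(rows, n_dims):
    """Emulate GROUP BY CUBE with SUM: map each cell (with 'ALL' markers) to its total."""
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        # Each mask says which dimensions are collapsed to ALL for this cell.
        for mask in product((False, True), repeat=n_dims):
            key = tuple("ALL" if collapse else value for value, collapse in zip(dims, mask))
            totals[key] += measure
    return dict(totals)


cells = cube(SALES, 3)
print(len(cells))
print(cells[("ALL", "ALL", "ALL")])
print(cells[("Chevy", "ALL", "Black")])
```

With one model, two years and two colors, the result has (1+1)·(2+1)·(2+1) = 18 cells: the four base rows plus every sub-total, including the sub-totals by color that the roll-up report lacked.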

Rollup Operator

• ROLLUP Operator: special case of CUBE Operator
• Return “Sales Roll Up by Store by Quarter” in 1994:
    SELECT Store, quarter, SUM(Sales)
    FROM Sales
    WHERE nation=“Korea” AND Year=1994
    GROUP BY ROLLUP Store, Quarter(Date) AS quarter;

IDSLab.
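To show in what sense ROLLUP is a special case of CUBE, the sketch below emulates it in plain Python: instead of all 2^N subsets, only the N+1 prefixes of the grouping list are kept, so dimensions are collapsed from right to left. The function name `rollup`, the store names and the amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (Store, Quarter, Sales) rows, invented for this example.
STORE_SALES = [
    ("StoreA", "Q1", 10),
    ("StoreA", "Q2", 20),
    ("StoreB", "Q1", 5),
]


def rollup(rows, n_dims):
    """Emulate GROUP BY ROLLUP with SUM: collapse dimensions from right to left only."""
    totals = defaultdict(int)
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        # Keep the first k dimensions and replace the remaining ones by ALL.
        for k in range(n_dims, -1, -1):
            key = tuple(dims[:k]) + ("ALL",) * (n_dims - k)
            totals[key] += measure
    return dict(totals)


for key, total in sorted(rollup(STORE_SALES, 2).items()):
    print(key, total)
```

The result contains per-store totals and the grand total, but no cell such as ("ALL", "Q1"); a CUBE over the same columns would add those per-quarter totals across stores.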

Cube Operator Example

SALES
Model  Year  Color  Sales
Chevy  1990  red        5
Chevy  1990  white     87
Chevy  1990  blue      62
Chevy  1991  red       54
Chevy  1991  white     95
Chevy  1991  blue      49
Chevy  1992  red       31
Chevy  1992  white     54
Chevy  1992  blue      71
Ford   1990  red       64
Ford   1990  white     62
Ford   1990  blue      63
Ford   1991  red       52
Ford   1991  white      9
Ford   1991  blue      55
Ford   1992  red       27
Ford   1992  white     62
Ford   1992  blue      39

DATA CUBE
Model  Year  Color  Sales
ALL    ALL   ALL      942
chevy  ALL   ALL      510
ford   ALL   ALL      432
ALL    1990  ALL      343
ALL    1991  ALL      314
ALL    1992  ALL      285
ALL    ALL   red      165
ALL    ALL   white    273
ALL    ALL   blue     339
chevy  1990  ALL      154
chevy  1991  ALL      199
chevy  1992  ALL      157
ford   1990  ALL      189
ford   1991  ALL      116
ford   1992  ALL      128
chevy  ALL   red       91
chevy  ALL   white    236
chevy  ALL   blue     183
ford   ALL   red      144
ford   ALL   white    133
ford   ALL   blue     156
ALL    1990  red       69
ALL    1990  white    149
ALL    1990  blue     125
ALL    1991  red      107
ALL    1991  white    104
ALL    1991  blue     104
ALL    1992  red       59
ALL    1992  white    116
ALL    1992  blue     110

Summary

• Problems with GROUP BY
  – GROUP BY cannot directly construct
    • Pivot tables / roll-up reports
    • Cross-Tabs
• CUBE Operator
  – Generalizes GROUP BY and Roll-Up and Cross-Tabs!!

IDSLab.

Now let’s have a look at one…

• NASA Workforce cubes http://nasapeople.nasa.gov/workforce
• Btell demo reports
  – http://www.btell.de
  – Follow the “demo” link and start a demo, then go to reports

Multidimensional Data

• Sales volume as a function of product, month, and region

Dimensions: Product, Location, Time
Hierarchical summarization paths:

  Industry   Region    Year
  Category   Country   Quarter
  Product    City      Month   Week
             Office    Day

J. Han: Data Mining: Concepts and Techniques
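A summarization path means that the same measure can be totalled at any level of a dimension's hierarchy. The sketch below rolls hypothetical daily sales up the Time path (Day, Month, Quarter, Year); the function names, the tuple-shaped keys and the sample amounts are assumptions for illustration, and the Week branch is omitted.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales: (day, amount).
DAILY_SALES = [
    (date(2009, 1, 15), 10),
    (date(2009, 2, 3), 20),
    (date(2009, 4, 1), 5),
    (date(2009, 10, 15), 7),
]


def time_key(day, level):
    """Map a date to its member at the requested level of the Time hierarchy."""
    if level == "day":
        return (day.year, day.month, day.day)
    if level == "month":
        return (day.year, day.month)
    if level == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    if level == "year":
        return (day.year,)
    raise ValueError(f"unknown level: {level}")


def roll_up(sales, level):
    """Sum the measure for every member of the chosen hierarchy level."""
    totals = defaultdict(int)
    for day, amount in sales:
        totals[time_key(day, level)] += amount
    return dict(totals)


for level in ("month", "quarter", "year"):
    print(level, roll_up(DAILY_SALES, level))
```

Moving from "month" to "quarter" to "year" is a roll-up along the path; going the other way is a drill-down.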

Multidimensional Data Modeling

• Key concepts
  – Dimensions: entities along which data can be aggregated (e.g. car model, time, color)
  – Measures: scalar values to be aggregated (e.g. number of sales)
• Two types of tables
  – Dimension tables
  – Fact table: stores measures and references to dimensions

J. Han: Data Mining: Concepts and Techniques

Star Schema

Sales Fact Table
  time_key
  item_key
  branch_key
  location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country

J. Han: Data Mining: Concepts and Techniques
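The star schema above can be written down as SQL tables and queried with one join per dimension. The sketch below does so in an in-memory SQLite database; the column types, the sample rows and the two function names are assumptions, while the table and column names follow the slide.

```python
import sqlite3

STAR_SCHEMA = """
CREATE TABLE time (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
    month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT);
CREATE TABLE branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    state_or_province TEXT, country TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time(time_key),
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL, avg_sales REAL);
"""


def build_star_warehouse():
    """Create the star schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO time VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 15, "Thu", 10, 4, 2009), (2, 16, "Fri", 10, 4, 2009)])
    conn.execute("INSERT INTO item VALUES (1, 'Widget', 'BrandX', 'gadget', 'wholesale')")
    conn.execute("INSERT INTO branch VALUES (1, 'B1', 'retail')")
    conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)",
                     [(1, "Street A", "Tartu", None, "Estonia"),
                      (2, "Street B", "Zurich", None, "Switzerland")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 2, 20.0, 10.0),
                      (2, 1, 1, 1, 1, 10.0, 10.0),
                      (2, 1, 1, 2, 3, 30.0, 10.0)])
    return conn


def dollars_by_year_and_country(conn):
    """Aggregate a measure along two dimensions: one join per dimension table."""
    return conn.execute(
        "SELECT t.year, l.country, SUM(f.dollars_sold) "
        "FROM sales_fact f "
        "JOIN time t ON f.time_key = t.time_key "
        "JOIN location l ON f.location_key = l.location_key "
        "GROUP BY t.year, l.country ORDER BY t.year, l.country"
    ).fetchall()


print(dollars_by_year_and_country(build_star_warehouse()))
```

The fact table holds only keys and measures; every descriptive attribute used in GROUP BY comes from a dimension table that is exactly one join away.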

Snowflake Schema

Sales Fact Table
  time_key
  item_key
  branch_key
  location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, state_or_province, country

J. Han: Data Mining: Concepts and Techniques
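In the snowflake variant some dimension tables are normalized further, so reaching an attribute such as country takes an extra join (fact, location, city). The sketch below models only the two snowflaked dimensions and leaves out time and branch for brevity; the column types, sample rows and function names are assumptions.

```python
import sqlite3

SNOWFLAKE_SCHEMA = """
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key));
CREATE TABLE city (
    city_key INTEGER PRIMARY KEY, city TEXT, state_or_province TEXT, country TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT,
    city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item(item_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL);
"""


def build_snowflake_warehouse():
    """Create the simplified snowflake schema and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SNOWFLAKE_SCHEMA)
    conn.execute("INSERT INTO supplier VALUES (1, 'wholesale')")
    conn.execute("INSERT INTO item VALUES (1, 'Widget', 'BrandX', 'gadget', 1)")
    conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)",
                     [(1, "Tartu", None, "Estonia"),
                      (2, "Tallinn", None, "Estonia"),
                      (3, "Zurich", None, "Switzerland")])
    conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                     [(1, "Street A", 1), (2, "Street B", 2), (3, "Street C", 3)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                     [(1, 1, 2, 20.0), (1, 2, 1, 10.0), (1, 3, 3, 30.0)])
    return conn


def units_by_country(conn):
    """Country lives in the city table, two joins away from the fact table."""
    return conn.execute(
        "SELECT c.country, SUM(f.units_sold) "
        "FROM sales_fact f "
        "JOIN location l ON f.location_key = l.location_key "
        "JOIN city c ON l.city_key = c.city_key "
        "GROUP BY c.country ORDER BY c.country"
    ).fetchall()


print(units_by_country(build_snowflake_warehouse()))
```

Compared with the star schema, city-level attributes are stored once per city instead of once per location, at the price of longer join chains in queries.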

OLTP vs. OLAP

• OLTP – Online Transaction Processing
  – Traditional database technology
  – Many small transactions (point queries: UPDATE or INSERT)
  – Avoid redundancy, normalize schemas
  – Access to consistent, up-to-date database
• OLTP Examples:
  – Flight reservation
  – Banking and financial transactions
  – Order Management, Procurement, ...
• Extremely fast response times...

Carsten Binnig, ETH Zürich

OLTP vs. OLAP

• OLAP – Online Analytical Processing
  – Big aggregate queries, no updates
  – Redundancy a necessity (materialized views, special-purpose indexes, de-normalized schemas)
  – Periodic refresh of data (daily or weekly)
• OLAP Examples
  – Decision support (sales per employee)
  – Marketing (purchases per customer)
  – Biomedical databases
• Goal: Response Time of seconds / few minutes

Carsten Binnig, ETH Zürich
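The "redundancy" and "periodic refresh" points can be illustrated with a pre-computed summary table. SQLite has no materialized views, so the sketch below stands in for one with an ordinary table that is rebuilt on demand; the table names, the `refresh_summary` helper and the sample rows are assumptions made for this example.

```python
import sqlite3


def refresh_summary(conn):
    """Rebuild the redundant 'sales per employee' table from the detailed rows."""
    conn.execute("DROP TABLE IF EXISTS sales_per_employee")
    conn.execute(
        "CREATE TABLE sales_per_employee AS "
        "SELECT employee, SUM(amount) AS total FROM sales GROUP BY employee"
    )


def read_summary(conn):
    """Answer the analytical query from the summary instead of the detail table."""
    return conn.execute(
        "SELECT employee, total FROM sales_per_employee ORDER BY employee"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (employee TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("ann", 10), ("bob", 5), ("ann", 7)])
refresh_summary(conn)
print(read_summary(conn))

# A new sale is not visible in the summary until the next refresh.
conn.execute("INSERT INTO sales VALUES ('cy', 3)")
print(read_summary(conn))
refresh_summary(conn)
print(read_summary(conn))
```

Queries against the summary are cheap but can be stale between refreshes, which is acceptable for OLAP and would not be for OLTP.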

OLTP vs. OLAP (Water and Oil)

• Lock Conflicts: OLAP blocks OLTP
• Database design:
  – OLTP normalized, OLAP de-normalized
• Tuning, Optimization
  – OLTP: inter-query parallelism, heuristic optimization
  – OLAP: intra-query parallelism, full-fledged optimization
• Freshness of Data:
  – OLTP: serializability
  – OLAP: reproducibility
• Integrity:
  – OLTP: ACID
  – OLAP: Sampling, Confidence Intervals

Carsten Binnig, ETH Zürich

Solution: Data Warehouse

• Special Sandbox for OLAP
• Data input using OLTP systems
• Data Warehouse aggregates and replicates data (special schema)
• New Data is periodically uploaded to Warehouse

Carsten Binnig, ETH Zürich

DW Architecture

[Figure: on the OLTP side, OLTP applications work on the databases DB1, DB2 and DB3; an Extract, Transform, Load (ETL) step feeds the Data Warehouse; on the OLAP side, GUIs and spreadsheets query the Data Warehouse]

Carsten Binnig, ETH Zürich
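The Extract, Transform, Load flow in the architecture can be sketched with a few in-memory SQLite databases standing in for the operational databases and the warehouse. Everything specific here is an assumption for illustration: the `orders` table, its columns, the clean-up rules in `transform` and the `sales_by_country` warehouse table.

```python
import sqlite3


def make_source(rows):
    """Create a stand-in operational database with an orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (country TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def extract(source):
    """Extract: read the raw rows from one operational database."""
    return source.execute("SELECT country, amount_cents FROM orders").fetchall()


def transform(rows):
    """Transform: normalize country codes, aggregate, convert cents to dollars."""
    totals = {}
    for country, cents in rows:
        key = country.strip().upper()
        totals[key] = totals.get(key, 0) + cents
    return sorted((country, cents / 100) for country, cents in totals.items())


def load(warehouse, rows):
    """Load: write the aggregated rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_country (country TEXT PRIMARY KEY, dollars REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO sales_by_country VALUES (?, ?)", rows)


def run_etl(sources, warehouse):
    """One periodic ETL run over all operational databases."""
    extracted = [row for source in sources for row in extract(source)]
    load(warehouse, transform(extracted))


db1 = make_source([("ee", 1000), ("EE ", 250)])
db2 = make_source([("ch", 500)])
warehouse = sqlite3.connect(":memory:")
run_etl([db1, db2], warehouse)
print(warehouse.execute("SELECT * FROM sales_by_country ORDER BY country").fetchall())
```

Analytical tools then query only the warehouse, so long-running aggregate queries never compete for locks with the OLTP applications.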

DW Products and Tools

• Oracle 11g, IBM DB2, Microsoft SQL Server, ...
  – All provide OLAP extensions
• SAP Business Information Warehouse
  – ERP vendors
• MicroStrategy, Cognos (now IBM)
  – Specialized vendors
  – Kind of Web-based EXCEL
• Niche Players (e.g., Btell)
  – Vertical application domain

Reference (highly recommended)

• Jim Gray et al. “Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals”. Data Mining and Knowledge Discovery 1(1), 1997.
• http://citeseer.ist.psu.edu/old/392672.html
• Data Warehousing chapter of Jiawei Han’s textbook (chapter 3)

Homework

• Exercises 1 and 4 at:
  – http://www.systems.ethz.ch/education/courses/fs09/data-warehousing/ex2.pdf
• Multidimensional data modeling exercise in course Wiki pages