advanced querying olap data warehousing. database applications transaction processing –online...

44
Advanced Querying OLAP Data Warehousing

Post on 18-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Advanced Querying

OLAP

Data Warehousing

Page 2: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Database Applications

• Transaction processing– Online setting– Supports day-to-day operation of business

• Decision support– Offline setting– Strategic planning (statistics)

Page 3: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Transaction Processing

Transaction processing

• Operational setting• Up-to-date = critical

• Simple data

• Simple queries

Flight reservations

• ticket sales• do not sell a seat

twice• reservation, date,

name• Give flight details of X

List flights to Y

Page 4: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Transaction Processing

• Database must support– simple data

• tables

– simple queries• select from where …

– consistency & integrity CRITICAL– concurrency

• Relational databases, Object-Oriented, Object-Relational

Page 5: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Decision Support

Decision support

• Off-line setting• « Historical » data• Summarized data• Different databases

• Statistical queries

Flight company

• Evaluate ROI flights• Flights of last year• # passengers on line L• Passengers, fuel costs,

maintenance info• Average % of seats

sold/month/destination

Page 6: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

A decision support DB that is maintained separately from the organization’s operational databases.

Why Separate Data Warehouse?• High performance for both systems

– DBMS— tuned for OLTP • access methods, indexing, concurrency control, recovery

– Warehouse—tuned for OLAP • complex OLAP queries, multidimensional view, consolidation.

• Different functions and different data– Missing data: Decision support requires historical data which

operational DBs do not typically maintain– Data consolidation: DS requires consolidation (aggregation,

summarization) of data from heterogeneous sources– Data quality: different sources typically use inconsistent data

representations, codes and formats which have to be reconciled

Data Warehouse

Page 7: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Three-Tier Architecture

DataWarehouse

ExtractTransformLoadRefresh

OLAP Engine

Monitor&

IntegratorMetadata

Data Sources Front-End Tools

Serve

Data Marts

Operational DBs

other

sources

Data Storage

OLAP Server

Analysis

Query/Reporting

Data Mining

ROLAPServer

Page 8: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

OLAP

• OLAP = OnLine Analytical Processing– Online = no waiting for answers

• OLAP system = system that supports analytical queries that are dimensional in nature.

Page 9: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

This Lecture

• Examples of decision support queries

• Data Cubes– Conceptual data model– Typical operations

• Implementation– ROLAP vs MOLAP– Indexing structures

• SQL:1999 support for OLAP

Page 10: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Examples of Queries

• Flight company: evaluate ticket sales– give total, average, minimal, maximal amount– per date: week, month, year– by destination/source port/country/continent– by ticket type– by # of connections– …

Page 11: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Characteristics

• One special attribute: amount measure

• Other attributes: select relevant regions dimensions

Different levels of generality (month, year, …)

hierarchies

• Measure data is summarized: sum, min, max, average

aggregations

Page 12: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Supermarket example

• Evaluate the sales of products– Product cost in $– Customer: ID, city, state, country, – Store: chain, size, location, – Product: brand, type, …– …

• What are the measure and dimensional attributes, where are the hierarchies?

measure

Dim.hierarchies

Page 13: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Why dimensions?

customer

store

product

Cost in $

• Multidimensional view on the data

Page 14: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Cross Tabulation

• Cross-tabulations are highly useful– Sales of clothes JuneAugust ‘06

Blue Red Orange Total

June 51 25 158 234

July 58 20 120 198

August 65 22 51 138

Total 174 67 329 570

Product: color

Date:month,JuneAugust2006

Page 15: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data cubes

• Extension of Cross-Tables to multiple dimensions

• Conceptual notion

Blue Red Orange Total

June 51 25 158 234

July 58 20 120 198

August 65 22 51 138

Total 174 67 329 570

Dimensions

Data Points/1st level of aggregation

Aggregatedw.r.t. X-dim

Aggregatedw.r.t. Y-dim

Aggregatedw.r.t. X and Y

Page 16: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Cubes

Date

Produ

ct

Cou

ntr

ysum

sum TV

VCRPC

1Qtr 2Qtr 3Qtr 4Qtr

Ireland

France

Germany

sum

Page 17: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Cubes

• Base cuboid = n-dimensional cube with n number of dimensions

• The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid

• The lattice of cuboids forms a data cube

Page 18: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Lattice of Cuboidsall

product, date, country

product date country

product, date product, countrydate, country

Page 19: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Operations with Data Cubes

Scenario:

• Before starting the analysis task:– what data?

• select a few relevant dimensions• define hierarchy• aggregation functions of interest

– Pre-materialize• load data• compute counts/max, min, avg, … on beforehand

Page 20: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Operations with Data Cubes

• What operations can you think of an analyst might find useful? (e.g., store)

Page 21: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Operations with Data Cubes

• What operations can you think of that an analyst might find useful? (e.g., store)– only look at stores in the Netherlands– look at cities instead of individual stores– look at the cross-table for product-date– restrict analysis to 2006, product O1– go back to a finer granularity at the store level

Page 22: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Roll-Up

• Move in one dimension from a lower granularity to a higher one– store city– cities country– product product type

Page 23: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Drill-down

• Move in one dimension from a higher granularity to a lower one– city store– country cities– product type product

• Drill-through:– go back to the original, individual data records

Page 24: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Pivoting

• Change the dimensions that are “displayed”; select a cross-tab.– look at the cross-table for product-date– display cross-table for date-customer

Page 25: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Slice & dice

• Select a part of the cube by restricting one or more dimensions– restrict analysis to “city = Eindhoven”

Page 26: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Summary of Concepts

• Cube: Multidimensional view on data– dimensional attributes– measure attribute

• Operations:– roll-up/drill-down– pivoting– slice and dice

Page 27: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Implementation

• To make query answering more efficient: consolidate (materialize) aggregations

• Obvious implementation: multidimensional array.– Fast lookup: cell(prod. p, date d, prom. pr):

• look up index of p1, index of d, index of pr:

index = (p x D x PR) + (d x PR) + pr

Page 28: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Implementation

• Multidimensional array– obvious problem: sparse data

can easily be solved, though.

Example:

binary search tree, key on index

hash table.

Page 29: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Implementation

• However: very quickly people were confronted with the Data Explosion Problem

Consolidating the summaries blows up the data enormously !

Reasons are often misunderstood and confusing.

Page 30: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Explosion Problem

• Why?

Suppose:– n dimensions, every dimension has d values– dn possible tuples.

– Number of cells in the cube: (d+1)n

– So, this is not the problem

Page 31: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Explosion Problem

• Why?

Suppose– n dimensions, every dimension has d values– every dimension has a hierarchy

– most extreme case: binary tree

2d possibilities/dimension

Page 32: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Explosion Problem

• Why?

Suppose– n dimensions, every dimension has d values– every dimension has a hierarchy– most extreme case: binary tree

2d possibilities/dimension 2n x dn cells

Only partial explanation (factor 2n comes from an extremely pathological case)

Page 33: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Explosion Problem

• Why?– The problem is that most data is not dense,

but sparse.– Hence, not all dn combinations are possible.

Example: 10 dimensions with 10 values– 10 000 000 000 possibilities

Suppose « only » 1 000 000 are present

Page 34: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Data Explosion Problem

Example: 10 dimensions with 10 values– 10 000 000 000 possibilities

Suppose « only » 1 000 000 are present

Every tuple increases count of 210 cells !

With hierarchies: effect even worse!

If every hierarchy has 5 items:

510 = 9 765 625 cells!

Page 35: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

View Selection Problem• Suffices to precompute some aggregates, and

compute others on demand.– aggregate on (item-name, color) from an aggregate

on (item-name, color, size) – For all but a few “non-decomposable” aggregates

such as median

• Several optimizations for computing multiple aggregates– Compute aggregate on (item-name, color) from an

aggregate on (item-name, color, size)

– Compute aggregates on (item-name, color, size), (item-name, color) and (item-name) in single DB sort

Page 36: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

View Selection Problemall

product, date, country

product date country

product, date product, countrydate, country

Page 37: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

View Selection Problemall

product, date, country

product date country

product, date product, countrydate, country

Which views to select:hard research problem !

Page 38: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Implementation

Nowadays systems can be divided in three categories:– ROLAP (Relational OLAP)

• OLAP supported on top of a relational database

– MOLAP (Multi-Dimensional OLAP)• Use of special multi-dimensional data structures

– HOLAP: (Hybrid)• combination of previous two

Page 39: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

ROLAP

• Cubes can easily be represented in relational tables: special value “all”

Month Prod. Cust. PriceJan p1 c1 10Jan p2 c1 8Jan p1 c2 10Feb p1 c1 9

…all p1 c1 102Jan all c1 18Jan p1 all 1 230all all c1 4 235

…all all all 1 253 458

Page 40: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

ROLAP

• Typical database scheme:– star schema

• fact table is central• links to dimensional tables

– Extensions:• snowflake schema

– dimensions have hierarchy/extra information attached

• Star constellation– multiple star schemas sharing dimensions

Page 41: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Example of a Star Schema

Order NoOrder No

Order DateOrder Date

Customer NoCustomer No

Customer Customer NameName

Customer Customer AddressAddress

CityCity

SalespersonIDSalespersonID

SalespersonNaSalespersonNameme

CityCity

QuotaQuota

OrderNOOrderNO

SalespersonIDSalespersonID

CustomerNOCustomerNO

ProdNoProdNo

DateKeyDateKey

CityNameCityName

QuantityQuantity

Total Price

ProductNOProductNO

ProdNameProdName

ProdDescrProdDescr

CategoryCategory

CategoryDescriptionCategoryDescription

UnitPriceUnitPrice

DateKeyDateKey

DateDate

CityNameCityName

StateState

CountryCountry

OrderOrder

CustomerCustomer

SalespersoSalespersonn

CityCity

DateDate

ProductProduct

Fact TableFact Table

Page 42: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

Example of a Snowflake Schema

Order NoOrder No

Order DateOrder Date

Customer NoCustomer No

Customer Customer NameName

Customer Customer AddressAddress

CityCity

SalespersonIDSalespersonID

SalespersonNaSalespersonNameme

CityCity

QuotaQuota

OrderNOOrderNO

SalespersonIDSalespersonID

CustomerNOCustomerNO

ProdNoProdNo

DateKeyDateKey

CityNameCityName

QuantityQuantity

Total Price

ProductNOProductNO

ProdNameProdName

ProdDescrProdDescr

CategoryCategory

CategoryCategory

UnitPriceUnitPrice

DateKeyDateKey

DateDate

MonthMonth

CityNameCityName

StateState

CountryCountry

OrderOrder

CustomerCustomer

SalespersoSalespersonn

CityCity

DateDate

ProductProduct

Fact TableFact Table

CategoryNaCategoryNameme

CategoryDeCategoryDescrscr

MontMonthh

YearYear YearYear

StateNameStateName

CountryCountry

CategoryCategory

StateState

MonthMonth

YearYear

Page 43: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

branch_keybranch_namebranch_type

time_keydayday_of_the_weekmonthquarteryear

Measures

Branch

Time

item_keyitem_namebrandtypesupplier_key

Item

location_keystreetcityProvince/streetcountry

Location

Sales Fact Table

Avg_sales

Euros_sold

Unit_sold

Location_key

Branch_key

Item_key

Time_key

shipper_keyshipper_namelocation_keyshipper_type

shipper

unit_shipped

Euros_sold

to_location

from_location

shipper_key

Item_key

Time_key

Shipping Fact TableMultiple fact tables share dimension tables

Example of Fact Constellation

Page 44: Advanced Querying OLAP Data Warehousing. Database Applications Transaction processing –Online setting –Supports day-to-day operation of business Decision

SQL 1999 support for OLAP

• see other set of slides