Some OLAP Issues

CMPT 455/826 - Week 9, Day 2

Jan-Apr 2009 – w9d2 1

OLAP Features To Consider In A Data Warehousing System

(based on Gorla)

Gaining Acceptance

• Hypothesis:
  – New technology won’t be utilized effectively if it isn’t accepted
  – and acceptance is based on:
    • perceived usefulness (PU)
      – the degree to which a person believes that using a particular system would enhance his or her job performance
    • perceived ease of use (PEU)
      – the degree to which a person believes that using a particular system would be free of effort

OLAP Features

• Visualization
  – allows users to create summary tables and charts interactively
  – Measures: the presence of
    • multidimensional tables
    • multidimensional graphics

OLAP Features

• Summarization
  – the “degree of aggregation” of information
    • i.e. supporting directed acyclic graphs within a dimension
  – Measures:
    • the number of hierarchies allowed (in a single dimension)
    • the level of detail
    • the capability to swap between summarized and detailed levels

OLAP Features

• Navigation
  – the capability to drill-down or roll-up between levels of detail
  – the capability to get to the information you want
    • drill-down is going from using more general to more detailed information in a particular domain (e.g. changing location focus from state to city)
    • roll-up is going from using more detailed to more general information in a particular domain (e.g. changing location focus from city to state)
    • slicing is selecting certain rows in a table and ignoring the rest
    • dicing is selecting certain attributes in a table and ignoring the rest
  – Measures:
    • shareability (number of concurrent users allowed)
    • data navigability (availability of drill-down, slicing-dicing, and drag-drop facilities)
    • the ability to extract detailed and real-time data
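
The navigation operations above can be sketched on a tiny in-memory table. This is only an illustration: the `SALES` rows, the column layout, and the `aggregate` helper are assumptions made for the example and are not taken from Gorla.

```python
from collections import defaultdict

# Hypothetical fact rows: (state, city, product, sales)
SALES = [
    ("WA", "Seattle", "widgets", 100),
    ("WA", "Seattle", "gadgets", 40),
    ("WA", "Spokane", "widgets", 30),
    ("OR", "Portland", "widgets", 70),
    ("OR", "Portland", "gadgets", 20),
]

STATE, CITY, PRODUCT, AMOUNT = 0, 1, 2, 3


def aggregate(rows, key_columns):
    """Sum the sales amount, grouped by the given column positions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        totals[key] += row[AMOUNT]
    return dict(totals)


# Roll-up: view sales at the more general state level.
by_state = aggregate(SALES, [STATE])

# Drill-down: move to the more detailed city level.
by_city = aggregate(SALES, [STATE, CITY])

# Slicing (as defined on the slide): keep certain rows, ignore the rest.
widget_rows = [row for row in SALES if row[PRODUCT] == "widgets"]

# Dicing (as defined on the slide): keep certain attributes, ignore the rest.
city_and_sales = [(row[CITY], row[AMOUNT]) for row in SALES]
```

Here roll-up and drill-down differ only in how many levels of the location hierarchy appear in the grouping key.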

OLAP Features

• Query Function:
  – Query engines
    • extract data from multidimensional databases and
    • generate outputs in 3D graphics
  – Measures:
    • using pre-constructed query capability
    • simple query building with click-select feature
    • query building with query languages
    • concurrent run of queries

OLAP Features

• Sophisticated Analysis:
  – Measures: (six most common types of analyses used in decision support)
    • statistical profiling
      – (e.g. list customers with highest combined sales)
    • moving averages
    • cross dimension comparison
      – (e.g. compare product sales by region over a period of time)
    • queries with self-defined formula
    • exception condition
    • what-if analysis
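
Two of the analysis types listed above, moving averages and exception conditions, can be sketched as follows. The data, the window size, and the threshold are hypothetical, and a simple (unweighted) moving average is assumed since the slide does not say which kind is meant.

```python
def moving_average(values, window):
    """Return the simple moving average over each full window of values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly sales figures.
monthly_sales = [10, 20, 30, 40, 50]

# Moving average over a three-month window.
smoothed = moving_average(monthly_sales, 3)

# Exception condition: positions of months whose sales exceed a
# hypothetical threshold of 35.
exceptions = [i for i, v in enumerate(monthly_sales) if v > 35]
```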

OLAP Features

• Dimensionality
  – Measures:
    • the number of allowable dimensions
    • capability to redefine a dimension
    • time for data refresh after redefinition

OLAP Features

• Performance
  – Measures: (response times for four basic functions)
    • standard report generation
    • customized report generation
    • graphic/chart generation
    • data navigation

An Analysis of Additivity in OLAP Systems

(based on Horner)

Typical operations

• Roll-up
  – increases the level of aggregation along one or more classification hierarchies;
• Drill-down
  – decreases the level of aggregation along one or more classification hierarchies;
• Slice-Dice
  – selects and projects the data;
• Pivoting
  – reorients the multi-dimensional data view to allow exchanging facts for dimensions symmetrically; and,
• Merging
  – performs a union of separate roll-up operations

Summarization of measures

• The roll-up and merge operations
  – both use aggregate operators to combine finer-grained measures into summary data
• But not all fact data
  – is mathematically summarizable
• In certain instances
  – using the sum operator to summarize data can result in inaccurate summary outputs

Measures

• additive
  – along a dimension if the sum operator can be used to meaningfully aggregate values along all hierarchies in that dimension
• fully-additive
  – if it is additive across all dimensions
• semi-additive
  – if it is only additive across certain dimensions
• non-additive
  – if it is not additive across any dimension
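
A small sketch of a semi-additive measure, using the commonly cited case of an inventory level. The example is an assumption chosen for illustration, not one taken from Horner, and the snapshot data are hypothetical.

```python
# Hypothetical month-end inventory snapshots: (store, month, units_on_hand)
SNAPSHOTS = [
    ("A", "Jan", 100),
    ("A", "Feb", 120),
    ("B", "Jan", 50),
    ("B", "Feb", 40),
]

# Additive across the store dimension: the total stock held by all
# stores at the end of February is a meaningful number.
feb_total = sum(units for store, month, units in SNAPSHOTS if month == "Feb")

# Not additive across the time dimension: adding store A's January and
# February levels yields a figure that was never in stock at any time.
store_a_sum = sum(units for store, month, units in SNAPSHOTS if store == "A")

# A meaningful roll-up over time needs another operator, for example
# taking the most recent snapshot (rows are listed in month order here).
store_a_latest = [units for store, month, units in SNAPSHOTS if store == "A"][-1]
```

Under the definitions above, `units_on_hand` would count as semi-additive: summing works along the store dimension but not along time.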

Hierarchies

• A strict hierarchy
  – is one where each object at a lower level belongs to only one value at a higher level
• A non-strict hierarchy
  – can be thought of as a many-to-many relationship between a higher level of the hierarchy and the lower level
  – can result in multiple or alternate path hierarchies, whereby the lower object splits into two distinct higher level objects

Hierarchies

• Alternate and multiple path hierarchies
  – are important when summarizing measures, and can specifically present problems when merging data
• An inaccurate summarization can result
  – if summaries from different paths of the same hierarchy are merged
• Data cannot be merged
  – among classification attributes that have overlapping data instances
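
The merging problem can be illustrated with a non-strict hierarchy in which one lower-level object belongs to two higher-level values. The employees, departments, and amounts below are hypothetical, and `roll_up` is a helper written only for this sketch.

```python
# Hypothetical sales per employee (the lower level of the hierarchy).
SALES_BY_EMPLOYEE = {"alice": 100, "bob": 60, "carol": 40}

# Non-strict hierarchy: bob belongs to two departments.
DEPARTMENTS = {
    "alice": ["east"],
    "bob": ["east", "west"],
    "carol": ["west"],
}


def roll_up(sales, membership):
    """Sum each employee's sales into every department they belong to."""
    totals = {}
    for employee, amount in sales.items():
        for department in membership[employee]:
            totals[department] = totals.get(department, 0) + amount
    return totals


by_department = roll_up(SALES_BY_EMPLOYEE, DEPARTMENTS)

# Merging the department summaries counts bob's sales twice.
merged_total = sum(by_department.values())

# The correct grand total comes from the underlying data.
true_total = sum(SALES_BY_EMPLOYEE.values())
```

The merged figure exceeds the true total by exactly the sales of the overlapping instance, which is the kind of inaccurate summarization the slide warns about.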

Non-additive measures

• Derived data
  – Ratios and Percentages
  – Measures of Intensity
  – Average/Maximum/Minimum
• Numbers used for other than quantities
  – Measurements of direction
  – Codes (& arbitrarily assigned numbers)
  – Dates and time of day
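
As a sketch of why a ratio is non-additive, consider a profit margin stored per region. The figures are hypothetical; the point is only that neither summing nor averaging the stored ratios reproduces the ratio computed from the underlying additive quantities.

```python
# Hypothetical per-region figures: (region, profit, revenue)
REGIONS = [("north", 10, 100), ("south", 90, 300)]

# Derived measure: profit margin per region (a ratio).
margins = [profit / revenue for _, profit, revenue in REGIONS]

# Summing the ratios gives a number with no business meaning.
summed_margins = sum(margins)

# Averaging them ignores how much revenue each region contributes.
average_of_margins = sum(margins) / len(margins)

# The overall margin must be recomputed from the additive components.
total_profit = sum(profit for _, profit, _ in REGIONS)
total_revenue = sum(revenue for _, _, revenue in REGIONS)
overall_margin = total_profit / total_revenue
```

One common way around this, assumed here rather than stated on the slide, is to store the additive numerator and denominator and derive the ratio only after aggregation.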

So, What does this all mean?

• It is important for us to be able to consider how these papers on OLAP and Data Warehousing relate to the other material we have covered in this course

Going back to the beginning

• We discussed a taxonomy of data related concepts:

  – Wisdom
    • is de-contextualized truth, that is always true
    • is different from raw facts, which are highly context specific
  – Knowledge
    • is context-specific truth, that includes decisions
    • is the result of applying rules/algorithms to information in a given context
  – Information
    • is processed (extracted, summarized, etc.) data that is useful for making some decision
  – Data
    • is raw facts, which need not be numbers

Going further

• Metadata
  – is data that can be used to understand / process data
  – can include rules / algorithms
  – can be implicit (in the data types or structures)
  – can be explicit (in additional data attributes / tables / databases)

Going further - Metadata (cont)

• It is always important to consider for each data attribute
  – the syntax (generally dealt with in a database)
  – the semantics (often not dealt with explicitly)
• the operations in which it can participate and the TASKS they serve
  – maintaining the data
  – analyzing the data (including what queries can use it and how)
• the USERS of the data, who
  – own it
  – input / change it
  – have access to read it
• other important attributes of the attribute
  – value
  – privacy
  – risks
  – etc.
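
One possible way to make such metadata explicit is a small record per attribute. Everything in this sketch is hypothetical: the class, its field names, and the idea of recording the dimensions along which an attribute may be summed are assumptions that tie the considerations above to the earlier additivity discussion.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetadata:
    """Explicit metadata for one data attribute (all fields are hypothetical)."""

    name: str
    syntax: str      # e.g. the declared data type
    semantics: str   # what a value of this attribute means
    owner: str       # the user who owns the data
    readers: set = field(default_factory=set)               # users allowed to read it
    additive_dimensions: set = field(default_factory=set)   # where sum is meaningful

    def can_sum_along(self, dimension):
        """Report whether the sum operator is meaningful along a dimension."""
        return dimension in self.additive_dimensions


stock_level = AttributeMetadata(
    name="units_on_hand",
    syntax="integer",
    semantics="units in stock at month end",
    owner="warehouse_manager",
    readers={"analyst"},
    additive_dimensions={"store"},
)
```

A query tool could consult `can_sum_along` before offering a roll-up, so that the semantics are checked rather than left implicit.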

Ontologies can help

• Ontologies
  – contain
    • names of concepts
    • descriptions
    • rules
  – structure concepts
    • from high level
    • to individual attributes
  – support sharing of information
    • between the database and a user
    • across databases / systems / users

Dimensions can help, too

• Dimensions
  – organize data conceptually
  – in directed acyclic graphs
  – can support exploration
    • within a dimension
    • between multiple dimensions

Data Warehouses

• Data Warehouses
  – can provide dimensional storage of data to aid in exploration
  – don’t rely on traditional normalization and other technology-focused techniques
  – do rely on techniques such as OLAP to help users explore them
  – need a good understanding of the data
    • so that it can be cleaned before being placed in them

OLAP

• OLAP
  – refers to a collection of techniques for exploring data
    • in Data Warehouses and other dimensionally organized systems
  – So far the papers have focused on exploration that involves
    • summarizing, extracting, and processing DATA
    • producing largely numerical information
  – What more should it do?
  – What’s needed so that it can do this?
