Some OLAP Issues
CMPT 455/826 - Week 9, Day 2
Jan-Apr 2009 – w9d2 1
OLAP Features To Consider In A Data Warehousing System
(based on Gorla)
Gaining Acceptance
• Hypothesis:
  – New technology won’t be utilized effectively if it isn’t accepted
  – and acceptance is based on:
    • perceived usefulness (PU)
      – the degree to which a person believes that using a particular system would enhance his or her job performance
    • perceived ease of use (PEU)
      – the degree to which a person believes that using a particular system would be free of effort
OLAP Features
• Visualization – allows users to create summary tables and charts interactively
  – Measures: the presence of
    • multidimensional tables
    • multidimensional graphics
OLAP Features
• Summarization – the “degree of aggregation” of information
  – i.e. supporting directed acyclic graphs within a dimension
  – Measures:
    • the number of hierarchies allowed (in a single dimension)
    • the level of detail
    • the capability to swap between summarized and detailed levels
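To make “directed acyclic graphs within a dimension” and “the number of hierarchies allowed” concrete, here is a minimal Python sketch. It assumes a hypothetical Time dimension in which the day level rolls up along two hierarchies, day → month → quarter → year and day → week (weeks are kept on a separate path because a week can span two months). The names `TIME_LEVELS`, `is_acyclic`, and `rollup_paths` are illustrative only; `graphlib` is the Python 3.9+ standard-library module.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical Time dimension: each level lists the coarser levels it rolls up to.
# "day" takes part in two hierarchies, so the level graph is a DAG rather than a tree.
TIME_LEVELS = {
    "day": ["month", "week"],
    "month": ["quarter"],
    "quarter": ["year"],
    "week": [],
    "year": [],
}


def is_acyclic(levels):
    """Return True if the level graph has no cycles (i.e. it is a valid DAG)."""
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        tuple(TopologicalSorter(levels).static_order())
        return True
    except CycleError:
        return False


def rollup_paths(levels, start):
    """List every roll-up path from `start` up to a top level (one with no parents)."""
    parents = levels.get(start, [])
    if not parents:
        return [[start]]
    return [[start] + path for parent in parents for path in rollup_paths(levels, parent)]
```

Calling `rollup_paths(TIME_LEVELS, "day")` returns the two paths `[["day", "month", "quarter", "year"], ["day", "week"]]`, i.e. two hierarchies in a single dimension. `rollup_paths` assumes the graph has already passed `is_acyclic`; on a cyclic graph it would recurse without end.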
OLAP Features
• Navigation
  – the capability to drill down or roll up between levels of detail
  – the capability to get to the information you want
    • drill-down is moving from more general to more detailed information in a particular domain (e.g. changing the location focus from state to city)
    • roll-up is moving from more detailed to more general information in a particular domain (e.g. changing the location focus from city to state)
    • slicing is selecting certain rows in a table and ignoring the rest
    • dicing is selecting certain attributes in a table and ignoring the rest
  – Measures:
    • shareability (number of concurrent users allowed)
    • data navigability (availability of drill-down, slicing-dicing, and drag-and-drop facilities)
    • the ability to extract detailed and real-time data
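The navigation operations above can be sketched in plain Python over a tiny fact table. This is a minimal illustration under stated assumptions: the rows, the city → state location hierarchy, and the helper names (`roll_up`, `drill_down`, `slice_rows`, `dice_attributes`) are hypothetical, and slicing and dicing follow the definitions on this slide (rows vs. attributes) rather than any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows; the location hierarchy is city -> state.
FACTS = [
    {"state": "WA", "city": "Seattle", "product": "A", "sales": 100},
    {"state": "WA", "city": "Spokane", "product": "A", "sales": 80},
    {"state": "WA", "city": "Seattle", "product": "B", "sales": 50},
    {"state": "OR", "city": "Portland", "product": "A", "sales": 120},
]


def roll_up(rows, level, measure="sales"):
    """Sum the measure for each value of `level` (e.g. from city up to state)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[level]] += row[measure]
    return dict(totals)


def drill_down(rows, parent_level, parent_value, child_level, measure="sales"):
    """Expand one summarized value (e.g. state 'WA') into totals at a more detailed level."""
    return roll_up([r for r in rows if r[parent_level] == parent_value], child_level, measure)


def slice_rows(rows, **conditions):
    """Slicing as defined above: keep only the rows that match every condition."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def dice_attributes(rows, *attributes):
    """Dicing as defined above: keep only the named attributes of each row."""
    return [{a: r[a] for a in attributes} for r in rows]
```

For example, `roll_up(FACTS, "state")` gives `{"WA": 230, "OR": 120}`, and `drill_down(FACTS, "state", "WA", "city")` gives `{"Seattle": 150, "Spokane": 80}`.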
OLAP Features
• Query Function
  – Query engines
    • extract data from multidimensional databases
    • generate outputs in 3D graphics
  – Measures:
    • pre-constructed query capability
    • simple query building with a click-select feature
    • query building with query languages
    • concurrent running of queries
OLAP Features
• Sophisticated Analysis
  – Measures (the six most common types of analyses used in decision support):
    • statistical profiling (e.g. list the customers with the highest combined sales)
    • moving averages
    • cross-dimension comparison (e.g. compare product sales by region over a period of time)
    • queries with self-defined formulas
    • exception conditions
    • what-if analysis
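Of the six analyses listed, moving averages are the easiest to pin down precisely. A minimal sketch of a trailing simple moving average follows; the function name and the choice to return `None` for positions without a full window are assumptions for illustration, not part of Gorla's measures.

```python
def moving_average(values, window):
    """Trailing simple moving average over `window` consecutive values.

    Positions that do not yet have a full window get None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running_total -= values[i - window]
        result.append(running_total / window if i >= window - 1 else None)
    return result
```

With hypothetical monthly sales `[10, 20, 30, 40]` and a window of 2, this returns `[None, 15.0, 25.0, 35.0]`. Keeping a running total makes each step constant time; for very long float series, recomputing each window sum avoids accumulated rounding drift.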
OLAP Features
• Dimensionality
  – Measures:
    • the number of allowable dimensions
    • the capability to redefine a dimension
    • the time for data refresh after redefinition
OLAP Features
• Performance
  – Measures (response times for four basic functions):
    • standard report generation
    • customized report generation
    • graphic/chart generation
    • data navigation
An Analysis of Additivity in OLAP Systems
(based on Horner)
Typical operations
• Roll-up – increases the level of aggregation along one or more classification hierarchies;
• Drill-down – decreases the level of aggregation along one or more classification hierarchies;
• Slice-Dice – selects and projects the data;
• Pivoting – reorients the multidimensional data view to allow exchanging facts for dimensions symmetrically; and
• Merging – performs a union of separate roll-up operations.
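Pivoting and merging are the least self-explanatory of these operations. The sketch below represents a two-dimensional view as a dictionary keyed by (region, quarter); the data, the helper names, and the decision to reject overlapping keys in `merge` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical 2-D view: (region, quarter) -> sales.
CELLS = {("East", "Q1"): 10, ("East", "Q2"): 12, ("West", "Q1"): 7, ("West", "Q2"): 9}


def pivot(cells):
    """Swap the two axes: a (row, column) key becomes (column, row)."""
    return {(col, row): value for (row, col), value in cells.items()}


def roll_up_axis(cells, axis):
    """Aggregate away one axis with SUM, keeping axis 0 (region) or 1 (quarter)."""
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[key[axis]] += value
    return dict(totals)


def merge(*rollups):
    """Union of separately computed roll-ups; refuses keys that appear in more than one."""
    merged = {}
    for rollup in rollups:
        overlap = merged.keys() & rollup.keys()
        if overlap:
            raise ValueError(f"cannot merge overlapping keys: {sorted(overlap)}")
        merged.update(rollup)
    return merged


# Two roll-ups computed on disjoint partitions of the data, then merged.
east = {k: v for k, v in CELLS.items() if k[0] == "East"}
west = {k: v for k, v in CELLS.items() if k[0] == "West"}
region_totals = merge(roll_up_axis(east, 0), roll_up_axis(west, 0))  # {"East": 22, "West": 16}
```

Here `pivot(CELLS)[("Q1", "East")]` is 10, the same cell seen from the other orientation. Refusing overlapping keys is one conservative design choice; it matches the later point that data cannot be merged among classification attributes with overlapping instances.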
Summarization of measures
• The roll-up and merge operations
  – both use aggregate operators to combine finer-grained measures into summary data
• But not all fact data
  – is mathematically summarizable
• In certain instances
  – using the sum operator to summarize data can result in inaccurate summary outputs
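A small example of the “inaccurate summary outputs” mentioned above, using hypothetical month-end inventory levels for one store. Summing a stock level over time produces a number with no real meaning; the alternatives shown (the closing level or the average level) are common choices, but which one is right depends on the question being asked.

```python
# Hypothetical month-end inventory (units on hand) for a single store.
inventory = {"Jan": 100, "Feb": 120, "Mar": 90}

# Rolling up time with SUM: 310 "units" that were never on hand at any one moment.
naive_q1 = sum(inventory.values())

# More defensible quarterly summaries of a stock level.
closing_q1 = inventory["Mar"]                          # level at the end of the quarter: 90
average_q1 = sum(inventory.values()) / len(inventory)  # average level: about 103.3
```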
Measures
• additive – a measure is additive along a dimension if the sum operator can be used to meaningfully aggregate values along all hierarchies in that dimension
• fully-additive – if it is additive across all dimensions
• semi-additive – if it is only additive across certain dimensions
• non-additive – if it is not additive across any dimension
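The definitions above can be turned into a simple check. In this sketch, the dimensions along which each measure is additive are recorded by hand as metadata (they cannot be inferred from the numbers themselves); the cube, the measure names, and the helper functions are hypothetical.

```python
# Hypothetical cube dimensions.
DIMENSIONS = frozenset({"store", "product", "time"})

# Hand-recorded metadata: the dimensions along which SUM is meaningful for each measure.
ADDITIVE_ALONG = {
    "units_sold": {"store", "product", "time"},  # a flow: summable along every dimension
    "units_on_hand": {"store", "product"},       # a stock level: not summable over time
    "unit_price": set(),                         # a rate: not summable along any dimension
}


def classify(measure):
    """Apply the definitions of fully-, semi-, and non-additive."""
    dims = ADDITIVE_ALONG[measure]
    if dims == DIMENSIONS:
        return "fully-additive"
    if dims:
        return "semi-additive"
    return "non-additive"


def check_sum_rollup(measure, dimension):
    """Raise before a SUM roll-up along a dimension where the measure is not additive."""
    if dimension not in ADDITIVE_ALONG[measure]:
        raise ValueError(f"{measure} is not additive along {dimension}")
```

With this metadata, `classify("units_on_hand")` returns `"semi-additive"`, and `check_sum_rollup("units_on_hand", "time")` raises instead of silently producing a meaningless total.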
Hierarchies
• A strict hierarchy
  – is one where each object at a lower level belongs to only one value at a higher level
• A non-strict hierarchy
  – can be thought of as a many-to-many relationship between a higher level of the hierarchy and the lower level
  – can result in multiple or alternate path hierarchies, whereby a lower-level object belongs to two distinct higher-level objects
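Whether a hierarchy level is strict can be checked directly from its child-to-parent pairs. The sketch below is a minimal illustration; the function name and the city/state and product/category data are hypothetical.

```python
from collections import defaultdict


def is_strict(child_parent_pairs):
    """Strict if every lower-level value maps to exactly one higher-level value."""
    parents = defaultdict(set)
    for child, parent in child_parent_pairs:
        parents[child].add(parent)
    return all(len(p) == 1 for p in parents.values())


# Hypothetical examples.
city_to_state = [("Seattle", "WA"), ("Spokane", "WA"), ("Portland", "OR")]
# A smartwatch is filed under two categories, so this hierarchy is non-strict.
product_to_category = [
    ("smartwatch", "electronics"),
    ("smartwatch", "fitness"),
    ("tv", "electronics"),
]
```

Here `is_strict(city_to_state)` is `True`, while `is_strict(product_to_category)` is `False` because the smartwatch splits into two higher-level objects.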
Hierarchies
• Alternate and multiple path hierarchies
  – are important when summarizing measures, and can specifically present problems when merging data
• An inaccurate summarization can result
  – if summaries from different paths of the same hierarchy are merged
• Data cannot be merged
  – among classification attributes that have overlapping data instances
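A small example of how merging summaries from a non-strict hierarchy goes wrong. The products, categories, and sales figures are hypothetical; the point is that a product with two parent categories contributes to both category totals, so adding those totals counts it twice.

```python
from collections import defaultdict

# Hypothetical sales per product and a non-strict product -> category mapping.
SALES = {"smartwatch": 200, "tv": 500, "treadmill": 300}
CATEGORY = [
    ("smartwatch", "electronics"),
    ("smartwatch", "fitness"),
    ("tv", "electronics"),
    ("treadmill", "fitness"),
]


def rollup_by_category(sales, pairs):
    """Roll products up to categories; a product with two parents lands in both."""
    totals = defaultdict(int)
    for product, category in pairs:
        totals[category] += sales[product]
    return dict(totals)


by_category = rollup_by_category(SALES, CATEGORY)  # {"electronics": 700, "fitness": 500}
merged_total = sum(by_category.values())           # 1200: the smartwatch is counted twice
true_total = sum(SALES.values())                   # 1000
```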
Non-additive measures
• Derived data
  – Ratios and percentages
  – Measures of intensity
  – Average / Maximum / Minimum
• Numbers used for other than quantities
  – Measurements of direction
  – Codes (and arbitrarily assigned numbers)
  – Dates and times of day
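For derived data such as ratios and averages, a common remedy is to aggregate the additive components and recompute the ratio, rather than aggregating the ratio itself. A minimal sketch with two hypothetical stores (the field and function names are illustrative):

```python
# Hypothetical per-store components of the "average sale" ratio.
stores = [
    {"revenue": 1000.0, "transactions": 10},  # average sale 100.0
    {"revenue": 300.0, "transactions": 30},   # average sale 10.0
]


def average_sale(revenue, transactions):
    return revenue / transactions


# Misleading: summing or averaging the derived ratio itself.
sum_of_ratios = sum(average_sale(s["revenue"], s["transactions"]) for s in stores)  # 110.0
mean_of_ratios = sum_of_ratios / len(stores)                                        # 55.0

# Sound: sum the additive components first, then recompute the ratio (1300 / 40).
overall = average_sale(
    sum(s["revenue"] for s in stores),
    sum(s["transactions"] for s in stores),
)  # 32.5
```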
So, what does this all mean?
• It is important for us to be able to consider how these papers on OLAP and Data Warehousing relate to the other material we have covered in this course
Going back to the beginning
• We discussed a taxonomy of data-related concepts:
  – Wisdom
    • is de-contextualized truth that is always true
    • is different from raw facts, which are highly context-specific
  – Knowledge
    • is context-specific truth that includes decisions
    • is the result of applying rules/algorithms to information in a given context
  – Information
    • is processed (extracted, summarized, etc.) data that is useful for making some decision
  – Data
    • is raw facts, which need not be numbers
Going further
• Metadata
  – is data that can be used to understand / process data
  – can include rules / algorithms
  – can be implicit (in the data types or structures)
  – can be explicit (in additional data attributes / tables / databases)
Going further - Metadata (cont)
• It is always important to consider, for each data attribute:
  – the syntax (generally dealt with in a database)
  – the semantics (often not dealt with explicitly)
• the operations in which it can participate and the TASKS they serve
  – maintaining the data
  – analyzing the data (including what queries can use it and how)
• the USERS of the data, who
  – own it
  – input / change it
  – have access to read it
• other important attributes of the attribute
  – value
  – privacy
  – risks
  – etc.
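One way to make the checklist above explicit is to record it per attribute. The dataclass below is a hypothetical sketch; its field names mirror the slide's headings (syntax, semantics, operations, users, privacy) and are not a standard schema.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetadata:
    """Hypothetical per-attribute record of the points listed on the slide."""

    name: str
    syntax: str     # e.g. a declared SQL type; usually enforced by the database
    semantics: str  # what the value means; often left implicit
    operations: set[str] = field(default_factory=set)  # operations it may take part in
    owners: set[str] = field(default_factory=set)
    writers: set[str] = field(default_factory=set)     # users who input / change it
    readers: set[str] = field(default_factory=set)     # users allowed to read it
    privacy: str = "public"

    def can_read(self, user):
        return user in self.readers or user in self.owners

    def supports(self, operation):
        return operation in self.operations


# Simplification: "sum" is left out because balances are not additive over time;
# a fuller record would note additivity per dimension.
balance = AttributeMetadata(
    name="account_balance",
    syntax="DECIMAL(12, 2)",
    semantics="month-end balance of one account",
    operations={"avg", "min", "max", "filter"},
    owners={"finance"},
    writers={"etl_job"},
    readers={"analyst"},
    privacy="confidential",
)
```

With this record, `balance.can_read("analyst")` is `True`, `balance.can_read("intern")` is `False`, and `balance.supports("sum")` is `False`.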
Ontologies can help
• Ontologies
  – contain
    • names of concepts
    • descriptions
    • rules
  – structure concepts
    • from high level
    • to individual attributes
  – support sharing of information
    • between the database and a user
    • across databases / systems / users
Dimensions can help, too
• Dimensions
  – organize data conceptually in directed acyclic graphs
  – can support exploration
    • within a dimension
    • between multiple dimensions
Data Warehouses
• Data Warehouses
  – can provide dimensional storage of data to aid in exploration
  – don’t rely on traditional normalization and other technology-focused techniques
  – do rely on techniques such as OLAP to help users explore them
  – need a good understanding of the data
    • so that it can be cleaned before being placed in them
OLAP
• OLAP
  – refers to a collection of techniques for exploring data
    • in Data Warehouses and other dimensionally organized systems
  – So far the papers have focused on exploration that involves
    • summarizing, extracting, and processing DATA
    • producing largely numerical information
  – What more should it do?
  – What’s needed so that it can do this?