riadh ben messaoud - fsegn analytical... · 2019-10-24 · data warehouses & olap 4 what is...
TRANSCRIPT
Riadh Ben Messaoud
1. The Big Picture
2. Data Warehouse Philosophy
3. Data Warehouse Concepts
4. Warehousing Applications
5. Warehouse Schema Design
6. Business Intelligence Reporting
7. On-Line Analytical Processing
8. OLAP Applications
9. Data Warehouse Implementation
10. Warehousing Software
Data Warehouses & OLAP 2
Data Warehouses & OLAP 4
What is OLAP?
The term “On-Line Analytical Processing” is not a definition…
It gives no help in deciding if a product is an OLAP tool or not!
Since late 1994, many vendors have claimed to have OLAP-compliant products
It is not possible to rely on the vendors’ own descriptions
Membership of the OLAP Council is not a good indicator…
The Codd rules are also an unsuitable way of detecting OLAP compliance…
Data Warehouses & OLAP 5
What is OLAP?
Researchers were forced to create their own definition…
It had to be simple, memorable and product-independent
The FASMI test is one of the definition efforts that has attracted the widest agreement for detecting OLAP compliance
It defines the characteristics of an OLAP application in a specific way…
FASMI: Fast Analysis of Shared Multidimensional Information
Data Warehouses & OLAP 6
FAST
The system is targeted to deliver responses to users within about 5 seconds
◦ Simplest analysis ~ no more than 1 second
◦ Most complicated analysis ~ no more than 20 seconds
End-users assume that a process has failed if results are not received within 30 seconds
Unless the system warns that the report will take longer, the user will hit “Alt+Ctrl+Delete”
Data Warehouses & OLAP 7
FAST
The OLAP response speed is not easy to achieve…
… especially when on-the-fly and ad hoc calculations are required
Vendors resort to many techniques to achieve this goal:
◦ Specialized forms of data storage,
◦ Extensive pre-calculations,
◦ Specific hardware requirements.
Data Warehouses & OLAP 8
FAST
None of the existing products is fully optimized
… an area of developing technology
◦ The full pre-calculation approach fails with large and sparse data
◦ Doing everything on-the-fly is much too slow with large data
According to surveys, slow query response is consistently the most often-cited technical problem with OLAP products …
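The trade-off between the two approaches above can be sketched in a few lines. This is only an illustration under stated assumptions: the fact table, the dimension order (product, region, month) and the names `on_the_fly` and `precalculate` are hypothetical and do not come from any OLAP product. Answering on-the-fly scans every fact for each query, while full pre-calculation stores one cell per combination of kept dimensions, so even a tiny sparse table grows several times over.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sparse fact table: (product, region, month) -> sales.
FACTS = {
    ("tea", "north", "jan"): 10.0,
    ("tea", "south", "feb"): 5.0,
    ("coffee", "north", "feb"): 7.0,
    ("cocoa", "east", "mar"): 3.0,
}
N_DIMS = 3
ALL = "*"  # marker meaning "aggregated over this dimension"


def on_the_fly(query):
    """Answer a query by scanning every fact; ALL acts as a wildcard."""
    return sum(
        value
        for key, value in FACTS.items()
        if all(q == ALL or q == k for q, k in zip(query, key))
    )


def precalculate():
    """Materialize every aggregate cell for every subset of dimensions."""
    cells = defaultdict(float)
    for key, value in FACTS.items():
        for r in range(N_DIMS + 1):
            for kept in combinations(range(N_DIMS), r):
                cell = tuple(key[i] if i in kept else ALL for i in range(N_DIMS))
                cells[cell] += value
    return dict(cells)


CUBE = precalculate()
query = ("tea", ALL, ALL)
# Both strategies give the same answer; they differ in when the work is done.
assert CUBE[query] == on_the_fly(query)
print(len(FACTS), "base facts ->", len(CUBE), "stored cells")
```

With more dimensions and sparser data the number of stored cells grows much faster than the number of facts, which is why neither extreme works well on its own.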
Data Warehouses & OLAP 9
ANALYSIS
The system can cope with any business logic and statistical analysis relevant for the application and the user
In some OLAP products, some pre-programming may be needed…
◦ It is necessary to allow the user to define new ad hoc calculations without having to program
Data Warehouses & OLAP 10
ANALYSIS
Analysis could include specific features like:
◦ Time series analysis,
◦ Cost allocations,
◦ Currency translation,
◦ Goal seeking,
◦ Ad hoc multidimensional structural changes,
◦ Non-procedural modeling,
◦ Exception alerting,
◦ Data mining.
These capabilities differ between products, depending on their target markets
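Two of the features listed above, currency translation and cost allocation, can be illustrated with a short sketch. Everything here is an assumption made for the example: the exchange rates are invented (not market data), and `translate` and `allocate` are hypothetical helper names, not functions of any OLAP tool.

```python
# Hypothetical exchange rates into the reporting currency (TND); not real data.
RATES_TO_TND = {"TND": 1.0, "EUR": 3.0, "USD": 2.5}


def translate(amount, currency, rates=RATES_TO_TND):
    """Currency translation: convert an amount into the reporting currency."""
    return amount * rates[currency]


def allocate(total_cost, drivers):
    """Cost allocation: split a shared cost in proportion to a driver."""
    total_driver = sum(drivers.values())
    if total_driver == 0:
        raise ValueError("cannot allocate on an all-zero driver")
    return {key: total_cost * value / total_driver for key, value in drivers.items()}


# Regional revenue reported in different currencies, translated first...
revenue_tnd = {
    "north": translate(200.0, "EUR"),
    "south": translate(160.0, "USD"),
}
# ...then used as the driver for allocating a shared cost of 1000.
shared_cost = allocate(1000.0, revenue_tnd)
print(shared_cost)
```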
Data Warehouses & OLAP 11
SHARED
The system implements all the security requirements for confidentiality
If multiple write access is needed, concurrent update locking at an appropriate level should be implemented
The system should be able to handle multiple updates in a timely and secure manner
This is a major area of weakness in many OLAP products
… which tend to assume that OLAP applications will be read-only
Even products with multi-user read-write have crude security models
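As a rough illustration of "concurrent update locking at an appropriate level", the sketch below locks at the level of a single cell, so writers to different cells do not block each other. The `SharedCube` class and its cell keys are hypothetical; a real multi-user OLAP server would also need the security and confidentiality controls mentioned above, which are not shown here.

```python
import threading


class SharedCube:
    """Toy cube with one lock per cell (an assumed locking granularity)."""

    def __init__(self, cells):
        self._values = {cell: 0.0 for cell in cells}
        self._locks = {cell: threading.Lock() for cell in cells}

    def add(self, cell, delta):
        # Only writers to the same cell wait for each other.
        with self._locks[cell]:
            self._values[cell] += delta

    def read(self, cell):
        with self._locks[cell]:
            return self._values[cell]


def worker(cube, cell, times):
    for _ in range(times):
        cube.add(cell, 1.0)


cube = SharedCube([("tea", "Q1"), ("coffee", "Q1")])
threads = [
    threading.Thread(target=worker, args=(cube, ("tea", "Q1"), 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cube.read(("tea", "Q1")))
```

Because every update to a cell happens under that cell's lock, no increment is lost when four threads update the same cell concurrently.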
Data Warehouses & OLAP 12
MULTIDIMENSIONAL
This is the key requirement for all OLAP applications
The system must provide a multidimensional conceptual view including:
◦ Full support for hierarchies
◦ Multiple hierarchies
… This is the most logical way to analyze businesses and organizations
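A minimal sketch of what "multiple hierarchies" means in practice: the same store dimension can be rolled up along two different hierarchies. The store names, the two hierarchies (`GEOGRAPHY`, `FORMAT`) and the `roll_up` helper are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical facts: (store, month) -> sales.
SALES = {
    ("tunis_1", "2019-01"): 100.0,
    ("tunis_2", "2019-01"): 50.0,
    ("nabeul_1", "2019-02"): 70.0,
}

# Two alternative hierarchies over the same store dimension.
GEOGRAPHY = {"tunis_1": "Tunis", "tunis_2": "Tunis", "nabeul_1": "Nabeul"}
FORMAT = {"tunis_1": "mall", "tunis_2": "street", "nabeul_1": "mall"}


def roll_up(facts, hierarchy):
    """Aggregate store-level facts to the parent level of the given hierarchy."""
    totals = defaultdict(float)
    for (store, month), amount in facts.items():
        totals[(hierarchy[store], month)] += amount
    return dict(totals)


by_city = roll_up(SALES, GEOGRAPHY)
by_format = roll_up(SALES, FORMAT)
print(by_city)
print(by_format)
```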
Data Warehouses & OLAP 13
INFORMATION
This is all of the data and derived information needed, wherever it is and however much is relevant for the application
The capacity for handling data differs between OLAP products
◦ The largest OLAP products can hold at least a thousand times as much data as the smallest
Many considerations must be taken into account:
◦ Data duplication, RAM required, disk space utilization, performance, integration with DWs
Data Warehouses & OLAP 14
The FASMI test is a reasonable and understandable definition of the goals OLAP is meant to achieve
Researchers encourage users and vendors to adopt this definition, which we hope will avoid the controversies of previous attempts
Data Warehouses & OLAP 15
The Codd rules
In 1993, Codd et al. published a white paper “Providing OLAP to User-Analysts: An IT Mandate”
Codd was very well known as a respected database researcher from the 1960s till the late 1980s
He is credited with being the inventor of the relational database model in 1969
Unfortunately, his OLAP rules proved to be controversial due to being vendor-sponsored, rather than mathematically based
Data Warehouses & OLAP 16
The Codd rules
The OLAP white paper included 12 rules, which are now well known
They were followed by another 6 rules in 1995
Codd restructured the rules into four groups, calling them “features”
◦ Basic Features
◦ Special Features
◦ Reporting Features
◦ Dimension Control
Data Warehouses & OLAP 17
The Codd rules
Basic Features
1. Multidimensional Conceptual View
◦ Few would argue with this feature
◦ Codd believes this to be the central core of OLAP
◦ Codd included “slice and dice” as part of this requirement
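To make “slice and dice” concrete, here is a small sketch on a three-dimensional cube stored as a dictionary. The cube contents, the dimension names and the helpers `slice_cube` and `dice_cube` are hypothetical; the sketch only shows the idea that a slice fixes one member of one dimension and a dice keeps a sub-cube defined by several allowed members.

```python
DIMS = ("product", "region", "quarter")

# Hypothetical cube: (product, region, quarter) -> units sold.
CUBE = {
    ("tea", "north", "Q1"): 10,
    ("tea", "south", "Q1"): 4,
    ("coffee", "north", "Q1"): 7,
    ("coffee", "north", "Q2"): 9,
}


def slice_cube(cube, dim, member):
    """Fix one dimension to a single member and drop that dimension."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cube.items() if key[i] == member}


def dice_cube(cube, **allowed):
    """Keep the sub-cube whose members are in the allowed sets, per dimension."""
    wanted = {DIMS.index(dim): set(members) for dim, members in allowed.items()}
    return {
        key: v
        for key, v in cube.items()
        if all(key[i] in members for i, members in wanted.items())
    }


q1 = slice_cube(CUBE, "quarter", "Q1")
coffee = dice_cube(CUBE, product=["coffee"], quarter=["Q1", "Q2"])
print(q1)
print(coffee)
```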
Data Warehouses & OLAP 18
The Codd rules
Basic Features
2. Intuitive Data Manipulation
◦ Data manipulation is done through direct actions on cells in the view, without recourse to menus or multiple actions
◦ We assume that this means using a mouse
◦ Many products fail on this, because they do not necessarily support double clicking or drag and drop
Data Warehouses & OLAP 19
The Codd rules
Basic Features
3. Accessibility: OLAP as a Mediator
◦ OLAP engines are considered as middleware, sitting between heterogeneous data sources and an OLAP front-end
◦ Most products can achieve this, but often with more data staging and batching than vendors like to admit
Data Warehouses & OLAP 20
The Codd rules
Basic Features
4. Batch Extraction vs Interpretive
◦ This rule effectively required that products offer both their own staging database for OLAP data as well as offering live access to external data
◦ Only a minority of OLAP products properly comply with it
Data Warehouses & OLAP 21
The Codd rules
Basic Features
5. OLAP Analysis Models
◦ Codd required that OLAP products should support all four analysis models:
  ◦ Categorical: parameterized static reporting ~ All OLAP tools
  ◦ Exegetical: slicing and dicing with drill down ~ All OLAP tools
  ◦ Contemplative: « what if? » analysis ~ Most OLAP tools
  ◦ Formulaic: goal seeking models ~ Very few OLAP tools
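The difference between the contemplative and formulaic models can be sketched with a toy profit model. In a “what if?” analysis the user supplies an input and reads the outcome; in goal seeking the user supplies the desired outcome and the system searches for the input. The profit formula, its default parameters and the bisection-based `goal_seek` are assumptions chosen for the example, not a description of how any OLAP tool implements goal seeking.

```python
def profit(price, units=1000, unit_cost=4.0, fixed_cost=2500.0):
    """Toy business model used only for illustration."""
    return units * (price - unit_cost) - fixed_cost


def goal_seek(f, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with f(x) close to target by bisection.

    Assumes f is increasing on [lo, hi] and that the target is bracketed.
    """
    if not f(lo) <= target <= f(hi):
        raise ValueError("target is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Contemplative ("what if?"): what is the profit if the price is 7?
what_if = profit(7.0)

# Formulaic (goal seeking): which price yields a profit of 1500?
needed_price = goal_seek(profit, 1500.0, lo=4.0, hi=20.0)
print(what_if, round(needed_price, 6))
```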
Data Warehouses & OLAP 22
The Codd rules
Basic Features
6. Client/Server Architecture
◦ The OLAP server component of an OLAP product should be sufficiently intelligent that various clients could be attached with minimum effort and programming for integration
◦ Relatively few OLAP products pass this test
◦ A very tough test
◦ What would the Web deliver on this issue?
◦ What would XML deliver on this issue?
Data Warehouses & OLAP 23
The Codd rules
Basic Features
7. Transparency
◦ This test, dealing with openness, is also a tough but valid one
◦ A spreadsheet user should be able to get full value from an OLAP engine and not even be aware of where the data comes from
◦ OLAP products must allow live access to heterogeneous data sources from a full function spreadsheet add-in, with the OLAP server engine in between
◦ Very few products fully comply with the test
Data Warehouses & OLAP 24
The Codd rules
Basic Features
8. Multi-User Support
◦ OLAP tools must provide concurrent access (retrieval and update), integrity and security
◦ Many OLAP applications are still read-only
◦ However, almost all vendors claim compliance!!!
Data Warehouses & OLAP 25
The Codd rules
Special Features
9. Treatment of Non-Normalized Data
◦ Refers to the integration between an OLAP engine and denormalized source data
◦ Any data updates performed in the OLAP environment should not be allowed to alter stored denormalized data in feeder systems
◦ Data changes should not be allowed in what are normally regarded as calculated cells within the OLAP database
Data Warehouses & OLAP 26
The Codd rules
Special Features
10. Storing OLAP Results: Keeping them Separate from Source Data
◦ This is really an implementation rather than a product issue
◦ But few would disagree with it
◦ Read-write OLAP applications should not be implemented directly on live transaction data
◦ OLAP data changes should be kept distinct from transaction data
◦ The method of data write-back used in Microsoft Analysis Services is the best implementation of this
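The idea of keeping OLAP changes distinct from source data can be sketched as a separate store of deltas layered over read-only source values. This is a generic illustration under stated assumptions, not a description of how Microsoft Analysis Services implements write-back; the `WriteBackCube` class and the sample cells are hypothetical.

```python
from types import MappingProxyType

# Hypothetical source data, exposed read-only so the sketch cannot alter it.
SOURCE = MappingProxyType({("tea", "Q1"): 100.0, ("coffee", "Q1"): 80.0})


class WriteBackCube:
    """Reads combine source values with deltas kept in a separate store."""

    def __init__(self, source):
        self._source = source
        self._deltas = {}

    def write(self, cell, new_value):
        # Record only the difference from the source; the source is untouched.
        self._deltas[cell] = new_value - self._source.get(cell, 0.0)

    def read(self, cell):
        return self._source.get(cell, 0.0) + self._deltas.get(cell, 0.0)

    def discard_changes(self):
        self._deltas.clear()


cube = WriteBackCube(SOURCE)
cube.write(("tea", "Q1"), 120.0)
after_write = cube.read(("tea", "Q1"))
cube.discard_changes()
after_discard = cube.read(("tea", "Q1"))
print(after_write, after_discard, SOURCE[("tea", "Q1")])
```

Because changes live only in the delta store, they can be reviewed or thrown away without ever touching the transaction data.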
Data Warehouses & OLAP 27
The Codd rules
Special Features
11. Extraction of Missing Values
◦ All missing values are cast in the uniform representation defined by the Relational Model
◦ Missing values are to be distinguished from zero values
◦ A few OLAP tools do break this rule
Data Warehouses & OLAP 28
The Codd rules
Special Features
12. Treatment of Missing Values
◦ All missing values are to be ignored by the OLAP analyzer regardless of their source
◦ This is an almost inevitable consequence of how multidimensional engines treat all data
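Rules 11 and 12 together say that a missing value is not a zero and that it should be ignored in analysis. The sketch below uses Python's `None` as the uniform marker for "missing" (an assumption for the example) and shows how the average changes if missing values are wrongly treated as zeros.

```python
def average(values):
    """Average that ignores missing values (None) but keeps real zeros."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # nothing to average: the result is itself missing
    return sum(present) / len(present)


def average_missing_as_zero(values):
    """Incorrect treatment shown for contrast: missing counted as zero."""
    return sum(0.0 if v is None else v for v in values) / len(values)


monthly_sales = [10.0, None, 0.0, 20.0]  # None = not reported, 0.0 = no sales
print(average(monthly_sales))
print(average_missing_as_zero(monthly_sales))
```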
Data Warehouses & OLAP 29
The Codd rules
Reporting Features
13. Flexible Reporting
◦ The dimensions can be laid out in any way that the user requires in reports
◦ Most products are capable of this in their formal report writers
◦ It is preferable that analysis and reporting facilities be combined in one module
Data Warehouses & OLAP 30
The Codd rules
Reporting Features
14. Uniform Reporting Performance
◦ Reporting performance should not be significantly degraded by increasing the number of dimensions or database size
◦ There are differences between products
◦ The principal factor that affects performance is the degree to which the calculations are performed in advance and where live calculations are done
Data Warehouses & OLAP 31
The Codd rules
Reporting Features
15. Automatic Adjustment of Physical Level
◦ The OLAP system must adjust its physical schema automatically to adapt to the type of model, data volumes and sparsity
◦ Most vendors fall far short of this noble ideal
◦ Since 1996, users have been able to benefit from it in Microsoft Analysis Services
Data Warehouses & OLAP 32
The Codd rules
Dimension Control
16. Generic Dimensionality
◦ Each dimension must be equivalent in both its structure and operational capabilities
◦ This has proven to be one of the most controversial of Codd’s rules
◦ With a strictly purist interpretation, few products fully comply
◦ If you are buying a product for a specific application, you may safely ignore the rule
Data Warehouses & OLAP 33
The Codd rules
Dimension Control
17. Unlimited Dimensions & Aggregation Levels
◦ Technically, no product can possibly comply with this feature
◦ There is no such thing as an unlimited entity on a limited computer
◦ Few applications need more than about eight or ten dimensions
◦ Few hierarchies have more than about six consolidation levels
◦ In practice, you can probably ignore this requirement.
Data Warehouses & OLAP 34
The Codd rules
Dimension Control
18. Unrestricted Cross-dimensional Operations
◦ All forms of calculation must be allowed across all dimensions, not just the “measures” dimension
◦ Many products which use only relational storage are weak in this area
◦ These types of calculations are important if you are doing complex calculations
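A short sketch of what calculating across dimensions other than “measures” looks like: growth is computed across the time dimension and share across the product dimension. The cube contents and the helpers `growth` and `share` are hypothetical and only illustrate the kind of calculation the rule refers to.

```python
# Hypothetical cube: (product, quarter, measure) -> value.
CUBE = {
    ("tea", "Q1", "revenue"): 100.0,
    ("tea", "Q2", "revenue"): 120.0,
    ("coffee", "Q1", "revenue"): 300.0,
    ("coffee", "Q2", "revenue"): 280.0,
}


def growth(product, measure, before, after):
    """Calculation across the time dimension: relative change between periods."""
    return CUBE[(product, after, measure)] / CUBE[(product, before, measure)] - 1


def share(product, quarter, measure):
    """Calculation across the product dimension: share of the quarter's total."""
    total = sum(
        value
        for (p, q, m), value in CUBE.items()
        if q == quarter and m == measure
    )
    return CUBE[(product, quarter, measure)] / total


tea_growth = growth("tea", "revenue", before="Q1", after="Q2")
tea_share_q1 = share("tea", "Q1", "revenue")
print(round(tea_growth, 4), tea_share_q1)
```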
Data Warehouses & OLAP 35
OLAP Milestones