T.ROKAYAH BAYAN
OLAP IN THE DATA WAREHOUSE
CHAPTER OBJECTIVES
• Review the major features and functions of OLAP in detail
• Grasp the intricacies of dimensional analysis and learn the meanings of hypercubes, drill-down and roll-up, and slice-and-dice
• Examine the different OLAP models and determine which model is suitable for your environment
• Consider OLAP implementation by studying the steps and the tools
Introduction
OLAP stands for online analytical processing. The term was introduced in a paper entitled “Providing On-Line Analytical Processing to User Analysts,” by Dr. E. F. Codd, the acknowledged “father” of the relational database model.
The paper, published in 1993, defined 12 rules or guidelines for an OLAP system.
As the name implies, OLAP has to do with the processing of data as it is manipulated for analysis.
The data warehouse provides the best opportunity for analysis, and OLAP is the vehicle for carrying out involved analysis.
In today’s data warehousing environment, with such huge progress in analysis tools from various vendors, you cannot have a data warehouse without OLAP.
OLAP definition: On-Line Analytical Processing (OLAP) is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
DEMAND FOR ONLINE ANALYTICAL PROCESSING
A data warehouse is meant for performing substantial analysis using the available data.
The analysis leads to strategic decisions, which are the major reason for building data warehouses in the first place.
For meaningful analysis, data must be cast in a form suitable for analyzing the values of key indicators over time along business dimensions.
The traditional methods of analysis provided in a data warehouse are not sufficient; we need to perceive exactly what users demand in order to stay competitive and to expand.
Need for Multidimensional Analysis
Example: Imagine a business analyst looking for reasons why profitability dipped sharply in recent months across the entire enterprise. The analyst starts this analysis by querying for the overall sales for the last five months for the entire company, broken down by individual months. The analyst notices that although the sales do not show a drop, there is a sharp reduction in profitability for the last three months. The analysis proceeds further when the analyst wants to find out which countries show reductions. The analyst requests a breakdown of sales by major worldwide regions and notes that the European region is responsible for the reduction in profitability. Now the analyst senses that clues are becoming more pronounced and looks for a breakdown of the European sales by individual countries. The analyst finds that the profitability has increased for a few countries, decreased sharply for some other countries, and been stable for the rest.
Figure 15-1 shows the steps through this single analysis session.
Example: discussion
How many steps are there? Many steps, but a single analysis session and train of thought. Each step in this train of thought constitutes a query. The analyst formulates each query, executes it, waits for the result set to appear on the screen, and studies the result set.
Each query is interactive because the result set from one query forms the basis for the next query.
Did you notice that none of the queries in the above analysis session included any serious calculations? This is not typical. In a real-world analysis session, many of the queries require calculations, sometimes complex calculations.
OLAP is the Answer
The tools being used in the OLTP and basic data warehouse environments do not match up to the task.
We need a different set of tools and products that are specifically meant for serious analysis. We need OLAP in the data warehouse.
Guidelines for an OLAP System
The initial twelve guidelines for an OLAP system include:
Multidimensional Conceptual View. Provide a multidimensional data model that is intuitively analytical and easy to use. Business users’ view of an enterprise is multidimensional in nature. Therefore, a multidimensional data model conforms to how the users perceive business problems.
Transparency. Make the technology, underlying data repository, computing architecture, and the diverse nature of source data totally transparent to users. Such transparency, supporting a true open system approach, helps to enhance the efficiency and productivity of the users through front-end tools that are familiar to them.
Accessibility. Provide access only to the data that is actually needed to perform the specific analysis, presenting a single, coherent, and consistent view to the users. The OLAP system must map its own logical schema to the heterogeneous physical data stores and perform any necessary transformations.
Consistent Reporting Performance. Ensure that the users do not experience any significant degradation in reporting performance as the number of dimensions or the size of the database increases. Users must perceive consistent run time, response time, or machine utilization every time a given query is run.
Client/Server Architecture. Conform the system to the principles of client/server architecture for better performance, flexibility, adaptability, and interoperability. Make the server component sufficiently intelligent to enable various clients to be attached with a minimum of effort and integration programming.
Generic Dimensionality. Ensure that every data dimension is equivalent in both structure and operational capabilities. Have one logical structure for all dimensions. The basic data structure or the access techniques must not be biased toward any single data dimension.
Unrestricted Cross-dimensional Operations. Provide ability for the system to recognize dimensional hierarchies and automatically perform roll-up and drill-down operations within a dimension or across dimensions. Have the interface language allow calculations and data manipulations across any number of data dimensions, without restricting any relations between data cells, regardless of the number of common data attributes each cell contains.
Intuitive Data Manipulation. Enable consolidation path reorientation (pivoting), drill-down and roll-up, and other manipulations to be accomplished intuitively and directly via point-and-click and drag-and-drop actions on the cells of the analytical model. Avoid the use of a menu or multiple trips to a user interface.
Dynamic Sparse Matrix Handling. Adapt the physical schema to the specific analytical model being created and loaded, so that sparse matrix handling is optimized. When encountering a sparse matrix, the system must be able to dynamically deduce the distribution of the data and adjust the storage and access to achieve and maintain a consistent level of performance.
Multiuser Support. Provide support for end users to work concurrently with the same analytical model or to create different models from the same data. In short, provide concurrent data access, data integrity, and access security.
OLAP advantages
• Enables analysts, executives, and managers to gain useful insights from the presentation of data.
• Can reorganize metrics along several dimensions and allow data to be viewed from different perspectives.
• Supports multidimensional analysis.
• Is able to drill down or roll up within each dimension.
• Is capable of applying mathematical formulas and calculations to measures.
• Provides fast response, facilitating speed-of-thought analysis.
• Complements the use of other information delivery techniques such as data mining.
• Improves the comprehension of result sets through visual presentations using graphs and charts.
• Can be implemented on the Web.
• Designed for highly interactive analysis.
OLAP Applications
Finance: Budgeting, activity-based costing, financial performance analysis, and financial modeling.
Sales: Sales analysis and sales forecasting.
Marketing: Market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation.
Manufacturing: Production planning and defect analysis.
OLAP Key Features
Multi-dimensional views of data.
Support for complex calculations.
Time Intelligence.
Representation of Multi-Dimensional Data
OLAP database servers use multi-dimensional structures to store data and relationships between data.
Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
Multi-dimensional databases are a compact and easy-to-understand way of visualizing and manipulating data elements that have many inter-relationships.
The cube can be expanded to include another dimension, for example, the number of sales staff in each city.
The response time of a multi-dimensional query depends on how many cells have to be added on-the-fly.
As the number of dimensions increases, the number of cells in the cube increases exponentially.
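To make the growth in cell count concrete, here is a minimal Python sketch. The dimension sizes (10 products, 5 cities, 12 months, 4 sales staff) are hypothetical and chosen only for illustration; the only claim is that the number of cells is the product of the dimension cardinalities.

```python
from math import prod

def cell_count(dimension_sizes):
    """Number of cells in a cube = product of the dimension cardinalities."""
    return prod(dimension_sizes)

# Hypothetical cardinalities: 10 products, 5 cities, 12 months, 4 sales staff.
sizes = [10, 5, 12, 4]

# Cell count after adding the first 1, 2, 3, and 4 dimensions.
counts = [cell_count(sizes[:n]) for n in range(1, len(sizes) + 1)]
print(counts)  # [10, 50, 600, 2400]
```

Each added dimension multiplies the number of cells by its cardinality, which is why on-the-fly aggregation over a high-dimensional cube can become slow.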
Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: (time), (item), (location), (supplier)
2-D cuboids: (time, item), (time, location), (time, supplier), (item, location), (item, supplier), (location, supplier)
3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
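The lattice can be enumerated mechanically: every subset of the dimensions is one cuboid. The following Python sketch, with a helper name (`cuboid_lattice`) that is purely illustrative, lists the cuboids level by level for the four dimensions above.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Return {k: [cuboids with k dimensions]} for every subset of the dimensions."""
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}

lattice = cuboid_lattice(["time", "item", "location", "supplier"])
for k, cuboids in lattice.items():
    print(f"{k}-D:", cuboids)

# With n dimensions (and no hierarchies) there are 2**n cuboids in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print(total)  # 16
```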
CUBE
Fact table view:
sale
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube (dimensions = 3):
day 1
      c1    c2    c3
p1    12    -     50
p2    11    8     -
day 2
      c1    c2    c3
p1    44    4     -
p2    -     -     -
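As a rough illustration of how the fact table maps onto the cube, the Python sketch below stores each fact under its (product, store, day) coordinate and prints one product-by-store plane per day. The dictionary representation and the `print_plane` helper are assumptions made for this example, not how an OLAP server stores data.

```python
# Fact table rows from the sale table: (prodId, storeId, date, amt)
SALE = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# The cube maps a coordinate (product, store, day) to its measure.
cube = {(prod_id, store, day): amt for prod_id, store, day, amt in SALE}

def print_plane(day, products=("p1", "p2"), stores=("c1", "c2", "c3")):
    """Print the product x store plane of the cube for one day ('-' = empty cell)."""
    print(f"day {day}")
    print("    " + " ".join(f"{s:>3}" for s in stores))
    for p in products:
        cells = [cube.get((p, s, day), "-") for s in stores]
        print(f"{p:>3} " + " ".join(f"{c:>3}" for c in cells))

print_plane(1)
print_plane(2)
```

Only 6 of the 2 x 3 x 2 = 12 possible cells hold a value, which is a small example of the sparsity that real cubes exhibit.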
Aggregates
(computed over the sale fact table)
• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81

• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
ans
date  sum
1     81
2     48
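The two queries can be checked against the sale rows with an in-memory SQLite database. This is only a sketch using Python's standard `sqlite3` module; the column types are assumptions, and an `ORDER BY` is added so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Add up amounts for day 1.
day1_total = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

print(day1_total)  # 81
print(by_day)      # [(1, 81), (2, 48)]
```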
• Operators: sum, count, max, min, median, avg
• “Having” clause
• Using dimension hierarchy
  - average by region (within store)
  - maximum by month (within date)
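A short sketch of several aggregate operators and a HAVING clause on the same sale rows, again using Python's `sqlite3`. The threshold of 20 is a hypothetical value picked for the example, and median is omitted because SQLite has no built-in median aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# count, min, max, and avg per store.
stats = conn.execute(
    "SELECT storeId, count(*), min(amt), max(amt), avg(amt) "
    "FROM sale GROUP BY storeId ORDER BY storeId"
).fetchall()

# HAVING filters whole groups after aggregation (hypothetical threshold: 20).
big_stores = conn.execute(
    "SELECT storeId, sum(amt) FROM sale GROUP BY storeId "
    "HAVING sum(amt) > 20 ORDER BY storeId"
).fetchall()

print(big_stores)  # [('c1', 67), ('c3', 50)]
```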
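Cube Aggregation
Example: computing sums
Base cube: the day 1 and day 2 planes of sale.
Rolled up over date:
      c1    c2    c3
p1    56    4     50
p2    11    8     -
Rolled up over date and product:
      c1    c2    c3
sum   67    12    50
Rolled up over date and store:
      sum
p1    110
p2    19
Rolled up over all dimensions: 129
rollup moves from the detailed planes toward the totals; drill-down moves in the opposite direction.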
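The sums in this example can be reproduced with a small Python sketch that rolls up by dimension reduction. The function name `roll_up` and its `keep` parameter are illustrative choices, not part of any OLAP product.

```python
from collections import defaultdict

# Fact table rows: (prodId, storeId, date, amt)
SALE = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]
NAMES = ("prodId", "storeId", "date")

def roll_up(rows, keep):
    """Sum amt over every dimension NOT listed in `keep` (dimension reduction)."""
    idx = [NAMES.index(name) for name in keep]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[3]
    return dict(out)

by_product_store = roll_up(SALE, ["prodId", "storeId"])  # date removed
by_store = roll_up(SALE, ["storeId"])                    # date and product removed
by_product = roll_up(SALE, ["prodId"])                   # date and store removed
grand_total = roll_up(SALE, [])                          # all dimensions removed

print(by_product_store[("p1", "c1")])  # 56
print(by_store)     # {('c1',): 67, ('c3',): 50, ('c2',): 12}
print(by_product)   # {('p1',): 110, ('p2',): 19}
print(grand_total)  # {(): 129}
```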
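Cube Operators
Base cube: the day 1 and day 2 planes of sale, with the aggregates
      c1    c2    c3
p1    56    4     50
p2    11    8     -
      c1    c2    c3
sum   67    12    50
      sum
p1    110
p2    19
total: 129
Notation: sale(store, product, date), where * stands for all values of a dimension.
sale(c1,*,*) = 67
sale(*,*,*) = 129
sale(c2,p2,*) = 8
sale(*,p1,*) = 110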
Extended Cube
* (all days)
      c1    c2    c3    *
p1    56    4     50    110
p2    11    8     -     19
*     67    12    50    129
day 2
      c1    c2    c3    *
p1    44    4     -     48
p2    -     -     -     -
*     44    4     -     48
day 1
      c1    c2    c3    *
p1    12    -     50    62
p2    11    8     -     19
*     23    8     50    81
Example cell: sale(*,p2,*)
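One way to sketch the extended cube in Python is to let every fact contribute to each cell whose coordinates are either the fact's own value or `*`. The argument order (store, product, day) is an assumption inferred from the notation sale(c2,p2,*); the function name is hypothetical.

```python
from itertools import product as cartesian

# Fact table rows: (prodId, storeId, date, amt)
SALE = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

def extended_cube(rows):
    """Build all base and '*' cells; keys are (store, product, day)."""
    cube = {}
    for prod_id, store, day, amt in rows:
        # Each fact feeds 2**3 cells: every coordinate is its own value or '*'.
        for key in cartesian((store, "*"), (prod_id, "*"), (day, "*")):
            cube[key] = cube.get(key, 0) + amt
    return cube

sale = extended_cube(SALE)
print(sale[("c1", "*", "*")])   # 67
print(sale[("*", "*", "*")])    # 129
print(sale[("c2", "p2", "*")])  # 8
print(sale[("*", "p1", "*")])   # 110
print(sale[("*", "p2", "*")])   # 19
print(sale[("*", "*", 1)])      # 81
```

Precomputing every `*` cell trades storage for query speed, since any aggregate then becomes a single lookup.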
Aggregation Using Hierarchies
Hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)
Base cube: the day 1 and day 2 planes of sale.
Rolled up from customer to region (and over date):
      region A    region B
p1    56          54
p2    11          8
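The region table can be reproduced by mapping each customer to its region before summing. The sketch below is in Python; the `REGION_OF` lookup encodes the assignment stated above, and the variable names are illustrative.

```python
from collections import defaultdict

# Fact table rows: (prodId, customer/store id, date, amt)
SALE = [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
]

# One level of the hierarchy: customer -> region.
REGION_OF = {"c1": "A", "c2": "B", "c3": "B"}

# Climb the hierarchy: replace each customer by its region, then sum.
by_region = defaultdict(int)
for prod_id, customer, _day, amt in SALE:
    by_region[(prod_id, REGION_OF[customer])] += amt

print(dict(by_region))
# {('p1', 'A'): 56, ('p2', 'A'): 11, ('p1', 'B'): 54, ('p2', 'B'): 8}
```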
Pivoting
Fact table view: the sale table (prodId, storeId, date, amt).
Multi-dimensional cube: the day 1 and day 2 planes of sale.
Pivoted view, product by store:
      c1    c2    c3
p1    56    4     50
p2    11    8     -
Cube Aggregates Lattice
(city, product, date)
(city, product)   (city, date)   (product, date)
(city)   (product)   (date)
all
Example aggregates at three levels of the lattice:
(city, product, date): the day 1 and day 2 planes of sale
(city, product):
      c1    c2    c3
p1    56    4     50
p2    11    8     -
(city):
      c1    c2    c3
sum   67    12    50
all: 129
Use a greedy algorithm to decide what to materialize.
Dimension Hierarchies
Hierarchy: city -> state -> all
cities
city  state
c1    CA
c2    NY
Lattice with the city -> state hierarchy:
City-level nodes: (city, product, date); (city, product), (city, date), (product, date); (city), (product), (date); all
State-level nodes: (state, product, date); (state, product), (state, date); (state)
not all arcs shown...
Interesting Hierarchy
Levels shown: all, years, quarters, months, days, weeks
time (conceptual dimension table)
day  week  month  quarter  year
1    1     1      1        2000
2    1     1      1        2000
3    1     1      1        2000
4    1     1      1        2000
5    1     1      1        2000
6    1     1      1        2000
7    1     1      1        2000
8    2     1      1        2000
SAMPLE CUBE
[Figure: three-dimensional sales cube]
Dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum); Product (TV, VCR, PC, sum); Country (U.S.A, Canada, Mexico, sum)
Labeled aggregate cells:
• Total annual sales of TV in U.S.A.
• Total annual sales of PC in U.S.A.
• Total annual sales of VCR in U.S.A.
• Total Q1 sales in U.S.A
• Total Q1 sales in Canada
• Total Q1 sales in Mexico
• Total Q1 sales in all countries
• Total Q2 sales in all countries
• Total sales in U.S.A
• Total sales in Canada
• Total sales in Mexico
• TOTAL SALES
Typical OLAP Operations
• Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up; from higher level summary to lower level summary or detailed data, or introducing new dimensions
• Slice and dice: project and select
• Pivot (rotate): reorient the cube; visualization; 3D to a series of 2D planes
• Other operations: drill through: through the bottom level of the cube to its back-end relational tables (using SQL)
[Figure: sample OLAP drill-down online report]
Cube Operation
Cube definition and computation in OLAP
1. define cube sales[item, city, year]: sum(sales_in_dollars)
2. compute cube sales
Transform it into a SQL-like language (with a new operator cube by):
SELECT item, city, year, SUM (amount)
FROM SALES
CUBE BY item, city, year
This requires computing the following group-bys: (item, city, year), (item, city), (item, year), (city, year), (item), (city), (year), ()
Lattice of group-bys:
()
(item)   (city)   (year)
(city, item)   (city, year)   (item, year)
(city, item, year)
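As a rough sketch of what a CUBE BY over item, city, year has to produce, the Python code below computes SUM(amount) for all eight group-bys in the lattice. The three SALES rows are hypothetical sample data, and the naive approach of rescanning the rows for each group-by is chosen for clarity, not efficiency.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (item, city, year, amount)
SALES = [
    ("TV", "Toronto", 2000, 100),
    ("TV", "Chicago", 2000, 150),
    ("PC", "Chicago", 2001, 200),
]
DIMS = ("item", "city", "year")

def cube_by(rows, dims=DIMS):
    """Compute SUM(amount) for every group-by in the lattice of `dims`."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            idx = [dims.index(d) for d in group]
            sums = defaultdict(int)
            for row in rows:
                sums[tuple(row[i] for i in idx)] += row[-1]
            result[group] = dict(sums)
    return result

cube = cube_by(SALES)
print(len(cube))        # 8
print(cube[()])         # {(): 450}
print(cube[("city",)])  # {('Toronto',): 100, ('Chicago',): 350}
```

A real implementation would typically derive coarser group-bys from finer ones already computed instead of rescanning the base rows.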
Roll-up and Drill-down
The roll-up operation performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction such that one or more dimensions are removed from the given cube.
Drill-down is the reverse of roll-up. It navigates from less detailed data to more detailed data. Drill-down can be realized by either stepping down a concept hierarchy for a dimension or introducing additional dimensions.
Slice and dice
The slice operation performs a selection on one dimension of the given cube, resulting in a subcube.
The dice operation defines a subcube by performing a selection on two or more dimensions.
Drill-Down
Summary level:
               Food Line    Outdoor Line    CATEGORY_total
Asia           59,728       151,174         210,902
Detail level after drill-down:
               Food Line    Outdoor Line    CATEGORY_total
Malaysia       618          9,418           10,036
China          33,198.5     74,165          107,363.5
India          6,918        0               6,918
Japan          13,871.5     34,965          48,836.5
Singapore      5,122        32,626          37,748
Belgium        7,797.5      21,125          28,922.5

Roll-Up
Detail level:
               Food Line    Outdoor Line    CATEGORY_total
Canada         29,116.5     69,310          98,426.5
Mexico         12,743.5     24,284          37,027.5
United States  102,561.5    232,679         335,240.5
Summary level after roll-up:
               Food Line    Outdoor Line    CATEGORY_total
North America  144,421.5    326,273         470,694.5
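The North America figures in this report can be reproduced with a small Python sketch of roll-up and drill-down along a country -> region hierarchy. The dictionary layout and function names are assumptions made for the example.

```python
# Country-level rows from the report: country -> (region, Food Line, Outdoor Line)
COUNTRIES = {
    "Canada": ("North America", 29116.5, 69310.0),
    "Mexico": ("North America", 12743.5, 24284.0),
    "United States": ("North America", 102561.5, 232679.0),
}

def roll_up_to_region(countries):
    """Climb the country -> region hierarchy, summing each product line."""
    regions = {}
    for region, food, outdoor in countries.values():
        f, o = regions.get(region, (0.0, 0.0))
        regions[region] = (f + food, o + outdoor)
    return regions

def drill_down(countries, region):
    """Return the country-level rows that make up one region."""
    return {c: v[1:] for c, v in countries.items() if v[0] == region}

regions = roll_up_to_region(COUNTRIES)
food, outdoor = regions["North America"]
print(food, outdoor, food + outdoor)  # 144421.5 326273.0 470694.5

detail = drill_down(COUNTRIES, "North America")
print(sorted(detail))  # ['Canada', 'Mexico', 'United States']
```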
Slice
Full report:
               Food Line    Outdoor Line    CATEGORY_total
Asia           59,728       151,174         210,902
Europe         97,580.5     213,304         310,884.5
North America  144,421.5    326,273         470,694.5
REGION_total   301,730      690,751         992,481
Slicing (selecting North America):
               Food Line    Outdoor Line    CATEGORY_total
North America  144,421.5    326,273         470,694.5

Dice
Before dicing:
               Food Line    Outdoor Line    CATEGORY_total
Canada         29,116.5     69,310          98,426.5
Mexico         12,743.5     24,284          37,027.5
United States  102,561.5    232,679         335,240.5
Dicing (Sub-cube):
               Food Line    Outdoor Line
Mexico         12,743.5     24,284
United States  102,561.5    232,679
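A minimal Python sketch of slice and dice over the country-level cells follows. The cube is modeled as a dictionary keyed by (country, product line), and the slice shown here selects one product line rather than one region; both are choices made for the example, and the names `slice_cube` and `dice_cube` are hypothetical.

```python
# Cells keyed by (country, product line); values from the North America report.
CUBE = {
    ("Canada", "Food Line"): 29116.5, ("Canada", "Outdoor Line"): 69310.0,
    ("Mexico", "Food Line"): 12743.5, ("Mexico", "Outdoor Line"): 24284.0,
    ("United States", "Food Line"): 102561.5,
    ("United States", "Outdoor Line"): 232679.0,
}

def slice_cube(cube, axis, value):
    """Slice: select a single value on one dimension."""
    return {k: v for k, v in cube.items() if k[axis] == value}

def dice_cube(cube, selections):
    """Dice: select sets of values on two or more dimensions ({axis: allowed})."""
    return {
        k: v for k, v in cube.items()
        if all(k[axis] in allowed for axis, allowed in selections.items())
    }

food_only = slice_cube(CUBE, 1, "Food Line")
sub_cube = dice_cube(CUBE, {0: {"Mexico", "United States"},
                            1: {"Food Line", "Outdoor Line"}})

print(len(food_only))  # 3
print(len(sub_cube))   # 4
```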
Other OLAP Operations
o Drill-Across: Queries involving more than one fact table
o Drill-Through: Makes use of SQL to drill through the bottom level of a data cube down to its back-end relational tables
o Pivot (rotate): Pivot (also called "rotate") is a visualization operation which rotates the data axes in view in order to provide an alternative presentation of the data. Other examples include rotating the axes in a 3-D cube, or transforming a 3-D cube into a series of 2-D planes.
o Moving Averages
o Growth Rates
o Depreciation
o Currency Conversion
o Statistical Functions
o Top N or Bottom N queries
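Three of these calculations can be sketched in a few lines of Python. The monthly sales series is hypothetical; the region totals are taken from the CATEGORY_total column of the region report. The function names and the simple trailing-window definition of the moving average are assumptions for illustration.

```python
def moving_average(values, window):
    """Trailing moving average over `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

def growth_rates(values):
    """Period-over-period growth rate: (current - previous) / previous."""
    return [(cur - prev) / prev for prev, cur in zip(values, values[1:])]

def top_n(totals, n):
    """Top N entries by value, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

monthly_sales = [100, 120, 90, 150]  # hypothetical series
region_totals = {"Asia": 210902, "Europe": 310884.5, "North America": 470694.5}

print(moving_average(monthly_sales, 3))
print(growth_rates(monthly_sales))
print(top_n(region_totals, 2))  # [('North America', 470694.5), ('Europe', 310884.5)]
```

A Bottom N query is the same sort with `reverse=False`.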
OLAP Tools - Categories
OLAP tools are categorized according to the architecture used to store and process multi-dimensional data.
There are four main categories of OLAP tools, as defined by Berson and Smith (1997) and Pendse and Creeth (2001):
• Multi-dimensional OLAP (MOLAP)
• Relational OLAP (ROLAP)
• Hybrid OLAP (HOLAP)
• Desktop OLAP (DOLAP)
CHAPTER SUMMARY
• OLAP is critical because its multidimensional analysis, fast access, and powerful calculations exceed those of other analysis methods.
• OLAP is defined on the basis of Codd’s initial twelve guidelines.
• OLAP characteristics include a multidimensional view of the data, an interactive and complex analysis facility, the ability to perform intricate calculations, and fast response time.
• Dimensional analysis is not confined to three dimensions that can be represented by a physical cube. Hypercubes provide a method for representing views with more dimensions.
• ROLAP and MOLAP are the two major OLAP models. The difference between them lies in the way the basic data is stored. Ascertain which model is more suitable for your environment.
• OLAP tools have matured. Some RDBMSs include support for OLAP.