armazéns de dados
TRANSCRIPT
Armazéns de
DadosOLAP operations
Based on slides by Alejandro Vaisman & Esteban Zimány
Business Intelligence
Contents
• OLAP operations
| 2
Imagem retirada de www.demandsolutions.com a 11/03/2015
OLAP Operations
| 3
OLAP Operations
| 4
OLAP Operations
• Starting cube: quarterly sales (in thousands) by product category and customer cities for 2012
• We first compute the sales quantities by country: a roll-up operation to the Country level along the Customer
dimension
• Sales of category Seafood in France significantly higher in the first quarter
o To find out if this occurred during a particular month, we take cube back to City aggregation level, and drill-down
along Time to the Month level
• To explore alternative visualizations, we sort products by name
• To see the cube with the Time dimension on the x axis, we rotate the axes of the original cube, without changing
granularities → pivoting (see next slide)
• To visualize the data only for Paris → slice operation, results in a 2-dimensional subcube, basically a collection of
time series (see next slide)
• To obtain a 3-dimensional subcube containing only sales for the first two quarters and for the cities Lyon and Paris,
we go back to the original cube and apply a dice operation
| 5
OLAP Operations
| 6
Advanced OLAP Operations
| 7
Advanced OLAP Operations
• To compare sales quantities in 2012 with those in 2011 we need a
cube with the same structure than the one for 2012
• Given two cubes, drill-across builds a new cube with the measures
of both in each cell
• To compute the percentage change of sales between the two years:
Drill-across operator and apply then add measure, which computes a
new value for each cell from the values in the original cube
• We also want to aggregate data in various ways; we start, from the
original cube, computing total sales by quarter and city, using the sum
aggregation operator| 8
Advanced OLAP Operations
• Last operation (bottom right of slide) requires aggregation
• Aggregation functions in OLAP can be:
o Cumulative: compute the measure value of a cell from several
other cells; examples are SUM, COUNT, and AVG
o Filtering: Filters the members of a dimension that appear in the
result; examples are MIN and MAX
─ Filtering functions compute not only the aggregated value, but
also the members of the dimension that belong to the result
• To aggregate measures of a cube at the current granularity
without performing a roll-up
| 9
Advanced OLAP Operations
| 10
Advanced OLAP Operations
• Aggregation
• “Total overall quantity” (not shown in the previous slide) yields a single cell whose coordinates for
the three dimensions equal all
• Aggregation without changing granularity
yielding a cube where only the cells containing the maximum by quarter and city will have
values, the others will be null
o Top two sales by product and city
• Three-month moving average of sales
o ADDMEASURE(Sales, MovAvg = AVG(Quantity) OVER Time 2 CELLS PRECEDING)
• ‘Year-to-date sum”:
o ADDMEASURE(Sales, YTDQuantity = SUM(Quantity) OVER Time ALL CELLS PRECEDING)
o The window contains the current cell and all previous ones (indicated by ALL CELLS
PRECEDING)
| 11
Advanced OLAP Operations
• We go back to the original cube to compute the quarter sales that
amount to 70% of the total sales, applying the top percent aggregation
operator
• Finally, we rank the quarterly sales by category and city
| 12
Advanced OLAP Operations
• Union merges two cubes having the same schema but
disjoint instances.
• Difference removes the cells in a cube that belong to
another one; the two cubes must have the same schema
• Drill-through operation allows to move from data at the
bottom level in a cube to data in the operational systems
from which the cube was derived
o Could be used when trying to determine the reason for
outlier values in a data cube| 13
Advanced OLAP Operations
| 14
“Top two sales” cube
Summarizing OLAP Operations
| 15
Exercises
| 16
Exercises
1. A data warehouse of a telephone provider consists of five dimensions, namely, caller
customer, callee customer, time, call type, and call program, and three measures,
namely, number of calls, duration, and amount.
Define the OLAP operations to be performed in order to answer the following queries.
Propose dimension hierarchies when needed.
a. Total amount collected by each call program in 2012.
b. Total duration of calls made by customers from Brussels in 2012.
c. Total number of weekend calls made by customers from Brussels to customers in
Antwerp in 2012.
d. Total duration of international calls started by customers in Belgium in 2012.
e. Total amount collected from customers in Brussels who are enrolled in the corporate
program in 2012.
| 17
Exercises
2. A data warehouse of a train company contains information about train segments. It consists of six
dimensions, namely, departure station, arrival station, trip, train, arrival time, and departure time, and
three measures, namely, number of passengers, duration, and number of kilometers.
Define the OLAP operations to be performed in order to answer the following queries. Propose
dimension hierarchies when needed.
a. Total number of kilometers made by Alstom trains during 2012 departing from French or Belgian
stations.
b. Total duration of international trips during 2012, that is, trips departing from a station located in a
country and arriving at a station located in another country.
c. Total number of trips that departed from or arrived at Paris during July 2012.
d. Average duration of train segments in Belgium in 2012.
e. For each trip, average number of passengers per segment, that is, take all the segments of each
trip, and average the number of passengers.
| 18
Exercises
3. Consider the data warehouse of a university that contains information about teaching and research
activities. On the one hand, the information about teaching activities is related to dimensions
department, professor, course, and time, the latter at a granularity of academic semester. Measures
for teaching activities are number of hours and number of credits. On the other hand, the information
about research activities is related to dimensions professor, funding agency, project, and time, the
latter twice for the start date and the end date, both at a granularity of day. In this case, professors
are related to the department to which they are affiliated. Measures for research activities are the
number of person months and amount.
Define the OLAP operations to be performed in order to answer the following queries. For this,
propose the necessary dimension hierarchies.
a. By department, total number of teaching hours during the academic year 2012/2013..
b. By department, total amount of research projects during the calendar year 2012.
c. By department, total number of professors involved in research projects during the calendar year
2012.
d. By professor, total number of courses delivered during the academic year 2012/2013.
e. By department and funding agency, total number of projects started in 2012.| 19