business intelligence reminds on data warehousing
TRANSCRIPT
BUSINESS INTELLIGENCE
Reminds on Data Warehousing
(details at the Decision Support Database course)
Data Sciencce & Business Informatics Degree
BI Architecture2
Laboratory of Data Science
Star-schema datawarehouse
A fact table with star-schema dimension tables only
3
time_key
day
day_of_the_week
month
quarter
year
time
branch_key
street
city
province
country
branch
Sales Fact Table
time_key
item_key
employee_key
branch_key
units_sold
dollars_sold
dollars_costMeasures
item_key
item_name
brand
type
supplier_type
item
employee_key
employee_name
employee_type
employee
Laboratory of Data Science
Snowflake-schema datawarehouse
A fact table with star-schema, snowflake and parent-child dimension tables
4
time_key
day
day_of_the_week
month
quarter
year
time
branch_key
street
city_key
branch
Sales Fact Table
time_key
item_key
employee_key
branch_key
units_sold
dollars_sold
dollars_costMeasures
item_key
item_name
brand
type
supplier_key
item
supplier_key
supplier_type
supplier
city_key
city
province
country
cityemp_key
emp_name
emp_boss_key
employees
Laboratory of Data Science
Which DBMS technology for DW?
Storage technology
Architecture
5
Laboratory of Data Science
Storage
RDBMS: record oriented structure
Laboratory of Data Science
6
Columnar: column oriented structure
Advantages:• Faster Scan
• Data Compression (e.g. State)
Storage
Correlation value-based database
Data cells contain the index to an order-set value
In-memory database
Data is stored in compressed format in main memory
Extraction-based system
Storage of attribute extracted from continuous data
flows (eg., web traffic, sensors)
...
Laboratory of Data Science
7
Architecture
Sequential
SQL query processing by a single
processor
Parallel
SQL query plan processing by a
multi-processor machine, with shared
memory
Distributed (Map-reduce)
SQL query processing distributed to
a set of independent machines
◼ Teradata SQL-MR, Hadoop HiveQL
Laboratory of Data Science
8
BI Architecture10
Laboratory of Data Science
K-dimensional cuboid11
Milk Bread … ... Pasta
Jan 13
… ...
Dec 13
Feb 13
Product.Category
Store.City
Time.Month
OrangeRome
PiseFlorence
An hyper-cube with K axes, with a level of some hierarchy at each axis. A
cell of the cuboid contains the values of metrics for the conditions given by
the cell coordinates.
Lucca
Laboratory of Data Science
Cube navigation by different users12
product
Branch manager look at sales
of his/her stores
for any product and any period
Product managers look at sales
of some products
in any period and in any market
Finance manager look at sales
of a period compared to the previous period
for any product and any market
tim
e
Laboratory of Data Science
Cuboids in SQL
13
SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND
F.time_key = T.time_key AND F.item_key = I.item_key
GROUP BY L.city, I.brand, T.month
MeasureAggregate
Star-Join
Hierarchy levels
Order or
pivoting
Laboratory of Data Science
Roll-up
How many cuboids?14
Product
Time
All
Time
Time
Product
All
All
Drill-Down
Roll-up
Drill-Down
Drill-Down
Roll-up
Laboratory of Data Science
15
Data Cube (extended cube, hypercube)
Total annual sales
of TVs in U.S.A.Date
Cou
ntr
y
*
*TV
VCRPC
1Qtr 2Qtr 3Qtr 4Qtr
U.S.A
Canada
Mexico
*
Laboratory of Data Science
Data cube in SQL Server16
SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND
F.time_key = T.time_key AND F.item_key = I.item_key
GROUP BY CUBE(L.city, I.brand, T.month)
MeasureAggregate
Star-Join
Hierarchy levels
Order or
pivoting
GROUP BY ROLLUP(L.city, I.brand, T.month) - all initial subsequences of the group-by attributes
Laboratory of Data Science
Slice and Dice17
Product
Time
Slice
Product
Time
Product =A,B,C
Laboratory of Data Science
Slice in SQL Server18
SELECT L.city, I.brand, T.month, SUM(dollars_sold)FROM fact AS F, location AS L, time AS T, item AS IWHERE F.location_key = L.location_key AND
F.time_key = T.time_key AND F.item_key = I.item_key AND
T.year = 2016GROUP BY CUBE(L.city, I.brand, T.month)
MeasureAggregate
Star-Join
Hierarchy levels
Order or
pivoting
Slice
Laboratory of Data Science