agenda common terms used in the software of data warehousing and what they mean. difference between...


Upload: eugene-richardson

Post on 24-Dec-2015


Page 1: Agenda Common terms used in the software of data warehousing and what they mean. Difference between a database and a data warehouse - the difference in

Agenda

• Common terms used in the software of data warehousing and what they mean.
• Difference between a database and a data warehouse: the difference in how each is optimised.
• What is a cube and what are dimensions?
• High level overview of Performance Point
• Difference between a score card and a dashboard
• How do the data warehouse, cube and Performance Point relate to one another?
• At which point and how should calculated fields be added?
• The purpose and definition of Fact Tables, Dimension Tables etc.
• Quantifiable benefits organisations achieve through data warehousing


Data Warehouse vs Transaction Database

• Transaction Database
  – Handles day-to-day activities
    • Takes Orders
    • Manages Production
    • Ships Orders
    • Runs Accounts
    • Changes frequently (every hour, minute, second)

• Data Warehouse
  – Handles Planning
    • Looks at historical patterns of Sales
    • Shows trends in demand and production
    • Remains mainly static
      – New data is added and/or corrections made infrequently


Data Warehouse Overview

[Diagram: four stages connected by Extract, Load and Access flows]

• Operational Source Systems
  – Extract into the Data Staging Area
• Data Staging Area
  – Services: Clean, combine and standardise; Conform dimensions; NO USER QUERY SERVICES
  – Data Store: Flat Files and Relational Tables
  – Processing: Sorting and sequential processing
  – Load into the Data Presentation Area
• Data Presentation Area
  – Data Mart 1: DIMENSIONAL; Atomic and Summary Data; Based on a single business process
  – DW Bus: Conformed Facts and Dimensions
  – Data Mart 2, 3, etc
  – Access by the Data Access Tools
• Data Access Tools
  – Ad Hoc Query Tools
  – Report Writers
  – Analytic and Modelling Applications
  – SQL, MDX, DMX, Excel, Reporting Services, Report Builder, Analysis Services, PerformancePoint


A Data Warehouse

[Diagram components: Data Profiler, Source Systems, Corrections, ETL Staging Tables, DQ & ETL, Control & Audit, Metadata, Data Quality, DDS, MDB/Cubes. Outputs: Reports, Pivot Tables, Ad Hoc Queries, Spreadsheets, Data Mining, Dashboard, Analytics, Scorecards, Other BI Apps]

Name            Description
Data Profiler   Analyses number of rows in tables, how many rows contain nulls, etc
Metadata        Database containing info about the data structure, data meaning, DQ rules, etc
ETL             Extract, Transform and Load process
MDB             Multi Dimensional Database


The Data Warehouse: Using an Enterprise Data Warehouse

[Diagram components: Data Profiler, Source Systems, Corrections, ETL Staging Tables, DQ & ETL, Control & Audit, Metadata, Data Quality, EDW, ETL, DDS (two), Cubes. Outputs: BI Apps, Finance Apps, CRM Apps, Reports]

Name            Description
Data Profiler   Analyses number of rows in tables, how many rows contain nulls, etc
Metadata        Database containing info about the data structure, data meaning, DQ rules, etc
ETL             Extract, Transform and Load process
EDW             Enterprise Data Warehouse


EXAMPLE OF A MULTI DIMENSIONAL DATABASE


What is a Multi Dimensional Database?

• Consider a sales operation:
  – We know that last year our total Widget Sales were 53,853
  – How were those sales broken down?
• Broken down by Quarter:

        Q1     Q2      Q3      Q4      Total
Sales   8288   16148   18501   10916   53853

But we need more detail: what were the sales of Left, Right and Ambidextrous Widgets?


Widget Sales in more detail

                       Q1     Q2      Q3      Q4      Total
Total Sales            8288   16148   18501   10916   53853
Left Handed Widgets    660    740     794     911
Right Handed Widgets   6128   6509    7707    8342
Ambidextrous Widgets   1500   1650    1499    1663

But we also need to know the sales by area:


Widget Sales in great detail

                       Q1      Q2       Q3       Q4       Total
Sales                  8,278   16,148   18,501   10,916   53,853

Left Handed Widgets    650     740      794      911
  England              300     330      355      461
  Scotland             200     235      260      261
  Wales                150     165      179      181
  NI                           10                8

Right Handed Widgets   6,128   6,509    7,707    8,342
  England              2,301   2,565    3,412    3,987
  Scotland             1,387   1,454    1,550    1,651
  Wales                540     600      765      690
  NI                   1,900   1,890    1,980    2,014

Ambidextrous Widgets   1,500   1,650    1,499    1,663
  England              799     808      789      901
  Scotland             400     501      367      460
  Wales                300     341      320      299
  NI                   1                23       3
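Each product row in this breakdown is the sum of its regional rows. As a minimal Python sketch of that roll-up, using only the Right Handed Widget figures from the slide (the helper name `roll_up` is illustrative and not part of the original deck):

```python
# Units of Right Handed Widgets per region, one value per quarter (Q1..Q4),
# taken from the slide's detailed breakdown.
right_handed = {
    "England":  [2301, 2565, 3412, 3987],
    "Scotland": [1387, 1454, 1550, 1651],
    "Wales":    [540, 600, 765, 690],
    "NI":       [1900, 1890, 1980, 2014],
}


def roll_up(by_region):
    """Sum the regional rows quarter by quarter (hypothetical helper)."""
    return [sum(quarter) for quarter in zip(*by_region.values())]


product_totals = roll_up(right_handed)
print(product_totals)  # [6128, 6509, 7707, 8342]
```

The result matches the Right Handed Widgets row, which is the kind of aggregation a multi dimensional database answers when a user collapses the area breakdown.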


The Cube

[Figure: the detailed Widget Sales table from the previous slide, repeated with its three sets of labels marked: 4 labels, 3 labels and 4 labels (4 quarters, 3 products, 4 regions)]

• This structure can hold a certain number of data elements.
• The number of elements is the number of labels in each set multiplied together,
• i.e. this structure can hold 4 x 3 x 4 = 48 data elements.
• Which makes it look a lot like a cube…
• That’s as far as the cube analogy can go, because a real data warehouse will have many different sets of independent labels. They are called Dimensions.
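The element count can be checked with a short Python sketch. The dictionary layout and variable names are assumptions for illustration; the labels themselves come from the widget example:

```python
from itertools import product
from math import prod

# One list of labels per dimension of the widget example.
dimensions = {
    "Quarter": ["Q1", "Q2", "Q3", "Q4"],
    "Product": ["Left Handed", "Right Handed", "Ambidextrous"],
    "Region": ["England", "Scotland", "Wales", "NI"],
}

# Number of cells = the label counts multiplied together.
cell_count = prod(len(labels) for labels in dimensions.values())

# Enumerating every (quarter, product, region) combination gives the same count.
cells = list(product(*dimensions.values()))

print(cell_count)  # 48
```

Adding a fourth dimension simply adds another factor to the product, which is where the three-sided cube picture stops being a literal description.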


Dimension Tables

• Dimension Tables contain the names of each member of the dimension:

Product_ID (Primary Key)   Product_Name           Category
101                        Left Handed Widget     Retail
102                        Right Handed Widget    Retail
103                        Ambidextrous Widget    Specialist


Fact Table

Region_ID   Product_ID   Quarter   Units   Price
1           101          1         300     45.20
1           101          2         330     45.20
1           101          3         355     45.20
1           101          4         461     44.00
1           102          1         200     39.00
1           102          2         235     39.00
1           102          3         260     38.50
1           102          4         261     38.50


Fact Table & Dimension Table Relationship

Fact Table:

Region_ID   Product_ID   Quarter   Units   Price
1           101          1         300     45.20
1           101          2         330     45.20
1           101          3         355     45.20
1           101          4         461     44.00
1           102          1         200     39.00
1           102          2         235     39.00
1           102          3         260     38.50
1           102          4         261     38.50

Dimension Table:

Product_ID   Product_Name
101          Left Handed Widget
102          Right Handed Widget
103          Ambidextrous Widget

One-to-Many Relationship on Product_ID
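The relationship can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the table names `dim_product` and `fact_sales` are assumptions, while the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,   -- one row per product
        product_name TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        region_id  INTEGER,
        product_id INTEGER REFERENCES dim_product (product_id),  -- many rows per product
        quarter    INTEGER,
        units      INTEGER,
        price      REAL
    );
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [
    (101, "Left Handed Widget"),
    (102, "Right Handed Widget"),
    (103, "Ambidextrous Widget"),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 101, 1, 300, 45.20), (1, 101, 2, 330, 45.20),
    (1, 101, 3, 355, 45.20), (1, 101, 4, 461, 44.00),
    (1, 102, 1, 200, 39.00), (1, 102, 2, 235, 39.00),
    (1, 102, 3, 260, 38.50), (1, 102, 4, 261, 38.50),
])

# Join each fact row to its single dimension row, then total units per product.
units_by_product = conn.execute("""
    SELECT p.product_name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()

print(units_by_product)
# [('Left Handed Widget', 1446), ('Right Handed Widget', 956)]
```

Product 103 is absent from the result because the fact table on the slide holds no rows for it; the inner join only returns products that have facts.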


Common terms used in data warehousing and what they mean - 1

• Normalised Data Structure
  – Structure designed for handling live transactions
• Dimensional Data Structure
  – AKA Denormalised Data Structure
  – Structure designed for querying
• Operational Data Store
  – Often a copy of a transactional database
  – Updated regularly from transactional systems
  – May be used for reporting


Common terms used in data warehousing and what they mean - 2

• Dimensional Modelling
  – Fact Table or Measure Table
    • Holds historical records of events that occurred in a transactional system
  – Conformed Facts
    • Facts from multiple fact tables are conformed when the technical definitions of the facts are equivalent. Conformed facts can have the same name in different tables and can be combined and compared mathematically
  – Dimension Table
    • Has a number of Attributes, e.g. Product Name, Category, Colour, etc
    • Used to slice and dice the data in the Fact Table
  – Attribute
    • Property of a Dimension
  – Conformed Dimension
    • Dimensions are conformed when they are exactly the same (including the keys) or one is a perfect subset of the other
    • The row headers produced in answer sets from two different conformed dimensions must be able to be matched perfectly


Conformed Dimensions - Example

Business Processes (rows) against Common Dimensions (columns):

Business Process       Date   Product   Store   Promotion   Warehouse   Vendor   Contract   Shipper
Retail Sales           x      x         x       x
Retail Inventory       x      x         x
Retail Deliveries      x      x         x
Warehouse Inventory    x      x                             x           x
Warehouse Deliveries   x      x                             x           x
Purchase Orders        x      x                             x           x        x          x


Facts and Dimensions - Example


Common terms used in data warehousing and what they mean - 3

• Slowly Changing Dimension (SCD)
  – A Dimension where the rows change slowly over time. An example would be a product Dimension where the Price attribute changes from year to year as a result of marketing/profitability issues.
• Type 1 SCD
  – Values are overwritten when they change
• Type 2 SCD
  – A new row is written when the value of an attribute changes
• Type 3 SCD
  – The previous value is put into an “Old Value” column
• Data Mart
  – A logical and physical subset of the data warehouse’s presentation area
  – Data Marts can be tied together using Drill-Across queries when their dimensions are conformed
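The three SCD types listed above can be contrasted with a small Python sketch. It applies a price change from 45.20 to 47.90 to a one-row product dimension. The column names `product_key`, `is_current` and `old_price`, and the three function names, are assumptions made for illustration; real warehouses often also track effective dates.

```python
from copy import deepcopy


def scd_type1(rows, product_id, new_price):
    """Type 1: overwrite the value in place, so the old price is lost."""
    for row in rows:
        if row["product_id"] == product_id:
            row["price"] = new_price
    return rows


def scd_type2(rows, product_id, new_price):
    """Type 2: keep the old row and write a new row with a new key."""
    current = next(
        r for r in rows if r["product_id"] == product_id and r["is_current"]
    )
    current["is_current"] = False
    new_key = max(r["product_key"] for r in rows) + 1
    rows.append(
        {**current, "product_key": new_key, "price": new_price, "is_current": True}
    )
    return rows


def scd_type3(rows, product_id, new_price):
    """Type 3: move the previous value into an 'old value' column."""
    for row in rows:
        if row["product_id"] == product_id:
            row["old_price"] = row["price"]
            row["price"] = new_price
    return rows


base = [{"product_key": 1, "product_id": 101, "price": 45.20,
         "old_price": None, "is_current": True}]

type1 = scd_type1(deepcopy(base), 101, 47.90)
type2 = scd_type2(deepcopy(base), 101, 47.90)
type3 = scd_type3(deepcopy(base), 101, 47.90)

print(len(type1), len(type2), len(type3))  # 1 2 1
```

Only Type 2 grows the table, which is also why it needs a key that does not depend solely on the product ID.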


Common terms used in data warehousing and what they mean - 4

• Primary Key
  – Unique Identifier for a record
• Foreign Key
  – A value in a record that refers to a Primary Key in another table
• Surrogate Key
  – AKA Meaningless key, integer key, nonnatural key, artificial key, synthetic key
  – A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
    • E.g. Two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys
• Grain
  – The meaning of a single row in a table. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. In this case you would not be able to look at sales by the hour, nor could you look at individual sales.
• Granularity
  – The level of detail captured in a data warehouse.
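The effect of choosing a grain of "Sales volume by Day by Product by Store" can be sketched in Python. The transactions below are invented purely for illustration, and the variable names are assumptions:

```python
from collections import defaultdict

# Hypothetical individual sales: (day, hour, product_id, store_id, units).
transactions = [
    ("2015-12-01", 9, 101, 1, 2),
    ("2015-12-01", 14, 101, 1, 3),
    ("2015-12-01", 10, 102, 1, 1),
    ("2015-12-02", 11, 101, 1, 4),
]

# Load the fact table at day/product/store grain: the hour and the individual
# sale are discarded, so they can no longer be queried afterwards.
sales_fact = defaultdict(int)
for day, _hour, product_id, store_id, units in transactions:
    sales_fact[(day, product_id, store_id)] += units

for key, units in sorted(sales_fact.items()):
    print(key, units)
```

Four sales collapse into three fact rows, each uniquely identified by its day, product and store.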


Surrogate Key

• Surrogate Key (AKA Meaningless key, integer key, nonnatural key, artificial key, synthetic key)
  – Data Warehouses integrate data from multiple sources and therefore they can’t rely upon an application key in one table being different from another application key in another table in another database.
  – A new primary key that is created in a table to ensure uniqueness regardless of the source of new records.
  – Surrogate keys can be integers even if the application key isn’t
    • This saves space
    • e.g. Two Customer tables in different sources may both have a primary key on CustomerID. This means that the same CustomerID could relate to two totally different customers, depending on which source they came from. So when the records are added to a Dimensional Data Warehouse, a new Primary Key is added which has no relationship to the sources’ primary keys
    • e.g. Data changes over time. As an example, if the price of Left Handed Widgets is increased from 45.20 to 47.90, we need to keep the old data and add new data. Therefore we need a key that doesn’t depend solely upon the product ID
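The two-sources case can be sketched in Python as follows. The source names `crm` and `billing`, the CustomerID value and the helper `surrogate_key` are all hypothetical; a real ETL process would typically keep this mapping in a database table rather than in memory.

```python
from itertools import count

next_key = count(start=1)   # simple integer sequence for new surrogate keys
key_map = {}                # (source system, source CustomerID) -> surrogate key


def surrogate_key(source, customer_id):
    """Return the warehouse key for a source record, creating one if needed."""
    natural = (source, customer_id)
    if natural not in key_map:
        key_map[natural] = next(next_key)
    return key_map[natural]


# The same CustomerID in two sources refers to two different customers.
crm_key = surrogate_key("crm", 1001)
billing_key = surrogate_key("billing", 1001)

# Looking up an already-loaded record returns the key assigned earlier.
crm_key_again = surrogate_key("crm", 1001)

print(crm_key, billing_key, crm_key_again)  # 1 2 1
```

The surrogate keys carry no meaning of their own; they only guarantee that each warehouse row is unique whichever source it came from.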


Star Schema


Snowflake Schema

• Star
• Snowflake