designing the data warehouse / data mart

45
Designing the data warehouse / data mart Methodologies and Techniques

Upload: flavio

Post on 13-Feb-2016

83 views

Category:

Documents


2 download

DESCRIPTION

Designing the data warehouse / data mart. Methodologies and Techniques. Basic principles. Warehouse Database. Life cycle of the DW. First time load. Operational Databases. Refresh. Refresh. Purge or Archive. Refresh. Data transfers into a database. First time system implementation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Designing the data warehouse / data mart

Designing the data warehouse/ data mart

Methodologies and Techniques

Page 2: Designing the data warehouse / data mart

Mainframe and non-mainframe data sources

Extraction, Transformation and Loading Tool - for data quality and integrityUsing flat-files or a staging area

Relational Database on a dedicated Server:Normalised, Snowflake or Star data schema

Static/Flexible Reports:Batch and on-line reporting environment

Data Mining:Using an inductive logic engine to identify correlations and hidden relationships

Analytical Tool:Interactive analysis -OLAP tool

User Interface Deployment:Web deployment, Power-user client interfacesNeed to provide rich presentation visualisation, user interaction and supplementary documentation

Scrutinising MultidimensionalSummary Cubes

Surveying Transaction-level

Data

Showing Reportsand

Analytical Results

Data Warehouse Store

Source Systems

Discovering Surprise

Relationships

Data Staging for Cleansing

Basic principles

Page 3: Designing the data warehouse / data mart

Life cycle of the DW

Operational DatabasesOperational Databases Warehouse DatabaseWarehouse Database

First time loadFirst time load

RefreshRefresh

RefreshRefresh

RefreshRefresh

Purge or ArchivePurge or Archive

Page 4: Designing the data warehouse / data mart

Data transfers into a database

• First time system implementation– From a manual system

• Data warehousing projects• Database version upgrade• ERP projects• Migration

– From old to new system

Page 5: Designing the data warehouse / data mart

Data transfers between systems

• Dynamic data (eg. sales orders)– Interface required?

• Static data (eg. customers)– Conversion required?

Page 6: Designing the data warehouse / data mart

What can go wrong• Data not available

– feature activated from implementation onwards– Massive data entry– Eg: different account structure

• Data incomplete• Data inconsistent (eg: engineering vs accounts)• Wrong level of granularity• Data not clean• New system requires changes – new product

codes

Page 7: Designing the data warehouse / data mart

Data cleaning must address

• Different department record same info under different codes

• Multiple records of same company (under different names)

• Fields missing in input tables (eg: c/o)• Different depts. Record different

addresses for same customer• Use of different units for time periods

Page 8: Designing the data warehouse / data mart

Labour intensive tasks• Data entry• Data checks• Working on solving conflicts• Allocating new codes

• Solution = introduce as much automation as possible– SQL / SQL loader (Oracle)– Custom conversion programmes to extract, modify and upload

data– Filtering– Parsing (eg: excel)– Staging areas for conversion in progress

Page 9: Designing the data warehouse / data mart

Data utilities

• ORACLE is king of data handling• Export: to transfer data between DBs

– Extract both table structure and data content into dump file

• Import: corresponding facility• SQL*loader automatic import from a

variety of file formats into DB files– Needs a control file

Page 10: Designing the data warehouse / data mart

Control files: using SQLloader• Data tranfers in and out of DB can be automated

using the loader– Create a data file with the data(!)– Create a control file to guide the operation

• Load creates two files– Log file– “bad transactions” file

• Also a discard file if control file has selection criteria in it

Page 11: Designing the data warehouse / data mart

Example 1 – the supplier file

Sup code Sup name Sup address City Phone4 digits

OLD

New supplier code to include city where firm is basedAssignation of category based on amounts purchased

Page 12: Designing the data warehouse / data mart

Example 1 – the supplier file

Sup code Sup name Sup address City Phone4 digits

Sup code Sup name Sup address… Phone Cat3 letters + 1,2,3 depending4 digits on total purchases

last year

OLD

NEW

New supplier code to include city where firm is basedAssignation of category based on amounts purchased

Page 13: Designing the data warehouse / data mart

Example 2 – New Cost Accounting Structure

Maintenance department expenditure:1 account => separate accounts for different production activities

Intervention code Desc. Date LabourParts Total

OLD

Page 14: Designing the data warehouse / data mart

Example 2 – New Cost Accounting Structure

Maintenance department expenditure:1 account => separate accounts for different production activities

Intervention code Desc. Date LabourParts Total

Intervention code Desc. Date labour Parts Total Account

OLD

NEW

Page 15: Designing the data warehouse / data mart

Example 3: merging files

• Complete customer file based on Accounts and Sales and Shipping

OLD (finance)

CustID name address city account number credit limit balance

OLD (sales)

OLD (Shipping)

CustID* name address city discount rates sales_to_date rep_name

CustID** name address city Preferred haulier

Page 16: Designing the data warehouse / data mart

Example 4: change of business practices

• Payment by bank draft for international customers• Automatic payment into account for national customers

• Payment direct into account for all customers

Page 17: Designing the data warehouse / data mart

Data Staging Area

• The construction site for the warehouse• Required by most scenarios• Connected to wide variety of sources• Clean / aggregate / compute / validate

data

ExtractExtractTransformTransform

Operationalsystem TransportTransport

(Load)(Load)

WarehouseData

stagingarea

Page 18: Designing the data warehouse / data mart

Remote Staging ModelData staging area within the warehouse environment

Extract,Extract,transform,transform,transporttransport

TransformTransform

Operationalsystem TransportTransport

(Load)(Load)

Datastaging

area Warehouse

Warehouse environmentWarehouse environmentOper. envt.Oper. envt.

Data staging area in its own environment, avoiding negative impact on the warehouse environment

Extract,Extract,transform,transform,transporttransport

TransformTransform

Operationalsystem TransportTransport

(Load)(Load)

Datastaging

areaWarehouse

Staging envt.Staging envt.Oper. envt.Oper. envt. Warehouse envt.Warehouse envt.

Page 19: Designing the data warehouse / data mart

Onsite Staging Model

ExtractExtractTransformTransform

Operationalsystem TransportTransport

(Load)(Load)

Datastaging

area Warehouse

Operational environmentOperational environment WH envt.WH envt.

Data staging area within the operational environment,

possibly affecting the operational system

Page 20: Designing the data warehouse / data mart

Data Mart • A subset of a data warehouse that

supports the requirements of a particular department or business function.

• Characteristics include:– Do not normally contain detailed operational data

unlike data warehouses.– May contain certain levels of aggregation

Page 21: Designing the data warehouse / data mart

MarketingSales

FinanceHuman Resources

Dependent Data Mart

Data Data WarehouseWarehouse

Data MartsData Marts

External DataExternal Data

Flat FilesFlat Files

Operational Systems Marketing

Sales

Finance

Page 22: Designing the data warehouse / data mart

Independent Data Mart

Sales or MarketingSales or Marketing

External DataExternal Data

Flat FilesFlat FilesOperational Systems

Page 23: Designing the data warehouse / data mart

Reasons for Creating a Data Mart

• To give users more flexible access to the data they need to analyse most often.

• To provide data in a form that matches the specific needs of a group of users

• To improve end-user response time.• Potential users of a data mart are clearly

defined and can be targeted for support

Page 24: Designing the data warehouse / data mart

Reasons for Creating a Data Mart

• To provide appropriately structured data as dictated by the requirements of the end-user access tools.

• Building a data mart is simpler (and much quicker) compared with establishing a corporate data warehouse.

• The cost of implementing data marts is far less than that required to establish a data warehouse.

Page 25: Designing the data warehouse / data mart

Exploiting the DW data

• DW is a platform for creating a wide array of reports

• It solves data feed problems, but does not lead to specific decision support

• Need a model for organising data into meaningful reports

• Need specific interfaces for users

Page 26: Designing the data warehouse / data mart

ExtractionCleaningTransformationLoading

Relational Databaseon a dedicated Server

De normalised, data

Static Reporting

Scrutinising

MultidimensionalData CubesOLAP tools

Data Warehouse

Source Systems

Discovering

Data Mining…….

Data StagingArea

Exploiting the DW data

Page 27: Designing the data warehouse / data mart

Multidimensional Models

The data is found at the intersection of dimensions.

ProductProduct

P/L_LineP/L_Line

TimeTime

FINANCE

MarketMarket

ProductProduct

TimeTime

SALES

CustomerCustomer

Page 28: Designing the data warehouse / data mart

Representing multidimensional data

Page 29: Designing the data warehouse / data mart

MOLAP Server• The application layer

stores data in a multidimensional structure

• The presentation layer provides the multidimensional view MOLAP

Engine

DSS clientDSS client

Application Application layerlayer

WarehouseWarehouse

• Efficient storage and processing• Complexity hidden from the

user (but NOT from developer)• Analysis using preaggregated

summaries and precalculated measures

Page 30: Designing the data warehouse / data mart

ROLAP Server

• The warehouse stores atomic data.

• The application layer generates SQL for the three- dimensional view.

• The presentation layer provides the multidimensional view.

ROLAPengine

DSS clientDSS client

Application Application layerlayer

WarehouseWarehouseserverserver

Multiple Multiple SQLSQL

Page 31: Designing the data warehouse / data mart

MOLAP

ServerServer useruserWarehouseWarehouse

QueryQuery

DataData

MDDBMDDB

PeriodicPeriodicloadload

Page 32: Designing the data warehouse / data mart

ROLAP

ServerServer useruserWarehouseWarehouse

DataDatacachecache

LiveLivefetchfetch

CacheCache

QueryQuery

DataData

Also Hybrid (HOLAP)

Page 33: Designing the data warehouse / data mart

Choosing a Reporting Architecture

• Business needs• Potential for growth• interface• enterprise architecture• Network architecture• Speed of access• Openness

MOLAP

ROLAP

SimpleSimple ComplexComplex

QueryQueryPerformancePerformance

GoodGood

OKOK

AnalysisAnalysis

Page 34: Designing the data warehouse / data mart

Modeling• Warehouses differ from operational Warehouses differ from operational

structures: structures: – Analytical requirementsAnalytical requirements– Subject orientationSubject orientation

• Data must map to subject oriented Data must map to subject oriented information:information:– Identify business subjectsIdentify business subjects– Define relationships between subjectsDefine relationships between subjects– Name the attributes of each subjectName the attributes of each subject

• Modeling is iterativeModeling is iterative• Modeling tools are availableModeling tools are available

Page 35: Designing the data warehouse / data mart

1. Defining the business model 2. Creating the dimensional

model 3. Modeling summaries4. Creating the physical model

Physical model

11

2, 32, 3

44

Select a business process

Modeling the Data Warehouse

Page 36: Designing the data warehouse / data mart

Identifying Business Rules

Product

Type Monitor Status PC 15 inch NewServer 17 inch Rebuilt

19 inch CustomNone

Location

Geographic proximity

0 - 1 miles1 - 5 miles > 5 miles

StoreStore > District > Region

TimeMonth > Quarter > Year

Page 37: Designing the data warehouse / data mart

Creating the Dimensional ModelIdentify fact tables– Translate business measures into fact

tables– Analyze source system information for

additional measures– Identify base and derived measures– Document additivity of measures

Identify dimension tablesLink fact tables to the dimension tablesCreate views for users

Page 38: Designing the data warehouse / data mart

Dimension TablesDimension tables have the following

characteristics: • Contain textual information that

represents the attributes of the business• Contain relatively static data• Are joined to a fact table through a

foreign key reference Product Channel

Facts(units,price)

Customer Time

Page 39: Designing the data warehouse / data mart

Fact TablesFact tables have the following characteristics:• Contain numeric measures (metrics) of the

business• May contain summarized (aggregated) data• May contain date-stamped data• Are typically additive• Have key value that is typically a concatenated

key composed of the primary keys of the dimensions

• Joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Page 40: Designing the data warehouse / data mart

Dimensional Model (Star Schema)

Product Channel

Facts(units,price)

Customer Time

Dimension tablesDimension tables

Fact tableFact table

Page 41: Designing the data warehouse / data mart

Star Schema Model

• Central fact table• Radiating dimensions• Denormalized model

Store TableStore_idDistrict_id...

Item TableItem_idItem_desc...

Time TableDay_idMonth_idPeriod_idYear_id

Product TableProduct_idProduct_desc…

Sales Fact TableProduct_idStore_idItem_idDay_idSales_dollarsSales_units...

Page 42: Designing the data warehouse / data mart

Star Schema Model

• Easy for users to understand• Fast response to simple queries• Simple metadata• Supported by many front end tools• Less robust to change• Does not support history

Page 43: Designing the data warehouse / data mart

Using Summary Data

• Provides fast access to precomputed data

• Reduces use of I/O, CPU, and memory• Is distilled from source systems and

precalculated summaries• Usually exists in summary fact tables

Phase 3: Modeling summaries

Page 44: Designing the data warehouse / data mart

Designing Summary Tables

Units Sales(€) Store

Product ATotal

Product BTotal

Product CTotal

• Average• Maximum

• Total• Percentage

Page 45: Designing the data warehouse / data mart

Summary Tables Example

SALES FACTSSales Region Month10,000 North Jan 9912,000 South Feb 9911,000 North Jan 9915,000 West Mar 9918,000 South Feb 9920,000 North Jan 9910,000 East Jan 992,000 West Mar 99

SALES BY MONTH/REGIONMonth Region Tot_Sales$Jan 99 North 41,000Jan 99 East 10,000Feb 99 South 40,000Mar 99 West 17,000

SALES BY MONTHMonth Tot_SalesJan 99 51,000Feb 99 40,000Mar 99 17,000