data warehouse implementation 1. agenda review development approach review dimensional modeling ...

14
MIS 4346/5346 DATA WAREHOUSING Data Warehouse Implementation 1

Upload: george-richardson

Post on 22-Dec-2015

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

1

MIS 4346/5346 DATA WAREHOUSING

Data Warehouse Implementation

Page 2: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

2

Agenda

Review Development Approach Review Dimensional Modeling Implementing the Data Warehouse

with SQL Server Enterprise Edition Implementing Data Mart Physical

Structures Creating the data mart database Creating dimension tables Creating fact tables Using scripts

Page 3: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

3

DW Development Approach: Kimball Methodology

DW Project Lifecycle

Business requirements Business Requirements Documentation Bus Matrix

Design, build and deliver in increments DW Architecture DW Design ETL system Cube, Reports, query tools, …

Page 4: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

4

Review: Dimensional Modeling

Page 5: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

5

Dimensional Model: Revisited

Page 6: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

6

Data Warehouse Project Lifecycle

Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

Page 7: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

7

IT Architecture/Infrastructure Physical Design*: SQL Server Enterprise Edition

SQL ServerDatabase Engine

* Specifically Product Selection & Installation

Page 8: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

8

Data Warehouse Project Lifecycle

Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

Page 9: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

9

DW/DM Implementation: Building the Data Mart Database Typically one database per data mart Example:

USE MASTER

CREATE DATABASE ClassPerformanceDW;

GO

ALTER DATABASE ClassPerformanceDW SET RECOVERY SIMPLE

GO

Page 10: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

10

Creating Dimension Tables Naming is typically DimTableName Consider data compression Example:

CREATE TABLE DimStudent(student_sk int identity(1,1),student_id varchar(9),firstname varchar(30),lastname varchar(30),major varchar(7),classification varchar(25),gpa numeric(2, 1),clubname varchar(25),undergradschool varchar(25),gmat int,undergradORgrad varchar(10),

CONSTRAINT dimstudent_pk PRIMARY KEY (student_sk)); GO

CREATE INDEX student_id_idx on DimStudent (student_id);GO

ALTER TABLE DimStudent REBUILD WITH (DATA_COMPRESSION = PAGE); GO

GRANT SELECT ON DimStudent TO PUBLIC; GO

See http://blog.sqlauthority.com/2010/03/01/sql-server-data-and-page-compressions-data-storage-and-io-improvement/

OR http://sqlmag.com/database-performance-tuning/practical-data-compression-sql-server

Page 11: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

11

Creating Fact Tables Naming typically FactTableName Example:

CREATE TABLE fact_enrollment(student_sk int,class_sk int,date_sk int,professor_sk int,

location_sk int, termyear_sk int,

coursegrade numeric(2, 1), CONSTRAINT fact_enrollment_pk PRIMARY KEY (student_sk, class_sk, date_sk, professor_sk), CONSTRAINT fact_enrollment_student_fk FOREIGN KEY (student_sk) REFERENCES dimstudent(student_sk), CONSTRAINT fact_enrollment_class_fk FOREIGN KEY(class_sk) REFERENCES dimclass (class_sk), CONSTRAINT fact_enrollment_date_fk FOREIGN KEY(date_sk) REFERENCES dimtime (date_sk), CONSTRAINT fact_enrollment_professor_fk FOREIGN KEY(professor_sk) REFERENCES dimprofessor

(professor_sk), CONSTRAINT fact_enrollment_location_fk FOREIGN KEY(location_sk) REFERENCES dimlocation

(location_sk), CONSTRAINT fact_enrollment_termyear_fk FOREIGN KEY(termyear_sk) REFERENCES dimtermyear

(termyear_sk), );

GO

GRANT SELECT ON factenrollment TO PUBLIC;

GO

Page 12: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

12

Using Scripts

Contains all statements to create data mart tables

Advantages: Can easily create test environments Can easily create production tables Fewer files to manage Code reuse

Page 13: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

13

Example Script “Design”

CREATE Script Contains CREATEs for all tables

TRANSFORM/LOAD Script (next topic) Calls individual transform/load scripts

One for each table Cleanup

Clear and shrink the log file

Example:http://business.baylor.edu/gina_green/teaching/sqlserver/scripts/generate_class_performance_dw_tables/generate_class_performance_dw_tables.zip

Page 14: Data Warehouse Implementation 1. Agenda  Review Development Approach  Review Dimensional Modeling  Implementing the Data Warehouse with SQL Server

14

Summary

Physical Design: Infrastructure and DW

Creating and Naming: Database Dimension tables Fact tables

Considerations when creating above objects

Using scripts