Transportation: Refreshing Warehouse Data Chapter 13

Uploaded by arron-elliott, posted 13-Dec-2015



Developing a Refresh Strategy for Capturing Changed Data

- Consider the load window
- Identify data volumes
- Identify the refresh cycle
- Know the technical infrastructure
- Plan a staging area
- Determine how to detect changes

[Figure: operational databases captured at successive refresh times T1, T2, T3]

User Requirements and Assistance

- Users define the refresh cycle.
- IT balances requirements against technical issues.
- Document all tasks and processes.
- Employ user skills.


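Load Window

The load window is the time available for the entire ETT process:
- Plan
- Test
- Prove
- Monitor

[Figure: 24-hour timeline from midnight to midnight, with a load window at each end of the day around the daytime user access period]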

Load Window (continued)

- Plan and build processes according to a strategy.
- Consider volumes of data.
- Identify the technical infrastructure.
- Ensure currency of data.
- Consider user access requirements first.
- High availability requirements may mean a small load window.

[Figure: the same 24-hour timeline, with a longer user access period and a smaller load window]
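Whether the ETT process fits the load window is simple arithmetic: the summed duration of the steps must not exceed the window. The Python sketch below is an illustration only; the step names, durations, and the 0-6 am window are hypothetical, and it assumes the window does not cross midnight.

```python
from datetime import timedelta

def fits_load_window(steps, window_start_hour, window_end_hour):
    """Return (fits, total, window) for a list of (step name, minutes) pairs."""
    total = timedelta(minutes=sum(minutes for _, minutes in steps))
    # Assumption: the window does not cross midnight (e.g. 0 to 6 am).
    window = timedelta(hours=window_end_hour - window_start_hour)
    return total <= window, total, window

# Hypothetical durations, for illustration only.
steps = [
    ("receive and verify files", 90),
    ("load into warehouse", 120),
    ("index data", 45),
    ("create summaries", 60),
    ("back up warehouse", 30),
]
fits, total, window = fits_load_window(steps, 0, 6)
print(fits, total, window)
```

If high availability shrinks the window, rerunning the check with a smaller end hour shows how much must be cut or run in parallel.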

Scheduling the Load Window

Requirements of the load cycle:
- File names
- File types
- Number of files
- Number of loads
- First-time load or refresh
- Date of file
- Data range
- Records in file (counts)
- Totals (amounts)

[Figure, steps 3-4, midnight to 3 am: the control file and data files (File1, File2) are received by FTP; a control process opens and reads the files to verify and analyze them]
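The control file lets the control process check each received file before it is loaded. A minimal Python sketch of that check follows. It assumes a hypothetical control-file layout (expected record count and amount total per file) and CSV data files with an `amount` column; the names File1 and File2 come from the figure, but the file contents are made up.

```python
import csv
import io

def verify_file(name, text, control):
    """Compare one file's record count and amount total with its control entry."""
    rows = list(csv.DictReader(io.StringIO(text)))
    count = len(rows)
    total = sum(float(r["amount"]) for r in rows)
    expected = control[name]
    problems = []
    if count != expected["records"]:
        problems.append(f"{name}: expected {expected['records']} records, got {count}")
    # Small tolerance because amounts are summed as floats.
    if abs(total - expected["total"]) > 0.005:
        problems.append(f"{name}: expected total {expected['total']}, got {total}")
    return problems

# Hypothetical control file and data files.
control = {"File1": {"records": 2, "total": 150.0},
           "File2": {"records": 1, "total": 75.5}}
files = {"File1": "id,amount\n1,100.0\n2,50.0\n",
         "File2": "id,amount\n3,70.5\n"}

for name, text in files.items():
    for problem in verify_file(name, text, control):
        print(problem)
```

Here File2 fails the totals check, so it would be held back rather than loaded.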

Scheduling the Load Window (continued)

[Figure, steps 5-9, 3 am to 9 am: load File1 and File2 into the warehouse in parallel; verify, analyze, and reapply data; index data; create summaries; update metadata]

Scheduling the Load Window (continued)

[Figure, steps 10-13, 6 am to 9 am, followed by user access: back up the warehouse; create views for specialized tools; publish; users access summary data]

Capturing Changed Data for Refresh

- Capture new fact data.
- Capture changed dimension data.
- Determine the method of capture for each.
- Methods:
  - Wholesale data replacement
  - Comparison of database instances
  - Time stamping
  - Database triggers
  - Database log
- Hybrid techniques

Wholesale Data Replacement

- Expensive
- Limited historical data, if any
- Data mart implementations
- Time period replacement
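Time period replacement can be sketched as deleting one period from the warehouse table and reloading it wholesale from the source extract inside one transaction. The Python sketch below uses SQLite as a stand-in for the warehouse; the table and column names (`sales`, `period`, `product`, `amount`) are hypothetical.

```python
import sqlite3

def replace_period(conn, period, source_rows):
    """Replace all warehouse rows for one period with the source extract."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales WHERE period = ?", (period,))
        conn.executemany(
            "INSERT INTO sales (period, product, amount) VALUES (?, ?, ?)",
            [(period, product, amount) for product, amount in source_rows])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (period TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("1997-05", "A", 10.0), ("1997-06", "A", 5.0)])
conn.commit()

# Hypothetical fresh extract for June 1997.
replace_period(conn, "1997-06", [("A", 7.0), ("B", 3.0)])
rows = conn.execute(
    "SELECT period, product, amount FROM sales ORDER BY period, product").fetchall()
print(rows)
```

Only the replaced period is rewritten; older periods stay untouched, which is why this variant is cheaper than replacing the whole table.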


Comparison of Database Instances

- Simple to perform, but expensive in time and processing
- Delta file:
  - Holds changes to operational data since the last refresh
  - Used by various techniques

[Figure: yesterday's operational database and today's operational database feed a database comparison; the resulting delta file holds the changed data]
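The comparison can be sketched as a key-by-key diff of yesterday's and today's snapshots. In this Python sketch the snapshots are assumed to be dictionaries keyed by primary key, and the sample rows are hypothetical. Every row of both snapshots is examined, which is why the method is expensive.

```python
def compare_snapshots(yesterday, today):
    """Build a delta list of (operation, key, row) from two snapshots."""
    delta = []
    for key, row in today.items():
        if key not in yesterday:
            delta.append(("insert", key, row))
        elif yesterday[key] != row:
            delta.append(("update", key, row))
    # Keys that disappeared since yesterday are deletions.
    for key in sorted(yesterday.keys() - today.keys()):
        delta.append(("delete", key, yesterday[key]))
    return delta

yesterday = {1: ("John Doe", "Single"), 2: ("Comer", "1 Main Street"), 3: ("Smith", "Single")}
today = {1: ("John Doe", "Married"), 2: ("Comer", "1 Main Street"), 4: ("Lee", "Single")}
delta = compare_snapshots(yesterday, today)
print(delta)
```

Unlike time stamping, this approach does detect deleted rows.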

Time and Date Stamping

- Fast scanning for records changed since the last extraction
- Relies on a Date Updated field
- No detection of deleted data

[Figure: operational data is scanned by date stamp; the delta file holds the changed data]
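Time stamping can be sketched as a filter on the Date Updated field against the time of the last extraction. The Python sketch assumes a hypothetical `date_updated` column and sample rows; the comment shows why deletions are missed.

```python
from datetime import datetime

def extract_changed(rows, last_extract):
    """Select rows whose date_updated is later than the previous extraction."""
    return [r for r in rows if r["date_updated"] > last_extract]

last_extract = datetime(2015, 12, 12, 0, 0)
rows = [
    {"id": 1, "status": "Married", "date_updated": datetime(2015, 12, 12, 14, 30)},
    {"id": 2, "status": "Single", "date_updated": datetime(2015, 11, 2, 9, 0)},
]
# A row deleted from the source since last_extract is simply absent from
# `rows`, so this method has nothing to report for it.
delta = extract_changed(rows, last_extract)
print([r["id"] for r in delta])  # [1]
```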

Database Triggers

- Changed data is intercepted at the server level
- Extra I/O required
- Maintenance overhead

[Figure: triggers on the operational server (DBMS) capture each change]
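Trigger-based capture can be sketched with SQLite, whose `CREATE TRIGGER` syntax differs in detail from other DBMSs. The table names are hypothetical. Each trigger writes a row to a change table on every insert, update, or delete, which is the extra I/O and maintenance burden the slide lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, status TEXT);
-- Change table filled by the triggers; the extraction job reads from here.
CREATE TABLE customer_changes (
    op TEXT, cust_id INTEGER, name TEXT, status TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP);
CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('I', NEW.cust_id, NEW.name, NEW.status);
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('U', NEW.cust_id, NEW.name, NEW.status);
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO customer_changes (op, cust_id, name, status)
    VALUES ('D', OLD.cust_id, OLD.name, OLD.status);
END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'John Doe', 'Single')")
conn.execute("UPDATE customer SET status = 'Married' WHERE cust_id = 1")
conn.execute("DELETE FROM customer WHERE cust_id = 1")
conn.commit()

changes = conn.execute(
    "SELECT op, cust_id, status FROM customer_changes ORDER BY rowid").fetchall()
print(changes)
```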

Using a Database Log

- Contains before and after images
- Requires a system checkpoint
- Common technique

[Figure: the operational server (DBMS) writes a log; log analysis and data extraction produce a delta file that holds the changed data]
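Log-based capture can be sketched as collapsing the before and after images recorded since the last checkpoint into one net change per row. Real DBMS logs are vendor-specific and are read with vendor tools; the log-record layout (`lsn`, `table`, `key`, `before`, `after`) and the LSN-based checkpoint below are hypothetical.

```python
def net_changes(log, checkpoint_lsn):
    """Collapse log records after the checkpoint into one change per key."""
    first_before = {}
    last_after = {}
    for rec in sorted(log, key=lambda r: r["lsn"]):
        if rec["lsn"] <= checkpoint_lsn:
            continue  # already extracted by an earlier refresh
        row_id = (rec["table"], rec["key"])
        first_before.setdefault(row_id, rec["before"])
        last_after[row_id] = rec["after"]
    delta = {}
    for row_id, after in last_after.items():
        before = first_before[row_id]
        if before is None and after is None:
            continue  # inserted and deleted within the same interval
        if before is None:
            delta[row_id] = ("insert", after)
        elif after is None:
            delta[row_id] = ("delete", before)
        elif before != after:
            delta[row_id] = ("update", after)
    return delta

log = [
    {"lsn": 10, "table": "customer", "key": 1, "before": None, "after": "Single"},
    {"lsn": 20, "table": "customer", "key": 1, "before": "Single", "after": "Married"},
    {"lsn": 21, "table": "customer", "key": 2, "before": None, "after": "Single"},
    {"lsn": 22, "table": "customer", "key": 2, "before": "Single", "after": None},
    {"lsn": 23, "table": "customer", "key": 3, "before": "Single", "after": None},
]
delta = net_changes(log, checkpoint_lsn=15)
print(delta)
```

Because the log carries before images, deletions are captured, which time stamping cannot do.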

Verdict

- Consider each method on its merits.
- Consider a hybrid approach if no single approach is suitable.
- Consider current technical, existing operational, and current application issues.

Applying the Changes to Data

You have a choice of techniques:
- Overwrite a record
- Add a record
- Add a field
- Maintain history
- Add version numbers

Overwriting a Record

- Easy to implement
- Loses all history
- Not recommended

Before:  Customer ID  John Doe  Single
After:   Customer ID  John Doe  Married
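Overwriting can be sketched as an in-place update of the dimension row. The dictionary-based dimension and the field names in this Python sketch are illustrative assumptions.

```python
def overwrite(dimension, cust_id, **changes):
    """Apply changes to the existing row; the previous values are lost."""
    dimension[cust_id].update(changes)

customers = {"C1": {"name": "John Doe", "status": "Single"}}
overwrite(customers, "C1", status="Married")
print(customers)  # {'C1': {'name': 'John Doe', 'status': 'Married'}}
```

After the call nothing records that John Doe was ever Single, so facts loaded earlier lose their context.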

Adding a New Record

- History is preserved; dimensions grow.
- Time constraints are not required.
- A generalized key is created.
- Metadata tracks usage of keys.

Before:
1   Customer ID  John Doe  Single

After:
1   Customer ID  John Doe  Single
1A  Customer ID  John Doe  Married
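Adding a new record can be sketched as copying the current row under a new generalized key and recording that key in metadata. The Python sketch follows the slide's suffix scheme (1, 1A, 1B, ...); the `key_map` dictionary is a hypothetical stand-in for the key-usage metadata, and the sketch supports at most 26 changes per customer.

```python
import string

def add_new_record(dimension, key_map, natural_key, **changes):
    """Insert a new row for a changed member under a new generalized key."""
    versions = key_map.setdefault(natural_key, [natural_key])
    new_row = dict(dimension[versions[-1]])  # start from the latest version
    new_row.update(changes)
    # Generalized key: natural key plus a letter suffix (1 -> 1A -> 1B ...).
    new_key = natural_key + string.ascii_uppercase[len(versions) - 1]
    dimension[new_key] = new_row
    versions.append(new_key)  # metadata: which keys belong to this customer
    return new_key

dimension = {"1": {"name": "John Doe", "status": "Single"}}
key_map = {}
new_key = add_new_record(dimension, key_map, "1", status="Married")
print(new_key, dimension, key_map)
```

The old row stays in place, so facts loaded before the change still join to "1" and new facts join to "1A".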

Adding a Current Field

- Maintains some history
- Loses intermediate values
- Is enhanced by adding an Effective Date field

Before:
Customer ID  John Doe  Single

After (original value, current value, effective date):
Customer ID  John Doe  Single  Married  01-JAN-96
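Adding a current field can be sketched as keeping the original value in place and writing each new value, with its effective date, to a `current_` field. The field names and the second change ("Divorced", 15-MAR-99) in this Python sketch are hypothetical; they show how the intermediate value is lost.

```python
def apply_current_field(row, field, new_value, effective_date):
    """Keep row[field] as the original value; overwrite the current value."""
    # Only the latest value survives in current_<field>, so any
    # intermediate value is overwritten.
    row["current_" + field] = new_value
    row["effective_date"] = effective_date

row = {"cust_id": "C1", "name": "John Doe", "status": "Single"}
apply_current_field(row, "status", "Married", "01-JAN-96")
apply_current_field(row, "status", "Divorced", "15-MAR-99")
print(row)
```

After the second change the row still shows the original "Single" and the current "Divorced", but "Married" is gone.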

Limitations of Methods for Applying Changes

- Complete history is impossible.
- Dimensions may grow large.
- Maintenance overhead.

Source records:
1234     Comer  1 Main Street  555-6789
1234     Comer  200 First Ave  222-3211

Adding a new record:
1234     Comer  1 Main Street  555-6789
1234-01  Comer  200 First Ave  222-3211

Adding a new record with an Effective Date:
1234     Comer  1 Main Street  555-6789  01-Apr-93
1234-01  Comer  200 First Ave  222-3212  01-Jun-97

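Maintaining History

- One-to-many relationship
- Always retain the current record
- Consistently able to refer to record history

[Figure: star schema with a Sales fact table and CUSTOMER, Time, and Product dimensions; HIST_CUST holds the history rows for each CUSTOMER row]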
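The CUSTOMER to HIST_CUST relationship can be sketched by copying the current row into the history table before each change, so CUSTOMER always holds the current record and HIST_CUST holds any number of earlier ones. The Python sketch uses SQLite; the column names and the `valid_until` date are assumptions, not taken from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, address TEXT);
-- One customer row to many history rows.
CREATE TABLE hist_cust (
    cust_id INTEGER REFERENCES customer (cust_id),
    name TEXT, address TEXT, valid_until TEXT);
""")

def change_address(conn, cust_id, address, change_date):
    with conn:
        # Copy the current row into history before overwriting it.
        conn.execute(
            "INSERT INTO hist_cust (cust_id, name, address, valid_until) "
            "SELECT cust_id, name, address, ? FROM customer WHERE cust_id = ?",
            (change_date, cust_id))
        conn.execute("UPDATE customer SET address = ? WHERE cust_id = ?",
                     (address, cust_id))

conn.execute("INSERT INTO customer VALUES (1234, 'Comer', '1 Main Street')")
conn.commit()
change_address(conn, 1234, "200 First Ave", "1997-06-01")

current = conn.execute("SELECT address FROM customer").fetchall()
history = conn.execute("SELECT address, valid_until FROM hist_cust").fetchall()
print(current, history)
```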

History Preserved

- History enables realistic analysis.
- History retains the context of data.
- History provides for realistic historical analysis:
  - Reflect business changes
  - Maintain context between fact and dimension data
  - Retain sufficient data to relate old to new

Version Numbering

- Avoid double counting
- Facts hold the version number

Customer dimension:
Customer.CustId  Version  Customer Name
1234             1        Comer
1234             2        Comer

Sales facts:
Customer.CustId  Version  Sales
1234             1        11,000
1234             2        12,000

[Figure: star schema with a Sales fact table and Customer, Time, and Product dimensions]
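Double counting appears when facts are joined to a versioned dimension on the customer ID alone: each fact row then matches every version of the customer. The Python sketch below uses SQLite with the slide's sample values; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER, version INTEGER, name TEXT,
                       PRIMARY KEY (cust_id, version));
CREATE TABLE sales (cust_id INTEGER, version INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1234, 1, 'Comer'), (1234, 2, 'Comer');
INSERT INTO sales VALUES (1234, 1, 11000), (1234, 2, 12000);
""")

# Joining on the ID alone: each fact matches both versions.
wrong = conn.execute("""
    SELECT SUM(s.amount) FROM sales s
    JOIN customer c ON c.cust_id = s.cust_id""").fetchone()[0]

# Joining on ID and version: each fact matches exactly one version.
right = conn.execute("""
    SELECT SUM(s.amount) FROM sales s
    JOIN customer c ON c.cust_id = s.cust_id AND c.version = s.version""").fetchone()[0]

print(wrong, right)  # 46000 23000
```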

Purging and Archiving Data

- As data ages, its value depreciates.
- Remove old data from the warehouse:
  - Archive for later use
  - Purge without copy
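The two options can be sketched as selecting rows older than a cutoff, optionally keeping a copy, and then deleting them. The Python sketch uses SQLite for the warehouse and a JSON string as a stand-in for the archive; in practice the copy would go to a file or archive store. Table and column names are hypothetical, and dates are ISO strings so that text comparison orders them correctly.

```python
import json
import sqlite3

def archive_and_purge(conn, cutoff, archive=True):
    """Remove rows older than cutoff; return a JSON copy if archiving."""
    rows = conn.execute(
        "SELECT sale_date, product, amount FROM sales WHERE sale_date < ?",
        (cutoff,)).fetchall()
    copy = json.dumps(rows) if archive else None  # archive=False: purge without copy
    with conn:
        conn.execute("DELETE FROM sales WHERE sale_date < ?", (cutoff,))
    return copy

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("1993-04-01", "A", 10.0), ("1997-06-01", "B", 5.0)])
conn.commit()

archived = archive_and_purge(conn, "1995-01-01")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(archived, remaining)
```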

Techniques for Purging Data

- TRUNCATE: generates no rollback data, so it cannot be rolled back
- DELETE: generates redo and rollback data
- ALTER TABLE: removes a partition
- PL/SQL: uses database triggers

Techniques for Archiving Data

- Export to a dump file from tables
- Import to tables from a dump file
- ALTER TABLE ... EXCHANGE PARTITION

[Figure: database -> EXP -> .dmp file -> IMP -> database]
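EXP and IMP are Oracle utilities and are not reproduced here. The same export-to-dump, import-from-dump round trip can be sketched with Python's `sqlite3` module, where `Connection.iterdump()` produces SQL text that `executescript()` can replay into another database. This is an analogy under that substitution, not the Oracle tools; the table name is hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE sales_1993 (sale_date TEXT, product TEXT, amount REAL);
INSERT INTO sales_1993 VALUES ('1993-04-01', 'A', 10.0);
""")

# "Export": serialize the database to a text dump of SQL statements.
# A real job would write this text to a file kept in the archive.
dump = "\n".join(source.iterdump())

# "Import": replay the dump into another (e.g. archive) database.
target = sqlite3.connect(":memory:")
target.executescript(dump)
restored = target.execute("SELECT * FROM sales_1993").fetchall()
print(restored)
```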

Verdict

- Defined by business requirements
- Must be managed

Final Tasks

- Update metadata:
  - ETT
  - User
- Publish data:
  - Availability
  - Changes
  - Subject area basis
- Use database roles to prevent and allow access

Publishing Data

- Control access using database roles.
- 24-hour operation may be requested.
- Compromise between load and access.
- Consider:
  - Staggering updates
  - Using temporary tables
  - Using separate tables
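The "separate tables" option can be sketched as loading into a new table while users keep querying the old one, then swapping the names in one short transaction. The Python sketch uses SQLite in autocommit mode with an explicit BEGIN/COMMIT around the swap; the table names are hypothetical, and the previous load is kept as `sales_old` for fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES ('A', 10.0)")

def publish_new_load(conn, rows):
    # Load into a separate table while users still query `sales`.
    conn.execute("DROP TABLE IF EXISTS sales_new")
    conn.execute("CREATE TABLE sales_new (product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_new VALUES (?, ?)", rows)
    # Swap names in one short transaction; the old load becomes sales_old.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS sales_old")
    conn.execute("ALTER TABLE sales RENAME TO sales_old")
    conn.execute("ALTER TABLE sales_new RENAME TO sales")
    conn.execute("COMMIT")

publish_new_load(conn, [("A", 12.0), ("B", 3.0)])
published = conn.execute("SELECT * FROM sales ORDER BY product").fetchall()
print(published)
```

Users are locked out only for the rename, not for the whole load, which is the compromise between load and access the slide describes.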

ETT Tool Selection Criteria

- Overlap with existing tools
- Availability of a meta model
- Supported data sources
- Ease of modification and maintenance
- Required fine tuning of code
- Ease of change control
- Power of transformation logic
- Level of modularization
- Power of error, exception, and resubmission features
- Intuitive documentation
- Performance of code
- Activity scheduling and sophistication
- Metadata generation
- Learning curve
- Flexibility
- Supported operating systems
- Cost

Transportation Tools

- Information OpenBridge
- Oracle: SQL*Loader, Gateways, PL/SQL, Precompilers
- Platinum Technology: InfoPump, Platinum Info Transport
- Replication Server Utilities
- Oracle Symmetric and Heterogeneous Replication

Gateways and Middleware

- Brio Technology: DataPrism
- Information Co.: OpenBridge
- Information Builders: EDA/SQL
- Oracle: Gateways
- Platinum Technology: InfoHub
- Prism: Prism Manager
- Software AG: Entire Transaction Propagator

Summary

This lesson discussed the following topics:
- Capturing changed data
- Applying the changes
- Purging and archiving data
- Publishing the data, controlling access, and automating processes
- Identifying tools for transporting data into the warehouse