
TDWI Data Integration Basics (download.101com.com/pub/tdwi/Files/TDWI_Data_Integration_Basics_Previewv2.pdf)

Previews of TDWI course books are provided as an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews can not be printed. TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended. This preview shows selected pages that are representative of the entire course book. The pages shown are not consecutive. The page numbers as they appear in the actual course material are shown at the bottom of each page. All table-of-contents pages are included to illustrate all of the topics covered by a course.


TDWI Data Integration Basics for Business and IT Professionals


TDWI Data Integration Basics

ii © The Data Warehousing Institute

The Data Warehousing Institute takes pride in the educational soundness and technical accuracy of all of our courses. Please give us your comments – we’d like to hear from you. Address your feedback to:

email: [email protected]

Publication Date: August 2005

© Copyright 2005 by The Data Warehousing Institute. All rights reserved. No part of this document may be reproduced in any form, or by any means, without written permission from The Data Warehousing Institute.


© The Data Warehousing Institute iii

TABLE OF CONTENTS

Module 1 Data Integration Concepts ........................ 1-1
Module 2 Data Sources ..................................... 2-1
Module 3 Data Integration Systems ......................... 3-1
Module 4 Data Quality ..................................... 4-1
Module 5 Data Integration Roles ........................... 5-1
Appendix A Basis of Course Examples ....................... A-1
Appendix B Bibliography and References .................... B-1


1-1

Module 1 Data Integration Concepts

Topic                                   Page
Data Integration Defined                1-2
Data Integration Context                1-6
Data Integration Systems Overview       1-14


1-2 © The Data Warehousing Institute

Data Integration Defined: What IS Data Integration?

Data Integration: The process of combining data from two or more disparate but related data sources in such a way that data from each source increases the overall information value of the resulting body of data. (Dave Wells, TDWI)

Integrated data is combined based on business rules.

Ideally, every data element in an integrated database:
• is connected with other data elements
• complements the surrounding data
• avoids conflict that may result in confusion, uncertainty, or multiple values for the same business fact
• can be traced to the source from which it was obtained


© The Data Warehousing Institute 1-3

Data Integration Defined: What IS Data Integration?

A PROCESS OF COMBINING DATA

Dave Wells at TDWI defines data integration as “the process of combining data from two or more disparate but related data sources in such a way that data from each source increases the overall information value of the resulting body of data.” Consider these key points from the definition:
• Data integration is a process. As with all processes, data integration has inputs, events, and activities that lead to production of a product.
• Data integration combines data from multiple related data sources.
• The goal of data integration is increased information value from a body of data.

INTEGRATION ACTIVITIES

The activities of the data integration process are the steps necessary to acquire data from sources, transform the data to achieve the desirable properties of integrated data, and store the integrated data so it is available for use. Data transformation steps – those that change the data – are the most complex of all integration activities. The goals when combining data include removing conflict, establishing data relationships, improving consistency of representation, and ensuring data quality. Business rules provide the foundation for data transformation logic. Transformation based on business rules serves to align data structure and content with real things in the business, an essential part of increasing the information value of the data.
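The three activities described above (acquire, transform, store) can be sketched as a small Python pipeline. This is a minimal illustration, not a TDWI method: the two sources, their field names, and the SSN and gender rules are hypothetical business rules chosen only to show how transformation logic can be written down as explicit, named rules.

```python
# A minimal acquire -> transform -> store sketch. Source layouts, field
# names, and business rules are hypothetical illustrations.

def acquire():
    """Acquire: raw rows as they might arrive from two stovepipe systems."""
    payroll = [{"ssn": "123-45-6789", "gender": "F", "annual_salary": "52000"}]
    personnel = [{"ssn": "123456789", "sex": "female", "job_title": "Analyst"}]
    return {"payroll": payroll, "personnel": personnel}


def standardize_ssn(value):
    """Hypothetical rule: store SSNs as 9 digits with no punctuation."""
    return "".join(ch for ch in value if ch.isdigit())


GENDER_CODES = {"F": "F", "FEMALE": "F", "M": "M", "MALE": "M"}


def standardize_gender(value):
    """Hypothetical rule: map every source representation onto F/M (U = unknown)."""
    return GENDER_CODES.get(value.strip().upper(), "U")


def transform(raw):
    """Transform: apply the business rules so rows from both sources align."""
    rows = []
    for row in raw["payroll"]:
        rows.append({"ssn": standardize_ssn(row["ssn"]),
                     "gender": standardize_gender(row["gender"]),
                     "annual_salary": int(row["annual_salary"])})
    for row in raw["personnel"]:
        rows.append({"ssn": standardize_ssn(row["ssn"]),
                     "gender": standardize_gender(row["sex"]),
                     "job_title": row["job_title"]})
    return rows


def store(rows):
    """Store: merge rows describing the same employee into one record."""
    integrated = {}
    for row in rows:
        integrated.setdefault(row["ssn"], {}).update(row)
    return integrated


if __name__ == "__main__":
    print(store(transform(acquire())))
```

In this sketch store() lets a later source silently overwrite an earlier one when both supply the same element; a real integration process would apply an explicit conflict-resolution rule at that point instead.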

INTEGRATION RESULTS

The product of a data integration process is a database that contains integrated data. Desirable characteristics of integrated data include:
• Every data element is connected with and related to other data elements.
• Each data element complements the surrounding data by collecting a related business fact, adding clarity, and providing added context.
• Each data element contains a unique and non-redundant business fact or, if redundant, avoids the conflict and uncertainty of multiple values for the same business fact.
• The lineage of each data element is known and recorded; every data element is traceable to the source from which it was obtained.
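The last two characteristics (redundancy without conflict, and recorded lineage) can be made concrete with a small Python data structure. This is a hypothetical sketch: the Lineage and IntegratedElement classes, the keep-the-first-value policy, and the source field names are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lineage:
    """Where a value came from (hypothetical minimal lineage record)."""
    source_system: str
    source_element: str


@dataclass
class IntegratedElement:
    """One data element in the integrated store, with its recorded lineage."""
    name: str
    value: object = None  # None means no source has supplied this fact yet
    lineage: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)

    def add(self, value, origin: Lineage) -> None:
        """Offer a value from one source for this business fact."""
        if self.value is None:
            # First source to supply the fact: accept it and record its origin.
            self.value = value
            self.lineage.append(origin)
        elif value == self.value:
            # Redundant but consistent: keep one value, remember every source.
            self.lineage.append(origin)
        else:
            # Redundant and conflicting: keep the current value and flag the
            # disagreement for resolution instead of silently overwriting.
            self.conflicts.append((value, origin))

    @property
    def is_conflict_free(self) -> bool:
        return not self.conflicts


dob = IntegratedElement("date_of_birth")
dob.add("1970-04-12", Lineage("Payroll System", "birth_dt"))
dob.add("1970-04-12", Lineage("Personnel System", "date_of_birth"))
print(dob.value, len(dob.lineage), dob.is_conflict_free)  # 1970-04-12 2 True
```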


1-4 © The Data Warehousing Institute

Data Integration Defined: What ISN’T Data Integration?

[Figure: four examples of what isn’t data integration. (1) Data organized around business processes or business organizations: a Payroll System (Employee Data, Payment Data, Time Reports, etc.) and a Personnel System (Employee Data, Jobs Data, Recruiting Data, etc.). (2) Different answers depending where you look: an HR/Payroll System (Employee Data, Jobs Data, Recruiting Data, Payment Data, Time Reports), with a Payroll Audit Report and a Time & Cost Data Mart that are unable to balance. (3) Unable to navigate between two distinct systems or databases: a Payroll System and a Finance System (General Ledger, Budget Ledger, Cash Ledger), where a payment account number can’t be found in the budget system. (4) Dumping all the data into one database … and calling it a data warehouse!: Payroll, Finance, and Personnel System data placed together in a single “Data Warehouse”.]


© The Data Warehousing Institute 1-5

Data Integration Defined: What ISN’T Data Integration?

STOVEPIPE DATA

Data organized around business processes, business organizations, or transaction systems is not integrated. A payroll system and a personnel system, for example, each collect, store, and use employee data. When each system independently manages its own employee data, redundancy conflicts are certain to occur. When each system also uses its own means of identifying employees, the situation is aggravated by the inability to navigate between systems and to reconcile conflicts and discrepancies. These circumstances are common throughout the legacy applications of most organizations. More recently, many organizations have developed stovepipe data marts, where each data mart is designed to meet the needs of a specific process or work group. When independent data definitions and transformation logic are defined for each data mart, no integration occurs. Non-integrated data marts may use more up-to-date technology than legacy systems, but they do nothing to resolve redundancy and conflict in the data.
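The navigation problem described above can be illustrated in Python. In this hypothetical sketch a payroll system and a personnel system each key employees differently; the cross-reference (crosswalk) table is an assumption introduced here, not something either system provides, and it is what makes it possible to navigate between the two and surface discrepancies.

```python
# Hypothetical stovepipe data: each system identifies employees its own way.
payroll = {
    "P-001": {"name": "Pat Lee", "address": "12 Elm St"},
    "P-002": {"name": "Sam Ortiz", "address": "9 Oak Ave"},
}
personnel = {
    1001: {"name": "Pat Lee", "address": "44 Pine Rd"},
    1002: {"name": "Sam Ortiz", "address": "9 Oak Ave"},
}

# Hypothetical crosswalk built by matching employees across the two systems.
# Without it there is no reliable way to get from one system to the other.
crosswalk = {"P-001": 1001, "P-002": 1002}


def find_discrepancies(left, right, xref, fields):
    """Yield (left_key, right_key, field, left_value, right_value) conflicts."""
    for left_key, right_key in xref.items():
        a, b = left.get(left_key), right.get(right_key)
        if a is None or b is None:
            # The crosswalk points at a record one system does not have.
            yield (left_key, right_key, None, a, b)
            continue
        for f in fields:
            if a.get(f) != b.get(f):
                yield (left_key, right_key, f, a.get(f), b.get(f))


for issue in find_discrepancies(payroll, personnel, crosswalk, ["name", "address"]):
    print(issue)  # ('P-001', 1001, 'address', '12 Elm St', '44 Pine Rd')
```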

CO-LOCATED DATA

Putting all of the data into a single database does not by itself achieve integration. The collective databases that are sometimes built – whether we call them a data warehouse, an operational data store, a reporting database, or something else – are not integrated simply because they are a single database. The same confusion and conflict occur when these databases contain islands of disconnected data, unresolved redundancy, and conflicting values for a single business fact.
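To see why co-location alone is not integration, consider a sketch using Python's built-in sqlite3 module. The two tables, their columns, and the sample employee are hypothetical: they are copied into one database unchanged, yet the same business question still has two answers, and the tables share no key to join on.

```python
import sqlite3

# One database holding two tables copied as-is from hypothetical stovepipe systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payroll_employee (
        emp_no TEXT PRIMARY KEY, name TEXT, dept TEXT);
    CREATE TABLE personnel_employee (
        employee_id INTEGER PRIMARY KEY, full_name TEXT, department TEXT);
    INSERT INTO payroll_employee VALUES ('P-001', 'Pat Lee', 'Finance');
    INSERT INTO personnel_employee VALUES (1001, 'Pat Lee', 'Accounting');
""")

# The same business question asked of each island: which department is Pat Lee in?
# Matching on name is the only option because the tables share no key.
payroll_answer = conn.execute(
    "SELECT dept FROM payroll_employee WHERE name = ?", ("Pat Lee",)
).fetchone()[0]
personnel_answer = conn.execute(
    "SELECT department FROM personnel_employee WHERE full_name = ?", ("Pat Lee",)
).fetchone()[0]

print(payroll_answer, personnel_answer)  # Finance Accounting
conn.close()
```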


1-6 © The Data Warehousing Institute

Data Integration Context: Business Context – The Need for Data Integration

[Figure: six drivers of data integration. Business Intelligence. Non-Integrated Legacy Systems (Payroll System: Employee Data, Payment Data, Time Reports, etc.; Personnel System: Employee Data, Jobs Data, Recruiting Data, etc.). ERP Islands (PeopleSoft HR, Oracle Financials, Siebel CRM). Data Warehousing (Data Sources; Data Acquisition, Cleansing, & Integration; Data Warehouse; Data Stores). Mergers & Acquisitions (combining strategies, products, processes, people, and more …). Cross-Organizational Metrics (Activity, Process, Organization, Enterprise; Subject Areas: Customer, Product, Supplier, Workforce, etc.; CRM, BPM, SCM, BAM, etc.; Measures & Metrics).]


© The Data Warehousing Institute 1-7

Data Integration Context: Business Context – The Need for Data Integration

DRIVERS OF INTEGRATION

Many different data and technology environments create a need for data integration. Although they differ in goals and purpose, the issues, the need, and the integration process are similar for each of the following:
• Non-integrated legacy systems, where multiple systems independently collect and manage redundant and overlapping data.
• ERP islands, with different and non-integrated ERP systems for various business functions.
• Data warehousing, which brings together data from many disparate sources.
• Business intelligence, which depends on a foundation of integrated data to deliver meaningful information.
• Mergers and acquisitions, where the dissimilar data resources of two enterprises must be combined.
• Cross-organizational metrics, which provide consistent business measures that involve multiple business processes, data sources, and computer systems.

INTEGRATION PROJECTS

The drivers itemized above typically result in two distinct kinds of data integration projects:
• Recurring integration projects are needed when data must be integrated on a continuous basis. These projects are typical for drivers such as cross-organizational metrics, business intelligence, and data warehousing. Note that the term “recurring integration” does not suggest that the project persists indefinitely, but that the integration process can be executed repeatedly on an ongoing basis.
• One-time integration projects are needed when the data integration process needs to be executed only once. These kinds of projects are typical of data conversion to initially load ERP systems, historical data collection for initial data warehouse loads, and combining data following mergers or acquisitions.

Although the nature of the projects differs, the integration issues and activities are similar for both types of projects.


2-1

Module 2 Data Sources

Topic                                   Page
Selecting Data Sources                  2-2
Understanding Data Sources              2-8


2-4 © The Data Warehousing Institute

Selecting Data Sources: Evaluating Sources – Data with Integration Value

[Figure: Data Source Evaluation Matrix. Candidate sources: Transaction Systems; Secondary and Shadow Systems; Decision Support Systems; Backups, Log Files & Archives; External Data; Ad-hoc Data Collections. Criteria: Usability (Availability, Understandability, Stability, Accuracy, Timeliness, Completeness, Granularity) and Manageability (Origin of Data, Ownership, System Management, Usage).]


© The Data Warehousing Institute 2-5

Selecting Data Sources: Evaluating Sources – Data with Integration Value

USABLE DATA SOURCES

Each prospective data source needs to be evaluated in terms of usability to help determine its real value as a source for data integration. A subjective assessment of usability criteria using a five-point scale (1 = poor, 5 = excellent) is sufficient for the purpose. Usability criteria for evaluation include:

Criteria and assessment questions:
• Availability: How available and accessible is the data? Are there technical obstacles to access, or ownership and access-authority issues?
• Understandability: How easily understood is the data? Is it well documented? Does someone in the organization have depth of knowledge? Who works regularly with this data?
• Stability: How frequently do data structures change? What is the history of change for the data? What is the expected life span of the potential data source?
• Accuracy: How reliable is the data? Do the business people who work with the data trust it?
• Timeliness: When and how often is the data updated? How current is the data? How much history is available? How available is it for extraction?
• Completeness: Does the scope of data correspond to the scope of the data warehouse? Is any data missing?
• Granularity: Is the source the lowest available grain (most detailed level) for this data?

MANAGEABLE DATA SOURCES

The degree to which a data source is easily managed is also important when selecting data sources. It is particularly important for those data sources that will be used routinely for ongoing integration activities such as data warehousing. Consider the following manageability criteria:

Criteria and assessment questions:
• Origin of Data: Is this data source the first point of capture for the data? Is it a reliable source for all instances of the data?
• Ownership of Data: Who owns the data and the system that collects it? Is it considered to be the system of record for the facts that it collects?
• System Management: Is the data collection system managed internally or externally? By a service bureau? Internal IT department? End-user department?
• Usage of Data: Who uses this data? For what purpose? Does the usage naturally lead to feedback and verification of data quality?
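A data source evaluation matrix can be tallied with a few lines of Python. This sketch assumes that the manageability criteria are scored on the same 1-to-5 scale as the usability criteria (the text specifies the scale only for usability) and that an unweighted average per criteria group is adequate; the candidate sources and their scores are hypothetical.

```python
USABILITY = ["availability", "understandability", "stability", "accuracy",
             "timeliness", "completeness", "granularity"]
MANAGEABILITY = ["origin_of_data", "ownership", "system_management", "usage"]


def evaluate(scores: dict) -> dict:
    """Average the 1-5 scores within each criteria group.

    Raises KeyError if a criterion was not assessed and ValueError if a
    score falls outside the five-point scale.
    """
    result = {}
    for group, criteria in (("usability", USABILITY),
                            ("manageability", MANAGEABILITY)):
        values = []
        for criterion in criteria:
            score = scores[criterion]
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion}: {score} is outside the 1-5 scale")
            values.append(score)
        result[group] = sum(values) / len(values)
    return result


# Hypothetical subjective assessments (1 = poor, 5 = excellent).
candidates = {
    "Transaction Systems": dict(
        availability=4, understandability=3, stability=4, accuracy=4,
        timeliness=5, completeness=4, granularity=5,
        origin_of_data=5, ownership=4, system_management=4, usage=4),
    "Ad-hoc Data Collections": dict(
        availability=3, understandability=2, stability=2, accuracy=3,
        timeliness=2, completeness=2, granularity=4,
        origin_of_data=2, ownership=2, system_management=1, usage=3),
}

# Rank candidates by usability, then manageability (highest first).
ranked = sorted(candidates.items(),
                key=lambda item: (evaluate(item[1])["usability"],
                                  evaluate(item[1])["manageability"]),
                reverse=True)
for name, scores in ranked:
    result = evaluate(scores)
    print(f"{name}: usability {result['usability']:.2f}, "
          f"manageability {result['manageability']:.2f}")
```

Averages make a convenient first cut, but a low score on a single criterion (for example, no availability at all) may disqualify a source regardless of its average.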


3-1

Module 3 Data Integration Systems

Topic                                   Page
Getting Data                            3-2
Transforming Data                       3-14
Storing Data                            3-26


3-10 © The Data Warehousing Institute

Getting Data: Source-to-Target Data Element Mapping

[Figure: source-to-target data element mapping matrix.
Source, PlayNation Employee Table: social_security_number, first_name, last_name, middle_initial, birthdate, gender, mailing_address, city, state, zip_code, home_phone_number, work_phone_number, emergency_contact_name, emergency_contact_phone_number, tax_status_federal, tax_exemptions_federal, tax_status_state, tax_exemptions_state, employment_date, annual_salary, health_insurance_enrolled_indicator, spouse_health_indicator, dependent_health_indicator, ESP_deduction_amount, profit_sharing_eligiblility_boolean, comments, local_field_1, local_field_2.
Target, E-Max Employee Table: employee_id, employee_name, date_of_birth, sex, address_line1, address_line2, city, state, zip_code, ethinc_origin_code, federal_tax_marital_status, federal_tax_number_of_exemptions, state_tax_marital_status, state_tax_number_of_exemptions, hire_date, separation_date, employment_status_code, employment_status_date, SSN.
Target, E-Max Benefits Participation Table: employee_id, benefit_program_code, participation_end_date, participation_begin_date, plan_code, plan_type, spouse_coverage_code, child_coverage_code, benefit_program_carrier_code, pct_to_investment_fund, amt_to_investment_fund.]


© The Data Warehousing Institute 3-11

Getting Data: Source-to-Target Data Element Mapping

SAMPLE MATRIX

The matrix on the facing page illustrates an example of mapping source data to target data at the data element level. Data element mapping is not necessarily complex; it is simply detailed and sometimes tedious. This level of mapping is necessary to understand the requirements for migrating data from non-integrated to integrated data stores, and it provides information that is essential before transformation design can begin. In this example we can see that:

• Some data elements have one-to-one associations and identical names (city, state, and zip_code, for example). Do they share common formats and allowable values?

• Some data elements have one-to-one associations and similar but different names (sex / gender, date_of_birth / birthdate). Do they share common formats and allowable values?

• Some data elements have one-to-many associations (employee_name / first_name, last_name, and middle_initial). Clearly some kind of data transformation will be needed here.

• Some target data elements (plan_type, participation_end_date, participation_begin_date, plan_code) have no apparent data source. Will the data be manually populated? Is there another source? Is it acceptable not to collect this data?

• Some source data elements (phone numbers and emergency contact data from PlayNation) have no corresponding target. Will the data be lost? Should the target be modified?

• Some collections of data elements (spouse and children benefits coverage, for example) are organized in significantly different ways. Complex data transformations may be needed here.
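A data element mapping can be held as plain data and scanned for the situations listed above. The Python sketch below is a minimal illustration: the element names come from the bullets, but the assignment of each name to the source or target side, and the helper `classify_mapping`, are assumptions for illustration rather than part of the course material or any ETL tool.

```python
from collections import defaultdict

# Hypothetical source-to-target mapping: source element -> target element(s).
# Names come from the example bullets; which side each name sits on is assumed.
MAPPING = {
    "city": ["city"],
    "state": ["state"],
    "zip_code": ["zip_code"],
    "sex": ["gender"],
    "date_of_birth": ["birthdate"],
    "employee_name": ["first_name", "last_name", "middle_initial"],
    "emergency_contact_phone_number": [],  # no corresponding target
}

# Every element in the (assumed) target design.
TARGET_ELEMENTS = {
    "city", "state", "zip_code", "gender", "birthdate",
    "first_name", "last_name", "middle_initial",
    "plan_type", "plan_code", "participation_begin_date", "participation_end_date",
}


def classify_mapping(mapping, target_elements):
    """Group elements by the kind of association they have."""
    findings = defaultdict(list)
    mapped_targets = set()
    for source, targets in mapping.items():
        mapped_targets.update(targets)
        if not targets:
            findings["source without target"].append(source)
        elif len(targets) > 1:
            findings["one-to-many"].append(source)
        elif targets[0] == source:
            findings["one-to-one, identical name"].append(source)
        else:
            findings["one-to-one, different name"].append(source)
    # Targets that no source element feeds need another source or manual entry.
    findings["target without source"] = sorted(target_elements - mapped_targets)
    return dict(findings)


for kind, elements in classify_mapping(MAPPING, TARGET_ELEMENTS).items():
    print(f"{kind}: {', '.join(elements)}")
```

Each group then prompts the matching question from the list above: format and value checks for one-to-one pairs, split logic for one-to-many associations, and a sourcing decision for unmapped elements.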


Data Integration Systems TDWI Data Integration Basics

3-12 © The Data Warehousing Institute

Getting Data Data Capture Design Considerations

Data capture options, by scope of data (all data or changed data) and direction (push to target or pull from source):

• All data, push to target: replicate source files / tables
• All data, pull from source: extract source files / tables
• Changed data, push to target: replicate source changes or transactions
• Changed data, pull from source: extract source changes or transactions

Notes shown with the options:

• Works well for one-time data conversion, such as combining data from two systems, the initial load of warehousing data, or start-up data for an ERP implementation.
• Works well for ongoing data integration with small amounts of data.
• OK for ongoing data integration (e.g., data warehousing) when data volume is small and timeliness of data is not important.
• Works well for ongoing data integration when real-time data is desired.
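The capture matrix amounts to a lookup keyed by two choices. The Python sketch below encodes the four cells and adds a deliberately simplified, hypothetical chooser (`choose_capture`) built from three of the design questions that accompany the matrix: one-time versus ongoing capture, reliable change detection, and whether the source can push. The decision rules are assumptions; they ignore data volume, subsetting, and technology constraints.

```python
from enum import Enum


class Scope(Enum):
    ALL_DATA = "all data"
    CHANGED_DATA = "changed data"


class Direction(Enum):
    PUSH_TO_TARGET = "push to target"
    PULL_FROM_SOURCE = "pull from source"


# The four cells of the data capture matrix, keyed by (scope, direction).
CAPTURE_MATRIX = {
    (Scope.ALL_DATA, Direction.PUSH_TO_TARGET): "replicate source files / tables",
    (Scope.ALL_DATA, Direction.PULL_FROM_SOURCE): "extract source files / tables",
    (Scope.CHANGED_DATA, Direction.PUSH_TO_TARGET): "replicate source changes or transactions",
    (Scope.CHANGED_DATA, Direction.PULL_FROM_SOURCE): "extract source changes or transactions",
}


def choose_capture(one_time: bool, changes_detectable: bool, source_can_push: bool) -> str:
    """Return a capture option using simplified, assumed decision rules.

    - One-time capture, or no reliable way to detect changes: capture all data.
    - Otherwise: capture only changed data.
    - Push whenever the source can push, so the source controls the impact.
    """
    if one_time or not changes_detectable:
        scope = Scope.ALL_DATA
    else:
        scope = Scope.CHANGED_DATA
    direction = Direction.PUSH_TO_TARGET if source_can_push else Direction.PULL_FROM_SOURCE
    return CAPTURE_MATRIX[(scope, direction)]


# A one-time conversion from a source that cannot push its data.
print(choose_capture(one_time=True, changes_detectable=False, source_can_push=False))
# extract source files / tables
```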


TDWI Data Integration Basics Data Integration Systems

© The Data Warehousing Institute 3-13

Getting Data Data Capture Design Considerations

MATCHING TO NEEDS AND CONSTRAINTS

Data capture design seeks to get all of the data needed as efficiently as is practical, and to minimize impact on the source systems from which data is obtained. Some of the questions that help to design and develop an optimal data capture process are:

• What constraints does the source system impose? Source systems with limited batch processing time, or those that require 24x7 availability, demand special consideration and careful design.

• Will data be captured from the source only one time, or will data capture be ongoing? One-time data capture processes typically consider simplicity, reliability, and speed of development to be more important than processing efficiency. An extract of all data from a source is often the most effective means of acquiring data.

• What volume of data is expected with each instance of data capture? Very large data volumes need special attention to efficiency of acquisition. Capturing only data changes is ideal when changes can reliably be detected. A source system capable of pushing changes may offer an ideal solution.

• Are all occurrences (rows/records) needed, or only a subset? If only a subset is needed, consider what percentage of the total body of data is needed. A small percentage favors selection as part of the extract process; a large percentage suggests selection after the extract.

• Will capture of data changes meet the need for ongoing data capture, or is a full extract needed each time? Can changes be reliably detected in the source system? When changes can’t be detected with confidence, comparing generations of full extracts may be required. Changes may still be lost, however, depending on the frequency of extract and the volatility of the data.

• Can the source system push data to the integration system, or must the data be pulled by the integration system? For particularly sensitive source systems, push is the best option whenever possible. A push approach allows the source system to control the impact of data acquisition.

• What technology is used to store the source data? What technologies are available for data capture? Exploit the available technology to achieve rapidly developed, easy-to-maintain data acquisition processes. Consider available ETL tools, DBMS replication features, database transaction logs, and so on.
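When changes cannot be detected reliably in the source, comparing generations of full extracts is the fallback named above. A minimal Python sketch, assuming each extract fits in memory as a dictionary keyed by a stable business key (large volumes might instead be compared as sorted files or with a database join); `compare_extracts` and the sample rows are hypothetical:

```python
from typing import Dict, Tuple

Extract = Dict[str, tuple]  # business key -> non-key column values


def compare_extracts(previous: Extract, current: Extract) -> Tuple[dict, dict, dict]:
    """Derive inserts, updates, and deletes from two generations of a full extract."""
    # Keys only in the newer extract were inserted since the last one.
    inserts = {key: current[key] for key in current.keys() - previous.keys()}
    # Keys only in the older extract were deleted since the last one.
    deletes = {key: previous[key] for key in previous.keys() - current.keys()}
    # Keys in both with different values were updated; keep before and after images.
    updates = {
        key: (previous[key], current[key])
        for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    }
    return inserts, updates, deletes


monday = {"E1": ("Ann", "Austin"), "E2": ("Bob", "Boston"), "E3": ("Cy", "Chicago")}
tuesday = {"E1": ("Ann", "Dallas"), "E3": ("Cy", "Chicago"), "E4": ("Dee", "Denver")}
inserts, updates, deletes = compare_extracts(monday, tuesday)
print(sorted(inserts), sorted(updates), sorted(deletes))  # ['E4'] ['E1'] ['E2']
```

As the bullet warns, this detects only net differences between two extracts: a record that changed twice, or was inserted and then deleted, between extracts leaves no trace.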


TDWI Data Integration Basics Data Quality

4-1

Module 4 Data Quality

Topic Page

Data Quality Concepts 4-2

Data Correctness 4-6

Data Integrity 4-32

Continuous Quality Improvement 4-60



Data Quality TDWI Data Integration Basics

4-30 © The Data Warehousing Institute

Data Correctness Using Data Correctness Rules

[Figure: data correctness rules (accuracy, completeness, balancing, consistency, precision, granularity, currency, duration, retention, continuity, precedence) crossed with the four data cleansing actions (detect, repair, correct, prevent), giving actions numbered 1 to 44 row by row. Detect: find defects; validate, verify, and inspect data. Repair and correct: replace bad data using alternate sources, defaults, and derived values. Prevent: find and fix the root cause (usually a process).]


TDWI Data Integration Basics Data Quality

© The Data Warehousing Institute 4-31

Data Correctness Using Data Correctness Rules

DATA CLEANSING ACTIONS

Data correctness defects exist whenever data is found to violate correctness rules. Data cleansing is the process of taking action to remove data quality defects. The four common kinds of action are:

• Detection: knowing when a defect exists.

• Repair: fixing a defect in data that has already been delivered.

• Correction: fixing a data quality defect before the data is delivered.

• Prevention: fixing a process deficiency that allows defects to occur.

Eleven types of data correctness rules, when intersected with the four kinds of data cleansing activities (detect, repair, correct, prevent), yield forty-four distinct actions that may be taken to improve data correctness.

DETECTING DATA QUALITY DEFECTS

Validation, verification, and inspection are the common techniques used to detect data quality defects. Validation tests data against expressed data quality rules. Verification tests data against other reliable sources (e.g., asking a customer to verify their address). Inspection conducts a thorough examination of data to discover properties that might not be found using validation and verification techniques. Whereas validation and verification assume known questions (e.g., business rules and alternative sources), inspection is a process of data-driven discovery in which the questions aren’t necessarily known in advance.
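The three detection techniques differ mainly in where the question comes from. In the Python sketch below, the records, the five-digit ZIP code rule, and the reference set of ZIP code and state pairs are hypothetical stand-ins; a real verification source might be a postal reference file or the customer.

```python
import re
from collections import Counter

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "zip_code": "78701", "state": "TX"},
    {"id": 2, "zip_code": "7870", "state": "TX"},
    {"id": 3, "zip_code": "02108", "state": "MA"},
    {"id": 4, "zip_code": "99999", "state": "XX"},
]

# Validation: test each record against an expressed rule (assumed: five digits).
ZIP_RULE = re.compile(r"\d{5}")


def validate(record):
    return ZIP_RULE.fullmatch(record["zip_code"]) is not None


# Verification: compare against another source assumed to be reliable
# (here a stand-in set of known ZIP code / state pairs).
REFERENCE_PAIRS = {("78701", "TX"), ("02108", "MA")}


def verify(record):
    return (record["zip_code"], record["state"]) in REFERENCE_PAIRS


# Inspection: profile a field with no predefined question and look for surprises.
def profile(rows, field):
    return Counter(row[field] for row in rows)


invalid = [r["id"] for r in records if not validate(r)]
unverified = [r["id"] for r in records if validate(r) and not verify(r)]
print("invalid:", invalid)
print("unverified:", unverified)
print("state profile:", profile(records, "state"))
```

Validation flags record 2 because it breaks a stated rule, verification flags record 4 because it passes the rule but disagrees with the reference, and the profile leaves it to a person to notice that an unexpected state code such as XX appears at all.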


Data Quality TDWI Data Integration Basics

4-58 © The Data Warehousing Institute

Data Integrity Using Data Integrity Rules

[Figure: data integrity rules (identity, cardinality, reference, inheritance, value set, relationship dependency, attribute dependency) crossed with the same four data cleansing actions (detect, repair, correct, prevent), giving twenty-eight numbered actions. Detect: find defects; validate, verify, and inspect data. Repair and correct: replace bad data using alternate sources, defaults, and derived values. Prevent: find and fix the root cause (usually a process).]


TDWI Data Integration Basics Data Quality

© The Data Warehousing Institute 4-59

Data Integrity Using Data Integrity Rules

DATA CLEANSING ACTIONS

Data integrity defects exist whenever data is found to violate integrity rules. Data cleansing is the process of taking action to remove data quality defects. The four common kinds of action are identical to those discussed for data correctness defects: detect, repair, correct, and prevent. Seven types of data integrity rules, when intersected with the four kinds of data cleansing activities, yield twenty-eight distinct actions that may be taken to improve data integrity. Combined with the forty-four actions for data correctness, a total of seventy-two data cleansing actions are possible.

                                     detect   repair  correct  prevent
Data Correctness
  accuracy                                1        2        3        4
  completeness                            5        6        7        8
  balancing                               9       10       11       12
  consistency                            13       14       15       16
  precision                              17       18       19       20
  granularity                            21       22       23       24
  currency                               25       26       27       28
  duration                               29       30       31       32
  retention                              33       34       35       36
  continuity                             37       38       39       40
  precedence                             41       42       43       44
Data Integrity
  identity defects                        1        2        3        4
  reference defects                       5        6        7        8
  cardinality defects                     9       10       11       12
  inheritance defects                    13       14       15       16
  value defects                          17       18       19       20
  relationship dependency defects        21       22       23       24
  attribute dependency defects           25       26       27       28
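The numbering in the table follows a simple scheme: rule types are rows, the four cleansing actions are columns, and cells are numbered row by row within each group. The Python sketch below reproduces that scheme and the totals given in the text (11 × 4 = 44, 7 × 4 = 28, 72 in all); the integrity rule order follows the table, and the function name `number_actions` is illustrative.

```python
from itertools import product

CLEANSING_ACTIONS = ["detect", "repair", "correct", "prevent"]

CORRECTNESS_RULES = [
    "accuracy", "completeness", "balancing", "consistency", "precision",
    "granularity", "currency", "duration", "retention", "continuity", "precedence",
]
INTEGRITY_RULES = [
    "identity", "reference", "cardinality", "inheritance", "value set",
    "relationship dependency", "attribute dependency",
]


def number_actions(rules):
    """Number each (rule, action) cell row by row, starting at 1."""
    # product() varies the action fastest, so each rule's row is numbered in turn.
    return {
        (rule, action): number
        for number, (rule, action) in enumerate(product(rules, CLEANSING_ACTIONS), start=1)
    }


correctness = number_actions(CORRECTNESS_RULES)
integrity = number_actions(INTEGRITY_RULES)
print(len(correctness), len(integrity), len(correctness) + len(integrity))  # 44 28 72
print(correctness[("precedence", "prevent")])  # 44
print(integrity[("attribute dependency", "detect")])  # 25
```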


Data Quality TDWI Data Integration Basics

4-60 © The Data Warehousing Institute

Continuous Quality Improvement Planning and Execution

[Figure: continuous quality improvement, planning and execution.
Planning: Define the Scope; Assess the Current State; Set Quality Goals; Define Quality Measures; Analyze Gap between Current State & Goals; Identify Actions.
Execution: Act (Filter; Correct with defaults, derivations, and alternates; Prevent at input) and Audit (Measure and Monitor).
Data Quality Plan: Scope; Goals & Measures; Actions; Roles, Resources, and Responsibilities; Schedule; Continuity.]


TDWI Data Integration Basics Data Quality

© The Data Warehousing Institute 4-61

Continuous Quality Improvement Planning and Execution

PLANNED DATA QUALITY

Developing a plan for data cleansing includes the activities necessary to improve data quality, monitor achievement of quality goals, and evolve the data cleansing strategy. Data quality planning consumes time, effort, and resources; it is not free. As with most things, a data quality strategy done well takes more effort to plan than to execute. The cost and effort of planning are justified by a simple truth: good data quality is always the result of good planning. Only poor quality happens without planning. A comprehensive data quality plan includes:

Defined Scope addresses questions such as which data is within the scope of the effort and which rule types are to be applied. While you might be inclined to say “all data and all rules,” practical constraints of time and resources may demand that the scope of effort be reduced.

Goals and Measures express the quantifiable objectives of the data cleansing plan. Goals typically quantify a required quality level or defect rate, for example 99.5% accuracy or zero reference defects. Measures are needed to assess the current state and to evaluate progress toward the plan’s goals.

Actions describe what steps will be taken to improve quality and achieve the planned goals. This course has identified seventy-two common actions for data cleansing. No plan is likely to include all of them. Is the plan to detect errors and audit data quality? To correct or repair defects? To prevent defects at the source?

Roles, Resources, and Responsibilities are assigned to detect, correct, and prevent data quality defects, as well as to continuously measure and monitor quality.

Scheduling attaches a timeframe to the goals of the plan. Consider the relative priorities of data quality issues and dependencies among activities to develop a realistic timeline.

Continuity shifts data quality improvement from a project to an ongoing data management practice. Ideally, a data-cleansing plan seeks continuous improvement of data quality. Continuous quality improvement is achieved through regular planning, incremental improvements, and routine communication and feedback.
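Goals and measures become actionable once both are numbers. The Python sketch below expresses each goal as a required pass rate and reports the gap between the measured current state and the goal; the measure names, the counts, the pass-rate formulation, and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class QualityGoal:
    measure: str        # e.g., "accuracy" or "reference defects"
    target_rate: float  # required share of checked records without a defect, 0.0 to 1.0


def pass_rate(checked: int, defects: int) -> float:
    """Share of checked records with no defect for this measure."""
    if checked == 0:
        raise ValueError("no records checked")
    return (checked - defects) / checked


def gap_report(goals, observations):
    """Compare observed pass rates with goals; a positive gap means below goal."""
    report = {}
    for goal in goals:
        checked, defects = observations[goal.measure]
        report[goal.measure] = round(goal.target_rate - pass_rate(checked, defects), 6)
    return report


# "Zero reference defects" is expressed as a required pass rate of 1.0.
goals = [QualityGoal("accuracy", 0.995), QualityGoal("reference defects", 1.0)]
# Hypothetical counts from one measurement cycle: (records checked, defects found).
observations = {"accuracy": (20_000, 180), "reference defects": (20_000, 0)}
print(gap_report(goals, observations))  # {'accuracy': 0.004, 'reference defects': 0.0}
```

Re-running the same report on each measurement cycle gives the routine feedback that the Continuity element calls for.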


TDWI Data Integration Basics Data Integration Roles

© The Data Warehousing Institute 5-1

Module 5 Data Integration Roles

Topic Page

Roles and Responsibilities 5-2

Understanding the Data 5-4

Getting the Data 5-10

Changing the Data 5-16

Storing the Data 5-22

Using the Data 5-28

In Conclusion 5-34



Data Integration Roles TDWI Data Integration Basics

5-2 © The Data Warehousing Institute

Roles and Responsibilities Overview

[Figure: roles and responsibilities matrix, data integration stages by lifecycle phase]

Understand the Data
  Planning & Analysis: identify, evaluate, and select data sources
  Design & Construction: explore the data for understanding and to identify business rules
  Implementation & Execution: profile the data to discover and verify business rules

Get the Data
  Planning & Analysis: know the target and map source data to target data
  Design & Construction: design and build processes to capture source data
  Implementation & Execution: test, schedule, and execute data capture processing

Change the Data
  Planning & Analysis: identify and specify data transformation rules and logic
  Design & Construction: design and build processes to transform the data
  Implementation & Execution: test, schedule, and execute transformation processing

Store the Data
  Planning & Analysis: estimate volume and identify timing and security requirements
  Design & Construction: design and build processes to transport and load the data
  Implementation & Execution: test, schedule, and execute transport and load processing

Use the Data
  Planning & Analysis: describe data uses, identify data quality goals and measures
  Design & Construction: design & deploy data access tools, build quality measurement
  Implementation & Execution: test and execute data access capabilities, manage data quality


TDWI Data Integration Basics Data Integration Roles

© The Data Warehousing Institute 5-3

Roles and Responsibilities Overview

TEAM EFFORT OF BUSINESS AND TECHNICAL SKILLS

Developing and operating data integration systems demands both business and technical knowledge. Understanding how data is used, what business rules apply, where and how it is collected, and the degree to which it is trusted are examples of needs where business knowledge is paramount. Knowledge of storage methods, data structures, database capabilities, and similar topics exemplifies needs where technical skills are critical.

ROLES AND RESPONSIBILITIES FRAMEWORK

The five stages of the data integration lifecycle (understand the data, get the data, change the data, store the data, and use the data) provide the foundation for defining a roles and responsibilities structure for data integration. Intersected with the typical information systems lifecycle phases (planning, analysis, design, construction, implementation, and operation or execution), they yield a roles and responsibilities matrix as shown on the facing page. Note that the cells in the matrix do not represent roles or activities, but categories of work within which activities, roles, and responsibilities need to be identified.


Data Integration Roles TDWI Data Integration Basics

5-4 © The Data Warehousing Institute

Understanding the Data Planning and Analysis Roles

Understand the Data
• Conflicting business definitions and terminology
• Different ways of identifying data
• Data overlap and inconsistency throughout business transaction systems
• Hidden meaning, missing data, and much more …

Get the Data
• Deciding which data to use
• Mapping transaction data sources to integrated data targets
• Detecting and capturing data changes
• Timing and source data readiness, and much more …

Change the Data
• Business rules for data transformation
• Auditing and improving data quality
• Connecting data from multiple and disparate transaction systems
• Delivering summary data without loss of detail, and much more …

Store the Data
• Moving data securely over computer networks
• Fast and reliable transport for large amounts of data
• Freshness of data and timing of data loads
• Availability and “up time” vs. time required to load, and much more …

Use the Data
• Access and navigation of the data
• Understanding contents of integrated data stores
• Quality, trust, and confidence
• Feedback and continuous improvement, and much more …

[Figure: roles and responsibilities matrix with the Understand the Data / Planning & Analysis cell highlighted: identify, evaluate, and select data sources.]


TDWI Data Integration Basics Data Integration Roles

© The Data Warehousing Institute 5-19

Changing the Data Design and Construction Roles

ACTIVITIES

Design and construction activities of data transformation build the processes that actually change the data. These activities include:

• Identify rule dependencies to develop a modular design that executes interdependent rules in the correct sequence. Rule dependency exists when execution of a transformation rule is based upon the result of another rule.

• Design and build transformation modules that package a collection of interdependent rules as a single, executable computer procedure.

• Identify time dependencies to develop a process design that executes transformation modules in the correct sequence. Time dependency exists when one transformation rule must execute before another can be executed.

• Design and assemble transformation processes as a set of modules to be executed together in a specific sequence.
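Rule and time dependencies form a directed acyclic graph, so any valid execution sequence is a topological order of that graph. The Python sketch below uses the standard library's `graphlib` module (Python 3.9 or later) to group transformation rules into waves, where every rule in a wave depends only on rules in earlier waves. The rule names and the helper `execution_waves` are hypothetical, and packaging rules into modules and modules into processes is left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical transformation rules, each mapped to the rules whose results it uses.
RULE_DEPENDENCIES = {
    "split_employee_name": set(),
    "standardize_gender_code": set(),
    "derive_name_match_key": {"split_employee_name"},
    "match_benefits_participant": {"derive_name_match_key", "standardize_gender_code"},
}


def execution_waves(dependencies):
    """Group rules into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as executed so dependents become ready
    return waves


for number, wave in enumerate(execution_waves(RULE_DEPENDENCIES), start=1):
    print(number, wave)
```

Rules within one wave have no dependency on one another, so they could run in any order or in parallel; the waves themselves must run in sequence. A circular dependency is reported as an error rather than silently producing an invalid order.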

ROLES AND RESPONSIBILITIES

Applying the roles and responsibilities model produces a result such as that shown below. Responsibility designations may differ for your organization and activities may need to be tailored to your specific project.

Activity                                        Business   IT
Identify Rule Dependencies                      Consult    Decide
Design and Build Transformation Modules         Inform     Decide
Identify Time Dependencies                      Consult    Decide
Design and Assemble Transformation Processes    Inform     Decide


Data Integration Roles TDWI Data Integration Basics

5-32 © The Data Warehousing Institute

Using the Data Implementation and Execution Roles

[Figure: data integration issues by stage and the roles and responsibilities matrix, with the Use the Data / Implementation & Execution cell highlighted: test and execute data access capabilities, manage data quality.]


TDWI Data Integration Basics Data Integration Roles

© The Data Warehousing Institute 5-33

Using the Data Implementation and Execution Roles

ACTIVITIES

The value of integrated data is realized when the data is used to achieve positive business outcomes, completing the entire data-to-value chain. Usage activities include:

• Test operational features and functions to ensure that they work correctly and meet business needs. Formalize successful testing by documenting system acceptance.

• Test decision-support and analytic capabilities to ensure that they work correctly and meet business needs. Formalize successful testing by documenting system acceptance.

• Employ operational system capabilities to execute and record business transactions, to carry out day-to-day work, and to obtain data and information needed for operational activities.

• Employ decision-support and analytic capabilities to inform decision-making processes, analyze business outcomes, forecast business trends, and guide planning processes.

• Manage data quality by providing continuous feedback about the quality of the data, and by correcting business process issues that lead to data quality problems.

ROLES AND RESPONSIBILITIES

Applying the roles and responsibilities model produces a result such as the one shown below. Responsibility designations may differ in your organization, and activities may need to be tailored to your specific project.

Activity                                              Business               IT
Test Operational Features and Functions               Decide                 Consult
Test Decision-Support and Analytic Capabilities       Decide                 Consult
Employ Operational System Capabilities                Decide                 Consult
Employ Decision-Support and Analytic Capabilities     Decide                 Consult
Manage Data Quality                                   Decide by consensus    Decide by consensus


In Conclusion: Best Practices for Data Integration Success

[Figure: Best practices for data integration success. The data integration process runs through six stages (planning, analysis, design, construction, implementation, execution), grouped as Planning & Analysis, Design & Construction, and Implementation & Execution. Each stage addresses five aspects of the data (understanding, acquisition, transformation, storage, usage), labeled Understand the Data, Get the Data, Change the Data, Store the Data, and Use the Data.]

Data integration is a process that starts with planning and ends with execution.

Every aspect, from understanding to usage, gets attention at each process stage.

Every activity has designated roles and responsibilities.

Business and IT work together as a team to achieve data integration success.

Activities shown: identify, evaluate, and select data sources; explore the data for understanding and to identify business rules; profile the data to discover and verify business rules; know the target and map source data to target data; design and build processes to capture source data; test, schedule, and execute data capture processing; identify and specify data transformation rules and logic; design and build processes to transform the data; test, schedule, and execute transformation processing; estimate volume and identify timing and security requirements; design and build processes to transport and load the data; test, schedule, and execute transport and load processing; describe data uses, identify data quality goals and measures; design & deploy data access tools, build quality measurement; test and execute data access capabilities, manage data quality.

Example role designations:

Activity                                              Business   IT
Review the Target Data Model                          Inform     Decide
Map Source Entities to Target Entities                Decide     Consult
Map Source Data Stores to Target Data Stores          Consult    Decide
Map Source Data Elements to Target Data Elements      Consult    Decide


In Conclusion: Best Practices for Data Integration Success

PROCESS AND TEAMWORK

Four key elements make up successful data integration projects, regardless of the reason for data integration:

• Data integration is managed as a process with six distinct stages: planning, analysis, design, construction, implementation, and execution.

• Each stage of the process has activities that address every aspect of data integration: understanding the data, getting the data, changing the data, storing the data, and using the data.

• Every activity has designated roles and responsibilities.

• Business and IT work together as a team to achieve successful data integration.

MAKING TEAMWORK WORK

To achieve real teamwork, every stakeholder in the data integration project, whether representing business or IT, must be able to fill multiple roles: sometimes with decision authority and sometimes in a consulting and advisory role. With clearly designated roles and responsibilities for each activity, teamwork is achieved when:

• Business has significant decision-making responsibility.

• IT has significant decision-making responsibility.

• Business has a consulting and advisory role in IT decisions.

• IT has a consulting and advisory role in business decisions.

• Critical decisions are made by consensus of business and IT.
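Once a project's roles matrix is written down, the teamwork conditions can be checked mechanically. The Python sketch below is a hypothetical illustration, not part of the course material. It encodes the role designations from this module's tables (the activity names and Decide/Consult/Inform designations are taken from those tables); the `Role` enum, the helper names, and the simplifications are assumptions. "Significant" responsibility is reduced to "at least once," and "critical decisions" to "at least one consensus decision."

```python
from enum import Enum


class Role(Enum):
    DECIDE = "Decide"
    CONSULT = "Consult"
    INFORM = "Inform"
    CONSENSUS = "Decide by consensus"


# Activity -> (Business role, IT role), from the role tables in this module.
ROLES = {
    "Review the Target Data Model": (Role.INFORM, Role.DECIDE),
    "Map Source Entities to Target Entities": (Role.DECIDE, Role.CONSULT),
    "Map Source Data Stores to Target Data Stores": (Role.CONSULT, Role.DECIDE),
    "Map Source Data Elements to Target Data Elements": (Role.CONSULT, Role.DECIDE),
    "Test Operational Features and Functions": (Role.DECIDE, Role.CONSULT),
    "Test Decision-Support and Analytic Capabilities": (Role.DECIDE, Role.CONSULT),
    "Employ Operational System Capabilities": (Role.DECIDE, Role.CONSULT),
    "Employ Decision-Support and Analytic Capabilities": (Role.DECIDE, Role.CONSULT),
    "Manage Data Quality": (Role.CONSENSUS, Role.CONSENSUS),
}


def decision_owner(business, it):
    """Return who decides an activity, or None if nobody (or both sides) claim it alone."""
    if business is Role.CONSENSUS and it is Role.CONSENSUS:
        return "Consensus"
    if business is Role.DECIDE and it is not Role.DECIDE:
        return "Business"
    if it is Role.DECIDE and business is not Role.DECIDE:
        return "IT"
    return None


def unowned_activities(roles):
    """List activities that lack a single, clearly designated decision owner."""
    return [a for a, (b, i) in roles.items() if decision_owner(b, i) is None]


def teamwork_checks(roles):
    """Simplified test of the five teamwork conditions (each must hold at least once)."""
    pairs = list(roles.values())
    owners = [decision_owner(b, i) for b, i in pairs]
    return {
        "business decides": "Business" in owners,
        "IT decides": "IT" in owners,
        "business advises IT": any(b is Role.CONSULT and i is Role.DECIDE for b, i in pairs),
        "IT advises business": any(i is Role.CONSULT and b is Role.DECIDE for b, i in pairs),
        "consensus decisions": "Consensus" in owners,
    }


if __name__ == "__main__":
    print("Unowned activities:", unowned_activities(ROLES))
    for check, ok in teamwork_checks(ROLES).items():
        print(f"{check}: {'yes' if ok else 'no'}")
```

Checking a single stage in isolation shows why the whole-project view matters. The usage activities alone leave IT with no decisions of its own. Adding the mapping activities, where IT decides and business consults, restores the balance the conditions describe.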

A MODEL FOR INTEGRATION TEAMWORK

The following two pages summarize the set of activities discussed throughout this module and suggest a typical designation of business and IT roles for each activity. Note that decision-making roles are divided between business and IT, and that each supports and advises the other in a consulting capacity as needed. This model is not presented as the “right way” for all integration projects. You can readily adapt it to your data integration project by adding activities unique to the project, removing activities the project does not need, and adjusting responsibilities to fit the organization and culture in which the project will be performed. Exactly how roles and responsibilities are assigned matters less than deciding them at the start of the project.


TDWI Data Integration Basics Basis of Course Examples

© The Data Warehousing Institute A-1

Appendix A

Basis of Course Examples

Topic                  Page
Scenario               A-3
E-Max Systems          A-4
PlayNation Systems     A-6
E-Max Database         A-8
E-Max Flat Files       A-11


Scenario: Overview of an Acquisition

© TDWI: The Data Warehousing Institute Education

Course Example: An Integration Problem (Scenario)

E-Max is a consumer electronics retailer with sales outlets that include brick-and-mortar stores, an internet outlet, and catalog sales. E-Max acquires PlayNation, a small chain of electronic gaming stores clustered locally in a few regions throughout the US and Canada.

E-Max has a mature IT department that supports many operational systems and is in the early stages of building a data warehouse. PlayNation has an ad hoc systems environment typical of small companies. Much of its data management is done locally by each regional office. Critical corporate systems for finance and payroll are operated by an external service bureau. Most internal data is stored in spreadsheets, complemented by limited use of a Microsoft Access® database.

The most pressing data integration needs relate to workforce and payroll data. Compliance considerations, common paymaster requirements, and the move to an international workforce (with PlayNation’s Canada stores) drive E-Max to focus first on these areas. After the urgent need to integrate workforce and payroll data is satisfied, attention will turn to other operational systems and data warehousing.


E-Max Systems: E-Max HRMS and Payroll

Course Example: An Integration Problem (E-Max HRMS and Payroll)

HRMS Functions
• recruiting and hiring
• applicant tracking
• EEO/affirmative action reporting
• compensation management
• benefits administration
• position control
• employment records
• employee performance and training

Payroll Functions
• time reporting
• commission sales reporting
• deduction entry
• payroll calculation
• check reconciliation
• tax & benefits accounting
• employee payment (check & deposit)
• vendor/carrier payment

HRMS Data
• employee
• appointment
• job postings
• applicants
• position
• salary and wage
• benefits programs
• benefits enrollment
• personnel actions
• salary history
• employee performance history
• benefits participation history

Payroll Data
• employee (common with HRMS)
• appointment (common with HRMS)
• position (common with HRMS)
• funding distribution
• dollar balances
• employee deductions
• employer contributions
• payment history and audit trail
• direct deposit enrollment
• direct deposit transmittal
• time and commission transactions
• deduction history


E-Max Systems: E-Max HR and Payroll Data

[Figure: HRMS Database entity-relationship diagram with cardinalities (0,1 / 1,1 / 0,n / 1,n). Entities: person, employee, retiree, employee salary, employee wage, applicant, appointment, appointment history, job title, position, job posting, department staff allocation, funding distribution, bonus schedule, commission schedule, salary history, fiscal year salary history, detail salary history, personnel action, performance history, benefits participation, benefits participation history, 401k participation, insurance participation, retirement program, investment program, employee payment history, dollar balances, employee deduction, employer contribution, direct deposit enrollment.]

Payroll System (flat files)
• vendor payment file
• direct deposit transmittal file
• time transactions file
• commission sales transaction file
• deduction history file