Previews of TDWI course books are provided as an opportunity to see the quality of our material and to help you select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide course books that are content-rich and that serve as useful reference documents after a class has ended. This preview shows selected pages that are representative of the entire course book. The pages shown are not consecutive. The page numbers as they appear in the actual course material are shown at the bottom of each page. All table-of-contents pages are included to illustrate all of the topics covered by a course.
TDWI Data Integration Basics
ii © The Data Warehousing Institute
All rights reserved. No part of this document may be reproduced in any form, or by any means, without written permission from The Data Warehousing Institute.
TABLE OF CONTENTS
Module 1 Data Integration Concepts .................. 1-1
Module 2 Data Sources ............................... 2-1
Module 3 Data Integration Systems ................... 3-1
Module 4 Data Quality ............................... 4-1
Module 5 Data Integration Roles ..................... 5-1
Appendix A Basis of Course Examples ................. A-1
Appendix B Bibliography and References .............. B-1
Module 1 Data Integration Concepts
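Topic ............................................... Page
Data Integration Defined ............................ 1-2
Data Integration Context ............................ 1-6
Data Integration Systems Overview ................... 1-14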
Data Integration Defined
What IS Data Integration?
Data Integration: The process of combining data from two or more disparate but related data sources in such a way that data from each source increases the overall information value of the resulting body of data. (Dave Wells, TDWI)
Integrated data is combined based on business rules.
Ideally, every data element in an integrated database:
• is connected with other data elements
• complements the surrounding data
• avoids conflict that may result in confusion, uncertainty, or multiple values for the same business fact
• can be traced to the source from which it was obtained
A PROCESS OF COMBINING DATA
Dave Wells at TDWI defines data integration as “the process of combining data from two or more disparate but related data sources in such a way that data from each source increases the overall information value of the resulting body of data.” Consider these key points from the definition:
• Data integration is a process. As with all processes, data integration has inputs, events and activities that lead to production of a product.
• Data integration combines data from multiple related data sources.
• The goal of data integration is increased information value from a body of data.
INTEGRATION ACTIVITIES
The activities of the data integration process are those steps necessary to acquire data from sources, transform the data to achieve desirable properties of integrated data, and store integrated data so it is available for use. Data transformation steps – those that change the data – are the most complex of all integration activities. The goals when combining data include removing conflict, establishing data relationships, improving consistency of representation, and ensuring data quality. Business rules provide the foundation for data transformation logic. Transformation based on business rules serves to align data structure and content with real things in the business – an essential part of increasing information value of the data.
INTEGRATION RESULTS
The product of a data integration process is a database that contains integrated data. Desirable characteristics of integrated data include:
• Every data element is connected with and related to other data elements.
• Each data element complements the surrounding data by collecting a related business fact, adding clarity, and providing added context.
• Each data element contains a unique and non-redundant business fact, or if redundant avoids conflict and uncertainty of multiple values for the same business fact.
• The lineage of each data element is known and recorded; every data element is traceable to the source from which it was obtained.
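The characteristics above can be illustrated with a small sketch. It is not taken from the course material: the `Fact` record, the `integrate` function, and the idea of settling conflicts with a source precedence list are all assumptions made for illustration. The sketch keeps one value per business fact, records the source of each value (lineage), and reports conflicting values instead of silently storing both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One business fact together with its lineage (hypothetical structure)."""
    key: str      # business key, e.g. an employee identifier
    name: str     # data element name
    value: str
    source: str   # system the value was obtained from


def integrate(records_by_source, precedence):
    """Combine facts from several sources, keeping one value per (key, name).

    `precedence` lists source names from most to least trusted. It stands in
    for a business rule and is purely illustrative.
    """
    rank = {source: position for position, source in enumerate(precedence)}
    integrated = {}
    conflicts = []
    for source, records in records_by_source.items():
        for key, elements in records.items():
            for name, value in elements.items():
                fact = Fact(key, name, value, source)
                current = integrated.get((key, name))
                if current is None:
                    integrated[(key, name)] = fact
                    continue
                if current.value != fact.value:
                    # Same business fact, different values: record the conflict.
                    conflicts.append((current, fact))
                if rank[fact.source] < rank[current.source]:
                    # The more trusted source wins.
                    integrated[(key, name)] = fact
    return integrated, conflicts
```

Each stored `Fact` carries its `source`, so every integrated value remains traceable, and the `conflicts` list shows where sources disagreed about the same business fact.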
Data Integration Defined
What ISN’T Data Integration
[Slide: four examples of data that is not integrated]
• Data organized around business processes or business organizations: a Payroll System (Employee Data, Payment Data, Time Reports, etc.) and a Personnel System (Employee Data, Jobs Data, Recruiting Data, etc.).
• Different answers depending where you look: an HR/Payroll System (Jobs Data, Recruiting Data, Payment Data, Time Reports, Employee Data) with a Payroll Audit Report and a Time & Cost Data Mart that are unable to balance.
• Unable to navigate between two distinct systems or databases: a Payroll System (Employee Data, Payment Data, Time Reports, etc.) and a Finance System (General Ledger, Budget Ledger, Cash Ledger, etc.); a payment account number can’t be found in the budget system.
• Dumping all the data into one database … and calling it a data warehouse! Payroll System, Finance System, and Personnel System data placed side by side in a single “Data Warehouse”.
STOVEPIPE DATA
Data organized around business processes, business organizations, or transaction systems is not integrated. A payroll system and a personnel system, for example, each collect, store, and use employee data. When each system independently manages its own employee data, redundancy conflicts are certain to occur. When each system uses its own means of identifying employees, the situation is aggravated by the inability to navigate between systems and to reconcile conflicts and discrepancies. These circumstances are common throughout the legacy applications of most organizations.
More recently, many organizations have developed stovepipe data marts, where each data mart is designed to meet the needs of a specific process or work group. When independent data definitions and transformation logic are defined for each data mart, no integration occurs. Non-integrated data marts may use more up-to-date technology than legacy systems, but they do nothing to resolve redundancy and conflict in the data.
CO-LOCATED DATA
Putting all of the data into a single database does not by itself achieve integration. The collective databases that are sometimes built – whether we call them data warehouse, operational data store, reporting database, or another name – are not integrated simply because they are a single database. The same issues of confusion and conflict occur when these databases contain islands of disconnected data, unresolved redundancy, and conflicting values for a single business fact.
Data Integration Context
Business Context – The Need for Data Integration
[Slide: six drivers of data integration]
• Non-Integrated Legacy Systems: Payroll System (Employee Data, Payment Data, Time Reports, etc.) and Personnel System (Employee Data, Jobs Data, Recruiting Data, etc.)
• ERP Islands: PeopleSoft HR, Oracle Financials, Siebel CRM
• Data Warehousing: Data Sources; Data Acquisition, Cleansing, & Integration; Data Warehouse; Data Stores
• Business Intelligence
• Mergers & Acquisitions: combining strategies, products, processes, people, and more …
• Cross-Organizational Metrics: Measures & Metrics across Activity, Process, Organization, and Enterprise; Subject Areas (Customer, Product, Supplier, Workforce, etc.); CRM, BPM, SCM, BAM, etc.
DRIVERS OF INTEGRATION
Many different data and technology environments create a need for data integration. Although these environments differ in goals and purpose, the issues, the need, and the integration process are similar for each of:
• Non-integrated legacy systems where multiple systems independently collect and manage redundant and overlapping data.
• ERP islands with different and non-integrated ERP systems for various business functions.
• Data warehousing which brings together data from many disparate sources.
• Business intelligence which depends on a foundation of integrated data to deliver meaningful information.
• Mergers and acquisitions where dissimilar data resources of two enterprises must be combined.
• Cross-organizational metrics to provide consistent business measures that involve multiple business processes, data sources, and computer systems.
INTEGRATION PROJECTS
The drivers itemized above typically result in two distinct kinds of data integration projects:
• Recurring integration projects are needed when data needs to be integrated on a continuous basis. These projects are typical for drivers such as cross-organizational metrics, business intelligence, and data warehousing. Note that the term “recurring integration” does not suggest that the project persists indefinitely, but that the integration process can be executed continuously.
• One-time integration projects are needed when the data integration process needs to be executed only once. These kinds of projects are typical of data conversion to initially load ERP systems, historical data collection for initial data warehouse loads, and combining of data following mergers or acquisitions.
Although the nature of the projects differs, the integration issues and activities are similar for both types of projects.
Module 2 Data Sources
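Topic ............................................... Page
Selecting Data Sources .............................. 2-2
Understanding Data Sources .......................... 2-8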
Selecting Data Sources
Evaluating Sources – Data with Integration Value
[Slide: Data Source Evaluation Matrix]
Source types: Transaction Systems; Secondary and Shadow Systems; Decision Support Systems; Backups, Log Files & Archives; External Data; Ad-hoc Data Collections
Usability criteria: Availability, Understandability, Stability, Accuracy, Timeliness, Completeness, Granularity
Manageability criteria: Origin of Data, Ownership, System Management, Usage
USABLE DATA SOURCES
Each prospective data source needs to be evaluated in terms of usability to help determine its real value as a source for data integration. A subjective assessment of usability criteria using a five-point scale (1 = poor, 5 = excellent) is sufficient for the purpose. Usability criteria for evaluation include:
Criteria: Assessment Questions
Availability: How available and accessible is the data? Are there technical obstacles to access? Or ownership and access authority issues?
Understandability: How easily understood is the data? Is it well documented? Does someone in the organization have depth of knowledge? Who works regularly with this data?
Stability: How frequently do data structures change? What is the history of change for the data? What is the expected life span of the potential data source?
Accuracy: How reliable is the data? Do the business people who work with the data trust it?
Timeliness: When and how often is the data updated? How current is the data? How much history is available? How available is it for extraction?
Completeness: Does the scope of data correspond to the scope of the data warehouse? Is any data missing?
Granularity: Is the source the lowest available grain (most detailed level) for this data?
MANAGEABLE DATA SOURCES
The degree to which a data source is easily managed is also important when selecting data sources. It is particularly important for those data sources that will be used routinely for ongoing integration activities such as data warehousing. Consider the following manageability criteria:
Criteria: Assessment Questions
Origin of Data: Is this data source the first point-of-capture for the data? Is it a reliable source for all instances of the data?
Ownership of Data: Who owns the data and the system that collects it? Is it considered to be the system-of-record for the facts that it collects?
System Management: Is the data collection system managed internally or externally? By a service bureau? Internal IT department? End-user department?
Usage of Data: Who uses this data? For what purpose? Does the usage naturally lead to feedback and verification of data quality?
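One way to record such an assessment is sketched below. The function name `evaluate_source`, the criterion keys, and the use of an unweighted mean are assumptions for illustration only. The course text specifies the five-point scale for usability; applying the same scale to the manageability criteria is a further assumption of this sketch.

```python
from statistics import mean

# Criterion names follow the two tables above; the snake_case keys are assumed.
USABILITY = ("availability", "understandability", "stability", "accuracy",
             "timeliness", "completeness", "granularity")
MANAGEABILITY = ("origin_of_data", "ownership", "system_management", "usage")


def evaluate_source(ratings):
    """Summarize subjective 1-5 ratings for one prospective data source.

    Returns the mean usability rating and the mean manageability rating.
    An unweighted mean is an assumption of this sketch, not a course rule.
    """
    for criterion in USABILITY + MANAGEABILITY:
        score = ratings.get(criterion)
        if score is None:
            raise ValueError(f"missing rating for {criterion}")
        if not 1 <= score <= 5:
            raise ValueError(f"{criterion} rating must be between 1 and 5")
    return {
        "usability": mean(ratings[c] for c in USABILITY),
        "manageability": mean(ratings[c] for c in MANAGEABILITY),
    }
```

Scoring every candidate source the same way makes the sources comparable side by side, which is the purpose of an evaluation matrix. The judgment behind each individual rating remains subjective.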
Module 3 Data Integration Systems
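Topic ............................................... Page
Getting Data ........................................ 3-2
Transforming Data ................................... 3-14
Storing Data ........................................ 3-26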
Getting Data
Source-to-Target Data Element Mapping
[Slide: matrix mapping the data elements of the PlayNation Employee Table against the data elements of the E-Max Employee Table and the E-Max Benefits Participation Table]
PlayNation Employee Table: social_security_number, first_name, last_name, middle_initial, birthdate, gender, mailing_address, city, state, zip_code, home_phone_number, work_phone_number, emergency_contact_name, emergency_contact_phone_number, tax_status_federal, tax_exemptions_federal, tax_status_state, tax_exemptions_state, employment_date, annual_salary, health_insurance_enrolled_indicator, spouse_health_indicator, dependent_health_indicator, ESP_deduction_amount, profit_sharing_eligiblility_boolean, comments, local_field_1, local_field_2
E-Max Employee Table: employee_id, employee_name, date_of_birth, sex, address_line1, address_line2, city, state, zip_code, ethinc_origin_code, federal_tax_marital_status, federal_tax_number_of_exemptions, state_tax_marital_status, state_tax_number_of_exemptions, hire_date, separation_date, employment_status_code, employment_status_date, SSN
E-Max Benefits Participation Table: employee_id, benefit_program_code, participation_end_date, participation_begin_date, plan_code, plan_type, spouse_coverage_code, child_coverage_code, benefit_program_carrier_code, pct_to_investment_fund, amt_to_investment_fund
SAMPLE MATRIX
The matrix on the facing page illustrates an example of mapping source data to target data at the data element level. Data element mapping is not necessarily complex. It is just detailed and sometimes tedious. This level of mapping is necessary to understand requirements for migration of data from non-integrated to integrated data stores. This detailed level of mapping provides information that is essential before transformation design can begin. In this example we can see that:
• Some data elements have one-to-one associations and identical names (city, state, and zip_code for example). Do they share common formats and allowable values?
• Some data elements have one-to-one associations and similar but different names (sex / gender, date_of_birth / birthdate). Do they share common formats and allowable values?
• Some data elements have one-to-many associations (employee_name corresponds to first_name, last_name, and middle_initial). Clearly some kind of data transformation will be needed here.
• Some target data elements (plan_type, participation_end_date, participation_begin_date, plan_code) have no apparent data source. Will the data be manually populated? Is there another source? Is it acceptable not to collect this data?
• Some source data elements (phone numbers and emergency contact data from PlayNation) have no corresponding target. Will the data be lost? Should the target be modified?
• Some collections of data elements (spouse and children benefits coverage, for example) are organized in significantly different ways. Complex data transformations may be needed here.
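The observations above can be captured in a mapping specification. The sketch below is illustrative only: it treats PlayNation as the source and E-Max as the target, as the example implies, and covers just a few of the elements. The `MAPPING` structure, the helper functions, and the "Last, First M" name format are assumptions. The one-to-one entries use a pass-through transformation, which leaves open the questions about common formats and allowable values.

```python
def identity(value):
    """Pass a value through unchanged (format compatibility is not checked)."""
    return value


def join_name(first_name, middle_initial, last_name):
    """Build one employee_name value; the 'Last, First M' format is assumed."""
    name = f"{last_name}, {first_name}"
    return f"{name} {middle_initial}" if middle_initial else name


# Target element -> (source elements, transformation). Hypothetical structure.
MAPPING = {
    "city": (("city",), identity),
    "state": (("state",), identity),
    "zip_code": (("zip_code",), identity),
    "sex": (("gender",), identity),
    "date_of_birth": (("birthdate",), identity),
    "employee_name": (("first_name", "middle_initial", "last_name"), join_name),
}


def map_record(source_record, mapping=MAPPING):
    """Produce a target record from one source record using the mapping."""
    target = {}
    for target_element, (source_elements, transform) in mapping.items():
        arguments = [source_record[name] for name in source_elements]
        target[target_element] = transform(*arguments)
    return target


def unmapped_source_elements(source_record, mapping=MAPPING):
    """List source elements that no target element draws from."""
    used = {name for sources, _ in mapping.values() for name in sources}
    return sorted(set(source_record) - used)
```

Listing the unmapped source elements makes the "will the data be lost?" question explicit for each of them before transformation design begins.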
Getting Data
Data Capture Design Considerations
[Slide: two-by-two matrix of data capture approaches]
• ALL DATA, PUSH TO TARGET: replicate source files / tables
• ALL DATA, PULL FROM SOURCE: extract source files / tables
• CHANGED DATA, PUSH TO TARGET: replicate source changes or transactions
• CHANGED DATA, PULL FROM SOURCE: extract source changes or transactions
Slide annotations:
• Works well for one-time data conversion such as combining data from two systems, initial load of warehousing data, and start-up data for ERP implementation.
• Works well for ongoing data integration with small amounts of data.
• OK for ongoing data integration (e.g., data warehousing) when data volume is small and timeliness of data is not important.
• Works well for ongoing data integration when real-time data is desired. (This annotation appears twice on the slide.)
MATCHING TO NEEDS AND CONSTRAINTS
Data capture design seeks to get all of the data needed as efficiently as is practical, and to minimize impact on the source systems from which data is obtained. Some of the questions that help to design and develop an optimal data capture process are:
• What constraints does the source system impose? Source systems with limited batch processing time, or those that require 24x7 availability demand special consideration and careful design.
• Will data be captured from the source only one time, or will data capture be ongoing? One-time data capture processes typically consider simplicity, reliability, and speed of development to be more important than processing efficiency. An extract of all data from a source is often the most effective means of acquiring data.
• What volume of data is expected with each instance of data capture? Very large data volumes need special attention to efficiency of acquisition. Capturing only data changes is ideal when changes can reliably be detected. A source system capable of pushing changes may offer an ideal solution.
• Are all occurrences (rows/records) or only a subset needed? If only a subset is needed, then consider the percent of the total body of data that is needed. A small percentage suggests selecting as part of the extract process. A large percentage suggests selecting after the extract.
• Will capture of data changes meet the need for ongoing data capture, or is a full extract needed each time? Can changes be reliably detected in the source system? When changes can’t be detected with confidence, then comparing generations of full extracts may be required. Changes may still be lost, however, depending on the frequency of extract and the volatility of the data.
• Can the source system push data to the integration system, or must the data be pulled by the integration system? For particularly sensitive source systems, push is the best option whenever possible. A push approach allows the source system to control impact of data acquisition.
• What technology is used to store the source data? What technologies are available for data capture? Exploit the available technology to achieve rapidly developed and easy to maintain data acquisition processes. Consider available ETL tools, DBMS replication features, database transaction logs, etc.
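Comparing generations of full extracts, mentioned above as a fallback when changes cannot be detected in the source, can be sketched as follows. The function name `diff_extracts` and the representation of an extract as a dictionary keyed by a stable record identifier are assumptions for illustration.

```python
def diff_extracts(previous, current):
    """Compare two full extracts keyed by a stable record identifier.

    Each extract maps a record key to a dict of field values. Returns the
    keys that were inserted, updated, and deleted between the two extracts.
    """
    inserted = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(
        key for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    )
    return {"inserted": inserted, "updated": updated, "deleted": deleted}
```

A comparison like this sees only the net difference between two points in time. A value that changes and then changes back between extracts leaves no trace, which is why changes may still be lost when data is volatile or extracts are infrequent.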
Module 4 Data Quality
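Topic ............................................... Page
Data Quality Concepts ............................... 4-2
Data Correctness .................................... 4-6
Data Integrity ...................................... 4-32
Continuous Quality Improvement ...................... 4-60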
Data Correctness
Using Data Correctness Rules
[Slide: matrix of data correctness rule types by data cleansing action; each cell numbers one of the forty-four actions]
Rule type       detect   repair   correct   prevent
accuracy           1        2        3         4
completeness       5        6        7         8
balancing          9       10       11        12
consistency       13       14       15        16
precision         17       18       19        20
granularity       21       22       23        24
currency          25       26       27        28
duration          29       30       31        32
retention         33       34       35        36
continuity        37       38       39        40
precedence        41       42       43        44
Slide annotations:
• find defects: validate, verify, and inspect data
• replace bad data using alternate sources, defaults & derived values
• find and fix the root cause (usually process)
DATA CLEANSING ACTIONS
Data correctness defects exist whenever data is found to be in violation of correctness rules. Data cleansing is a process of taking action to remove defects of data quality. The four common kinds of actions are:
• Detection – Knowing when a defect exists.
• Repair – Fixing a defect in data that has already been delivered.
• Correction – Fixing a data quality defect before the data is delivered.
• Prevention – Fixing a process deficiency that allows defects to occur.
Eleven types of data correctness rules, when intersected with four kinds of data cleansing activities (detect, repair, correct, prevent), yield forty-four distinct actions that may be taken to improve data correctness.
DETECTING DATA QUALITY DEFECTS
Validation, verification, and inspection are the common techniques used to detect data quality defects. Validation tests data against expressed data quality rules. Verification tests data against other reliable sources (e.g., asking a customer to verify their address). Inspection conducts a thorough examination of data to discover properties that might not be found using validation and verification techniques. Where validation and verification assume known questions (e.g., business rules and alternative sources), inspection is a process of data-driven discovery where the questions aren’t necessarily known in advance.
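Validation, as described above, tests each record against rules that have been stated in advance. The sketch below illustrates the idea only; the rule names, field names, and the ZIP-code pattern are hypothetical and do not come from the course material.

```python
import re

# Hypothetical rules for illustration; real rules come from the business.
# Each rule returns True when the record satisfies it.
RULES = {
    "name_present": lambda r: bool(r.get("name")),
    "zip_format": lambda r: re.fullmatch(r"\d{5}(-\d{4})?", r.get("zip", "")) is not None,
    # ISO-formatted date strings (YYYY-MM-DD) compare correctly as text.
    "hire_not_before_birth": lambda r: r["birth_date"] <= r["hire_date"],
}


def validate(record):
    """Return the names of the rules that the record violates."""
    return [name for name, check in RULES.items() if not check(record)]


good = {"name": "Ann", "zip": "98101",
        "birth_date": "1980-02-01", "hire_date": "2005-06-15"}
bad = {"name": "", "zip": "9810",
       "birth_date": "1990-01-01", "hire_date": "1985-01-01"}

good_violations = validate(good)
bad_violations = validate(bad)
```

Validation of this kind can only find defects that a rule anticipates. Verification and inspection remain necessary for defects that no stated rule covers.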
Data Integrity Using Data Integrity Rules
Data integrity rules and data cleansing actions:
detect repair correct prevent
identity 1 2 3 4
cardinality 5 6 7 8
reference 9 10 11 12
inheritance 13 14 15 16
value set 17 18 19 20
relationship dependency 21 22 23 24
attribute dependency 25 26 27 28
detect: find defects by validating, verifying, and inspecting data
repair and correct: replace bad data using alternate sources, defaults & derived values
prevent: find and fix the root cause (usually process)
DATA CLEANSING ACTIONS
Data integrity defects exist whenever data is found to be in violation of integrity rules. Data cleansing is a process of taking action to remove defects of data quality. The four common kinds of actions are identical to those discussed for data correctness defects: detect, repair, correct, and prevent. Seven types of data integrity rules, when intersected with four kinds of data cleansing activities, yield twenty-eight distinct actions that may be taken to improve data integrity. When combined with the forty-four actions for data correctness, a total of seventy-two data cleansing actions are possible.
Data Correctness
detect repair correct prevent
accuracy 1 2 3 4
completeness 5 6 7 8
balancing 9 10 11 12
consistency 13 14 15 16
precision 17 18 19 20
granularity 21 22 23 24
currency 25 26 27 28
duration 29 30 31 32
retention 33 34 35 36
continuity 37 38 39 40
precedence 41 42 43 44
Data Integrity
detect repair correct prevent
identity defects 1 2 3 4
reference defects 5 6 7 8
cardinality defects 9 10 11 12
inheritance defects 13 14 15 16
value defects 17 18 19 20
relationship dependency defects 21 22 23 24
attribute dependency defects 25 26 27 28
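The numbering in the two tables above follows a simple pattern: each rule type is a row, each cleansing action is a column, and cells are numbered row by row. A short sketch of that arithmetic is given below. The function name `action_matrix` is an assumption for illustration, and the row order is taken from the tables as printed.

```python
ACTIONS = ["detect", "repair", "correct", "prevent"]

CORRECTNESS_RULES = [
    "accuracy", "completeness", "balancing", "consistency", "precision",
    "granularity", "currency", "duration", "retention", "continuity",
    "precedence",
]
INTEGRITY_RULES = [
    "identity", "reference", "cardinality", "inheritance", "value",
    "relationship dependency", "attribute dependency",
]


def action_matrix(rule_types):
    """Number each (rule type, action) pair row by row, starting at 1."""
    return {
        (rule, action): row * len(ACTIONS) + col + 1
        for row, rule in enumerate(rule_types)
        for col, action in enumerate(ACTIONS)
    }


correctness = action_matrix(CORRECTNESS_RULES)   # 11 rule types x 4 actions
integrity = action_matrix(INTEGRITY_RULES)       # 7 rule types x 4 actions
total_actions = len(correctness) + len(integrity)
```

Looking up `correctness[("currency", "correct")]` gives the cell number for correcting currency defects, which can serve as a compact reference when a data quality plan lists the actions it has selected.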
Continuous Quality Improvement Planning and Execution
[Figure: continuous quality improvement through planning and execution.]
Planning: Define the Scope; Assess the Current State; Set Quality Goals; Define Quality Measures; Analyze Gap between Current State & Goals; Identify Actions.
Data Quality Plan: Scope; Goals & Measures; Actions; Roles, Resources, Responsibilities; Schedule; Continuity.
Execution: Filter; Correct (defaults, derivations, alternates); Prevent (input); Audit; Measure and Monitor; Act.
PLANNED DATA QUALITY
Developing a plan for data cleansing includes the activities necessary to improve data quality, monitor achievement of quality goals, and evolve the data cleansing strategy. Data quality planning consumes time, effort, and resources – it is not free. Like most things, when done well, data quality strategy takes more effort to plan than to execute. The cost and effort of planning is supported by this simple truth: Good data quality is always the result of good planning. Only poor quality happens without planning. A comprehensive data quality plan includes:
Defined Scope addressing questions such as which data is within the scope of effort and which rule types are to be applied. While you might be inclined to say “all data and all rules,” practical constraints of time and resources may demand that the scope of effort be reduced.
Goals and Measures that express quantifiable objectives of the data cleansing plan. Goals typically quantify a defect rate – e.g., 99.5% accuracy or zero reference defects. Measures are needed to assess the current state and to evaluate progress toward meeting the plan’s goals.
Actions describe what steps will be taken to improve quality and achieve the planned goals. This course has identified seventy-two common actions for data cleansing. No plan is likely to include all of them. Is the plan to detect errors and audit data quality? To correct or repair defects? To prevent defects at the source?
Roles, Resources and Responsibilities are assigned to detect, correct, and prevent data quality defects, as well as to continuously measure and monitor.
Scheduling attaches a timeframe to the goals of the plan. Consider the relative priorities of data quality issues and dependencies among activities to develop a realistic timeline.
Continuity shifts data quality improvement from a project to an ongoing data management practice. Ideally, a data-cleansing plan seeks continuous improvement of data quality. Continuous quality improvement is achieved through regular planning, incremental improvements, and routine communication and feedback.
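To make the Goals and Measures element concrete, the sketch below computes a simple quality measure and compares it with a goal such as the 99.5% figure mentioned above. The function names, the sample rows, and the missing-postal-code check are hypothetical and serve only to show the arithmetic.

```python
def quality_measure(records, is_defective):
    """Return the percentage of records that are free of the given defect."""
    if not records:
        raise ValueError("cannot measure an empty data set")
    defects = sum(1 for r in records if is_defective(r))
    return 100.0 * (len(records) - defects) / len(records)


def gap_to_goal(measured_pct, goal_pct):
    """Return the percentage points still missing to reach the goal (0 if met)."""
    return max(0.0, goal_pct - measured_pct)


# Hypothetical sample: 1,000 employee rows, 12 of them with a missing postal code.
rows = [{"zip": "98101"}] * 988 + [{"zip": ""}] * 12

measured = quality_measure(rows, lambda r: not r["zip"])
gap = gap_to_goal(measured, goal_pct=99.5)
```

Here the measure assesses the current state (98.8% of rows have a postal code) and the gap shows how far that is from the goal. Repeating the same measurement after each round of cleansing gives the progress tracking that the plan calls for.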
TDWI Data Integration Basics Data Integration Roles
© The Data Warehousing Institute 5-1
Module 5 Data Integration Roles
Topic Page
Roles and Responsibilities 5-2
Understanding the Data 5-4
Getting the Data 5-10
Changing the Data 5-16
Storing the Data 5-22
Using the Data 5-28
In Conclusion 5-34
Roles and Responsibilities Overview
Roles and responsibilities matrix (each stage lists its work for Planning & Analysis; Design & Construction; Implementation & Execution):
• Understand the Data: identify, evaluate, and select data sources; explore the data for understanding and to identify business rules; profile the data to discover and verify business rules
• Get the Data: know the target and map source data to target data; design and build processes to capture source data; test, schedule, and execute data capture processing
• Change the Data: identify and specify data transformation rules and logic; design and build processes to transform the data; test, schedule, and execute transformation processing
• Store the Data: estimate volume and identify timing and security requirements; design and build processes to transport and load the data; test, schedule, and execute transport and load processing
• Use the Data: describe data uses, identify data quality goals and measures; design & deploy data access tools, build quality measurement; test and execute data access capabilities, manage data quality
TEAM EFFORT OF BUSINESS AND TECHNICAL SKILLS
Developing and operating data integration systems demands both business and technical knowledge. Understanding how data is used, what business rules apply, where and how it is collected, and the degree to which it is trusted are examples of needs where business knowledge is paramount. Knowledge of storage methods, data structures, database capabilities, etc. provides examples of needs where technical skills are critical.
ROLES AND RESPONSIBILITIES FRAMEWORK
The five stages of the data integration lifecycle – understand the data, get the data, change the data, store the data, and use the data – provide the foundation for defining a roles and responsibilities structure for data integration. When intersected with typical information systems lifecycle phases – planning, analysis, design, construction, implementation, and operation (or execution) – they yield a roles and responsibilities matrix as shown on the facing page. Note that the cells in the matrix do not represent roles or activities, but categories of work within which activities, roles, and responsibilities need to be identified.
Understanding the Data Planning and Analysis Roles
Data integration challenges by stage:
• Understand the Data: conflicting business definitions and terminology; different ways of identifying data; data overlap and inconsistency throughout business transaction systems; hidden meaning, missing data, and much more …
• Get the Data: deciding which data to use; mapping transaction data sources to integrated data targets; detecting and capturing data changes; timing and source data readiness, and much more …
• Change the Data: business rules for data transformation; auditing and improving data quality; connecting data from multiple and disparate transaction systems; delivering summary data without loss of detail, and much more …
• Store the Data: moving data securely over computer networks; fast and reliable transport for large amounts of data; freshness of data and timing of data loads; availability and “up time” vs. time required to load, and much more …
• Use the Data: access and navigation of the data; understanding contents of integrated data stores; quality, trust, and confidence; feedback and continuous improvement, and much more …
[Figure: the roles and responsibilities matrix with the Understand the Data, Planning & Analysis cell highlighted: identify, evaluate, and select data sources.]
Changing the Data Design and Construction Roles
ACTIVITIES
Design and construction activities of data transformation build the processes to actually change the data. These activities include:
• Identify rule dependencies to develop a modular design that executes interdependent rules in the correct sequence. Rule dependency exists when execution of a transformation rule is based upon the result of another rule.
• Design and build transformation modules that package a collection of interdependent rules as a single, executable computer procedure.
• Identify time dependencies to develop a process design that executes transformation modules in the correct sequence. Time dependency exists when one transformation rule must execute before another can be executed.
• Design and assemble transformation processes as a set of modules to be executed together in a specific sequence.
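One common way to turn identified rule dependencies into an execution sequence is a topological sort. The sketch below uses Python's standard `graphlib` module; the course material does not prescribe this technique, and the rule names are hypothetical payroll examples chosen for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical transformation rules. Each rule maps to the set of rules
# whose results it depends on.
rule_dependencies = {
    "standardize_currency": set(),
    "compute_gross_pay": {"standardize_currency"},
    "compute_deductions": {"compute_gross_pay"},
    "compute_net_pay": {"compute_gross_pay", "compute_deductions"},
}


def execution_order(dependencies):
    """Return one valid execution sequence for the rules.

    Raises graphlib.CycleError if the rules depend on each other in a circle.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(rule_dependencies)
```

In this example every rule depends on the one before it, so only one sequence is valid. With independent rules several sequences can be valid, and a circular dependency is reported as an error rather than silently producing a wrong order.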
ROLES AND RESPONSIBILITIES
Applying the roles and responsibilities model produces a result such as that shown below. Responsibility designations may differ for your organization and activities may need to be tailored to your specific project.
Activity Business IT
Identify Rule Dependencies Consult Decide
Design and Build Transformation Modules Inform Decide
Identify Time Dependencies Consult Decide
Design and Assemble Transformation Processes Inform Decide
Using the Data Implementation and Execution Roles
[Figure: data integration challenges by stage shown beside the roles and responsibilities matrix, with the Use the Data, Implementation & Execution cell highlighted: test and execute data access capabilities, manage data quality.]
ACTIVITIES
The value of integrated data is realized when the data is used to achieve positive business outcomes – executing the entire data-to-value chain. Usage activities include:
• Test operational features and functions to ensure that they work correctly and meet business needs. Formalize successful testing by documenting system acceptance.
• Test decision-support and analytic capabilities to ensure that they work correctly and meet business needs. Formalize successful testing by documenting system acceptance.
• Employ operational system capabilities to execute and record business transactions, to carry out day-to-day work, and to obtain data and information needed for operational activities.
• Employ decision-support and analytic capabilities to inform decision-making processes, analyze business outcomes, forecast business trends, and enlighten planning processes.
• Manage data quality by providing continuous feedback about the quality of the data, and by correcting business process issues that lead to data quality problems.
ROLES AND RESPONSIBILITIES
Applying the roles and responsibilities model produces a result such as that shown below. Responsibility designations may differ for your organization and activities may need to be tailored to your specific project.
Activity Business IT
Test Operational Features and Functions Decide Consult
Test Decision-Support and Analytic Capabilities Decide Consult
Employ Operational System Capabilities Decide Consult
Employ Decision-Support and Analytic Capabilities Decide Consult
Manage Data Quality Decide by Consensus
In Conclusion Best Practices for Data Integration Success
[Figure: best practices for data integration success, combining the process stages, the roles and responsibilities matrix, and a sample activity table.]
• Data integration is a process that starts with planning and ends with execution (planning, analysis, design, construction, implementation, execution).
• Every aspect from understanding to usage gets attention at each process stage (understanding, acquisition, transformation, storage, usage).
• Every activity has designated roles and responsibilities.
• Business and IT work together as a team to achieve data integration success.
Sample activity table from the figure:
Activity Business IT
Review the Target Data Model Inform Decide
Map Source Entities to Target Entities Decide Consult
Map Source Data Stores to Target Data Stores Consult Decide
Map Source Data Elements to Target Data Elements Consult Decide
PROCESS AND TEAMWORK
Four key elements make up successful data integration projects regardless of the reason for data integration:
• Data integration is managed as a process with six distinct stages – planning, analysis, design, construction, implementation, and execution.
• Each stage of the process has activities to focus on every aspect of data integration – understanding the data, getting the data, changing the data, storing the data, and using the data.
• Every activity has designated roles and responsibilities.
• Business and IT work together as a team to achieve successful data integration.
MAKING TEAMWORK WORK
To achieve real teamwork, every stakeholder in the data integration project, whether representing business or IT, must be able to fill multiple roles – sometimes with decision authority and sometimes in a consulting and advisory role. With clearly designated roles and responsibilities for each activity, teamwork is achieved when:
• Business has significant decision-making responsibility.
• IT has significant decision-making responsibility.
• Business has a consulting and advisory role in IT decisions.
• IT has a consulting and advisory role in business decisions.
• Critical decisions are made by consensus of business and IT.
A MODEL FOR INTEGRATION TEAMWORK
The following two pages summarize the set of activities discussed throughout this module and suggest typical designation of business and IT roles for each activity. Note that decision-making roles are divided between business and IT, and that each supports and advises the other in a consulting capacity as needed. This model is not presented as the “right way” for all integration projects. It may readily be adapted to your data integration project by adding activities unique to the project, removing activities not needed for the project, and adjusting responsibilities to fit the organization and culture in which the project will be performed. It is less important which roles and responsibilities are decided than that they are decided at the start of the project.
TDWI Data Integration Basics Basis of Course Examples
© The Data Warehousing Institute A-1
Appendix A
Basis of Course Examples
Topic Page
Scenario A-3
E-Max Systems A-4
PlayNation Systems A-6
E-Max Database A-8
E-Max Flat Files A-11
Scenario Overview of an Acquisition
Course Example – An Integration Problem: Scenario
E-Max is a consumer electronics retailer with sales outlets that include brick-and-mortar stores, an internet outlet, and catalog sales. E-Max acquires PlayNation, a small chain of electronic gaming stores clustered locally in a few regions throughout the US and Canada.
E-Max has a mature IT department that supports many operational systems and is in the early stages of building a data warehouse. PlayNation has an ad-hoc systems environment typical of small companies. Much of the data management is done locally by each regional office. Critical corporate systems for finance and payroll are operated by an external service bureau. Most internal data is stored in spreadsheets complemented by limited use of a Microsoft Access® database.
The most pressing data integration needs are related to workforce and payroll data. Compliance considerations, common paymaster requirements, and the move to an international workforce (with PlayNation’s Canada stores) drive E-Max to focus first on these areas.
After satisfying the urgent need to integrate workforce and payroll data, attention will turn to other operational systems and data warehousing.
E-Max Systems E-Max HRMS and Payroll
Course Example – An Integration Problem: E-Max HRMS and Payroll
HRMS Functions
• recruiting and hiring
• applicant tracking
• EEO/affirmative action reporting
• compensation management
• benefits administration
• position control
• employment records
• employee performance and training
Payroll Functions
• time reporting
• commission sales reporting
• deduction entry
• payroll calculation
• check reconciliation
• tax & benefits accounting
• employee payment (check & deposit)
• vendor/carrier payment
HRMS Data
• employee
• appointment
• job postings
• applicants
• position
• salary and wage
• benefits programs
• benefits enrollment
• personnel actions
• salary history
• employee performance history
• benefits participation history
Payroll Data
• employee (common with HRMS)
• appointment (common with HRMS)
• position (common with HRMS)
• funding distribution
• dollar balances
• employee deductions
• employer contributions
• payment history and audit trail
• direct deposit enrollment
• direct deposit transmittal
• time and commission transactions
• deduction history
E-Max Systems E-Max HR and Payroll Data
[Figure: entity-relationship diagram of the HRMS database and a list of payroll flat files. Relationship lines carry cardinalities of 0,1; 1,1; 0,n; and 1,n, but the pairing of cardinalities with relationships is not recoverable from this preview.]
HRMS Database entities: person, employee, retiree, applicant, appointment, appointment history, position, job title, job posting, department, staff allocation, personnel action, performance history, employee salary, employee wage, salary history, fiscal year salary history, detail salary history, bonus schedule, commission schedule, funding distribution, dollar balances, employee deduction, employer contribution, employee pymt. history, dir. deposit enrollment, benefits participation, benefits part. history, 401k participation, insurance participation, retirement program, investment program
Payroll System (flat files):
• vendor payment file
• direct deposit transmittal file
• time transactions file
• commission sales transaction file
• deduction history file