© 2010 IBM Corporation InfoSphere CDC To DataStage Integration Options


Information Management Software


Business Challenges Driving Real-Time Data Integration

...Without Impacting the Performance of Production Systems

• Yesterday’s data inadequate for inventory and purchasing decisions

• We need up-to-date information flowing between applications, so that a current version is always available

• Need to proactively monitor and respond to business changes

Dynamic Warehousing & Business Intelligence and Reporting

Real-time Event Detection

Data Synchronization and Replication


Accelerate capture and delivery of data changes for ETL optimization or event-driven data quality


• InfoSphere Change Data Capture provides low-impact, log-based changed data capture and rapid delivery of changes

• Direct integration with InfoSphere DataStage and InfoSphere QualityStage through flat files, direct connection, message queues, or staging tables

• Extremely low impact on sourcing for ETL processing into data warehouse

• Leverage existing data ETL and data cleansing investments

[Diagram: Change Data Capture delivers data changes for ETL and data cleansing from the source databases to IBM Information Server]


Differentiators and Benefits

Integrated with InfoSphere Information Server

Technology integrated to feed real-time changed data into InfoSphere Information Server

Extend existing InfoSphere Information Server functionality with real-time data feeds

High Performance

Optimized native, log-based change data capture without staging on the source

Fast and efficient; no additional hardware; no changes to databases/applications

Less invasive to data sources and network bandwidth than alternative solutions

Low impact to performance of source databases

Transactional Integrity

Fault tolerant architecture maintains consistency and recovery

Lower risk by ensuring data integrity

Breadth of Coverage

DB2 z/LUW/iSeries, Oracle, Sybase, SQL Server, Informix, IMS, VSAM, ADABAS, IDMS

Leverage existing investments


Four Different Integration Options

• Via Database Staging

• MQ Series Integration

• Flat File Integration

• Direct Connect

Greater flexibility to choose whichever option best fits your environment and business requirements


InfoSphere CDC & InfoSphere DataStage (ETL)

[Diagram: InfoSphere CDC ("CDC", continuous) reads the native database log and feeds IBM Information Server, where DataStage consumes the changes and runs the ETL load into the EDW. Other labels: Retail, Point Of Sale, Oracle.
Delivery paths shown: Staging Table, Message Queue, Direct Connect, Flat File.
Annotations shown: Out of the box (twice); DataStage DSX file format; TCP via DataStage operator.
EDW targets: Teradata, DB2, Oracle, SQL Server, Sybase…, including BalOp (ELT).]


CDC to DataStage Option 1: Database Staging

1. DataStage extracts data for initial load using standard ETL functions
2. CDC continuously captures changes made to source database
3. CDC continuously writes changes to a set of staging tables using Live Audit mappings
4. DataStage reads the changes from the staging tables, transforms and cleans the data as needed
5. Update target database with changes
6. Update internal tracking with last CDC bookmark processed

Ideal for:
• Low latency (minutes)
• High data volumes (thousands of rows per second)
• Any number of tables

[Diagram: source database, InfoSphere CDC, staging area, DS/QS job, target database; callouts 1 to 5 mark the steps above]
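The read, apply, and bookmark-tracking steps of Option 1 can be sketched in a few lines. This is only an illustration of the pattern, not how DataStage or CDC Live Audit tables are actually laid out: the `staging`, `target`, and `tracking` tables, their columns, and the integer bookmark are all hypothetical, and SQLite stands in for the real staging database.

```python
import sqlite3


def apply_staged_changes(conn):
    """Apply staged changes newer than the last bookmark, then advance it."""
    cur = conn.cursor()
    (last,) = cur.execute("SELECT last_bookmark FROM tracking").fetchone()
    rows = cur.execute(
        "SELECT bookmark, op, id, qty FROM staging "
        "WHERE bookmark > ? ORDER BY bookmark",
        (last,),
    ).fetchall()
    for bookmark, op, row_id, qty in rows:
        if op == "D":
            cur.execute("DELETE FROM target WHERE id = ?", (row_id,))
        else:  # 'I' (insert) or 'U' (update): upsert the latest image
            cur.execute(
                "INSERT OR REPLACE INTO target (id, qty) VALUES (?, ?)",
                (row_id, qty),
            )
        last = bookmark
    # Record the last bookmark processed so the next run starts after it.
    cur.execute("UPDATE tracking SET last_bookmark = ?", (last,))
    conn.commit()
    return len(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one staging table, one target table, one tracker.
    conn.executescript(
        """
        CREATE TABLE staging (bookmark INTEGER PRIMARY KEY, op TEXT,
                              id INTEGER, qty INTEGER);
        CREATE TABLE target (id INTEGER PRIMARY KEY, qty INTEGER);
        CREATE TABLE tracking (last_bookmark INTEGER);
        INSERT INTO tracking VALUES (0);
        INSERT INTO staging VALUES (1, 'I', 10, 5), (2, 'U', 10, 7),
                                   (3, 'I', 11, 1), (4, 'D', 11, NULL);
        """
    )
    first = apply_staged_changes(conn)   # processes all four staged rows
    second = apply_staged_changes(conn)  # nothing new after the bookmark
    rows = conn.execute("SELECT id, qty FROM target ORDER BY id").fetchall()
    return first, second, rows
```

Because the bookmark is advanced only after the changes are applied, a second run over the same staging rows finds nothing to do, which is the property the tracking step is there to provide.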


CDC to DataStage Option 2: MQ-Based Integration

1. DataStage extracts data for initial load using standard ETL functions
2. CDC continuously captures changes made to remote database
3. CDC continuously writes change messages to MQ via CDC event server target
4. DataStage (via MQ connector) processes messages and passes data off to downstream stages
5. Updates written to target database

Ideal for:
• Near real-time integration (seconds)
• Low data volumes (hundreds of changes per second)
• When infrastructure utilizes MQ Series

[Diagram: source database, InfoSphere CDC, MQ, DS/QS job, target database; callouts 1 to 5 mark the steps above]
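The message-consumption step of Option 2 might look roughly like the sketch below. Everything here is a stand-in: Python's in-process `queue.Queue` plays the role of the MQ queue, the JSON message layout (`op`, `key`, `row`) is invented for illustration and is not the CDC event server format, and a dictionary plays the role of the target table.

```python
import json
import queue


def drain_changes(mq, target):
    """Apply every queued change message to an in-memory target table."""
    applied = 0
    while True:
        try:
            raw = mq.get_nowait()
        except queue.Empty:
            return applied  # queue is drained for now
        msg = json.loads(raw)
        if msg["op"] == "DELETE":
            target.pop(msg["key"], None)
        else:  # INSERT or UPDATE: keep the latest row image
            target[msg["key"]] = msg["row"]
        applied += 1


def demo():
    mq = queue.Queue()
    # Hypothetical change messages, in commit order.
    messages = [
        {"op": "INSERT", "key": "A1", "row": {"price": 10}},
        {"op": "UPDATE", "key": "A1", "row": {"price": 12}},
        {"op": "INSERT", "key": "B2", "row": {"price": 3}},
        {"op": "DELETE", "key": "B2"},
    ]
    for m in messages:
        mq.put(json.dumps(m))
    target = {}
    count = drain_changes(mq, target)
    return count, target
```

Messages are applied strictly in arrival order, so the update to `A1` lands after its insert and the delete of `B2` removes the row that was just added.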


CDC to DataStage Option 3: File Based

1. DataStage extracts data for initial load using standard ETL functions or CDC can be used for refresh
2. CDC continuously captures changes made to source database
3. CDC DataStage writes one file per table and periodically hardens the files
4. DataStage reads the changes from the complete files
5. Update target database with changes

Ideal for:
• Medium latency (a few minutes or more between periodic batches)
• Very high data volumes requiring parallel loading
• Up to hundreds of tables

[Diagram: source database, InfoSphere CDC, File, DS/QS job, target database; callouts 1 to 5 mark the steps above]
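The key idea in Option 3 is that the consumer only touches files that have been hardened, never one that is still being written. A minimal sketch of that rule follows. The naming convention is an assumption made purely for illustration (`TABLE.sequence.csv` for complete files, an extra `.tmp` suffix for files still in progress), as is the comma-separated record layout; the actual file names and format produced by CDC are not described in the slides.

```python
import csv
import tempfile
from pathlib import Path


def read_hardened(directory):
    """Return {table: rows} from complete files, deleting each after reading."""
    changes = {}
    # Only names ending in .csv match; in-progress *.csv.tmp files are skipped.
    for path in sorted(Path(directory).glob("*.csv")):
        table = path.name.split(".")[0]
        with path.open(newline="") as fh:
            changes.setdefault(table, []).extend(csv.reader(fh))
        path.unlink()  # consumed files are removed so they are not re-read
    return changes


def demo():
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        # Two hardened files and one that is (hypothetically) still open.
        (root / "ORDERS.0001.csv").write_text("I,1,open\nU,1,shipped\n")
        (root / "ORDERS.0002.csv.tmp").write_text("I,2,open\n")
        (root / "ITEMS.0001.csv").write_text("I,7,widget\n")
        changes = read_hardened(root)
        remaining = sorted(p.name for p in root.iterdir())
    return changes, remaining
```

Since there is one file series per table, each table's files could be handed to a separate loader, which is what makes this option attractive for parallel loading of very high volumes.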


CDC to DataStage Option 4: Direct Connect

1. DataStage extracts data for initial load using standard ETL functions or CDC can be used for the refresh
2. CDC continuously captures changes made to source database and flows them over TCP/IP to the CDC Transaction Stage
3. CDC Transaction Stage passes data off to downstream stages
4. Updates target database with changed data. Bookmark persisted in the target database along with the client data to maintain end-to-end transactional integrity
5. Bookmark flows back to CDC source periodically, and at start of replication

Ideal for:
• Near real-time integration (seconds)
• Medium data volumes (hundreds to low thousands of rows per second)
• Fewer than 150 tables

Should not be used for targeting Netezza

[Diagram: source database with CDC; DataStage target containing a DS/QS job with the CDC Transaction Stage and Database Connector Stage; target database; numbered callouts mark the steps above]
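Step 4 of Option 4 relies on writing the bookmark in the same transaction as the client data, so the two can never disagree after a failure. The sketch below illustrates that idea only. SQLite stands in for the target database, and the `target` and `bookmark` tables, the integer bookmark position, and the `fail` switch used to simulate a crash are all hypothetical; it does not reproduce the CDC Transaction Stage itself.

```python
import sqlite3


def apply_batch(conn, changes, bookmark, fail=False):
    """Write changes and the bookmark in one transaction; roll back both on error."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for row_id, qty in changes:
                conn.execute(
                    "INSERT OR REPLACE INTO target (id, qty) VALUES (?, ?)",
                    (row_id, qty),
                )
            if fail:
                raise RuntimeError("simulated failure before commit")
            conn.execute("UPDATE bookmark SET position = ?", (bookmark,))
        return True
    except RuntimeError:
        return False


def restart_position(conn):
    """Position reported back to the source when replication (re)starts."""
    return conn.execute("SELECT position FROM bookmark").fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, qty INTEGER)")
    conn.execute("CREATE TABLE bookmark (position INTEGER)")
    conn.execute("INSERT INTO bookmark VALUES (0)")
    conn.commit()
    ok1 = apply_batch(conn, [(1, 5), (2, 9)], bookmark=100)
    ok2 = apply_batch(conn, [(3, 4)], bookmark=200, fail=True)
    rows = conn.execute("SELECT id, qty FROM target ORDER BY id").fetchall()
    return ok1, ok2, restart_position(conn), rows
```

After the simulated failure, neither the row from the second batch nor its bookmark is visible, so a restart from the stored position would resend exactly the changes that were lost.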


Questions?