Informatica DVO Overview

© 2007 Informatica. Company Confidential. Forward-looking information is based upon multiple assumptions and uncertainties and does not necessarily represent the company’s outlook. PowerCenter Data Validation Option Overview January 2011


PowerCenter Data Validation Option Overview
January 2011
Source to Target
Production to Development
ETL version upgrade
Currently, data validation is done manually by writing SQL scripts
Customers estimate that data validation should take about 30% of all hours spent on data integration
Most customers admit they do not do enough data validation, resulting in poorer data quality and higher project risk
PowerCenter upgrades take up to 6 months
It takes one day to upgrade the ETL software
Takes a long time and is expensive
Time is spent writing queries and waiting for them to run
Error-prone manual process
Time and cost pressure leads to a “try it here and there” approach
The tester runs out of time/money before testing is done
Usual problems associated with writing custom code
No audit trail
Tool built on top of Informatica PowerCenter
Users define data rules using an easy-to-understand GUI
Data is processed and evaluated using PowerCenter
Results are displayed in the GUI and stored for later retrieval and reporting
[Diagram: numbered steps 1-3 and a Results DB; 10 tables]
50 table pairs, 968 tests
Manual testing: 3 weeks
DVO: 1 week
Customer Example #2
PowerCenter Upgrade Testing
“We used DataValidator to compare 14 tables and about 30M rows (setup time and training someone included) in less than 5 hours. The largest of the tables was 94 columns. When I asked our QA people how long it would take them to run scripts and test this amount of data, they mentioned months between the two of them.”
Tom Kato
Significant cost savings, faster time to market
50% source-to-target testing
80% regression testing
90% upgrade testing
Ability to test all data, not just a small sample
Ability to test in heterogeneous environments
No need to know SQL
[Diagram: Validation with 3 outputs: MyLineID (Integer), MyCurrency (String), MyAmount (Decimal)]
F100 Manufacturing Company
Write hundreds of SQL statements to do simple comparisons
Move data into Access to do more complicated comparisons
Store results in a Word document
Run out of time, stop, go live
Medium-Sized High-Tech Company
Problem: Compare two 100+ field tables during an RDBMS upgrade
Write two SQL statements to pull 64,000 records from both tables
Import into Excel
Manually check data
[Chart: effort across Investigation, Testing, and Development; data validation estimates of 20-40% and 50-70%]
Ability to validate lookups
Rapid test generation through scripting or Excel spreadsheets
Ability to run a DVO test from a scheduler, workflow or any other process
Performing aggregations in the database instead of PowerCenter
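Pushing aggregation into the database means only a single summary row per table travels back to the validation tool, instead of every record. The sketch below illustrates that idea with Python's built-in `sqlite3` module. It is not DVO's implementation: the tables `src_orders` and `tgt_orders` are hypothetical, the columns reuse the MyLineID/MyCurrency/MyAmount names from the diagram above, and both tables sit in one in-memory database for simplicity.

```python
import sqlite3

# Hypothetical source and target tables, created in memory for the sketch.
# In a real project they would usually live in two different databases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (MyLineID INTEGER, MyCurrency TEXT, MyAmount REAL);
    CREATE TABLE tgt_orders (MyLineID INTEGER, MyCurrency TEXT, MyAmount REAL);
""")
conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                 [(1, "USD", 10.0), (2, "EUR", 20.5), (3, "USD", 5.25)])
conn.executemany("INSERT INTO tgt_orders VALUES (?, ?, ?)",
                 [(1, "USD", 10.0), (2, "EUR", 20.5)])  # line 3 was lost in the load

# Table names are interpolated into the SQL, so they must come from trusted config.
AGG_SQL = ("SELECT COUNT(*), SUM(MyAmount), MIN(MyLineID), MAX(MyLineID) "
           "FROM {table}")
AGG_NAMES = ("row_count", "sum_amount", "min_line_id", "max_line_id")

def aggregate(table: str) -> tuple:
    """Let the database compute the aggregates and fetch one summary row."""
    return conn.execute(AGG_SQL.format(table=table)).fetchone()

def compare_aggregates(source: str, target: str) -> dict:
    """Return {aggregate_name: (source_value, target_value)} for each mismatch."""
    pairs = zip(AGG_NAMES, aggregate(source), aggregate(target))
    return {name: (s, t) for name, s, t in pairs if s != t}

print(compare_aggregates("src_orders", "tgt_orders"))
```

Aggregate checks like these are cheap but coarse: they show that row 3 went missing somewhere, not which column of which row is wrong, so they complement rather than replace row-level comparisons.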
Step 2: Define the validation rules in DataValidator
Step 3: Run the rules to check that the data conforms to them
DataValidator creates and executes all tests via PowerCenter
Results are loaded into the DataValidator results database and displayed in DataValidator UI
Step 4: Examine the results and identify the source of inconsistencies in the ETL process
Step 5: Repeat for newly added records
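Steps 2 through 5 can be sketched as a small rule engine over in-memory rows. This is a hypothetical illustration, not DataValidator's API: the `Rule` class, the three example rules, and the idea of selecting "newly added records" by an increasing `MyLineID` are all assumptions made for the sketch. DVO itself generates and runs PowerCenter mappings and stores results in its own results database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, e.g. {"MyLineID": 1, "MyCurrency": "USD", "MyAmount": 10.0}

@dataclass
class Rule:
    name: str
    check: Callable[[Row], bool]  # returns True when the row satisfies the rule

# Step 2: define the validation rules (hypothetical examples).
RULES = [
    Rule("line_id_not_null", lambda r: r.get("MyLineID") is not None),
    Rule("currency_known", lambda r: r.get("MyCurrency") in {"USD", "EUR", "GBP"}),
    Rule("amount_non_negative",
         lambda r: isinstance(r.get("MyAmount"), (int, float)) and r["MyAmount"] >= 0),
]

def run_rules(rows, rules=RULES):
    """Step 3: evaluate every rule on every row; return (rule name, line ID) failures."""
    return [(rule.name, row.get("MyLineID"))
            for row in rows for rule in rules if not rule.check(row)]

def summarize(failures):
    """Step 4: count failures per rule to point at the faulty part of the ETL."""
    return Counter(rule_name for rule_name, _ in failures)

def new_rows(rows, last_checked_id):
    """Step 5: keep only records added since the last run (assumes increasing IDs)."""
    return [r for r in rows
            if r.get("MyLineID") is not None and r["MyLineID"] > last_checked_id]

rows = [
    {"MyLineID": 1, "MyCurrency": "USD", "MyAmount": 10.0},
    {"MyLineID": 2, "MyCurrency": "XYZ", "MyAmount": -3.0},
    {"MyLineID": 3, "MyCurrency": "EUR", "MyAmount": 7.5},
]
failures = run_rules(rows)
print(failures)
print(summarize(failures))
print(run_rules(new_rows(rows, last_checked_id=2)))
```

Keeping rules as named data rather than ad hoc SQL is what makes the results reportable and repeatable, which addresses the "no audit trail" problem listed earlier.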
the ETL Mappings?
Misunderstanding among business users, analysts and ETL developers
Source data that differs from what is expected
Example:
VP of Sales: “I want a field that shows increase over last year”
Business Analyst: (this_year - last_year) / last_year
ETL developer creates: (this_year - last_year) / this_year
Checking the mapping's input and output: the mapping appears correct
The data is not correct
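The year-over-year example can be made concrete. In the hypothetical sketch below, `spec_growth` encodes the analyst's formula and `etl_growth` the developer's. Testing the mapping against its own logic always passes; only recomputing the expected value from the specification exposes the error. With this_year = 120 and last_year = 100, the specification gives 0.2 while the mapping produces about 0.167.

```python
def spec_growth(this_year: float, last_year: float) -> float:
    # Analyst's specification: increase relative to last year.
    return (this_year - last_year) / last_year

def etl_growth(this_year: float, last_year: float) -> float:
    # What the ETL developer implemented: divides by this year instead.
    return (this_year - last_year) / this_year

def validate(rows, tolerance=1e-9):
    """Compare the ETL output with values recomputed from the specification."""
    mismatches = []
    for this_year, last_year in rows:
        expected = spec_growth(this_year, last_year)
        actual = etl_growth(this_year, last_year)
        if abs(expected - actual) > tolerance:
            mismatches.append((this_year, last_year, expected, actual))
    return mismatches

for mismatch in validate([(120.0, 100.0), (100.0, 100.0)]):
    print(mismatch)
```

Note that both formulas agree when there is no change (100 vs 100), so a test that happens to sample only flat rows would miss the bug. This is one reason testing all data, not just a small sample, matters.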
Requirements gathering and specification
Analysis of source data
Logical and physical modeling
Building the data movement and transformation layer (mappings or code)
Testing and validation
Maintenance and upgrades
Over the years, various tools have been introduced to improve and automate these tasks
ETL tools ~1990s
ER tools ~1990s
Create Table Pair
Command Line Interface
Ability to integrate DataValidator Tests into PowerCenter or any other Workflow
Ability to schedule DataValidator Tests
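A table pair compares a source table with a target table. The sketch below shows the core of a keyed, row-by-row comparison in plain Python, the kind of check the manual "pull records into Excel" approach attempts by hand. It is an assumption-laden illustration, not DVO's table-pair engine: `compare_table_pair` is a hypothetical function, rows are in-memory dicts, the key column is assumed unique, and columns that exist only in the target are ignored.

```python
def compare_table_pair(source_rows, target_rows, key):
    """Compare two tables row by row, matching records on a unique key column."""
    src = {row[key]: row for row in source_rows}  # assumes the key is unique
    tgt = {row[key]: row for row in target_rows}
    value_diffs = []
    for k in sorted(src.keys() & tgt.keys()):
        for col, src_val in src[k].items():  # columns only in the target are ignored
            tgt_val = tgt[k].get(col)
            if tgt_val != src_val:
                value_diffs.append((k, col, src_val, tgt_val))
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "value_diffs": value_diffs,
    }

source = [
    {"MyLineID": 1, "MyCurrency": "USD", "MyAmount": 10.0},
    {"MyLineID": 2, "MyCurrency": "EUR", "MyAmount": 20.5},
    {"MyLineID": 3, "MyCurrency": "USD", "MyAmount": 5.25},
]
target = [
    {"MyLineID": 1, "MyCurrency": "USD", "MyAmount": 10.0},
    {"MyLineID": 2, "MyCurrency": "EUR", "MyAmount": 21.0},
    {"MyLineID": 4, "MyCurrency": "GBP", "MyAmount": 1.0},
]
print(compare_table_pair(source, target, key="MyLineID"))
```

The report separates three failure kinds (rows lost, rows added, values changed), which usually point at different parts of an ETL process: filters or joins for the first two, transformation logic for the third.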