Lauri Pietarinen - What's Wrong with My Test Data?
My Background
• Tietokonepalvelu (Pension Insurance) 85-97
– Mainframe development in a PL/I and DL/I environment
– Support department 87-95
  • Maintenance of programming environment, DB2 training etc.
• AtBusiness Communications 97-04
– Internet applications
– Database design, DW implementations, Java programming, project management etc.
• Relational Consulting (own company) 04-
– Independent database consultant
– Specialising in test data management
• Lauri.pietarinen (at) relational-consulting.com
Customers
• Finland
– Ilmarinen (Insurance)
– Arek (Insurance)
– TietoEnator
– Area (Travel agency)
– many others…
• Sweden
– BGC
– Alecta
– SEB
Agenda
• Why is test data management important?
• Alternatives for populating test databases
• Technical issues involved
– what is needed (scope of data)?
– subsetting issues
– de-identifying
• Case: Pension Insurance Company in Sweden
Database Application Environments
[Figure: four environments (DEV, SYSTEST, ACCEPTANCE, PROD), each spanning DB2, z/OS, UNIX, Windows, Oracle, DB2 UDB, .NET, BizTalk and MQ]
Problems with test data
• Test data is not semantically valid
– errors in test programs have corrupted the database
– integrity over several systems
  • external interfaces!
• Test data is not comprehensive
– hard to build realistic test cases
• Test data cases are consumed
– contracts terminated and people declared dead
– "You can't step into the same river twice" (Herakleitos)
• Time goes into solving errors caused by faulty data; sometimes programs can't even be started
How to Populate the Test DB?
[Figure: ways to populate TEST from PROD: 100% copy, 5% extracts, SQL scripts, and a robot over the UI (e.g. QTP)]
How to Populate Test DB?
• Copy the full production volume into test
– + is comprehensive and intact
– + technically simple (can be done with standard tools)
– - heavy operation with big databases
– - test environment hard to use and maintain
– - ad hoc updates from production not possible
– - does not solve the problem of consumption and corruption
• Scripts
– + create non-existent cases
– + only need an SQL editor
– - lots of repetitive work
– - go out of date
• Extract subset from production
– + right data when needed
– + same technology can be used to manage the subsets
– - need to build home-made tools/scripts or purchase one
– - expert knowledge of database structure required
How to Extract?
• Home-made tools/scripts
– many organisations have such tools/scripts/programs
– effort needed to maintain them
  • often tied to one person (who will soon retire!)
• Generic products
– DataBee (Net 2000)
– Grid-Tools (Grid-Tools)
– Optim/Relational Tools (IBM)
– Data Express (Micro Focus)
Lots of issues still remain
• What is a test case?
– must define what is needed for the specific test
– customer, with orders or without?
– often simpler to extract a superset of tables
• Finding the right cases for your test
– green-haired left-handed midget
– maintain a library of keys and/or SQL scripts?
• Bookkeeping (is somebody else already using this case?)
• Integrity over applications
– external parties
– 3rd party software
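Bookkeeping can start as simply as a register of which case keys are reserved by whom. The sketch below is a minimal in-memory version; `CaseRegistry` and its methods are hypothetical, and a real register would live in a shared table so that every tester sees the same reservations.

```python
from datetime import date


class CaseRegistry:
    """Hypothetical bookkeeping of test cases: which tester holds which case key."""

    def __init__(self):
        self._owner = {}  # case key -> (tester, date reserved)

    def reserve(self, key, tester, on_date):
        """Reserve a case; refuse if another tester already holds it."""
        holder = self._owner.get(key)
        if holder is not None and holder[0] != tester:
            raise ValueError(f"case {key} is already used by {holder[0]}")
        self._owner[key] = (tester, on_date)

    def release(self, key, tester):
        """Return a case to the pool, but only if this tester holds it."""
        holder = self._owner.get(key)
        if holder is not None and holder[0] == tester:
            del self._owner[key]

    def free_cases(self, candidates):
        """Filter candidate keys (e.g. from a saved SQL query) to unreserved ones."""
        return [key for key in candidates if key not in self._owner]


registry = CaseRegistry()
registry.reserve("CUST-001", "anna", date(2008, 3, 26))
print(registry.free_cases(["CUST-001", "CUST-002"]))  # ['CUST-002']
```

Combining `free_cases()` with a library of saved key-finding queries covers both "finding the right cases" and "is somebody else already using this case?".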
Some Concepts (Optim)
• Extract
– start from a set of rows in a start table and extract all related rows from specified tables
– use RI or "soft relations" for navigation
• Extract File
– binary format file containing extracted data
• Insert
– add rows from an extract file into the database
• Delete
– delete rows that were extracted
• Compare
– compare two extracts and flag deleted, inserted and modified rows
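The Extract concept can be illustrated with a breadth-first walk over in-memory tables. This is a hypothetical sketch, not how Optim is implemented: `TABLES`, `RELATIONS` and `extract()` are invented for illustration. It only follows relations from parent to child, whereas a real extract usually also needs the parent rows that extracted rows refer to.

```python
from collections import deque

# Hypothetical in-memory tables: table name -> list of rows (dicts).
TABLES = {
    "customer": [{"cust_id": 1, "name": "A"}, {"cust_id": 2, "name": "B"}],
    "orders": [
        {"order_id": 10, "cust_id": 1},
        {"order_id": 11, "cust_id": 2},
        {"order_id": 12, "cust_id": 1},
    ],
    "order_line": [
        {"order_id": 10, "line_no": 1},
        {"order_id": 11, "line_no": 1},
        {"order_id": 12, "line_no": 1},
    ],
}

# Navigation paths (parent, parent column, child, child column). These can mirror
# declared RI constraints or "soft relations" the DBMS knows nothing about.
RELATIONS = [
    ("customer", "cust_id", "orders", "cust_id"),
    ("orders", "order_id", "order_line", "order_id"),
]


def extract(start_table, predicate):
    """Select start rows by predicate, then follow RELATIONS to collect related rows."""
    result = {name: [] for name in TABLES}
    queue = deque()
    for row in TABLES[start_table]:
        if predicate(row):
            result[start_table].append(row)
            queue.append((start_table, row))
    while queue:  # breadth-first walk from parent rows to their children
        table, row = queue.popleft()
        for parent, pcol, child, ccol in RELATIONS:
            if parent != table:
                continue
            for child_row in TABLES[child]:
                # Skip rows already collected via another path.
                if child_row[ccol] == row[pcol] and child_row not in result[child]:
                    result[child].append(child_row)
                    queue.append((child, child_row))
    return result


subset = extract("customer", lambda r: r["cust_id"] == 1)
print({t: len(rows) for t, rows in subset.items()})
# {'customer': 1, 'orders': 2, 'order_line': 2}
```

Because navigation is driven by the `RELATIONS` list rather than by the catalog, relations the database does not declare can be added just as easily as RI-backed ones.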
Subsetting Scenario
[Figure: case C4 is taken from a test database that also holds cases C2, C3, C5, C7, C8 and C9, and is run through a PROGRAM]
1 Extract before
2 Run program
3 Extract after
4 Compare
5 Delete
6 Insert original
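Steps 4 to 6 of the scenario can be sketched for a single table whose rows are identified by a key column. The functions `compare()` and `restore()` and the sample rows are hypothetical; the point is that the case can be checked against its "before" extract and then put back without touching other cases.

```python
def compare(before, after, key):
    """Step 4: flag deleted, inserted and modified rows between two extracts."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    return {
        "deleted": sorted(k for k in old if k not in new),
        "inserted": sorted(k for k in new if k not in old),
        "modified": sorted(k for k in old if k in new and old[k] != new[k]),
    }


def restore(table, before, after, key):
    """Steps 5-6: delete the case's rows as extracted afterwards, re-insert the originals."""
    touched = {row[key] for row in after}
    return [row for row in table if row[key] not in touched] + [dict(r) for r in before]


# Hypothetical rows for case C4 (key column "policy_id").
before = [{"policy_id": 1, "status": "active"}, {"policy_id": 2, "status": "active"}]
after = [{"policy_id": 1, "status": "terminated"}, {"policy_id": 3, "status": "active"}]
print(compare(before, after, "policy_id"))
# {'deleted': [2], 'inserted': [3], 'modified': [1]}

table = after + [{"policy_id": 9, "status": "active"}]  # row 9 belongs to another case
restored = restore(table, before, after, "policy_id")
```

After `restore()`, row 9 is untouched and case C4 is back in its original state, so the same case can be used again instead of being consumed.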
Impact on Program/DB Design
• Batch programs should be able to operate on subsets of cases
– so as not to consume and disturb the whole database!
– external parameters (e.g. list of customers) or other indicators
– new columns/tables in the database for subsetting?
• Soft Date
– don't get the date from the system, give it as a parameter
• Choice of identifiers
– surrogate keys/logical keys
• How identifiers are generated
– surrogate table
– sequence
– select max(key)+1 from table
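Two of these design points, running on a subset of cases and taking the date as a parameter, can be shown together in one small batch function. `run_billing_batch()` and its policy rows are hypothetical; only the parameter style reflects the slide.

```python
from datetime import date


def run_billing_batch(policies, run_date, customer_ids=None):
    """Hypothetical batch: return ids of policies due for billing on run_date.

    run_date is a parameter ("soft date") rather than date.today(), so a test can
    simulate any day; customer_ids, if given, limits the run to a subset of cases.
    """
    due = []
    for policy in policies:
        if customer_ids is not None and policy["customer_id"] not in customer_ids:
            continue  # leave everyone else's test cases untouched
        if policy["next_due"] <= run_date:
            due.append(policy["policy_id"])
    return due


policies = [
    {"policy_id": 101, "customer_id": 1, "next_due": date(2008, 4, 30)},
    {"policy_id": 102, "customer_id": 2, "next_due": date(2008, 4, 30)},
    {"policy_id": 103, "customer_id": 1, "next_due": date(2008, 6, 30)},
]
print(run_billing_batch(policies, date(2008, 4, 30), customer_ids={1}))  # [101]
```

Leaving `customer_ids` as `None` gives the normal production behaviour, so the subsetting parameter costs nothing outside test.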
Case: Company X
• X is a Swedish insurance company that specialises in labour pension insurance
• X recently renewed nearly all of its application portfolio
– billing, payouts, insurance, DW, actuary, extranet...
– went live April 1st, 2008
• Large project with a budget of about 100M€
– development time 6 years
– up to 150 persons involved in the project
Case: X
• Technical platforms: >5
• Kinds of DBMSs: 4
• Number of databases: ~20
• Number of tables: >1200
• Number of integration interfaces: ~100
• Number of batches: 150
• Number of online dialogs: 100
• Number of test cases: >1400
Case: X
[Figure: test data flow between PROD, a TEST DATA DB, DEV, SYSTEST and ACCEPTANCE; labels: 100%, 5% extracts, Fast Track (1-10 cases at a time), take/restore snapshot, QTP]
Case: X
Optim EXTRACT Process Report
Extract File : K87376.TDSRES.B54.S004.EAF.SEXT.XF
Access Definition : TDS.EAF.EXTRACT
Created by : Job K87376, using SQLID K87376 on DB2 Subsystem DB2P
Time Started : 2008-03-26 08.21.27
Time Finished : 2008-03-26 08.48.37
Process Options:
  Process Mode : Batch
  Retrieve Data using : DB2
  Limit Extract Rows : 40000000
  RowList : 'K87376.TDSRES.B54.S004.EAF.SEXT.PNS'
Total Number of Extract Tables : 112
Total Number of Extracted Rows : 6676868
Total Number of First Pass Start Table Rows : 117172
Start table rows: about 5% of a total of 2M persons
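A little arithmetic on the report puts the run in perspective. The rate and the per-person average below are derived from the printed figures, not reported by Optim, and the 2M person total is the slide's rounded number.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H.%M.%S"  # timestamp format used in the report
started = datetime.strptime("2008-03-26 08.21.27", FMT)
finished = datetime.strptime("2008-03-26 08.48.37", FMT)
elapsed_s = (finished - started).total_seconds()

extracted_rows = 6676868
start_rows = 117172          # first pass start table rows (persons)
total_persons = 2_000_000    # "a total of 2M persons", rounded on the slide

print(elapsed_s)                                   # 1630.0 (27 min 10 s)
print(round(extracted_rows / elapsed_s))           # 4096 rows per second
print(round(extracted_rows / start_rows))          # 57 rows per person
print(round(100 * start_rows / total_persons, 1))  # 5.9 (% of all persons)
```

So the subset is a little under 6% of persons, and each person drags along on average about 57 related rows across the 112 tables.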
X: Life Cycle Tests
• Test environment was loaded with one person at a time
– from a set of about 30 persons with different profiles
• 10 months' worth of batches were run at a rate of about 7 min per month
– batches used "soft date" to simulate time flow
• Before/after compares were made on the database
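A life-cycle run like this can be driven by a loop that hands each simulated month-end to the batch chain as its soft date. This sketch assumes month-end runs and a hypothetical start month of January 2008 (the slide does not say which months were simulated); `run_month()` is a placeholder for the real batches.

```python
import calendar
from datetime import date


def month_ends(year, month, count):
    """Yield the last day of `count` consecutive months, starting at year/month."""
    for _ in range(count):
        last_day = calendar.monthrange(year, month)[1]
        yield date(year, month, last_day)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def run_month(run_date, log):
    """Hypothetical stand-in for one month's batch chain, driven by the soft date."""
    log.append(run_date)


MINUTES_PER_MONTH = 7  # approximate figure from the slide

log = []
for soft_date in month_ends(2008, 1, 10):  # hypothetical start month
    run_month(soft_date, log)

print(log[0], log[1], log[-1])              # 2008-01-31 2008-02-29 2008-10-31
print(len(log) * MINUTES_PER_MONTH, "min")  # 70 min
```

Ten simulated months at about 7 minutes each is roughly 70 minutes of batch time per person loaded.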
De-identifying Sensitive Data
• Tightening regulation
• Outsourcing
– providing your contractors with good test data is essential
– however, security issues become important
De-identifying Issues
• How to de-identify
– use an algorithm to create a new id (soc. security nr)
– create a random id and save it in a lookup table
  • always use the same?
– use a random lookup table for names
• Issues
– propagating changes to foreign keys
– introducing company-wide schemes
– introducing extra-company schemes
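The second option, a random id saved in a lookup table, can be sketched as follows. `Pseudonymizer`, `deidentify()` and the column names are hypothetical. Because the same real id always maps to the same fake id, foreign keys in other tables keep matching; keeping the lookup table between refreshes is one answer to "always use the same?".

```python
import random


class Pseudonymizer:
    """Map real ids to random fake ids, remembering each mapping in a lookup table."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.lookup = {}    # real id -> fake id; this table is itself sensitive
        self._used = set()  # fake ids already handed out, to avoid collisions

    def fake_id(self, real_id):
        # Same input always gives the same output, so foreign keys still match.
        if real_id not in self.lookup:
            candidate = f"{self._rng.randrange(10**9):09d}"
            while candidate in self._used:
                candidate = f"{self._rng.randrange(10**9):09d}"
            self._used.add(candidate)
            self.lookup[real_id] = candidate
        return self.lookup[real_id]


def deidentify(rows, id_columns, pseudo):
    """Return copies of rows with every column in id_columns replaced."""
    return [
        {col: pseudo.fake_id(val) if col in id_columns else val for col, val in row.items()}
        for row in rows
    ]


pseudo = Pseudonymizer(seed=1)
persons = [{"ssn": "REAL-0001", "birth_year": 1950}]
policies = [{"policy_id": 7, "holder_ssn": "REAL-0001"}]
new_persons = deidentify(persons, {"ssn"}, pseudo)
new_policies = deidentify(policies, {"holder_ssn"}, pseudo)
print(new_persons[0]["ssn"] == new_policies[0]["holder_ssn"])  # True
```

Names could be handled the same way with a lookup table of replacement names. The lookup table links fake ids back to real persons, so it must be protected as carefully as production data.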
Create a TDM-System
• Wrap it up by building a Test Data Management System
– process for copying data from one environment to another
– automated system to minimize the manual work involved
– embed bookkeeping and de-identifying
– auditing and statistics "for free"