What is a Data Warehouse and How Do I Test It?


Uploaded by rtts on 21-Nov-2014


DESCRIPTION

ETL Testing: A primer for Testers on Data Warehouses, ETL, Business Intelligence and how to test them.

Are you hearing and reading about Big Data, Enterprise Data Warehouses (EDW), the ETL Process and Business Intelligence (BI)? The software markets for EDW and BI are quickly approaching $22 billion, according to Gartner, and Big Data is growing at an exponential pace. Are you being tasked to test these environments, or would you like to learn about them and be prepared for when you are asked to test them?

RTTS, the Software Quality Experts, provided this groundbreaking webinar, based upon our many years of experience in providing software quality solutions for more than 400 companies. You will learn the answers to the following questions:

• What is Big Data and what does it mean to me?
• What are the business reasons for building a Data Warehouse and for using Business Intelligence software?
• How do Data Warehouses, Business Intelligence tools and ETL work from a technical perspective?
• Who are the primary players in this software space?
• How do I test these environments?
• What tools should I use?

This video is geared towards:

• QA Testers
• Automated Test Engineers
• Quality Assurance Analysts
• Business Analysts
• Developers
• Project Managers

...and anyone else who (a) is new to the EDW space, (b) wants to be educated in the business and technical sides and (c) wants to understand how to test them.

TRANSCRIPT

Page 1: What is a Data Warehouse and How Do I Test It?

© 2011 Real-Time Technology Solutions, Inc. New York, Philadelphia, Atlanta. www.rtts.com

What is a Data Warehouse and How Do I Test It?

A primer for Testers on Data Warehouses, the ETL process and Business Intelligence and how to test them

Page 2: What is a Data Warehouse and How Do I Test It?

RTTS is the leading provider of software quality for critical business applications

Fast Facts
• Founded: 1996 (consulting firm)
• Locations: New York (HQ), Atlanta, Philly, Phoenix
• Geographic region: Americas, EMEA, APAC
• Customer profile: Fortune 1000
  o 350+ customers
  o 500+ projects
• Strategic Partners: HP, IBM, MSFT, Oracle
• RTTS’ Software: QuerySurge™, TOMOS ALM™

The Software Quality Experts

Page 3: What is a Data Warehouse and How Do I Test It?

Overview

• What is Big Data?
• What is a Data Warehouse?
  o About the ETL Process
  o The Data Warehouse marketplace
• What is Business Intelligence?
  o The architecture
  o The BI marketplace
• Testing the DW Architecture
  o Entry points
  o The Mapping document
  o Functional test implementation
  o Test Tools
• Testing BI
  o Functional test implementation
  o Performance Testing
• Data Warehouse Test Tool demo
• Q&A

Page 4: What is a Data Warehouse and How Do I Test It?

What is Big Data?

Page 5: What is a Data Warehouse and How Do I Test It?

Big data – defined as data with too much volume, velocity and variability to be handled by normal database architectures.

What is Big Data?

“The market for big data is $70 billion and growing by 15% a year.” - EMC COO Pat Gelsinger

Size: defined as 5 petabytes or more
• 1 petabyte = 1,000 terabytes
• 1,000 terabytes = 1,000,000 gigabytes
• 1,000,000 gigabytes = 1,000,000,000 megabytes
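The unit ladder above uses decimal (SI) units, where each step is a factor of 1,000. Here is a minimal Python sketch of that arithmetic. It assumes decimal rather than binary (1,024-based) units, and the function names are hypothetical.

```python
# Decimal (SI) storage units, smallest first: each step up is a factor of 1,000.
UNITS = ["megabytes", "gigabytes", "terabytes", "petabytes"]

def convert(amount, from_unit, to_unit):
    """Convert an amount between decimal storage units."""
    exponent = UNITS.index(from_unit) - UNITS.index(to_unit)
    return amount * 1000 ** exponent

def meets_big_data_size(petabytes):
    """Size threshold from the definition above: 5 petabytes or more."""
    return petabytes >= 5

print(convert(1, "petabytes", "terabytes"))   # 1000
print(convert(1, "petabytes", "megabytes"))   # 1000000000
print(meets_big_data_size(2.5))               # False
```

Size is only one part of the definition. Velocity and variability cannot be reduced to a threshold like this.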

Page 6: What is a Data Warehouse and How Do I Test It?

Big Data Impact

• Handles more than 1 million customer transactions every hour
  o data imported into databases that contain > 2.5 petabytes of data
  o the equivalent of 167 times the information contained in all the books in the US Library of Congress

Facebook handles 40 billion photos from its user base.

Google processes 1 Terabyte per hour

Twitter processes 85 million tweets per day

eBay processes 80 Terabytes per day

Others

Page 7: What is a Data Warehouse and How Do I Test It?

Requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.

Technologies include:
• massively parallel processing (MPP) databases
• data warehouses
• data-mining grids
• distributed file systems
• distributed databases
• cloud computing platforms
• the Internet
• scalable storage systems

Big Data Solutions

Page 8: What is a Data Warehouse and How Do I Test It?

What is a Data Warehouse?

Page 9: What is a Data Warehouse and How Do I Test It?

What is a Data Warehouse?

Data Warehouse
• Typically a relational database that is designed for query and analysis rather than for transaction processing
• A place where historical data is stored for archival, analysis and security purposes
• Contains either raw data or formatted data
• Combines data from multiple sources:
  o Sales
  o Salaries
  o Operational data
  o Human resource data
  o Inventory data
  o Web logs
  o Social networks
  o Internet text and docs
  o Other

(Diagram: source systems Legacy DB, CRM/ERP DB and Finance DB)

Page 10: What is a Data Warehouse and How Do I Test It?

Data Warehouse – Business Case

Why build a Data Warehouse?
• Data stored in operational systems (OLTP) is not easily accessible
• OLTP systems are not designed for end-user analysis
• The data in OLTP systems is constantly changing
• OLTP systems may lack historical data
• Diverse forms of data are stored on different platforms

Page 11: What is a Data Warehouse and How Do I Test It?

Data Warehouse – Business Case

The Data Warehouse Business Solution
• Collects data from different sources (other databases, files, web services, etc.)
• Integrates data into logical business areas
• Provides direct access to data with powerful reporting tools (BI)

Page 12: What is a Data Warehouse and How Do I Test It?

Data Warehouse – about the data

The Data Warehouse data

• Subject-oriented

• Integrated

• Non-volatile

• Time-variant
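Two of these properties, non-volatile and time-variant, are easiest to see in how a warehouse table is loaded. Below is a minimal illustration using Python's built-in sqlite3 module. The `sales_snapshot` table and the `load_snapshot` helper are hypothetical. A real warehouse would use its own DBMS and load tooling.

```python
import sqlite3

# Hypothetical subject-oriented table ("sales") in an in-memory warehouse.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE sales_snapshot (load_date TEXT, product TEXT, units INTEGER)")

def load_snapshot(conn, load_date, rows):
    """Non-volatile, time-variant load: append rows stamped with the load date
    instead of updating or deleting rows written by earlier loads."""
    conn.executemany(
        "INSERT INTO sales_snapshot (load_date, product, units) VALUES (?, ?, ?)",
        [(load_date, product, units) for product, units in rows],
    )
    conn.commit()

load_snapshot(dw, "2010-09-01", [("TV", 10), ("Radio", 4)])
load_snapshot(dw, "2010-09-02", [("TV", 12), ("Radio", 4)])

# Because history is kept, the warehouse can answer "how did this change over time?"
history = dw.execute(
    "SELECT load_date, units FROM sales_snapshot WHERE product = ? ORDER BY load_date",
    ("TV",),
).fetchall()
print(history)  # [('2010-09-01', 10), ('2010-09-02', 12)]
```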

Page 13: What is a Data Warehouse and How Do I Test It?

Data Warehouse – the ETL process

ETL = Extract, Transform, Load

Why ETL? The data warehouse needs to be loaded regularly (daily/weekly) so that it can serve its purpose of facilitating business analysis.

Extract – data is extracted from one or more OLTP systems and copied into the warehouse.

Transform – removing inconsistencies, adding missing fields, summarizing detailed data and deriving new fields to store calculated data.

Load – map the data and load it into the DW.

(Diagram: a stream of bits labelled DATA LOAD)
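The three ETL steps can be sketched in a few lines of Python with sqlite3. The `orders` source table, the `fact_orders` target table and their columns are hypothetical. The transform shows two of the jobs listed above: removing inconsistencies (untrimmed, mixed-case country codes) and deriving a calculated field (revenue).

```python
import sqlite3

# Hypothetical OLTP source containing inconsistent country codes.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (id INTEGER, country TEXT, qty INTEGER, unit_price REAL);
    INSERT INTO orders VALUES (1, ' us', 2, 10.0), (2, 'US', 1, 5.5), (3, 'de ', 3, 2.0);
""")

# Hypothetical target DW table with a derived (calculated) column.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE fact_orders (order_id INTEGER, country_code TEXT, qty INTEGER, revenue REAL)"
)

def extract(conn):
    """Extract: read rows from the source system."""
    return conn.execute("SELECT id, country, qty, unit_price FROM orders").fetchall()

def transform(rows):
    """Transform: clean inconsistent codes and derive revenue = qty * unit_price."""
    return [
        (order_id, country.strip().upper(), qty, qty * unit_price)
        for order_id, country, qty, unit_price in rows
    ]

def load(conn, rows):
    """Load: map the transformed rows onto the target columns and insert them."""
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

load(target, transform(extract(source)))
print(target.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall())
# [(1, 'US', 2, 20.0), (2, 'US', 1, 5.5), (3, 'DE', 3, 6.0)]
```

Each function is a natural test boundary. Checking the output of `transform` against the mapping rules is exactly the kind of test described later in this deck.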

Page 14: What is a Data Warehouse and How Do I Test It?

Data Warehouse – the ETL process

(Diagram: Source Data [Legacy DB, CRM/ERP DB, Finance DB] → ETL Process [Extract, Transform, Load] → Target DW)

Page 15: What is a Data Warehouse and How Do I Test It?

Data Warehouse – the marketplace

“The data warehousing market will see a compound annual growth rate of 11.5% through 2013 to reach a total of $13.2 billion in revenue.” - Consulting Specialist, The 451 Group

Data Warehouse size:
• Small data warehouses: < 5 TB
• Midsize data warehouses: 5 TB - 20 TB
• Large data warehouses: > 20 TB
- Analyst firm Gartner

Leaders in Data Warehouse Database Management Systems

- Analyst firm Gartner’s ‘Magic Quadrant for Data Warehouse Database Management Systems’
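The size bands above amount to a simple classification rule. Here is a minimal sketch in Python. The function name is hypothetical. The ranges quoted do not say which band an exact boundary falls into, so this sketch assumes that exactly 5 TB and exactly 20 TB are both "midsize".

```python
def classify_dw_size(terabytes):
    """Classify a data warehouse by size using the Gartner bands quoted above.
    Assumption: 5 TB and 20 TB exactly are treated as midsize."""
    if terabytes < 5:
        return "small"
    if terabytes <= 20:
        return "midsize"
    return "large"

for size in (2, 5, 20, 50):
    print(size, classify_dw_size(size))
```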

Page 16: What is a Data Warehouse and How Do I Test It?

Data Warehouse – the marketplace

Delivery Models
• Stand-alone DBMS software
• Cloud offerings
• Data warehouse appliances

Leading Appliance Makers

Page 17: What is a Data Warehouse and How Do I Test It?

Business Intelligence (BI)

Page 18: What is a Data Warehouse and How Do I Test It?

Business Intelligence (BI)

B.I. – What is it?
• Software applications used in spotting, digging out and analyzing business data
• Provides easy access to data for use in day-to-day operations and integrates data into logical business areas
• Provides historical, current and predictive views of business operations
• Made up of several related activities, including data mining, online analytical processing (OLAP), querying and reporting

Page 19: What is a Data Warehouse and How Do I Test It?

Business Intelligence (BI) - Who uses it?

Wal-Mart uses vast amounts of data and category analysis to dominate the industry.

Amazon and Yahoo follow a "test and learn" approach to business changes.

Hardee’s, Wendy’s, and T.G.I. Friday’s use BI to make strategic decisions.

Page 20: What is a Data Warehouse and How Do I Test It?

Business Intelligence (BI) & Data Marts

Data Mart: A database that has the same characteristics as a data warehouse, but is usually smaller and is focused on the data for one division or one workgroup within an enterprise.

A data mart typically holds aggregated data and some granular data. It is a subset of the DW and makes Business Intelligence reporting more efficient.

(Diagram: Source Data [Legacy DB, CRM/ERP DB, Finance DB] → ETL Process → Target DW → ETL Process → Data Mart)
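The DW-to-data-mart leg can be illustrated with a single aggregating query: filter the warehouse down to one division, then summarize it for reporting. The sketch uses sqlite3. The `dw_sales` table, its columns and the `mart_electronics_sales` table are hypothetical. A production mart would normally be built by the ETL tool rather than by hand.

```python
import sqlite3

# Hypothetical enterprise DW table holding granular sales for all divisions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dw_sales (division TEXT, region TEXT, sale_date TEXT, amount REAL);
    INSERT INTO dw_sales VALUES
        ('Electronics', 'East', '2010-09-01', 100.0),
        ('Electronics', 'East', '2010-09-02', 50.0),
        ('Electronics', 'West', '2010-09-01', 75.0),
        ('Apparel',     'East', '2010-09-01', 30.0);
""")

# Data mart for one division: a subset of the DW, pre-aggregated for BI reports.
conn.execute("""
    CREATE TABLE mart_electronics_sales AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS num_sales
    FROM dw_sales
    WHERE division = 'Electronics'
    GROUP BY region
""")

print(conn.execute("SELECT * FROM mart_electronics_sales ORDER BY region").fetchall())
# [('East', 150.0, 2), ('West', 75.0, 1)]
```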

Page 21: What is a Data Warehouse and How Do I Test It?

Business Intelligence (BI)

(Diagram: Source Data [Legacy DB, CRM/ERP DB, Finance DB] → ETL Process → Target DW → ETL Process → Data Mart)

Page 22: What is a Data Warehouse and How Do I Test It?

B.I.– the marketplace

“Worldwide business intelligence (BI) platform, analytic applications and performance management (PM) software revenue reached $10.5 billion in 2010, a 13.4 percent increase from 2009 revenue of $9.3 billion”

“The four large "stack" vendors (SAP, Oracle, IBM and Microsoft) continue to consolidate the market, owning 59 percent of the market share.”

- Analyst firm Gartner

Leaders in BI

- Analyst firm Forrester Research’s ‘Forrester Wave’

Page 23: What is a Data Warehouse and How Do I Test It?

Testing a Data Warehouse Architecture

Page 24: What is a Data Warehouse and How Do I Test It?

Resources involved

• Business Analysts create requirements

• QA Testers develop and execute test plans and test cases. Skill set required: very strong SQL!

• Architects set up test environments

• Developers perform unit tests

• DBAs test for performance and stress

• Business Users perform functional User Acceptance Tests

Testing a DW – Resources Involved

For the purposes of this presentation, we will focus on a strategy for Testers.

Page 25: What is a Data Warehouse and How Do I Test It?

An effective data warehouse testing strategy focuses on the main structures within the data warehouse architecture:

1) The Sources
2) The ETL layer
3) The data warehouse itself
4) The front-end (BI) data warehouse applications

Testing the Data Warehouse

Page 26: What is a Data Warehouse and How Do I Test It?

Testing the Data Warehouse - Entry Points

Recommended functional test strategy: Test every entry point in the system (feeds, databases, internal messaging, front-end transactions).

The goal: provide rapid localization of data issues between points

(Diagram: Source Data [Legacy DB, CRM/ERP DB, Finance DB] → ETL Process → Target DW → ETL Process → Data Mart → BI, with test entry points marked along the flow)
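To see why testing every entry point localizes issues, imagine capturing one control total (for example, the sum of an amount column, in cents) with a query at each entry point. The first leg where two neighbouring totals disagree is where to look. This Python sketch illustrates that idea with hypothetical stage names and numbers. It is not a complete test strategy. It assumes the control total should be preserved across every leg, which a real mapping document may not guarantee.

```python
def first_failing_leg(control_totals):
    """Given (entry_point, control_total) pairs in pipeline order, return the
    first leg (upstream, downstream) whose totals disagree, or None."""
    for (up, up_total), (down, down_total) in zip(control_totals, control_totals[1:]):
        if up_total != down_total:
            return (up, down)
    return None

# Hypothetical control totals gathered by a query at each test entry point.
totals = [
    ("Source Data", 1_250_000),
    ("Target DW", 1_250_000),
    ("Data Mart", 1_249_100),
    ("BI report", 1_249_100),
]
print(first_failing_leg(totals))  # ('Target DW', 'Data Mart')
```

Testing only the BI report against the source would show that something is wrong. Checking each entry point shows that the defect is in the DW-to-data-mart ETL.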

Page 27: What is a Data Warehouse and How Do I Test It?

Testing the Data Warehouse - Entry Points

(Diagram, "Possible architecture": Source Data [Legacy DB, CRM/ERP DB, Finance DB, files], a Staging DB, the Target DW, Data Marts and BI applications, connected by multiple ETL processes, with test entry points between the stages)

Page 28: What is a Data Warehouse and How Do I Test It?

Testing the DW – Mapping Document

a.k.a. Source to Target Map

It’s the critical element required to efficiently plan the ETL process.

Intention: capture business rules, data flow mapping and data movement requirements.

Mapping Doc specifies:
• Source input definition
• Target/output details
• Business & data transformation rules
• Absolute data quality requirements
• Optional data quality requirements
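A mapping document is essentially a table of rules, and keeping it in a machine-readable form makes it easier to turn each rule into a test. Here is a minimal Python sketch of one such representation. The `MappingRule` class is hypothetical. The table and column names are borrowed from the example queries on the next slide, and the choice of which rules are "absolute" versus "optional" is illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MappingRule:
    """One row of a hypothetical source-to-target mapping document."""
    source: str          # source input definition
    target: str          # target/output column
    transformation: str  # business & data transformation rule
    required: bool       # True = absolute, False = optional data quality requirement

mapping_doc = [
    MappingRule("Sales.Customer.idCustomer", "dw.user_.idUser", "copy as-is", True),
    MappingRule("Sales.Customer.lastName", "dw.user_.lastName", "copy as-is", True),
    MappingRule("Sales.Customer.firstName", "dw.user_.firstName", "copy as-is", False),
    MappingRule(
        "Sales.OrderStatus.idOrderStatus",
        "dw.PurchaseStatus.status",
        "5 with refund date -> 'Returned'; 3 or 4 with ship date -> 'Delivered'; "
        "otherwise 'Processing'",
        True,
    ),
]

# Each rule should end up covered by at least one source/target query pair.
for rule in mapping_doc:
    kind = "absolute" if rule.required else "optional"
    print(f"[{kind}] {rule.source} -> {rule.target}: {rule.transformation}")
```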

Page 29: What is a Data Warehouse and How Do I Test It?

Testing the DW – Mapping Document

Source:

SELECT c.idCustomer "Customer ID",
       c.lastName "Customer Last Name",
       c.firstName "Customer First Name",
       o.idOrder "Order Number",
       p.name "Product Name",
       op.quantity "Quantity Ordered",
       CASE WHEN os.idOrderStatus = 5 AND o.refundDate IS NOT NULL THEN 'Returned'
            WHEN (os.idOrderStatus = 3 OR os.idOrderStatus = 4) AND o.shipDate IS NOT NULL THEN 'Delivered'
            ELSE 'Processing'
       END "Order Status"
FROM Sales.Orders o, Sales.OrderStatus os, Sales.OrderProduct op,
     Sales.Product p, Sales.Category cat, Sales.Customer c
WHERE o.order_idOrderStatus = os.idorderstatus AND
      op.orderProduct_idOrder = o.idOrder AND
      op.orderProduct_idProduct = p.idProduct AND
      p.product_idCategory = cat.idCategory AND
      cat.name = 'Electronics' AND
      o.order_idCustomer = c.idCustomer AND
      o.orderDate BETWEEN '01-SEP-10' AND '07-SEP-10'
ORDER BY c.idCustomer, c.lastName, c.firstName, o.idorder

Target:

SELECT u.idUser "Customer ID",
       u.lastName "Customer Last Name",
       u.firstName "Customer First Name",
       p.idPurchase "Purchase Number",
       i.name "Item Name",
       oi.quantity "Quantity Ordered",
       ps.status "Purchase Status"
FROM dw.Purchase p, dw.PurchaseStatus ps, dw.OrderItem oi,
     dw.Item i, dw.user_ u, dw.category cat
WHERE p.purchase_idPurchaseStatus = ps.idPurchaseStatus AND
      oi.orderItem_idPurchase = p.idPurchase AND
      oi.orderItem_idItem = i.idItem AND
      p.purchase_idUser = u.idUser AND
      i.item_idCategory = cat.idCategory AND
      cat.name = 'Electronics' AND
      SUBSTR(p.purchaseDate, 1, 5) BETWEEN '09-01' AND '09-07' AND
      SUBSTR(p.purchaseDate, -2) = '10'
ORDER BY u.idUser, u.lastname, u.firstname, p.idpurchase
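To show how a source/target pair like this is used, here is a scaled-down sketch using Python's sqlite3 module. It uses hypothetical single-table stand-ins for the source and target (`src_orders`, `tgt_purchase`). The source query re-applies the "Order Status" CASE rule from the source query above. The target query reads what the ETL loaded, and the two result sets are compared row by row. Both queries order by the key, so pairing rows by position is valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified stand-ins for the source and target tables.
    CREATE TABLE src_orders (idOrder INTEGER, idOrderStatus INTEGER,
                             refundDate TEXT, shipDate TEXT);
    INSERT INTO src_orders VALUES
        (1, 5, '2010-09-05', '2010-09-02'),
        (2, 3, NULL, '2010-09-03'),
        (3, 4, NULL, NULL);
    CREATE TABLE tgt_purchase (idPurchase INTEGER, status TEXT);
    INSERT INTO tgt_purchase VALUES (1, 'Returned'), (2, 'Delivered'), (3, 'Delivered');
""")

# Source side: re-apply the mapping rule (the "Order Status" CASE expression).
source_sql = """
    SELECT idOrder,
           CASE WHEN idOrderStatus = 5 AND refundDate IS NOT NULL THEN 'Returned'
                WHEN idOrderStatus IN (3, 4) AND shipDate IS NOT NULL THEN 'Delivered'
                ELSE 'Processing'
           END
    FROM src_orders
    ORDER BY idOrder
"""
# Target side: read what the ETL actually loaded.
target_sql = "SELECT idPurchase, status FROM tgt_purchase ORDER BY idPurchase"

expected = conn.execute(source_sql).fetchall()
actual = conn.execute(target_sql).fetchall()
row_count_ok = len(expected) == len(actual)
mismatches = [(e, a) for e, a in zip(expected, actual) if e != a]
print(row_count_ok, mismatches)  # True [((3, 'Processing'), (3, 'Delivered'))]
```

Order 3 has status 4 but no ship date, so the rule says "Processing". The target holds "Delivered", which flags a transformation defect.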

Page 30: What is a Data Warehouse and How Do I Test It?

Testing the DW – Implementation

Implementation of Functional Test

What is going on in the marketplace?

1. Manual Execution

2. Automated execution with standard test tools

3. Bulk automation with DW Test Tool

Page 31: What is a Data Warehouse and How Do I Test It?

Testing the DW – Manual Testing Flow

(Diagram: tasks and tools along a timeline)
1. Review mapping docs
2. Write SQL in favorite editor
3. Run tests
4. Dump results to a file
5. Compare results manually or with a compare tool
6. Report defects and issues

Page 32: What is a Data Warehouse and How Do I Test It?

Testing the DW – Manual Testing Flow

Manual ETL Testing Flow Comments
• Place checkpoints across each leg so that each transformation is checked.
• If a file compare tool is used, care must be taken to ensure that the result rows for each query are in the same order (the database is under no obligation to return rows in a specified order unless the SQL includes an ORDER BY clause).
• This process can quickly result in hundreds or thousands of pairs of queries.
• Only a very small sample of the data can be verified this way.
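One way to sidestep the row-order problem is to compare result sets as multisets rather than line by line. This is a swapped-in technique, not one prescribed by the slides. Here is a minimal Python sketch using `collections.Counter`, with hypothetical rows. Duplicate rows are still counted, so a row loaded twice is reported as an extra.

```python
from collections import Counter

def diff_result_sets(source_rows, target_rows):
    """Compare two result sets as multisets, so row order does not matter.
    Returns (rows missing from the target, unexpected extra rows in the target)."""
    source_counts = Counter(map(tuple, source_rows))
    target_counts = Counter(map(tuple, target_rows))
    missing = source_counts - target_counts
    extra = target_counts - source_counts
    return sorted(missing.elements()), sorted(extra.elements())

source = [(1, "Smith", "TV"), (1, "Smith", "Radio"), (2, "Jones", "TV")]
target = [(2, "Jones", "TV"), (1, "Smith", "TV"), (1, "Smith", "Radio")]
print(diff_result_sets(source, target))  # ([], [])

target_bad = [(2, "Jones", "TV"), (1, "Smith", "TV"), (1, "Smith", "TV")]
print(diff_result_sets(source, target_bad))
# ([(1, 'Smith', 'Radio')], [(1, 'Smith', 'TV')])
```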

Page 33: What is a Data Warehouse and How Do I Test It?

Functional Automation ETL Testing flow

1. As in the manual flow, extract mappings from the mapping document

2. Write pairs of queries that test between any two points in the architecture.

3. Issue the queries via a Functional Automation tool

4. Have the functional scripts dump the query result sets to files

5. Compare the files, either by writing automation code or by using a file compare tool.

This process is dependent on the speed of the automation tool; only a fraction of the data can be covered per ETL per build.

Functional Tester

Testing the DW – Automated Testing Flow
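Steps 2 to 5 of this flow can be sketched as a small Python harness. The sketch is minimal and hypothetical. sqlite3 stands in for the real source and target databases. The standard library's `csv` and `filecmp` modules stand in for the functional automation tool and the file compare tool. Rows are sorted before dumping so the file compare does not depend on database row order.

```python
import csv
import filecmp
import os
import sqlite3
import tempfile

def dump_to_csv(conn, sql, path):
    """Step 4: dump a query's result set to a CSV file. Rows are sorted here
    so the file compare does not depend on the order the database returns."""
    rows = sorted(conn.execute(sql).fetchall())
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

def run_pair(conn, name, source_sql, target_sql, workdir):
    """Steps 3-5: issue both queries, dump them, and compare the two files.
    Each pair gets its own file names so filecmp's cache never sees a reused path."""
    src_path = os.path.join(workdir, f"{name}_source.csv")
    tgt_path = os.path.join(workdir, f"{name}_target.csv")
    dump_to_csv(conn, source_sql, src_path)
    dump_to_csv(conn, target_sql, tgt_path)
    return filecmp.cmp(src_path, tgt_path, shallow=False)

# Toy database standing in for the real source and target systems.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b');
    INSERT INTO tgt VALUES (2, 'b'), (1, 'a');
""")

# Step 2: hypothetical query pairs, keyed by a test name.
query_pairs = {
    "names": ("SELECT id, name FROM src", "SELECT id, name FROM tgt"),
    "ids": ("SELECT id FROM src", "SELECT id FROM tgt WHERE id > 1"),
}

with tempfile.TemporaryDirectory() as workdir:
    results = {name: run_pair(conn, name, s, t, workdir)
               for name, (s, t) in query_pairs.items()}
print(results)  # {'names': True, 'ids': False}
```

Sorting in Python fails if a column mixes NULLs with other values. In that case, an ORDER BY in both queries is the simpler fix.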

Page 34: What is a Data Warehouse and How Do I Test It?

(Diagram: paired SQL (source) and SQL (target) queries issued by the DW test tool at points across the architecture, starting from Legacy DB, CRM/ERP DB and Finance DB)

Testing the DW – DW Test Tool

Page 35: What is a Data Warehouse and How Do I Test It?

Data Warehouse Test Automation tool

• Performs bulk verification of up to 100% of all data

• Provides a huge increase in coverage and verification of your data

• Tremendously decreases your testing time and costs (i.e., huge ROI)

Testing the DW – DW Test Tool
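One common way to check every row rather than a sample is to push the comparison into the database with set-difference queries (EXCEPT, or MINUS in Oracle). This is a generic technique, shown here with sqlite3 and hypothetical tables. It is not a description of how QuerySurge or any other DW test tool works internally. For simplicity, both tables live in one database here. In practice the source and target usually sit on different servers, and a tool has to fetch both sides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_customer (id INTEGER, last_name TEXT);
    CREATE TABLE target_user (id INTEGER, last_name TEXT);
    INSERT INTO source_customer VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Brown');
    INSERT INTO target_user VALUES (1, 'Smith'), (2, 'JONES');
""")

source_sql = "SELECT id, last_name FROM source_customer"
target_sql = "SELECT id, last_name FROM target_user"

# Rows that did not arrive (correctly) in the target, and rows that should not be there.
# EXCEPT has set semantics, so duplicate rows are not counted separately.
missing_in_target = conn.execute(f"{source_sql} EXCEPT {target_sql} ORDER BY 1").fetchall()
extra_in_target = conn.execute(f"{target_sql} EXCEPT {source_sql} ORDER BY 1").fetchall()

print(missing_in_target)  # [(2, 'Jones'), (3, 'Brown')]
print(extra_in_target)    # [(2, 'JONES')]
```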

Page 36: What is a Data Warehouse and How Do I Test It?

Testing the DW – Functional Test of BI

Functional Testing of BI

1. Extract mappings from mapping doc for the data mart

2. Execute reports

3. Verify that data is correct:
   o Verify against the source
   o Verify field lengths and field-level data
   o Verify logical dependencies of fields

Functional Tester

Automation tools can and should be used for regression purposes.
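The field-level checks in step 3 lend themselves to automation. Here is a minimal Python sketch over hypothetical report rows. The field names and the length limits in `MAX_LENGTHS` are assumptions, standing in for whatever the data mart mapping document specifies. The logical-dependency check reuses the rule from the example source query: an order is only "Delivered" if it has a ship date.

```python
# Hypothetical report rows, e.g. exported from a BI report as dictionaries.
report_rows = [
    {"customer_last_name": "Smith", "order_status": "Delivered", "ship_date": "2010-09-03"},
    {"customer_last_name": "Jones", "order_status": "Delivered", "ship_date": None},
    {"customer_last_name": "X" * 60, "order_status": "Processing", "ship_date": None},
]

# Assumed field-length limits, as a mapping document might specify them.
MAX_LENGTHS = {"customer_last_name": 50, "order_status": 10}

def check_row(row):
    """Return a list of problems found in one report row."""
    problems = []
    # Field lengths and field-level data.
    for field, max_len in MAX_LENGTHS.items():
        value = row[field]
        if value is None or len(value) > max_len:
            problems.append(f"{field} missing or longer than {max_len}")
    # Logical dependency between fields: 'Delivered' implies a ship date.
    if row["order_status"] == "Delivered" and row["ship_date"] is None:
        problems.append("Delivered order has no ship_date")
    return problems

for i, row in enumerate(report_rows):
    print(i, check_row(row))
```

Because the checks are plain functions, the same script can be rerun after every build as a regression suite.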

Page 37: What is a Data Warehouse and How Do I Test It?

For Business Intelligence (BI) applications, performance requirements must be met during batch report execution and normal user activity.


Since most BI applications are customized to meet the specific business requirements and data model of the organization, it is risky to rely on the initial performance testing done by the software vendor prior to its release.

It is therefore a common practice to test the performance of BI applications before their initial deployment and before any major system updates and upgrades.

Testing the DW – Performance Test of BI

Performance Tester
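At its simplest, checking a BI performance requirement means timing a report query repeatedly and comparing a percentile against a target. The Python sketch below does this for a single user against a toy sqlite3 table. The 2-second 95th-percentile requirement is a hypothetical service level. A real BI performance test would drive concurrent virtual users through the BI front end with a load testing tool, which this sketch does not attempt.

```python
import sqlite3
import statistics
import time

def time_report(conn, sql, runs=20):
    """Run a report query several times and return elapsed times in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch everything, as a report would
        timings.append(time.perf_counter() - start)
    return timings

def meets_requirement(timings, max_p95_seconds):
    """Compare the 95th-percentile response time against the requirement."""
    p95 = statistics.quantiles(timings, n=20)[-1]
    return p95 <= max_p95_seconds, p95

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East" if i % 2 else "West", float(i)) for i in range(10_000)])

timings = time_report(conn, "SELECT region, SUM(amount) FROM sales GROUP BY region")
ok, p95 = meets_requirement(timings, max_p95_seconds=2.0)  # hypothetical 2-second target
print(f"p95 = {p95:.4f}s, requirement met: {ok}")
```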

Page 38: What is a Data Warehouse and How Do I Test It?

QuerySurge™ DEMONSTRATION

Automating ETL Testing

DEMO

Page 39: What is a Data Warehouse and How Do I Test It?

Please visit www.querysurge.com for more information.

Thank you!