Copyright 2007, Information Builders. Slide 1 How well do you know your DATA? John Ramoutsakis May 10, 2012


Page 1:

Copyright 2007, Information Builders. Slide 1

How well do you know your DATA?

John Ramoutsakis

May 10, 2012

Page 2:

Is Data a Liability?

$$$ for Data Storage
$$$ for Data Backups
$$$ for Data Archiving
$$$ for Data Replication
$$$ for Data Synchronization
$$$ for Disaster Recovery Planning

Page 3:

Is Data an Asset?

Helps in making decisions
Provides a 360-degree view across the enterprise
Helps to understand the customer
Helps in building effective marketing campaigns
Predictive Analysis
Statistical Analysis
Sentiment Analysis

Page 4:

Data Governance Program

People: Organizations need executive sponsorship
Process: Documented, repeatable processes and procedures
Technology: Data Integration, Data Quality, Data Synchronization, and Data Management

[Diagram: Data Governance, with People, Process, and Technology]

Page 5:

iWay Data Integration Enablement

SFA/CRM: Amdocs/Clarify, BMC/Remedy, MS Dynamics, Oracle/Siebel, Salesforce.com, SAP

Data Warehouse: DB2, ETL, Oracle/Essbase, MS SSAS/OLAP, Netezza, SAP BW, Teradata

B2B: Internet EDI, Legacy EDI, MFT, Online B2B, XML

ERP/Financials: Ariba, I2, JD Edwards, Lawson, Manugistics, Microsoft, Oracle, SAP

Industry: HIPAA, CIDX, HL7, RNIF, SWIFT, 1Sync

Legacy Systems: CICS, IMS, VSAM, .NET, Java, TUXEDO, etc.

300+ Adapters

Page 6:

Data Profiling

Statistical Analysis: An overview of summary values, such as extremes, distribution and frequency analysis.

Domain Analysis: A configurable analysis of data types.

Mask and Group Analysis: An overview of value formats, groups and dimensions.

Business Rules: An analysis of the results of user-defined business rules.

Foreign Key and Dependency Analyses: An inside look into complex connections in the data.

Drill Through: The option to display individual records that correspond to aggregated results.

Data Mart: Reporting and analysis across multiple data set analyses. Web and/or hardcopy report viewing and distribution.
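As a rough illustration of the statistical and mask analyses listed above, the following is a minimal Python sketch, not the iWay profiler itself. The function names `mask` and `profile` are hypothetical, and the mask convention (letters become `A`, digits become `9`) is an assumption. The sample values are the birth dates from the source-data slide later in this deck.

```python
from collections import Counter

def mask(value):
    """Replace letters with 'A' and digits with '9', keeping other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

def profile(values):
    """Summarise one column: counts, extremes, value and mask frequencies."""
    present = [v for v in values if v]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "frequencies": Counter(present),
        "masks": Counter(mask(v) for v in present),
    }

# Birth Date column from the "Original data" slide.
birth_dates = ["12/16/1978", "16.12.1978", "781612", "11/16/78",
               "16.11.1978", "16.11.1978", "16.11.1978"]
report = profile(birth_dates)
for pattern, count in report["masks"].most_common():
    print(pattern, count)
```

The mask counts make the mix of date formats visible at a glance, which is the kind of finding a profiling pass would hand to the cleansing step.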

Page 7:

Data Quality Management Cycle

[Cycle diagram]

Phases: Data understanding, Data cleansing, Data enhancement, Monitoring and reporting

Steps:
Parsing
Association (householding)
Format correction
Issue cause identification
Content evaluation
Metadata understanding
Automatic correction
Profiling
Context-based cleansing
Deviance identification
Standardization
Ongoing monitoring
Enrichment
KPI definition
Unification
Deduplication / identification

Page 8:

iWay Data Quality Center

Parsing: Decomposition of fields into component parts.

Cleansing: Modification of data values to meet domain restrictions, integrity constraints or other business rules that define sufficient data quality for the organization.

Standardization: Formatting of values into consistent layouts based on industry standards, local standards, user-defined business rules and knowledge bases of values and patterns.

Validation: Checking of values against industry standards, local standards, user-defined business rules and knowledge bases of values and patterns.

Enrichment: Enhancing the value of internally held data by appending related attributes from external sources.

Matching: Identification, linking or merging related entries within or across sets of data.

Page 9:

Mastering Master Data

What is Master Data?
Data describing your main business entities
Data duplicated in multiple systems
Data reused by multiple business processes

Examples
Customer/Citizen/Patient
Company/Partner/Agency
Products/Items/Equipment
Vendors/Suppliers
Cost Centers/Employees
Etc.

Page 10:

Master Data – Match & Merge

Unification: identification of the set of records connected to one person, address, vehicle, contact, etc.

Deduplication: golden record creation (the best representation of the identified subject)

Identification: for new data entries, identifying the subject (person, address, etc.) to which the new record is connected (matched)

Complex business rules using sophisticated algorithms and functions, including:
Levenshtein distance
Hamming distance
Edit distance
Data quality score values
Date stamps of last modification
Source system originating the data
etc.
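Two of the distance measures named above can be sketched in a few lines of Python. This is a textbook illustration, not the product's implementation. Levenshtein distance is itself one kind of edit distance (insertions, deletions, substitutions), while Hamming distance counts differing positions and only applies to strings of equal length. The name pairs come from the sample records later in the deck.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

for left, right in [("Smith", "Smtih"), ("John", "Jhon"), ("Smith", "Smiht")]:
    print(left, right, levenshtein(left, right), hamming(left, right))
```

A swapped pair of letters costs 2 under both measures, so a matching rule built on them would need a threshold of at least 2 to tolerate the transpositions seen in the sample names.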

Page 11:

Data Quality Portal - Complex Exception Handling

[Diagram components: Exception DB, Resolution queue, DQ plan, KPI / DQI calculation, Portal, Invalid data extraction, Reports, Workflow, Exception management]

Page 12:

Human Mind vs. Computer Systems

Hahaha raed tihs! i cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonemnel pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?
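A computer comparing strings exactly sees "Smtih" and "Smith" as different values. One way to imitate the reading trick described above is a comparison key that keeps the first and last letters and ignores the order of the letters in between. The sketch below is only an illustration of that idea; `scramble_key` is a hypothetical helper, not something the deck or the product defines.

```python
def scramble_key(word):
    """First letter + sorted interior letters + last letter, lower-cased."""
    w = word.lower()
    if len(w) <= 3:
        return w  # nothing to reorder in very short words
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]

pairs = [("Smtih", "Smith"), ("Jhon", "John"), ("rdanieg", "reading"),
         ("Smiht", "Smith")]
for scrambled, clean in pairs:
    exact = scrambled == clean
    fuzzy = scramble_key(scrambled) == scramble_key(clean)
    print(scrambled, clean, "exact:", exact, "key match:", fuzzy)
```

The last pair still fails because the final letter moved, which is one reason real matching rules combine several measures rather than relying on a single key.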

Page 13:

Original data – before cleansing

Source data

Name               | G | SIN          | Birth Date | Address
Dr. John Smith     | M | 000000000    | 12/16/1978 | 14618 110 Ave Surrey V3R 2A9
Smtih W. John      | M | 095-242-434  | 16.12.1978 | Surrey 14618 110 Ave
Jhon William Simth |   | SIN095242434 | 781612     | 25 Linden Str Toronto M4X 1V5
Dr. J.W. Smith     | M | 095242433    | 11/16/78   |
John Smith         |   | 095252433    | 16.11.1978 | 8500 Leslie L3T 7M8 Toronto
Smith Jhon         |   |              | 16.11.1978 | 8500 Leslie street Marham
John Smiht         |   | 095252433    | 16.11.1978 |

Page 14:

Prepared data (after cleansing)

Cleansed data

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M |           | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon  | Simth | M | 095242434 |            | M4X 1V5;ON;Toronto;25 Linden Street
      | Smith | M |           | 1978-11-16 |
John  | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jhon  | Smith | M |           | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John  | Smiht |   | 095252433 | 1978-11-16 |
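The step from the source values to the prepared values for the SIN and Birth Date columns can be approximated as follows. This is a sketch under stated assumptions, not the actual cleansing plan: the accepted date formats are inferred from the sample rows, and the SIN rule (nine digits, not all zeros, passing the Luhn check that Canadian SINs carry) is an assumption that happens to reproduce which SINs survive in the prepared data. The function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed input formats, inferred from the sample rows.
DATE_FORMATS = ("%m/%d/%Y", "%d.%m.%Y", "%m/%d/%y")

def standardize_date(raw):
    """Return an ISO date string, or None if no assumed format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def luhn_ok(digits):
    """Luhn check: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def clean_sin(raw):
    """Strip non-digits; keep the value only if it looks like a valid SIN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9 or set(digits) == {"0"} or not luhn_ok(digits):
        return None
    return digits

for raw in ["12/16/1978", "16.12.1978", "781612", "11/16/78"]:
    print(raw, "->", standardize_date(raw))
for raw in ["000000000", "095-242-434", "SIN095242434", "095242433", "095252433"]:
    print(raw, "->", clean_sin(raw))
```

Values that cannot be standardized come back as `None`, mirroring the blank cells in the prepared data rather than guessing at a correction.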

Page 15:

Match

Cleansed data

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M |           | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon  | Smith | M | 095242434 |            | M4X 1V5;ON;Toronto;25 Linden Street
      | Smith | M |           | 1978-11-16 |
John  | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jhon  | Smith | M |           | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John  | Smiht |   | 095252433 | 1978-11-16 |
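Matching can be pictured as grouping records that are linked, directly or through other records, by a match rule. The sketch below uses a deliberately simple rule that is an assumption for illustration only (same non-empty SIN, or same non-empty birth date); the deck describes far richer business rules. Records are the seven cleansed rows above, reduced to the fields the rule needs, and groups are built with a small union-find.

```python
records = [
    {"last": "Smith", "sin": "",          "born": "1978-12-16"},
    {"last": "Smtih", "sin": "095242434", "born": "1978-12-16"},
    {"last": "Smith", "sin": "095242434", "born": ""},
    {"last": "Smith", "sin": "",          "born": "1978-11-16"},
    {"last": "Smith", "sin": "095252433", "born": "1978-11-16"},
    {"last": "Smith", "sin": "",          "born": "1978-11-16"},
    {"last": "Smiht", "sin": "095252433", "born": "1978-11-16"},
]

def is_match(a, b):
    """Assumed toy rule: equal non-empty SIN, or equal non-empty birth date."""
    same_sin = a["sin"] and a["sin"] == b["sin"]
    same_born = a["born"] and a["born"] == b["born"]
    return bool(same_sin or same_born)

def cluster(recs):
    """Group record indexes that are connected by is_match (union-find)."""
    parent = list(range(len(recs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(recs)):
        for j in range(i + 1, len(recs)):
            if is_match(recs[i], recs[j]):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(recs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

print(cluster(records))

# Changing one source value and re-running the match regroups the records.
updated = [dict(r) for r in records]
updated[2]["sin"] = "095252433"
print(cluster(updated))
```

The third record has no birth date, so it joins a group only through its SIN. That is why changing that single value moves it from the first person to the second, the effect the later "after update" slide describes.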

Page 16:

Merge

Cleansed data

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M |           | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon  | Smith | M | 095242434 |            | M4X 1V5;ON;Toronto;25 Linden Street

Golden record

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street

Address candidates:
The newest permanent address: M4X 1V5;ON;Toronto;25 Linden Street
The most frequent address: V3R 2A9;BC;Surrey;14618 110 Avenue
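The merge step can be sketched as applying one survivorship rule per attribute to a group of matched records. The two rules below mirror the options named on this slide, "most frequent" and "newest". Because the slide carries no timestamps, the sketch assumes the records are listed oldest to newest, which is an assumption made purely for illustration. The function and variable names are hypothetical.

```python
from collections import Counter

# The three matched records from this slide, in the order listed.
matched = [
    {"first": "John", "last": "Smith", "g": "M", "sin": "",
     "born": "1978-12-16", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue"},
    {"first": "John", "last": "Smtih", "g": "M", "sin": "095242434",
     "born": "1978-12-16", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue"},
    {"first": "Jhon", "last": "Smith", "g": "M", "sin": "095242434",
     "born": "", "address": "M4X 1V5;ON;Toronto;25 Linden Street"},
]

def most_frequent(values):
    """Most common non-empty value, or '' if every value is empty."""
    present = [v for v in values if v]
    return Counter(present).most_common(1)[0][0] if present else ""

def newest(values):
    """Last non-empty value, assuming values are ordered oldest to newest."""
    for v in reversed(values):
        if v:
            return v
    return ""

# Assumed rule set: newest address, most frequent value for everything else.
RULES = {"address": newest}

def golden_record(recs, rules=RULES):
    """Build one record by applying a survivorship rule to each attribute."""
    return {field: rules.get(field, most_frequent)([r[field] for r in recs])
            for field in recs[0]}

print(golden_record(matched))
print(golden_record(matched, rules={})["address"])  # most frequent address
```

With the "newest" rule the golden record carries the Toronto address, as on the slide. Switching the address rule to "most frequent" selects the Surrey address instead.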

Page 17:

Merged records – before update

Source data

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M |           | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smith | M | 095242434 |            | M4X 1V5;ON;Toronto;25 Linden Street
John  | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John  | Smith | M |           | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John  | Smiht |   | 095252433 | 1978-11-16 |

Golden record

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street
John  | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.

Page 18:

Merged records – after update

Source data

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M |           | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smith | M | 095252433 |            | M4X 1V5;ON;Toronto;25 Linden Street
John  | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John  | Smith | M |           | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John  | Smiht |   | 095252433 | 1978-11-16 |

Golden record

First | Last  | G | SIN       | Birth Date | Address
John  | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John  | Smith | M | 095252433 | 1978-11-16 | M4X 1V5;ON;Toronto;25 Linden Street

One updated source record may cause modifications in several records in MDC

Page 19:

Real World Use Case

The Goal
Major hospital group is building a Master Patient Index
Need to bring in acquired systems
Cleanse, Standardize, Deduplicate

The Challenge
Previously processed manually by hiring temporary staff
Current phase projected to take a temporary staff of 20 over 18 months

The Strategy
Automate the cleansing, matching and merging business rules
Data Stewardship provides human oversight of the automated process

The Benefits
Identifies the duplicate records according to very complex business rules
Reusable rules for future phases
Significantly reduced project time, from 18 down to 4 months
Over 400% ROI projected

Page 20:

Real World Use Case

The Goal
Performance Management
Business Intelligence
Change Management Process

The Challenge
100 Locations
14 Systems with out-of-sync master data

The Strategy
Cleanse, Standardize, Match
Master Data Management: Directorate, Borough, Site, Service Type, Service Point, Team, Staff, Patient
Master Data Governance Workflow

The Benefits
Dynamic organizational change to support strategic initiatives
Complete visibility into performance of organization vs goals

Page 21:

Real World Use Case

The Goal
Services organization supporting the airline industry sells decision support information to the industry members.

The Challenge
Data Quality was adversely affecting the customer base satisfaction
Data Quality was impacting new revenue generation opportunities

The Strategy
Profile analysis according to specific business validation rules
Monitor rolling 13 month window comparison of monthly data profiles
Accumulate and report analysis to data providers

The Benefits
Improves customer satisfaction and confidence in the information
Increases reliability of the information as new data sources are added
Documents and audits quality-control processes for customer review
Reduces the dependency on human resources to detect and correct data quality issues
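The rolling 13 month comparison in the strategy above could look roughly like this. It is a sketch under assumptions, since the deck does not say which metrics were compared or how. Here a single hypothetical metric per month (for example, the percentage of records failing a validation rule) is compared with the average of the months already in the window and with the same month one year earlier, which a 13 month window makes available. The class name and the sample numbers are invented for illustration.

```python
from collections import deque
from statistics import mean

class RollingProfile:
    """Keeps the last `months` monthly values of one profile metric."""

    def __init__(self, months=13):
        self.window = deque(maxlen=months)

    def add(self, month, value):
        """Record one month and return its comparison against the window."""
        history = [v for _, v in self.window]
        result = {
            "month": month,
            "value": value,
            "vs_mean": value - mean(history) if history else None,
            "vs_year_ago": None,
        }
        self.window.append((month, value))
        if len(self.window) == self.window.maxlen:
            # With a full 13 month window the oldest entry is the same
            # calendar month one year earlier.
            result["vs_year_ago"] = value - self.window[0][1]
        return result

# Hypothetical monthly failure rates (percent), 2011-05 through 2012-05.
months = [f"2011-{m:02d}" for m in range(5, 13)] + \
         [f"2012-{m:02d}" for m in range(1, 6)]
rates = [2, 2, 3, 2, 2, 2, 3, 2, 2, 2, 2, 3, 6]

tracker = RollingProfile()
for month, rate in zip(months, rates):
    latest = tracker.add(month, rate)
print(latest)
```

A jump against both the window average and the year-ago value is the kind of deviation that would be accumulated and reported back to the data provider.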

Page 22:

Summary of considerations

Access to a variety of data sources
Ability to influence data improvement anywhere in the process
Usable in batch and/or (real) real-time processing mode
Extensible by customized business rules
Access to third party data and services
Historical and distributable analysis
Reusability across multiple phases and projects
Integrated data stewardship
Platform flexibility for deployment and licensing
Vendor partnership and support

[Diagram: Information Access, Data Quality, Master Data Management, Data Governance]

Page 23:

iWay Software Benefits

Integrate All Information

Any Data

Any System

Any Protocol

Any Platform

Any Process Latency

Scheduled

Process Driven

Event Driven

User Driven

Real-time, Online, and Batch

Data Integration

Application Integration

Business Integration

Service Oriented Architecture

Single Solution Platform

Single Engine

Fast and Scalable

Secure and Reliable

Fully Extensible

Page 24:

Questions?