How well do you know your data?


Post on 23-Feb-2016



How well do you know your DATA?
Glenn Wiebe, May 15, 2012
Copyright 2007, Information Builders.

Is Data a Liability?
- $$$ for data storage
- $$$ for data backups
- $$$ for data archiving
- $$$ for data replication
- $$$ for data synchronization
- $$$ for disaster recovery planning

Is Data an Asset?
- Helps in making decisions
- Provides a 360-degree view across the enterprise
- Helps in understanding the customer
- Helps in building effective marketing campaigns
- Predictive analysis
- Statistical analysis
- Sentiment analysis

Data Governance Program
- People: organizations need executive sponsorship
- Process: documented, repeatable processes and procedures
- Technology: data integration, data quality, data synchronization, and data management

iWay Data Integration Enablement
- SFA/CRM: Amdocs/Clarify, BMC/Remedy, MS Dynamics, Oracle/Siebel, Salesforce.com, SAP
- Data Warehouse: DB2, ETL, Oracle/Essbase, MS SSAS/OLAP, Netezza, SAP BW, Teradata
- B2B: Internet EDI, Legacy EDI, MFT, Online B2B, XML
- ERP/Financials: Ariba, I2, JD Edwards, Lawson, Manugistics, Microsoft, Oracle, SAP
- Industry: HIPAA, CIDX, HL7, RNIF, SWIFT, 1Sync
- Legacy Systems: CICS, IMS, VSAM, .NET, Java, TUXEDO, etc.

300+ adapters

Data Profiling
- Statistical analysis: an overview of summary values, such as extremes, distribution, and frequency analysis.
- Domain analysis: a configurable analysis of data types.
- Mask and group analysis: an overview of value formats, groups, and dimensions.
- Business rules: an analysis of the results of user-defined business rules.
- Foreign key and dependency analyses: an inside look into complex connections in the data.
- Drill-through: the option to display individual records that correspond to aggregated results.
- Data mart: reporting and analysis across multiple data-set analyses.
- Web and/or hard-copy report viewing and distribution.
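As a rough illustration of statistical and mask analysis, the Python sketch below profiles a single column: it counts values and nulls, reports extremes and the most frequent values, and reduces each value to a format mask (digit to 9, letter to A). The function names are hypothetical and this is not how the iWay profiler is implemented; the sample values are birth dates from the example slides later in this deck, plus one empty value.

```python
from collections import Counter


def value_mask(value: str) -> str:
    """Reduce a value to its format pattern: digits become '9', letters 'A'."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )


def profile_column(values: list) -> dict:
    """Hypothetical single-column profile: counts, extremes, frequencies, masks."""
    present = [v for v in values if v]  # empty strings and None count as nulls
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        # Extremes of text values are lexicographic, not chronological.
        "min": min(present, default=None),
        "max": max(present, default=None),
        "top_values": Counter(present).most_common(3),
        "masks": Counter(value_mask(v) for v in present),
    }


birth_dates = ["12/16/1978", "16.12.1978", "11/16/78", "16.11.1978", "16.11.1978", ""]
profile = profile_column(birth_dates)
print(profile["masks"])
```

Three different masks in one birth-date column is the kind of finding that points to a standardization rule.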

Data Quality Management Cycle (iWay Data Quality Center)
Phases: data understanding, data cleansing, data enhancement, monitoring and reporting.
Activities: metadata understanding, profiling, content evaluation, parsing, format correction, automatic correction, context-based cleansing, standardization, enrichment, association (householding), unification, deduplication/identification, ongoing monitoring, KPI definition, deviance identification, identification of issue causes.

Parsing: Decomposition of fields into component parts.
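A minimal sketch of parsing a name field into its parts, assuming a hypothetical title list and names written in "first [middle] last" order. Real parsers rely on knowledge bases of names and patterns; a reversed entry such as the slides' "Smtih W. John" would be parsed wrongly by this simple rule and needs context-based cleansing.

```python
# Hypothetical list of titles to strip; a production parser would use a knowledge base.
TITLES = {"dr", "mr", "mrs", "ms"}


def parse_name(raw: str) -> dict:
    """Split a full-name field into title, first, middle and last components.

    Assumes "first [middle ...] last" order.
    """
    tokens = raw.replace(",", " ").split()
    title = None
    if tokens and tokens[0].rstrip(".").lower() in TITLES:
        title = tokens.pop(0).rstrip(".")
    first = tokens[0] if tokens else None
    last = tokens[-1] if len(tokens) > 1 else None
    middle = " ".join(tokens[1:-1]) or None  # everything between first and last
    return {"title": title, "first": first, "middle": middle, "last": last}


print(parse_name("Dr. John Smith"))
print(parse_name("Jhon William Simth"))
```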

Cleansing: Modification of data values to meet domain restrictions, integrity constraints, or other business rules that define sufficient data quality for the organization.
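One way to picture cleansing against a domain restriction: the sketch below assumes a hypothetical rule that a SIN field must hold exactly nine digits and that all-zero values are placeholders. It strips separators and blanks out values that cannot meet the rule. In the example slides, for instance, "095-242-434" appears as 095242434 and the placeholder 000000000 is blank after cleansing.

```python
import re
from typing import Optional

# Hypothetical set of known dummy values that should never reach master data.
PLACEHOLDER_SINS = {"000000000"}


def cleanse_sin(raw: Optional[str]) -> Optional[str]:
    """Normalize a SIN to nine digits, or return None if it cannot be cleansed."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # drop dashes, spaces and any text prefix
    if len(digits) != 9 or digits in PLACEHOLDER_SINS:
        return None
    return digits


for raw in ["095-242-434", "000000000", "12345"]:
    print(repr(raw), "->", cleanse_sin(raw))
```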

Standardization: Formatting of values into consistent layouts based on industry standards, local standards, user-defined business rules and knowledge bases of values and patterns.
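For standardization, a small sketch that rewrites birth dates from several source layouts into ISO 8601 (YYYY-MM-DD), the layout used in the cleansed example data. The ordered layout list is an assumption: because layouts are tried in order, an ambiguous value such as 05/06/1978 is read as month/day, and a real rule set would need source-specific knowledge to decide.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of source layouts, tried in this order.
DATE_LAYOUTS = ["%m/%d/%Y", "%d.%m.%Y", "%m/%d/%y", "%Y-%m-%d"]


def standardize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no layout fits."""
    for layout in DATE_LAYOUTS:
        try:
            return datetime.strptime(raw.strip(), layout).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


for raw in ["12/16/1978", "16.12.1978", "11/16/78"]:
    print(raw, "->", standardize_date(raw))
```

Python's `%y` maps two-digit years 69 to 99 onto 1969 to 1999, so "11/16/78" becomes 1978-11-16.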

Validation: Verification that values conform to expected formats, domains, and user-defined business rules, with nonconforming values flagged for correction.
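Canadian Social Insurance Numbers carry a check digit verified with the Luhn (mod 10) algorithm, so a SIN field can be validated without an external lookup. A minimal sketch follows. Applied to the SINs in the example slides, 095242434 and 095252433 pass while 095242433 fails. Note that 000000000 also passes the checksum, so placeholder values need a separate domain rule.

```python
def is_valid_sin(sin: str) -> bool:
    """Luhn (mod 10) check for a nine-digit Canadian SIN."""
    if len(sin) != 9 or not sin.isdigit():
        return False
    total = 0
    for position, ch in enumerate(sin):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit (2nd, 4th, 6th, 8th)
            digit *= 2
            if digit > 9:  # same as adding the two digits of the product
                digit -= 9
        total += digit
    return total % 10 == 0


for sin in ["095242434", "095252433", "095242433"]:
    print(sin, is_valid_sin(sin))
```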

Enrichment: Enhancing the value of internally held data by appending related attributes from external sources.
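Enrichment usually means joining internal records to an external reference source. As a stand-in for such a source, the sketch below uses the fact that the first letter of a Canadian postal code identifies the province or territory, which is one way a value like V3R 2A9 could gain the "BC" seen in the cleansed example data. The function name and record layout are hypothetical; a real enrichment service would also supply city, street, and other attributes.

```python
# Stand-in for an external reference source (hypothetical lookup): the first
# letter of a Canadian postal code identifies the province or territory.
POSTAL_PREFIX_TO_PROVINCE = {
    "A": "NL", "B": "NS", "C": "PE", "E": "NB",
    "G": "QC", "H": "QC", "J": "QC",
    "K": "ON", "L": "ON", "M": "ON", "N": "ON", "P": "ON",
    "R": "MB", "S": "SK", "T": "AB", "V": "BC",
    "X": "NT/NU", "Y": "YT",
}


def enrich_with_province(record: dict) -> dict:
    """Return a copy of the record with a 'province' attribute appended."""
    postal = (record.get("postal_code") or "").strip().upper()
    return {**record, "province": POSTAL_PREFIX_TO_PROVINCE.get(postal[:1])}


print(enrich_with_province({"city": "Surrey", "postal_code": "V3R 2A9"}))
```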

Matching: Identification, linking or merging related entries within or across sets of data.

Mastering Master Data
What is master data?
- Data describing your main business entities
- Data duplicated in multiple systems
- Data reused by multiple business processes

Examples:
- Customer/Citizen/Patient
- Company/Partner/Agency
- Products/Items/Equipment
- Vendors/Suppliers
- Cost Centers/Employees
- Etc.

Master Data Match & Merge

Unification: Identification of the set of records connected to one person, address, vehicle, contact, etc.
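Unification can be sketched as clustering: once pairwise rules have decided which records match, every record connected through a chain of matches belongs to the same subject. The union-find sketch below assumes the match pairs are already known; the record numbers and pairs are hypothetical, loosely modelled on the six John Smith records in the example slides. In this pair list, records 1 and 3 are never compared directly but end up together through record 2.

```python
def unify(record_ids, matched_pairs):
    """Group record ids into subjects: connected components of the match graph."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return list(groups.values())


# Hypothetical match decisions, e.g. same address and birth date, or same SIN.
pairs = [(1, 2), (2, 3), (4, 5), (4, 6)]
print(unify([1, 2, 3, 4, 5, 6], pairs))  # [[1, 2, 3], [4, 5, 6]]
```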

Deduplication: Golden record creation (the best representation of the identified subject).
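Golden record creation depends on survivorship rules that pick, field by field, which source value survives. The sketch below assumes two such rules, both hypothetical choices: the most frequent non-empty value (ties go to the value seen first), and, for the address, the value from the most recently modified record. The three input records follow the first John Smith group on the example merge slide; the "modified" dates are invented for illustration. With these rules the result matches that slide's golden record, and switching the address to the most-frequent rule yields the Surrey address that the slide labels as the most frequent.

```python
from collections import Counter


def most_frequent(values):
    """Most common non-empty value; ties go to the value encountered first."""
    present = [v for v in values if v]
    return Counter(present).most_common(1)[0][0] if present else None


def newest(records, field):
    """Value of `field` from the most recently modified record that has one."""
    candidates = [r for r in records if r.get(field)]
    return max(candidates, key=lambda r: r["modified"])[field] if candidates else None


def golden_record(records, newest_fields=("address",)):
    """Build one golden record from a cluster of matched source records."""
    fields = sorted({k for r in records for k in r if k != "modified"})
    return {
        f: newest(records, f) if f in newest_fields
        else most_frequent(r.get(f) for r in records)
        for f in fields
    }


cluster = [  # "modified" dates are hypothetical
    {"first": "John", "last": "Smith", "gender": "M", "sin": "",
     "birth_date": "1978-12-16", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue",
     "modified": "2010-03-01"},
    {"first": "John", "last": "Smtih", "gender": "M", "sin": "095242434",
     "birth_date": "1978-12-16", "address": "V3R 2A9;BC;Surrey;14618 110 Avenue",
     "modified": "2011-07-15"},
    {"first": "Jhon", "last": "Smith", "gender": "M", "sin": "095242434",
     "birth_date": "", "address": "M4X 1V5;ON;Toronto;25 Linden Street",
     "modified": "2012-02-20"},
]
print(golden_record(cluster))
```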

Identification: Matching each new data entry to the subject (person, address, etc.) to which it is connected.
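Identification can be sketched as looking up a new entry against the existing golden records. The rule below is a deliberately simple, hypothetical one (an exact SIN match first, then the same last name and birth date), and the ids G1 and G2 are invented; the two golden records themselves are the ones shown on the "merged records before update" slide. Exact name comparison would miss a misspelling such as "Smiht", which is where distance functions come in.

```python
from typing import Optional


def identify(new_record: dict, golden_records: dict) -> Optional[str]:
    """Return the id of the golden record a new entry belongs to, or None if new."""
    sin = new_record.get("sin")
    if sin:  # rule 1: an exact SIN match identifies the subject
        for gid, golden in golden_records.items():
            if golden.get("sin") == sin:
                return gid
    birth_date = new_record.get("birth_date")
    if birth_date:  # rule 2: same last name and same birth date
        for gid, golden in golden_records.items():
            if (golden.get("last") == new_record.get("last")
                    and golden.get("birth_date") == birth_date):
                return gid
    return None


golden_records = {
    "G1": {"first": "John", "last": "Smith", "sin": "095242434", "birth_date": "1978-12-16"},
    "G2": {"first": "John", "last": "Smith", "sin": "095252433", "birth_date": "1978-11-16"},
}
print(identify({"last": "Smith", "sin": "", "birth_date": "1978-11-16"}, golden_records))
```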

Complex business rules: Sophisticated algorithms and functions, including Levenshtein distance, Hamming distance, edit distance, data quality score values, date stamps of last modification, source system originating data, etc.
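Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another; Hamming distance counts the differing positions between two equal-length strings. A minimal dynamic-programming sketch of both follows. Plain Levenshtein scores a transposition such as Smith/Smtih as 2 edits; the Damerau-Levenshtein variant counts it as 1, which may suit typing errors better.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rolling rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution (0 if equal)
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("Smith", "Smtih"), hamming("095242434", "095252433"))
```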

Data Quality Portal: Complex Exception Handling
Diagram components: DQ plan, invalid data extraction, exception DB, exception management, resolution queue, workflow, KPI/DQI calculation, reports, portal.

Human Mind vs. Computer Systems
Hahaha raed tihs! i cdnuolt blveiee taht I cluod aulaclty uesdnatnrd waht I was rdanieg. The phaonemnel pweor of the hmuan mnid, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it dseno't mtaetr in waht oerdr the ltteres in a wrod are, the olny iproamtnt tihng is taht the frsit and lsat ltteer be in the rghit pclae. The rset can be a taotl mses and you can sitll raed it whotuit a pboerlm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe. Azanmig huh?

Original data before cleansing
Source data
Name | Gender | SIN | Birth Date | Address
Dr. John Smith | M | 000000000 | 12/16/1978 | 14618 110 Ave Surrey V3R 2A9
Smtih W. John | M | 095-242-434 | 16.12.1978 | Surrey 14618 110 Ave
Jhon William Simth | | SIN095242434 | 781612 | 25 Linden Str Toronto M4X 1V5
Dr. J.W. Smith | M | 095242433 | 11/16/78 |
John Smith | | 095252433 | 16.11.1978 | 8500 Leslie L3T 7M8 Toronto
Smith Jhon | | | 16.11.1978 | 8500 Leslie street Marham
John Smiht | | 095252433 | 16.11.1978 |

Prepared data (after cleansing)
Cleansed data
First | Last | Gender | SIN | Birth Date | Address
John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon | Simth | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M | | 1978-11-16 |
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jhon | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | | 095252433 | 1978-11-16 |

Match
Cleansed data
First | Last | Gender | SIN | Birth Date | Address
John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street
 | Smith | M | | 1978-11-16 |
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
Jhon | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | | 095252433 | 1978-11-16 |

Merge
Cleansed data
First | Last | Gender | SIN | Birth Date | Address
John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smtih | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
Jhon | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street
Golden record
John | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street (the newest permanent address)
The most frequent address: V3R 2A9;BC;Surrey;14618 110 Avenue

Merged records before update
Source data
First | Last | Gender | SIN | Birth Date | Address
John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | | M4X 1V5;ON;Toronto;25 Linden Street
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | | 095252433 | 1978-11-16 |
Golden records
John | Smith | M | 095242434 | 1978-12-16 | M4X 1V5;ON;Toronto;25 Linden Street
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.

Merged records after update
Source data
First | Last | Gender | SIN | Birth Date | Address
John | Smith | M | | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095252433 | | M4X 1V5;ON;Toronto;25 Linden Street
John | Smith | M | 095252433 | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smith | M | | 1978-11-16 | L3T 7M8;ON;Markham;8500 Leslie Str.
John | Smiht | | 095252433 | 1978-11-16 |
Golden records
John | Smith | M | 095242434 | 1978-12-16 | V3R 2A9;BC;Surrey;14618 110 Avenue
John | Smith | M | 095252433 | 1978-11-16 | M4X 1V5;ON;Toronto;25 Linden Street
One updated source record may cause modifications in several records in the MDC.

Real World Use Case
The Goal: A major hospital group is building a Master Patient Index and needs to bring in the systems of acquired organizations: cleanse, standardize, deduplicate.
The Challenge: Previously processed manually by hiring temporary staff; the current phase was projected to take a temporary staff of 20 people 18 months.
The Strategy: Automate the cleansing, matching, and merging business rules; data stewardship provides human oversight of the automated process.
The Benefits: Identifies duplicate records according to very complex business rules; reusable rules for future phases; project time significantly reduced from 18 months to 4 months; over 400% ROI projected.

Real World Use Case
The Goal: Performance management, business intelligence, and a change management process.
The Challenge: 100 locations; 14 systems with out-of-sync master data.
The Strategy: Cleanse, standardize, and match; master data management of Directorate, Borough, Site, Service Type, Service Point, Team, Staff, and Patient; a master data governance workflow.
The Benefits: Dynamic organizational change to support strategic initiatives; complete visibility into the performance of the organization against its goals.

Real World Use Case
The Goal: A services organization supporting the airline industry sells decision-support information to industry members.
The Challenge: Data quality was adversely affecting customer satisfaction and was impacting new revenue-generation opportunities.
The Strategy: Profile the data against specific business validation rules; monitor monthly data profiles by comparison across a rolling 13-month window; accumulate the analysis and report it to data providers.
The Benefits: Improves customer satisfaction and confidence in the information; increases the reliability of the information as new data sources are added; documents and audits quality-control processes for customer review; reduces dependency on human resources to detect and correct data quality issues.
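The airline use case monitors monthly data profiles against a rolling 13-month window. The deck does not say how months are compared, so the sketch below assumes one plausible approach: each metric in the current month's profile is compared with the previous 12 months (13 months in total) and flagged when it lies more than k population standard deviations from their mean, or when it changes a metric that had been constant. The metric names and the threshold k=3 are hypothetical.

```python
from statistics import mean, pstdev


def compare_profiles(monthly_profiles: list, k: float = 3.0) -> list:
    """Flag metrics in the latest monthly profile that deviate from the prior 12 months.

    monthly_profiles is ordered oldest first; only the last 13 entries are used.
    """
    *previous, current = monthly_profiles[-13:]
    flagged = []
    for metric, value in current.items():
        history = [p[metric] for p in previous if metric in p]
        if len(history) < 2:
            continue  # not enough history to judge this metric
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            deviates = value != mu  # any change to a previously constant metric
        else:
            deviates = abs(value - mu) > k * sigma
        if deviates:
            flagged.append(metric)
    return flagged


profiles = [{"row_count": 1000 + i, "null_rate": 0.02} for i in range(12)]
profiles.append({"row_count": 1005, "null_rate": 0.20})
print(compare_profiles(profiles))
```

The accumulated flags per month are what would then be reported back to the data providers.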

Summary of considerations
- Access to a variety of data sources
- Ability to influence data improvement anywhere in the process
- Usable in batch and/or real-time processing mode
- Extensible by customized business rules
- Access to third-party data and services
- Historical and distributable analysis
- Reusability across multiple phases and projects
- Integrated data stewardship
- Platform flexibility for deployment and licensing
- Vendor partnership and support

Information Access, Data Quality, Master Data Management, Data Governance

iWay Software Benefits
Integrate All Information: Any Data, Any System, Any Protocol, Any Platform, Any Proc