Digital Event Experience

Introduction to the IBM DataOps methodology and practice

Julie Lockner
Director, Portfolio Optimization and Offering Management
IBM Data and AI

Steven Eliuk
VP, Deep Learning & Governance Automation
IBM Global CDO

Think 2020 / May 5, 2020 / © 2020 IBM Corporation



81% do not understand the data required for AI

8X: AI pioneers are 8X more likely to have a robust data architecture

There is no AI without IA (information architecture)

“No amount of AI algorithmic sophistication will overcome a lack of data (architecture)...”

Data collection & preparation is the most time-consuming and difficult part of AI.

COLLECT - Make data simple and accessible

ORGANIZE - Create a business-ready analytics foundation

ANALYZE - Build and scale AI with trust & explainability

INFUSE - Operationalize AI throughout the business

The AI Ladder: A prescriptive approach to the journey to AI

MODERNIZE - Unlock the value of data for an AI and multicloud world

One Platform, Any Cloud

Talent & Skills

ORGANIZE: DataOps delivers business-ready data fast

Know your data

Trust your data

Use your data

ORGANIZE: Critical information architecture capabilities

Pillars: Know, Trust, Use

Adjacent rungs: COLLECT, ANALYZE

• Catalog & Metadata Management
• Data Governance and Curation
• Data Quality
• Master Data Management
• Data Integration
• Data Replication
• Data Virtualization
• Self-service interaction for data preparation and testing

Prepare Data Pipelines: “Most dreaded part of AI”

Data Operations: Discover, understand, ingest, integrate, assess quality, clean data

Build, Run, Manage

Months - Quarters

Problem Statement: Business users need access to high-quality data fast. Data pipelines are the primary source of bottlenecks.

Poor Data Quality and Governance Cause Negative Business Impact

“Our study shows that 95% of organizations see negative impacts from poor data quality, resulting in wasted resources and additional costs.”

Source: https://www.experian.co.uk/assets/data-quality/experian-global-data-management-report-jan-2019.pdf

Introducing DataOps

“DataOps is a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization.”

Gartner


DataOps Consistently Delivers High Quality Data Fast

Prepare, Build, Run, Manage: Months - Quarters without DataOps, Hours - Days with DataOps

DataOps expedites delivery of high-quality data by:

- Streamlining data pipeline processes.

- Automating core operations on data.

- Incorporating agile processes and workflows.

- Tapping into data sources and consumers for end-to-end DataOps.

- Automating test data generation and management.

- Enabling collaborative communication across key stakeholders and SMEs.

DataOps Impact – Know Your Data in Minutes
Data Inventory Case Study

Financial Services, Telecommunications, Retail Examples
• 85%: Reduction in business glossary creation time
• 90%: Reduction in time to discover metadata and assign terms
• 200,000: Number of technical assets across multiple clouds discovered in less than 5 mins

Healthcare Payer
• 2 Hour ROI
• Uncovered Protected Health Information (PHI) / PII exposure

DataOps Impact - Trust Your Data
Data Quality Case Study: International Bank

Data records update speed
• 13 per hour (manual)
• 50 per min (automated)

Data quality score
• 6% per hour (manual)
• 93% per min (automated)

With DataOps
• 2 years: Net promoter score
• 230x: Data quality improvement
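The slide does not show how the 230x figure was derived. One plausible reading, offered here only as an assumption, is that it is the ratio of the two record update speeds quoted above. The short check below works through that arithmetic; the variable names are illustrative.

```python
# Figures taken from the slide; the interpretation of "230x" is an assumption.
manual_per_hour = 13        # records updated per hour, manual process
automated_per_min = 50      # records updated per minute, automated process

# Convert both rates to the same unit before comparing.
automated_per_hour = automated_per_min * 60
speedup = automated_per_hour / manual_per_hour

print(f"automated: {automated_per_hour} records/hour")
print(f"speedup: about {int(speedup)}x")
```

Under this reading the automated process handles 3,000 records per hour, which is a little over 230 times the manual rate.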

DataOps Impact – Use Your Data
Data Integration Use Case: Leading European Retailer

Customer affinity analysis

Inventory stock positions

>3 weeks

< 2 minutes

Data change delay on reporting systems

20 days

< 1 day

~24 hours

< 4 hours

DataOps Impact


Comparing the two scenarios. Which one is yours?

Without DataOps: Single iteration, Months-Quarters. One outcome, costly if wrong.

With DataOps: Multiple iterations, Days-Weeks. Multiple outcomes, more chances for success.

80% Data Prep

DataOps requires Automation and Multicloud Architecture

Organize: DataOps Delivers Business-Ready Data Fast

• Automated data curation and quality services
• Automated metadata management and catalog services
• Self-service interaction
• Automated data integration
• Automated test data management services
• Automated master data management
• Governed data access services

Business-ready data

On-Prem

DataOps Maturity Model

No DataOps
• Know: Spreadsheets
• Trust: Emails
• Use: Hand coding

Foundational DataOps
• Know: Departmental / LOB Catalog
• Trust: Data Quality Program
• Use: Data Virtualization, Data Integration and Data Replication

Developed DataOps
• Know: Enterprise Catalog
• Trust: Data Governance Program with Data Stewardship and Business Glossary
• Use: Self Service Data Prep and Test Data Management

Advanced DataOps
• Know: Enforced and Enriched Catalog
• Trust: Compliance, Business Ontology and Automated Classification
• Use: DataOps for All Data Pipelines

Increased business value and speed in delivering business-ready data.

DataOps Methodology Automates Data Management Best Practices

The DataOps Methodology:

- Prioritizes and aligns data pipelines with business objectives and success criteria.

- Is associated with the Data Engineering discipline.

- Automatically measures accuracy and speed of data capture, quality and use.

- Automates data and metadata ingestion and classification.

- Automatically assesses data quality issues and alerts when anomalies are detected.

- Automatically initiates remediation via workflow.

- Automates test data management.

- Automatically ensures authorized use of published data assets by enforcing data privacy and governance policies.

Inventory and categorize data

Publish data and use

Deliver quality and governance
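To make the "assess quality, alert, initiate remediation" steps above more concrete, here is a minimal sketch in plain Python. It is not IBM's implementation: the `QualityReport` class, the function names, the sample rows, and the null-rate threshold are all hypothetical, and a missing-value check stands in for whatever quality rules a real pipeline would apply.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    column: str
    null_rate: float   # share of rows where the value is missing
    alert: bool        # True when null_rate exceeds the threshold


def assess_null_rates(rows, threshold=0.1):
    """Compute the missing-value rate per column and flag anomalies."""
    if not rows:
        return []
    reports = []
    for col in rows[0].keys():
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        rate = missing / len(rows)
        reports.append(QualityReport(col, rate, rate > threshold))
    return reports


def open_remediation_tasks(reports):
    """Turn each alert into a task description for a (hypothetical) workflow."""
    return [
        f"review column '{r.column}' (null rate {r.null_rate:.0%})"
        for r in reports
        if r.alert
    ]


# Illustrative data only.
rows = [
    {"customer_id": "c1", "email": "a@example.com"},
    {"customer_id": "c2", "email": ""},
    {"customer_id": "c3", "email": None},
    {"customer_id": "c4", "email": "d@example.com"},
]
reports = assess_null_rates(rows, threshold=0.25)
tasks = open_remediation_tasks(reports)
print(tasks)
```

In this toy run only the `email` column crosses the threshold, so a single remediation task is produced. A production pipeline would typically route such tasks to a data steward rather than print them.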


DataOps Interoperates with Peer Organizations

DataOps Interoperates Cross-Functionally

- Application development teams publish source data and incorporate feedback from DataOps to improve data definitions and data quality.

- IT security and compliance teams publish security, privacy and governance policies to DataOps teams to be enforced and respond to audits when necessary.

- Data science teams consume data assets published by data engineering and leverage DataOps for model lineage, data definitions and security and privacy policies.

- Lines of business use the output of DataOps to access high-quality data quickly and efficiently, while providing feedback on data definitions and data quality and submitting new assets to be catalogued, assessed and published.

DataOps combines people, process and technology

Organization design

Executive Sponsor

Executive Steering Committee: CDO, CIO, LOB Execs, Chief Risk Officer

Data Architecture Working Group: Enterprise Data Architect, Data Modelers, Database Administrators

Enterprise Data Governance Council: Data Governance Manager, Business Process Owners, Compliance and Legal, Data Custodians

Data Governance Office: Meta Administrator, Data Governance Analyst, Domain Data Stewards, Lead Data Steward

DataOps: Data Pipeline Deployment & Test, DataOps Monitoring & Management, Self-Service Operations, Data Engineers

DataOps in Action at IBM’s Global Chief Data Office


IBM CEO

SVP Finance & Operations, Chief Financial Officer

Global Chief Data Office

VP Finance, Controller

Enterprise Ops & Services

CAO

CIO

Enterprise Data Standards

E2E Data Flows

Enterprise Governance Workflow automation

Data Acquisition (M&A, 3rd Party, Public)

Data Stewardship

Advanced Technology

Hybrid Cloud Development Environment

Production Platform & Solutions Engineering Delivery

Business Controls, Support & Operations

Production Platform Release Mgmt & Project Mgmt

Discovery

Budget & Financial Controls

Platform Adoption

AI Accelerator

BUDO Network

Client Reference Data

Product Data

Modernization & Transformation leveraging Enterprise Data & AI Platform

Enterprise Data & AI Platform Adoption & Value Creation Client & Product Master Data Enterprise Data Governance Deep Learning

IBM Global Chief Data Office Organizational Structure

Importance of Metadata

Metadata: Every enterprise struggles with the problem of labeling

It can take DAYS for SMEs to review/approve a business term

Large risk item, consider:
• Untapped potential in dark data
• Data Governance, Compliance, Audits, potential leakage of sensitive data

METADATA makes data visible and understandable

Metadata unlocks data

Users can easily find, understand and trust the data they need to drive business insights WITH SPEED

Examples of Metadata Benefits

Regulatory Compliance: Metadata management conducted on a unified platform that provides stewardship, data lineage, and impact analysis services is the best assurance that an organization can validate and demonstrate that the data reported is true.

• e.g., GDPR, Government Owned Entity

Productivity & Discovery: Data is abundant. Much of it comes from existing systems and data stores for which no documentation exists or the documentation does not reflect the changes and updates of those systems and data stores.

• Data scientists can spend 80% of their time finding and cleaning data prior to using it!

Risk Avoidance: Metadata management provides the measure of trust that businesses need. Through data lineage and impact analysis, businesses can know the accuracy, completeness and currency of the data used in their planning or decision-making models.
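Impact analysis over data lineage, mentioned in the benefits above, can be pictured as a graph traversal: given a changed source, find everything derived from it. The sketch below is a generic illustration and not how any IBM product stores lineage. The asset names and the `lineage` mapping are invented for the example.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets derived directly from it.
lineage = {
    "crm.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["mart.customer_360", "mart.churn_features"],
    "mart.churn_features": ["report.churn_dashboard"],
    "mart.customer_360": [],
    "report.churn_dashboard": [],
}


def downstream_impact(graph, changed_asset):
    """Return every asset reachable from changed_asset, breadth-first."""
    seen = set()
    order = []
    queue = deque(graph.get(changed_asset, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already recorded via another path
        seen.add(asset)
        order.append(asset)
        queue.extend(graph.get(asset, []))
    return order


impacted = downstream_impact(lineage, "crm.customers")
print(impacted)
```

Reversing the edges and running the same traversal would answer the lineage question instead: which upstream sources a given report depends on.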

IBM GCDO automated metadata generation (AMG)

Automated Metadata Generation (AMG) uses automation and data science to link data

• A complex series of organic Deep Learning models was developed for CEDP metadata classifications
• Backed by micro-services: can be installed anywhere (cloud, container)
• ~60TB of labeled training data in addition to public sources and synthetically generated data

Distributed Federated Learning

Challenges addressed
• Lack of data for model training impacts performance
• Local restrictions on processing business information within the limits of certain jurisdictions

Implementation
• Compliance with local regulation
• A larger volume of training data allows better performance
• No isolated business units that lack training data
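The slide names federated learning but does not say which algorithm is used. As a rough illustration of the general idea (sites train locally and share only model parameters, never raw records), here is a toy federated averaging loop. Everything in it is an assumption made for the example: the one-parameter model, the learning rate, and the sample data.

```python
def local_update(weight, data, lr=0.1):
    """One gradient step of least squares for y ~ weight * x on one site's data."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad


def federated_average(global_w, sites, rounds=50):
    """Average locally updated weights, weighted by each site's sample count."""
    for _ in range(rounds):
        updates = [(local_update(global_w, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        global_w = sum(w * n for w, n in updates) / total
    return global_w


# Two hypothetical sites whose private data follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]
w = federated_average(0.0, sites)
print(round(w, 4))
```

Only the scalar weight crosses site boundaries in this sketch, which is the property that makes the approach attractive when local regulation restricts where data may be processed.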

IBM GCDO Automated Metadata Generation (AMG)

An AI-powered process for curating, verifying, and classifying data that enhances speed and usability

Unified. Classifying terabytes of data to make it easily discoverable while providing the data stewardship, lineage, and impact analysis to assure it is trustworthy

Dramatically enhanced Data Quality with regulatory & governance checks

~$27 million in productivity savings

Up to 95% reduction in cycle time: targeted at full automation in 18 months

Small Tag Set as a Product

Project Stages

Stage 1: To provide top-5 recommendation. 5x less workload

Stage 2: To provide single recommendation. 20x less workload (~95% workload decrease)

Stage 3: To provide the correct Metadata. NO workload, almost.

Goal: full automation, i.e. zero SME involved

600 terms

2500 terms

30% of data

70% of data

How we define it:
• Better prediction quality is available for the small tag set
• No need to provide top-5 recommendations, the choice is easy
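The staged approach above (a top-5 shortlist first, then a single recommendation) can be sketched as a simple ranking rule over classifier scores. This is an illustrative assumption, not the AMG implementation: the function names, the `auto_accept` threshold, and the example scores are hypothetical. The second helper just restates the slide's arithmetic that 20x less workload is a 95% decrease.

```python
def recommend_terms(scores, k=5, auto_accept=0.9):
    """Rank candidate business terms for one column.

    Returns a single term when the best score clears auto_accept,
    otherwise the k highest-scoring candidates for a steward to choose from.
    """
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_term, best_score = ranked[0]
    if best_score >= auto_accept:
        return [best_term]
    return [term for term, _ in ranked[:k]]


def workload_decrease(factor):
    """Fractional decrease when workload shrinks by the given factor."""
    return 1 - 1 / factor


# Hypothetical classifier scores for two columns.
confident = {"Email Address": 0.97, "Customer Name": 0.02, "Phone Number": 0.01}
uncertain = {"Email Address": 0.40, "Customer Name": 0.35, "Phone Number": 0.25}

print(recommend_terms(confident))       # single recommendation
print(recommend_terms(uncertain, k=2))  # shortlist for a steward
print(f"{workload_decrease(20):.0%}")   # 20x less workload
```

With a smaller tag set the top score tends to be more decisive, which is consistent with the slide's point that the shortlist becomes unnecessary.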

Watson Knowledge Catalog: Automated cataloging to discover, classify, prepare & share data

• ML-driven intelligent discoverability of data sources, models, notebooks, AI artifacts

• Operationalize Data governance program

• Data lineage in the language of the Business

Watson Knowledge Catalog now with automated metadata generation

Up to 96% accuracy on holdout data

Up to 70% accuracy on data that was once inaccessible

Business terms can differ across the different groups in an organization.

To address this: AMG's classifications in the current release use an "umbrella" set of 25 terms defined to cover the varying cases we see at the GCDO
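One way to picture an "umbrella" term set is as a lookup that folds each group's local vocabulary into a shared term. The sketch below is purely illustrative. The two umbrella terms and their variants are invented and are not taken from the 25 terms the slide refers to.

```python
# Hypothetical umbrella terms and group-specific variants, for illustration only.
UMBRELLA_TERMS = {
    "Customer Name": {"customer name", "client name", "account holder"},
    "Email Address": {"email", "e-mail", "contact email"},
}


def to_umbrella(term):
    """Map a group-specific business term to its umbrella term, if any."""
    normalized = term.strip().lower()
    for umbrella, variants in UMBRELLA_TERMS.items():
        if normalized in variants:
            return umbrella
    return None  # no umbrella term covers this label


print(to_umbrella(" Client Name "))
print(to_umbrella("SKU"))
```

Labels that map to no umbrella term return `None`, which in practice would be a signal to route the column to a steward.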

AMG capability roadmap

Timeline: Q1 2018, Q2 2018, Q4 2018, Q4 2019, 2020

Concept development of MVP 1

MVP 2

MVP 3

Proven internally in GCDO and on external enterprise use cases

Released in Watson Knowledge Catalog services for Cloud Pak for Data

Subsequent release 1

Subsequent release 2

Getting Started

— Try Watson Knowledge Catalog today at ibm.com/Watson-Knowledge-Catalog

— Schedule a DataOps Garage Workshop with one of our DataOps Center of Excellence Experts by contacting [email protected]

— Learn more about IBM DataOps at ibm.com/DataOps

Use your data Know your data

Trust your data


Notices and disclaimers


© 2020 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM.

U.S. Government Users Restricted Rights — use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event, shall IBM be liable for any damage arising from the use of this information, including but not limited to, loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided. The performance data and client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions.

IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.

Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

Notices and disclaimers continued

It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a particular purpose.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

IBM, the IBM logo, and ibm.com are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at: www.ibm.com/legal/copytrade.shtml.