© 2014 IBM Corporation IBM Strategy & Analytics 24 June 2014 Data Quality Strategy in a Big Data Analytics World Conference Session Information and Data Quality Summit 2014 Dr. Alexander Borek IBM Center of Competence for Advanced Analytics IBM Strategy & Analytics @bigdatarisk




Because we are living in a Big Data Analytics World...

Big Data is no longer an abstract concept. It is the reality in which companies compete today.


Hence, these are the questions I would like to explore with you today:

1. How can we characterize the new Big Data Analytics World companies compete in?

2. What does it mean for data quality strategy?
– What will change, what stays the same?
– Which parts of data quality management do companies need to reconsider?

A word of warning: This talk is not about Hadoop, MongoDB and other Big Data technologies.


PART 1

How can we characterize the new Big Data Analytics World companies compete in?

“In the business world, the rearview mirror is always clearer than the windshield.” Warren Buffett


Biggest change in 2014: Forget about the 3 Vs of Big Data

3 Vs of Big Data: Volume, Variety, Velocity

This is how talks on Big Data used to begin

3 Ms of Big Data: Make Me Money

The discussion has shifted towards how to make tangible and sustainable profit from (Big) Data


Big Data Analytics: Most important enabler for business innovation

Analytic capabilities are widening the divide in competitive advantage [1]: 33% higher revenue growth, 12x more profit growth and 32% ROIC

Data is the new oil [2]: Exponentially growing data from sensors and social media

Big Data is the next frontier of innovation, competition and productivity [3]: Transformative potential in almost every industry, worth hundreds of billions of dollars

Sources: [1] Analytics: The Widening Divide, IBM study; [2] Quote: Ann Winblad; [3] Quote: McKinsey study of the same title

Business Perspective


Big Data: A technology-driven innovation trend

New paradigms for managing and processing data at scale: NoSQL, NewSQL and the CAP theorem, Hadoop, in-memory, column-oriented, ...

New economics for managing and processing data at scale: Exponentially decreasing cost of storage, memory and computing power

Vast field of technical innovation: Database theory, internet giants, open source and venture capital

Technical Perspective


Companies are progressing in their big data adoption

[Chart: share of companies by stage of big data adoption. Stages: Doing nothing; Knowledge gathering; Developing Strategy; Piloting / Experimenting; Deployed Investments. Values shown: 8%, 20%, 18%, 19%, 35%]

Source: Gartner, Big Data Adoption in 2013 Shows Substance Behind the Hype, 12 Sep 2013


Growing number of enterprises are getting engaged

Source: http://b-i.forbesimg.com/louiscolumbus/files/2014/01/big-data-investments.jpg


Significant Big Data investments planned

Source: http://b-i.forbesimg.com/louiscolumbus/files/2014/01/big-data-investments.jpg


C-Level is now the largest supporter of Big Data efforts

Source: http://b-i.forbesimg.com/louiscolumbus/files/2014/01/big-data-investments.jpg


IBM identified nine levers, the success factors for Big Data Analytics, based on a study with 700 executives from 70 countries

Strategy
– Sponsorship: Executive support and involvement
– Source of value: Actions and decisions that generate value
– Funding: Financial rigor in analytics funding process

Organization
– Culture: Availability and use of data and analytics
– Measurement: Evaluating impact on business outcomes
– Trust: Organizational confidence

Capabilities
– Expertise: Development and access to skills and capabilities
– Data: Data management practices
– Platform: Integrated capabilities delivered by hardware and software

Source: Analytics: A blueprint for value, IBM Institute for Business Value Study 2013


An integrated Digital Strategy enables corporate digital capabilities by integrating physical and digital components based on aligned digitally empowered business processes across the operations to meet customer and stakeholder expectations.

Trend: Many companies are redefining their digital strategy at the moment. For a vast majority, Big Data is an important part of it


Trend: Many companies are appointing a Chief Data Officer, the new hero of Big Data Analytics

The Chief Data Officer is the fastest-growing new data role associated with big data and analytics

“By 2017, 50% of all companies in regulated industries will have a Chief Data Officer.” – Gartner


Trend: Many companies are investing in a Center of Excellence for Big Data and Analytics

Center of Excellence

Imperative: Make scarce resources available to the most relevant analytics-enabled business initiatives

Organizational model: Center of Competence. Share resources, create synergies, and speed up knowledge sharing and learning

Roles: Digital Transformation Business Consultants, Data Scientists, and specialists for data integration, governance and quality


Still, many open questions about Big Data strategy remain for a majority of organizations

1. Strategy

Key design questions:

– What is the strategy for Big Data innovation? What is the scope?
– How is it aligned to business and IT?
– Which principles and KPIs should guide Big Data innovation management?

2. Structure

Key design questions:

– Which organizational structure should be adopted for Big Data?
– Should there be a Center of Excellence for Big Data Analytics?
– What are the goals and responsibilities of the governance bodies?
– How are they aligned with existing bodies?

3. Process

Key design questions:

– How should the Big Data innovation process be designed?
– How are ideas created, selected, implemented and monitored?
– How can Big Data resources be shared and optimally utilized throughout the full innovation lifecycle?

4. Capabilities

Key design questions:

– What capabilities do we want to develop and in which order?
– How can the existing technological capabilities be extended to a Big Data environment?
– How can skills and expertise necessary for success be developed and retained?


PART 2

What does it mean for data quality strategy?

“We make the world we live in and shape our own environment.” Orison Swett Marden


So, what is new in terms of data management?

Traditional Data Management compared with the Big Data Analytics World:

1. Traditional: Most data assets come from within the company
   Big Data: A large proportion of data comes from outside

2. Traditional: Focus on structured data
   Big Data: Focus on structured and unstructured data

3. Traditional: Look at data to assess what occurred in the past
   Big Data: Real-time analysis to improve the outcome

4. Traditional: The goal is that each single record is correct
   Big Data: The goal is that analytics results are accurate

5. Traditional: Good database design requires years
   Big Data: Database as moving target, quick cycles

6. Traditional: Pay attention to "data stocks"*
   Big Data: Pay attention to "data flows"*

7. Traditional: Business users have to ask IT for analysis
   Big Data: Business users conduct analysis themselves

8. Traditional: There are clearly defined information requirements for each business process
   Big Data: All internal and external data sources are used to gain best insight in a given situation

*Source: Davenport 2012. How Big Data is Different. http://sloanreview.mit.edu/article/how-big-data-is-different/


Data quality management – What's new?

A large proportion of data comes from outside

Data quality management has to include data coming from external sources

You cannot load all the data and then cleanse it; data quality checks have to be done on filtered data

Focus on structured and unstructured data

Data quality management has to consider unstructured data

Unstructured data has to be given structure first; then traditional data quality checks can be applied
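As an illustration of the last point, here is a minimal Python sketch of giving free text a structure and then running traditional checks on the result. The note format, the field names and the five-digit zip rule are assumptions made up for this example; they do not come from the talk.

```python
import re
from typing import Optional

# Hypothetical pattern for free-text notes such as
# "Customer Jane Doe, zip 10115, ordered 3 items" (assumed format).
NOTE_PATTERN = re.compile(
    r"Customer (?P<name>[A-Za-z ]+), zip (?P<zip>\S+), ordered (?P<items>\S+) items"
)

def structure_note(text: str) -> Optional[dict]:
    """Turn one free-text note into a structured record, or None if it does not parse."""
    match = NOTE_PATTERN.search(text)
    return match.groupdict() if match else None

def check_record(record: dict) -> list:
    """Apply traditional validity checks to a structured record."""
    problems = []
    if not re.fullmatch(r"\d{5}", record["zip"]):  # assumed rule: zip is five digits
        problems.append("invalid zip")
    if not record["items"].isdigit():
        problems.append("invalid item count")
    return problems

notes = [
    "Customer Jane Doe, zip 10115, ordered 3 items",
    "Customer John Roe, zip 1O115, ordered 2 items",  # letter O instead of a zero
    "called back, no answer",                         # cannot be structured at all
]
for note in notes:
    record = structure_note(note)
    print(note, "->", "unparseable" if record is None else check_record(record))
```

Text that cannot be structured is reported separately instead of being dropped, so the share of unusable input stays visible.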


Data quality management – What's new?

Real-time analysis to improve the outcome

Data quality management needs to be executed in real-time

Streaming analytics can also be used to perform data quality checks and interventions

The goal is that analytics results are accurate

Data quality management has to focus on the quality of the outcome

Hence, data quality management has to find new ways to ensure the quality of analytics output, which can also include traditional data quality checks
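A data quality check on a stream could look roughly like the following Python sketch. It handles one record at a time instead of cleansing a loaded batch. The sensor records, the plausibility range and the "rejected" flag are hypothetical and only serve the illustration.

```python
from typing import Iterable, Iterator

def quality_gate(readings: Iterable[dict], low: float, high: float) -> Iterator[dict]:
    """Yield plausible readings one at a time; flag the others as rejected."""
    for reading in readings:
        value = reading.get("value")
        if value is None or not (low <= value <= high):
            # Intervention: flag the record instead of silently dropping it.
            reading["rejected"] = True
            continue
        yield reading

# Hypothetical temperature stream; a real one would come from a message queue.
stream = [
    {"sensor": "a", "value": 21.5},
    {"sensor": "a", "value": None},    # missing value
    {"sensor": "b", "value": 480.0},   # implausible temperature
    {"sensor": "b", "value": 19.0},
]
accepted = list(quality_gate(stream, low=-40.0, high=60.0))
rejected = [r for r in stream if r.get("rejected")]
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Because the gate is a generator, downstream analytics only ever see records that passed the check, while the rejected ones remain available for follow-up.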


Data quality management – What's new?

Database as moving target, quick cycles

Data quality management needs to be able to react to constant changes in database and application design

A more agile and flexible approach to data quality management and data governance is required

Pay attention to "data flows"*

Data quality management has to manage information services rather than information products

The focus shifts from the data asset to the capability to produce the right data flow


Data quality management – What's new?

Business users conduct analysis themselves

Data quality management has to be able to advise business users on how trustworthy the raw data and analytics are

The roles and responsibilities need to be adapted accordingly, including the skills and capabilities of the DQ staff (analytics proficient, business fluent)

All internal and external data sources are used to gain best insight in a given situation

Data quality management has to ensure that silos are broken up and business users can use ALL data for analysis without

The DQ manager is the advocate of business users when it comes to providing access, both technical and organizational, to the data needed for analysis


There are four critical process points that need to be addressed by data quality management in a Big Data Analytics World

Capture

Integrate

Analyze

Visualize

There are four critical phases during which the quality of Big Data has to be ensured; all of them are equally important:

1. First, when Big Data is captured from various sources.

2. Second, when Big Data is integrated with data from other sources (which can be both Big Data and traditional data).

3. Third, when analytics is applied to Big Data in the form of statistics, data mining, aggregations and other types of analysis.

4. Fourth, when Big Data insights are visualized and communicated to decision makers.

At each phase, quality needs to be actively controlled and managed using different approaches.
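The four phases above can be sketched as a small Python pipeline in which each phase carries its own quality control. The field names, the reference table and the coverage figure are assumptions chosen for the example, not a method prescribed by the talk.

```python
from statistics import mean

def capture(raw_rows):
    """Capture: keep only rows that carry the fields we need."""
    return [r for r in raw_rows if "customer_id" in r and "amount" in r]

def integrate(rows, master):
    """Integrate: join with reference data and drop rows without a match."""
    return [{**r, "region": master[r["customer_id"]]}
            for r in rows if r["customer_id"] in master]

def analyze(rows):
    """Analyze: average amount per region; refuse to report on empty input."""
    if not rows:
        raise ValueError("no rows left to analyze")
    regions = {r["region"] for r in rows}
    return {reg: mean(r["amount"] for r in rows if r["region"] == reg)
            for reg in regions}

def visualize(result, rows_in, rows_used):
    """Visualize: show each result together with how much input it rests on."""
    coverage = rows_used / rows_in
    return [f"{reg}: {avg:.2f} (based on {coverage:.0%} of raw input)"
            for reg, avg in sorted(result.items())]

# Hypothetical input: one row lacks a customer id, one customer is unknown.
raw = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 30.0},
    {"customer_id": 3, "amount": 50.0},
    {"amount": 5.0},
]
master = {1: "north", 2: "north"}

captured = capture(raw)
integrated = integrate(captured, master)
result = analyze(integrated)
lines = visualize(result, len(raw), len(integrated))
print("\n".join(lines))
```

Passing the row counts through to the last step is one way of letting decision makers see how much of the raw input actually supports a displayed figure.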


The goal of data quality management, which is to ensure that information users can trust the information, needs new methods

"Fitness for use of data" as the definition of data quality still holds true in a Big Data Analytics World, but what if the single data record becomes negligible?

• If I use a customer address record as one of thousands of address records to understand customer behavior linked to a particular zip code for my marketing campaign, the data quality issue might become completely negligible for this type of analysis.

• This will not happen for direct operational data usage, e.g. if I have a wrong customer address and need to send a letter to my customer, a single wrong address leads to an unwanted outcome as the letter cannot reach its destination.

• There will always be a lot of traditional data usage in operational processes, where many things remain the same as before.

• Moreover, if I can enrich the address data in real-time with external data from the Internet, the data quality problem can be resolved immediately.
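The contrast between aggregate and operational use can be made concrete with a small worked example in Python. The numbers are invented for illustration: 1,000 customer records, two zip codes, and an assumed error rate of 2%.

```python
from collections import Counter

# Hypothetical ground truth: 600 customers in zip "10115", 400 in "20095".
true_zips = ["10115"] * 600 + ["20095"] * 400

# Assumption: every 50th record (2%) carries the wrong one of the two zip codes.
recorded_zips = [
    ("20095" if z == "10115" else "10115") if i % 50 == 0 else z
    for i, z in enumerate(true_zips)
]

def share(zips, code):
    """Fraction of records that carry the given zip code."""
    return Counter(zips)[code] / len(zips)

true_share = share(true_zips, "10115")
recorded_share = share(recorded_zips, "10115")
wrong_records = sum(t != r for t, r in zip(true_zips, recorded_zips))

# Aggregate use: the zip-code share barely moves.
print(f"share of 10115: {true_share:.1%} true vs {recorded_share:.1%} recorded")
# Operational use: every wrong record is a letter that does not arrive.
print(f"letters that would go astray: {wrong_records}")
```

With these assumed numbers the share used for the marketing analysis shifts from 60.0% to 59.6%, while the same 20 wrong records would each cause a failed mailing in the operational process.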


Big Data Analytics might require management of data quality in different zones of trust

External data

Extended internal and supplier data

Insights

Data for operations


The Chief Data Officer could become the new home for data quality management and data governance

Chief Data Officer

– Data leverage: Finding ways to use existing data assets to advance the cause of the organization
– Data monetization: Finding new avenues of earnings and revenue opportunities outside existing processes and functions
– Data enrichment: Augmenting existing datasets through the combination of fragmented internal data sources, the acquisition of external data from government feeds or social media sources, and the integration of a business partner’s data
– Data upkeep: Managing the health of the data under governance
– Data protection: Protecting data as an asset


Data quality managers should be part of every Big Data or Analytics team to ensure that analytics results meet customer requirements

Center of Excellence

Data quality managers should be an integral part of a new Center of Excellence that is built for Big Data and Analytics

Currently, data scientists often do their work without involving the company's data quality manager

The quality of analytics is often not standardized


PART 3

Summary and conclusions

“Hell, there are no rules here - we're trying to accomplish something.” Thomas A. Edison


Summary and conclusions

Data quality management in a Big Data Analytics World needs to...

... include data coming from external sources

... consider unstructured data

... be executed in real-time

... focus on the quality of the outcome

... be able to react to constant changes in database design

... manage information services rather than products

... be able to advise business users who conduct analytics

... ensure that business users can use ALL data

Key conclusions:

There are four critical process points that need to be addressed: Capture, Integrate, Analyze, Visualize

The goal of data quality management, which is to ensure that information users can trust the information, needs new methods

Big Data Analytics might require management of data quality in different zones of trust

The Chief Data Officer could become the new home for data quality management and data governance

Data quality managers should be part of every Big Data or Analytics team to ensure that analytics results meet customer requirements
