1
© 2014 IBM Corporation
IBM Strategy & Analytics
24 June 2014
Data Quality Strategy in a Big Data Analytics World
Conference Session, Information and Data Quality Summit 2014
Dr. Alexander Borek
IBM Center of Competence for Advanced Analytics
@bigdatarisk
2
Because we are living in a Big Data Analytics World...
Big Data is no longer an abstract concept. It is the reality in which companies compete today.
3
Hence, these are the questions I would like to explore with you today
1. How can we characterize the new Big Data Analytics World companies compete in?
2. What does it mean for data quality strategy?
– What will change, what stays the same?
– Which parts of data quality management do companies need to reconsider?
A word of warning: This talk is not about Hadoop, MongoDB and other Big Data technologies.
4
PART 1
How can we characterize the new Big Data Analytics World companies compete in?
“In the business world, the rearview mirror is always clearer than the windshield.” Warren Buffett
5
Biggest change in 2014: Forget about the 3 V's of Big Data
3 V's of Big Data: Volume, Variety, Velocity
This is how talks on Big Data used to begin
3 M's of Big Data: Make Me Money
The discussion has shifted towards how to make tangible and sustainable profit from (Big) data
6
Big Data Analytics: Most important enabler for business innovation
Analytic capabilities are widening the divide in competitive advantage [1]
33% higher revenue growth, 12x more profit growth and 32% ROIC
Data is the new oil [2]
Exponentially growing data from sensors and social media
Big Data is the next frontier of innovation, competition and productivity [3]
Transformative potential in almost every industry: hundreds of billions of dollars
[1] Analytics: The Widening Divide, IBM Study
[2] Quote: Ann Winblad
[3] Quote: McKinsey study of the same title
Business Perspective
7
Big Data: A technology-driven innovation trend
New paradigms for managing and processing data at scale: NoSQL, NewSQL, the CAP theorem, Hadoop, in-memory, column-oriented, ...
New economics for managing and processing data at scale: Exponentially decreasing cost of storage, memory and computing power
Vast field of technical innovation: Database theory, internet giants, open source and venture capital
Technical Perspective
8
Companies are progressing in their big data adoption
[Chart: share of companies per big data adoption stage. Stages: Doing nothing, Knowledge gathering, Developing strategy, Piloting/Experimenting, Deployed investments. Values as extracted: 8%, 20%, 18%, 19%, 35%]
Source: Gartner, Big Data Adoption in 2013 Shows Substance Behind the Hype, 12 Sep 2013
9
A growing number of enterprises are getting engaged
Source: http://b-i.forbesimg.com/louiscolumbus/files/2014/01/big-data-investments.jpg
10
Significant Big Data investments planned
Source: http://b-i.forbesimg.com/louiscolumbus/files/2014/01/big-data-investments.jpg
11
C-Level is now the largest supporter of Big Data efforts
Source: http://b-i.forbesimg.com/louiscolumbus/files/2014/01/big-data-investments.jpg
12
IBM identified Nine Levers, which are the success factors for Big Data Analytics, based on a study with 700 executives from 70 countries
Strategy
– Sponsorship: Executive support and involvement
– Source of value: Actions and decisions that generate value
– Funding: Financial rigor in analytics funding process
Organization
– Culture: Availability and use of data and analytics
– Measurement: Evaluating impact on business outcomes
– Trust: Organizational confidence
Capabilities
– Expertise: Development and access to skills and capabilities
– Data: Data management practices
– Platform: Integrated capabilities delivered by hardware and software
Source: Analytics: A blueprint for value, IBM Institute for Business Value Study 2013
13
Trend: Many companies are redefining their digital strategy at the moment – For a vast majority, Big Data is an important part of it
An integrated Digital Strategy enables corporate digital capabilities by integrating physical and digital components, based on aligned, digitally empowered business processes across operations, to meet customer and stakeholder expectations.
14
Trend: Many companies are appointing a Chief Data Officer, the new hero of Big Data Analytics
The Chief Data Officer is the fastest-growing new data role associated with big data and analytics
“By 2017, 50% of all companies in regulated industries will have a Chief Data Officer.” – Gartner
15
Trend: Many companies are investing in a Center of Excellence for Big Data and Analytics
Center of Excellence
Imperative: Make scarce resources available to the most relevant analytics-enabled business initiatives
Organizational model: Center of Competence. Share resources, create synergies, faster knowledge sharing and learning
Roles: Digital Transformation Business Consultants, Data Scientists, Specialists for Data Integration, Governance and Quality
16
Still, many questions about Big Data strategy remain open for a majority of organizations
1. Strategy
Key design questions:
– What is the strategy for Big Data innovation? What is the scope?
– How is it aligned to business and IT?
– Which principles and KPIs should guide Big Data innovation management?
2. Structure
Key design questions:
– Which organizational structure should be adopted for Big Data?
– Should there be a Center of Excellence for Big Data Analytics?
– What are the goals and responsibilities of the governance bodies?
– How are they aligned with existing bodies?
3. Process
Key design questions:
– How should the Big Data innovation process be designed?
– How are ideas created, selected, implemented and monitored?
– How can Big Data resources be shared and optimally utilized throughout the full innovation lifecycle?
4. Capabilities
Key design questions:
– What capabilities do we want to develop and in which order?
– How can the existing technological capabilities be extended to a Big Data environment?
– How can skills and expertise necessary for success be developed and retained?
17
PART 2
What does it mean for data quality strategy?
“We make the world we live in and shape our own environment.” Orison Swett Marden
18
So, what is new in terms of data management?
Traditional Data Management | Big Data Analytics World
Most data assets come from within the company | A large proportion of data come from outside
Focus on structured data | Focus on structured and unstructured data
Look at data to assess what occurred in the past | Real-time analysis to improve the outcome
The goal is that each single record is correct | The goal is that analytics results are accurate
Good database design requires years | Database as moving target, quick cycles
Pay attention to “data stocks”* | Pay attention to “data flows”*
Business users have to ask IT for analysis | Business users conduct analysis themselves
There are clearly defined information requirements for each business process | All internal and external data sources are used to gain best insight in a given situation
*Source: Davenport 2012. How Big Data is Different. http://sloanreview.mit.edu/article/how-big-data-is-different/
19
Data quality management – What's new?
A large proportion of data come from outside
Data quality management has to include data coming from external sources
You cannot load all the data and then cleanse it; data quality checks have to be done on filtered data
Focus on structured and unstructured data
Data quality management has to consider unstructured data
Unstructured data has to be made structured first; then traditional data quality checks can be applied
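A minimal sketch of these two points, assuming a hypothetical feed of free-text customer comments: the texts are filtered first, the relevant fields are extracted into a structured record, and only then are traditional checks (here: completeness) applied. The field names, the regular expressions and the `mentions_product` filter are illustrative assumptions, not part of the talk.

```python
import re

# Hypothetical patterns for illustration: a 5-digit zip code and a simple e-mail shape.
ZIP_RE = re.compile(r"\b(\d{5})\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mentions_product(text, keyword="router"):
    """Filter step: keep only texts relevant to the analysis at hand."""
    return keyword in text.lower()


def to_record(text):
    """Structuring step: pull typed fields out of free text."""
    zip_match = ZIP_RE.search(text)
    email_match = EMAIL_RE.search(text)
    return {
        "zip": zip_match.group(1) if zip_match else None,
        "email": email_match.group(0) if email_match else None,
        "text": text,
    }


def check_record(record):
    """Traditional check on the structured record: completeness of each field."""
    return [f"missing {field}" for field in ("zip", "email") if record[field] is None]


def quality_report(texts):
    """Filter first, then structure and check only what passed the filter."""
    report = []
    for text in texts:
        if not mentions_product(text):
            continue  # irrelevant data is never loaded or cleansed
        record = to_record(text)
        report.append((record, check_record(record)))
    return report


external_texts = [
    "My router keeps dropping. Contact me at jo@example.com, I live in 10115.",
    "Great weather today!",
    "Router arrived broken, zip 80331.",
]
report = quality_report(external_texts)
for record, issues in report:
    print(record["zip"], record["email"], issues)
```

The filter runs before any cleansing effort is spent, so the cost of the checks scales with the data that is actually used rather than with everything that arrives.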
20
Data quality management – What's new?
Real-time analysis to improve the outcome
Data quality management needs to be executed in real time
Streaming analytics can also be used to perform data quality checks and interventions
The goal is that analytics results are accurate
Data quality management has to focus on the quality of the outcome
Hence, data quality management has to find new ways to ensure the quality of analytics output, which can also include traditional data quality checks
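One way to picture a real-time check is sketched below, assuming a hypothetical stream of sensor readings: each record is validated as it arrives, and an alert is raised when the share of invalid records in a rolling window becomes too high. The validity rule, window size and threshold are illustrative assumptions; a production setup would run on a streaming platform rather than a Python generator.

```python
from collections import deque


def is_valid(reading):
    """Hypothetical per-record rule: a temperature must be present and plausible."""
    value = reading.get("temp_c")
    return value is not None and -50.0 <= value <= 60.0


def monitor(stream, window=5, max_error_rate=0.45):
    """Yield (reading, valid, alert) for each record as it arrives.

    `alert` is True when the share of invalid records among the last
    `window` records exceeds `max_error_rate`.
    """
    recent = deque(maxlen=window)  # rolling window of validity flags
    for reading in stream:
        valid = is_valid(reading)
        recent.append(valid)
        error_rate = recent.count(False) / len(recent)
        yield reading, valid, error_rate > max_error_rate


readings = [
    {"temp_c": 21.5},
    {"temp_c": 22.0},
    {"temp_c": None},    # missing value
    {"temp_c": 999.0},   # implausible value
    {"temp_c": 21.8},
    {"temp_c": 500.0},   # implausible value
]
results = list(monitor(readings))
valid_flags = [valid for _, valid, _ in results]
alerts = [alert for _, _, alert in results]
print(valid_flags)
print(alerts)
```

The alert acts on the quality of the flow rather than on a single record, which is the intervention point the slide refers to.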
21
Data quality management – What's new?
Database as moving target, quick cycles
Data quality management needs to be able to react to constant changes in database and application design
A more agile and flexible approach to data quality management and data governance is required
Pay attention to “data flows”*
Data quality management has to manage information services rather than information products
The focus shifts from the data asset itself to the capability to produce the right data flow
22
Data quality management – What's new?
Business users conduct analysis themselves
Data quality management has to be able to advise business users on how trustworthy the raw data and the analytics are
The roles and responsibilities need to be adapted accordingly, including the skills and capabilities of the DQ staff (analytics proficient, business fluent)
All internal and external data sources are used to gain best insight in a given situation
Data quality management has to ensure that silos are broken up and business users can use ALL data for analysis without
The DQ manager is the advocate of business users when it comes to providing access, both technically and organizationally, to the data needed for analysis
23
There are four critical process points that need to be addressed by data quality management in a Big Data Analytics World
Capture
Integrate
Analyze
Visualize
There are four critical phases during which the quality of Big Data has to be ensured, and all of them are equally important:
1. First, when Big Data is captured from various sources.
2. Second, when Big Data is integrated with data from other sources (which can be both Big Data and traditional data).
3. Third, when analytics is applied to Big Data in the form of statistics, data mining, aggregations and other types of analysis.
4. Fourth, when Big Data insights are visualized and communicated to decision makers.
At each phase, quality needs to be actively controlled and managed using different approaches.
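The four phases can be sketched as a small pipeline with a different quality control at each step. This is only an illustration under stated assumptions: the function names, the sample events and the `master_data` lookup are hypothetical, and each check stands in for whatever approach fits the phase in practice.

```python
def capture():
    """Capture: raw events from a hypothetical external source."""
    return [
        {"customer_id": 1, "amount": 40.0},
        {"customer_id": 2, "amount": None},   # incomplete record
        {"customer_id": 3, "amount": 25.0},
        {"customer_id": 4, "amount": 10.0},   # not known in master data
    ]


def check_capture(events):
    """Capture control: drop incomplete records and count how many were dropped."""
    kept = [e for e in events if e["amount"] is not None]
    return kept, len(events) - len(kept)


def integrate(events, master):
    """Integrate control: join with master data and flag unmatched keys."""
    joined, unmatched = [], []
    for e in events:
        if e["customer_id"] in master:
            joined.append({**e, "region": master[e["customer_id"]]})
        else:
            unmatched.append(e["customer_id"])
    return joined, unmatched


def analyze(rows):
    """Analyze control: keep the sample size next to every aggregate."""
    totals = {}
    for r in rows:
        total, n = totals.get(r["region"], (0.0, 0))
        totals[r["region"]] = (total + r["amount"], n + 1)
    return totals


def visualize(totals, min_n=2):
    """Visualize control: tell the decision maker when a figure rests on few records."""
    lines = []
    for region, (total, n) in sorted(totals.items()):
        note = "" if n >= min_n else " (low sample)"
        lines.append(f"{region}: {total:.2f} from {n} record(s){note}")
    return lines


master_data = {1: "North", 3: "North"}
events, dropped = check_capture(capture())
rows, unmatched = integrate(events, master_data)
lines = visualize(analyze(rows))
print(dropped, unmatched, lines)
```

Each phase reports its own quality signal (dropped records, unmatched keys, sample sizes, caveats in the output), so a problem can be traced to the phase where it arose.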
24
The goal of data quality management, which is to ensure that information users can trust the information, needs new methods
“Fitness for use of data” as a definition of data quality still holds true in a Big Data Analytics World, but what if the single data record becomes negligible?
• If I use a customer address record as one of thousands of address records to understand customer behavior linked to a particular zip code for my marketing campaign, a data quality issue in that record might become completely negligible for this type of analysis.
• This will not happen for direct operational data usage, e.g. if I have a wrong customer address and need to send a letter to my customer, a single wrong address leads to an unwanted outcome as the letter cannot reach its destination.
• There will always be a lot of traditional data usage in operational processes, where many things remain the same as before.
• Moreover, if I can enrich the address data in real time with external data from the Internet, the data quality problem can be resolved immediately.
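The contrast between analytical and operational use can be made concrete with a small simulation. It assumes hypothetical data (1,000 customers in one zip code, each with a randomly generated yearly spend) and corrupts exactly one address; none of the figures come from the talk.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical data: 1,000 customers in one zip code, each with a yearly spend.
records = [{"zip": "10115", "spend": random.uniform(50, 150)} for _ in range(1000)]


def mean_spend(rows, zip_code):
    """Average spend over all records assigned to the given zip code."""
    values = [r["spend"] for r in rows if r["zip"] == zip_code]
    return sum(values) / len(values)


clean = mean_spend(records, "10115")

# Corrupt a single address: one customer is filed under the wrong zip code.
corrupted = [dict(r) for r in records]
corrupted[0]["zip"] = "99999"
dirty = mean_spend(corrupted, "10115")

# Analytical use: the aggregate barely moves.
relative_change = abs(dirty - clean) / clean
print(f"relative change in mean spend: {relative_change:.4%}")

# Operational use: the same error means one letter that cannot be delivered.
letters_misdelivered = sum(1 for a, b in zip(records, corrupted) if a["zip"] != b["zip"])
print(f"letters sent to a wrong zip code: {letters_misdelivered}")
```

With values between 50 and 150, removing one of 1,000 records can shift the mean by at most about 0.2%, while for the affected customer the operational outcome fails completely.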
25
Big Data Analytics might require management of data quality in different zones of trust
External data
Extended internal and supplier data
Insights
Data for operations
26
The Chief Data Officer could become the new home for data quality management and data governance
Chief Data Officer
– Data leverage: Finding ways to use existing data assets to advance the cause of the organization
– Data monetization: Finding new avenues of earnings and revenue opportunities outside existing processes and functions
– Data enrichment: Augmenting existing datasets through the combination of fragmented internal data sources, the acquisition of external data from government feeds or social media sources, and the integration of a business partner’s data
– Data upkeep: Managing the health of the data under governance
– Data protection: Protecting data as an asset
27
Data quality managers should be part of every Big Data or Analytics team to ensure that analytics results meet customer requirements
Center of Excellence
Data quality managers should be an integral part of a new Center of Excellence that is built for Big Data and Analytics
Currently, data scientists often do their work without involving the company's data quality manager
The quality of analytics is often not standardized
28
PART 3
Summary and conclusions
“Hell, there are no rules here - we're trying to accomplish something.” Thomas A. Edison
29
Summary and conclusions
Data quality management in a Big Data Analytics World needs to...
... include data coming from external sources
... consider unstructured data
... be executed in real-time
... focus on the quality of the outcome
... be able to react to constant changes in database design
... manage information services rather than products
... be able to advise business users that conduct analytics
... ensure that business users can use ALL data
Key conclusions:
There are four critical process points that need to be addressed: Capture, Integrate, Analyze, Visualize
The goal of data quality management, which is to ensure that information users can trust the information, needs new methods
Big Data Analytics might require management of data quality in different zones of trust
The Chief Data Officer could become the new home for data quality management and data governance
Data quality managers should be part of every Big Data or Analytics team to ensure that analytics results meet customer requirements