kvalitetssikring av “big data” - qrn.no · 2019-07-31 · kvalitetssikring av “big data ......

38
SAFER, SMARTER, GREENER DNV GL © 1 Kvalitetssikring av “Big Data” Er organisasjonens data et virksomhetskritisk produksjonsmiddel? Per Myrseth Data Management Competence Centre (DMCC) DNV GL Digital Solutions

Upload: others

Post on 08-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

DNV GL © SAFER, SMARTER, GREENERDNV GL ©1

Kvalitetssikring av “Big Data”

Er organisasjonens data et virksomhetskritisk produksjonsmiddel?

Per MyrsethData Management Competence Centre (DMCC) DNV GL Digital Solutions

DNV GL ©

Main topics of today

▪ Data as an asset and production means, including Big Data

▪ Data management as part of general quality management

▪ Risk management of data dependent operations

3

DNV GL ©

Big data

4

Note:

Most project and use

cases are fine with

small data….

DNV GL ©

Digitalization: The relationship between processes, data and technology

5

Processes

DataTechnology

Business goals

Designed and managed to meet

People

DNV GL ©

Digitalisation roadmap and creating value from data

6

Processes

Data Technology

Business roles and goals

Processes

Data Technology

Business roles and goals

Current competitive landscape Changed and/or improved

• Technology

• Data

• Processes

• Competence & capacity

People People

Future competitive landscape

DNV GL ©

Context and trends => food for thoughts

7

DNV GL ©

Value creation from data and the risk of using bad data

8https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year

DNV GL ©

Intangible assets, investments and market value

9

https://ssrn.com/abstract=3009783

DNV GL ©

Need for TRUST in data and information

10

DNV GL ©

Creating value from data

Data/Info Management

Data

ingestio

n, s

tream

ing o

r

batc

hes

Data

quality

assessm

ent

Oth

er D

ata

Pre

para

tion

Data

cle

ansin

g

Data

harm

oniz

atio

n

Security and access control

Condition based

maintenance

Business Use-Cases

Analytics

Safety and Barrier

Mgmt.

Operational Efficiency

DNV GL ©

Data life cycle, when does data bring value

14

Value to business

Creation Several years later

Create the data today that bring

value to business tomorrow

DNV GL ©

v

c

lp q r s y z

Known good

Known insufficient

Known unavailable

Unidentified potential

Data we have versus data we need

15

v

c

lp q r s y z

e

ik

o

d

g

j

u

a

f

h

m nw x

b

Data relevant for the task

Importance of

contribution

Y(t) = at+bt+ct+ dt+et+ft+ gt+ht+ it+ jt+kt+ lt+ mt+nt+ot+pt+qt+ rt+st+ut+vt+wt+ xt+yt +zt

DNV GL ©

Having the right data at hand:a prerequisites for transformation, automation, and success with data science and new services

16

Low data quality

Few data sources

Data fit for purpose within cost and risk

Future way for working

High data quality

Every data source of your dreams

Current way for working

DNV GL ©

Explorative or use case driven value creation

17

Small data

Know which problem to solve

Big data

Do not know what problem to solve

Project AProject B

Project DProject C

Where to place projects A-D

• Incremental improvements

• High or low ROI risk

• Something totally new

• Easy to copy innovation

DNV GL ©

Understanding the data quality consequences of data in value chains

18

Merge/Integrated

Dataset D1

Dataset D2

Dataset D3

Dataset D123

Derived

Dataset D4

ORIGINAL

- SLA

- Cost

- Contract rights/IPR

- Liability

- Confidentiality

TRANSFORMED

- SLA

- Income

- Contract rights/IPR

- Liability

- Confidentiality

Lineage of data in a data flowSLA - Service Level Agreement

IPR - Intellectual Property Rights

DNV GL ©

Data value chain

http://www.dlineage.com/assets/img/articles/impact-analysis-4.jpg

DNV GL ©

‘Enterprise-Oriented’ Strategic Long Term Focus

‘Project-Oriented’ Short Term Focus

Many system vendorsProject focus

Personnel change

System IntegratorsQuick fix

Data management : long term and enterprise focus is needed

KPI’sDeadlines Pilots

Outside my scopeSmall budget

Standards, who cares…

Technology driven

Silos

Data management debt

Data management responsible

DNV GL ©

Food for thoughts => response to trends and challenges

21

DNV GL ©

Digitalization: The relationship between processes, data and technology

22

Processes

DataTechnology

Business goals

Designed and managed to meet

People

DNV GL ©

Data as an asset and business driver

▪ The primary driver for Data Management is

to enable organizations to get value from

their data assets, just as effective

management of financial and physical

assets enables organizations to get value

from those assets. – (Ref DAMA DMBOK 2.0)

▪ Information is recognized as an enterprise asset. Although

generally accepted accounting principles (GAAP) as yet do not

require the reporting of information assets on the balance

sheet, infonomics deems that organizations acknowledge that

information is more than merely a resource.

– (Ref: March 2017, Doug Laney of Gartner spoke on Infonomics)

23

DNV GL ©

Extended data quality assessment process

24

Define scope

Data quality assessment

Data usage risk assessment Fit for use within acceptable risk

Risk based data quality improvement

Organizational maturity assessment

Data exploration and profilingGet access to data, prepare, explore

• Define scope • Consequence dimensions• Perform risk assessment

• Root causes• Measures • Prioritize

• Intended use • Context

• Datasets & criticality• Data origin and path

• Data• IT Systems

• Processes • Organization

Continuous im

pro

vem

ent

cycle

• Identify and define req.• Prepare and configure • Perform assessment

• Organizational scope• Interview & review• Analyse and score

Sensor system assessment

• Clarify scope• Choose methodology• Analyse and score

Algorithm assessment

• Clarify scope• Choose methodology• Analyse and score

Extended

scope

DNV GL ©

Data Quality Assessment Framework in use

25

Production line for data,Sensor system, Operational & IT

TechnologyData Asset Data Use

Risk

The Organization

Data quality assessment activities

Organizational Maturity

Assessment

Data Quality Assessments

Continuous improvements

Sensor system

assessment

Algorithm assessment

DNV GL ©

Organizational Maturity Assessment - 2

27

Maturitylevel

GovernanceOrganization and

peopleProcesses Process Efficiency

Requirement definition

Metrics and dimensions

Architecture, tools and technologies

Data standards

LEVEL 5 -Optimized

Data management policies governs and drives improvements

Data management board oversees improvement activities

Processes for continuous improvement in place

Processes provides feed-back and feed-forward to support continuous improvement

Baseline established and improvements measured according to requirements

Metrics defines baseline to support continuous improvement

Tools support policy driven continuous improvement cycle

Standard compliance and domain models are subject to continuous improvement

LEVEL 4 –Managed

Policies defined in relation to business objectives

Skillset extended to include risk analysis of quality issues aligned with business objectives

Processes for impact analysis and risk mgmt. in place

Monitoring is performed across enterprise and published as KPI’s and trends

Requirements are linked to business impacts

Metrics are linked to business impacts and risk analysis

Tools are driven by business objectives and include support for root cause analysis and risk mgmt.

Standards are used actively to reduce risk for critical business operations

LEVEL 3 –Defined

Policies defined at enterprise level

Roles and required skills defined at enterprise level

Processes are defined and implemented consistently across enterprise

Defined metrics are monitored in advance of business impact

Requirements defined and communicated at enterprise level

Framework for metrics and dimensions defined at enterprise level

Architecture in place at enterprise level supporting full stack data management

Standards, domain models and semantics used at enterprise level

LEVEL 2 –Repeatable

Local initiatives address the requirement for policies

Locally defined roles and some basic skills

Best practices in place but not used consistently

Generic metrics are monitored at point of impact

Local initiatives define requirements

Metrics are reused locally in projects

Tools and technologies used consistently in selected projects

Industry standards and domain models used selectively across projects

LEVEL 1 -Initial

Only ad-hoc or temporal policies in place

No formally defined roles or skillset

Ad-hoc or reactive responses to quality issues

No baseline and no monitoring of quality issues

Re-engineering used to derive requirements

Project specific metrics

Tools are used ad-hoc per project

Ad-hoc and inconsistent use of standards

Objectives,

Policy, Culture, Awareness, Risks, Capabilities to handle DQ issues

Organization, roles, responsibilities, authority, skillsets

Structured and vetted ways of handling and preventing DQ issues

Measure, monitor and use metrics to mitigate DQ issues

DQ Requirements defined,communicated and acted upon

DQ metrics defined, setup, measured and monitored

DQ Tools forprocessing, analysing and correcting DQ issues with data assets

Use available standards, models,ontologies and taxonomies – a corporate «DQ language»

People and

processes

Definitions

and

requirements

Technology

and

standards

Target Line

short way to improvements

longer way to improvements

Objective is to use what

already has been developed

DNV GL ©

How to set up, monitor and improve your data management

▪ Data Management provides foundation for organizing data effectively

Required capability areas:

28

Governance

Organization

and people

Processes

Process

efficiencyMetrics &

dimensions

Requirement

definition

Architecture, tools &

technologies

Data

standards

DNV GL ©

Data Quality Engine

30

Production line for data,Sensor system, Operational & IT

TechnologyData Asset Data Use

The Organization

DNV GL Runtime services

Data Quality Engine

Quality status &

cleansed data

Data Quality rules

Data Cleaning Engine

Data cleaning

rules

DNV GL ©

Data quality metrics – accurate and complete data

▪ Data types and format

▪ Data quality flags

▪ Collection frequency (missing records/duplicate

records)

▪ Value distribution according to valid range

▪ Redundancy for reference data – correlations

▪ Rate of change

▪ Outliers

▪ Identification tags – traceability

▪ Missing values

▪ Sensor drift

▪ Consistency

▪ Code lists

▪ Event logs

31

Min

Max

Outside range

Time collision

RoC

Missing data

t

v

DNV GL ©

Data quality and cleansing process

32

RAW

DATA

CLEANSED

DATA

CHANGE

LOG

DATA QUALITY

DATA CLEANSING

Access raw

data

Use data quality

rules to determine

if data need

cleansing, reuse

rules from data

quality

assessment

Clean data according to

predefined activities

- Replace values

(default/interpolation)

- Insert new values

- Delete values

- Tag values

- …

Store cleansed data

Record all changes

applied to raw data

- What changed

- Why

- By whom

- When

- …

Data quality engine

Data quality rules

Data cleansing

rules

DNV GL ©

DATA USAGE RISK BOW TIE

33

DNV GL ©

Extended data quality assessment process

34

Define scope

Data quality assessment

Data usage risk assessment Fit for use within acceptable risk

Risk based data quality improvement

Organizational maturity assessment

Data exploration and profilingGet access to data, prepare, explore

• Define scope • Consequence dimensions• Perform risk assessment

• Root causes• Measures • Prioritize

• Intended use • Context

• Datasets & criticality• Data origin and path

• Data• IT Systems

• Processes • Organization

Continuous im

pro

vem

ent

cycle

• Identify and define req.• Prepare and configure • Perform assessment

• Organizational scope• Interview & review• Analyse and score

Sensor system assessment

• Clarify scope• Choose methodology• Analyse and score

Algorithm assessment

• Clarify scope• Choose methodology• Analyse and score

Extended

scope

DNV GL ©

Data storage architecture, where

35

Sensor Operational

systemIT-systems

On Asset/plant On premises

Cloud

Partner/ third party

DNV GL ©

Production line and

main process

Supporting process:

Data management and

data quality monitoring

Business activities and data relationship

36

Data

Business

activity 1Business

activity 2

Business

activity 3

Create Use Create Use Create

Data quality

monitoring

Data issue

handling

DNV GL ©

Data value chain and interoperability layers

37

ICT and communication

Data and semantics

Business model and process

Legal and contractual

37

ICT and communication

Data and semantics

Business model and process

Legal and contractual

Enterprise 1 Enterprise 2

Value Chain/

Value network

DNV GL ©

Data quality and data cleansing on top of Veracity

38

Data capture

and handling

Data

transfer Data

access

Data quality

assessment

Data

cleansing

Data quality

acceptanceData

transfer

Data

accessData use and

analytics

P2PData quality

improvements

Data continuous

improvements

Customer site DNV GL Provider on Veracity

DNV GL ©

Recommended practice, guides andmarket material

39

SELF-ASSESSMENT

DNV GL ©

Way forward

40

DNV GL ©

Where to find us?

External access information:

Feature article on DNVGL.com front page, on treating data as asset

https://www.dnvgl.com/feature/data-quality.html

Guides and best practices

Recommended practice:

Data quality assessment framework, DNV GL RP-0497

https://rules.dnvgl.com/docs/pdf/DNVGL/RP/2017-01/DNVGL-RP-0497.pdf

Guide:

Creating Value from Data in Shipping

https://www.dnvgl.com/maritime/Creating-Value-from-Data-in-Shipping/index.html

41

DNV GL ©

Where to find us on Veracity

Organizational Maturity Assessment (Advisory) (based on ISO 8000 Data Quality)

https://store.veracity.com/organizational-maturity-assessment/product/24348a37-fa51-4fec-b36e-06c4d9c38615

Data Quality Assessment (Advisory): (based on ISO 8000 Data Quality)

https://store.veracity.com/data-quality-assessment/product/658d3000-51b0-424a-b4a6-0909a22b164a

Data Quality as a Service (Software as a service): (based on ISO 8000 Data Quality)

https://store.veracity.com/data-quality-as-a-service/product/1063bf6f-a4dd-4de1-a072-be50f7c79b81

Data Risk Assessment (Advisory): (based on ISO 31000 Risk Management)

https://store.veracity.com/data-risk-assessment/product/550f312e-5493-4cd4-a818-7af2145653ae

GDPR - Personal Data Risk Evaluation (Advisory) (based on ISO 31000)

https://store.veracity.com/gdpr-personal-data-risk-evaluation/product/81ceca7b-80c1-4c70-a2a4-4d37e59c81ee

Data management maturity self-assessment:

https://store.veracity.com/data-management-maturity-self-assessment/product/f83f117e-0338-4729-9dfe-e7aabbe225b8

42

DNV GL ©

SAFER, SMARTER, GREENER

www.dnvgl.com

The trademarks DNV GL®, DNV®, the Horizon Graphic and Det Norske Veritas®

are the properties of companies in the Det Norske Veritas group. All rights reserved.

Data is the new business enabler

43

Jørgen StangChief Specialist

[email protected]

Per MyrsethChief Specialist

Head of Section [email protected]

Jarl S. MagnussonChief Specialist

[email protected]

Simen SandelienPrincipal Engineer

[email protected]

Karl John PedersenPrincipal Specialist

[email protected]