Thomas Janssen - Challenges in the Data Integration Space in a Modern World - FutureData2017


Uploaded by cebit-australia on 22-Jan-2018


TRANSCRIPT

Page 1: Thomas Janssen - Challenges in the data integration space in a modern world - FutureData2017

analytics8.com.au

MIND THE GAP: CHALLENGES IN THE DATA INTEGRATION SPACE IN A MODERN WORLD

With the advent of Big Data, and with the Data Mining and Machine Learning space maturing rapidly, the appetite for analytical data is growing exponentially. However, to leverage this wealth of information, a robust approach to data management is crucial.

Page 2

SYDNEY – MELBOURNE – LONDON – CHICAGO – NY – SAN FRANCISCO – RALEIGH – DALLAS – MADISON – DENVER

We enable our customers to better know and understand their business, their customers and their markets through data and analytics.

Thomas Janssen, Data Integration and Solution Architecture Specialist

Thomas has extensive experience as both a solution architect and a data integration specialist. Over his 15+ year career he has been exposed to all facets of data management, from Data Warehousing to Master Data Management, Data Quality, and Big Data. As Analytics8’s Data & Integration practice manager, he works with a team of experienced specialists to help organisations derive more value from their data. Having been with A8 since 2008, he specializes in a number of industries including Health Care, Insurance, Financial Services and Higher Education.

Introductions

Tel +61 3 9670 9180 | Mob +61 403 113 305

Page 3

Figure: Analytical Maturity (x-axis) vs Competitive Advantage (y-axis)

Raw Data → Clean Data

Business Intelligence
» Canned Reports: What happened?
» Ad-hoc Reports: How many, how often, where?
» Query Drill-down: Where exactly is the problem?
» Alerts: What actions are needed?

Analytics
» Statistical Analysis: Insight - Why is this happening?
» Forecast: Quantify - What if these trends continue?
» Predict: Specific Choices - What will happen next?
» Optimise: Influence - What is the best that can happen?

Machine Learning
» Automate - autonomously discover insights

Page 4

What does Advanced Analytics mean for my organisation?

How do I progress my Advanced Analytics initiative beyond the prototype phase?

Page 5

Session Agenda

» SETTING THE STAGE
» INTRODUCING NPS MEDICINEWISE
» ACCELER8
» KEYS TO BEST PRACTICE DATA MANAGEMENT
» CONCLUSIONS

Page 6

Becoming the premier custodian for Australia’s general practice clinical data

» An estimated 95% of all health care is delivered in General Practice

» Patient data is managed in a highly distributed manner by over 7,000 independent GP surgeries

» Privacy, security and consent are major concerns

» Collecting primary care data for analytical purposes is, in short, no picnic, but NPS set out to achieve just that.

» NPS assembled a repository of primary care data to assist general practices with understanding their clinical activity, how conditions are diagnosed, and how they are treated

» NPS needed a partner who could not only help with designing and developing the data warehouse platform necessary to collect this data, but who also understood the very specific challenges of managing and interpreting clinical data

»They selected Analytics8

Page 7

Making a difference in Australian Healthcare

“How often do you prescribe antibiotics for upper respiratory tract infections that do not routinely require antibiotic therapy?”

“With your help, how could antibiotic use in Australia change?”

If each GP prescribed on average:
• 2 fewer scripts per week, the national reduction would be in the order of 3,100,000 prescriptions;
• 3 fewer scripts per week, this would equal approximately 4,700,000 fewer prescriptions.
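The two projections above can be checked against each other with simple arithmetic. The slide does not say over what period the reductions apply, so the sketch below assumes they are annual totals over 52 weeks. Under that assumption, 3,100,000 fewer scripts at 2 per GP per week implies roughly 29,800 GPs, and the same workforce writing 3 fewer scripts per week gives about 4,650,000, which is consistent with the slide's "approximately 4,700,000". The function names are illustrative only.

```python
# Back-of-envelope consistency check for the prescribing projections.
# ASSUMPTIONS (not stated on the slide): the national reductions are annual
# totals, and each GP prescribes across 52 weeks of the year.
WEEKS_PER_YEAR = 52


def implied_gp_count(fewer_scripts_per_week: int, national_reduction: int) -> float:
    """GP headcount implied by a stated national annual reduction."""
    return national_reduction / (fewer_scripts_per_week * WEEKS_PER_YEAR)


def projected_reduction(gp_count: float, fewer_scripts_per_week: int) -> float:
    """National annual reduction if every GP writes this many fewer scripts per week."""
    return gp_count * fewer_scripts_per_week * WEEKS_PER_YEAR


gps = implied_gp_count(2, 3_100_000)
print(f"Implied GP count: {gps:,.0f}")  # Implied GP count: 29,808
print(f"Projected reduction at 3 fewer scripts/week: {projected_reduction(gps, 3):,.0f}")
```

Because the reduction scales linearly with scripts per GP, moving from 2 to 3 fewer scripts per week multiplies the total by exactly 1.5, whatever the GP count turns out to be.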

Page 8

OVER 2 MILLION PATIENTS…

…TREATED IN OVER 600 PRACTICES…

…whose data is entered manually in half a dozen types of software…

…which is anonymized & transmitted through multiple intermediaries…

…and collected & stored by NPS in a single data hub…

…interpreted by clinical analysts…

…and analyzed by epidemiologists.
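The slide notes that patient data is anonymized before it travels through intermediaries. The source does not describe how NPS does this. One common technique is keyed pseudonymisation: each identifier is replaced with an HMAC of it, so the same patient always maps to the same token, but the identifier itself never leaves the practice. The following is a minimal sketch under that assumption. `pseudonymise`, `SECRET_KEY` and the record fields are hypothetical. A keyed hash alone is pseudonymisation, not full anonymisation, because quasi-identifiers such as dates of birth or postcodes can still re-identify a patient.

```python
import hashlib
import hmac

# Hypothetical secret held only by the extraction step; it is never shipped with the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymise(patient_id: str, practice_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a practice-local patient ID with a keyed, non-reversible token.

    The practice ID is part of the message, so identical local IDs from
    different practices produce different tokens.
    """
    message = f"{practice_id}|{patient_id.strip().upper()}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Build the outbound record without the original identifier.
record = {"practice_id": "P001", "patient_id": "12345", "diagnosis": "URTI"}
outbound = {
    "patient_token": pseudonymise(record["patient_id"], record["practice_id"]),
    "practice_id": record["practice_id"],
    "diagnosis": record["diagnosis"],
}
print(outbound["patient_token"][:16])
```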

Page 9

» The Data Warehouse is transformative for NPS, exposing cleaner data faster and more transparently

» The data warehouse will allow NPS to quadruple the number of clinical partnerships and to engage with new consumers of health information

» This was a process with clear benefits not just to NPS itself but (potentially) to health care in Australia

» A robust layer of automated data quality and reconciliation reports, driven by Acceler8, provides NPS with the ability to drive better data ownership initiatives towards clinical practitioners, improving the way the clinical data of Australians is managed nationwide.

The solution: Benefits…

» Analytics8, using Acceler8™, was chosen for the Data Warehouse implementation

» Based on a Scrum project delivery approach

» Data Vault modelling was used

» Automatic generation of the ETL data processing layer using Acceler8™

» Enabled faster delivery, allowing the NPS team to focus on implementing complex transformation, interpretation and quality cleansing rules

» Fast innovation and exploration were two core design principles.
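Data Vault modelling splits a warehouse into hubs, links and satellites. Hubs hold business keys, links hold relationships between business keys, and satellites hold descriptive attributes tracked over time. A widely used Data Vault 2.0 convention derives hash keys from normalised business keys and adds a `hash_diff` to each satellite row for change detection. The sketch below illustrates that convention only. It is not Acceler8's implementation, and the function names, the MD5 choice, the `||` delimiter and the sample keys are all assumptions.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_key_parts: str) -> str:
    """Data Vault 2.0 style hash key: trim, upper-case, join with a delimiter, then MD5."""
    normalised = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def hub_row(business_key: str, record_source: str, load_dts: datetime) -> dict:
    """A hub row: the business key plus its hash key and load metadata."""
    return {
        "hub_hash_key": hash_key(business_key),
        "business_key": business_key.strip().upper(),
        "load_dts": load_dts,
        "record_source": record_source,
    }


def link_row(keys: tuple, record_source: str, load_dts: datetime) -> dict:
    """A link row: a relationship between hubs, keyed on the combined business keys."""
    return {
        "link_hash_key": hash_key(*keys),
        "hub_hash_keys": [hash_key(k) for k in keys],
        "load_dts": load_dts,
        "record_source": record_source,
    }


def satellite_row(business_key: str, attributes: dict, record_source: str, load_dts: datetime) -> dict:
    """A satellite row: descriptive attributes plus a hash_diff used to detect changes."""
    # Sorting attribute names keeps the hash_diff stable regardless of dict order.
    hash_diff = hash_key(*(str(attributes[name]) for name in sorted(attributes)))
    return {
        "hub_hash_key": hash_key(business_key),
        "load_dts": load_dts,
        "record_source": record_source,
        "hash_diff": hash_diff,
        **attributes,
    }


now = datetime.now(timezone.utc)
patient = hub_row(" pat-001 ", "PRACTICE_A", now)
practice = hub_row("P001", "PRACTICE_A", now)
visit = link_row(("PAT-001", "P001"), "PRACTICE_A", now)
details = satellite_row("PAT-001", {"sex": "F", "birth_year": 1980}, "PRACTICE_A", now)
print(patient["hub_hash_key"] == details["hub_hash_key"])  # True
```

A loader would insert a new satellite row only when the incoming `hash_diff` differs from the latest stored one for that hub key. This is what makes history tracking cheap and repeatable.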

Page 10

ACCELER8 – A DATA FOUNDATION FRAMEWORK

» Metadata Driven: Store design specs and process execution logs alongside the data they create
» Pattern-Based Design: Standardisation of code ensures consistency and best practice data processing
» Support and Maintenance: Fully supported and continuously updated with additional functionality
» Fully Auditable: Out-of-the-box end-to-end data lineage, operational logs, and automated reconciliation
» Application: Application to manage process generation and operational control
» Solution Architecture: A holistic end-to-end solution architecture supporting a model-driven implementation
» Documentation: Including Standards, Templates and Conventions
» Automation Templates: Automatically generated code at all stages of the SDLC to dramatically increase turn-around
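Two of the framework features above can be illustrated together: storing process execution logs alongside the data, and automated reconciliation. In this minimal sketch, which assumes an in-memory load step, each run records a hypothetical `RunLog`. The run then compares source and target by row count and by an order-independent checksum. The source does not describe Acceler8's actual log structure or reconciliation rules, so every name here is illustrative.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Callable, List, Sequence


@dataclass
class RunLog:
    """Hypothetical execution-log record, kept next to the data the run produced."""
    process_name: str
    source_name: str
    target_name: str
    started_at: str
    source_rows: int = 0
    target_rows: int = 0
    source_checksum: str = ""
    target_checksum: str = ""
    status: str = "RUNNING"


def checksum(rows: List[dict], columns: Sequence[str]) -> str:
    """Order-independent checksum: hash each row's chosen columns, then hash the sorted digests."""
    row_digests = sorted(
        hashlib.sha256("|".join(str(row[c]) for c in columns).encode("utf-8")).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(row_digests).encode("utf-8")).hexdigest()


def run_with_reconciliation(
    process_name: str,
    source_name: str,
    source_rows: List[dict],
    target_name: str,
    load: Callable[[List[dict]], List[dict]],
    columns: Sequence[str],
):
    """Run a load step, then reconcile source against target by row count and checksum."""
    log = RunLog(process_name, source_name, target_name, datetime.now(timezone.utc).isoformat())
    target_rows = load(source_rows)
    log.source_rows, log.target_rows = len(source_rows), len(target_rows)
    log.source_checksum = checksum(source_rows, columns)
    log.target_checksum = checksum(target_rows, columns)
    reconciled = (log.source_rows == log.target_rows
                  and log.source_checksum == log.target_checksum)
    log.status = "RECONCILED" if reconciled else "RECONCILIATION_FAILED"
    return target_rows, log


# Usage: a faithful load and a load that silently drops a row.
source = [{"id": 1, "code": "URTI"}, {"id": 2, "code": "OM"}]
target, ok_log = run_with_reconciliation(
    "load_encounters", "stg_encounters", source, "dw_encounters", lambda rows: list(rows), ["id", "code"])
_, bad_log = run_with_reconciliation(
    "load_encounters", "stg_encounters", source, "dw_encounters", lambda rows: rows[:-1], ["id", "code"])
run_logs = [json.dumps(asdict(log)) for log in (ok_log, bad_log)]  # stored with the target data
print(ok_log.status, bad_log.status)  # RECONCILED RECONCILIATION_FAILED
```

In a warehouse, the same comparison would typically run as SQL aggregates on both sides. The log would then be written to a control table next to the loaded data.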

Page 11

CHALLENGES WE FACE

Page 12

Figure (Advanced Analytics): DATA INPUT (DATA LAKE) → ALGORITHMS Ƒ(x) → MODELED OUTPUT

Page 13

Notable because:
• Moderate-to-high volumes of data;
• Sparse Information Density;
• Internally & externally sourced;
• Disparate data structures;
• Disparate data behaviour;
• Poor Data Quality.

Figure (‘Enterprise’ Analytics): DATA → CLEANSE → INTEGRATE → ALGORITHMS Ƒ(x) → MODELED OUTPUT

Page 14

CrowdFlower 2016 Data Scientist report

Page 15

Advanced Analytics as a Strategic Asset

Page 16

Large organisations are incredibly complex ecosystems that generate tons of data, but not necessarily in a way that is easily accessible or interpretable.

A fractured IT landscape is the standard, not the exception:
• Organisations are managed through separate, but overlapping, systems;
• Separate business units & subsidiaries collect data independently;
• Systems have a finite lifespan within the organisation.

Achieving the depth and breadth of data needed to feed the analytics learning cycle requires data from more than one system.

Each data source will come with independent data challenges.

Creating a holistic view of data requires data integration across many individual data sets.

Lessons Learned

OF THE TOTAL EFFORT TO DEVELOP MACHINE LEARNING MODELS, 80% WILL BE SPENT ON DATA PREPARATION
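When several systems describe the same entity, integration starts by agreeing on a business key that every source can be mapped to. The sketch below assumes two hypothetical systems, `crm` and `billing`, whose customer identifiers differ only in formatting. It groups rows under one normalised key and keeps each source's attributes separate, so lineage and conflicts stay visible. The matching rule in `normalise_key` is an assumption. Real sources usually need richer matching, and in this simplified version two rows from the same source with the same key would overwrite each other.

```python
from collections import defaultdict
from typing import Dict, List


def normalise_key(raw: str) -> str:
    """Hypothetical matching rule: trim, upper-case and drop a 'CUST-' prefix."""
    key = raw.strip().upper()
    return key[len("CUST-"):] if key.startswith("CUST-") else key


def integrate(sources: Dict[str, List[dict]], key_fields: Dict[str, str]) -> Dict[str, Dict[str, dict]]:
    """Group rows from several systems under one normalised business key.

    Each source's row is stored under its source name, so every value
    remains traceable to the system it came from.
    """
    integrated: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for source_name, rows in sources.items():
        key_field = key_fields[source_name]
        for row in rows:
            integrated[normalise_key(row[key_field])][source_name] = row
    return dict(integrated)


crm = [{"cust_id": "CUST-001", "name": "Acme Pty Ltd"}]
billing = [{"account_ref": " 001 ", "balance": 120.0}, {"account_ref": "002", "balance": 0.0}]
view = integrate({"crm": crm, "billing": billing}, {"crm": "cust_id", "billing": "account_ref"})
print(sorted(view))         # ['001', '002']
print(sorted(view["001"]))  # ['billing', 'crm']
```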

Page 17

THE GOALS TO ACHIEVE INSIGHT

Page 18

What we need:
1. The freedom for data scientists to focus on what they do best;
2. The ability to prepare once, reuse & modify many times;
3. Data that reflects the current state of the organisation;
4. The agility to very quickly add more data, add interpretations;
5. The scalability to increase scope, volume, and velocity.

This requires a single platform designed to collect, merge and prepare data, with a holistic design that is optimised for integration.

Our aims

Page 19

Key goal: a re-focus of effort

» SKEWED TOWARDS DATA: Up to 80% of effort spent is skewed towards collecting and preparing data.
» SKILLSET: Data preparation as a separate function that supports analytics.
» BESPOKE COLLECTION: Collecting data for each analytics initiative separately does not scale.

» 60% Data Process Development
» 19% Sourcing Data Sets
» 13% Mining Data for Patterns
» 5% Designing a Solution

79%

Page 20

Key goal: a re-focus of effort

» SKEWED TOWARDS INSIGHT: The focus for data scientists should be on delivering value.
» SKILLSET: Data collection and integration requires a significantly different skillset.
» STANDARDIZED COLLECTION: Use pattern-based development and automate, wherever possible.

» 20% Data Process Development
» 19% Sourcing Data Sets
» 60% Mining Data for Patterns
» 5% Designing a Solution

25%

Page 21

BUILDING BLOCKS OF SUCCESS

Page 22

Towards Agility in Implementation: IM Agility Challenges

Stakeholders require greater flexibility, acknowledging that both their organization (its processes, its technologies, its goals) and the industry in which it operates change constantly.

Stakeholders look for greater transparency: a greater level of trust in the information they are being provided with, as well as in the care with which this information is managed.

Stakeholders demand faster turn-around cycles, delivering value with greater regularity and at reduced cost.

Page 23

Modular Design

Layered complexity: Complex operational processes lead to complex data sets, which in turn lead to a data preparation challenge. It is tempting to design solutions that are optimized for each individual challenge. This is, however, a false economy.

» Consistency & Repeatability: Consistency of design leads to repeatability, which leads to reduced cost, faster construction and higher quality.
» Atomicity of Design: Standardise design artefacts to a limited number of entity types. One component, one purpose.
» Scalability is Key: A modular architecture of low-complexity components ensures there is no built-in glass ceiling to data processing, in terms of Volume, Variety and Velocity.
» Late Binding: Design for change. Separate data processing from data interpretation.
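Late binding, separating data processing from data interpretation, means storing data as received and applying business interpretations only at read time. The sketch below assumes a raw list of encounters and a versioned, hypothetical code-to-condition mapping. Under this design, changing an interpretation means adding a new mapping version rather than reprocessing stored data. The codes and labels are illustrative, not real clinical terminology.

```python
from typing import Dict, List

# Raw layer: rows are stored exactly as received and never rewritten.
RAW_ENCOUNTERS: List[dict] = [
    {"encounter_id": 1, "source_code": "URTI", "antibiotic_prescribed": True},
    {"encounter_id": 2, "source_code": "OM", "antibiotic_prescribed": False},
    {"encounter_id": 3, "source_code": "XYZ", "antibiotic_prescribed": False},
]

# Interpretation layer: versioned mapping metadata (hypothetical codes and labels).
INTERPRETATIONS: Dict[str, Dict[str, str]] = {
    "v1": {"URTI": "Respiratory infection", "OM": "Ear infection"},
    "v2": {"URTI": "Upper respiratory tract infection", "OM": "Otitis media"},
}


def interpret(rows: List[dict], version: str) -> List[dict]:
    """Apply an interpretation at read time; returns new rows and leaves the raw rows untouched."""
    mapping = INTERPRETATIONS[version]
    return [{**row, "condition": mapping.get(row["source_code"], "Unmapped")} for row in rows]


# A changed interpretation is a new version, not a reload of stored data.
for row in interpret(RAW_ENCOUNTERS, "v2"):
    print(row["encounter_id"], row["condition"])
```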

Page 24

Pattern Based Development

BESPOKE DEVELOPMENT
» Increased development effort = increased cost
» Reduced ability to respond to change
» Absent or inconsistent implementation of standard processing
» Data quality issues

DESIGN PATTERN APPROACH
» Build once – use many times
» Agree on a limited number of standardized design patterns
» Develop automated, repeatable packages that encapsulate this behaviour
» Replace individual data processes with parameterized ‘calls’ to these packages

“Nothing is particularly hard if you divide it into small jobs.”

Henry Ford
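The design pattern approach replaces hand-written processes with parameterised calls to a small number of standard patterns. Below is a minimal sketch of that idea. It assumes a single hypothetical hub-load pattern, expressed as a SQL template, and a metadata list that describes each load. The table and column names are invented, and the SQL assumes a dialect with an `MD5()` function, such as PostgreSQL. This is not how Acceler8 generates code.

```python
from string import Template
from typing import Dict, List

# One standardised, hypothetical pattern: load new business keys into a hub table.
# Dialect assumption: PostgreSQL-style MD5(), UPPER() and TRIM().
HUB_LOAD_PATTERN = Template(
    "INSERT INTO $hub_table (hub_hash_key, $business_key, load_dts, record_source)\n"
    "SELECT DISTINCT MD5(UPPER(TRIM(s.$source_key))), UPPER(TRIM(s.$source_key)),\n"
    "       CURRENT_TIMESTAMP, '$record_source'\n"
    "FROM $staging_table s\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $hub_table h\n"
    "    WHERE h.hub_hash_key = MD5(UPPER(TRIM(s.$source_key)))\n"
    ");"
)

# Metadata: one entry per process. Adding a source means adding an entry, not writing new code.
HUB_LOADS: List[Dict[str, str]] = [
    {"hub_table": "hub_patient", "business_key": "patient_bk", "source_key": "patient_id",
     "staging_table": "stg_practice_a_patient", "record_source": "PRACTICE_A"},
    {"hub_table": "hub_practice", "business_key": "practice_bk", "source_key": "practice_id",
     "staging_table": "stg_practice_a_practice", "record_source": "PRACTICE_A"},
]


def generate(pattern: Template, metadata: List[Dict[str, str]]) -> List[str]:
    """Expand the pattern once per metadata entry; substitute() raises KeyError if a parameter is missing."""
    return [pattern.substitute(params) for params in metadata]


for statement in generate(HUB_LOAD_PATTERN, HUB_LOADS):
    print(statement, end="\n\n")
```

Because the SQL is assembled by text substitution, the metadata must come from a trusted, controlled repository and never from user input.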

Page 25

Metadata Driven QA
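This slide carried no text beyond its title. As an illustration of the general idea only, the sketch below assumes that data quality rules are stored as data rather than code. It keeps the rules as metadata and runs them through one generic executor, producing per-rule failure counts of the kind that could feed automated data quality reports. The rule IDs, column names and check types are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical rule metadata: each rule names a column, a check type and its parameters.
QA_RULES: List[dict] = [
    {"rule_id": "DQ001", "column": "patient_token", "check": "not_null"},
    {"rule_id": "DQ002", "column": "age", "check": "range", "min": 0, "max": 120},
    {"rule_id": "DQ003", "column": "sex", "check": "allowed", "values": ["F", "M", "X"]},
]

# Generic check implementations; a missing value fails "range" as well as "not_null".
CHECKS: Dict[str, Callable[[object, dict], bool]] = {
    "not_null": lambda value, rule: value is not None and value != "",
    "range": lambda value, rule: value is not None and rule["min"] <= value <= rule["max"],
    "allowed": lambda value, rule: value in rule["values"],
}


def run_qa(rows: List[dict], rules: List[dict]) -> List[dict]:
    """Evaluate every rule against every row and summarise failures per rule."""
    results = []
    for rule in rules:
        check = CHECKS[rule["check"]]
        failed_rows = [i for i, row in enumerate(rows) if not check(row.get(rule["column"]), rule)]
        results.append({"rule_id": rule["rule_id"], "checked": len(rows),
                        "failed": len(failed_rows), "failed_rows": failed_rows})
    return results


rows = [
    {"patient_token": "a1", "age": 34, "sex": "F"},
    {"patient_token": "", "age": 130, "sex": "F"},
    {"patient_token": "b2", "age": None, "sex": "U"},
]
for result in run_qa(rows, QA_RULES):
    print(result["rule_id"], result["failed"], "of", result["checked"], "failed")
```

Adding a new quality check then means adding an entry to the rule metadata. Only a genuinely new kind of check needs new code in `CHECKS`.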

Page 26

CONCLUSION: WHERE DOES THIS LEAVE US?

The ability to use advanced analytics to better understand one’s organisation adds a powerful tool to the organisation’s information arsenal. Leveraging it as a strategic asset, however, will require a robust approach to data preparation.

Page 27

Conclusions

MACHINE LEARNING
» One-off PoC model or continuous use? Continuous
» Scope for more beyond model #1? YES
» More than one data source / integration required? YES
» Complex data manipulation / cleansing? YES

YOUR ABILITY TO PREPARE DATA IN AN EFFICIENT & SUSTAINABLE MANNER WILL CONSIDERABLY IMPACT YOUR CHANCE OF SUCCESS.

ADVANCED ANALYTICS
» To leverage Advanced Analytics, the question is not whether a data integration platform is required, but how.
» Separate data integration from data science.
» Standardisation and automation are key success factors.
» Build with longevity in mind. Tight deadlines lead to tunnel vision on the immediate goal.

Page 28

Knowledge, Experience, Insight, Success