thomas janssen - challenges in the data integration space in a modern world - futuredata2017
TRANSCRIPT
analytics8.com.au
MIND THE GAP: CHALLENGES IN THE DATA INTEGRATION SPACE IN A MODERN WORLD
With the advent of Big Data, and with the Data Mining and Machine Learning space maturing rapidly, the appetite for analytical data is growing exponentially. However, to leverage this wealth of information, a robust approach to data management is crucial.
SYDNEY – MELBOURNE – LONDON – CHICAGO – NY – SAN FRANCISCO – RALEIGH – DALLAS – MADISON – DENVER
We enable our customers to better know and understand their business, their customers and their markets through data and analytics.
Thomas Janssen,
Data Integration and Solution Architecture Specialist
Thomas has extensive experience as both a solution architect and a data integration specialist. Over his 15+ year career he has been exposed to all facets of data management, from Data Warehousing to Master Data Management, Data Quality, and Big Data. As Analytics8’s Data & Integration practice manager, he works with a team of experienced specialists to help organisations derive more value from their data. Having been with A8 since 2008, he specializes in a number of industries including Health Care, Insurance, Financial Services and Higher Education.
Introductions
Tel +61 3 9670 9180 | Mob +61 403 113 305| [email protected]
Canned Reports
Ad-hoc Reports
Query Drill-down
Statistical Analysis
Forecast
Predict
Alerts
Raw Data
Clean Data
Optimise
Analytical Maturity
Competitive Advantage
Business Intelligence
Analytics
Influence - What is the best that can happen?
Specific Choices - What will happen next?
Quantify - What if these trends continue?
Insight - Why is this happening?
What actions are needed?
Where exactly is the problem?
How many, how often, where?
What happened?
Machine Learning
Automate - autonomously discover insights
What does Advanced Analytics mean for my organisation?
How do I progress my Advanced Analytics initiative beyond the prototype phase?
SETTING THE STAGE
INTRODUCING NPS MEDICINEWISE
ACCELER8
KEYS TO BEST PRACTICE DATA MANAGEMENT
CONCLUSIONS
Session Agenda
Becoming the premier custodian for Australia’s general practice clinical data
» An estimated 95% of all health care is delivered in General Practice
» Patient data is managed in a highly distributed manner by over 7,000 independent GP surgeries
» Privacy, security and consent are major concerns
» Collecting primary care data for analytical purposes is, in short, no picnic - but NPS set out to achieve just that.
» NPS assembled a repository of primary care data to assist general practices with understanding their clinical activity, how conditions are diagnosed, and how they are treated
» NPS needed a partner who could not only help with designing and developing the data warehouse platform necessary to collect this data, but who also understood the very specific challenges of managing and interpreting clinical data
» They selected Analytics8
Making a difference in Australian Healthcare
“How often do you prescribe antibiotics for upper respiratory tract infections that do not routinely require antibiotic therapy?”
“With your help, how could antibiotic use in Australia change?”
If each GP prescribed on average:
• 2 fewer scripts per week, the national reduction would be in the order of 3,100,000
• 3 fewer scripts per week, this would equal approximately 4,700,000 fewer prescriptions
OVER 2 MILLION PATIENTS…
…TREATED IN OVER 600 PRACTICES…
…whose data is entered manually in half a dozen types of software…
…which is anonymized & transmitted through multiple intermediaries…
…and collected & stored by NPS in a single data hub…
…interpreted by clinical analysts…
…and analyzed by epidemiologists
» The Data Warehouse is transformative for NPS, exposing data faster, cleaner & more transparently
» The data warehouse will allow NPS to quadruple the number of clinical partnerships and to engage with new consumers of health information
» This was a process with clear benefits not just to NPS itself but (potentially) to health care in Australia
» A robust layer of automated data quality and reconciliation reports, driven by Acceler8, provides NPS with the ability to drive better data ownership initiatives towards clinical practitioners, improving the way clinical data of Australians is managed nationwide.
The solution: Benefits…
» Analytics8, using Acceler8™, was chosen for the Data Warehouse implementation
» Based on a Scrum project delivery approach
» Data Vault modelling was used
» Automatic generation of the (ETL) data processing layer using Acceler8™
» Enabled faster delivery, allowing the NPS team to focus on implementation of complex transformation, interpretation and quality cleansing rules
» Fast innovation and exploration were two core design principles.
Metadata Driven: Store design specs and process execution logs alongside the data it creates
Pattern-Based Design: Standardisation of code ensures consistency and best practice data processing
Support and Maintenance: Fully supported and continuously updated with additional functionality
Fully Auditable: Out of the box end-to-end data lineage, operational logs, and automated reconciliation
Application: Application to manage process generation and operational control
Solution Architecture: A holistic end-to-end solution architecture supporting a model-driven implementation
Documentation: Including Standards, Templates and Conventions
Automation Templates: Automatically generated code at all stages of the SDLC to dramatically improve turn-around
ACCELER8 – A DATA FOUNDATION FRAMEWORK
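The "Metadata Driven" and "Fully Auditable" ideas above can be illustrated with a minimal Python sketch. It is not Acceler8's implementation; the table names (`etl_spec`, `etl_log`, `src_visits`, `dw_visits`) and the row-count check are assumptions made purely for illustration. The design spec and the execution log are stored in the same database as the data, and each run reconciles source and target row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_visits (visit_id INTEGER, practice TEXT);
    CREATE TABLE dw_visits  (visit_id INTEGER, practice TEXT);
    -- Design spec and execution log live in the same database as the data.
    CREATE TABLE etl_spec (process TEXT, source TEXT, target TEXT);
    CREATE TABLE etl_log  (process TEXT, source_rows INTEGER,
                           target_rows INTEGER, reconciled INTEGER);
    INSERT INTO src_visits VALUES (1, 'A'), (2, 'A'), (3, 'B');
    INSERT INTO etl_spec VALUES ('load_visits', 'src_visits', 'dw_visits');
""")

def run_process(conn, process):
    """Run one load from its stored spec, then log and reconcile row counts."""
    source, target = conn.execute(
        "SELECT source, target FROM etl_spec WHERE process = ?", (process,)
    ).fetchone()
    # Table names come from the trusted spec table, not from user input.
    conn.execute(f"INSERT INTO {target} SELECT * FROM {source}")
    src_n = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_n = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    conn.execute(
        "INSERT INTO etl_log VALUES (?, ?, ?, ?)",
        (process, src_n, tgt_n, int(src_n == tgt_n)),
    )
    return src_n == tgt_n

ok = run_process(conn, "load_visits")
log_rows = conn.execute("SELECT * FROM etl_log").fetchall()
```

Because the spec is data rather than code, adding a new load in this sketch means inserting a row into `etl_spec`, and every run leaves an auditable record in `etl_log`.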
CHALLENGES WE FACE
[Diagram: Advanced Analytics. Labels: DATA LAKE, DATA INPUT, ALGORITHMS Ƒ(x), MODELED OUTPUT]
Notable because:
• Moderate-to-high volumes of data;
• Sparse Information Density;
• Internally & externally sourced;
• Disparate data structures;
• Disparate data behaviour;
• Poor Data Quality.
[Diagram: ‘Enterprise’ Analytics. Labels: DATA, CLEANSE, INTEGRATE, ALGORITHMS Ƒ(x), MODELED OUTPUT]
CrowdFlower 2016 Data Scientist report
Advanced Analytics as a Strategic Asset
Large organisations are incredibly complex ecosystems that generate tons of data, but not necessarily in a way that is easily accessible or interpretable.
A fractured IT landscape is the standard, not the exception:
• Organisations are managed through separate - but overlapping - systems
• Separate business units & subsidiaries collect data independently
• Systems have a finite lifespan within the organisation
Achieving the necessary depth and breadth of data to feed the analytics learning cycle requires data from more than one system.
Each data source will come with independent data challenges.
Creating a holistic view of data requires data integration across many individual data sets.
Lessons Learned
OF THE TOTAL EFFORT TO DEVELOP MACHINE LEARNING MODELS, 80% WILL BE SPENT ON DATA PREPARATION
THE GOALS TO ACHIEVE INSIGHT
What we need:
1. The freedom for data scientists to focus on what they do best;
2. The ability to prepare once, reuse & modify many times;
3. Data that reflects the current state of the organisation;
4. The agility to very quickly add more data, add interpretations;
5. The scalability to increase scope, volume, and velocity.
This requires a single platform designed to collect, merge and prepare data, with a holistic design that is optimised for integration.
Our aims
Up to 80% of effort spent is skewed towards collecting and preparing data.
SKEWED TOWARDS DATA
Data preparation as a separate function that supports analytics
SKILLSET
Collecting data for each analytics initiative separately does not scale
BESPOKE COLLECTION
Key goal: a re-focus of effort
60% Data Process Development
19% Sourcing Data Sets
13% Mining Data for Patterns
5% Designing a Solution
79%
The focus for data scientists should be on delivering value
SKEWED TOWARDS INSIGHT
Data collection and integration requires a significantly different skillset.
SKILLSET
Use pattern-based development and automate, wherever possible
STANDARDIZED COLLECTION
Key goal: a re-focus of effort
20% Data Process Development
19% Sourcing Data Sets
60% Mining Data for Patterns
5% Designing a Solution
25%
BUILDING BLOCKS OF SUCCESS
Towards Agility in Implementation
IM Agility Challenges
Stakeholders require greater flexibility, acknowledging that both their organization (its processes, its technologies, its goals) and the industry in which it operates change constantly.
Stakeholders look for greater transparency and a greater level of trust, both in the information they are being provided with and in the care with which this information is managed.
Stakeholders demand faster turn-around cycles, delivering value with greater regularity and at reduced cost.
Complex operational processes lead to complex data sets, which in turn lead to a data preparation challenge. It is tempting to design solutions that are optimized for each individual challenge. This is, however, a false economy.
Layered complexity
Consistency & Repeatability: Consistency of design leads to repeatability, which leads to reduced cost, faster construction and higher quality.
Atomicity of Design: Standardise design artefacts to a limited number of entity types. One component, one purpose.
Scalability is Key: A modular architecture of low complexity components ensures there is no built-in glass ceiling to data processing – in terms of Volume, Variety and Velocity.
Late Binding: Design for change. Separate data processing from data interpretation.
Modular Design
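The late binding principle (separating data processing from data interpretation) can be sketched in Python as follows. The sample records, drug codes and classification rules are hypothetical and only serve to illustrate the idea: raw data is stored as received, and cleansing and classification are applied at read time, so a changed rule never requires a reload.

```python
# Raw layer: records are stored exactly as received (hypothetical sample data).
RAW_RECORDS = [
    {"practice": "A", "drug_code": "AMOX", "qty": "2"},
    {"practice": "B", "drug_code": "amox ", "qty": "1"},
    {"practice": "B", "drug_code": "PARA", "qty": "3"},
]

# Interpretation layer: rules live outside the load process, so they can
# change without reloading the raw data. The drug classes are illustrative.
RULES_V1 = {"AMOX": "antibiotic", "PARA": "analgesic"}

def interpret(record: dict, rules: dict) -> dict:
    """Apply cleansing and classification at read time, leaving raw data intact."""
    code = record["drug_code"].strip().upper()
    return {
        "practice": record["practice"],
        "drug_class": rules.get(code, "unclassified"),
        "qty": int(record["qty"]),
    }

def total_by_class(records, rules):
    """Aggregate quantities per interpreted drug class."""
    totals = {}
    for rec in records:
        row = interpret(rec, rules)
        totals[row["drug_class"]] = totals.get(row["drug_class"], 0) + row["qty"]
    return totals

v1 = total_by_class(RAW_RECORDS, RULES_V1)
# A revised interpretation is bound late: same raw data, new rule set.
RULES_V2 = {**RULES_V1, "PARA": "non-opioid analgesic"}
v2 = total_by_class(RAW_RECORDS, RULES_V2)
```

Both result sets are derived from the same untouched `RAW_RECORDS`, which is what allows interpretations to be added or revised quickly.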
BESPOKE DEVELOPMENT
• Increased development effort = increased cost
• Reduced ability to respond to change
• Absence or inconsistent implementation of standard processing
• Data quality issues
DESIGN PATTERN APPROACH
• Build once – use many times
• Agree on a limited number of standardized design patterns
• Develop automated, repeatable packages that encapsulate this behaviour
• Replace individual data processes with parameterized ‘calls’ to these packages
“Nothing is particularly hard if you divide it into small jobs.”
Henry Ford
Pattern Based Development
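As a minimal sketch of the design pattern approach, assuming a single hypothetical load pattern and illustrative table and column names (this is not Acceler8's generated code), each data process can be reduced to a set of parameters passed to one shared template:

```python
from string import Template

# One agreed, standardized pattern (hypothetical): insert rows that are
# not yet present in the target, keyed on a business key.
LOAD_PATTERN = Template(
    "INSERT INTO $target ($columns)\n"
    "SELECT $columns FROM $source s\n"
    "WHERE NOT EXISTS (\n"
    "    SELECT 1 FROM $target t WHERE t.$key = s.$key\n"
    ");"
)

def render_load(source: str, target: str, key: str, columns: list[str]) -> str:
    """Render the shared pattern for one source/target pair."""
    return LOAD_PATTERN.substitute(
        source=source, target=target, key=key, columns=", ".join(columns)
    )

# Individual data processes become parameterized 'calls' (illustrative names).
PROCESSES = [
    {"source": "stg_patient", "target": "hub_patient", "key": "patient_id",
     "columns": ["patient_id", "load_date"]},
    {"source": "stg_practice", "target": "hub_practice", "key": "practice_id",
     "columns": ["practice_id", "load_date"]},
]

statements = [render_load(**p) for p in PROCESSES]

if __name__ == "__main__":
    for sql in statements:
        print(sql, end="\n\n")
```

A fix or improvement to `LOAD_PATTERN` then applies to every process that uses it, which is the "build once, use many times" benefit the slide describes.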
Metadata Driven QA
CONCLUSION: WHERE DOES THIS LEAVE US?
The ability to use advanced analytics to better understand one’s organisation adds a powerful tool to the organisation’s information arsenal. To leverage it as a strategic asset, however, will require a robust approach to data preparation.
Conclusions
MACHINE LEARNING
One-off PoC model or continuous use? Continuous
Scope for more beyond model #1? YES
More than one data source / integration required? YES
Complex data manipulation / cleansing? YES
YOUR ABILITY TO PREPARE DATA IN AN EFFICIENT & SUSTAINABLE MANNER WILL CONSIDERABLY IMPACT YOUR CHANCE OF SUCCESS.
ADVANCED ANALYTICS
To leverage Advanced Analytics the question is not if a data integration platform is required, but how.
Separate data integration from data science.
Standardisation and automation are key success factors.
Build with longevity in mind. Tight deadlines lead to tunnel vision on the immediate goal.
Knowledge, Experience, Insight, Success