© 2016 Health Catalyst. Proprietary and Confidential
Exploring How to Use Hadoop in your Healthcare Big Data Strategy
Sean Stohl
Senior Vice President, Product Development
Health Catalyst
Poll Question #1
What brought you to this webinar? (115 respondents)
1. Everyone is talking about Big Data/Hadoop – What is it? – 31%
2. Searching for use cases – What is the value proposition? – 42%
3. Need help implementing it – 7%
4. Want to hear others’ experiences – 16%
5. I am bored so why not try this webinar – 4%
Learning Objectives
Be able to explain
• What is Big Data and Hadoop
• Why do we need Big Data and Hadoop in Healthcare
• What are the challenges to adoption
• How do I get started
• See it in action
Scaling Up Limits
What does it take to reach the Big Data threshold?
3 V’s of Big Data
We Are Not “Big Data” in Healthcare Yet
Volume, Velocity, and Variety aren’t the only reasons to move
Dear Data…
History of Hadoop
• Created by Doug Cutting and Mike Cafarella in 2005 as part of the Apache Nutch project; Cutting joined Yahoo in 2006, where development continued.
• Hadoop is named after Cutting’s son’s toy elephant.
• “The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid’s term.” - Doug Cutting
• Open-source software framework that supports storing and processing large data sets distributed across clusters of commodity hardware.
• HDFS – Hadoop Distributed File System. A file system that distributes data across a cluster so that MapReduce can process it in parallel.
• MapReduce – The map step parcels out work to the nodes within the cluster; the reduce step organizes and combines the results from each node into a cohesive answer to a query.
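The map and reduce steps described above can be illustrated with a minimal single-process sketch. This is not Hadoop code: it only imitates the pattern in plain Python. The diagnosis records, the three "node" chunks, and the function names `map_phase`, `shuffle`, and `reduce_phase` are all hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records, split into chunks as if each chunk lived on a different node.
chunks = [
    ["diabetes", "asthma", "diabetes"],
    ["asthma", "hypertension"],
    ["diabetes", "hypertension", "hypertension"],
]

def map_phase(chunk):
    """Map: emit a (key, 1) pair for every record in one node's chunk."""
    return [(diagnosis, 1) for diagnosis in chunk]

def shuffle(mapped_pairs):
    """Shuffle: group the emitted values by key across all nodes."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

# In a real cluster each chunk would be mapped in parallel on its own node.
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster the framework handles the distribution, the grouping of keys, and the recovery from node failures; the sketch only shows how a query ("how many encounters per diagnosis?") decomposes into independent map work and a final combining step.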
Poll Question #2
How would you categorize your organization’s involvement with Hadoop? (126 respondents)
1) Piloting Hadoop in the Cloud or Plan to – 9%
2) Piloting Hadoop on Premise or Plan to – 18%
3) Heavily using Hadoop in the Cloud – 1%
4) Heavily using Hadoop on Premise – 5%
5) Unsure or not applicable – 68%
Why Big Data and Hadoop in Healthcare
• Data Growth
• Different Types of Workload
  • Semi-Structured
  • Archiving
  • Streaming
  • Machine Learning
Just Beginning: Digitization of Health
“EMR data represents ~8% of the data we need for population health and precision medicine.” — Alberta Secondary Use Data Project
The Growing Ecosystem of Human Health Data
• Healthcare Encounter Data
• 7x24 Biometric Data
• Consumer Data
• Genomic & Familial Data
• Social Data
• Outcomes Data
Types of Data
• Structured
  • Data that can be stored relationally in an RDBMS
• Semi-Structured
  • Data that has some organizational properties but isn’t in a relational database format
  • CSV, XML, X12 (835/837), HL7, JSON
  • Doctor Notes – Template-Generated Sections
• Unstructured
  • E-mails, text messages, Word documents, videos, and pictures
  • Doctor Notes – Free-Form Sections
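To make "some organizational properties" concrete, here is a small sketch that reads two semi-structured records: a pipe-delimited segment in the style of an HL7 v2 PID segment, and a JSON document. Both sample records are fabricated for illustration, and `parse_pid` is a hypothetical helper. A production system would more likely use a dedicated HL7 parser than string splitting.

```python
import json

# Fabricated sample in the style of an HL7 v2 PID segment:
# fields are separated by "|", components within a field by "^".
hl7_segment = "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F"

# Fabricated JSON record mixing nested values with a free-text note.
json_record = (
    '{"patient_id": "12345", "vitals": {"heart_rate": 72}, '
    '"notes": "Patient reports mild fatigue."}'
)

def parse_pid(segment):
    """Pull a few fields out of a PID-style segment by position."""
    fields = segment.split("|")
    family, given = fields[5].split("^")[:2]     # PID-5: patient name
    return {
        "patient_id": fields[3].split("^")[0],   # PID-3: first component of the identifier
        "family_name": family,
        "given_name": given,
        "birth_date": fields[7],                 # PID-7: date of birth (YYYYMMDD)
        "sex": fields[8],                        # PID-8: administrative sex
    }

patient = parse_pid(hl7_segment)
record = json.loads(json_record)

print(patient)
print(record["vitals"]["heart_rate"])
```

Neither record fits a fixed relational schema as-is, yet both carry enough structure (delimiters, keys, nesting) to be parsed without natural-language processing. The free-text `notes` value inside the JSON record is the unstructured part.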
Archiving
Streaming
Implementation
Challenges to Adoption and How to Overcome Them
Poll Question #3
Which challenge has been or would be the greatest barrier for your organization to adopt Hadoop? (137 respondents)
1. People with the right skill sets – 33%
2. Funding hardware costs – 8%
3. Defining the business value – 37%
4. Security concerns – 6%
5. Unsure or not applicable – 16%
Challenges to adoption
• Organizational
• Buying
• Administering
• Using
Organizational
Stuck in the Mud
Buying
Cloud
Administering
• Fewer experienced people
• Lack of best practices
• Myriad of tools
• Open Source yes – but lots of assembly required
• Security?
Administering
Packaged Solutions
Administering
Invest in your people
Using
• Which SQL on Hadoop?
  • Hive
  • Impala
  • Spark SQL
  • Apache Drill
Tools continue to Evolve
http://www.infoworld.com/article/3131058/analytics/big-data-face-off-spark-vs-impala-vs-hive-vs-presto.html
Don’t Rip and Replace
Meeting in the middle
RDBMS Vendors
• Oracle
• SQL Server
• Teradata
• …
Hadoop Solutions
• Hortonworks
• Cloudera
• MapR
• Cloud
• …
Convergence
Additive Approach
Data Operating System
Demos
Lessons Learned
1. Let use cases drive the need to implement Hadoop. (Be pragmatic.)
2. Think additive.
3. Invest in people now.
4. In general, the Cloud will give you the most flexibility in deploying Hadoop.
Thank You