HBaseCon 2012 | Real-Time and Batch HBase for Healthcare at Explorys

Mixing Real Time and Batch with HBase, HBaseCon 2012. Doug Meil, Casey Stella, Dan Washburn

Upload: cloudera-inc

Post on 18-Jun-2015


Category: Technology



DESCRIPTION

Explorys leverages HBase and the Hadoop stack to power the next generation of Enterprise Performance Management for Healthcare. The Explorys team will present an overview in 3 parts: Explorys functional and technical overview, approaches in MapReduce performance tuning, and system operations for HBase and Hadoop.

TRANSCRIPT

1. Mixing Real Time and Batch with HBase
   HBaseCon 2012
   Doug Meil, Casey Stella, Dan Washburn

2. Explorys Technical Overview
   Doug Meil, Chief Software Architect, HBase Committer

3. Healthcare organizations that leverage BIG DATA and take action on it will survive and thrive.

4. Healthcare's Data Overload
   The volume of data plus the variety of systems and sources of data is piling up at a velocity that traditional data approaches were not designed to support.

5. Explorys Provides...
   • A platform to leverage data across systems, venues, and partners to drive care quality, cost efficiency, and risk mitigation.
   • Rapidly deployable Software-as-a-Service apps for leadership and providers.
   • Extensible Data-as-a-Service functions to support healthcare IT and business intelligence.
   [Graphic label: BIG DATA]

6. Explorys Customers and Patient Span By ZIP Code
   80 hospitals, hundreds of ambulatory practices, and thousands of providers caring for 14 million patients.

7. (no text on slide)

8. 44 billion curated clinical, operational, and financial data points: 44,000,131,117 and counting.

9. What Explorys Does: Platform and Apps
   The Applications
   • Explore: High-speed search and population exploration.
   • Measure: Provider & group level performance metrics and benchmarks.
   • Registry: Automated care and disease management registries.
   • Engage: Rule-based patient & provider workflow and outreach.
   DataGrid

10. What Explorys Does: Platform and Apps (video demo)
    DataGrid

11. HBase and MR at Explorys
    Casey Stella, Senior Software Engineer

12. Map Reduce Strategies
    HBase at Explorys
    • HBase is our transactional data store
    • Keys group data from a given patient together
    MR jobs process data from HBase
    • Transform data and report data
    • Sample data
    • Emit data into a form which can be accessed efficiently from applications
    Naive MR jobs cause much, much stress

13.
Local Aggregation
    • Locally aggregate processing of a patient in an individual mapper
      • Fewer keys and chunkier values
      • Sorting is cheaper
    • Careful
      • Patient data can span tasks
      • Potential scalability issues
    • "Data-Intensive Text Processing with MapReduce" by Jimmy Lin and Chris Dyer covers this technique very well
    Map Task 1: Patient 1 : Encounter; Patient 1 : Observation; Patient 1 : Observation; Patient 1 : Diagnosis
    Map Task 2: Patient 1 : Drug; Patient 2 : Encounter; Patient 2 : Observation; Patient 2 : Observation

14. Map Reduce and Junior Engineers
    Map Reduce is Distributed Computing for the masses
    • Masses still do stupid things
    • Masses still have to write MR jobs to do their job
    Safety at Explorys
    • Most of our engineers start without prior experience in Hadoop or HBase
    • Giving them a book only goes so far
    • Need a combination of process and technology
    • Still an uphill battle

15. Map Reduce and Junior Engineers
    Process
    • Jobs are tested in a development grid with real data
    • Most map reduce jobs are pushed into teams where MR and HBase education are very important
    Technology
    • Constructed an API wrapping the Hadoop mapreduce package
      • Alternate job builder interface with added type-safety
      • Adds the ability to swap out different contexts at launch time

16. Building a Solid Foundation
    Daniel Washburn, Systems Engineer

17. Key Components
    • Performance Management
    • Release Management
    • Configuration Management
    • Teamwork

18. Performance Management
    • Collect as much as you can
      • Ganglia, OpenTSDB
      • Nagios, Zenoss
    • Understand what you're monitoring
      • If you don't know what a metric means, look it up!
      • Work with customers to understand what's important to them
    • Act on it
      • State-based alerting is where many people stop
      • Data-driven, predictive approach is the goal
      • Create dashboards

19. Configuration Management
    • Consistency is essential
      • Do this while you're still small!
    • Choose a methodology
      • Parallel execution/distribution
      • Configuration management engine
    • Implement it
      • Parallel-ssh, mcollective
      • Puppet

20. Release Management
    • Upgrade early and often
      • Become comfortable with the process
      • The logistics of upgrading can be tough, but it's worth it
    • Get involved with the community
      • HBase is constantly evolving
      • The mailing lists and IRC channel are very active
      • Your contribution might help someone else

21. Teamwork
    • It takes a village to raise an HBase
    • Effective communication is essential
    • We're all part of the effort
      • Administrators
      • Engineers
      • Developers
      • End users

22. Thank You! Questions?
    Doug Meil, Chief Software Architect, [email protected]
    Casey Stella, Senior Software Engineer, [email protected]
    Daniel Washburn, Systems Engineer, [email protected]
    www.explorys.com
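
Note on slide 13 (Local Aggregation): the idea can be illustrated with a small simulation. This is only a sketch in plain Python, not Explorys code and not the Hadoop API. The function names (`naive_map`, `aggregating_map`, `shuffle_and_reduce`) and the patient identifiers are hypothetical; the input rows mirror the two map tasks shown on the slide.

```python
from collections import defaultdict

# Hypothetical input splits mirroring the slide: each map task sees a run of
# (patient_id, record_type) rows, and patient1 spans both tasks.
MAP_TASK_1 = [
    ("patient1", "Encounter"),
    ("patient1", "Observation"),
    ("patient1", "Observation"),
    ("patient1", "Diagnosis"),
]
MAP_TASK_2 = [
    ("patient1", "Drug"),
    ("patient2", "Encounter"),
    ("patient2", "Observation"),
    ("patient2", "Observation"),
]


def naive_map(rows):
    """Emit one (key, value) pair per input row."""
    return [(patient, [record]) for patient, record in rows]


def aggregating_map(rows):
    """Buffer rows per patient inside the task, then emit once per patient."""
    buffer = defaultdict(list)
    for patient, record in rows:
        buffer[patient].append(record)
    # The buffer lives in mapper memory, which is where the slide's
    # "potential scalability issues" come from.
    return list(buffer.items())


def shuffle_and_reduce(emitted):
    """Group pairs by key and merge values, since a patient can span tasks."""
    merged = defaultdict(list)
    for patient, records in emitted:
        merged[patient].extend(records)
    return dict(merged)


naive_pairs = naive_map(MAP_TASK_1) + naive_map(MAP_TASK_2)
local_pairs = aggregating_map(MAP_TASK_1) + aggregating_map(MAP_TASK_2)

naive_result = shuffle_and_reduce(naive_pairs)
local_result = shuffle_and_reduce(local_pairs)

print(len(naive_pairs), "pairs emitted without local aggregation")
print(len(local_pairs), "pairs emitted with local aggregation")
```

Both variants produce the same per-patient result, but the aggregating mapper hands fewer, larger pairs to the shuffle. The reduce step still has to merge, because patient1 appears in both tasks.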