Savanna
Hadoop on OpenStack
Ilya Elterman (Mirantis)
Matthew Farrellee (Red Hat)
Sergey Lukjanov (Mirantis)
Agenda
● Savanna Overview
● Current state
  ○ EDP overview
  ○ other features
● Roadmap
● Live Demo
Mission:
To provide the OpenStack community with an open, cutting edge, performant and scalable data processing stack and associated management interfaces
● provision and operate Hadoop clusters
● schedule and operate Hadoop jobs
OpenStack Data Processing - Savanna
Hadoop - Big Data Platform
Popularity
[Chart: Google Trends search interest, Hadoop vs. OpenStack]
http://www.google.com/trends/explore?q=hadoop+openstack#q=openstack%2C%20hadoop&cmpt=q
Use Cases
● Self-service provisioning of Hadoop clusters
● Utilization of unused compute capacity for bursty workloads
● Run Hadoop workloads in a few clicks without expertise in Hadoop ops
Architecture Overview
[Architecture diagram; labels recovered from the figure]
● Access: Horizon, Savanna Pages, Savanna Python Client, REST API
● Savanna: Auth, Cluster Configuration Manager, Resources Orchestration Manager, Job Manager, Data Access Layer, Vendors Plugins
● OpenStack services: Keystone, Heat, Nova, Glance, Cinder, Neutron, Swift, Trove DB
● Provisioned: Hadoop VM (×4)
● Inputs: Data Sources, Job Sources
Savanna Status
● Official incubated OpenStack project
● v0.3 released 17 Oct 2013
● Supported Hadoop distros:
  ○ Vanilla Apache Hadoop (reference implementation)
  ○ Hortonworks Data Platform 1.3.x
  ○ Intel Distribution (in review)
  ○ Cloudera Distribution (in blueprint)
● Included in OpenStack distros:
  ○ RDO - http://openstack.redhat.com
  ○ Mirantis OpenStack - http://software.mirantis.com
Cluster Provisioning Performance
EDP Overview
● End users have data and questions
  ○ The data lives in a data repository
  ○ The questions are embodied in code
● Savanna Elastic Data Processing (EDP) brings the Hadoop ecosystem to the end user
  ○ Hides all cluster management behind the scenes
EDP
“Customers launch millions of Amazon EMR clusters every year.”
http://aws.amazon.com/elasticmapreduce/
EDP
● Variety and depth of value-add offerings on top of clouds are growing
● Offerings are rarely open and rarely allow for choice
● Examples - Google Cloud, Azure, AWS
EDP
Savanna and EDP can both match and exceed the use cases provided by most public clouds
EDP in Savanna v0.3
● UI, integrated into Horizon, for ad-hoc analytics queries based on Hive or Pig
● API to execute MapReduce jobs without exposing details of underlying infrastructure
● Pluggable data sources: Swift
● Supported job types: Jar, Pig, Hive
● Integration with Oozie for workflow management
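Sketch: what a job request that hides the infrastructure might look like. This is a minimal illustration, not Savanna's REST schema. The names `JobType`, `DataSource`, `JobRequest` and `describe`, as well as the example URLs, are hypothetical. Only the three job types come from the slide above.

```python
from dataclasses import dataclass
from enum import Enum


class JobType(Enum):
    """Job types listed for EDP in Savanna v0.3."""
    JAR = "Jar"
    PIG = "Pig"
    HIVE = "Hive"


@dataclass(frozen=True)
class DataSource:
    """Hypothetical pluggable data source; the URL is an illustrative placeholder."""
    name: str
    url: str


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical job request: no hosts, ports or cluster internals appear here."""
    job_type: JobType
    main_artifact: str
    input_source: DataSource
    output_source: DataSource


def describe(request: JobRequest) -> str:
    """Summarize a request in one line, using only user-facing fields."""
    return (
        f"{request.job_type.value} job '{request.main_artifact}': "
        f"{request.input_source.url} -> {request.output_source.url}"
    )


if __name__ == "__main__":
    request = JobRequest(
        job_type=JobType.PIG,
        main_artifact="wordcount.pig",
        input_source=DataSource("in", "swift://demo/in"),
        output_source=DataSource("out", "swift://demo/out"),
    )
    print(describe(request))
```

The point of the sketch: the user names the code and the data, and nothing else. Cluster details stay behind the API.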
Cluster Ops in Savanna 0.3
● REST API
● Configuration templates
● Manual cluster scaling
● Data node anti-affinity and location control
● Full support of data locality: rack and 4-level awareness for HDFS and Swift
● Swift integration
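Sketch: the idea behind data node anti-affinity. This is a minimal illustration, assuming the goal is to keep data nodes of one cluster on different compute hosts so that a single host failure does not take out several HDFS replicas. The helpers `place_with_anti_affinity` and `violates_anti_affinity` and the host names are hypothetical; this is not Savanna's or Nova's scheduler.

```python
from collections import Counter
from itertools import cycle


def place_with_anti_affinity(instances, hosts):
    """Spread instances over hosts round-robin (hypothetical helper).

    With at least as many hosts as instances, every instance gets its own host.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return {instance: host for instance, host in zip(instances, cycle(hosts))}


def violates_anti_affinity(placement):
    """Return True if any host carries more than one instance."""
    per_host = Counter(placement.values())
    return any(count > 1 for count in per_host.values())


if __name__ == "__main__":
    data_nodes = ["dn-1", "dn-2", "dn-3"]
    placement = place_with_anti_affinity(
        data_nodes, ["compute-a", "compute-b", "compute-c"]
    )
    print(placement, violates_anti_affinity(placement))
```

With fewer hosts than data nodes the round-robin wraps around, and `violates_anti_affinity` reports the shared host. A real scheduler would have to decide whether to reject or tolerate that case.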
OpenStack Integration in Savanna 0.3
● OpenStack Dashboard plugin
● Both Neutron and Nova Network support
● Keystone trusts used for async operations
● Python client
Live Demo
Icehouse Roadmap
● Integration with OpenStack ecosystem
  ○ Heat
  ○ Tempest
  ○ Devstack
  ○ Ceilometer
  ○ Ironic
● EDP enhancements
● Code hardening
● Polished API v2
● Performance testing
Design Summit Sessions
Friday, November 8
● 1:30pm Network and installation topologies
● 2:20pm Heat integration and scalability
● 3:10pm Further OpenStack integration
● 4:10pm Savanna in Icehouse
http://goo.gl/2iEv8u
Q&A