20150425 experimenting with openstack sahara on docker

Post on 14-Aug-2015

  1. Big Data Technologies: Experimenting with OpenStack Sahara on Docker. Weiting Chen (weiting.chen@intel.com)
  2. BIG DATA TECHNOLOGY. Legal Disclaimers: No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. © 2015 Intel Corporation.
  3. Agenda: Background; How to Use Docker with Sahara; Performance Testing; Conclusion.
  4. Who We Are: We are from the Intel Big Data Technology Group. We push big data technology forward into OpenStack. We contribute source code to Sahara in OpenStack and brought the Cloudera CDH 5.3 plugin to Kilo.
  5. Sahara Background: Sahara became a core project in Juno. It brings Hadoop into OpenStack, and the Kilo release adds more features. Two key features: (1) easy provisioning of Hadoop clusters by specifying a few parameters; (2) Analytics as a Service for data scientists and analysts.
  6. Sahara Key Features - Provision Cluster: Create/Terminate Cluster; Heat API/Nova Direct API; Integrate with Neutron/Nova Network; Use Guide as a template; Anti-affinity. Cluster Scaling: Add Node/Remove Node. Support for more plugins in Kilo: Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR/Storm.
  7. Sahara Key Features - Elastic Data Processing: Supported job types: Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase. Supported data locality: Rack/Hypervisor/Swift. Data sources: Internal: internal HDFS (Ephemeral Disk/Cinder); External: Swift/HDFS. Jobs can run in a transient cluster. *Different plugins provide different capabilities.
  8. Sahara Working Flow: Fast Cluster Provisioning: Select Hadoop Version -> Select Base Image w/ Hadoop -> Define Cluster Configuration -> Provision Cluster -> Operate Cluster -> Terminate Cluster. Analytics as a Service using Elastic Data Processing: Select Hadoop Version -> Configure Jobs -> Set Limit for Cluster -> Execute Jobs -> Get the Result. Notes for Elastic Data Processing: choose the type of job (pig, hive, jar-file, etc.); select the input and output data locations (Swift is supported); the cluster is removed automatically after the job completes. Notes for cluster provisioning: provide the detailed Hadoop configuration, such as size, topology, and others; Sahara provisions the VMs, then installs and configures Hadoop; the cluster can be scaled by adding or removing nodes.
  9. Sahara Data Processing: Pattern 1: Internal - HDFS only. OpenStack supports creating HDFS on Cinder or Ephemeral Disk. Ephemeral Disk gives better data processing performance; Cinder persists the data, with lower performance. Pattern 2: External - Swift. OpenStack uses Swift as a data source to store input and output data. The benefit is that data is processed directly and persisted via Swift. [Diagram labels: Collector Agent; Collecting Data; Data Stream; OpenStack Virtual Clusters; MapReduce; HDFS; Cinder; Ephemeral Disk; Swift]
  10. Docker Background: An open source project; the latest version at the time of this talk is v1.6. Automates the deployment of applications inside software containers. Provides fast deployment and application portability. Uses the libcontainer library to access the virtualization facilities of the Linux kernel. Resource isolation uses cgroups, kernel namespaces, etc.
  11. Sahara + Docker: Deliver better performance (compared with hypervisors); Optimize resource utilization; Reduce cost; Fast deployment.
  12. Sahara Architecture: [Diagram labels: Horizon (Sahara Pages); Python Sahara Client; Sahara REST API; Keystone Auth; DAL; Image Registry; Provisioning Engine; Vendor Plugins; EDP; Hadoop VM (x4); Nova | Heat | Cinder; Glance; Neutron]
  13. Sahara + Docker Architecture: [Diagram labels: Horizon (Sahara Pages); Python Sahara Client; Sahara REST API; Keystone Auth; DAL; Image Registry; Provisioning Engine; Vendor Plugins; EDP; Hadoop VM (x4); Nova | Heat | Cinder; Glance; nova docker driver; Docker Registry; Docker Image; Docker; Neutron]
  14. Sahara CDH Plugin: Step 1: Create the VMs via Heat by using the Cluster Template. Cloudera Manager (CM) must be included on one master machine. Step 2: Use the CM API client to connect to CM and provision the other services in the cluster. [Diagram labels: End Customer; Horizon (OpenStack Dashboard); Controller; Sahara Service; CDH Plugin; Cloudera Manager API Python Client (Migrate from CM-API Client); Computing Node1; VM1 - Master; VM2 - Slave; Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7); Job History; Resource Manager; Oozie Server; Name Node; Secondary Name Node; Data Node; Node Manager; CDH Cluster; STEP1; STEP2; STEP3]
  15. Nova Docker Driver: Introduced with Havana, then moved out of the Nova tree for Icehouse and Juno. For Juno: an older version of nova-docker must be installed:
        # git checkout -b pre-i18n 9045ca43b645e72751099491bf5f4f9e4bddbb91
      This version implements a RESTful client via httplib to communicate with Docker. For Kilo (upstream): docker-py must be installed; the driver uses the Docker API client to communicate with Docker.
  16. Authentication & Hostname Issues: Use a username and password instead of injecting an authorized key into the instance, because the Docker image has no cloud-init to inject the key. Upgrade Docker to support changing the hostname: Docker v1.2 or later supports it. Change "sudo mv etc-host /etc/hosts" to "sudo cp etc-host /etc/hosts": Docker v1.3 reports that the device is busy when mv is used, while cp completes the change successfully.
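The mv-versus-cp difference on the slide above can be illustrated with a minimal Python sketch. The explanation it encodes is an assumption, not something stated in the talk: Docker typically bind-mounts /etc/hosts into the container, so renaming a new file over it (what mv does) is refused, while overwriting its contents in place (what cp does) works. The function names are hypothetical.

```python
import os
import shutil


def replace_file_in_place(new_file, target="/etc/hosts"):
    """Copy new_file's bytes into target, like `cp new_file target`.

    shutil.copyfile opens the existing target for writing, so the target
    keeps its inode and any bind mount on it stays intact.
    """
    shutil.copyfile(new_file, target)


def replace_file_by_rename(new_file, target="/etc/hosts"):
    """Swap target's directory entry for new_file, like `mv new_file target`.

    Assumption: on a bind-mounted target this is the call that fails with
    EBUSY ("device or resource busy"), matching the error on the slide.
    """
    os.rename(new_file, target)
```

Writing to the real /etc/hosts needs root, as the sudo in the original commands implies; on ordinary files both functions succeed, and only the in-place version leaves the source file behind.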
  17. Network Port Issue: Enable privileged mode to expose all the ports in the container: modify the nova docker driver source code to add privileged=True and publish all ports.
  18. Docker Image: Build the Docker image with a Dockerfile: refer to sahara-image-elements to build a CDH5 Docker image. Building the image can take a long time (trial and error), so use the Dockerfile build cache to shorten rebuilds. Copy the Docker image to every compute node manually: Glance currently cannot copy the image to the compute nodes. If the image is not listed by "docker images" on a node, Nova raises an error when starting an instance there.
  19. Build Docker Image - using Dockerfile: Use docker build to build the image from a Dockerfile:
        # docker build -t $image_name:$tag .
      Dockerfile example:
        FROM centos:centos6
        MAINTAINER Weiting Chen weiting.chen@intel.com
        # Add ENV variables at the beginning
        ENV http_proxy http://xxx:1080
        # 1. Add the proxy setting to each individual software configuration
        RUN echo 'proxy=http://xxx:1080' >> /etc/yum.conf
        # 2. Install the required software
        RUN yum install -y cloudera-manager-agent
        # Expose the required service port
        EXPOSE 21
  20. Register & Copy Docker Image to Compute Nodes: Register the Docker image in Glance:
        # docker save cdh5:20150425 | glance image-create --is-public=True --container-format=docker --disk-format=raw --name cdh5:20150425
      Copy the image tar file (for example, one written with docker save -o) to all compute nodes. The ./ prefix keeps scp from reading the colon in the file name as a host separator:
        # scp ./cdh5:20150425.tar $compute_node:./
      Load the image into the local Docker image store on each compute node:
        # docker load -i cdh5:20150425.tar
      If the image is not available on a compute node, Nova raises an error.
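Because the copy has to be repeated for every compute node, the steps above can be sketched as a small Python helper that only builds the command lines. This is an illustration under assumptions: the function name, the node names, the colon-free tar file name, and running docker load over ssh are not from the talk.

```python
import os
import shlex


def image_distribution_commands(image, tar_path, compute_nodes):
    """Return the argv lists needed to ship a Docker image to each node.

    Nothing is executed here; the caller decides whether to print the
    commands or hand them to subprocess.run.
    """
    remote_name = os.path.basename(tar_path)
    # Save the image once on the build host.
    commands = [["docker", "save", "-o", tar_path, image]]
    for node in compute_nodes:
        # Copy the tar file, then load it into that node's image store.
        commands.append(["scp", tar_path, f"{node}:{remote_name}"])
        commands.append(["ssh", node, "docker", "load", "-i", remote_name])
    return commands


def render(commands):
    """Format argv lists as shell-safe strings for review before running."""
    return [" ".join(shlex.quote(arg) for arg in argv) for argv in commands]
```

For instance, `render(image_distribution_commands("cdh5:20150425", "./cdh5-20150425.tar", ["compute1", "compute2"]))` yields one save command followed by an scp and a load command per node, which can be reviewed before being executed.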
  21. Nova Docker Driver Network: Set the Docker network mode to none: the nova docker driver leverages the existing network configuration from Neutron. Linux Bridge or OVS is supported; docker0 is NOT used. Our experiment uses VXLAN, and a bridge to OVS is created automatically. Privileged mode is set to True for convenience; without privileged mode, port mappings must be set during docker run.
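The two options contrasted on the slide above (privileged mode versus explicit port mappings) can be sketched with the plain docker CLI. This is not the nova docker driver's code; the function name and the example ports are hypothetical, and only the standard `docker run` flags `--privileged`, `-p`, and `--net` are used.

```python
def docker_run_args(image, privileged=True, port_map=None, network=None):
    """Build a `docker run` argv for a privileged or port-mapped container.

    port_map maps host ports to container ports and is only applied when
    privileged is False, mirroring the either/or choice on the slide.
    """
    args = ["docker", "run", "-d"]
    if network is not None:
        # e.g. network="none" leaves all network setup to an external driver.
        args.append(f"--net={network}")
    if privileged:
        args.append("--privileged")
    else:
        for host_port, container_port in sorted((port_map or {}).items()):
            args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args
```

A call such as `docker_run_args("cdh5:20150425", privileged=False, port_map={21: 21})` lists each published port explicitly, which shows why the privileged setting was described as the more convenient choice for a Hadoop image with many service ports.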
  22. Docker Network: Bridge Mode: the default mode; supports multiple namespaces. [Diagram: Host1 running Docker with Container1, Container2, Container3 on eth0 172.17.42.10, eth0 172.17.42.11, eth0 172.17.42.12, attached to docker0 172.17.42.1] Host Mode: only one namespace. [Diagram: Host1 running Docker with Container1, Container2, Container3; eth0 192.168.0.1, eth1 10.10.10.1; docker0 172.17.42.1] None Mode: the nova docker driver uses this mode; the driver configures the network and connects the container to the bridge. [Diagram: Host1 running Docker with Container1, Container2, Container3; docker0 172.17.42.1]
  23. Docker Network Performance: Background: OpenStack Juno using VXLAN; Docker v1.3; 1Gb Ethernet. Scenarios measured: Host to Host; Container to a different Host; Container to Container on different Hosts; Container to the same Host; Container to Container on the same Host. [Figure: throughput diagrams between Host1, Host2 and containers C1, C2; values in extraction order: 941 Mb, 941 Mb, 900 Mb, 941 Mb, 14 Gb; path labels: phy. network, br-ex (floating ip), br-tun, qbr~; annotation: 14Gb w/ DVR]
  24. Neutron VXLAN without DVR: [Diagram labels: Controller/Network Node; Compute Node; br-tun; patch-tun; br-int; br-ex; eth1; eth2; patch-int; tap~; qvo~; qvb~; qbr~; VM vm0; qdhcp ns~; qrouter~; qr~; qg~; snat-; sg~; networks 172.16.0.0/16, 192.168.0.0/16, 10.0.0.0/16]
  25. Neutron VXLAN with DVR Controller/Network Node Compute Node br-tun patch-tun br-int br-ex eth1 br-tun patch-tun br-int tap qvo~ 172.16.0.0/16 192.16