Hadoop Everywhere & Cloudbreak
Hadoop Everywhere
Hortonworks. We do Hadoop.
$ whoami
Sean Roberts
Partner Solutions Engineer, London, EMEA & everywhere
@seano | linkedin.com/in/seanorama
MacGyver. Data Freak. Cook. Autodidact. Volunteer. Ancestral Health. Fito. Couchsurfer. Nomad.
- HDP 2.3: http://hortonworks.com/
- Hadoop Summit recordings:
  - http://2015.hadoopsummit.org/san-jose/
  - http://2015.hadoopsummit.org/brussels/
- Past & future workshops: http://hortonworks.com/partners/learn/
What’s New!
Agenda
● Hadoop Everywhere
● Deployment challenges & requirements
● Cloudbreak & our Docker approach
● Workshop: your own Cloudbreak
  ○ And auto-scaling with Periscope
● Cloud best practices

Reminder:
● Attendee phone lines are muted
● Please ask questions in the chat
Page 5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DisclaimerThis document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
Hadoop Everywhere
Hadoop Everywhere
● Any application: batch, interactive, and real-time
● Any data: existing and new datasets
● Anywhere: complete range of deployment options (commodity, appliance, cloud)

[Diagram: YARN as the data operating system; existing applications, new analytics, and partner applications run via batch, interactive, and real-time data access.]
Hybrid Deployment Choice
Windows, Linux, on-premises or cloud; data "gravity" guides the choice.

Compatible Clusters
Run applications and data processing workloads wherever and whenever needed.

Replicated Datasets
Democratize Hadoop data access via automated sharing of datasets using Apache Falcon.

Hadoop up there, down here... everywhere! (Dev/Test, BI/ML, IoT apps, on-premises)
Use Cases: Up There or Down Here?

| Use case | Where? |
| --- | --- |
| Active Archive / Compliance Reporting | Sensitive data = "down here"; "up there" valid for many scenarios |
| ETL / Data Warehouse Optimization | Usually has "down here" gravity; DW in the cloud is changing that |
| Smart Meter Analysis | Data typically flows "up there" |
| Single View of Customer | May have "down here" gravity, unless you're using SaaS apps |
| Supply Chain Optimization | May have heavy "down here" gravity |
| New Data for Product Management | "Up there" could be considered for many scenarios |
| Vehicle Data for Transportation/Logistics | Why not "up there"? |
| Vehicle Data for Insurance | May have "down here" gravity (e.g. join with existing risk data) |
Deployment Challenges & Requirements
Deployment challenges
● Infrastructure is different everywhere
  ○ e.g. each cloud provider has its own API
  ○ e.g. each provider has different networking methods
● OS/images are different everywhere
● How to do service discovery?
● How to dynamically scale/manage?
See prior operations workshops
Deployment requirements
- Infrastructure
- Operating system
- Environment prepared (see docs)
- Ambari agent/server installed & registered
- Deploy HDP cluster
  - Ambari Blueprints or Cluster Wizard
- Ongoing configuration/management
Options for Automation
- Many combinations of tools
  - e.g. Foreman, Ansible, Chef, Puppet, docker-ambari, shell scripts, CloudFormation, …
- Provider specific
  - Cisco UCS, Teradata, HP, Google's bdutil, …
- Docker with Cloudbreak
Using Ambari with all of the above!
https://github.com/seanorama/ambari-bootstrap/
Demo: Basic script-based example
https://github.com/seanorama/ambari-bootstrap
Requirements:
● Infrastructure prepped (see HDP docs)
● Nodes with Red Hat EL or CentOS 6
● HDFS paths mounted (see HDP docs)
● sudo or root access
ambari-bootstrap
After Ambari deployment
● (optional) Configure local YUM/APT repos
● Deploy HDP with the Ambari Wizard or a Blueprint
● Ongoing configuration/management
Using Ansible
https://github.com/rackerlabs/ansible-hadoop
Build once. Deploy anywhere.
Docker
Docker is a "Shipping Container" System for Code
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container.

[Diagram: a matrix of the multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) against the multiplicity of hardware environments (development VM, QA server, public cloud, contributor's laptop, production cluster, customer data center).]
Docker
• Container-based virtualization
• Lightweight and portable
• Build once, run anywhere
• Ease of packaging applications
• Automated and scripted
• Isolated
Why Is Docker So Exciting?

For developers: build once… run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages, etc.
• Run each app in its own isolated container
• Automate testing, integration, packaging
• Reduce/eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy services

For DevOps: configure once… run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC stages
• Support segregation of duties
• Significantly improve the speed and reliability of CI/CD
• Significantly lighter weight than VMs
More Technical Explanation

Why:
• Run on any Linux
  • Regardless of kernel version (2.6.32+)
  • Regardless of host distro
  • Physical or virtual, cloud or not
  • Container and host architecture must match
• Run anything
  • If it can run on the host, it can run in the container
  • i.e. if it can run on a Linux kernel, it can run

What:
• High level: it's a lightweight VM
  • Own process space
  • Own network interface
  • Can run stuff as root
• Low level: it's chroot on steroids
  • Container = isolated processes
  • Shares the kernel with the host
  • No device emulation (neither HVM nor PV)
Docker: How it works

[Diagram: with a Type 2 hypervisor, each app (App A, App A', App B) needs its own guest OS and bins/libs on top of the host OS and server. With Docker, App A and App B run as containers sharing the host OS kernel and, where possible, bins/libraries, while staying isolated from each other.]

…the result is significantly faster deployment, much less overhead, easier migration, and faster restarts.
Cloudbreak
A tool for provisioning and managing Hadoop clusters in the cloud
Cloudbreak
• Developed by SequenceIQ
• Open source under the Apache 2.0 license [Apache project soon]
• A cloud- and infrastructure-agnostic, cost-effective Hadoop-as-a-Service platform API
• Elastic: spin up any number of nodes, add/remove on the fly
• Provides full cloud lifecycle management post-deployment
Key Features of Cloudbreak

Elastic
• Provision clusters of an arbitrary number of nodes
• Commission/decommission nodes from a cluster
• Policy- and time-based scaling of clusters

Flexible
• Declarative, flexible Hadoop cluster creation using blueprints
• Provision to multiple public cloud providers or OpenStack-based private clouds using the same common API
• Access all of this functionality through a rich UI, secured REST API, or automatable shell

Enterprise-ready
• Supports basic, token-based, and OAuth2 authentication models
• Clusters are provisioned in a logically isolated network
• Tracks usage and cluster metrics
Launch HDP on Any Cloud for Any Application
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!

Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
Cloudbreak Approach
• Use Ambari for the heavy lifting
  • Provisioning of Hadoop services
  • Monitoring
• Use Ambari Blueprints
  • Assign host groups to physical instance types
• Public/private cloud provider APIs abstracted
  • Azure / Google / Amazon / OpenStack
• Run the Ambari agent/server in Docker containers
  • Networking: docker run --net=host
  • Service discovery: Consul (previously Serf)
Workshop: Your own Cloudbreak
Workshop: Your Own Cloudbreak

cloudbreak-deployer
● https://github.com/sequenceiq/cloudbreak-deployer

Requirements:
● A Docker host (laptop, server or cloud infrastructure)
● Resources: very little; tested with 2 GB of RAM
Requirement: a Docker host
● OS X or Windows: http://boot2docker.io/
  ○ boot2docker init
  ○ boot2docker up
  ○ eval "$(boot2docker shellinit)"
  ○ boot2docker ssh
● Linux: install the Docker daemon
● Anywhere: docker-machine "lets you create Docker hosts on your computer, on cloud providers, and inside your own data center"
  ○ Example on Rackspace:
    ■ docker-machine create --driver rackspace \
        --rackspace-api-key $OS_PASSWORD \
        --rackspace-username $OS_USERNAME \
        --rackspace-region DFW docker-rax
    ■ docker-machine ssh docker-rax
Install cloudbreak-deployer
https://github.com/sequenceiq/cloudbreak-deployer

● curl https://raw.githubusercontent.com/sequenceiq/cloudbreak-deployer/master/install | sh && cbd --version
● cbd init
● cbd start

You'll then have your own Cloudbreak & Periscope server with an API and web UI.
Done: Your own Cloudbreak
Deploy a cluster with your Cloudbreak
1. Add Credentials
Documentation: http://sequenceiq.com/cloudbreak/#cloudbreak-credentials
2. Create Cluster
3. Use your Cluster
Ambari is available as expected.

To reach your Hadoop hosts:
● SSH to the Docker host
  ○ Hosts are listed in "Cloud stack description"
  ○ ssh cloudbreak@IPofHost
● Shell into the "ambari-agent" container
  ○ sudo docker ps | grep ambari-agent (note the CONTAINER ID)
  ○ sudo docker exec -it CONTAINERID bash
● Use the hosts as usual, e.g.:
  ○ hadoop fs -ls /
Cloudbreak internals
Cloudbreak Internals

[Diagram: the browser and the Cloudbreak shell reach the Cloudbreak REST API through OAuth2 (UAA, backed by uaa-db on PostgreSQL). Uluwatu (the Cloudbreak UI) and Sultans (the user management UI) sit in front. Cloudbreak (cb-db on PostgreSQL) and Periscope (autoscaling, ps-db on PostgreSQL) run alongside consul, registrator and ambassador, all on Docker.]
Docker
Swarm
• Native clustering for Docker
• Distributed container orchestration
• Same API as Docker
Swarm: How it works
• Swarm managers/agents
• Discovery services
• Advanced scheduling
Consul
• Service discovery/registry
• Health checking
• Key/value store
• DNS
• Multi-datacenter aware
Consul: How it works
• Consul servers/agents
• Consistency through a quorum (Raft)
• Scalability via a gossip-based protocol (SWIM)
• Decentralized and fault tolerant
• Highly available
• Consistency over availability (CP)
• Multiple interfaces: HTTP and DNS
• Support for watches
Apache Ambari
• Easy Hadoop cluster provisioning
• Management and monitoring
• Key feature: Blueprints
• REST API, CLI shell
• Extensible: stacks, services, views
Apache Ambari: How it works
• Ambari server/agents
• Define a blueprint (blueprint.json)
• Define a host mapping (hostmapping.json)
• POST the cluster create
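The steps above can be sketched end to end. This is a minimal illustration, not a production blueprint: the blueprint name, component list, host FQDN, cluster name, and Ambari credentials/host are all placeholders.

```shell
# 1. Define a minimal single-node blueprint (blueprint.json)
cat > blueprint.json <<'EOF'
{
  "Blueprints": {
    "blueprint_name": "single-node",
    "stack_name": "HDP",
    "stack_version": "2.3"
  },
  "host_groups": [
    {
      "name": "host_group_1",
      "cardinality": "1",
      "components": [
        { "name": "NAMENODE" },
        { "name": "SECONDARY_NAMENODE" },
        { "name": "DATANODE" },
        { "name": "HDFS_CLIENT" },
        { "name": "ZOOKEEPER_SERVER" }
      ]
    }
  ]
}
EOF

# 2. Define the host mapping (hostmapping.json): which hosts join which host group
cat > hostmapping.json <<'EOF'
{
  "blueprint": "single-node",
  "host_groups": [
    { "name": "host_group_1", "hosts": [ { "fqdn": "node1.example.com" } ] }
  ]
}
EOF

# 3. Register the blueprint, then 4. POST the cluster create
# (run these against your own Ambari server; AMBARI_HOST is a placeholder):
# curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
#   -d @blueprint.json http://AMBARI_HOST:8080/api/v1/blueprints/single-node
# curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
#   -d @hostmapping.json http://AMBARI_HOST:8080/api/v1/clusters/mycluster
```

Cloudbreak drives exactly this REST flow for you, which is why a Cloudbreak "blueprint" is just an Ambari blueprint.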
Run Hadoop as Docker containers

HDP as Docker Containers via Cloudbreak
• Fully automated Ambari cluster installation
• Avoid the GUI; use the REST API only (ambari-shell)
• Fully automated HDP installation with blueprints
• Quick installation (pre-pulled RPMs)
• Same process/images for dev/QA/prod
• Same process for single-node and multi-node

[Diagram: Cloudbreak provisions VMs from cloud providers (or bare metal), installs Ambari in Docker containers on the VMs, and instructs Ambari to build the HDP cluster.]
Provisioning: How it works
1. Start VMs with a running Docker daemon
2. Cloudbreak bootstrap: start the Consul cluster, then the Swarm cluster (Consul for discovery)
3. Start the Ambari server/agents via the Swarm API
4. Ambari services are registered in Consul (Registrator)
5. POST the blueprint
Cloudbreak: Run Hadoop as Docker containers

[Diagram, built up over four slides: Cloudbreak starts Docker on each host; launches one Ambari server container (amb-ser) and an Ambari agent container (amb-agn) per host; posts a Blueprint; Ambari then installs the HDP services (NameNode, HDFS, YARN, Hive, HBase, ZooKeeper) into the agent containers.]
Workshop: Auto-Scale Your Cluster with Periscope
Optimize Cloud Usage via Elastic HDP Clusters
• Policies based on any Ambari metric
• Dynamically scale to achieve physical elasticity
• Coordinates with YARN to achieve elasticity based on the policies
Scaling for Static and Dynamic Clusters

[Diagram: Ambari metrics and alerts feed Cloudbreak/Periscope, which enforces the auto-scale policies. For dynamic clusters Cloudbreak provisions or removes nodes; for static clusters Periscope scales the YARN applications.]
Scale by Ambari Monitoring Metric
1. Ambari: review the metric
2. Cloudbreak: set an alert
3. Cloudbreak: set a scaling policy
Scale up/down by time
1. Set a time-based alert
2. Set a scaling policy
3. Repeat with an alert and policy which scale down
Roadmap
Release Summary

Cloudbreak
● Its own project (separate from Ambari)
● Supported on Linux flavors which support Docker

Periscope
● A feature of Cloudbreak 1.0
● Will be embedded in Ambari later in 2015
Release Timeline
● Cloudbreak 1.0 GA: June/July 2015
● Cloudbreak Incubator Proposal: July/August 2015 (est)
● Cloudbreak 1.1: August 2015 (est)
● Cloudbreak 2.0 GA: 2H 2015
● Ambari releases on the same timeline: Ambari 2.1.0 (HDP "Dal" / 2.3), Ambari 2.1.1 (HDP "Dal-M10"), Ambari 2.2 (HDP "Erie" / 2.4)
Supported Cloud Environments

| Environment | Cloudbreak HDP 2.3 | Cloudbreak HDP 2.4 |
| --- | --- | --- |
| Microsoft Azure | GA | |
| AWS | GA | |
| Google Compute | GA | |
| OpenStack Community | Tech Preview | Tech Preview |
| Red Hat OSP | TBD | |
| HP Helion | GA (Tentative) | |
| Mirantis OpenStack | | |
HDP as a Service
Hortonworks Data Platform On Azure
Rackspace

Cloud Big Data Platform
● Rapidly spin up on-demand HDP clusters
● Integrated with Cloud Files (OpenStack Swift)
● Opt-in for managed services by Rackspace

Managed Big Data Platform
● Fully managed HDP on dedicated and/or cloud
● Leverage Fanatical Support and industry-leading SLAs
● Supported by Rackspace with escalation to Hortonworks
CSC
HDP on IaaS - Best Practices
Microsoft Azure
● Deployment
  ○ Deploy using Cloudbreak
  ○ Deploy using the HWX Azure Gallery image
● Integrated with Azure Blob Storage
● Supported directly by Hortonworks
● Other offerings
  ○ Microsoft HDInsight
  ○ HDP Sandbox
Azure Deployment Guidelines
● All in the same region
● Instance types
  ○ Typical: A7
  ○ Performance: D14
  ○ 8x 1 TB Standard LRS virtual hard disks per server (LRS stores 3 copies)
● Multiple storage accounts are recommended
  ○ No more than 40 virtual hard disks per storage account
Azure Blob Store (Object Storage)
● wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
● Can be used as a replacement for HDFS
● Thoroughly tested in HDP release test suites
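Wiring a cluster to wasb comes down to a couple of core-site.xml properties. A minimal sketch using the standard hadoop-azure property names; the account, container, and key values are placeholders:

```xml
<!-- Storage account key for the hadoop-azure (wasb) connector -->
<property>
  <name>fs.azure.account.key.YOURACCOUNT.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_KEY</value>
</property>
<!-- Optional: make a wasb container the default filesystem in place of HDFS -->
<property>
  <name>fs.defaultFS</name>
  <value>wasb://YOURCONTAINER@YOURACCOUNT.blob.core.windows.net</value>
</property>
```

With the key in place, paths like `hadoop fs -ls wasb://YOURCONTAINER@YOURACCOUNT.blob.core.windows.net/` work from any Hadoop client on the cluster.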
Amazon Web Services
● Deploy using Cloudbreak
● Integrated with AWS S3 (object storage)
● Supported directly by Hortonworks
Amazon Deployment Guidelines
● All in the same region/AZ
● Instances with Enhanced Networking

Master nodes:
● Choose EBS-optimized
● Boot: 100 GB on EBS
● Data: 4+ 1 TB volumes on EBS

Worker nodes:
● Boot: 100 GB on EBS
● Data: instance storage
  ○ EBS can be used, but local is preferred

Instance types (https://aws.amazon.com/ec2/instance-types/):
● Typical: d2.
● Performance: i2.
AWS RDS
● Some services rely on MySQL, Oracle or PostgreSQL:
  ○ Apache Ambari
  ○ Apache Hive
  ○ Apache Oozie
  ○ Apache Ranger
● Use RDS for these instead of managing the databases yourself.
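For example, pointing the Hive metastore at a MySQL RDS instance is a handful of hive-site.xml properties (set through Ambari). A sketch; the RDS endpoint, database name, and credentials are placeholders:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hive-metastore.abc123xyz.us-east-1.rds.amazonaws.com:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>CHANGE_ME</value>
</property>
```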
AWS S3 (Object Storage)
● s3n:// with HDP 2.2 (Hadoop 2.6)
● s3a:// with HDP 2.3 (Hadoop 2.7)
● Not currently a direct replacement for HDFS
● Recommended: configure access with an IAM role/policy
  ○ https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_examples.html#iam-policy-example-s3
  ○ Example: http://git.io/vLoGY
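An instance role with a policy of roughly this shape is enough for s3n/s3a access to a single bucket. A sketch; the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": ["arn:aws:s3:::my-hadoop-bucket"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": ["arn:aws:s3:::my-hadoop-bucket/*"]
    }
  ]
}
```

Attaching this to the EC2 instance role avoids embedding access keys in Hadoop configuration files.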
Google Cloud
● Deploy using
  ○ Cloudbreak
  ○ Google bdutil with the Apache Ambari plug-in
● Integrated with Google Cloud Storage
● Supported directly by Hortonworks
Google Deployment Guidelines
● Instance types
  ○ Typical: n1-standard-4 with a single 1.5 TB persistent disk
  ○ Performance: n1-standard-8 with a 1 TB SSD
● Google GCS (object storage)
  ○ gs://<CONFIGBUCKET>/dir/file
  ○ Not currently a replacement for HDFS
S3 & GCS as Secondary Storage Systems
The connectors are currently eventually consistent, so they do not replace HDFS.

Backup
● Falcon, DistCp, hadoop fs, HBase ExportSnapshot
● A Kafka + Storm bolt sends messages to S3/GCS, providing a backup & point-in-time recovery source

Input/Output
● Convenient & broadly used upload/download method
  ○ As middleware to ease integration with Hadoop & limit access
● Publishing static content (optionally with CloudFront)
  ○ Removes the need to manage any web services
● Storage for temporary/ephemeral clusters
Questions
$ shutdown -h now
- HDP 2.3: http://hortonworks.com/
- Hadoop Summit recordings:
  - http://2015.hadoopsummit.org/san-jose/
  - http://2015.hadoopsummit.org/brussels/
- Past & future workshops: http://hortonworks.com/partners/learn/