Docker-Based Hadoop Deployment

Docker-Based Hadoop Provisioning on Cisco InterCloud

Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco

Rakesh Saha, Product Management, Hortonworks

Upload: rakesh-saha

Post on 14-Aug-2015

TRANSCRIPT

Page 1: Docker based Hadoop Deployment

Docker-Based Hadoop Provisioning on Cisco InterCloud

Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco

Rakesh Saha, Product Management, Hortonworks

Page 2: Docker based Hadoop Deployment

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Cautionary Statement Regarding Forward-Looking Statements

This presentation contains forward-looking statements involving risks and uncertainties. Such forward-looking statements in this presentation generally relate to future events, our ability to increase the number of support subscription customers, the growth in usage of the Hadoop framework, our ability to innovate and develop the various open source projects that will enhance the capabilities of the Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some cases, you can identify forward-looking statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms or expressions that concern our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as predictions of future events. We have based the forward-looking statements contained in this presentation primarily on our current expectations and projections about future events and trends that we believe may affect our business, financial condition and prospects. We cannot assure you that the results, events and circumstances reflected in the forward-looking statements will be achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking statements.

The forward-looking statements made in this prospectus relate only to events as of the date on which the statements are made and we undertake no obligation to update any of the information in this presentation.

Trademarks

Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions.  Other names used herein may be trademarks of their respective owners.

Page 3: Docker based Hadoop Deployment

© 2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential

Speakers

Rakesh Saha, Product Management, Hortonworks

Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco

Page 4: Docker based Hadoop Deployment

Agenda

• About Hortonworks

• Cloudbreak – Docker-based Hadoop provisioning tool

• Introduction to Docker

• Hadoop Provisioning using Docker

• Cisco and Hortonworks Collaboration

Page 5: Docker based Hadoop Deployment

About Hortonworks

• Only 100% open source Apache Hadoop data platform

• Founded in 2011

• 1st Hadoop distribution to go public: IPO Fall 2014 (NASDAQ: HDP)

• 322 subscription customers

• 600+ employees across 17 countries

• 1000+ technology partners

Page 6: Docker based Hadoop Deployment

Hortonworks Mission:

Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop

Customer Momentum

• 300+ customers in seven quarters, growing at 75+/quarter

• Two thirds of customers come from the Fortune 1000

Hortonworks and Hadoop at Scale

• HDP in production on the largest clusters on the planet

• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo!, 800 nodes at Spotify

• Founded in 2011

• Original 24 architects, developers, operators of Hadoop from Yahoo!

• We are leaders in the Hadoop community

• 500+ employees

Page 7: Docker based Hadoop Deployment

HDP is deeply integrated in the data center

[Diagram labels: SOURCES (Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured); DATA SYSTEM (RDBMS, EDW, MPP); APPLICATIONS; OPERATIONAL TOOLS; DEV & DATA TOOLS; INFRASTRUCTURE; HDP (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)]

Deep Partnerships: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Cisco, Microsoft, EMC, Pivotal, Teradata, Red Hat, SAS & SAP.

Broad Partnerships: Over 1,000 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.

Page 8: Docker based Hadoop Deployment

Agenda

Cloudbreak Docker Provisioning Collaboration

Page 9: Docker based Hadoop Deployment

Cloudbreak

• Developed by SequenceIQ

• Open source with Apache 2.0 license [ Apache project soon ]

• Deploys selected services to public and private cloud via Ambari Blueprints

• Elastic – can spin up any number of nodes, add/remove on the fly

• Provides full cloud lifecycle management post-deployment

Page 10: Docker based Hadoop Deployment

Launch HDP on Any Cloud for Any Application

Cloudbreak

1. Pick a Blueprint

2. Choose a Cloud

3. Launch HDP!

Example Ambari Blueprints:

• IoT Apps (Storm, HBase, Hive)

• BI / Analytics (Hive)

• Data Science (Spark)

• Dev / Test (all HDP services)
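
To give a feel for what a blueprint contains, the sketch below builds a minimal Ambari Blueprint as a Python dictionary and serializes it to JSON. The top-level keys (Blueprints, host_groups, components, cardinality) follow the Ambari Blueprint format. The blueprint name, host group name, stack version, and component list are assumptions chosen for illustration, not values from the slides, and a real blueprint would normally list more components.

```python
import json


def make_blueprint(name, stack_version, components):
    """Return a minimal Ambari Blueprint with a single host group.

    Every argument value used in this example is an illustrative assumption.
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": "master",  # hypothetical host group name
                "cardinality": "1",
                "components": [{"name": c} for c in components],
            }
        ],
    }


# Hypothetical single-node "Dev / Test" style blueprint with HDFS and ZooKeeper only.
blueprint = make_blueprint(
    "dev-test-single-node",
    "2.2",
    ["NAMENODE", "SECONDARY_NAMENODE", "DATANODE", "HDFS_CLIENT", "ZOOKEEPER_SERVER"],
)
blueprint_json = json.dumps(blueprint, indent=2)
print(blueprint_json)
```

A blueprint describes which components run in which host group; the mapping of host groups to actual machines is supplied separately when the cluster is created.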

Page 11: Docker based Hadoop Deployment

Hadoop in Cloud Provisioning with Cloudbreak

1. Create Templates

2. Provide Blueprint

3. Associate Credentials

4. Launch Cluster
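
The ordering of these four steps can be sketched as a small data model in which launching is only allowed once the three inputs exist. This is a hypothetical illustration of the workflow, not Cloudbreak's API: the class name, field names, and example values are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRequest:
    """Hypothetical container for the inputs gathered in steps 1 to 3."""
    template: Optional[str] = None    # infrastructure template (step 1)
    blueprint: Optional[str] = None   # Ambari Blueprint name (step 2)
    credential: Optional[str] = None  # cloud provider credential (step 3)

    def missing(self):
        """Return the names of the inputs that have not been provided yet."""
        return [name for name, value in vars(self).items() if value is None]

    def launch(self):
        """Step 4: refuse to launch until every earlier step is complete."""
        gaps = self.missing()
        if gaps:
            raise ValueError("cannot launch, missing: " + ", ".join(gaps))
        return f"launching {self.blueprint} on {self.template} as {self.credential}"


# Example values are made up for illustration.
request = ClusterRequest(template="4-node-medium", blueprint="dev-test")
print(request.missing())
request.credential = "my-cloud-account"
print(request.launch())
```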

Page 12: Docker based Hadoop Deployment

Provisioning: Template

Page 13: Docker based Hadoop Deployment

Provisioning: Blueprint

Page 14: Docker based Hadoop Deployment

Provisioning: Provider Credentials

Page 15: Docker based Hadoop Deployment

Provisioning: Launch

Page 16: Docker based Hadoop Deployment

Specialized Blueprints

Get productive quickly with preconfigured cluster blueprints

Lambda Architecture

Machine Learning

Batch ETL

Page 17: Docker based Hadoop Deployment

Optimize cloud usage via Elastic Clusters

Autoscaling Policy

• Policies based on any Ambari metrics

• Coordinates with YARN

• Policies are based on Metrics or Time

• Scaling can be service or component type specific

Workloads shown: BI / Analytics (Hive), IoT Apps (Storm, HBase, Hive), Dev / Test (all HDP services), Data Science (Spark)
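
A metric-based policy of the kind described on this slide could look roughly like the sketch below. It is a hypothetical illustration, not Cloudbreak's policy format: the class, the metric name, the threshold, and the node limits are all assumptions, and time-based policies are left out.

```python
from dataclasses import dataclass


@dataclass
class MetricPolicy:
    """Hypothetical metric-based policy: scale when a metric exceeds a threshold."""
    metric: str        # name of a metric, for example pending YARN containers
    threshold: float   # trigger when the observed value is above this
    adjustment: int    # nodes to add (positive) or remove (negative)


def desired_nodes(current, policies, metrics, min_nodes=1, max_nodes=20):
    """Apply every triggered policy and clamp the result to the allowed range."""
    target = current
    for policy in policies:
        value = metrics.get(policy.metric)
        if value is not None and value > policy.threshold:
            target += policy.adjustment
    return max(min_nodes, min(max_nodes, target))


policies = [MetricPolicy("pending_containers", threshold=50, adjustment=2)]
print(desired_nodes(4, policies, {"pending_containers": 120}))  # 6
print(desired_nodes(4, policies, {"pending_containers": 10}))   # 4
```

Clamping to a minimum and maximum node count keeps a noisy metric from growing or shrinking the cluster without bound.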

Page 18: Docker based Hadoop Deployment

Scaling for Static and Dynamic Clusters

[Diagram labels: Cloudbreak (Provisioning; Static and Dynamic clusters); three clusters, each with Ambari and an Auto-scale Policy; YARN, Ambari Alerts, Ambari Metrics]

• Metrics and Alerts Feed Cloudbreak

• Enforces Policies

• Scales Cluster/YARN Apps

Page 19: Docker based Hadoop Deployment

Provisioning – How it works

1. Start VMs with a running Docker daemon

2. Cloudbreak Bootstrap

   • Start Consul Cluster

   • Start Swarm Cluster (Consul for discovery)

3. Start Ambari servers/agents (Swarm API)

4. Ambari services registered in Consul (Registrator)

5. Post Blueprint
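
The final step, posting the blueprint, can be sketched against Ambari's REST API: one POST registers the blueprint and a second creates a cluster from it. The sketch below only builds the two requests and does not send them. The endpoint paths and the X-Requested-By header follow the Ambari API; the host name, blueprint content, cluster name, and credentials are placeholders assumed for illustration.

```python
import base64
import json
import urllib.request


def ambari_post(base_url, path, payload, user, password):
    """Build (but do not send) an authenticated POST request for Ambari."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Basic " + token,
            "X-Requested-By": "ambari",  # Ambari requires this header on POSTs
        },
    )


def blueprint_requests(base_url, blueprint_name, blueprint, cluster_name,
                       host_mapping, user="admin", password="admin"):
    """Return two requests: register the blueprint, then create the cluster."""
    register = ambari_post(base_url, f"/api/v1/blueprints/{blueprint_name}",
                           blueprint, user, password)
    create = ambari_post(base_url, f"/api/v1/clusters/{cluster_name}",
                         {"blueprint": blueprint_name, "host_groups": host_mapping},
                         user, password)
    return register, create


# Placeholder values; none of these come from the slides.
register, create = blueprint_requests(
    "http://ambari.example:8080",
    "dev-test",
    {"Blueprints": {"stack_name": "HDP", "stack_version": "2.2"}, "host_groups": []},
    "demo-cluster",
    [{"name": "master", "hosts": [{"fqdn": "node1.example.com"}]}],
)
print(register.get_method(), register.full_url)
print(create.get_method(), create.full_url)
```

Passing each request to urllib.request.urlopen would send it once an Ambari server is reachable at the given address.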

Page 20: Docker based Hadoop Deployment

Agenda

Cloudbreak Docker Provisioning Collaboration

Page 21: Docker based Hadoop Deployment

Docker is a “Shipping Container” System for Code

Multiplicity of stacks: Static website, Web frontend, User DB, Queue, Analytics DB

Multiplicity of hardware environments: Development VM, QA server, Public Cloud, Contributor’s laptop, Production Cluster, Customer Data Center

An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container

Page 22: Docker based Hadoop Deployment

Docker

• Lightweight, portable

• Build once, run anywhere

• VM – without the overhead of a VM

• Isolated containers

• Automated and scripted

Page 23: Docker based Hadoop Deployment

Why Is Docker So Exciting?

For Developers:

Build once…run anywhere

• A clean, safe, and portable runtime environment for your app.

• No missing dependencies, packages etc.

• Run each app in its own isolated container

• Automate testing, integration, packaging

• Reduce/eliminate concerns about compatibility on different platforms

• Cheap, zero-penalty containers to deploy services

For DevOps:

Configure once…run anything

• Make the entire lifecycle more efficient, consistent, and repeatable

• Eliminate inconsistencies between SDLC stages

• Support segregation of duties

• Significantly improve the speed and reliability of CI/CD

• Significantly more lightweight than VMs

Page 24: Docker based Hadoop Deployment

Docker: Containers vs. VMs

[Diagram labels, VM side: Server, Host OS, Hypervisor (Type 2), and for each VM a Guest OS, Bins/Libs, and an app (App A, App A’, App B)]

[Diagram labels, Container side: Server, Host OS kernel, Docker, and containers holding bins/libs and apps (App A, App B)]

Containers are isolated, share only the kernel

…result is significantly faster deployment, much less overhead, easier migration, faster restart

Page 25: Docker based Hadoop Deployment

Agenda

Cloudbreak Docker Provisioning Collaboration

Page 26: Docker based Hadoop Deployment

Run Hadoop as Docker Containers

HDP as Docker Containers via Cloudbreak

• Running Ambari Cluster in Containers

• Use Blueprint to define services

• All HDP services share a single container

Cloudbreak:

• Provisions VMs from Cloud Providers

• Installs Ambari on the VMs

• Instructs Ambari to build HDP cluster

[Diagram labels: Cloud Provider/Bare Metal, VM, Linux, Docker, Ambari, HDP]

Page 27: Docker based Hadoop Deployment

Swarm + Consul for Placement and Discovery
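
As a rough illustration of the discovery half, a client can ask Consul's catalog endpoint which nodes provide a registered service and read the address and port from the response. The sketch below builds the catalog URL and parses a hand-written sample response instead of contacting a live agent. The /v1/catalog/service/ path and the response field names follow Consul's HTTP API; the agent address, service name, and node address are made up for illustration.

```python
import json


def catalog_url(consul_addr, service):
    """URL of Consul's catalog endpoint listing the nodes that provide a service."""
    return f"http://{consul_addr}/v1/catalog/service/{service}"


def endpoints(catalog_response):
    """Extract (address, port) pairs from a catalog response body."""
    result = []
    for entry in json.loads(catalog_response):
        # ServiceAddress may be empty, in which case the node address applies.
        address = entry.get("ServiceAddress") or entry["Address"]
        result.append((address, entry["ServicePort"]))
    return result


# Hand-written sample response; the service name and addresses are hypothetical.
sample = json.dumps([
    {"Node": "node1", "Address": "10.0.0.11",
     "ServiceName": "ambari-8080", "ServiceAddress": "", "ServicePort": 8080},
])
print(catalog_url("127.0.0.1:8500", "ambari-8080"))
print(endpoints(sample))
```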

Page 28: Docker based Hadoop Deployment

Run Hadoop as Docker containers

[Diagram labels: Cloudbreak; six Docker hosts]

Page 29: Docker based Hadoop Deployment

Run Hadoop as Docker containers

[Diagram labels: Cloudbreak; Blueprint; six Docker hosts running containers amb-ser and five amb-agn]

Page 30: Docker based Hadoop Deployment

Run Hadoop as Docker containers

[Diagram labels: Cloudbreak; six Docker hosts running containers amb-ser, amb-agn-hdfs-hbase, amb-agn-hdfs-hive, amb-agn-hdfs-yarn, amb-agn-hdfs-zookpr, amb-agn-nmnode-hdfs]

Page 31: Docker based Hadoop Deployment

Benefits of running Hadoop on Docker

• Quick installation with pre-pulled RPMs

• Same process/images for dev/qa/prod

• Same process for single/multi-node

Page 32: Docker based Hadoop Deployment

Demo

Page 42: Docker based Hadoop Deployment

Agenda

Cloudbreak Docker Provisioning Collaboration

Page 43: Docker based Hadoop Deployment

Cisco and Hortonworks’ Partnership

100% open source Hadoop Distribution, Support and Training

Integrated Infrastructures for Big Data

Cisco and Hortonworks are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership thanks to a validated joint architecture.

Page 44: Docker based Hadoop Deployment

Results of the collaboration

• Efficient Hadoop as a service

• Adoption of Docker for enterprise Hadoop deployment

Tasks                                 Cisco InterCloud    Public Cloud Provider
HDP installation                      15:04 mins          11:55 mins
Teragen (avg of 3 executions)         7:08 mins           22:15 mins
Terasort (avg of 3 executions)        32:09 mins          60:12 mins
Teravalidate (avg of 3 executions)    2:31 mins           10:40 mins
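
To make the comparison easier to read, the short sketch below converts the reported times to seconds and computes, for each task, how long the public cloud provider took relative to Cisco InterCloud. The times are the ones reported on the slide; the helper name and the choice to express results as a ratio are just one way to present them.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '7:08' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (task, Cisco InterCloud, public cloud provider), times as reported on the slide
results = [
    ("HDP installation", "15:04", "11:55"),
    ("Teragen", "7:08", "22:15"),
    ("Terasort", "32:09", "60:12"),
    ("Teravalidate", "2:31", "10:40"),
]

ratios = {task: round(to_seconds(public) / to_seconds(intercloud), 2)
          for task, intercloud, public in results}

for task, ratio in ratios.items():
    print(f"{task}: public cloud took {ratio}x the InterCloud time")
```

On these numbers, HDP installation was slower on InterCloud (ratio 0.79), while Teragen, Terasort, and Teravalidate took about 3.12x, 1.87x, and 4.24x as long on the public cloud provider.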

Page 45: Docker based Hadoop Deployment

Observations

• Docker is maturing inside enterprises

• Interest in running Docker on top of bare metal

• Big data app developers are leaning towards containerization of apps

• YARN is becoming an application deployment platform beyond big data apps

• Demand for native, containerized, fully managed apps on YARN

Future Collaboration

• Run Docker natively on OpenStack

• Run Docker on YARN

• OpenStack bare metal

Page 46: Docker based Hadoop Deployment

Conclusion

HDP + Cisco InterCloud: Efficient Hadoop-as-a-service

[Diagram labels: HDP; Blueprints (Data Science, IoT, BI / Analytics, Dev / Test)]

Page 47: Docker based Hadoop Deployment

Learn More

Download the Hortonworks Sandbox

Learn Hadoop

Build Your Analytic App

Try Hadoop 2

More about Cisco & Hortonworks: http://hortonworks.com/partner/cisco/

More about Hortonworks’ Acquisition of SequenceIQ: http://bit.ly/1R1ktxO