Ferry - Share & Deploy Big Data Applications with Docker James Horey

Upload: pydata

Post on 11-Aug-2014


DESCRIPTION

Ferry is a Python-based, open-source tool that helps developers share and run big data applications. Users can provision Hadoop, Cassandra, GlusterFS, and Open MPI clusters locally on their own machine using YAML, then distribute their applications via Dockerfiles. These capabilities are useful for data scientists experimenting with big data technologies, for developers who need an accessible big data development environment, and for developers who simply want to share their big data applications. In this presentation, I’ll introduce you to Docker, show you how to create a simple big data application in Ferry, and discuss ways the Python community can contribute to the open-source project. I’ll also discuss future directions for Ferry, with a focus on better application sharing and operational deployments.

TRANSCRIPT

Page 1: James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with Docker

Ferry - Share & Deploy Big Data Applications with Docker

James Horey

Page 2:

• Writing a simple application with Bokeh

• Packaging our application with Docker

• Orchestrating our application with Ferry

Technical material can be found at: https://github.com/jhorey/pydata

Page 3:

Bokeh

Page 4:

U.S. Census

http://api.census.gov/data/2011/acs5?get=DP03_0062E&for=county:*&in=state:06

• get=DP03_0062E: median income

• for=county:*: all counties

• in=state:06: California
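As a rough illustration of how the query above is put together, here is a minimal Python sketch. The helper names `build_query` and `parse_rows` are hypothetical, the sample payload is made up, and the response shape (a JSON array whose first row holds the column names) is an assumption rather than something shown on the slide. Nothing is fetched over the network.

```python
import json
from urllib.parse import urlencode

# Endpoint copied from the slide; whether it is still served is not checked here.
BASE_URL = "http://api.census.gov/data/2011/acs5"


def build_query(variable="DP03_0062E", state_fips="06"):
    """Build the query URL for one variable across all counties of one state."""
    params = {
        "get": variable,
        "for": "county:*",
        "in": "state:" + state_fips,
    }
    # Leave ':' and '*' unescaped so the URL matches the form on the slide.
    return BASE_URL + "?" + urlencode(params, safe=":*")


def parse_rows(payload):
    """Turn a header-row-plus-data-rows JSON array into a list of dicts.

    Assumption: the first element of the array lists the column names.
    """
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


# Hypothetical two-county payload, invented purely for illustration.
SAMPLE = '[["DP03_0062E","state","county"],["50000","06","001"],["60000","06","003"]]'

if __name__ == "__main__":
    print(build_query())
    for row in parse_rows(SAMPLE):
        print(row["county"], row["DP03_0062E"])
```

Keeping URL construction separate from parsing lets the same helpers be reused for other states or variables by changing only the arguments.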

Page 5:

Download some data

Page 6:

Let’s install Bokeh

$ pip install bokeh
>> Downloading/unpacking bokeh
>> SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel.
$ apt-get install python-dev && pip install bokeh
>> gcc: error trying to exec 'cc1plus': execvp: No such file or directory
$ apt-get install g++
$ pip install bokeh

RuntimeError: bokeh sample data directory does not exist, please execute bokeh.sampledata.download()

$ python
>>> import bokeh.sampledata

Page 7:

A simple application

$ python plot.py Kentucky

[Figure: plot output for Kentucky; visible label: Louisville]

Page 8:

Let’s share

#!/bin/bash

# Make sure we have 'pip' installed
apt-get install python-pip

# Install packages in right order
apt-get --yes install g++ python-dev
pip install bokeh

# Now download the data
python geography.py data/
python population economic Kentucky data/

# Start the web server
python webserver data/

• Your script didn’t work

• Oh, I was supposed to run this as sudo?

• Ok, it still didn’t work

• I get this funny error

• Oh yeah, I’m running Redhat

• Ok I’m at my desk, just use my computer

Page 9:

• Encapsulates applications in isolated containers

• Makes it easy and safe to distribute applications

• Easy to get started

Page 10:

Our Dockerfile

• Start from a clean Ubuntu Precise (12.04) image

• Install stuff

• Add our files

• Run this when starting

$ docker build -t ferry/pydata .
$ docker push ferry/pydata

Page 11:

Sharing made simple

$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata

[Diagram: container p1 on top of the kernel, on top of the hardware]

Page 12:

Sharing made simple

$ docker pull ferry/pydata
$ docker run -p 8000:8000 --name p1 -d ferry/pydata
$ docker run -p 8001:8000 --name p2 -d ferry/pydata
$ docker run -p 8002:8000 --name p3 -d ferry/pydata

[Diagram: containers p1, p2, p3 side by side on top of one kernel, on top of the hardware]

• Containers share basic kernel and hardware capabilities

• No virtualization

• Containers are isolated

• Access via port forwarding

You can run these commands now!
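The port-forwarding pattern on this slide (each container listens on port 8000 internally while the host exposes 8000, 8001, 8002) can be sketched in Python as below. The helper names `docker_run_args` and `plan` are hypothetical, and the sketch only prints the commands instead of running them. It uses the double-dash `--name` spelling that current Docker releases expect.

```python
import shlex


def docker_run_args(name, host_port, image="ferry/pydata", container_port=8000):
    """Return the argument list for one detached container with a forwarded port."""
    return [
        "docker", "run",
        "-p", f"{host_port}:{container_port}",  # host port -> container port
        "--name", name,
        "-d", image,
    ]


def plan(count, first_host_port=8000):
    """Build commands for containers p1..pN on consecutive host ports."""
    return [
        docker_run_args(f"p{i + 1}", first_host_port + i)
        for i in range(count)
    ]


if __name__ == "__main__":
    for args in plan(3):
        # Printed rather than executed; pass `args` to subprocess.run to launch for real.
        print(shlex.join(args))
```

Only the host side of the mapping changes between containers, which is why the same image can be started several times without any change to the application.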

Page 13:

• Highly scalable and fault-tolerant

• Great for storing streaming data (sensors, messages)

CREATE KEYSPACE census WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

USE census;

CREATE TABLE acs_economic_data (
    state_cd TEXT,
    state_name TEXT,
    county_cd TEXT,
    county_name TEXT,
    median INT,
    mean INT,
    capita INT,
    PRIMARY KEY (county_cd, state_cd)
);

Page 14:

Orchestration

Web DB

Web + DB

• Simple
• Full control
• More work for you

• Simpler Dockerfile
• More extensible
• How to orchestrate?

Page 15:

• Specify the containers that constitute your application in YAML

• Support for Hadoop, Cassandra, GlusterFS, and Open MPI

• It’s a little bit like pip for your Docker-based runtime environment

Ferry

http://ferry.opencore.io

Page 16:

Our Application

backend:
   - storage:
        personality: "cassandra"
        instances: 1
connectors:
   - personality: "ferry/pydata-cassandra"
     ports: ["8000:8000"]

+

# The cassandra-client base comes with the various drivers
# pre-installed.
FROM ferry/cassandra-client
NAME ferry/pydata-cassandra

# Place the start scripts in the events directories so they
# are started when the connector is brought up.
ADD ./scripts/startcas.sh /service/runscripts/start/
ADD ./scripts/restartcas.sh /service/runscripts/restart/
RUN chmod a+x /service/runscripts/start/startcas.sh
RUN chmod a+x /service/runscripts/restart/restartcas.sh

Page 17:

Easy to share (again)

$ ferry start cassandra.yml
sa-df8d0aa6
$ ferry ps
UUID         Storage      Compute  Connectors   Status   Base           Time
----         -------      -------  ----------   ------   ----           ----
sa-df8d0aa6  se-54ed4e93           se-a5350a8d  running  cassandra.yml

$ ferry ssh sa-df8d0aa6
root@client-se-a5350a8d:~# ps -eaf | grep python
root 144 1 0 19:49 ? 00:00:00 python /home/ferry/pydata/bokeh/webserver.py /home/ferry/pydata/data

Page 18:

What’s it doing?

$ ferry start cassandra.yml

[Diagram: Web, C*, C*; step label: Generate Config]

root@client-se-a5350a8d:~# env | grep BACK
BACKEND_STORAGE_TYPE=cassandra
BACKEND_STORAGE_IP=10.1.0.12

Page 19:

What’s it doing?

$ ferry start yarn

[Diagram: Client, two YARN nodes (Y), two GlusterFS nodes (G)]

root@client-se-b597cb21:~# env | grep BACK
BACKEND_STORAGE_TYPE=gluster
BACKEND_STORAGE_IP=10.1.0.18
BACKEND_COMPUTE_TYPE=yarn
BACKEND_COMPUTE_IP=10.1.0.15
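An application running inside the client container could pick up the BACKEND_* variables shown above with a few lines of Python. This is a minimal sketch: the function name `backend_config` is hypothetical, and it reads only the four variable names that appear in the `env | grep BACK` output. Whether Ferry exports further variables is not assumed.

```python
import os


def backend_config(env=os.environ):
    """Collect the BACKEND_* settings into a dict keyed by role."""
    config = {}
    for role in ("STORAGE", "COMPUTE"):
        kind = env.get(f"BACKEND_{role}_TYPE")
        if kind is None:
            continue  # this stack has no backend of that role
        config[role.lower()] = {
            "type": kind,
            "ip": env.get(f"BACKEND_{role}_IP"),
        }
    return config


if __name__ == "__main__":
    # Values copied from the slide's output for the yarn stack.
    example = {
        "BACKEND_STORAGE_TYPE": "gluster",
        "BACKEND_STORAGE_IP": "10.1.0.18",
        "BACKEND_COMPUTE_TYPE": "yarn",
        "BACKEND_COMPUTE_IP": "10.1.0.15",
    }
    print(backend_config(example))
```

Passing the environment as a parameter keeps the function easy to exercise outside a container, while the default still reads the real process environment.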

Page 20:

What’s it doing?

$ ferry stop sa-c6cbb572

[Diagram: Client, two YARN nodes (Y), two GlusterFS nodes (G)]

Page 21:

Next steps

$ ferry share sa-df8d0aa6

[Diagram: three hosts, each running w, c*, c* on its own hardware]

Page 22:

Next steps

$ ferry deploy sa-df8d0aa6

[Diagram: w, c*, c* on one host, and w, c*, c* spread across several hosts; labels: VPC, EC2, S3]

Page 23:

• Even simple applications can be complicated to install and run

• Docker helps quite a bit with this

• Ferry helps build out big data applications

Page 24:

Thank you!

James [email protected]

Ferry ferry.opencore.io @open_core_io