OpenShift YARN - Strata 2014
DESCRIPTION
Learn about the exciting integration work that has been done with YARN, Red Hat OpenShift, and Kubernetes Docker container orchestration. During this presentation we will cover the basics of this YARN integration effort and then launch into a demo. You won’t want to miss seeing a web application Docker container, Storm, and Hive SQL queries all running in the same HDP cluster!
TRANSCRIPT
Page 1 © Hortonworks Inc. 2014
OpenShift scheduling Docker containers in YARN via Kubernetes
Docker: a shipping container system for code
• Multiplicity of stacks: Static website, Web frontend, User DB, Queue, Analytics DB
• Multiplicity of hardware environments: Development VM, QA server, Public Cloud, Contributor’s laptop, Production Cluster, Customer Data Center
• Do services and apps interact appropriately?
• Can I migrate smoothly and quickly?
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container that can be manipulated using standard operations and run consistently on virtually any hardware platform
Why are Docker containers lightweight?
HDP 2.1 Hortonworks Data Platform
I/O performance comparable to Bare Metal
Kubernetes – Container Orchestrator
• Service for container cluster management
• Allows deploying and managing applications running on multiple hosts using Docker
• Open sourced by Google
• Supports GCE, CoreOS, Azure, vSphere
• Manages Docker containers as its default implementation
• Master – maintains the state of the Kubernetes server runtime
  • Scheduler, API server, registries, storage
• Minions – represent the hosts where containers are created
  • Kubelet – manages pod and container lifecycle
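The master/minion split in the bullets above can be pictured with a small toy model. This is only a sketch under stated assumptions: the `Master`, `Minion`, and `kubelet_start` names, the slot-based capacity, and the "most free slots" placement policy are all hypothetical and are not taken from Kubernetes itself.

```python
from dataclasses import dataclass, field


@dataclass
class Minion:
    """A host where containers are created (hypothetical toy model)."""
    name: str
    capacity: int                       # max pods this host accepts (assumed unit)
    pods: list = field(default_factory=list)

    def kubelet_start(self, pod: str) -> None:
        # Stand-in for the kubelet starting the pod's containers on this host.
        self.pods.append(pod)


@dataclass
class Master:
    """Holds cluster state and picks a minion for each new pod."""
    minions: list

    def schedule(self, pod: str) -> str:
        # Toy policy (an assumption): choose the minion with the most free slots.
        candidates = [m for m in self.minions if len(m.pods) < m.capacity]
        if not candidates:
            raise RuntimeError(f"no capacity for pod {pod!r}")
        target = max(candidates, key=lambda m: m.capacity - len(m.pods))
        target.kubelet_start(pod)
        return target.name


master = Master([Minion("minion-1", capacity=2), Minion("minion-2", capacity=1)])
placements = [master.schedule(p) for p in ("web", "storm", "hive")]
print(placements)
```

The point of the sketch is the division of labor: the master decides where a pod goes, and the per-host agent is the one that actually starts it.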
OpenShift
• Red Hat’s platform as a service for applications in the cloud, supporting both public and private clouds
• Provides a high-level abstraction for applications on top of containers, allowing easy scaling, service discovery, and deployment
• Enables Docker image authors to easily deliver reusable application components, including highly available databases, monitoring and log aggregation tools, service discovery platforms, and prepackaged web applications
• Allows developers to deeply customize their runtime environments while preserving operational support at scale for those applications
Kubernetes/YARN/Docker Integration
YARN Node Manager
YARN Node Manager
Kubernetes AppMaster
Understanding Storm via a Real-World Use Case
A large truck fleet company wants to capture, in real time, events from drivers in their trucks on the road across the US. Sensor devices on the trucks capture all kinds of events, from vehicle diagnostics to driver infractions.
• E.g.: excessive braking/acceleration, speeding, start/stop, etc.
Initial Business Requirement:
• Stream these events in, filter on violations, and do real-time alerting on “lots” of erratic behavior over a short period of time.
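The requirement above (filter on violations, alert on many of them in a short period) can be sketched as a per-driver sliding window. This is plain Python, not Storm code, and every concrete value is an assumption: the violation labels, the 60-second window, and the threshold of 3 are illustrative stand-ins for "lots" and "a short period of time".

```python
from collections import defaultdict, deque

# Assumed labels and limits; the slides do not give concrete values.
VIOLATIONS = {"speeding", "excessive braking", "excessive acceleration"}
WINDOW_SECONDS = 60   # assumed "short period of time"
THRESHOLD = 3         # assumed meaning of "lots"


class ViolationAlerter:
    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.recent = defaultdict(deque)   # driver id -> timestamps of recent violations

    def on_event(self, driver_id, event_type, timestamp):
        """Return True when the driver has reached the threshold inside the window.

        Assumes events for a driver arrive in timestamp order.
        """
        if event_type not in VIOLATIONS:
            return False                   # filter step: non-violations are ignored
        times = self.recent[driver_id]
        times.append(timestamp)
        # Drop violations that have fallen out of the window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold


alerter = ViolationAlerter()
events = [
    ("driver-7", "start", 0),
    ("driver-7", "speeding", 5),
    ("driver-7", "excessive braking", 20),
    ("driver-7", "speeding", 40),
    ("driver-7", "speeding", 200),
]
alerts = [alerter.on_event(*e) for e in events]
print(alerts)
```

In this run only the fourth event raises an alert: it is the third violation within 60 seconds, while the last violation arrives after the earlier ones have aged out of the window.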
High Level Architecture
Truck Streaming Data: T(1), T(2), ... T(N)
• High Speed Ingestion: Messaging Grid (WMQ, ActiveMQ, Kafka), truckevents TOPIC
• Stream Processing with Storm: Kafka Spout, HBase Bolt, Monitoring Bolt, HDFS Bolt
• Distributed Storage: HDFS (Write to HDFS)
• Alerts: ActiveMQ Alert Topic (Create Alerts)
• Real-time Serving with HBase: driver dangerous events, driver dangerous events count (Write to HBase)
• Interactive Query: TEZ. Perform ad hoc queries on driver/truck events and other related data sources
• Batch Analytics: MR2. Do batch analysis/models and update HBase with the right thresholds for alerts (Update Alert Thresholds)
• Real-Time Streaming Driver Monitoring App: Spring WebApp with SockJS WebSockets. Query driver events in real time; consume alerts in real time
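The fan-out in this architecture, where one spout feeds an HDFS bolt, an HBase bolt, and a monitoring bolt, can be simulated in a few lines. This is a sketch only: it does not use the Storm API, the lists and counters are in-memory stand-ins for HDFS, HBase, and the ActiveMQ alert topic, and the event fields and `ALERT_THRESHOLD` value are hypothetical.

```python
from collections import Counter

# In-memory stand-ins (assumptions) for the sinks named on the slide.
hdfs_log = []               # HDFS: every raw event
hbase_events = []           # HBase: driver dangerous events
hbase_counts = Counter()    # HBase: driver dangerous events count
alert_topic = []            # ActiveMQ alert topic

ALERT_THRESHOLD = 2         # hypothetical value
monitor_counts = Counter()  # the monitoring bolt's own running count


def hdfs_bolt(event):
    hdfs_log.append(event)                      # archive every event


def hbase_bolt(event):
    if event["type"] != "normal":               # store only dangerous events
        hbase_events.append(event)
        hbase_counts[event["driver"]] += 1


def monitoring_bolt(event):
    if event["type"] == "normal":
        return
    monitor_counts[event["driver"]] += 1
    # Publish one alert at the moment the driver reaches the threshold.
    if monitor_counts[event["driver"]] == ALERT_THRESHOLD:
        alert_topic.append({"driver": event["driver"], "count": ALERT_THRESHOLD})


BOLTS = (hdfs_bolt, hbase_bolt, monitoring_bolt)


def kafka_spout(messages):
    """Stand-in for the Kafka spout: yields events read from the topic."""
    yield from messages


truck_events = [
    {"driver": 11, "type": "normal"},
    {"driver": 11, "type": "speeding"},
    {"driver": 12, "type": "normal"},
    {"driver": 11, "type": "excessive braking"},
]

for event in kafka_spout(truck_events):
    for bolt in BOLTS:                          # each bolt sees every event
        bolt(event)

print(len(hdfs_log), hbase_counts[11], alert_topic)
```

Each bolt makes its own decision about the same event stream, which is why the archive, the serving store, and the alert path can evolve independently.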
HDP Provides a Single Data Platform
The truck streaming architecture from the High Level Architecture slide (Storm stream processing, HBase real-time serving, TEZ interactive query, MR2 batch analytics) runs on one HDP Data Lake.
YARN enables 4 different apps/workloads on a single cluster.
Demo: OpenShift scheduling Docker container in YARN
Running in Docker