Fully Fault Tolerant Real Time Data Pipeline with Docker and Mesos
Rahul Kumar, Technical Lead
LinuxCon / ContainerCon - Berlin, Germany
Agenda
● Data Pipeline
● Mesos + Docker
● Reactive Data Pipeline
Goal
Analyzing data always has great benefits and is one of the greatest challenges for an organization.
Today’s businesses generate massive amounts of digital data, which is cumbersome to store, transport, and analyze.
Building distributed systems and off-loading workloads to commodity clusters is one of the better approaches to solving the data problem.
Characteristics of a distributed system
❏ Resource Sharing
❏ Openness
❏ Concurrency
❏ Scalability
❏ Fault Tolerance
❏ Transparency
Collect
Store
Process
Analyze
Data Center
Manually Scale Frameworks & Install services
Complex
Very Limited
Inefficient
Low Utilization
Static partitioning is a blocker for a fault-tolerant data pipeline
Failures make it even more complex to manage
Apache Mesos
“Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.”
Mesos Features
● Scalability: scales up to 10,000s of nodes
● Fault-tolerant: replicated master and slaves using ZooKeeper
● Docker support: support for Docker containers
● Native Container: Linux-native isolation between tasks with Linux Containers
● Scheduling: multi-resource scheduling (memory, CPU, disk, and ports)
● API support: Java, Python and C++ APIs for developing new parallel applications
● Monitoring: Web UI for viewing cluster state
Resource Isolation
Docker Containerizer
Mesos adds support for launching tasks that contain Docker images.
Users can launch a Docker image either as a Task or as an Executor.
To enable the Docker Containerizer when running mesos-agent, “docker” must be set as one of the --containerizers options:
mesos-agent --containerizers=docker,mesos
Mesos Frameworks
● Aurora: Developed at Twitter and later migrated to the Apache project. Aurora is a framework that keeps services running across a shared pool of machines and is responsible for keeping them running indefinitely.
● Marathon: A framework for container orchestration on Mesos. Marathon helps run other frameworks on Mesos. It can also run application containers such as Jetty, JBoss Server, and Play Server.
● Chronos: A fault-tolerant job scheduler for Mesos. It was developed at Airbnb as a replacement for cron.
Resilient Distributed Datasets (RDDs)
A big collection of data which is:
- Immutable
- Distributed
- Lazily evaluated
- Type Inferred
- Cacheable
Spark Stack
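The RDD properties listed above can be seen in a few lines of Spark code. This is a minimal sketch, not code from the talk: the object name `RddSketch`, the helper `evenTimesTen`, and the `local[2]` master are assumptions chosen only so the example runs without a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RddSketch {
  // Builds the RDD lineage without running anything: transformations are lazy.
  def evenTimesTen(sc: SparkContext): RDD[Int] =
    sc.parallelize(1 to 10, numSlices = 2) // distributed across 2 partitions
      .filter(_ % 2 == 0)                  // returns a new RDD; the parent is unchanged
      .map(_ * 10)                         // element type Int is inferred

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 2 threads, only so the sketch runs without a cluster.
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("RddSketch"))
    try {
      val rdd = evenTimesTen(sc).cache() // keep computed partitions in memory after first use
      val total = rdd.reduce(_ + _)      // first action: triggers the computation
      val size = rdd.count()             // second action: can be served from the cache
      println(s"total=$total, count=$size")
    } finally sc.stop()
  }
}
```

Nothing is computed until `reduce` is called; `filter` and `map` only record how to derive the new RDD from its parent.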
Why Spark Streaming?
Many big-data applications need to process large data streams in near-real time
Monitoring Systems
Alert Systems
Computing Systems
What is Spark Streaming?
A framework for large-scale stream processing
➔ Created at UC Berkeley
➔ Scales to 100s of nodes
➔ Can achieve second-scale latencies
➔ Provides a simple batch-like API for implementing complex algorithms
➔ Can absorb live data streams from Kafka, Flume, ZeroMQ, Kinesis, etc.
Spark Streaming
Run a streaming computation as a series of very small, deterministic batch jobs
- Chop up the live stream into batches of X seconds
- Spark treats each batch of data as an RDD and processes it using RDD operations
- Finally, the processed results of the RDD operations are returned in batches
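The micro-batch model described above can be sketched in plain Scala, without Spark, to show the idea of chopping a stream into fixed intervals and running the same deterministic job on each one. The `Event` type, the `MicroBatchSketch` object, and the word-count job are all hypothetical names introduced for illustration; an immutable `Vector` stands in for the RDD of each interval.

```scala
// Hypothetical event type: a payload tagged with its arrival time in milliseconds.
final case class Event(timestampMs: Long, payload: String)

object MicroBatchSketch {
  // Chop a sequence of events into consecutive batches of `batchSeconds` seconds.
  // Each batch is an immutable Vector, standing in for the RDD of that interval.
  def toBatches(events: Seq[Event], batchSeconds: Int): Seq[(Long, Vector[Event])] = {
    val batchMs = batchSeconds * 1000L
    events
      .groupBy(e => e.timestampMs / batchMs) // batch index for each event
      .toSeq
      .sortBy { case (index, _) => index }
      .map { case (index, batch) => (index, batch.sortBy(_.timestampMs).toVector) }
  }

  // A deterministic per-batch job: count the words in one batch.
  def wordCounts(batch: Vector[Event]): Map[String, Int] =
    batch
      .flatMap(_.payload.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event(100L, "mesos docker"),
      Event(900L, "spark mesos"),
      Event(1200L, "spark streaming")
    )
    // One result per batch, returned in batch order.
    toBatches(events, batchSeconds = 1).foreach { case (index, batch) =>
      println(s"batch $index -> ${wordCounts(batch)}")
    }
  }
}
```

Because each batch job is a pure function of its input, a failed batch can simply be recomputed. Unlike Spark Streaming, this sketch omits intervals that contain no events.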
Point of Failure
Simple Streaming Pipeline
Spark Streaming over a HA Mesos Cluster
● To use Mesos from Spark, you need a Spark binary package available in a place accessible (http/s3/hdfs) by Mesos, and a Spark driver program configured to connect to Mesos.
● Configuring the driver program to connect to Mesos:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sconf = new SparkConf()
      // The ZooKeeper quorum lets the driver locate the current leading Mesos master
      .setMaster("mesos://zk://10.121.93.241:2181,10.181.2.12:2181,10.107.48.112:2181/mesos")
      .setAppName("MyStreamingApp")
      // Spark binary package that Mesos fetches in order to run executors
      .set("spark.executor.uri", "hdfs://Sigmoid/executors/spark-1.3.0-bin-hadoop2.4.tgz")
      .set("spark.mesos.coarse", "true")
      .set("spark.cores.max", "30")
      .set("spark.executor.memory", "10g")

    val sc = new SparkContext(sconf)
    val ssc = new StreamingContext(sc, Seconds(1)) // 1-second batches
    ...
Spark Streaming Fault-tolerance
Real-time stream processing systems must be operational 24/7, which requires them to recover from all kinds of failures in the system.
● Spark and its RDD abstraction are designed to seamlessly handle failures of any worker nodes in the cluster.
● In Streaming, driver failure can be recovered from by checkpointing application state.
● Write Ahead Logs (WAL) & acknowledgements can ensure zero data loss.
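Checkpoint-based driver recovery and the receiver write-ahead log can be wired up roughly as follows. This is a sketch under stated assumptions rather than the speaker's code: the checkpoint path, the application name, and the socket source on `localhost:9999` are hypothetical placeholders. The master URL is expected to come from `spark-submit`.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RecoverableStreamingApp {
  // Hypothetical checkpoint location; it should live on fault-tolerant storage such as HDFS.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("RecoverableStreamingApp")
      // Save received data to a write-ahead log in the checkpoint directory.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

  // Called only when no checkpoint exists yet; otherwise the context is rebuilt from it.
  def createContext(): StreamingContext = {
    val ssc = new StreamingContext(buildConf(), Seconds(1))
    ssc.checkpoint(checkpointDir)
    // Hypothetical source and job: count the lines arriving on a local socket.
    ssc.socketTextStream("localhost", 9999).count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // After a driver restart, getOrCreate restores the context from the checkpoint.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The socket source is only a placeholder. The zero-data-loss guarantee additionally depends on a reliable receiver that acknowledges data only after it has been stored, as with a replayable source such as Kafka.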
Simple Fault-tolerant Streaming Infra
Creating a scalable pipeline
● Figure out the bottleneck: CPU, memory, IO, network
● If parsing is involved, use the parser that gives high performance.
● Proper data modeling
● Compression, serialization
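The slide names compression and serialization without saying which settings were used. One common option in Spark, assumed here purely for illustration, is to switch to Kryo serialization and compress serialized RDD partitions. The object name `TunedConf` and the application name are hypothetical.

```scala
import org.apache.spark.SparkConf

object TunedConf {
  def build(): SparkConf =
    new SparkConf()
      .setAppName("TunedPipeline")
      // Assumption: Kryo instead of the default Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Assumption: compress serialized RDD partitions, trading CPU for memory.
      .set("spark.rdd.compress", "true")
}
```

Whether these settings help depends on where the bottleneck lies, so they are best evaluated by measurement on the actual workload.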
Thank You
@rahul_kumar_aws
LinuxCon / ContainerCon - Berlin, Germany