Distributed Systems from Scratch - Part 1
Motivation and Introduction to Apache Mesos
https://github.com/phatak-dev/distributedsystems
● Madhukara Phatak
● Big data consultant and trainer at datamantra.io
● Consults on Hadoop, Spark and Scala
● www.madhukaraphatak.com
Agenda
● Idea
● Motivation
● Architecture of existing big data systems
● What we want to build
● Introduction to Apache Mesos
● Distributed Shell
● Function API
● Custom executor
Idea
“What does it take to build a distributed processing system like Spark?”
Motivation
● The first version of Spark had only 1600 lines of Scala code
● It had all the basic pieces of RDD and the ability to run a distributed system using Mesos
● Recreating the same code with a step by step understanding
● Ample time in hand
Distributed systems from 30,000 ft (bottom layer to top)
● Distributed Storage (HDFS/S3)
● Distributed Cluster Management (YARN/Mesos)
● Distributed Processing Systems (Spark/MapReduce)
● Data Applications
Standardization of frameworks
● Building a distributed processing system is like building a web framework
● We already have excellent underlying frameworks: YARN and Mesos for cluster management, and HDFS for distributed storage
● We can build on these frameworks rather than trying to do everything from scratch
● Most third generation systems like Spark and Flink do the same
Conventional wisdom
● To build a distributed system, you need to read complex papers
● Understand the details of how distribution is done using different protocols
● Need to care about the complexities of concurrency, locking etc.
● Need to do everything from scratch
Modern wisdom
● Read Spark code to understand how to build a distributed processing system
● Use Apache Mesos or YARN to handle tedious cluster resource management
● Use Akka for distributed concurrency
● Use excellent proven frameworks rather than inventing your own
Why this talk in a Spark meetup?
[Diagram: the Spark stack - API's on top of the data abstractions (RDD/Dataframe), on top of the Spark runtime, on top of YARN/Mesos, down to applications - mapped to our earlier meetup sessions: introduction sessions for the API's, anatomy sessions for the abstractions and runtime, the Spark on YARN session for cluster management, and experience-sharing sessions for applications. So far we have moved top down.]
Top down approach
● We started by discussing the Spark API's in introductory sessions like Spark batch and Spark streaming
● Once we understood the basic API's, we discussed the different abstraction layers like RDD and Dataframe in our anatomy sessions
● We have also talked about the Spark runtime, like data sources, in one of our anatomy sessions
● Last meetup we discussed cluster management in the Spark on YARN session
Bottom up approach
● Start at the cluster management layer using Mesos and YARN
● Build
  ○ Runtime
  ○ Abstractions
  ○ API's
● Build applications using our own abstractions and runtime
● Use all we learnt in our top down approach
Design
● Heavily influenced by the way Apache Spark is built
● A lot of the code and design comes from the Spark codebase
● No dependency on Spark itself
● Only implements very basic distributed processing pieces
● Make it work on Apache Mesos and Apache YARN
● Process oriented, not data oriented
Spark at its birth - 2010
● Only 1600 lines of Scala code
● Used Apache Mesos for cluster management
● Used the Mesos messaging API for concurrency management (no Akka)
● Used Scala functions as the processing abstraction rather than a DAG
● No optimizations
Steps to get there
● Learn Apache Mesos
● Implement a simple hello world on Mesos
● Implement a simple function oriented API on Mesos
● Support third party libraries
● Support shuffle
● Support aggregations and counters
● Implement similar functionality on YARN
Apache Mesos
● Apache Mesos is an open source cluster manager
● It "provides efficient resource isolation and sharing across distributed applications, or frameworks"
● Built at UC Berkeley
● YARN's ideas are inspired by Mesos
● Written in C++
● Uses Linux cgroups (the same mechanism Docker uses) for resource isolation
Why Mesos?
● Abstracts resource management away from the processing application
● Handles cluster setup and management
● With the help of ZooKeeper, can provide master fault tolerance
● Modular and simple API
● Supports different distributed processing systems on the same cluster
● Provides API's in multiple languages like C++ and Java
Architecture of Mesos
[Diagram: a single Mesos master coordinating several Mesos slaves. Frameworks such as Hadoop, Spark, or a custom framework register their schedulers (Hadoop scheduler, Spark scheduler) with the master, while their executors (Hadoop executor, Spark executor, custom executor) run tasks on the slaves.]
Architecture of Mesos
● Mesos master - the single master node of the Mesos cluster; the entry point for any Mesos application
● Mesos slaves - each machine in the cluster runs a Mesos slave, which is responsible for running tasks
● Framework - a distributed application built using the Apache Mesos API
  ○ Scheduler - the entry point to the framework; responsible for launching tasks
  ○ Executor - runs the actual tasks on the Mesos slaves
Starting Mesos
● Starting the master
  bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos
● Starting a slave
  bin/mesos-slave.sh --master=127.0.0.1:5050
● Accessing the UI
  http://127.0.0.1:5050
● Setup guide: http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
Hello world on Mesos
● Run a simple shell command on each Mesos slave
● We create our own framework which is capable of running shell commands
● Our framework needs the following three components
  ○ Client
  ○ Scheduler
  ○ Executor
Client
● Code that submits the tasks to the framework
● A task is an abstraction used by Mesos to indicate any piece of work which takes some resources
● It's similar to the driver program in Spark
● It creates an instance of the framework and submits it to the Mesos driver
● Mesos uses protocol buffers for serialization
● Example code: DistributedShell.scala (a sketch follows)
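A minimal sketch of such a client against the Mesos Java API, assuming the ShellScheduler class sketched later under "Scala Scheduler example"; the actual DistributedShell.scala in the repository may differ:

  import org.apache.mesos.{MesosSchedulerDriver, Protos}

  object DistributedShell {
    def main(args: Array[String]): Unit = {
      val command = args(0) // shell command to run, e.g. "/bin/echo hello"
      // FrameworkInfo is a protocol buffer describing our framework to Mesos
      val framework = Protos.FrameworkInfo.newBuilder()
        .setName("DistributedShell")
        .setUser("") // empty user: Mesos fills in the current user
        .build()
      // The driver connects our scheduler to the Mesos master
      val driver = new MesosSchedulerDriver(
        new ShellScheduler(command), framework, "127.0.0.1:5050")
      driver.run() // blocks until the framework stops
    }
  }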
Scheduler
● Every framework in Apache Mesos should extend the Scheduler interface
● The scheduler is the entry point for our custom framework
● It's similar to the SparkContext
● We need to override
  ○ resourceOffers
● It acts like the Application Master in YARN
Offers
● Every resource in Mesos is presented to frameworks as an offer
● Whenever a resource (disk, memory or CPU) is free, Mesos offers it to all the frameworks running on it
● A framework can accept the offer and use it for running its own tasks
● Once execution is done, it can release the resource so that Mesos can offer it to other frameworks
● Quite different from the YARN model
Executor
● Once a framework accepts an offer, it has to specify the executor which actually runs a piece of code on the worker nodes
● The executor sets up the environment to run each task given by the client
● The scheduler uses this executor to run each task
● In our distributed shell example, we use the default executor provided by Mesos
Task
● A task is an abstraction used by Mesos to indicate any piece of work which takes some resources
● It's the basic unit of computation on Mesos
● It has
  ○ Id
  ○ Offer (resources)
  ○ Executor
  ○ Slave Id - the machine on which it has to run
Scala Scheduler example
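The transcript omits the code on this slide. Below is a minimal sketch of a shell-command scheduler against the Mesos Java API; the class name ShellScheduler and the fixed 1-CPU resource request are assumptions, see the repository for the actual code:

  import java.util.{List => JList}
  import scala.collection.JavaConverters._
  import org.apache.mesos.{Scheduler, SchedulerDriver}
  import org.apache.mesos.Protos._

  class ShellScheduler(command: String) extends Scheduler {
    // Called whenever the master offers us resources; we run one task per offer
    override def resourceOffers(driver: SchedulerDriver, offers: JList[Offer]): Unit = {
      for (offer <- offers.asScala) {
        val task = TaskInfo.newBuilder()
          .setName("shell-task")
          .setTaskId(TaskID.newBuilder().setValue("task-" + offer.getId.getValue))
          .setSlaveId(offer.getSlaveId) // run on the slave that made the offer
          .setCommand(CommandInfo.newBuilder().setValue(command)) // default command executor
          .addResources(Resource.newBuilder()
            .setName("cpus")
            .setType(Value.Type.SCALAR)
            .setScalar(Value.Scalar.newBuilder().setValue(1)))
          .build()
        driver.launchTasks(List(offer.getId).asJava, List(task).asJava)
      }
    }
    // The remaining Scheduler callbacks are no-ops in this sketch
    override def registered(d: SchedulerDriver, id: FrameworkID, m: MasterInfo): Unit = ()
    override def reregistered(d: SchedulerDriver, m: MasterInfo): Unit = ()
    override def offerRescinded(d: SchedulerDriver, id: OfferID): Unit = ()
    override def statusUpdate(d: SchedulerDriver, status: TaskStatus): Unit = ()
    override def frameworkMessage(d: SchedulerDriver, e: ExecutorID, s: SlaveID, b: Array[Byte]): Unit = ()
    override def disconnected(d: SchedulerDriver): Unit = ()
    override def slaveLost(d: SchedulerDriver, id: SlaveID): Unit = ()
    override def executorLost(d: SchedulerDriver, e: ExecutorID, s: SlaveID, status: Int): Unit = ()
    override def error(d: SchedulerDriver, message: String): Unit = ()
  }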
Running hello world
● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.helloworld.DistributedShell "/bin/echo hello"
● Mesos needs its native library (the *.so files) on java.library.path to connect to the Mesos cluster
● Once execution is done, we can look at all the tasks run for a given framework in the Mesos UI
● Let's look at the ones for our distributed shell application
Custom executor
● In the last example, we ran shell commands
● What if we want to run some custom Java/Scala code?
● We need to define our own executor which sets up the environment to run the code, rather than using the built-in command executor
● Executors are how Mesos supports frameworks in different languages on the same cluster
Defining function task API
● We are going to define a task abstraction which wraps a simple Scala function
● This allows us to run any pure Scala function on a large cluster
● This is how Spark started to support distributed processing for its RDD in the initial implementation
● This task will extend Serializable, which allows us to serialize the function over the network
● Example: Task.scala (a sketch follows)
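A minimal sketch of such a task abstraction; the single run method and the name FunctionTask are assumptions, the repository's Task.scala may look different:

  // Wraps a plain Scala function so it can be shipped across the network.
  // Extending Serializable lets Java serialization carry the closure to a slave.
  class FunctionTask[T](body: () => T) extends Serializable {
    def run(): T = body()
  }

For example, new FunctionTask(() => (1 to 10).sum) is a task that any slave can deserialize and run.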
Task scheduler
● Similar to the earlier scheduler, but uses the custom executor rather than the default one
● Creates the TaskInfo object which contains
  ○ Offer
  ○ Executor
  ○ The serialized function as data
● getExecutorInfo uses a custom script to launch our own TaskExecutor
● TaskScheduler.scala (a sketch of the key pieces follows)
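A sketch of the interesting pieces, assuming the FunctionTask above and plain Java serialization; the executor id, script handling and resource sizes are illustrative, see TaskScheduler.scala for the real code:

  import com.google.protobuf.ByteString
  import org.apache.mesos.Protos._

  object TaskSchedulerSketch {
    // Serialize the function task so it can ride along inside TaskInfo.data
    def serialize(task: FunctionTask[_]): Array[Byte] = {
      val bytes = new java.io.ByteArrayOutputStream()
      val out = new java.io.ObjectOutputStream(bytes)
      out.writeObject(task)
      out.close()
      bytes.toByteArray
    }

    // Points Mesos at the script that launches our custom TaskExecutor
    def getExecutorInfo(executorScript: String): ExecutorInfo =
      ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue("task-executor"))
        .setCommand(CommandInfo.newBuilder().setValue(executorScript))
        .build()

    // Called from resourceOffers: bundle offer, executor and serialized function
    def buildTaskInfo(offer: Offer, executor: ExecutorInfo,
                      task: FunctionTask[_], id: Int): TaskInfo =
      TaskInfo.newBuilder()
        .setName("function-task-" + id)
        .setTaskId(TaskID.newBuilder().setValue(id.toString))
        .setSlaveId(offer.getSlaveId)
        .setExecutor(executor)
        .setData(ByteString.copyFrom(serialize(task))) // the function itself
        .addResources(Resource.newBuilder()
          .setName("cpus")
          .setType(Value.Type.SCALAR)
          .setScalar(Value.Scalar.newBuilder().setValue(1)))
        .build()
  }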
Task executor
● The task executor is our custom executor which is capable of running our function tasks
● It creates an instance of a Mesos executor and overrides launchTask
● It deserializes the task from the TaskInfo object which was sent by the task scheduler
● Once it deserializes the object, it runs the function on that machine
● Example: TaskExecutor.scala (a sketch follows)
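A minimal sketch of such an executor, assuming the FunctionTask sketched earlier; the status updates and threading are simplified compared to what TaskExecutor.scala may actually do:

  import org.apache.mesos.{Executor, ExecutorDriver, MesosExecutorDriver}
  import org.apache.mesos.Protos._

  class TaskExecutor extends Executor {
    // Deserialize the shipped function from the task data and run it here
    override def launchTask(driver: ExecutorDriver, taskInfo: TaskInfo): Unit = {
      new Thread(new Runnable {
        override def run(): Unit = {
          driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(taskInfo.getTaskId).setState(TaskState.TASK_RUNNING).build())
          val in = new java.io.ObjectInputStream(
            new java.io.ByteArrayInputStream(taskInfo.getData.toByteArray))
          val task = in.readObject().asInstanceOf[FunctionTask[_]]
          task.run() // execute the user's function on this slave
          driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(taskInfo.getTaskId).setState(TaskState.TASK_FINISHED).build())
        }
      }).start()
    }
    // The remaining Executor callbacks are no-ops in this sketch
    override def registered(d: ExecutorDriver, e: ExecutorInfo, f: FrameworkInfo, s: SlaveInfo): Unit = ()
    override def reregistered(d: ExecutorDriver, s: SlaveInfo): Unit = ()
    override def disconnected(d: ExecutorDriver): Unit = ()
    override def killTask(d: ExecutorDriver, t: TaskID): Unit = ()
    override def frameworkMessage(d: ExecutorDriver, data: Array[Byte]): Unit = ()
    override def shutdown(d: ExecutorDriver): Unit = ()
    override def error(d: ExecutorDriver, message: String): Unit = ()
  }

  object TaskExecutor {
    // run-executor.sh ultimately invokes this main, which hands control to Mesos
    def main(args: Array[String]): Unit =
      new MesosExecutorDriver(new TaskExecutor).run()
  }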
CustomTasks
● Once we have everything in place, we can run any Scala function in a distributed manner
● We can create different kinds of Scala functions and wrap them inside our function task abstraction
● In our client, we create multiple tasks and submit them to the task scheduler
● Observe that the API also supports closures
● Example: CustomTasks.scala (a sketch follows)
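A hedged sketch of what the client side might look like, assuming the FunctionTask above and a hypothetical submitTasks helper on our scheduler (the actual wiring in CustomTasks.scala may differ):

  // A plain function and a closure that captures a local variable
  val hello = new FunctionTask(() =>
    println("hello from " + java.net.InetAddress.getLocalHost.getHostName))
  val factor = 10 // captured by the closure and serialized along with it
  val sums = new FunctionTask(() => println((1 to 5).map(_ * factor).sum))
  // submitTasks is hypothetical: it queues tasks for the next resource offers
  scheduler.submitTasks(Seq(hello, sums))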
Running custom executor
● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.customexecutor.CustomTasks localhost:5050 /home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resources/run-executor.sh
● We pass the script which sets up the environment to launch our custom executor
● In our example, we are using the local file system. You can use HDFS for the same purpose
References
● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
● http://blog.madhukaraphatak.com/mesos-helloworld-scala/
● http://blog.madhukaraphatak.com/custom-mesos-executor-scala/