Distributed Systems from Scratch - Part 1
Motivation and Introduction to Apache Mesos
https://github.com/phatak-dev/distributedsystems
● Madhukara Phatak
● Big data consultant and trainer at datamantra.io
● Consults on Hadoop, Spark and Scala
● www.madhukaraphatak.com
Agenda
● Idea
● Motivation
● Architecture of existing big data systems
● What we want to build
● Introduction to Apache Mesos
● Distributed Shell
● Function API
● Custom executor
Idea
“What does it take to build a distributed processing system like Spark?”
Motivation
● The first version of Spark had only 1600 lines of Scala code
● It had all the basic pieces of RDD and the ability to run a distributed system using Mesos
● Recreating the same code with a step by step understanding
● Ample time in hand
Distributed systems from 30,000 ft (bottom layer to top)
● Distributed Storage (HDFS/S3)
● Distributed Cluster Management (YARN/Mesos)
● Distributed Processing Systems (Spark/MapReduce)
● Data Applications
Standardization of frameworks
● Building a distributed processing system is like building a web framework
● We already have excellent underlying frameworks: YARN and Mesos for cluster management, and HDFS for distributed storage
● We can build on these frameworks rather than trying to do everything from scratch
● Most third generation systems like Spark and Flink do the same
Conventional wisdom
● To build a distributed system, you need to read complex papers
● Understand the details of how distribution is done using different protocols
● Need to care about the complexities of concurrency, locking etc.
● Need to do everything from scratch
Modern wisdom
● Read Spark code to understand how to build a distributed processing system
● Use Apache Mesos or YARN to handle tedious cluster resource management
● Use Akka for distributed concurrency
● Use excellent proven frameworks rather than inventing your own
Why this talk in a Spark meetup?
[Diagram: the Spark stack - API's on top of the data abstractions (RDD/Dataframe), on top of the Spark runtime, on top of YARN/Mesos, down to applications - mapped to our earlier meetup sessions: introduction sessions for the API's, anatomy sessions for the abstractions and runtime, the Spark on YARN session for cluster management, and experience-sharing sessions for applications. So far we have moved top down.]
Top down approach
● We started by discussing the Spark API's in introductory sessions like Spark batch and Spark streaming
● Once we understood the basic API's, we discussed the different abstraction layers like RDD and Dataframe in our anatomy sessions
● We have also talked about the Spark runtime, like data sources, in one of our anatomy sessions
● Last meetup we discussed cluster management in the Spark on YARN session
Bottom up approach
● Start at the cluster management layer using Mesos and YARN
● Build
  ○ Runtime
  ○ Abstractions
  ○ API's
● Build applications using our own abstractions and runtime
● Use all we learnt in our top down approach
Design
● Heavily influenced by the way Apache Spark is built
● A lot of the code and design comes from the Spark codebase
● No dependency on Spark itself
● Only implements very basic distributed processing pieces
● Make it work on Apache Mesos and Apache YARN
● Process oriented, not data oriented
Spark at its birth - 2010
● Only 1600 lines of Scala code
● Used Apache Mesos for cluster management
● Used the Mesos messaging API for concurrency management (no Akka)
● Used Scala functions as the processing abstraction rather than a DAG
● No optimizations
Steps to get there
● Learn Apache Mesos
● Implement a simple hello world on Mesos
● Implement a simple function oriented API on Mesos
● Support third party libraries
● Support shuffle
● Support aggregations and counters
● Implement similar functionality on YARN
Apache Mesos
● Apache Mesos is an open source cluster manager
● It "provides efficient resource isolation and sharing across distributed applications, or frameworks"
● Built at UC Berkeley
● YARN's ideas are inspired by Mesos
● Written in C++
● Uses Linux cgroups (the same mechanism Docker uses) for resource isolation
Why Mesos?
● Abstracts resource management away from the processing application
● Handles cluster setup and management
● With the help of ZooKeeper, can provide master fault tolerance
● Modular and simple API
● Supports different distributed processing systems on the same cluster
● Provides API's in multiple languages like C++ and Java
Architecture of Mesos
[Diagram: a single Mesos master coordinating several Mesos slaves. Frameworks such as Hadoop, Spark, or a custom framework register their schedulers (Hadoop scheduler, Spark scheduler) with the master, while their executors (Hadoop executor, Spark executor, custom executor) run tasks on the slaves.]
Architecture of Mesos
● Mesos master - the single master node of the Mesos cluster; the entry point for any Mesos application
● Mesos slaves - each machine in the cluster runs a Mesos slave, which is responsible for running tasks
● Framework - a distributed application built using the Apache Mesos API
  ○ Scheduler - the entry point to the framework; responsible for launching tasks
  ○ Executor - runs the actual tasks on the Mesos slaves
Starting Mesos
● Starting the master
  bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/mesos
● Starting a slave
  bin/mesos-slave.sh --master=127.0.0.1:5050
● Accessing the UI
  http://127.0.0.1:5050
● Setup guide: http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
Hello world on Mesos
● Run a simple shell command on each Mesos slave
● We create our own framework which is capable of running shell commands
● Our framework needs the following three components
  ○ Client
  ○ Scheduler
  ○ Executor
Client
● Code that submits the tasks to the framework
● A task is an abstraction used by Mesos to indicate any piece of work which takes some resources
● It's similar to the driver program in Spark
● It creates an instance of the framework and submits it to the Mesos driver
● Mesos uses protocol buffers for serialization
● Example code: DistributedShell.scala (a sketch follows)
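A minimal sketch of such a client against the Mesos Java API, assuming the ShellScheduler class sketched later under "Scala Scheduler example"; the actual DistributedShell.scala in the repository may differ:

  import org.apache.mesos.{MesosSchedulerDriver, Protos}

  object DistributedShell {
    def main(args: Array[String]): Unit = {
      val command = args(0) // shell command to run, e.g. "/bin/echo hello"
      // FrameworkInfo is a protocol buffer describing our framework to Mesos
      val framework = Protos.FrameworkInfo.newBuilder()
        .setName("DistributedShell")
        .setUser("") // empty user: Mesos fills in the current user
        .build()
      // The driver connects our scheduler to the Mesos master
      val driver = new MesosSchedulerDriver(
        new ShellScheduler(command), framework, "127.0.0.1:5050")
      driver.run() // blocks until the framework stops
    }
  }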
Scheduler
● Every framework in Apache Mesos should extend the Scheduler interface
● The scheduler is the entry point for our custom framework
● It's similar to the SparkContext
● We need to override
  ○ resourceOffers
● It acts like the Application Master in YARN
Offers
● Every resource in Mesos is presented to frameworks as an offer
● Whenever a resource (disk, memory or CPU) is free, Mesos offers it to all the frameworks running on it
● A framework can accept the offer and use it for running its own tasks
● Once execution is done, it can release the resource so that Mesos can offer it to other frameworks
● Quite different from the YARN model
Executor
● Once a framework accepts an offer, it has to specify the executor which actually runs a piece of code on the worker nodes
● The executor sets up the environment to run each task given by the client
● The scheduler uses this executor to run each task
● In our distributed shell example, we use the default executor provided by Mesos
Task
● A task is an abstraction used by Mesos to indicate any piece of work which takes some resources
● It's the basic unit of computation on Mesos
● It has
  ○ Id
  ○ Offer (resources)
  ○ Executor
  ○ Slave Id - the machine on which it has to run
Scala Scheduler example
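The transcript omits the code on this slide. Below is a minimal sketch of a shell-command scheduler against the Mesos Java API; the class name ShellScheduler and the fixed 1-CPU resource request are assumptions, see the repository for the actual code:

  import java.util.{List => JList}
  import scala.collection.JavaConverters._
  import org.apache.mesos.{Scheduler, SchedulerDriver}
  import org.apache.mesos.Protos._

  class ShellScheduler(command: String) extends Scheduler {
    // Called whenever the master offers us resources; we run one task per offer
    override def resourceOffers(driver: SchedulerDriver, offers: JList[Offer]): Unit = {
      for (offer <- offers.asScala) {
        val task = TaskInfo.newBuilder()
          .setName("shell-task")
          .setTaskId(TaskID.newBuilder().setValue("task-" + offer.getId.getValue))
          .setSlaveId(offer.getSlaveId) // run on the slave that made the offer
          .setCommand(CommandInfo.newBuilder().setValue(command)) // default command executor
          .addResources(Resource.newBuilder()
            .setName("cpus")
            .setType(Value.Type.SCALAR)
            .setScalar(Value.Scalar.newBuilder().setValue(1)))
          .build()
        driver.launchTasks(List(offer.getId).asJava, List(task).asJava)
      }
    }
    // The remaining Scheduler callbacks are no-ops in this sketch
    override def registered(d: SchedulerDriver, id: FrameworkID, m: MasterInfo): Unit = ()
    override def reregistered(d: SchedulerDriver, m: MasterInfo): Unit = ()
    override def offerRescinded(d: SchedulerDriver, id: OfferID): Unit = ()
    override def statusUpdate(d: SchedulerDriver, status: TaskStatus): Unit = ()
    override def frameworkMessage(d: SchedulerDriver, e: ExecutorID, s: SlaveID, b: Array[Byte]): Unit = ()
    override def disconnected(d: SchedulerDriver): Unit = ()
    override def slaveLost(d: SchedulerDriver, id: SlaveID): Unit = ()
    override def executorLost(d: SchedulerDriver, e: ExecutorID, s: SlaveID, status: Int): Unit = ()
    override def error(d: SchedulerDriver, message: String): Unit = ()
  }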
Running hello world
● java -cp target/scala-2.11/distrubutedsystemfromscratch_2.11-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.helloworld.DistributedShell "/bin/echo hello"
● Mesos needs its native library (the *.so files) on java.library.path to connect to the Mesos cluster
● Once execution is done, we can look at all the tasks run for a given framework in the Mesos UI
● Let's look at the ones for our distributed shell application
Custom executor
● In the last example, we ran shell commands
● What if we want to run some custom Java/Scala code?
● We need to define our own executor which sets up the environment to run the code, rather than using the built-in command executor
● Executors are how Mesos supports frameworks in different languages on the same cluster
Defining function task API
● We are going to define a task abstraction which wraps a simple Scala function
● This allows us to run any pure Scala function on a large cluster
● This is how Spark started to support distributed processing for its RDD in the initial implementation
● This task will extend Serializable, which allows us to serialize the function over the network
● Example: Task.scala (a sketch follows)
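A minimal sketch of such a task abstraction; the single run method and the name FunctionTask are assumptions, the repository's Task.scala may look different:

  // Wraps a plain Scala function so it can be shipped across the network.
  // Extending Serializable lets Java serialization carry the closure to a slave.
  class FunctionTask[T](body: () => T) extends Serializable {
    def run(): T = body()
  }

For example, new FunctionTask(() => (1 to 10).sum) is a task that any slave can deserialize and run.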
Task scheduler
● Similar to the earlier scheduler, but uses the custom executor rather than the default one
● Creates the TaskInfo object which contains
  ○ Offer
  ○ Executor
  ○ The serialized function as data
● getExecutorInfo uses a custom script to launch our own TaskExecutor
● TaskScheduler.scala (a sketch of the key pieces follows)
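A sketch of the interesting pieces, assuming the FunctionTask above and plain Java serialization; the executor id, script handling and resource sizes are illustrative, see TaskScheduler.scala for the real code:

  import com.google.protobuf.ByteString
  import org.apache.mesos.Protos._

  object TaskSchedulerSketch {
    // Serialize the function task so it can ride along inside TaskInfo.data
    def serialize(task: FunctionTask[_]): Array[Byte] = {
      val bytes = new java.io.ByteArrayOutputStream()
      val out = new java.io.ObjectOutputStream(bytes)
      out.writeObject(task)
      out.close()
      bytes.toByteArray
    }

    // Points Mesos at the script that launches our custom TaskExecutor
    def getExecutorInfo(executorScript: String): ExecutorInfo =
      ExecutorInfo.newBuilder()
        .setExecutorId(ExecutorID.newBuilder().setValue("task-executor"))
        .setCommand(CommandInfo.newBuilder().setValue(executorScript))
        .build()

    // Called from resourceOffers: bundle offer, executor and serialized function
    def buildTaskInfo(offer: Offer, executor: ExecutorInfo,
                      task: FunctionTask[_], id: Int): TaskInfo =
      TaskInfo.newBuilder()
        .setName("function-task-" + id)
        .setTaskId(TaskID.newBuilder().setValue(id.toString))
        .setSlaveId(offer.getSlaveId)
        .setExecutor(executor)
        .setData(ByteString.copyFrom(serialize(task))) // the function itself
        .addResources(Resource.newBuilder()
          .setName("cpus")
          .setType(Value.Type.SCALAR)
          .setScalar(Value.Scalar.newBuilder().setValue(1)))
        .build()
  }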
Task executor
● The task executor is our custom executor which is capable of running our function tasks
● It creates an instance of a Mesos executor and overrides launchTask
● It deserializes the task from the TaskInfo object which was sent by the task scheduler
● Once it deserializes the object, it runs the function on that machine
● Example: TaskExecutor.scala (a sketch follows)
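A minimal sketch of such an executor, assuming the FunctionTask sketched earlier; the status updates and threading are simplified compared to what TaskExecutor.scala may actually do:

  import org.apache.mesos.{Executor, ExecutorDriver, MesosExecutorDriver}
  import org.apache.mesos.Protos._

  class TaskExecutor extends Executor {
    // Deserialize the shipped function from the task data and run it here
    override def launchTask(driver: ExecutorDriver, taskInfo: TaskInfo): Unit = {
      new Thread(new Runnable {
        override def run(): Unit = {
          driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(taskInfo.getTaskId).setState(TaskState.TASK_RUNNING).build())
          val in = new java.io.ObjectInputStream(
            new java.io.ByteArrayInputStream(taskInfo.getData.toByteArray))
          val task = in.readObject().asInstanceOf[FunctionTask[_]]
          task.run() // execute the user's function on this slave
          driver.sendStatusUpdate(TaskStatus.newBuilder()
            .setTaskId(taskInfo.getTaskId).setState(TaskState.TASK_FINISHED).build())
        }
      }).start()
    }
    // The remaining Executor callbacks are no-ops in this sketch
    override def registered(d: ExecutorDriver, e: ExecutorInfo, f: FrameworkInfo, s: SlaveInfo): Unit = ()
    override def reregistered(d: ExecutorDriver, s: SlaveInfo): Unit = ()
    override def disconnected(d: ExecutorDriver): Unit = ()
    override def killTask(d: ExecutorDriver, t: TaskID): Unit = ()
    override def frameworkMessage(d: ExecutorDriver, data: Array[Byte]): Unit = ()
    override def shutdown(d: ExecutorDriver): Unit = ()
    override def error(d: ExecutorDriver, message: String): Unit = ()
  }

  object TaskExecutor {
    // run-executor.sh ultimately invokes this main, which hands control to Mesos
    def main(args: Array[String]): Unit =
      new MesosExecutorDriver(new TaskExecutor).run()
  }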
CustomTasks
● Once we have everything in place, we can run any Scala function in a distributed manner
● We can create different kinds of Scala functions and wrap them inside our function task abstraction
● In our client, we create multiple tasks and submit them to the task scheduler
● Observe that the API also supports closures
● Example: CustomTasks.scala (a sketch follows)
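A hedged sketch of what the client side might look like, assuming the FunctionTask above and a hypothetical submitTasks helper on our scheduler (the actual wiring in CustomTasks.scala may differ):

  // A plain function and a closure that captures a local variable
  val hello = new FunctionTask(() =>
    println("hello from " + java.net.InetAddress.getLocalHost.getHostName))
  val factor = 10 // captured by the closure and serialized along with it
  val sums = new FunctionTask(() => println((1 to 5).map(_ * factor).sum))
  // submitTasks is hypothetical: it queues tasks for the next resource offers
  scheduler.submitTasks(Seq(hello, sums))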
Running custom executor
● java -cp target/scala-2.11/DistrubutedSystemFromSatch-assembly-1.0.jar -Djava.library.path=$MESOS_HOME/src/.libs com.madhukaraphatak.mesos.customexecutor.CustomTasks localhost:5050 /home/madhu/Dev/mybuild/DistrubutedSystemFromScratch/src/main/resources/run-executor.sh
● We pass the script which sets up the environment to launch our custom executor
● In our example, we are using the local file system. You can use HDFS for the same purpose
References
● http://blog.madhukaraphatak.com/mesos-single-node-setup-ubuntu/
● http://blog.madhukaraphatak.com/mesos-helloworld-scala/
● http://blog.madhukaraphatak.com/custom-mesos-executor-scala/