mesos: the operating system for your datacenter
DESCRIPTION
Maybe you’ve heard of Mesos, that thing that you can run Hadoop on. I think it powers Twitter? Isn’t it an Apache project, or something? In this talk, we’ll learn all about Mesos: what it is, how you can leverage it to simplify your infrastructure and reduce AWS/cloud computing costs, and why you should develop your next application on top of it. This talk will give you the tools you need to understand whether Mesos is the right fit for your infrastructure, and several starting points for learning more about Mesos.
TRANSCRIPT
Mesos: The Datacenter Operating System
David Greenberg Two Sigma
Who am I?
• Architected project to build a massive Mesos cluster
• Building custom framework and leveraging open source
The Plan
What is Mesos?
How can I use Mesos?
How can I build on Mesos?
What is Mesos?
A long time ago…
Are you done with the
machine? I need to load my cards.
Lol no; maybe tomorrow.
1957
Oh man! Let’s all share the
computer, AT THE SAME TIME!
John McCarthy Popularized Timesharing
A long time ago…
Are you done with the Hadoop cluster? I need to run my analytics job.
Lol no; maybe tomorrow.
2010
Oh man! Let’s all share the cluster,
AT THE SAME TIME!
Ben Hindman Popularized Mesos
Good ideas today mirror good ideas of yesteryear
Mesos: an Operating System
Isolation
Resource Sharing
Common Infrastructure
• read(), write(), open() • bind(), connect() • apt-get, yum
• launchTask(), killTask(), statusUpdate()
• Docker
Distributed System* Anatomy
Workers
Coordinator
* Excluding peer-to-peer systems
Static Partitioning
Coordinator (Hadoop) Coordinator (Storm)
Mesos (slaves)
Mesos: a Level of Indirection
Coordinator
Mesos (master)
Coordinator
Mesos (slaves)
Mesos: a Level of Indirection
Coordinator
Mesos (master)
Coordinator
Coordinating Execution
≈ Scheduling
Mesos (slaves)
Coordinator
Mesos (master)
s/Coordinator/Scheduler/
Mesos (slaves)
Scheduler
Mesos (master)
s/Coordinator/Scheduler/
Mesos (slaves)
JobTracker (Scheduler)
Mesos (master)
Apache Hadoop
Distributed System
≈ (Mesos) framework
a Mesos framework is a distributed system that has a coordinator
a Mesos framework is a distributed system that has a scheduler
a Mesos framework is an app for your cluster
How can I use Mesos?
Tons of Flexibility!
Jenkins
• Continuous build server
• Just install a plugin!
Hadoop
• Multi-cluster isolation • Fast startup
• Just run the repacked Cloudera CDH 4.2.1 MR1 distribution for Mesos
Marathon
• PaaS on Mesos • init.d for the cluster • Docker support • Scales at the click of a button
• Manages edge routers - HAProxy
Chronos
• Distributed cron • Supports job dependencies
• REST API
Aurora
• Advanced PaaS on Mesos • Powers Twitter • Supports phased rollouts • Supports complex deployments
Spark
• In-memory MapReduce, built for “Medium Data”
• Supports SQL as well as Java, Python, and Scala
• Designed for interactive analysis via REPL
How do I use these?
• Free online interactive tutorials! – http://mesosphere.io/learn
• Covers all of the previously mentioned and many more
How can I build on Mesos?
Cluster Manager Status Quo
Cluster Manager
Application/Human
Specification
The specification includes as much information as possible to assist the cluster manager in scheduling and execution
Cluster Manager Status Quo
Cluster Manager
Application/Human
Wait for task to be executed
Cluster Manager Status Quo
Cluster Manager
Application/Human
Result
Problems with Specifications
① Hard to specify certain desires or constraints
② Hard to update specifications dynamically as tasks execute and finish/fail
An Alternative Model
Mesos
Scheduler
request 3 CPUs 2 GB RAM
• A request is a purposely simplified subset of a specification
• It is just the required resources at that point in time
What should you do if you can’t satisfy a request?
① Wait until you can …
② Offer best you can immediately
Mesos Model
Mesos
Scheduler
offer hostname 4 CPUs 4 GB RAM
• Resources are allocated via resource offers
• A resource offer represents a snapshot of available resources that a scheduler can use to run tasks
An Analogue: non-blocking sockets
Kernel
Application
write(s, buffer, size);
An Analogue: non-blocking sockets
Kernel
Application
42 of 100 bytes written!
offer hostname 4 CPUs 4 GB RAM
offer hostname 4 CPUs 4 GB RAM
offer hostname 4 CPUs 4 GB RAM
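The socket analogy can be tried directly. The following is a minimal sketch in Python, assuming a local `socket.socketpair()` stands in for a real network connection; `try_send` and `demo` are hypothetical helper names introduced only for illustration. A non-blocking `send` reports how many bytes the kernel accepted right now, much as a resource offer reports what the cluster can provide right now.

```python
import socket
from typing import Tuple


def try_send(sock: socket.socket, data: bytes) -> int:
    """Attempt one non-blocking send; return how many bytes the kernel accepted."""
    try:
        return sock.send(data)
    except BlockingIOError:
        # The kernel buffer is full: nothing could be accepted right now.
        return 0


def demo() -> Tuple[int, int]:
    # A connected pair of local sockets (assumption: stands in for a real peer).
    writer, reader = socket.socketpair()
    try:
        writer.setblocking(False)
        payload = b"x" * (8 * 1024 * 1024)  # deliberately larger than typical buffers
        accepted = try_send(writer, payload)
        return accepted, len(payload)
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    accepted, total = demo()
    print(f"{accepted} of {total} bytes written")
```

The caller, not the kernel, decides what to do with the remainder: retry later, send something smaller, or give up. The exact byte count depends on the platform's socket buffer sizes.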
Mesos Model
Mesos
Scheduler
offer hostname 4 CPUs 4 GB RAM
Scheduler uses the offers to decide what tasks to run
Mesos Model
Mesos
Scheduler
Scheduler uses the offers to decide what tasks to run “Two-level scheduling”
task 3 CPUs 2 GB RAM
Two-level Scheduling
• Mesos: controls resource allocations to schedulers
• Schedulers: make decisions about what tasks to run given allocated resources
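The two levels above can be sketched in plain Python. This is a toy model, not the Mesos API: `Offer`, `Task`, `FirstFitScheduler`, `on_offer`, and `allocate` are hypothetical names, and handing offers out round-robin is a stand-in assumption, since the slides do not describe the allocation policy Mesos applies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Offer:
    """Snapshot of resources available on one host (hypothetical type)."""
    hostname: str
    cpus: float
    mem_gb: float


@dataclass(frozen=True)
class Task:
    """Resources one task needs (hypothetical type)."""
    name: str
    cpus: float
    mem_gb: float


class FirstFitScheduler:
    """Level two: decides which pending tasks to run on an offered host."""

    def __init__(self, pending: List[Task]) -> None:
        self.pending = list(pending)

    def on_offer(self, offer: Offer) -> List[Task]:
        cpus, mem = offer.cpus, offer.mem_gb
        launched: List[Task] = []
        still_pending: List[Task] = []
        for task in self.pending:
            if task.cpus <= cpus and task.mem_gb <= mem:
                launched.append(task)
                cpus -= task.cpus
                mem -= task.mem_gb
            else:
                still_pending.append(task)
        self.pending = still_pending
        return launched


def allocate(offers: List[Offer],
             schedulers: List[FirstFitScheduler]) -> Dict[str, List[Task]]:
    """Level one: decide which scheduler sees which offer (round-robin stand-in)."""
    placements: Dict[str, List[Task]] = {}
    for i, offer in enumerate(offers):
        scheduler = schedulers[i % len(schedulers)]
        placements[offer.hostname] = scheduler.on_offer(offer)
    return placements


if __name__ == "__main__":
    scheduler = FirstFitScheduler([Task("a", 3, 2), Task("b", 2, 1), Task("c", 1, 1)])
    result = allocate([Offer("host1", 4, 4)], [scheduler])
    print(result, scheduler.pending)
```

The allocator never inspects tasks, and the scheduler never sees resources it was not offered. That separation is what lets each framework keep its own placement logic.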
Two-level Scheduling Elsewhere
• Mesos is influenced by operating-system-supported user-space scheduling – e.g. green threads, goroutines
• Mesos is designed less like a “cluster manager” and more like an operating system (or kernel)
Language Bindings
Should I build it on Mesos?
• Theme of MesosCon: it’s easy to build frameworks
• Open source and proprietary frameworks are being created all the time – Two Sigma – Netflix – Twitter – Hubspot
But should I really build it on Mesos?
• Most users just use Marathon, Hadoop, Spark, and Chronos
• Why did we build our own? – Exotic workload
The Plan, redux
What is Mesos?
How can I use Mesos?
How can I build on Mesos?
Questions?
Thank you