Diego: Re-envisioning the Elastic Runtime (Cloud Foundry Summit 2014)
TRANSCRIPT
What is being rewritten?
Push App
> cf
http://…
Cloud Controller
Stage App
Run n App Instances (and keep them running)
Route to App
DEA Pool (Droplet Execution Agent)
What is being rewritten?
Push App
> cf
http://…
Cloud Controller (API)
Router
Health Manager
NATS (message bus)
DEA Pool (Droplet Execution Agent)
DEA: Staging Apps, Running Apps
Warden: Containerization
Why rewrite?
Tight Coupling
Poor separation of concerns
Orchestration
Cloud Controller
Router
Health Manager
NATS (message bus)
DEA Pool (Droplet Execution Agent)
DEA: Staging Apps, Running Apps
Warden: Containerization
Why rewrite?
Tight Coupling
Poor separation of concerns
Orchestration
Cloud Controller
> cf scale
“Make it so”
start/stop
6 × DEA (each with Warden)
start, start
start fails
Too much responsibility
Why rewrite?
Tight Coupling
Poor separation of concerns
Triangular Dependencies
Cloud Controller
6 × DEA (each with Warden)
Why rewrite?
Tight Coupling
Poor separation of concerns
Triangular Dependencies
Health Manager
Cloud Controller
4 × DEA (each with Warden)
When it’s time to upgrade the DEAs we perform a rolling deploy
“bye!”
start!
start!
all clear!
Problematic
??
??
Why rewrite?
Tight Coupling
Poor separation of concerns
Orchestration
Triangular Dependencies
complex interactions
hard to test
hard to reason through
Why rewrite?
Domain Specific (app, app, app, app)
Hard to extend to new domains (e.g. cron-like jobs)
Push App
> cf
http://…
Cloud Controller
Router
Health Manager
NATS (message bus)
DEA Pool (Droplet Execution Agent)
DEA: Staging Apps, Running Apps
Warden: Containerization
App (label repeated across the whole diagram)
Why rewrite?
Platform Specific
hard to maintain
DEA: Staging Apps, Running Apps
Warden: Containerization
Why rewrite?
Long-lived processes
Tons of concurrency
Low-level OS interactions
Why rewrite?
Platform Specific
Domain Specific (app, app, app, app)
Tight Coupling
Poor separation of concerns
Orchestration
Triangular Dependencies
Hard to add new features
Hard to maintain existing features
Show me Diego
Written in Golang
Strong concurrency support
Strongly typed
Explicit error handling
Promotes developer discipline
Strong low-level OS support
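The two Go traits named on this slide, concurrency support and explicit error handling, can be seen together in a small sketch. Everything here is hypothetical: `startInstance` and `StartAll` are illustrative names, not functions from Diego, and the failure on index 2 is hard-coded only to show an error being surfaced.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// startInstance is a hypothetical stand-in for starting one app instance.
// It fails for index 2 purely to demonstrate the error path.
func startInstance(index int) error {
	if index == 2 {
		return errors.New("no capacity")
	}
	return nil
}

// StartAll starts n instances concurrently and returns one error slot per
// instance. A nil entry means that instance started.
func StartAll(n int) []error {
	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no lock is needed.
			errs[i] = startInstance(i)
		}(i)
	}
	wg.Wait()
	return errs
}

func main() {
	for i, err := range StartAll(3) {
		if err != nil {
			fmt.Printf("instance %d: start failed: %v\n", i, err)
			continue
		}
		fmt.Printf("instance %d: started\n", i)
	}
}
```

Errors are ordinary return values here, so the caller has to decide what to do with each failed start rather than having it disappear inside a callback.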
Show me Diego
The Right(?) Abstraction
Domain Specific (app, app, app, app)
One-off Tasks (guaranteed to only run once)
Long Running Processes (n monitored instances)
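As a rough illustration of the two abstractions on this slide, the sketch below models a one-off Task and a Long Running Process (LRP) as plain Go types. The field names and the `Missing` helper are assumptions made for this example and are not taken from Diego's code.

```go
package main

import "fmt"

// Task is a hypothetical one-off unit of work that should run only once.
type Task struct {
	Guid    string
	Command string
}

// LRP is a hypothetical long-running process with a desired number of
// monitored instances.
type LRP struct {
	Guid      string
	Command   string
	Instances int
}

// Missing reports how many instances still have to be started, given how
// many are currently running. It never returns a negative number.
func (l LRP) Missing(running int) int {
	if running >= l.Instances {
		return 0
	}
	return l.Instances - running
}

func main() {
	stage := Task{Guid: "stage-my-app", Command: "compile"}
	web := LRP{Guid: "my-app-web", Command: "start-server", Instances: 3}

	fmt.Printf("task %s runs %q once\n", stage.Guid, stage.Command)
	fmt.Printf("lrp %s wants %d instances; with 1 running, %d are missing\n",
		web.Guid, web.Instances, web.Missing(1))
}
```

Neither type mentions "app", which is the point of the abstraction: staging can be expressed as a Task and a running app as an LRP.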
Show me Diego
The Right(?) Abstraction
Express specific domain in terms of generic recipes
Generic / Specific
Cloud Controller
Stager: Stage App → Run Task
App-Manager: Run App → Launch LRP
Executor Pool
Rep: Run Tasks, Launch LRPs
Exec: Exec Recipes
Garden: Manage Containers
Linux Backend: Run Containers
New features go here! (e.g. cron-like tasks)
Flexibility
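The idea of expressing a specific domain in terms of generic recipes can be sketched as follows. This is a minimal illustration, assuming a recipe is an ordered list of actions; the action kinds (`download`, `run`, `upload`), the `StageApp` function, and the example URLs are all hypothetical and not taken from the talk.

```go
package main

import (
	"fmt"
	"strings"
)

// Action is one generic step in a recipe. The kinds used below are
// illustrative assumptions.
type Action struct {
	Kind string // "download", "run" or "upload"
	Arg  string
}

// Task is a generic one-off recipe: an ordered list of actions.
type Task struct {
	Guid    string
	Actions []Action
}

// StageApp plays the Stager role: it translates the app-specific request
// "stage this app" into a generic Task that knows nothing about apps.
func StageApp(appName, bitsURL, dropletURL string) Task {
	return Task{
		Guid: "stage-" + appName,
		Actions: []Action{
			{Kind: "download", Arg: bitsURL},
			{Kind: "run", Arg: "compile " + appName},
			{Kind: "upload", Arg: dropletURL},
		},
	}
}

// Describe renders the recipe as "guid: step -> step -> step".
func (t Task) Describe() string {
	steps := make([]string, 0, len(t.Actions))
	for _, a := range t.Actions {
		steps = append(steps, a.Kind)
	}
	return t.Guid + ": " + strings.Join(steps, " -> ")
}

func main() {
	task := StageApp("my-app", "http://example.invalid/bits", "http://example.invalid/droplet")
	fmt.Println(task.Describe())
}
```

Under this assumption, a new feature such as cron-like tasks would need only a new translator next to the Stager, while the executor side keeps running the same generic actions.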
Show me Diego
Platform Independent ✓
✓ ✓ ✓ ✓ ✓ ✓
Cloud Controller
Stager: Stage App → Run Task
App-Manager: Run App → Launch LRP
Express specific domain in terms of generic recipes
Executor Pool
Rep: Run Tasks, Launch LRPs
Exec: Exec Recipes
Garden: Manage Containers
Linux Backend: Run Containers
Show me Diego
Platform Independent ✓
Just 2 Things:
Linux Backend: Run Containers
Win Backend: Run Containers
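One way to picture the Linux and Windows backends sitting behind a common container layer is a Go interface with one implementation per platform. This is only a sketch: the `Backend` interface, its methods, and the `Launch` helper are assumptions for illustration and do not reproduce Garden's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// Backend is a hypothetical stand-in for the platform-specific layer that
// actually runs containers.
type Backend interface {
	Name() string
	RunContainer(handle string) error
}

type linuxBackend struct{}

func (linuxBackend) Name() string { return "linux" }
func (linuxBackend) RunContainer(handle string) error {
	if handle == "" {
		return errors.New("empty container handle")
	}
	return nil // a real backend would create the container here
}

type windowsBackend struct{}

func (windowsBackend) Name() string { return "windows" }
func (windowsBackend) RunContainer(handle string) error {
	if handle == "" {
		return errors.New("empty container handle")
	}
	return nil // a real backend would create the container here
}

// Launch is platform independent: it depends only on the Backend interface.
func Launch(b Backend, handle string) (string, error) {
	if err := b.RunContainer(handle); err != nil {
		return "", fmt.Errorf("%s backend: %w", b.Name(), err)
	}
	return fmt.Sprintf("%s ran %s", b.Name(), handle), nil
}

func main() {
	for _, b := range []Backend{linuxBackend{}, windowsBackend{}} {
		msg, err := Launch(b, "container-1")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

Everything above the interface stays the same when a backend is added, which matches the slide's claim that only the backend has to be written per platform.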
Show me Diego
Orchestration
Health Manager
Cloud Controller
4 × Rep (each with Exec)
Start!
Start!
Stop!
Show me Diego
Orchestration
Health Manager
Cloud Controller
4 × Rep (each with Exec)
Want 3
Hold auctions… to distribute LRPs
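The "Want 3, hold auctions" flow can be sketched as a loop that runs one auction per desired instance. The scoring rule below (fraction of capacity already used, lowest wins) is an assumption chosen for illustration; the talk does not say how bids are scored, and `Rep`, `Bid`, and `HoldAuction` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Rep is a hypothetical representative of one executor that bids for work.
type Rep struct {
	Name     string
	Capacity int // total instance slots
	Running  int // instances already placed here
}

// Bid returns a score (lower is better) and whether this rep can take one
// more instance. The score is the fraction of capacity already in use.
func (r *Rep) Bid() (float64, bool) {
	if r.Running >= r.Capacity {
		return 0, false
	}
	return float64(r.Running) / float64(r.Capacity), true
}

// HoldAuction collects bids, places one instance on the rep with the lowest
// score, and returns the winner's name. Ties go to the earlier rep.
func HoldAuction(reps []*Rep) (string, error) {
	var winner *Rep
	best := 0.0
	for _, r := range reps {
		score, ok := r.Bid()
		if !ok {
			continue
		}
		if winner == nil || score < best {
			winner, best = r, score
		}
	}
	if winner == nil {
		return "", errors.New("no rep has capacity")
	}
	winner.Running++
	return winner.Name, nil
}

func main() {
	reps := []*Rep{
		{Name: "rep-1", Capacity: 4, Running: 3},
		{Name: "rep-2", Capacity: 4},
		{Name: "rep-3", Capacity: 4, Running: 1},
	}
	// "Want 3": hold one auction per desired instance.
	for i := 0; i < 3; i++ {
		name, err := HoldAuction(reps)
		if err != nil {
			fmt.Println("auction failed:", err)
			return
		}
		fmt.Printf("instance %d placed on %s\n", i, name)
	}
}
```

The requester only states how many instances it wants; which rep runs each one falls out of the bids, so no central component has to track every executor's state.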
Show me Diego
Triangular Dependencies
Health Manager
Cloud Controller
4 × Rep (each with Exec)
Want 3
Hold auctions… to distribute LRPs
self-managing, self-monitoring, self-healing
eventually consistent
Show me Diego
Cloud Controller
4 × Rep (each with Exec)
Want 3
self-managing, self-monitoring, self-healing
eventually consistent
robust
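A self-healing, eventually consistent system of this kind is commonly built around a loop that compares desired state with actual state and acts on the difference. The sketch below shows one such comparison step; it is an assumed illustration, and `Converge` is a hypothetical function rather than Diego's implementation.

```go
package main

import "fmt"

// Converge compares desired and actual instance counts per process guid and
// returns the difference: a positive number means "start this many", a
// negative number means "stop this many". Processes already in the desired
// state are omitted.
func Converge(desired, actual map[string]int) map[string]int {
	delta := make(map[string]int)
	for guid, want := range desired {
		if d := want - actual[guid]; d != 0 {
			delta[guid] = d
		}
	}
	// Anything running that is no longer desired should be stopped.
	for guid, have := range actual {
		if _, ok := desired[guid]; !ok && have > 0 {
			delta[guid] = -have
		}
	}
	return delta
}

func main() {
	desired := map[string]int{"web": 3}
	actual := map[string]int{"web": 1, "old-worker": 2}
	for guid, d := range Converge(desired, actual) {
		if d > 0 {
			fmt.Printf("%s: start %d\n", guid, d)
		} else {
			fmt.Printf("%s: stop %d\n", guid, -d)
		}
	}
}
```

Running this step repeatedly tolerates missed messages and crashed instances: whatever went wrong, the next pass sees the gap and closes it.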
Show me Diego
4 × Rep (each with Exec)
distributed auction is complex
emergent behavior
Simulation-Driven Development
complex interactions, hard to test, hard to reason through
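Simulation-driven development can be illustrated with a toy simulator that places many instances and then measures how evenly they spread. The placement rule here (sample a few random reps and pick the least loaded) is an assumption made for the example, not the algorithm from the talk; `Simulate` and `Spread` are hypothetical names, and `reps` and `sample` are assumed to be at least 1.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Simulate places instances one at a time. For each placement it samples
// `sample` random reps and picks the least loaded of them. It returns the
// final load per rep. A fixed seed makes a run repeatable.
func Simulate(reps, instances, sample int, seed int64) []int {
	rng := rand.New(rand.NewSource(seed))
	load := make([]int, reps)
	for i := 0; i < instances; i++ {
		winner := rng.Intn(reps)
		for s := 1; s < sample; s++ {
			candidate := rng.Intn(reps)
			if load[candidate] < load[winner] {
				winner = candidate
			}
		}
		load[winner]++
	}
	return load
}

// Spread returns the difference between the most and least loaded rep.
func Spread(load []int) int {
	lo, hi := load[0], load[0]
	for _, l := range load[1:] {
		if l < lo {
			lo = l
		}
		if l > hi {
			hi = l
		}
	}
	return hi - lo
}

func main() {
	// Compare how the sample size affects the evenness of the distribution.
	for _, sample := range []int{1, 3, 10} {
		load := Simulate(10, 100, sample, 1)
		fmt.Printf("sample=%d load=%v spread=%d\n", sample, load, Spread(load))
	}
}
```

Varying one parameter and watching a summary metric such as the spread is a cheap way to observe emergent behavior before the real components exist.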
Show me Diego
simulation driven
Cloud Controller
Stager: Stage App → Run Task
App-Manager: Run App → Launch LRP
Express specific domain in terms of generic recipes
Executor Pool
Rep: Run Tasks, Launch LRPs
Exec: Exec Recipes
Garden: Manage Containers
Linux Backend: Run Containers
Show me Diego
14 small single-responsibility components!
executor ✓
rep ✓
stager ✓
app-manager ✓
auctioneer ✓
converger ✓
etcd ✓
etcd-metrics-server ✓
file-server ✓
garden ✓
linux-circus ✓
metricz ✓
route-emitter ✓
tps ✓
simulation driven
unit-tested ✓
Diego is a play
Actors
script
communication and role encoded via shared library
shared narrative
integration tests ✓
complex interactions, hard to test, hard to reason through
Show me Diego
complexity in a distributed system of this scope is real and necessary
Diego embraces this and tries to make its complexity:
explicit
transparent
∴ easier to reason about
simulation driven
unit-tested ✓
shared narrative
integration tests ✓
complex interactions, hard to test, hard to reason through
Show me Diego
flexible abstraction
extensible
robust
agile
Tasks/LRPs
Platform-Independent
Self-Managing
Handle on Complexity
The future
staging
running
+ buildpacks
placement pools
.NET
process types
auto-rebalancing
0-downtime deploys
dockerfiles
custom health-checks
shell access
persistent disk