DockerCon EU 2015: Placing a container on a train at 200mph
TRANSCRIPT
Placing a container on a train at 200mph
Casper S. Jensen
Software Engineer, Uber
About Me
● Joined Uber in January 2015, Compute Platform team, Aarhus office, Denmark
● PhD in CS (on a completely unrelated topic)
● Linux aficionado
● Docker “user” since February
About UBER: Why all the fuss?
The UBER app
339 Cities
61 Countries
2,000,000+ Trips/day
4,000+ Employees
Not that hard...
You just have to handle
● 24/7 availability across the globe
● Very different markets
● 1000s of developers and teams
● Adding new features like there’s no tomorrow
  ○ UberPOOL, UberKITTEN, UberICECREAM, UberEATS, UberWHATEVERYOUCANIMAGINE
● Hypergrowth in all dimensions
  ○ Datacenters, servers, infrastructure, etc.
Basically, you have to make magic happen every time a user opens the application
Software Development: The old UBER way
A fair amount of frustration
1) Write service RFC
2) Wait for feedback
3) Do all necessary scaffolding by hand
4) Start developing your service
5) Wait for infra team to write service scaffolding
6) Wait for IT to allocate servers
7) Wait for infra team to provision servers
8) Deploy to development servers and test
9) Deploy to production
10) Monitor and iterate
Steps 5–7 could take days or weeks...
It's just not scalable
But you have to start somewhere
“Make it easier for service owners to manage their local service environments.”
—Internal e-mail, February 2015
New development process
1) Write service RFC
2) Wait for feedback
3) Do all necessary scaffolding using tools
4) Start developing your service
5) Deploy to development servers and test
6) Deploy to production
7) Monitor and iterate
No silver bullets
All the things you did not consider
● Routing
● Dynamic service discovery
● Deployment
● Placement engine
● Logging and tracing
● Dual build environments
● Handling of secrets
● Security updates
● Private repositories
● Replicating images across multiple datacenters
Also, how much freedom do you really want to give your developers?
Change all the things! Let’s go through some examples
uDeploy
● Rolling upgrades
● Automatic rollbacks on failure (see the sketch below)
● Health checks, stats, exceptions
  ○ Load- and system-tests
● Service building
● Build replication
● 4,000+ upgrades/week
● 3,000+ builds/week
● 300+ rollbacks/week
● 600+ managed services
Our in-house deployment/cluster management system
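uDeploy itself is internal and no code was shown in the talk. As a minimal sketch only, here is the rolling-upgrade-with-automatic-rollback idea; deploy_to and health_check are hypothetical placeholders, not uDeploy's API:

def deploy_to(host, build):
    # Hypothetical placeholder: start the given build on a host
    # (in reality, pulling and running a container).
    print(f"deploying {build} to {host}")

def health_check(host):
    # Hypothetical placeholder: probe the service's health endpoint.
    return True

def rolling_upgrade(hosts, new_build, old_build, batch_size=2):
    """Upgrade hosts in small batches; if a batch fails its health
    checks, restore the old build on every host touched so far."""
    upgraded = []
    for i in range(0, len(hosts), batch_size):
        batch = hosts[i:i + batch_size]
        for host in batch:
            deploy_to(host, new_build)
            upgraded.append(host)
        if not all(health_check(h) for h in batch):
            # Automatic rollback across all upgraded hosts
            for host in upgraded:
                deploy_to(host, old_build)
            raise RuntimeError("health checks failed; rolled back")

rolling_upgrade(["web1", "web2", "web3", "web4"], "build-42", "build-41")

Upgrading in small batches is what keeps the 300+ rollbacks/week cheap: a bad build is caught while most hosts still run the old one.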
Moving to docker with zero downtime
Build multiplexing
We want to keep on trucking while migrating to docker
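The multiplexer's code was not shown; a minimal sketch of the idea, with the legacy packaging script and registry name made up for illustration, could look like:

import subprocess

def build_all(service_dir, version):
    """Build multiplexing: one build request produces both the legacy
    artifact and a Docker image, so either deploy path stays usable
    while services migrate at their own pace."""
    # Legacy path (hypothetical packaging script from the pre-Docker days)
    subprocess.run(["./package-legacy.sh", version],
                   cwd=service_dir, check=True)
    # Docker path: a standard image build tagged with the same version
    subprocess.run(
        ["docker", "build", "-t", f"registry.local/my-service:{version}", "."],
        cwd=service_dir, check=True)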
Build process & scaffolding
Declarative build scripts
● Service configuration in git
● Preset service frameworks
● Many options
● Generator creating (see the sketch after the example config)
  ○ Dockerfile
  ○ Health checks
  ○ Entry point scripts inside container
  ○ In general, all glue between host and service
● Possible to supply custom Dockerfile
service_name: test-uber-service
owning_team: udeploy
backend_port: 123
frontend_port: 456
service_type: clay_wheel
clay_wheel:
  celeries:
    - queue: test-uber-service
      has_celerybeat: true
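The generator is internal, but the mechanism is plain templating. A minimal sketch, assuming PyYAML and a made-up Dockerfile template (the real generator also emits health checks and entry-point scripts):

import yaml  # PyYAML

CONFIG = """\
service_name: test-uber-service
backend_port: 123
"""

# Hypothetical template, for illustration only
DOCKERFILE_TEMPLATE = """\
FROM python:2.7
COPY . /app/{service_name}
WORKDIR /app/{service_name}
EXPOSE {backend_port}
CMD ["./entrypoint.sh"]
"""

def generate_dockerfile(config_text):
    # Fill the template from the declarative service config
    cfg = yaml.safe_load(config_text)
    return DOCKERFILE_TEMPLATE.format(**cfg)

print(generate_dockerfile(CONFIG))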
Image replication
● Multiple datacenters
● Images must be stored within DCs
● Build once, replicate everywhere
● Traffic restrictions: push but not pull

Current setup
● Stock docker registry
● File back-end
● Docker-mover
● Syncing images using pull/push
● Use notification API to speed up replication (see the sketch below)
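Docker-mover is internal, but the registry's notification API (the registry POSTs JSON push events to a configured webhook) is public. A simplified sketch of notification-driven replication, with the registry hostnames made up and the event payload reduced to the fields used here:

from http.server import BaseHTTPRequestHandler, HTTPServer
import json
import subprocess

LOCAL = "dc1-registry.local:5000"   # hypothetical names
PEERS = ["dc2-registry.local:5000", "dc3-registry.local:5000"]

def replicate(repo, tag):
    # Pull the fresh image from the local registry, then push it out to
    # each peer DC (traffic rules allow push, but not pull, across DCs).
    src = f"{LOCAL}/{repo}:{tag}"
    subprocess.run(["docker", "pull", src], check=True)
    for peer in PEERS:
        dst = f"{peer}/{repo}:{tag}"
        subprocess.run(["docker", "tag", src, dst], check=True)
        subprocess.run(["docker", "push", dst], check=True)

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The registry posts an envelope of events; react to pushes only.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        for event in json.loads(body).get("events", []):
            if event.get("action") == "push":
                replicate(event["target"]["repository"],
                          event["target"].get("tag", "latest"))
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 8080), Handler).serve_forever()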
Service discovery & routing
● Previously, we used HAProxy + scripts to do this
● Now, we use Hyperbahn + TChannel RPC
  https://github.com/uber/{hyperbahn|tchannel}
  ○ Used for docker and legacy services
  ○ Required in order to move containers around in seconds
  ○ Dynamic routing, circuit breaking, retries, rate limiting, load balancing (toy illustration below)
  ○ Completely dynamic, no fixed ports
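Hyperbahn and TChannel are the real projects (linked above); the toy below is not their API, just an illustration of why dynamic registration plus circuit breaking lets containers move without fixed ports:

import random

class Router:
    """Toy router, for illustration only; see the repos above."""
    def __init__(self, max_failures=3):
        self.instances = {}   # service name -> ["host:port", ...]
        self.failures = {}    # instance -> consecutive failure count
        self.max_failures = max_failures

    def register(self, service, instance):
        # Containers announce themselves on start-up, so no config file
        # or fixed port assignment has to be edited when they move.
        self.instances.setdefault(service, []).append(instance)

    def pick(self, service):
        # Load balance across healthy instances; instances whose
        # circuit has tripped are skipped until they recover.
        healthy = [i for i in self.instances.get(service, [])
                   if self.failures.get(i, 0) < self.max_failures]
        if not healthy:
            raise RuntimeError(f"no healthy instance of {service}")
        return random.choice(healthy)

    def report(self, instance, ok):
        # Circuit breaking: trip after repeated failures, reset on success.
        self.failures[instance] = 0 if ok else self.failures.get(instance, 0) + 1

router = Router()
router.register("geo", "10.4.1.7:31025")  # port chosen by the scheduler
print(router.pick("geo"))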
Key Take-Aways
The good & the bad

The good:
● Remove team dependencies
● More freedom
● Not tied to specific frameworks or versions (hi, Python 3)
● Easy to experiment with new technologies

The bad:
● Too much freedom
● Non-trivial integrating with a large running system
● Infrastructure must be dynamic throughout
● Containers are only a minor part of the infrastructure, don't forget that
Current and future wins
● Today, 30% of all services in docker
● Soon-ish, 100%
● Great improvements in provisioning time (done)
● Framework and service owners can manage their own environment (done)
● Faster and automatic scaling of capacity (in progress)
Thank you!
Casper S. Jensen
[email protected]