Post on 02-Aug-2020

© 2016 Mesosphere, Inc. All Rights Reserved. 1

Process Migration in the Orchestration World
ContainerCon 2016 - Jimenez, Arya

Isabel Jimenez
Distributed Systems Engineer
DC/OS Security Team & Apache Mesos Contributor
isabel@mesosphere.io
@ijimene

Kapil Arya
Distributed Systems Engineer
Apache Mesos Committer & DMTCP Developer
kapil@mesosphere.io
@karya0

Overview

➢ Motivation

➢ Process Migration

➢ Apache Mesos

➢ Process/Container Migration for Mesos

➢ Demo

Motivation

Stateless vs. Stateful Applications

● Stateless applications:
  ○ No local state
  ○ Start from a (relatively) vanilla state
  ○ Perform transaction(s)
  ○ Kill when no longer needed
● Stateful applications:
  ○ Some local state
  ○ Start from vanilla state and compute “work” state
  ○ Non-graceful shutdown results in loss of compute time

Scheduling Stateless vs. Stateful Applications

● Stateless applications:
  ○ Scale up: “on-demand” deployment by launching clones as needed
  ○ Scale down: kill unused instances without loss of computation time
  ○ Making room for high-priority tasks without significant penalty
● Stateful applications:
  ○ Scale up: longer initialization times for new instances
  ○ Scale down: wait for instances to reach a “safe” state to preserve compute cycles
  ○ Making room for high-priority tasks results in significant compute-time penalty

The same applies to moving applications from one node/cluster to another!

Modern container orchestration tools are optimized for stateless applications!

How to Better Schedule Stateful Applications?

Make them stateless!

● How?
  ○ Rewrite ‘em!
● Alternatively
  ○ Use process/container checkpointing and migration!

Process Migration

Terminology

● Process Migration
  ○ Move a running process from one node to another
● Container Migration
  ○ Move a running container from one node to another
● Virtual machine migration (e.g., vMotion)
  ○ Move a running virtual machine from one node to another

How to Migrate a Process/Container/VM?

1. Pause the running process/container/VM
2. Take a snapshot of the current state, a.k.a. checkpointing
3. Move the snapshot to the target node
4. Restart from the snapshot on the target node

Do this transparently to the outside world!

● Ensure minimal downtime
  ○ Reduce time required for stages (2) and (3)
  ○ Ideally on the order of milliseconds!
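The four migration steps can be sketched with a toy in-memory example. This is only an illustration under stated assumptions: `CounterProcess`, `checkpoint`, `restart`, and `migrate` are hypothetical names, the "target node" is a plain dictionary, and real tools capture far more state than a single counter.

```python
import json
from dataclasses import dataclass


@dataclass
class CounterProcess:
    """Toy stand-in for a stateful process: its only state is a counter."""
    steps_done: int = 0
    paused: bool = False

    def work(self, steps: int) -> None:
        if self.paused:
            raise RuntimeError("process is paused")
        self.steps_done += steps


def checkpoint(proc: CounterProcess) -> bytes:
    """Steps 1 and 2: pause the process, then serialize its state."""
    proc.paused = True
    return json.dumps({"steps_done": proc.steps_done}).encode("utf-8")


def restart(snapshot: bytes) -> CounterProcess:
    """Step 4: rebuild a running process from the snapshot."""
    state = json.loads(snapshot.decode("utf-8"))
    return CounterProcess(steps_done=state["steps_done"])


def migrate(proc: CounterProcess, target_node: dict) -> CounterProcess:
    snapshot = checkpoint(proc)           # steps 1 and 2
    target_node["image"] = snapshot       # step 3: "move" the bytes
    return restart(target_node["image"])  # step 4


source = CounterProcess()
source.work(5)
moved = migrate(source, target_node={})
moved.work(2)
print(moved.steps_done)  # prints 7: work done before the migration is kept
```

Downtime in this sketch is the time spent inside `migrate`, which is why the snapshot and transfer stages are the ones worth shrinking.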

What is Checkpointing?

Checkpoint-Restart is the ability to save a set of running processes to a checkpoint image on disk, and to later restart them from that image.

● A quick demo!

Checkpointing Use Cases

● Fault tolerance
● Scheduling and process migration
● Debugging (an executable bug report)
● Faster startup times (checkpoint after initialization)
● Save/restore workspace (for interactive sessions)
● Speculative execution (what-if scenarios)
● Managing long tails (single thread continues to run after other threads have exited)

Stateful Applications with Checkpointing

Stateful Application + Checkpointing ≈ Stateless Application

● Scale up: start from pre-initialized snapshot
● Scale down: checkpoint and kill
● Migrate: checkpoint, kill, and restart
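As a rough illustration of the "scale up" bullet, the sketch below pays the initialization cost once, keeps a snapshot, and starts every new instance from that snapshot. The names `expensive_init` and `scale_up` are hypothetical, and a deep copy of a dictionary stands in for a real checkpoint image.

```python
import copy

INIT_CALLS = 0


def expensive_init() -> dict:
    """Hypothetical slow start-up that builds the "work" state from scratch."""
    global INIT_CALLS
    INIT_CALLS += 1
    return {"cache": [n * n for n in range(1000)], "requests_served": 0}


# Checkpoint once, right after initialization.
snapshot = copy.deepcopy(expensive_init())


def scale_up(count: int) -> list:
    """Start `count` instances from the snapshot instead of re-initializing."""
    return [copy.deepcopy(snapshot) for _ in range(count)]


instances = scale_up(3)
instances[0]["requests_served"] += 1  # instances do not share state
```

Each clone starts in the post-initialization state, so launching it costs about as much as launching a stateless instance.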

How to Checkpoint/Restart a Process?

Checkpoint-restart involves saving and restoring:

● all of user-space memory
● state of all threads
● kernel state
● network state
● …

All this while ensuring the state doesn’t change while the checkpoint is taken!

● Quiesce the process(es) before saving the state!
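Why quiescing matters can be shown with a small threaded sketch. It assumes a hypothetical `Ledger` whose two counters must always match; a lock plays the role of the quiesce step, keeping worker threads out of the state while it is copied. Real checkpointers suspend threads by other means, so treat this only as an analogy.

```python
import threading


class Ledger:
    """Toy state with an invariant: debits and credits always match."""

    def __init__(self) -> None:
        self.debits = 0
        self.credits = 0
        self.lock = threading.Lock()

    def transfer(self) -> None:
        with self.lock:
            self.debits += 1
            # A snapshot taken at this point would break the invariant.
            self.credits += 1

    def checkpoint(self) -> dict:
        # Quiesce: holding the lock keeps every worker out of transfer()
        # while the state is copied.
        with self.lock:
            return {"debits": self.debits, "credits": self.credits}


def worker(ledger: Ledger, rounds: int) -> None:
    for _ in range(rounds):
        ledger.transfer()


ledger = Ledger()
threads = [threading.Thread(target=worker, args=(ledger, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
snapshots = [ledger.checkpoint() for _ in range(100)]  # taken while workers run
for t in threads:
    t.join()
final = ledger.checkpoint()
```

Every snapshot satisfies the invariant even though it was taken while the workers were running, because no update can be half-applied during the copy.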

Different Types of Checkpointing

● Application-level
  ○ Embed checkpointing code inside the application itself
  ○ Optimal
  ○ Burden on the application developer
● Virtual machine level
  ○ Complete state
  ○ Higher cost
● System-level
  ○ No modification to application source/binary
  ○ Can be done at the kernel level or in user space
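A minimal sketch of the application-level approach, assuming a job that sums squares and saves its own progress after every item. The function names, the JSON file format, and the `SimulatedKill` exception are all hypothetical; the point is that the developer decides what to save and when, which is both the advantage and the burden.

```python
import json
import os
import tempfile


class SimulatedKill(Exception):
    """Stands in for a non-graceful shutdown in this example."""


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temporary file, then rename it over the old checkpoint,
    so a crash never leaves a half-written file behind."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a vanilla state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_item": 0, "total": 0}


def sum_squares(path: str, n_items: int, stop_after=None) -> int:
    """Sum i*i for i < n_items, checkpointing after every item.
    `stop_after` simulates a kill after that many items."""
    state = load_checkpoint(path)
    processed = 0
    while state["next_item"] < n_items:
        if stop_after is not None and processed == stop_after:
            raise SimulatedKill()
        i = state["next_item"]
        state = {"next_item": i + 1, "total": state["total"] + i * i}
        save_checkpoint(path, state)
        processed += 1
    return state["total"]


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.ckpt")
    try:
        sum_squares(ckpt, n_items=10, stop_after=4)
    except SimulatedKill:
        pass  # the "process" died after four items
    resumed_from = load_checkpoint(ckpt)["next_item"]
    result = sum_squares(ckpt, n_items=10)  # picks up at item 4
```

The second call does not redo the first four items, so only the work since the last checkpoint is ever lost.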

Modern Checkpointing Systems

● CRIU (Checkpoint/Restore In Userspace)
  ○ Single-node checkpointing
  ○ Recent kernels (3.9+)
  ○ Container-level
  ○ http://criu.org/
● DMTCP (Distributed MultiThreaded CheckPointing)
  ○ User-space libraries with LD_PRELOAD
  ○ Distributed processes across multiple nodes
  ○ http://dmtcp.sourceforge.net
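For orientation, the sketch below builds typical command lines for both tools as Python argument lists. It only constructs the lists and does not run anything. The helper function names, the PID, and the paths are hypothetical; the command and option names (`criu dump`, `criu restore`, `--tree`, `--images-dir`, `--shell-job`, `dmtcp_launch`, `dmtcp_command --checkpoint`, `dmtcp_restart`) are the commonly documented ones, but check the installed version before relying on them.

```python
from typing import List


def criu_dump(pid: int, images_dir: str, shell_job: bool = True) -> List[str]:
    """Checkpoint the process tree rooted at `pid` into `images_dir`."""
    cmd = ["criu", "dump", "--tree", str(pid), "--images-dir", images_dir]
    if shell_job:
        cmd.append("--shell-job")  # for processes started from a terminal
    return cmd


def criu_restore(images_dir: str, shell_job: bool = True) -> List[str]:
    """Restore the process tree saved in `images_dir`."""
    cmd = ["criu", "restore", "--images-dir", images_dir]
    if shell_job:
        cmd.append("--shell-job")
    return cmd


def dmtcp_launch(command: List[str]) -> List[str]:
    """Start `command` under DMTCP control."""
    return ["dmtcp_launch", *command]


def dmtcp_checkpoint() -> List[str]:
    """Ask the DMTCP coordinator to checkpoint all attached processes."""
    return ["dmtcp_command", "--checkpoint"]


def dmtcp_restart(image_files: List[str]) -> List[str]:
    """Restart from one or more DMTCP checkpoint image files."""
    return ["dmtcp_restart", *image_files]


dump_cmd = criu_dump(1234, "/tmp/ckpt")
launch_cmd = dmtcp_launch(["./a.out", "--fast"])
```

On a host with the tools installed, each list could be passed to `subprocess.run`; CRIU generally needs elevated privileges.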

Apache Mesos: The datacenter kernel

Why?

Why can’t we run applications on our datacenters just like we run applications on our mobile phones?

We’re all building distributed systems.

The datacenter abstraction

The datacenter computer needs an operating system

Operating system: “a collection of software that manages the computer hardware resources and provides common services for computer programs”
- Wikipedia

Resource offers
Offer-based model

Mesos can’t run applications on its own.

A Mesos framework is a distributed system that has a scheduler.

Schedulers like Marathon keep your applications running, a bit like a distributed “init.d”.

High utilization
Apache Mesos

[Figure: chart plotted over time]

Mesos mechanics

[Diagram sequence. Components: master, agent, scheduler. Labels shown, in slide order:]

1. RESOURCES (cpu, mem, disk, etc.)
2. OFFER (cpu, mem, disk, etc.)
3. The scheduler, shown with this application definition:

   {
     "container": {
       "docker": { "image": "busybox" },
       "type": "DOCKER"
     },
     "cpus": 0.1,
     "id": "demo",
     "instances": 1,
     "mem": 128
   }

4. ACCEPT OFFER (cpu, mem, disk, etc.)
5. LAUNCH TASK
6. UPDATE STATE (STAGING, RUNNING, etc.)
7. UPDATE STATE (STAGING, FAILED, etc.)
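The resource, offer, accept, launch, and update cycle can be mimicked with a toy model. Everything here is an assumption made for illustration: the `Master`, `Scheduler`, `Offer`, and `Task` classes and their methods are hypothetical and are not the Mesos API, and agents are reduced to entries in a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Offer:
    agent: str
    cpus: float
    mem: int


@dataclass
class Task:
    task_id: str
    cpus: float
    mem: int
    states: List[str] = field(default_factory=list)


class Scheduler:
    """Accepts the first offer that is large enough for its pending task."""

    def __init__(self, task: Task) -> None:
        self.pending: Optional[Task] = task

    def resource_offer(self, offer: Offer) -> Optional[Task]:
        task = self.pending
        if task and offer.cpus >= task.cpus and offer.mem >= task.mem:
            self.pending = None
            return task  # ACCEPT OFFER
        return None      # decline

    def status_update(self, task: Task, state: str) -> None:
        task.states.append(state)  # UPDATE STATE


class Master:
    def __init__(self) -> None:
        self.free: Dict[str, Offer] = {}

    def register_agent(self, agent: str, cpus: float, mem: int) -> None:
        self.free[agent] = Offer(agent, cpus, mem)  # RESOURCES

    def offer_cycle(self, scheduler: Scheduler) -> None:
        for offer in list(self.free.values()):      # OFFER
            task = scheduler.resource_offer(offer)
            if task is None:
                continue
            # LAUNCH TASK: deduct what the task uses from the agent.
            self.free[offer.agent] = Offer(
                offer.agent, offer.cpus - task.cpus, offer.mem - task.mem
            )
            for state in ("STAGING", "RUNNING"):
                scheduler.status_update(task, state)


master = Master()
master.register_agent("agent-1", cpus=0.05, mem=64)    # too small for the task
master.register_agent("agent-2", cpus=4.0, mem=8192)
demo = Task("demo", cpus=0.1, mem=128)
master.offer_cycle(Scheduler(demo))
```

The scheduler, not the master, decides where the task lands: it declines the offer from `agent-1` and accepts the one from `agent-2`.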

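Mesos mechanics: Custom executor

[Diagram. Components: master, agent, scheduler. The scheduler is shown with an application definition that names an executor:]

   {
     "container": {
       "docker": { "image": "busybox" },
       "type": "DOCKER"
     },
     "cpus": 0.1,
     "id": "demo",
     "executor": "demo-executor",
     "mem": 128
   }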

Mesos mechanics

[Diagram sequence. Components: Agent, Executor, Task. Labels shown, in slide order:]

1. LAUNCH TASK (shown on two consecutive slides)
2. TASK STATE
3. UPDATE STATE (shown on two consecutive slides)
4. ISOLATION

Mesos mechanics are fair

[Diagram. Components: schedulers A through E, one master, five agents]

Mesos mechanics are HA

[Diagram. Components: schedulers A through E, masters 1 through 3, ZooKeeper, five agents]

APACHE MESOS: Putting it all together

[Diagram. Components: several groups of schedulers A through E, masters m 1 through m 9, ZooKeeper, and a large grid of agents (each labeled "a")]

Mesos Container Migration

RUNC

● OCI specification

● Well integrated with CRIU

● Lightweight universal runtime container

● Compatible with Docker
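Because runc delegates checkpointing to CRIU, a container can be checkpointed and restored from the command line. The sketch below only builds the argument lists. The helper names, container ID, and paths are hypothetical; `runc checkpoint` and `runc restore` with `--image-path`, `--leave-running`, and `--bundle` are the options as commonly documented, so verify them against the installed runc version.

```python
from typing import List


def runc_checkpoint(container_id: str, image_path: str,
                    leave_running: bool = False) -> List[str]:
    """Checkpoint a running container into `image_path`."""
    cmd = ["runc", "checkpoint", "--image-path", image_path]
    if leave_running:
        cmd.append("--leave-running")  # keep the container alive afterwards
    return cmd + [container_id]


def runc_restore(container_id: str, image_path: str, bundle: str) -> List[str]:
    """Restore a container from `image_path` using the OCI bundle at `bundle`."""
    return ["runc", "restore", "--image-path", image_path,
            "--bundle", bundle, container_id]


checkpoint_cmd = runc_checkpoint("demo", "/tmp/demo-ckpt")
restore_cmd = runc_restore("demo", "/tmp/demo-ckpt", "/containers/demo")
```

For a migration, the image directory and the bundle would have to be available on the target node before the restore command is run there.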

Mesos mechanics: Custom executor

[Diagram. Components: Mesos, agent, Volt Scheduler, Volt Executor. The Volt Scheduler is shown with this application definition:]

   {
     "container": {
       "docker": { "image": "busybox" },
       "type": "DOCKER"
     },
     "cpus": 0.1,
     "id": "demo",
     "executor": "volt-executor",
     "mem": 128
   }

Mesos mechanics

[Diagram sequence. Components: Agent, VOLT Executor, RunC. Across three LAUNCH TASK slides the executor is shown with one, then two, then three RunC instances]

Demo!

Future Work: Checkpointing as a Service

First-class integration with Mesos

○ Transparent to the scheduler and executor
○ New task states (CHECKPOINTED, RESTORING, etc.)
○ Support for multiple checkpoint-service providers (DMTCP, CRIU, etc.)
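One way to picture the proposed task states is as a small state machine. The transition table below is purely hypothetical: only the state names CHECKPOINTED and RESTORING come from the slide, and which transitions an eventual Mesos integration would allow is an assumption made for this sketch.

```python
# Hypothetical transition table; not taken from Mesos.
ALLOWED = {
    "STAGING": {"RUNNING", "FAILED"},
    "RUNNING": {"CHECKPOINTED", "FINISHED", "FAILED"},
    "CHECKPOINTED": {"RESTORING"},
    "RESTORING": {"RUNNING", "FAILED"},
    "FINISHED": set(),
    "FAILED": set(),
}


def advance(current: str, new: str) -> str:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


def replay(updates: list) -> str:
    """Apply a sequence of status updates, starting from STAGING."""
    state = "STAGING"
    for new in updates:
        state = advance(state, new)
    return state


# A task that is checkpointed, restored elsewhere, and keeps running.
migrated = replay(["RUNNING", "CHECKPOINTED", "RESTORING", "RUNNING"])
```

A scheduler that sees CHECKPOINTED would know the task is paused rather than lost, which is what would make migration transparent to it.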

THANK YOU!
