mesoscon 2015 aug 20th netflix senior software engineer ... · reactive stream processing, mantis...

60
Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks Sharma Podila Senior Software Engineer Netflix Aug 20th MesosCon 2015

Upload: others

Post on 25-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks

Sharma PodilaSenior Software Engineer

Netflix

Aug 20th MesosCon 2015

Page 2: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Agenda

● Context, motivation● Fenzo scheduler library● Usage at Netflix● Future direction

Page 3: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Why use Apache Mesos in a cloud?

Resource granularity

Application start latency

Page 4: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

A tale of two frameworks

Reactive stream processing

Container deployment and management

Page 5: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Reactive stream processing, Mantis

● Cloud native● Lightweight, dynamic jobs

○ Stateful, multi-stage○ Real time, anomaly detection, etc.

● Task placement constraints○ Cloud constructs○ Resource utilization

● Service and batch style

Page 6: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Mantis job topology

Worker

Sou

rce

App

Sta

ge 1

Sta

ge 2

Sta

ge 3

Sin

k

A job is set of 1 or more stagesA stage is a set of 1 or more workersA worker is a Mesos task

Job

Page 7: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Container management, Titan

● Cloud native● Service and batch workloads● Jobs with multiple sets of container tasks● Container placement constraints

○ Cloud constructs○ Resource affinity○ Task locality

Page 8: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Job 2Set 0

Set 1

Container scheduling model

Job 1

Co-locate tasks from multiple task sets

Page 9: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Why develop a new framework?

Page 10: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Easy to write a new framework?

Page 11: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Easy to write a new framework?

What about scale?Performance?Fault tolerance?Availability?

Page 12: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Easy to write a new framework?

What about scale?Performance?Fault tolerance?Availability?

And scheduling is a hard problem to solve

Page 13: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Long term justification is needed to create a new Mesos framework

Page 14: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Our motivations for new framework

● Cloud native(autoscaling)

Page 15: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Our motivations for new framework

● Cloud native(autoscaling)

● Customizable task placement optimizations(Mix of service, batch, and stream topologies)

Page 16: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscaling challenge

Host 1 Host 2 Host 3 Host 4

Host 1 Host 2 Host 3 Host 4

vs.

For long running stateful services

Page 17: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscaling challenge

Host 1 Host 2 Host 3 Host 4

Host 1 Host 2 Host 3 Host 4

vs.

For long running stateful services

Page 18: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Page 19: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Be connected to Mesos via the driver

Page 20: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Be connected to Mesos via the driver

Compute resource assignments for tasks

Page 21: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Components of a mesos framework

API for users to interact

Be connected to Mesos via the driver

Compute resource assignments for tasks

Page 22: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo

A common scheduling library for Mesos frameworks

Page 23: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo usage in frameworksMesos master

Mesos framework

Tasksrequests

Availableresource

offers

Fenzo taskscheduler

Task assignment result• Host1

• Task1• Task2

• Host2• Task3• Task4

Persistence

Page 24: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo scheduling library

Heterogeneous resources

Autoscaling of cluster Visibility of

scheduler actions

Plugins forConstraints, Fitness

High speed

Heterogeneous task requests

Page 25: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Announcing availability of Fenzo in Netflix OSS suite

Page 26: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo details

Page 27: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduling problem

Fitness

Pending

Assigned

Urg

ency

N tasks to assign from M possible slaves

Page 28: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduling optimizations

Speed Accuracy

First fit assignment Optimal assignment

Real world trade-offs~ O (1) ~ O (N * M)1

1 Assuming tasks are not reassigned

Page 29: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduling strategy

For each task

On each host

Validate hard constraintsEval fitness and soft constraints

Until fitness good enough, and

A minimum #hosts evaluated

Page 30: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Task constraints

SoftHard

Page 31: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Task constraints

SoftHardExtensible

Page 32: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Built-in Constraints

Page 33: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Host attribute value constraint

TaskHostAttrConstraint:instanceType=r3

Host1Attr:instanceType=m3

Host2Attr:instanceType=r3

Host3Attr:instanceType=c3

Fenzo

Page 34: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Unique host attribute constraint

TaskUniqueAttr:zone

Host1Attr:zone=1a

Host2Attr:zone=1a

Host3Attr:zone=1b

Fenzo

Page 35: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Balance host attribute constraint

Host1Attr:zone=1a

Host2Attr:zone=1b

Host3Attr:zone=1c

Job with 9 tasks, BalanceAttr:zone

Fenzo

Page 36: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fitness evaluation

Degree of fitness

Composable

Page 37: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing fitness calculator

Fitness for

Host1 Host2 Host3 Host4 Host5

fitness = usedCPUs / totalCPUs

Page 38: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing fitness calculator

Fitness for 0.25 0.5 0.75 1.0 0.0

fitness = usedCPUs / totalCPUs

Host1 Host2 Host3 Host4 Host5

Page 39: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing fitness calculator

Fitness for 0.25 0.5 0.75 1.0 0.0

Host1 Host2 Host3 Host4 Host5

fitness = usedCPUs / totalCPUs

Host1 Host2 Host3 Host4 Host5

Page 40: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Composable fitness calculatorsFitness

= ( BinPackFitness * BinPackWeight +RuntimePackFitness * RuntimeWeight) / 2.0

Page 41: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscaling in Fenzo

ASG/Cluster: mantisagentMinIdle: 8MaxIdle: 20CooldownSecs: 360

ASG/Cluster: mantisagentMinIdle: 8MaxIdle: 20CooldownSecs: 360

ASG/cluster: computeCluster

MinIdle: 8MaxIdle: 20CooldownSecs: 360

Fenzo

ScaleUp action:

Cluster, N

ScaleDown action:Cluster, HostList

Page 42: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Rules based cluster autoscaling

● Set up rules per host attribute value○ E.g., one autoscale rule per ASG/cluster, one cluster for network-

intensive jobs, another for CPU/memory-intensive jobs● Sample:

#Idlehosts

Trigger down scale

Trigger up scalemin

max

Cluster Name Min Idle Count Max Idle Count Cooldown Secs

NetworkClstr 5 15 360

ComputeClstr 10 20 300

Page 43: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Shortfall analysis based autoscaling

● Rule-based scale up has a cool down period○ What if there’s a surge of incoming requests?

● Pending requests trigger shortfall analysis○ Scale up happens regardless of cool down period○ Remembers which tasks have already been covered

Page 44: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Usage at Netflix

Page 45: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Cluster autoscalingN

umbe

r of M

esos

Sla

ves

Time

Page 46: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler run time (milliseconds)

Scheduler run time in milliseconds(over a week)

Average 2 mS

Maximum 38 mS

Note: times can vary depending on # of tasks, # and types of constraints, and # of hosts

Page 47: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Experimenting with Fenzo

Note: Experiments can be run without requiring a physical cluster

Page 48: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

A bin packing experiment

Host 3

Host 2Host 1

Mesos Slaves

Host 3000

Tasks with cpu=1

Tasks with cpu=3

Tasks with cpu=6

Fenzo

Iteratively assignBunch of tasks

Start with idle cluster

Page 49: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing sample results

Bin pack tasks using Fenzo’s built-in CPU bin packer

Page 50: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Bin packing sample results

Bin pack tasks using Fenzo’s built-in CPU bin packer

Page 51: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Task runtime bin packing sampleBin pack tasks based on custom fitness calculator to pack short vs. long run time jobs separately

Page 52: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler speed experiment

# of hosts # of tasks to assign each time

Avg time Avg time per task

Min time

Max time Total time

Hosts: 8-CPUs eachTask mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobsGoal: starting from an empty cluster, assign tasks to fill all hostsScheduling strategy: CPU bin packing

Page 53: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler speed experiment

# of hosts # of tasks to assign each time

Avg time Avg time per task

Min time

Max time Total time

1,000 1 3 mS 3 mS 1 mS 188 mS 9 s

1,000 200 40 mS 0.2 mS 17 mS 100 mS 0.5 s

Hosts: 8-CPUs eachTask mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobsGoal: starting from an empty cluster, assign tasks to fill all hostsScheduling strategy: CPU bin packing

Page 54: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Scheduler speed experimentHosts: 8-CPUs eachTask mix: 20% running 1-CPU jobs, 40% running 4-CPU, and 40% running 6-CPU jobsGoal: starting from an empty cluster, assign tasks to fill all hostsScheduling strategy: CPU bin packing

# of hosts # of tasks to assign each time

Avg time Avg time per task

Min time

Max time Total time

1,000 1 3 mS 3 mS 1 mS 188 mS 9 s

1,000 200 40 mS 0.2 mS 17 mS 100 mS 0.5 s

10,000 1 29 mS 29 mS 10 mS 240 mS 870 s

10,000 200 132 mS 0.66 mS 22 mS 434 mS 19 s

Page 55: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Accessing Fenzo

Code at https://github.com/Netflix/Fenzo

Wiki athttps://github.com/Netflix/Fenzo/wiki

Page 56: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Future directions

● Task management SLAs● Support for newer Mesos features● Collaboration

Page 57: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

To summarize...

Page 58: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo: scheduling library for frameworks

Heterogeneous resources

Autoscaling of cluster Visibility of

scheduler actions

Plugins forConstraints, Fitness

High speed

Heterogeneous task requests

Page 59: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Fenzo is now available in Netflix OSS suite at https://github.com/Netflix/Fenzo

Page 60: MesosCon 2015 Aug 20th Netflix Senior Software Engineer ... · Reactive stream processing, Mantis Cloud native Lightweight, dynamic jobs Stateful, multi-stage Real time, anomaly detection,

Questions?

Heterogeneous Resource Scheduling Using Apache Mesos for Cloud Native Frameworks

Sharma Podilaspodila @ netflix . com

@podila