![Page 1: Varied Containers Elastic Efficient Execution of · 2017-02-02 · Reactive stream processing: Mantis Zuul Cluster API Cluster Mantis Stream processing Cloud native service Configurable](https://reader030.vdocuments.site/reader030/viewer/2022041019/5ece2837f6bb9c0f49301aa0/html5/thumbnails/1.jpg)
Elastic Efficient Execution of Varied Containers
Sharma Podila, Nov 7th 2016, QCon San Francisco
In other words... how do we efficiently run heterogeneous workloads on an elastic pool of heterogeneous resources, with capacity guarantees?
Topics
● Containers, Mesos, Fenzo - where are we today?
● Modeling an elastic Mesos cluster
● Capacity guarantees for varied applications
● Network resource and security groups
● Ongoing and future work
About Me
● Software engineer
  ○ Resource scheduling, stream processing, distributed systems
  ○ Netflix Edge Engineering
  ○ Sun Microsystems + Oracle Corp.
● Author of the Fenzo scheduling library: https://github.com/Netflix/Fenzo
Source: https://www.sandvine.com/news/global_broadband_trends.asp
81 Million subscribers worldwide and growing!
Microservices architecture on AWS EC2
Containers, Apache Mesos, Fenzo - where are we today?
Reactive stream processing: Mantis
(Diagram: the Zuul Cluster and API Cluster connect to Mantis stream processing, which feeds Anomaly Detection.)
● Cloud native service
● Configurable message delivery guarantees
● Heterogeneous workloads
  ○ Real-time dashboarding, alerting
  ○ Anomaly detection, metric generation
  ○ Interactive exploration of streaming data
Current Mantis usage
● Peak of 1,800 EC2 instances
  ○ m3.2xlarge instances
● Peak of 3,700 concurrent containers
  ○ Trough of 2,700 containers
● Mix of perpetual and interactive exploratory jobs
● Peak of 11 million events/sec
Container deployment: Titus
(Diagram: Titus Job Control runs app containers and batch containers on VMs in an EC2 VPC, integrated with the cloud platform (metrics, IPC, health), Eureka, Edda, and Atlas & Insight.)
Current Titus usage
(Chart: number of containers (tasks) for the week of 10/24 in one of the regions)
● Peak of ~1,800 instances
  ○ Mix of m4.4xlarge, r3.8xlarge, g2.8xlarge
  ○ ~800 instances at trough
● Mix of batch, stream processing, and some microservices
Core architectural components
AWS EC2
Apache Mesos
Titus/Mantis Framework
Fenzo
Fenzo at https://github.com/Netflix/Fenzo
Apache Mesos at http://mesos.apache.org/
Jobs, tasks, instances, containers
A job is one of three types: batch, service, or stream processing
A job has one or more tasks to run
An instance is equivalent to a task
A task runs one container
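The job/task vocabulary above can be captured as a small data model. This is a minimal Python sketch; the names `Job`, `Task`, `JobType`, and the `container_image` field are hypothetical and are not the actual Titus or Mantis types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class JobType(Enum):
    BATCH = "batch"
    SERVICE = "service"
    STREAM_PROCESSING = "stream processing"


@dataclass
class Task:
    """One task == one instance, and it runs exactly one container."""
    task_id: str
    container_image: str  # hypothetical field: the single container this task runs


@dataclass
class Job:
    job_id: str
    job_type: JobType
    tasks: List[Task]

    def __post_init__(self) -> None:
        # A job has one or more tasks to run.
        if not self.tasks:
            raise ValueError("a job must have at least one task")


if __name__ == "__main__":
    job = Job("job-1", JobType.BATCH,
              [Task(f"job-1-{i}", "example/image:latest") for i in range(3)])
    print(job.job_type.value, len(job.tasks))
```

Rejecting an empty task list in `__post_init__` keeps the "one or more tasks" rule enforced at construction time.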
A few common themes
Heterogeneous mix of jobs and resources
| Resource | Task request | Agent sizes |
|---|---|---|
| CPU | 1 - 32 CPUs | 8 - 32 CPUs |
| Memory | 2 - 200+ GB | 32 - 244 GB |
| Network bandwidth | 10 - 2000 Mbps | 1024 - 10240 Mbps |

● Resource affinity based on task type
● Task locality
Large variation in peak-to-trough resource requirements
(Charts: Mantis events/sec, ranging from about 2M to 11M; Titus concurrent containers, ranging from 10s to 1000s)
Modeling an elastic Mesos cluster
Can we resize the agent cluster based on demand?
Task assignments in a cluster
Consider a cluster with 4-slot hosts
“Random” assignments in a cluster
(Legend: each box is an EC2 instance with 4 slots; each slot is shown as used or idle.)
The cluster starts assigning resources to tasks randomly.
The cluster starts to fill up...
The cluster is about 50% utilized, but only 1 agent can be terminated for scale down without losing jobs.
The cluster is now full: 100% utilized.
As jobs finish, the cluster is only partially used: about 65% utilized.
The cluster is only about 25% utilized, but no instance can be terminated without losing jobs.
Ideal assignments in a cluster
The cluster is utilized at the same level as before (about 25%), but 9 of the 12 instances can now be terminated!
The cluster scales down easily thanks to “bin packing”.
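The random vs. ideal comparison above can be reproduced with a toy slot model: 12 hosts with 4 slots each and 12 tasks (25% utilization). This is a minimal sketch, not Fenzo's implementation; it counts only slots, whereas a real scheduler scores CPU, memory, network, and other resources. "Bin packing" here is a simple best-fit rule: place each task on the fullest host that still has room.

```python
import random

SLOTS_PER_HOST = 4
NUM_HOSTS = 12


def assign_random(num_tasks, rng):
    """Place each task on a uniformly random host that still has a free slot."""
    used = [0] * NUM_HOSTS
    for _ in range(num_tasks):
        candidates = [h for h in range(NUM_HOSTS) if used[h] < SLOTS_PER_HOST]
        used[rng.choice(candidates)] += 1
    return used


def assign_bin_packed(num_tasks):
    """Best fit: place each task on the fullest host that still has a free slot."""
    used = [0] * NUM_HOSTS
    for _ in range(num_tasks):
        candidates = [h for h in range(NUM_HOSTS) if used[h] < SLOTS_PER_HOST]
        best = max(candidates, key=lambda h: used[h])
        used[best] += 1
    return used


def terminable(used):
    """Hosts running no tasks can be terminated without losing jobs."""
    return sum(1 for u in used if u == 0)


if __name__ == "__main__":
    tasks = NUM_HOSTS * SLOTS_PER_HOST // 4  # 25% utilization -> 12 tasks
    print("random:", terminable(assign_random(tasks, random.Random(42))))
    print("bin packed:", terminable(assign_bin_packed(tasks)))
```

With best fit, the 12 tasks fill exactly 3 hosts, so 9 of the 12 hosts are empty and can be terminated, matching the ideal case above; random placement typically leaves tasks spread over many more hosts.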
EC2 ASG attributes for setting the number of servers in a cluster
EC2 AutoScalingGroups have three attributes to set:
● Min - minimum number of instances to have
● Max - maximum number of instances to have
● Desired - number of instances the group should have right now
Fenzo sets the “Desired” count based on demand
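A demand-driven "Desired" count has to stay within the ASG's Min and Max. The sketch below assumes a hypothetical rule (hosts already busy, plus enough hosts for pending tasks, plus a small idle buffer); `desired_count`, `AsgBounds`, and the `idle_buffer` parameter are illustrative names, not Fenzo's API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AsgBounds:
    min_size: int
    max_size: int


def desired_count(busy_hosts: int, pending_slots: int, slots_per_host: int,
                  idle_buffer: int, bounds: AsgBounds) -> int:
    """Hypothetical demand rule: hosts already running tasks, plus enough hosts
    for the pending tasks, plus an idle buffer, clamped to the ASG's [Min, Max]."""
    hosts_for_pending = math.ceil(pending_slots / slots_per_host)
    wanted = busy_hosts + hosts_for_pending + idle_buffer
    return max(bounds.min_size, min(bounds.max_size, wanted))


if __name__ == "__main__":
    bounds = AsgBounds(min_size=2, max_size=20)
    # 3 busy hosts + ceil(10 / 4) = 3 hosts for pending work + 2 idle hosts
    print(desired_count(3, 10, 4, 2, bounds))
```

The resulting value could then be written to the group, for example with the AWS SDK's `set_desired_capacity` call on an Auto Scaling client; the clamp ensures a burst of demand never pushes Desired past Max.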
EC2 AutoScalingGroup for Mesos agents
(Diagram sequence: the agent ASG's Desired count moves between its Min and Max as demand changes.)
Using multiple instance types
Amazon EC2 provides a variety of servers, a.k.a. “instance types”: https://aws.amazon.com/ec2/instance-types/
Algorithm model training jobs run well on memory-optimized R3 instances
Typical services run well on balanced compute M4 instances
How do we use multiple EC2 instance types in the same Mesos agent cluster?
Using multiple EC2 instance types
(Diagram: Titus schedules onto an m4.4xlarge agent ASG and an r3.8xlarge agent ASG.)
Grouping agents by instance type lets us autoscale them independently
(Diagram: Titus places user jobs requesting 2 CPUs and 5 GB memory, 8 CPUs and 8 GB memory, and 1 CPU and 1 GB memory onto the m4.4xlarge and r3.8xlarge agent ASGs.)
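Because each ASG holds a single instance type, scale-down decisions can be made per group. This minimal sketch groups idle agents by the ASG they belong to; `Agent`, its `asg` field, and `idle_hosts_by_asg` are hypothetical names, and how an agent learns its ASG (for example, a Mesos agent attribute) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Agent:
    hostname: str
    asg: str            # hypothetical: the ASG (one instance type) this agent belongs to
    running_tasks: int


def idle_hosts_by_asg(agents: List[Agent]) -> Dict[str, List[str]]:
    """Group idle agents per ASG so each group can be scaled down on its own."""
    idle: Dict[str, List[str]] = defaultdict(list)
    for agent in agents:
        if agent.running_tasks == 0:
            idle[agent.asg].append(agent.hostname)
    return dict(idle)


if __name__ == "__main__":
    fleet = [
        Agent("h1", "m4.4xlarge-asg", 0),
        Agent("h2", "m4.4xlarge-asg", 3),
        Agent("h3", "r3.8xlarge-asg", 0),
    ]
    print(idle_hosts_by_asg(fleet))
```

Each list is a scale-down candidate set for its own ASG, so, for instance, the r3.8xlarge group can shrink while the m4.4xlarge group stays the same size.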
Continuous deployment of agents
A new version of the agent introduces a new ASG:
● m4.4xlarge agent ASG v1 runs the current agents
● m4.4xlarge agent ASG v2 is added with the new agent version
● ASG v1 is disabled, so no new tasks land on it
● Tasks are migrated from v1 to v2
● The old agent ASG (v1) is removed, leaving only v2
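The disable, migrate, remove sequence can be sketched as a small procedure. `AgentGroup` and `roll_agents` are hypothetical names; "migrating" here only moves a task ID between lists, whereas a real cluster would terminate the task on a v1 agent and relaunch it on v2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentGroup:
    name: str
    enabled: bool = True
    tasks: List[str] = field(default_factory=list)


def roll_agents(old: AgentGroup, new: AgentGroup) -> List[str]:
    """Replace `old` with `new` without losing tasks; returns the steps taken."""
    steps = []
    # 1. Disable: the scheduler stops placing new tasks on the old ASG.
    old.enabled = False
    steps.append(f"disable {old.name}")
    # 2. Migrate: move every running task over to the new ASG.
    while old.tasks:
        task = old.tasks.pop(0)
        new.tasks.append(task)
        steps.append(f"migrate {task}")
    # 3. Remove: the old ASG is now empty and can be deleted.
    steps.append(f"remove {old.name}")
    return steps


if __name__ == "__main__":
    v1 = AgentGroup("m4.4xlarge-v1", tasks=["t1", "t2"])
    v2 = AgentGroup("m4.4xlarge-v2")
    for step in roll_agents(v1, v2):
        print(step)
```

Disabling before migrating matters: otherwise the scheduler could keep placing fresh tasks on v1 while it is being drained.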
Bringing it all together...
(Diagram: Titus schedules onto an m4.4xlarge agent ASG and an r3.8xlarge agent ASG, each with a v1 and a v2 version during an agent deployment.)
Capacity guarantees for varied applications
The capacity guarantee challenge
Demand for resources > Supply
An execution sample from a cluster
(Chart: running #tasks and tasks launched over time, as a new batch of tasks arrives.)
The new tasks wait for agents to free up, or for new agents from a scale-up.
The scale-up and freed agents satisfy all of the new pending tasks.
What if a service were launched at this point? It too would wait for agents to free up, or for new agents from a scale-up.
Capacity guarantees
● Guarantee agreed-upon capacity for timely job starts
● Mesos support for quotas, etc. is evolving
Generally, optimize throughput for batch jobs and start latency for service jobs
Some service-style jobs may be less important
Categorize by expected behavior instead
Critical versus Flex (flexible) scheduling requirements
(Diagram: Critical and Flex capacity managed via quotas.)
(Diagram: quotas vs. priorities for Critical and Flex; with priorities, the resource allocation order serves Critical before Flex.)
Capacity guarantees: hybrid view
(Diagram: Critical apps AppC1, AppC2, AppC3, …, AppCN and Flex apps AppF1, AppF2, AppF3, …, AppFN; the resource allocation order serves the Critical tier before the Flex tier.)
Capacity guarantees via Fenzo
Fenzo supports multi-tiered task queues
Multiple “buckets” per tier with “fair sharing” by dominant resource usage
(Diagram: Tier 0 and Tier 1 task queues, each holding multiple buckets.)
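The tiered queues with dominant-resource fair sharing can be illustrated with a toy scheduler: tiers are served in order (tier 0 first), and within a tier the bucket with the lowest dominant share (its largest fraction of any single resource, as in Dominant Resource Fairness) goes next. This is a hypothetical sketch, not Fenzo's queue implementation; `Bucket`, `schedule`, and the two-resource model are assumptions, and it is head-of-line: a bucket whose next task does not fit is skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, List

RESOURCES = ("cpus", "memory_gb")


@dataclass
class Bucket:
    name: str
    pending: List[Dict[str, float]]  # queued task requests, in arrival order
    used: Dict[str, float] = field(default_factory=lambda: {r: 0.0 for r in RESOURCES})


def dominant_share(bucket: Bucket, total: Dict[str, float]) -> float:
    # DRF: a bucket's share is its largest fraction of any single resource.
    return max(bucket.used[r] / total[r] for r in RESOURCES)


def schedule(tiers: List[List[Bucket]], total: Dict[str, float]) -> List[str]:
    """Serve tiers in order; within a tier, serve the bucket with the lowest
    dominant share whose next task still fits in the free resources."""
    free = dict(total)
    launched: List[str] = []
    for tier in tiers:
        while True:
            ready = [b for b in tier
                     if b.pending and all(b.pending[0][r] <= free[r] for r in RESOURCES)]
            if not ready:
                break
            bucket = min(ready, key=lambda b: dominant_share(b, total))
            task = bucket.pending.pop(0)
            for r in RESOURCES:
                free[r] -= task[r]
                bucket.used[r] += task[r]
            launched.append(bucket.name)
    return launched


if __name__ == "__main__":
    total = {"cpus": 10, "memory_gb": 200}
    tier0 = [Bucket("A", [{"cpus": 2, "memory_gb": 10}] * 3)]
    tier1 = [Bucket("B", [{"cpus": 1, "memory_gb": 50}] * 3),
             Bucket("C", [{"cpus": 1, "memory_gb": 5}] * 3)]
    print(schedule([tier0, tier1], total))
```

In the example, tier 0's bucket A is fully served first. In tier 1, B's first task makes memory its dominant resource (share 0.25), so the lighter bucket C is served until its share catches up, and the CPUs run out before B's second task.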
Translating application capacity to EC2 instances
● Define per-application capacity guarantees
● Define per-tier capacity guarantees
● Translate to number of EC2 instances
Defining application capacity
App1-cap = num_app_instances * app_instance_dimensions
app_instance_dimensions: { #cpus, memory, disk, network }
Application capacity is agnostic to EC2 instance types
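The application capacity formula is a scalar times a resource vector. A minimal sketch, assuming memory and disk in GB and network in Mbps; `Dimensions` and `app_capacity` are illustrative names, and the per-instance values in the example are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    """Resource vector; independent of any EC2 instance type."""
    cpus: float
    memory_gb: float
    disk_gb: float
    network_mbps: float

    def times(self, n: int) -> "Dimensions":
        return Dimensions(self.cpus * n, self.memory_gb * n,
                          self.disk_gb * n, self.network_mbps * n)


def app_capacity(num_app_instances: int, app_instance_dimensions: Dimensions) -> Dimensions:
    # App-cap = num_app_instances * app_instance_dimensions, per resource.
    return app_instance_dimensions.times(num_app_instances)


if __name__ == "__main__":
    per_instance = Dimensions(cpus=2, memory_gb=5, disk_gb=10, network_mbps=100)
    print(app_capacity(10, per_instance))
```

Nothing in `Dimensions` names an instance type, which is what lets the same guarantee be met later by whichever EC2 types the tier uses.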
Applications specify resource needs, not EC2 instance types:
● Capacity guarantees can be managed using a variety of instance types
● Eases migration to new instance types, which helps capacity procurement teams
Defining Tier capacity
Tier_capacity = App1-cap + App2-cap + … + AppN-cap + BUFFER
BUFFER:
● Accommodates some new or ad hoc jobs with no guarantees
● Absorbs red-black pushes of services, which temporarily double capacity
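The tier formula adds the application capacity vectors resource by resource and then adds the buffer. A minimal sketch with illustrative names; sizing the buffer as one extra copy of the largest application's capacity (to absorb one red-black push at a time) is an assumption made for the example, not a rule from the talk.

```python
from typing import Dict, Iterable

RESOURCES = ("cpus", "memory_gb", "disk_gb", "network_mbps")
Vector = Dict[str, float]


def add(a: Vector, b: Vector) -> Vector:
    return {r: a[r] + b[r] for r in RESOURCES}


def tier_capacity(app_caps: Iterable[Vector], buffer: Vector) -> Vector:
    """Tier_capacity = App1-cap + ... + AppN-cap + BUFFER, per resource."""
    total: Vector = {r: 0 for r in RESOURCES}
    for cap in app_caps:
        total = add(total, cap)
    return add(total, buffer)


if __name__ == "__main__":
    app1 = {"cpus": 20, "memory_gb": 50, "disk_gb": 100, "network_mbps": 1000}
    app2 = {"cpus": 8, "memory_gb": 64, "disk_gb": 50, "network_mbps": 500}
    # Assumed buffer: room for one red-black push of the larger service (app1).
    print(tier_capacity([app1, app2], buffer=app1))
```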
Translate to number of instances
#EC2_instances = Tier_capacity / EC2_instance_dimensions
A tier may use multiple instance types:
● Critical = { m4.4xlarge, m3.2xlarge }
● Flex = { r3.8xlarge, g2.8xlarge }
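Dividing a capacity vector by an instance's dimensions only makes sense per resource: the instance count is driven by whichever resource runs out first, rounded up to a whole instance. The sketch below does this for CPUs and memory using published AWS specs for the instance types named above; disk and network would be handled the same way. The result is a lower bound, since bin-packing fragmentation can require more instances.

```python
import math
from typing import Dict

# Published EC2 specs (vCPUs, memory in GiB) for the instance types named above.
INSTANCE_DIMENSIONS: Dict[str, Dict[str, float]] = {
    "m4.4xlarge": {"cpus": 16, "memory_gb": 64},
    "m3.2xlarge": {"cpus": 8, "memory_gb": 30},
    "r3.8xlarge": {"cpus": 32, "memory_gb": 244},
    "g2.8xlarge": {"cpus": 32, "memory_gb": 60},
}


def instances_needed(tier_capacity: Dict[str, float], instance_type: str) -> int:
    """Divide per resource, take the largest ratio (the binding resource), round up."""
    dims = INSTANCE_DIMENSIONS[instance_type]
    return math.ceil(max(tier_capacity[r] / dims[r] for r in tier_capacity))


if __name__ == "__main__":
    tier = {"cpus": 100, "memory_gb": 500}
    for itype in ("m4.4xlarge", "r3.8xlarge"):
        print(itype, instances_needed(tier, itype))
```

For 100 CPUs and 500 GB, m4.4xlarge is memory-bound (500 / 64 ≈ 7.8, so 8 instances), while r3.8xlarge is CPU-bound (100 / 32 ≈ 3.1, so 4 instances). A tier that mixes types would split its capacity across them before applying this per type.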
Network resource and security groups
Container executor + augmented missing pieces for a MULTI-TENANT environment:
● IP per container
● Security: Security Groups, IAM roles
● Isolation for network bandwidth and disk I/O
Elastic Network Interfaces (ENI)
(Diagram: an AWS EC2 instance with ENI0 (IP0 to IP3), ENI1 (IP4 to IP7), and ENI2 (IP8 to IP11).)
● Each EC2 instance in a VPC has 2 or more ENIs
● Each ENI can have 2 or more IPs
● Security Groups are set on the ENI
ENI+IP resource allocation model
● A two-level resource modeled in Fenzo
● Each agent reports #ENIs and #IPs per ENI via a custom attribute
● Fenzo does allocation and usage tracking

Example agent state:
● ENI 1: assigned Security Group SG1, used IPs 2 of 7
● ENI 2: assigned Security Groups SG1, SG2, used IPs 1 of 7
● ENI 3: assigned Security Group SG3, used IPs 7 of 7
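The two-level ENI+IP resource can be sketched as: first look for an ENI that already carries exactly the requested security groups and has a free IP; otherwise claim an unassigned ENI and set its security groups. This is a hypothetical model, not Fenzo's code; `Eni`, `allocate`, and `release` are illustrative names, and freeing an ENI's security groups once its last IP is released is an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Eni:
    index: int
    max_ips: int
    security_groups: Optional[FrozenSet[str]] = None  # None = not yet assigned
    used_ips: int = 0


def allocate(enis: List[Eni], wanted: FrozenSet[str]) -> Optional[int]:
    """Return the index of the ENI that receives one more IP, or None if none fits."""
    # First choice: an ENI already carrying exactly the requested security groups.
    for eni in enis:
        if eni.security_groups == wanted and eni.used_ips < eni.max_ips:
            eni.used_ips += 1
            return eni.index
    # Otherwise, claim an unassigned ENI and set its security groups.
    for eni in enis:
        if eni.security_groups is None:
            eni.security_groups = wanted
            eni.used_ips = 1
            return eni.index
    return None  # this agent cannot host the task; try another agent


def release(enis: List[Eni], index: int) -> None:
    eni = next(e for e in enis if e.index == index)
    eni.used_ips -= 1
    if eni.used_ips == 0:
        eni.security_groups = None  # ENI becomes free for any security group set


if __name__ == "__main__":
    # The example agent state from the slide.
    agent = [
        Eni(1, 7, frozenset({"SG1"}), 2),
        Eni(2, 7, frozenset({"SG1", "SG2"}), 1),
        Eni(3, 7, frozenset({"SG3"}), 7),
    ]
    print(allocate(agent, frozenset({"SG1"})))  # reuses ENI 1
    print(allocate(agent, frozenset({"SG3"})))  # ENI 3 is full, no free ENI
```

Matching on the exact set of security groups matters because groups are set on the whole ENI: a task asking for SG1 alone must not share ENI 2, which also carries SG2.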
Plumbing VPC Networking into Docker
(Diagram: a Titus EC2 host VM with ENI1 (eth1, SecGrp=A), ENI2 (eth2, SecGrp=X), and ENI3 (eth3, SecGrp=Y,Z) providing container IPs 1 to 3. Tasks 0 to 3 each run an app in a pod root attached through a veth<id>: Task 0 has no IP (SecGrp A), two tasks share SecGrp X, and one uses SecGrp Y,Z. Linux policy-based routing plus traffic control connect each veth to its ENI, and requests to the non-routable metadata IP 169.254.169.254 are redirected by IPTables NAT to the Titus EC2 metadata proxy.)
Network bandwidth isolation
● Each container gets an IP on one of the ENIs
● Linux tc policies are applied on the virtual Ethernet interface, for both incoming and outgoing traffic
● Bandwidth is limited to the requested value
  ○ No borrowing of unused bandwidth
  ○ Easy to reason about
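One standard way to enforce a hard per-container cap with `tc` is an HTB class whose `ceil` equals its `rate`, so it can never borrow unused bandwidth. The talk does not say which qdisc Titus uses; this sketch only builds the command lines. On the host-side end of a container's veth pair, egress is traffic going into the container; capping the container's outgoing traffic would need the same class in the other direction (for example inside the container's namespace or on an ifb device), which is an assumption about setup. The interface name in the example is hypothetical.

```python
import shlex
from typing import List


def htb_limit_commands(dev: str, mbps: int) -> List[List[str]]:
    """Build `tc` argv lists that cap traffic leaving `dev` at `mbps` megabits/s.

    Setting ceil equal to rate means the class can never borrow unused bandwidth.
    """
    rate = f"{mbps}mbit"
    return [
        # Root HTB qdisc; unclassified traffic falls into class 1:10.
        ["tc", "qdisc", "add", "dev", dev, "root", "handle", "1:", "htb", "default", "10"],
        # Single class with a hard cap: rate == ceil, so no borrowing.
        ["tc", "class", "add", "dev", dev, "parent", "1:", "classid", "1:10",
         "htb", "rate", rate, "ceil", rate],
    ]


if __name__ == "__main__":
    # "veth-example" is a hypothetical host-side veth name for one container.
    for cmd in htb_limit_commands("veth-example", 500):
        print(shlex.join(cmd))  # as root, could run: subprocess.run(cmd, check=True)
```

Keeping `rate` and `ceil` equal is what makes the limit "easy to reason about": a container always gets exactly its requested bandwidth ceiling, regardless of how idle its neighbors are.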
Ongoing and future work
Current and future work
● Fine-grain capacity guarantees
  ○ Hierarchical sharing policies
  ○ Preemptions to satisfy priority tiers and sharing policies
● Execution environment security hardening
● Onboarding new applications
● Looking forward to working with the community
In Summary...
Mesos and Fenzo help us run lots of containers
● In an elastic fashion
● With guaranteed capacity for varied applications
● Custom AWS integration gives us network resource isolation and security groups
Questions?
Sharma Podila: spodila @ netflix . com, @podila, linkedin . com / in / spodila