Building an on premise Kubernetes cluster - USENIX
Danny Turner
Outline
What is K8s?
Why (not) run k8s?
Why run our own cluster?
Building what the public cloud provides
Kubernetes
• Open-Source Container Management Platform
• Deploying
• Scaling
• Share Hardware
• Service Discovery
• Configuration Management
Kubernetes Terms
• Node
  • Server
• Pod
  • 1 or more containers
  • Redis
  • Rails & nginx
• Service
  • DNS name for 1 or more pods
• Ingress
  • Bridge into the cluster
[Diagram: a Node containing Pod1, Redis, Rails, Nginx, My-App]
[Diagram: Node1 and Node2, each with Pod1, Redis, Rails, Nginx, My-App; Service DNS name: my-app.my-namespace]
[Diagram: Internet, Ingress, Cluster, my-app; Node1 and Node2, each with Pod1, Redis, Rails, Nginx, My-App]
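To make the "DNS name for 1 or more pods" idea concrete, here is a minimal Python sketch of the names a Service can be reached by from inside a cluster. It assumes the default cluster domain `cluster.local`; the helper name `service_dns_names` is hypothetical and only for illustration.

```python
from __future__ import annotations


def service_dns_names(service: str, namespace: str,
                      cluster_domain: str = "cluster.local") -> list[str]:
    """Return the in-cluster DNS names for a Service, shortest first."""
    return [
        service,                                        # works from pods in the same namespace
        f"{service}.{namespace}",                       # the form shown on the slide
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",  # fully qualified name
    ]


names = service_dns_names("my-app", "my-namespace")
print(names[1])   # my-app.my-namespace
print(names[-1])  # my-app.my-namespace.svc.cluster.local
```

The short forms resolve because each pod's DNS search path includes the namespace and cluster domain suffixes.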
Why Kubernetes
• We already use containers
• We have our container management system
• Only runs our monolith
• Scaling unit is a host
• Not open source
Why not run K8s
• Long running Jobs
• DB migration
• Fixed scheduling assumptions
• Number of workers per server
• Exposing internal services to external tools
• Stateful services like redis/DBs
Why build our own
• We have 2 data centers filled with hardware
• Cloud Pricing might not be competitive at scale
• Hard to determine op-ex of running a DC
• DC network is closed to the outside world
• Don’t have secure communication between servers
• One change at a time
• Stay co-located with databases
• Easy to connect to resources outside of k8s but in the DC
• Security & Privacy
• DC doesn’t need secure communication between servers
• Trusting our data in 3rd party hands
On Premise work
• Master Node
• ETCD
• Networking & Ingress
• Persistent Storage
Master Components
• Assigns pods to nodes
• IPs to pods and services
• Health Checks
• Cluster is frozen w/o master node
• Cluster won't change itself
• External events can still change it
[Diagram: Master Node running API Server, Controller Manager, Scheduler]
(High) Availability Strategies
• Start a new one after detecting a failure
• Bottleneck: time to spin up a new master node
• Run multiple at once
• Components are stateless and have leader election built-in
• Bottleneck: failover strategy
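The "run multiple at once" option relies on leader election. The sketch below is a toy, in-memory lease in Python, meant only to show why failover time is bounded by the lease duration. The `Lease` class, `try_acquire` helper, and the 15 second TTL are assumptions for illustration; a real cluster does this with an atomic update on a shared object, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    expires_at: float = 0.0


def try_acquire(lease: Lease, candidate: str, now: float, ttl: float = 15.0) -> bool:
    """Take or renew the lease if it is free, already ours, or expired."""
    if lease.holder in (None, candidate) or now >= lease.expires_at:
        lease.holder = candidate
        lease.expires_at = now + ttl
        return True
    return False


lease = Lease()
print(try_acquire(lease, "master1", now=0.0))   # True: lease was free
print(try_acquire(lease, "master2", now=5.0))   # False: master1 still holds it
print(try_acquire(lease, "master1", now=10.0))  # True: renewal, now expires at 25.0
# master1 stops renewing here
print(try_acquire(lease, "master2", now=20.0))  # False: lease has not expired yet
print(try_acquire(lease, "master2", now=26.0))  # True: master2 takes over
```

In this model the standby waits up to one full TTL after the last renewal, which is the failover cost of running several masters at once.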
Multi-Master
• CNAME your master
• Bottleneck: DNS propagation / timeouts
• Send requests to all the masters
• ECMP to a Virtual-IP via an A-Record
• Health checks on your masters!
• Bottleneck: time to withdraw from ECMP group
[Diagram: master.k8s.example.com pointing at master1.k8s… and master2.k8s…, each running API, Controller, Scheduler]
[Diagram: master.k8s.example.com in front of two masters, each running API, Controller, Scheduler]
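A rough Python sketch of the ECMP approach: health checks decide which masters stay in the group, and each flow is hashed onto one of the remaining next hops. The function names, the failure threshold of 3, and the CRC32 hash are assumptions for illustration; real routers use their own hashing and withdrawal logic.

```python
import zlib


def healthy_masters(masters, failures, max_failures=3):
    """Keep masters whose consecutive failed health checks are below the limit."""
    return [m for m in masters if failures.get(m, 0) < max_failures]


def pick_next_hop(flow, next_hops):
    """Hash a flow identifier onto one of the equal-cost next hops."""
    if not next_hops:
        raise RuntimeError("no healthy masters left in the ECMP group")
    return next_hops[zlib.crc32(flow.encode()) % len(next_hops)]


masters = ["master1", "master2"]
failures = {"master1": 3}  # master1 has failed three checks in a row
group = healthy_masters(masters, failures)
print(group)                                   # ['master2']
print(pick_next_hop("10.0.0.5:51234", group))  # master2
```

Until a failed master is withdrawn from the group, some flows still hash to it, which is the bottleneck the slide points at.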
ETCD
• K8s data lives here
• Quorum is life
• k8s frozen when quorum is lost
• Can be run on the master nodes
• Limits scaling
• Makes the servers pets not cattle
[Diagram: Master Node running API, Controller, Scheduler]
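The quorum rule behind "k8s frozen when quorum is lost" is simple majority arithmetic. A small Python sketch (the function names are hypothetical):

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with this many members."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), failures_tolerated(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

An even member count adds no fault tolerance over the odd count below it, which is why clusters of 3 or 5 are the usual choice.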
• Member discovery
• Static configs
• chef searches
• SRV Records: --discovery-srv etcd.example.com
• Backups
• Live snapshots
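With `--discovery-srv`, etcd finds its peers by looking up `_etcd-server._tcp.<domain>` (or `_etcd-server-ssl._tcp.<domain>` for TLS) SRV records. The Python sketch below generates zone-file lines for such records. The member hostnames, the TTL of 300, and the helper name are assumptions for illustration; 2380 is etcd's default peer port.

```python
def etcd_srv_records(domain, hosts, port=2380, ttl=300, tls=False):
    """Build zone-file SRV lines that etcd DNS discovery would look up."""
    service = "_etcd-server-ssl" if tls else "_etcd-server"
    name = f"{service}._tcp.{domain}."
    return [f"{name} {ttl} IN SRV 0 0 {port} {host}." for host in hosts]


# Hypothetical member hostnames under the domain from the slide.
members = ["etcd1.etcd.example.com", "etcd2.etcd.example.com", "etcd3.etcd.example.com"]
for line in etcd_srv_records("etcd.example.com", members):
    print(line)
# _etcd-server._tcp.etcd.example.com. 300 IN SRV 0 0 2380 etcd1.etcd.example.com.
# _etcd-server._tcp.etcd.example.com. 300 IN SRV 0 0 2380 etcd2.etcd.example.com.
# _etcd-server._tcp.etcd.example.com. 300 IN SRV 0 0 2380 etcd3.etcd.example.com.
```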
Ingress
• Bridge between the internet and a service
• Ingress Controller + nginx
• Each deploy caused nginx to reload
• We already have a load balancing tier
[Diagram: Internet, Load Balancers, Ingress, my-app; Node1 and Node2 running Pod1, Redis, Rails, Nginx, My-App]
• Services can be exposed on every host at a known port
[Diagram: Load Balancers, kube-proxy on each node, pods (Redis, Rails, Nginx, My-App)]
• Route directly to pods
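Exposing a Service "on every host at a known port" corresponds to a NodePort-style setup, where an existing load balancing tier can simply list every node at that port. A minimal Python sketch, assuming the default NodePort range of 30000-32767; the node IPs, port 30080, and function name are hypothetical.

```python
NODE_PORT_RANGE = range(30000, 32768)  # Kubernetes default NodePort range


def nodeport_backends(node_ips, node_port):
    """List the host:port pairs a load balancer would send traffic to."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"{node_port} is outside the default NodePort range")
    return [f"{ip}:{node_port}" for ip in node_ips]


print(nodeport_backends(["10.0.0.11", "10.0.0.12"], 30080))
# ['10.0.0.11:30080', '10.0.0.12:30080']
```

Routing directly to pods would replace this static node list with the current pod IPs, which change on every deploy.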
Persistent Storage (Volumes)
• Persistent Volume Claims
• Distributed Storage System
• GlusterFS / Ceph RBD
• Same nodes as k8s Cluster
• Better use of hardware
• Servers are pets once again
• Just buy a SAN?
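For reference, a Persistent Volume Claim is a small manifest that asks the cluster for storage of a given size and class. The Python sketch below builds one as a dictionary and prints it as JSON, which kubectl also accepts. The claim name `redis-data`, the size, and the storage class name `ceph-rbd` are hypothetical; the class would have to exist in the cluster.

```python
import json


def pvc_manifest(name, size, storage_class):
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],       # mounted read-write by one node
            "storageClassName": storage_class,      # hypothetical class name
            "resources": {"requests": {"storage": size}},
        },
    }


print(json.dumps(pvc_manifest("redis-data", "10Gi", "ceph-rbd"), indent=2))
```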
Successful Failure
• We ran production traffic on our on-premise cluster
• Yet, we decided to use the cloud instead
• Upgrades were painful
• Solving a lot of problems ourselves
• We were becoming experts at more things not less
Questions?
Danny Turner
Check out our blog at engineering.shopify.com
Follow us on Twitter at @shopifyeng
Networking
• All to all communication
• Pod & Service IPs
• Routing
• Calico (Software BGP)
• BGP Peer with top of rack switches
• 1 peer per server
• Calico custom filters
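One way to picture pod routing with BGP: each server owns a block of pod IPs and advertises it to its top of rack switch, so the switch learns "block via node". The Python sketch below carves a pod CIDR into per-node blocks. The pool `192.168.0.0/16`, the /26 block size, the node IPs, and the one-block-per-node simplification are assumptions for illustration; Calico allocates blocks dynamically and a node may hold several.

```python
import ipaddress


def assign_blocks(pod_cidr, nodes, block_prefix=26):
    """Give each node one pod IP block; returns {block: node} as a route table."""
    blocks = ipaddress.ip_network(pod_cidr).subnets(new_prefix=block_prefix)
    return {str(block): node for node, block in zip(nodes, blocks)}


routes = assign_blocks("192.168.0.0/16", ["10.0.0.11", "10.0.0.12"])
for block, node in routes.items():
    print(f"{block} via {node}")
# 192.168.0.0/26 via 10.0.0.11
# 192.168.0.64/26 via 10.0.0.12
```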