Networking approaches in a Container World
Who we are

Antoni Segura Puimedon, Principal Software Engineer
Flavio Castelli, Engineering Manager
Neil Jerram, Senior Software Engineer
Disclaimer
● There are many container engines; we'll focus on Docker
● Multiple networking solutions are available:
    ○ Introduce the core concepts
    ○ Many projects → cover only some of them
● Container orchestration engines:
    ○ Often coupled with networking
    ○ Focus on Docker Swarm and Kubernetes
● Remember: the container ecosystem moves at a fast pace, things can suddenly change
The problem
● Containers are lightweight
● Containers are great for microservices
● Microservices: multiple distributed processes communicating
● Lots of containers that need to be connected together
Single host
host networking
Containers have full access to the host interfaces!!!
[Diagram: container-a sharing the host's interfaces (eth0, lo, ...)]
host networking
Containers are able to:

● See all host interfaces
● Use all host interfaces

Containers can't (without extra Linux capabilities, e.g. CAP_NET_ADMIN):

● Modify their IP addresses
● Modify their IP routes
● Create virtual devices
● Interact with iptables/ebtables
$ docker run --net=host -it --rm alpine /bin/sh
/ # ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: wlp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether e4:b3:18:d2:f6:ea brd ff:ff:ff:ff:ff:ff
3: enp0s31f6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c8:5b:76:36:b6:0b brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
/ #
Bridged networking
● Linux bridge
● Containers connected to the bridge with veth pairs
● Each container gets its own IP and kernel networking namespace
● Containers can talk to each other and to the host via IP
[Diagram: host with eth0 and the docker0 bridge (172.17.0.0/16) doing forwarding; container-a attached through the veth0/veth1 pair, container-b through veth2/veth3]
Bridged networking
● Outbound connectivity via IP forwarding and masquerading
● The bridge and containers use a private subnet
$ ip address show dev docker0
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:7e:62:3d:37 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

$ sudo iptables -t nat -L POSTROUTING
Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE all  --  172.17.0.0/16        anywhere
$ docker run --net=bridge -it --rm alpine /bin/sh -c '/sbin/ip -4 address show dev eth0; ip -4 route show'
50: eth0@if51: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state UP
    inet 172.17.0.2/16 scope global eth0
       valid_lft forever preferred_lft forever
default via 172.17.0.1 dev eth0
172.17.0.0/16 dev eth0 src 172.17.0.2
Bridged networking
● Services are exposed with iptables DNAT rules
● iptables performance deteriorates as the number of rules increases
● Limited by how many host ports are free to be bound
$ docker run --net=bridge -d --name nginx -p 8000:80 nginx
$ sudo iptables -t nat -n -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  0.0.0.0/0            0.0.0.0/0            ADDRTYPE match dst-type LOCAL

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  0.0.0.0/0            !127.0.0.0/8         ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE all  --  172.17.0.0/16        0.0.0.0/0
MASQUERADE tcp  --  172.17.0.2           172.17.0.2           tcp dpt:80

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  0.0.0.0/0            0.0.0.0/0
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            tcp dpt:8000 to:172.17.0.2:80
Multi host
Multi host networking scenarios

[Diagram: three hosts (host-A, host-B, host-C), each with its own eth0, running container-01 to container-06; the containers are grouped into frontend, application and database networks that span the hosts]
Multi host networking scenarios

[Diagram: one big host-A with a single eth0 running VM-1, VM-2 and VM-3; container-01 to container-06 run inside the VMs, grouped into frontend, application and database networks]
Multi host routing solutions
Routing approach
● Manages a common IP space at the container level
● Assigns a /24 subnet to each host
● Inserts routes to each host's /24 into the routing table of every host
● Main implementations:
    ○ Calico
    ○ Flannel
    ○ Romana
    ○ Kuryr
        ■ Calico
[Diagram: host-a (eth0 172.16.0.4/16) with docker0 10.0.8.1/24 and containers 10.0.8.2/24 and 10.0.8.3/24; host-b (eth0 172.16.0.5/16) with docker0 10.0.9.1/24 and containers 10.0.9.2/24 and 10.0.9.3/24; both hosts sit on the 172.16.0.0/16 network]
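As a rough sketch of what this looks like on a host (addresses taken from the diagram above; the default gateway is an assumption and the exact routes depend on the implementation), host-a carries one route per peer host's /24 on top of its local bridge subnet:

$ ip route show
default via 172.16.0.1 dev eth0
10.0.8.0/24 dev docker0 proto kernel scope link src 10.0.8.1   # local containers
10.0.9.0/24 via 172.16.0.5 dev eth0                            # containers on host-b
172.16.0.0/16 dev eth0 proto kernel scope link src 172.16.0.4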
Calico’s approach
● Felix: an agent per node that sets up a vRouter:
    ○ Kernel's L3 forwarding
    ○ Handles ACLs with iptables
    ○ Uses BIRD's BGP to keep the /32 or /128 routes to each container updated
    ○ Etcd as data store
    ○ Replies to container ARP requests with the host hwaddr
[Diagram: same two hosts as above, each running a BGP vRouter; host-a (eth0 172.16.0.4/16, docker0 10.0.8.1/24) with container-a 10.0.8.2/24 and container-b 10.0.8.3/24; host-b (eth0 172.16.0.5/16, docker0 10.0.9.1/24) with container-c 10.0.9.2/24 and container-d 10.0.9.3/24]
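To make this concrete, here is a hedged illustration of the routes such a vRouter might program on host-a (interface names and the exact route format are examples; the real output depends on the Calico version and configuration):

$ ip route show
10.0.8.2 dev cali1a2b3c4d scope link            # local container-a, /32 route
10.0.8.3 dev cali5e6f7a8b scope link            # local container-b, /32 route
10.0.9.2 via 172.16.0.5 dev eth0 proto bird     # container-c on host-b, learned via BGP
10.0.9.3 via 172.16.0.5 dev eth0 proto bird     # container-d on host-b, learned via BGP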
Flannel approach
● Flanneld agent
    ○ Etcd as data store
    ○ Keeps /24 routes to hosts up to date
    ○ No ACLs/isolation
Canal
● Combines Calico (policy) with flannel (networking)
● Developed by Tigera
● Announced on May 9th 2016
Multi host overlay solutions
● Encapsulates multiple networks over the physical network
    ○ UDP
    ○ VXLAN
    ○ Geneve
    ○ GRE
● Connects containers to virtual networks
● Main projects
    ○ Docker's native overlay
    ○ Flannel
    ○ Weave
    ○ Kuryr
        ■ OVS (OVN, Dragonflow)
        ■ MidoNet
        ■ PLUMgrid
Overlay approach
[Diagram: host-a (eth0 172.16.0.4/16) with net-x 10.0.8.1/24 (container-a 10.0.8.2/24, container-b 10.0.8.3/24) and net-y 10.0.7.1/24 (container-c 10.0.7.2/24); host-b (eth0 172.16.0.5/16) with net-y 10.0.9.1/24 (container-c 10.0.7.4/24, container-d 10.0.7.3/24); container traffic between the hosts travels encapsulated over the 172.16.0.0/16 physical network]
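As an illustrative sketch of the overlay approach with Docker's native driver (network and service names follow the diagram above; this assumes a Docker 1.12+ Swarm mode cluster):

$ docker network create -d overlay --subnet 10.0.7.0/24 net-y
$ docker service create --name container-c --network net-y alpine sleep 86400
# containers attached to net-y on different hosts reach each other over the encapsulated overlay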
OpenStack & containers with Kuryr
● Allows you to have VMs, containers and containers-in-VMs in the same overlay
● Allows reusing VM networks for containers and vice versa
● Allows you to have separate overlay networks routed to each other
● Isolation from the host networking
● Can have Swarm and Kubernetes on the same overlay
[Diagram: container and VM overlay networks running on top of the shared underlay]
Routing vs Overlay
Routing
    Good:
    ● Native performance
    ● Easy debugging
    Bad:
    ● Requires control over the infrastructure
    ● Hybrid cloud is more complicated (requires VPN)
    ● Can run out of addresses (mitigation: IPv6)

Overlay
    Good:
    ● Easier inter-cloud
    ● Easier hybrid workloads
    ● Doesn't require control over the infrastructure
    ● More implementation choice
    Bad:
    ● Inferior performance (mitigations: hardware acceleration and jumbo frames)
    ● Debugging is more complicated
Competing COE-networking interaction models
Container Network Model (CNM)
● Implemented by Docker's Libnetwork
● Separate IPAM and remote drivers
● Docker ≥ 1.12 Swarm mode only works with the native overlay driver
● Some of the Libnetwork remote drivers:
    ○ OpenStack Kuryr
    ○ Calico
    ○ Weave
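With a remote driver registered, networks keep being managed through the normal Docker CLI; a minimal sketch ("some-remote-driver" is a placeholder, not a real driver name):

$ docker network create -d some-remote-driver --subnet 10.0.8.0/24 net-x
$ docker run -d --net=net-x --name container-a alpine sleep 86400
# Libnetwork delegates network creation, endpoint plugging and IPAM to the remote driver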
Container Network Interface (CNI)
● Implemented by Kubernetes, rkt, Mesos, Cloud Foundry and Kurma
● Plugins:
    ○ Calico
    ○ Flannel
    ○ Weave
    ○ OpenStack Kuryr (unreleased)
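For comparison, a minimal CNI network configuration using the reference "bridge" and "host-local" plugins (file name, bridge name and subnet are just examples):

$ cat /etc/cni/net.d/10-mynet.conf
{
    "cniVersion": "0.2.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "ipam": {
        "type": "host-local",
        "subnet": "10.0.8.0/24"
    }
}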
More challenges
Service discovery
● Producer: a container that runs a service
● Consumer: a container that consumes a service
● Need a way for consumers to find producer endpoints
Service discovery challenges
#1 Finding the producer
[Diagram: web-01 on host-A asking "Where is redis?"; redis-01 runs on host-B, but without service discovery web-01 cannot find it]
Service discovery challenges
#2 Moving services
[Diagram: web-01 on host-A; redis-01 on host-B is replaced by redis-02 on host-C]
Service discovery challenges
#3 Multiple choice
[Diagram: web-01 on host-A asking "Which redis?"; redis-01 on host-B, redis-02 on host-C and redis-03 on host-D all offer the service]
Addressing service discovery
Use DNS
● Problematic for highly dynamic deployments:
    ○ Containers can die or be moved more often than DNS caches expire
    ○ If we try to improve this by reducing the DNS TTL → more load on the server
    ○ Some clients ignore the TTL → old entries stay cached

Note well:
● Docker < 1.11: updates /etc/hosts dynamically
● Docker ≥ 1.11: integrates a DNS server
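A quick sketch of the embedded DNS behaviour on a user-defined network (names are examples):

$ docker network create app-net
$ docker run -d --net=app-net --name redis redis
$ docker run --net=app-net --rm alpine nslookup redis   # resolved by Docker's embedded DNS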
Key-value store
● Rely on a k/v store
    ○ etcd
    ○ consul
    ○ zookeeper
● Producers register their IP and port
● The orchestration engine hands this data to the consumer
● At run time either:
    ○ Change your application to read the data straight from the k/v store
    ○ Rely on some helper that exposes the values via an environment or configuration file
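For instance, with etcd as the store, a producer registration and lookup could look roughly like this (the key layout and address are made up for illustration):

$ etcdctl set /services/redis/redis-01 '{"host": "172.16.0.5", "port": 6379}'
$ etcdctl get /services/redis/redis-01
{"host": "172.16.0.5", "port": 6379}
$ etcdctl ls /services/redis
/services/redis/redis-01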
Changes, multiple choices & ingress traffic
Orchestration engine services
● Services get a unique and stable Virtual IP Address
● VIP always points to one of the service containers
● Consumers are pointed to the VIP
● Offered by Kubernetes and Docker 1.12+
● Can run in parallel to DNS for legacy apps
[Diagram: web-01 on host-A talks to the stable VIP of the "redis" service; the VIP points to one of redis-01 on host-B or redis-02 on host-C]
Ingress traffic: routing requests to an ever-changing container topology
Kubernetes has three service modes:

● ClusterIP: VIP for intra-cluster communication only (can use externalIPs)
● NodePort: like Docker 1.12+
● LoadBalancer: uses NodePort at the cluster level plus one of its pluggable load balancer drivers to instantiate and update external load balancers (GCE, AWS, OpenStack)
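A hedged example of creating a ClusterIP service for an existing redis deployment (names are examples; the output format varies between kubectl versions):

$ kubectl expose deployment redis --port=6379 --type=ClusterIP
$ kubectl get service redis
NAME      CLUSTER-IP    EXTERNAL-IP   PORT(S)    AGE
redis     10.3.240.12   <none>        6379/TCP   5s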
Docker 1.12+ service approach
● Define services using the --publish/-p flag
● Services get exposed on all cluster nodes on a specific port mapping (node_IP:service_port)
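A minimal sketch of this ingress publishing, matching the diagram below (the service name is an example):

$ docker service create --name blog --publish 8080:80 nginx
# port 8080 is now reachable on every node of the cluster,
# even on nodes that run no blog container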
[Diagram: an external load balancer in front of host-A, host-B and host-C; the guestbook service is published on port 8081 and the blog service on port 8080 on every node, regardless of where the containers actually run]
Load balanced ingress traffic flow
● Load balancer picks a host
● Traffic is handled by the cluster service
● Works even when the node chosen by the LB is not running the container
Recap
Not just a matter of connecting containers:
● Service discovery
● Handling changes & multiple choices
● Handling ingress traffic
Project     Approach            Spec
Calico      routing             CNI, CNM
Docker      overlay             CNM
Flannel     routing, overlay    CNI, CNM
Kuryr       routing, overlay    CNI, CNM
Weave       overlay             CNI, CNM
Q&A