Dropbox traffic infrastructure Oleg Guba SRE, Traffic-team oleg@dropbox.com

Post on 17-Aug-2020


Page 1: infrastructure - NGINX · Dropbox traffic infrastructure Oleg Guba SRE, Traffic-team oleg@dropbox.com. Half a billion of users 180+ countries Exabytes of users data Petabytes of metadata

Dropbox traffic infrastructure

Oleg Guba

SRE, Traffic-team

oleg@dropbox.com

Page 2

Dropbox scale

Half a billion users

180+ countries

Exabytes of user data

Petabytes of metadata

Millions of requests per second

Terabits of traffic

Page 3

Our Edge network

Page 4

Our Edge network

Page 5

Why latency matters?

Page 6

Why latency matters?

150 ms

4RTT + server time = 700ms

Page 7

Why latency matters?

20 ms

150 ms

4RTT (20 ms) + 1RTT (150 ms) + server time = 330ms
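
The two totals on these slides can be checked with a short sketch. The 100 ms server time is not stated on the slides; it is inferred from both totals (700 - 4 × 150 and 330 - 4 × 20 - 150). Reading 20 ms as the round trip from the user to a nearby PoP and 150 ms as the round trip to the datacenter is an assumption based on the rest of the talk, and the function names are hypothetical.

```python
def direct_ms(rtt_ms, server_ms, round_trips=4):
    """Every round trip goes all the way to the datacenter."""
    return round_trips * rtt_ms + server_ms


def via_pop_ms(pop_rtt_ms, dc_rtt_ms, server_ms, round_trips=4):
    """Round trips terminate at a nearby PoP; one extra trip reaches the datacenter."""
    return round_trips * pop_rtt_ms + dc_rtt_ms + server_ms


SERVER_MS = 100  # inferred from the slide totals, not stated on the slides

print(direct_ms(150, SERVER_MS))       # 700
print(via_pop_ms(20, 150, SERVER_MS))  # 330
```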

Page 8

Why latency matters?

[Figure: CWND over time, high latency vs. low latency]
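
The effect of round-trip time on the congestion window (CWND) can be sketched as follows. This is a simplified model, not taken from the slides: it assumes loss-free slow start in which CWND doubles once per RTT, and an initial window of 10 segments. The function name and the 640-segment target are illustrative.

```python
def time_to_reach_cwnd(target_segments, rtt_ms, initial_cwnd=10):
    """Milliseconds of slow start needed before CWND reaches the target."""
    cwnd = initial_cwnd
    elapsed_ms = 0.0
    while cwnd < target_segments:
        cwnd *= 2             # slow start: the window doubles every round trip
        elapsed_ms += rtt_ms  # and each doubling costs one RTT
    return elapsed_ms


for rtt_ms in (150, 20):
    print(rtt_ms, time_to_reach_cwnd(640, rtt_ms))
```

Reaching 640 segments from 10 takes six doublings in this model, so the time scales directly with the RTT: 900 ms at 150 ms RTT versus 120 ms at 20 ms RTT.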

Page 9

Why we want to be close to users

Reduce latency for interactive traffic

Increase throughput for bulk traffic

Page 10

Global server load balancing (GSLB)

Page 11

GSLB techniques

BGP Anycast

DNS

URL-based

Page 12

GSLB: Anycast

Page 13

GSLB: Anycast

[Diagram: the query "www.dropbox.com?" is answered with 1.2.3.1 (DNS box labelled GeoDNS); three PoPs each announce 1.2.3.0/24]

Page 14

GSLB: Anycast issues

Page 15

GSLB: Anycast performance

Page 16

GSLB: Anycast

Pros:

Simple

Automatic failover

Cons:

BGP is not latency-aware

Almost no traffic control

No graceful draining
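
Why "BGP is not latency-aware" matters can be shown with a toy comparison. This is a heavy simplification: real BGP best-path selection has more steps (local preference, MED, and others), and here it is reduced to "shortest AS path wins". The PoP names, path lengths, and latencies are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    pop: str            # hypothetical PoP name
    as_path_len: int    # number of AS hops to reach that PoP
    latency_ms: float   # measured round-trip time to that PoP


def bgp_choice(routes):
    """Simplified BGP: prefer the shortest AS path, ignoring latency."""
    return min(routes, key=lambda r: r.as_path_len)


def latency_choice(routes):
    """What a latency-aware system would pick instead."""
    return min(routes, key=lambda r: r.latency_ms)


routes = [Route("pop-a", 2, 120.0), Route("pop-b", 3, 25.0)]
print(bgp_choice(routes).pop)      # pop-a
print(latency_choice(routes).pop)  # pop-b
```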

Page 17

GSLB: Geo-DNS

Page 18

GSLB: Geo-DNS

[Diagram: each PoP announces its own prefix (1.2.1.0/24, 1.2.2.0/24, 1.2.3.0/24); GeoDNS answers "www.dropbox.com?" with 1.2.1.1 for one client and 1.2.2.1 for another]

Page 19

Geo-DNS: TTL is a lie

[Figure: TTL=60 seconds; timeline marked at t0, t0+15min, and t0+60min]

Page 20

GSLB: Geo-DNS

[Figure: country A, country B]

Page 21

GSLB: Geo-DNS

[Diagram: GeoDNS answers "www.dropbox.com?" with 1.2.1.1; each PoP announces its own prefix (1.2.1.0/24, 1.2.2.0/24, 1.2.3.0/24), and all three PoPs also announce 1.2.0.0/19]
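
One reading of this diagram is that the shared 1.2.0.0/19 acts as an anycast fallback: routers prefer the most specific prefix, so traffic for 1.2.1.1 goes to the PoP announcing 1.2.1.0/24 until that announcement disappears. The sketch below models that with longest-prefix matching. The PoP names and helper functions are hypothetical, and the fallback interpretation is an assumption rather than something the slide states.

```python
import ipaddress


def best_route(dst, announcements):
    """Return (prefix, pops) for the most specific prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, pops) for net, pops in announcements.items() if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return max(matches, key=lambda m: m[0].prefixlen)


def withdraw(pop, announcements):
    """Return the announcements that remain after one PoP withdraws everything."""
    remaining = {}
    for net, pops in announcements.items():
        left = pops - {pop}
        if left:
            remaining[net] = left
    return remaining


announcements = {
    ipaddress.ip_network("1.2.1.0/24"): {"pop1"},
    ipaddress.ip_network("1.2.2.0/24"): {"pop2"},
    ipaddress.ip_network("1.2.3.0/24"): {"pop3"},
    ipaddress.ip_network("1.2.0.0/19"): {"pop1", "pop2", "pop3"},
}

net, pops = best_route("1.2.1.1", announcements)
print(net, sorted(pops))  # 1.2.1.0/24 ['pop1']

net, pops = best_route("1.2.1.1", withdraw("pop1", announcements))
print(net, sorted(pops))  # 1.2.0.0/19 ['pop2', 'pop3']
```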

Page 22

GSLB: Geo-DNS

Pros:

Better routing decisions than anycast

Traffic control

Graceful draining

Cons:

Geo-DNS is not latency-aware

Precise GeoIP database required
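
Traffic control and graceful draining with Geo-DNS can be sketched as a lookup over a per-country preference list. This is a minimal illustration, not the production resolver: the preference table and PoP names are hypothetical, the answers reuse the example addresses from the earlier diagram, and the step that maps a resolver IP to a country (the GeoIP database) is left out.

```python
# Hypothetical per-country PoP preference, best first.
PREFERENCE = {
    "A": ["pop1", "pop2"],
    "B": ["pop2", "pop1"],
}

# Address returned for each PoP (example addresses from the diagram).
VIP = {"pop1": "1.2.1.1", "pop2": "1.2.2.1"}


def resolve(country, drained=frozenset()):
    """Answer with the most preferred PoP that is not being drained."""
    for pop in PREFERENCE[country]:
        if pop not in drained:
            return VIP[pop]
    raise LookupError(f"no PoP available for country {country}")


print(resolve("A"))                    # 1.2.1.1
print(resolve("A", drained={"pop1"}))  # 1.2.2.1
```

Draining here only changes future DNS answers; clients that already hold the old answer keep using it until they resolve again.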

Page 23

GSLB: Real user monitoring (RUM)

Page 24

GSLB: Real user monitoring (RUM)

Page 25

Page 26

GSLB: RUM Data processing

Live latency data

Resolver to IP

BGP/AS data

Monitoring

Overrides

RUM data processor

RUM routing map

1.0.100.0/24 → NRT

… → …

99.99.9.0/24 → ORD
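
A routing map of this shape could be derived from latency samples roughly as follows. This is a sketch under assumptions, not the actual RUM data processor: it picks, for each subnet, the PoP with the lowest median latency and then applies manual overrides. The sample latencies are invented; only the two subnets and the NRT/ORD labels come from the slide.

```python
from collections import defaultdict
from statistics import median


def build_routing_map(samples, overrides=None):
    """samples: iterable of (subnet, pop, latency_ms). Returns {subnet: pop}."""
    by_key = defaultdict(list)
    for subnet, pop, latency_ms in samples:
        by_key[(subnet, pop)].append(latency_ms)

    best = {}  # subnet -> (pop, median latency)
    for (subnet, pop), values in by_key.items():
        m = median(values)
        if subnet not in best or m < best[subnet][1]:
            best[subnet] = (pop, m)

    routing = {subnet: pop for subnet, (pop, _) in best.items()}
    routing.update(overrides or {})  # operator overrides win over measurements
    return routing


samples = [  # invented measurements
    ("1.0.100.0/24", "NRT", 12.0),
    ("1.0.100.0/24", "NRT", 15.0),
    ("1.0.100.0/24", "ORD", 140.0),
    ("99.99.9.0/24", "ORD", 9.0),
    ("99.99.9.0/24", "NRT", 160.0),
]
print(build_routing_map(samples))
# {'1.0.100.0/24': 'NRT', '99.99.9.0/24': 'ORD'}
```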

Page 27

GSLB: RUM-DNS performance

Page 28

GSLB: RUM

Pros:

Best routing decisions

Full control over traffic

Cons:

Infrastructure required

Complicated data processing

Page 29

Inside the Point of Presence

Page 30

PoP network architecture

[Diagram labels: Internet; peering (×2); Dropbox datacenters]

Page 31

PoP software architecture

ECMP

L4 LB: IPVS

Connection table + chash

L7 LB: NGINX
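
The ECMP step in this architecture can be sketched as hashing a flow's 5-tuple to choose one of several equal-cost next hops, so that all packets of a flow reach the same L4 load balancer. Real routers use their own hardware hash functions; SHA-256 is a stand-in here, and the host names and addresses are hypothetical.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick a next hop from the flow 5-tuple; the same flow always maps to the same hop."""
    key = "|".join(str(field) for field in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


# (source IP, source port, destination IP, destination port, protocol)
flow = ("203.0.113.7", 51512, "1.2.1.1", 443, "tcp")
hops = ["l4lb-1", "l4lb-2", "l4lb-3"]

print(ecmp_next_hop(flow, hops))
```

Because the choice is taken modulo the number of next hops, adding or removing a hop can move existing flows to a different L4 balancer, which is one motivation for combining a connection table with consistent hashing at the next layer.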

Page 32

PoP software architecture

ECMP

L7 LB: NGINX

Page 33

PoP software architecture: IPVS

Connection table

?

Consistent hash
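
The combination of a connection table and a consistent hash can be sketched as below. This is not how IPVS is implemented; it is a minimal model that assumes the "?" on the slide stands for a connection-table lookup, with the consistent hash used on a miss. The ring uses virtual nodes, and all class, backend, and flow names are hypothetical.

```python
import bisect
import hashlib


def _hash(key):
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, backends, vnodes=100):
        self._points = sorted(
            (_hash(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def pick(self, flow):
        # First ring point clockwise from the flow's hash, wrapping around.
        i = bisect.bisect(self._hashes, _hash(flow)) % len(self._points)
        return self._points[i][1]


class L4Balancer:
    def __init__(self, backends):
        self.ring = HashRing(backends)
        self.connections = {}  # flow -> backend for established connections

    def forward(self, flow):
        backend = self.connections.get(flow)
        if backend is None:  # table miss: fall back to the consistent hash
            backend = self.ring.pick(flow)
            self.connections[flow] = backend
        return backend


backends = ["nginx-1", "nginx-2", "nginx-3"]
lb_a, lb_b = L4Balancer(backends), L4Balancer(backends)
flow = "203.0.113.7:51512->1.2.1.1:443"

# Two balancers with separate tables still agree on the backend for a new flow.
print(lb_a.forward(flow) == lb_b.forward(flow))  # True
```

With this design, a flow that lands on a different L4 balancer still reaches the same L7 backend, and removing a backend from the ring only remaps the flows that were hashed to it.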

Page 34

PoP software architecture

ECMP

L4 LB: IPVS

Connection table + chash

L7 LB: NGINX

Page 35

PoP software architecture: failure modes

ECMP

L4 LB: IPVS

Connection table + chash

L7 LB: NGINX

Page 36

PoP software architecture: failure modes

ECMP

L4 LB: IPVS

Connection table + chash

L7 LB: NGINX

Page 37

NGINX lifecycle

squashfs.img

torrent

configs

tools

Page 38

UMS

topology

service discovery

monitoring

UMS

Lua

Page 39

TCP: Fair queueing

FQ sched
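
The idea behind fair queueing can be sketched as one queue per flow, served round-robin, so that a bulk flow cannot starve a small interactive one. This is plain per-packet round robin for illustration only; it is far simpler than a real packet scheduler and is not a model of whatever scheduler the slide refers to. The flow names are invented.

```python
from collections import deque


def fair_dequeue(packets):
    """packets: iterable of (flow_id, seq) in arrival order. Returns the transmit order."""
    queues = {}
    for flow_id, seq in packets:
        queues.setdefault(flow_id, deque()).append((flow_id, seq))

    order = []
    while queues:
        for flow_id in list(queues):  # one packet per flow per round
            queue = queues[flow_id]
            order.append(queue.popleft())
            if not queue:
                del queues[flow_id]
    return order


arrivals = [("bulk", 1), ("bulk", 2), ("bulk", 3), ("bulk", 4), ("web", 1)]
print(fair_dequeue(arrivals))
# [('bulk', 1), ('web', 1), ('bulk', 2), ('bulk', 3), ('bulk', 4)]
```

The "web" packet arrived last but is sent second, instead of waiting behind the whole bulk burst.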

Page 40

TCP: Pacing

Δt

Pacer

Data to send
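
Pacing spreads packets out in time instead of sending them as one burst. A minimal sketch, assuming the Δt on the slide is the gap between consecutive packets and that it is computed as packet size divided by the pacing rate; the function name and the numbers are illustrative.

```python
def pacing_schedule(num_packets, packet_bytes, rate_bps, start_s=0.0):
    """Send times (seconds) for packets released at a fixed pacing rate."""
    delta_t = packet_bytes * 8 / rate_bps  # gap between consecutive packets
    return [start_s + i * delta_t for i in range(num_packets)]


# Four 1500-byte packets paced at 12 Mbit/s are released about 1 ms apart.
print(pacing_schedule(4, 1500, 12_000_000))
```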

Page 41

Summary

Dropbox Edge network

Why latency matters

GSLB approaches

PoP architecture

Page 42

We are hiring!

Page 43

fb: oleg.guba email: oleg@dropbox.com