Dropbox traffic infrastructure
Oleg Guba, SRE, [email protected]
Dropbox scale
Half a billion users
180+ countries
Exabytes of user data
Petabytes of metadata
Millions of requests per second
Terabits of traffic
Our Edge network
Why latency matters?
Without a PoP, 150 ms RTT to the datacenter: 4 RTT + server time = 700 ms
With a PoP, 20 ms RTT to the PoP and 150 ms RTT from the PoP to the datacenter: 4 RTT + 1 RTT + server time = 330 ms
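The two totals above (700 ms and 330 ms) are consistent with a server time of about 100 ms, which the slides do not state. The sketch below makes that assumption explicit and reproduces both numbers; the function name is hypothetical.

```python
def time_to_response_ms(rtts_ms, server_time_ms):
    """Sum the round trips a request waits for, plus server processing time."""
    return sum(rtts_ms) + server_time_ms


# Assumption (not stated on the slides): server time is 100 ms, the value
# that makes both slide totals add up.
SERVER_TIME_MS = 100

# Directly to the datacenter: 4 round trips at 150 ms each.
direct = time_to_response_ms([150] * 4, SERVER_TIME_MS)

# Via a PoP: 4 round trips at 20 ms to the PoP, plus 1 round trip at 150 ms
# between the PoP and the datacenter.
via_pop = time_to_response_ms([20] * 4 + [150], SERVER_TIME_MS)

print(direct, via_pop)  # 700 330
```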
[Figure: congestion window (CWND) over time, high latency vs. low latency]
Why we want to be close to users
Reduce latency for interactive traffic
Increase throughput for bulk traffic
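The link between latency and bulk throughput can be illustrated with an idealized model of TCP slow start, where the congestion window doubles once per round trip. This is a rough sketch, not a model of any real TCP stack: it assumes no loss, no receiver window limit, an initial window of 10 segments and a 1460-byte MSS, and the function name is hypothetical.

```python
def slow_start_transfer_ms(size_bytes, rtt_ms, init_cwnd=10, mss=1460):
    """Time to deliver size_bytes when cwnd doubles every RTT (no loss)."""
    cwnd = init_cwnd      # congestion window, in segments
    sent = 0              # bytes delivered so far
    rounds = 0            # round trips used
    while sent < size_bytes:
        sent += cwnd * mss
        cwnd *= 2         # slow start: the window doubles each round trip
        rounds += 1
    return rounds * rtt_ms


# The same 1 MB transfer needs the same number of round trips,
# so the total time scales directly with the RTT.
for rtt in (150, 20):
    print(rtt, slow_start_transfer_ms(1_000_000, rtt))
# 150 1050
# 20 140
```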
Global server load balancing (GSLB)
GSLB techniques
BGP Anycast
DNS
URL-based
GSLB: Anycast
[Diagram: GeoDNS answers www.dropbox.com with 1.2.3.1; every PoP announces 1.2.3.0/24]
GSLB: Anycast issues
GSLB: Anycast performance
GSLB: Anycast
Simple
Automatic failover
BGP is not latency aware
Almost no traffic control
No graceful draining
GSLB: Geo-DNS
[Diagram: each PoP announces its own prefix (1.2.1.0/24, 1.2.2.0/24, 1.2.3.0/24); GeoDNS answers www.dropbox.com with 1.2.1.1 for one user and 1.2.2.1 for another]
Geo-DNS: TTL is a lie
[Figure: traffic for country A and country B over time with TTL=60 seconds; time axis marked t0, t0+15min, t0+60min]
GSLB: Geo-DNS
[Diagram: GeoDNS answers www.dropbox.com with 1.2.1.1; each PoP announces its own /24 (1.2.1.0/24, 1.2.2.0/24, 1.2.3.0/24) and all PoPs also announce the covering 1.2.0.0/19]
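One reading of this diagram, which the slides do not spell out, is that the shared 1.2.0.0/19 acts as an anycast fallback: routers prefer the most specific prefix, so traffic for 1.2.1.1 follows the /24 while it is announced and falls back to the /19 once the /24 is withdrawn. The sketch below shows only the longest-prefix-match rule; the route labels and the function name are hypothetical.

```python
import ipaddress


def best_route(dst, routes):
    """Return the origin of the most specific route covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, origin) for net, origin in routes if addr in net]
    if not matches:
        return None
    # Longest prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


POP_PREFIX = ipaddress.ip_network("1.2.1.0/24")
SHARED_PREFIX = ipaddress.ip_network("1.2.0.0/19")

routes = [
    (POP_PREFIX, "PoP announcing 1.2.1.0/24"),
    (SHARED_PREFIX, "nearest PoP announcing 1.2.0.0/19"),
]
print(best_route("1.2.1.1", routes))

# If the PoP withdraws its /24, only the shared /19 still covers the address.
routes_after_withdrawal = [r for r in routes if r[0] != POP_PREFIX]
print(best_route("1.2.1.1", routes_after_withdrawal))
```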
GSLB: Geo-DNS
Better routing decision than anycast
Traffic control
Graceful draining
Geo-DNS is not latency aware
Precise GeoIP database required
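As a minimal sketch of the Geo-DNS idea, the lookup below maps a client country (as a GeoIP database would report it) to an ordered list of PoP addresses and skips any PoP that is being drained. The country codes, the preference lists and the default answer are hypothetical; only the addresses 1.2.1.1 and 1.2.2.1 appear on the slides.

```python
# Hypothetical preference order of PoP addresses per client country.
GEO_MAP = {
    "JP": ["1.2.1.1", "1.2.2.1"],
    "US": ["1.2.2.1", "1.2.3.1"],
}
DEFAULT_ANSWER = "1.2.3.1"  # hypothetical answer for unmapped countries


def geo_dns_answer(country, drained=frozenset()):
    """Return the first preferred PoP address that is not drained."""
    for ip in GEO_MAP.get(country, [DEFAULT_ANSWER]):
        if ip not in drained:
            return ip
    return DEFAULT_ANSWER


print(geo_dns_answer("JP"))                       # 1.2.1.1
print(geo_dns_answer("JP", drained={"1.2.1.1"}))  # 1.2.2.1
```

Draining here only changes future DNS answers, so clients move away as their cached records expire, which is what makes it graceful compared with withdrawing a BGP announcement.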
GSLB: Real user monitoring (RUM)
GSLB: RUM data processing
Inputs to the RUM data processor:
Live latency data
Resolver to IP
BGP/AS data
Monitoring
Overrides
Output: RUM routing map
1.0.100.0/24 → NRT
… → …
99.99.9.0/24 → ORD
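A routing map of this shape could be derived from latency samples roughly as follows. This is a sketch under assumptions, not the processing pipeline from the talk: it picks, for each client subnet, the PoP with the lowest median latency and then applies manual overrides. The sample latencies and the function name are made up; only the subnets and the PoP codes NRT and ORD come from the slide.

```python
from collections import defaultdict
from statistics import median


def build_routing_map(samples, overrides=None):
    """samples: iterable of (client_subnet, pop, latency_ms) tuples."""
    by_key = defaultdict(list)
    for subnet, pop, latency_ms in samples:
        by_key[(subnet, pop)].append(latency_ms)

    # For each subnet keep the PoP with the lowest median latency.
    best = {}
    for (subnet, pop), values in by_key.items():
        m = median(values)
        if subnet not in best or m < best[subnet][1]:
            best[subnet] = (pop, m)

    routing_map = {subnet: pop for subnet, (pop, _) in best.items()}
    routing_map.update(overrides or {})  # operator overrides win
    return routing_map


# Hypothetical measurements, in milliseconds.
samples = [
    ("1.0.100.0/24", "NRT", 30), ("1.0.100.0/24", "NRT", 34),
    ("1.0.100.0/24", "ORD", 160),
    ("99.99.9.0/24", "ORD", 25), ("99.99.9.0/24", "NRT", 170),
]
routing_map = build_routing_map(samples)
print(routing_map)
# {'1.0.100.0/24': 'NRT', '99.99.9.0/24': 'ORD'}
```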
GSLB: RUM-DNS performance
GSLB: RUM
Best routing decisions
Full control over traffic
Infrastructure required
Complicated data processing
Inside the Point of Presence
PoP network architecture
[Diagram: the PoP sits between the Internet (reached via peering) and the Dropbox datacenters]
PoP software architecture
ECMP
L4 LB: IPVS (connection table + chash)
L7 LB: NGINX
PoP software architecture: IPVS
[Diagram: a packet is first looked up in the connection table; when there is no entry ("?"), a consistent hash picks the backend]
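The combination of a connection table and a consistent hash can be sketched as below. This is an illustration of the general technique, not IPVS code: a hash ring with virtual nodes picks a backend for new flows, and a per-balancer table remembers the choice for existing flows. The class names, backend names and flow key format are all hypothetical.

```python
import bisect
import hashlib


def _h(key):
    """Stable 64-bit hash of a string."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, backends, vnodes=100):
        self._points = sorted(
            (_h(f"{b}#{i}"), b) for b in backends for i in range(vnodes)
        )
        self._keys = [p for p, _ in self._points]

    def pick(self, flow):
        # First ring point clockwise from the flow's hash, wrapping around.
        i = bisect.bisect(self._keys, _h(flow)) % len(self._keys)
        return self._points[i][1]


class L4Balancer:
    """Connection table first, consistent hash on a miss."""

    def __init__(self, backends):
        self.ring = HashRing(backends)
        self.conn_table = {}

    def route(self, flow):
        backend = self.conn_table.get(flow)
        if backend is None:              # no entry for this flow yet
            backend = self.ring.pick(flow)
            self.conn_table[flow] = backend
        return backend


backends = ["nginx-1", "nginx-2", "nginx-3"]
lb_a, lb_b = L4Balancer(backends), L4Balancer(backends)
flow = "10.0.0.7:40000->1.2.3.1:443"
print(lb_a.route(flow) == lb_b.route(flow))  # True
```

Because the hash depends only on the flow and the backend set, two balancers that share no state still agree on the backend, which matters when ECMP sends packets of one flow to a different L4 balancer.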
PoP software architecture: failure modes
[Diagram: ECMP in front of the L4 LB (IPVS, connection table + chash), in front of the L7 LB (NGINX)]
NGINX lifecycle
squashfs.img
torrent
configs
tools
UMS
topology
service discovery
monitoring
UMS
Lua
TCP: Fair queueing
FQ sched
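The idea behind fair queueing can be shown with a per-packet round robin over per-flow queues. This is a deliberate simplification and not the Linux FQ scheduler: real fair queueing accounts for bytes rather than packets. The function name and flow labels are hypothetical.

```python
from collections import deque


def fair_dequeue(flow_queues):
    """Interleave packets by visiting each non-empty flow queue in turn."""
    queues = {flow: deque(pkts) for flow, pkts in flow_queues.items()}
    order = []
    while any(queues.values()):          # an empty deque is falsy
        for flow, q in queues.items():
            if q:
                order.append((flow, q.popleft()))
    return order


# A flow with a long backlog cannot starve a flow with a single packet.
print(fair_dequeue({"bulk": [1, 2, 3], "interactive": [10]}))
# [('bulk', 1), ('interactive', 10), ('bulk', 2), ('bulk', 3)]
```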
TCP: Pacing
[Diagram: data to send passes through a pacer; packets leave Δt apart]
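Pacing spreads packets out in time instead of sending them in a burst. A minimal sketch, assuming the gap after each packet is its size divided by the target rate (Δt = bits / rate); the function name and the example rate are hypothetical.

```python
def pacing_schedule(packet_sizes, rate_bps, start=0.0):
    """Return departure times (seconds) so the send rate stays at rate_bps."""
    t = start
    times = []
    for size_bytes in packet_sizes:
        times.append(t)
        # Δt: how long this packet occupies the link at the paced rate.
        t += size_bytes * 8 / rate_bps
    return times


# Three 1500-byte packets paced at 12 Mbit/s leave 1 ms apart.
print(pacing_schedule([1500, 1500, 1500], 12_000_000))
```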
Summary
Dropbox Edge network
Why latency matters
GSLB approaches
PoP architecture
We are hiring!
fb: oleg.guba email: [email protected]