![Page 1: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/1.jpg)
INFRASTRUCTURE
Edge Fabric: Steering Oceans of Content to the World
Robel Kitaba, Network Engineer, Facebook
![Page 2: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/2.jpg)
Global Load Balancer
Manages ingress traffic
Locations are for visualization purposes only and do not reflect the current configuration.
![Page 3: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/3.jpg)
Latency-based telemetry (SONAR)
Locations are for visualization purposes only and do not reflect the current configuration.
![Page 4: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/4.jpg)
Diagram labels: PX, PNI, Transit, Network, Backbone
PoP: Point of Presence (colo facilities)
PNI Links: Direct peering with user networks
PX Links: Peering with networks over shared infrastructure
Transit Links: Peering with intermediate networks that provide global reachability
![Page 5: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/5.jpg)
Plot (1 day): total egress capacity at PoP, total traffic at PoP
![Page 6: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/6.jpg)
Plot (1 day): total egress capacity at PoP, total traffic at PoP, capacity for iface@PoP, demand for iface@PoP
Plot annotations: >250%, Drops
![Page 7: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/7.jpg)
Why demand exceeds capacity
Peering with other networks using BGP
BGP (STATIC)
Local Preference
MED
AS path length
Communities
Diagram labels: best BGP path, PoP
![Page 8: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/8.jpg)
Why demand exceeds capacity
Peering with other networks using BGP
BGP (STATIC)
Local Preference
MED
AS path length
Communities
REALITY (DYNAMIC)
Traffic demand changes
Limited capacity
Performance variations
Transient failures
Diagram labels: best BGP path, Unused, Overloaded, PoP
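The mismatch between static BGP selection and dynamic demand can be illustrated with a minimal sketch. It assumes a simplified decision process (highest Local Preference, then shortest AS path, then lowest MED) and made-up capacities; the `Route` class, the link labels, and the numbers are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    egress: str        # hypothetical label for the egress link
    local_pref: int
    as_path: tuple
    med: int = 0


def best_path(routes):
    """Simplified BGP decision: highest LocalPref, shortest AS path, lowest MED."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


routes = [
    Route("PNI", local_pref=500, as_path=(100,)),
    Route("Transit", local_pref=200, as_path=(7018, 100)),
]

# Illustrative capacities in Gbps (assumed, not from the slides).
capacity_gbps = {"PNI": 10, "Transit": 100}


def utilization(demand_gbps):
    """Return the chosen egress and its utilization for a given demand.

    The choice never looks at demand or capacity, which is the point."""
    chosen = best_path(routes)
    return chosen.egress, demand_gbps / capacity_gbps[chosen.egress]
```

Calling `utilization(25)` still selects the PNI even though the demand is well above that link's assumed capacity, while the transit link stays idle.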
![Page 9: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/9.jpg)
Local Edge Controller
Edge Fabric
"Engineering Egress with Edge Fabric: Steering Oceans of Content to the World", Brandon Schlinker et al., SIGCOMM 2017
![Page 10: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/10.jpg)
LOCAL CONTROLLER’S JOURNEY
V0: Manual interventions to change BGP policy when there were failures in PNIs. Drawbacks: not scalable, too slow, error prone.
V1: Set up MPLS paths from end hosts to peering routers (PRs) in order to choose egress links. Drawback: restrictions on hardware.
V2: Use DSCP marking at the end hosts to indicate the egress link. Drawbacks: not scalable, coordination of config, rigid assumptions.
V3: Use GRE tunnels from end hosts to PRs. Drawbacks: coordination of config, vendor bug.
Diagram labels: Edge Cluster (Racks), Local Controller, Peering Router, PNI, PX, Transit 1, Transit 2, Network 1
![Page 11: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/11.jpg)
LOCAL CONTROLLER’S JOURNEY
V0: Manual interventions to change BGP policy when there were failures in PNIs. Drawbacks: not scalable, too slow, error prone.
V1: Set up MPLS paths from end hosts to peering routers (PRs) in order to choose egress links. Drawback: restrictions on hardware.
V2: Use DSCP marking at the end hosts to indicate the egress link. Drawbacks: not scalable, coordination of config, rigid assumptions.
V3: Use GRE tunnels from end hosts to PRs. Drawbacks: coordination of config, vendor bug.
V4: Use BGP injections at PRs. Benefits: flexible, dynamic, decouples decisions from PoP architecture.
Diagram labels: Edge Cluster (Racks), Local Controller, Peering Router, PNI, PX, Transit, Network 1
![Page 12: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/12.jpg)
BGP INJECTION MODE
Routes for 1.2.3.0/24 at the peering router:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Transit: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
![Page 13: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/13.jpg)
BGP INJECTION MODE
Routes for 1.2.3.0/24 at the peering router:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Transit: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Routes held by the EF controller (connected to the peering router by a BGP session):
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Also shown: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
![Page 14: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/14.jpg)
BGP INJECTION MODE
Routes for 1.2.3.0/24 at the peering router:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Transit: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Routes held by the EF controller (connected to the peering router by a BGP session):
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Also shown, in two places: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
![Page 15: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/15.jpg)
BGP INJECTION MODE
Routes for 1.2.3.0/24 at the peering router:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Transit: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Routes held by the EF controller (connected to the peering router by a BGP session):
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Transit route as learned: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Same route with LocalPref raised for injection: Dest 1.2.3.0/24, LocalPref 50000, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
![Page 16: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/16.jpg)
BGP INJECTION MODE
Routes for 1.2.3.0/24 at the peering router after injection:
Injected: Dest 1.2.3.0/24, LocalPref 50000, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Transit: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Routes held by the EF controller (connected to the peering router by a BGP session):
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
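The injection step on these slides can be sketched as follows, using the route attributes shown above. This is an illustration under stated assumptions, not the production controller: the `Route` class, `make_override`, and `router_best` are hypothetical names, and the router's decision is reduced to LocalPref and AS path length.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    dest: str
    local_pref: int
    as_path: tuple
    nexthop: str
    community: str


OVERRIDE_LOCAL_PREF = 50000  # LocalPref shown on the slide for the injected route


def make_override(route):
    """Copy an existing route, changing only LocalPref so the router prefers it."""
    return replace(route, local_pref=OVERRIDE_LOCAL_PREF)


def router_best(routes):
    """Simplified router decision: highest LocalPref, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


pni = Route("1.2.3.0/24", 500, (100,), "42.1.3.1", "100:1")
transit = Route("1.2.3.0/24", 200, (7018, 100), "201.2.4.12", "7018:1")

before = router_best([pni, transit])            # PNI wins on LocalPref
injected = make_override(transit)               # controller's override
after = router_best([pni, transit, injected])   # injected route now wins
```

Because the override is just another BGP route, removing it from the list returns the router to its ordinary choice, the PNI.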
![Page 17: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/17.jpg)
Split prefix traffic
Routes at the peering router:
Dest 1:2400::/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1:2400::/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Dest 1:2400::/34, LocalPref 50000, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Routes for 1:2400::/24 held by the EF controller:
Dest 1:2400::/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1:2400::/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Diagram labels: PEERING, TRANSIT
![Page 18: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/18.jpg)
Split prefix traffic
Routes at the peering router:
Dest 1:2400::/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1:2400::/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Dest 1:2400::/34, LocalPref 50000, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Routes for 1:2400::/24 held by the EF controller:
Dest 1:2400::/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1:2400::/24, LocalPref 200, ASPath 7018 100, Nexthop 201.2.4.12, Community 7018:1
Diagram labels: PEERING, TRANSIT, 1:2400::/34
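Splitting works because routers forward on the longest matching prefix, so an injected more-specific route captures only part of the covering prefix's traffic. A small sketch with Python's standard `ipaddress` module, using the prefixes from the slide; `longest_match` and the sample addresses are illustrative assumptions.

```python
import ipaddress

covering = ipaddress.ip_network("1:2400::/24")       # prefix announced by the peer
more_specific = ipaddress.ip_network("1:2400::/34")  # injected sub-prefix from the slide


def longest_match(address, prefixes):
    """Return the most specific prefix containing the address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [p for p in prefixes if addr in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


table = [covering, more_specific]

# An address inside the /34 follows the injected route.
inside = longest_match("1:2400::1", table)
# An address in the /24 but outside the /34 keeps following the /24 route.
outside = longest_match("1:2440::1", table)

# A prefix can also be cut into two equal halves.
halves = [str(n) for n in covering.subnets(prefixlen_diff=1)]
```

Here `inside` resolves to the /34 and `outside` to the /24, so only the share of traffic destined to the /34 is moved.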
![Page 19: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/19.jpg)
SYSTEM ARCHITECTURE
Controller inputs:
Interface Info (SNMP)
Traffic Rates (NetFlow/sFlow)
BGP Routes (BMP)
Policy & Config
Topology Info (FBNet)
Controller output: Route Overrides (prefix via v.x.y.z), delivered by the BGP Injector to the Peering Routers
With audits to make it more robust: BMP Audit, Netflow Audit, Injector Audit, Route Audit
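One way to picture how a controller could turn these inputs into route overrides is a greedy pass that projects demand onto each prefix's preferred link and moves prefixes off overloaded links. This is a hedged sketch only: the function name, the 95% threshold, the largest-prefix-first order, and the sample data are assumptions for illustration and are not described on the slide.

```python
def allocate(prefix_demand, preferred, alternates, capacity, threshold=0.95):
    """Return (overrides, load).

    prefix_demand: {prefix: rate}       traffic rates per prefix
    preferred:     {prefix: link}       link chosen by ordinary BGP
    alternates:    {prefix: [link...]}  other links with a route to the prefix
    capacity:      {link: rate}         interface capacities
    """
    # Project all demand onto the BGP-preferred links.
    load = {link: 0.0 for link in capacity}
    for prefix, rate in prefix_demand.items():
        load[preferred[prefix]] += rate

    overrides = {}
    for link in capacity:
        limit = threshold * capacity[link]
        # Consider the largest prefixes first (an arbitrary choice for this sketch).
        movable = sorted(
            (p for p in prefix_demand if preferred[p] == link),
            key=prefix_demand.get,
            reverse=True,
        )
        for prefix in movable:
            if load[link] <= limit:
                break
            rate = prefix_demand[prefix]
            # Move the prefix to the first alternate that has room for it.
            for alt in alternates.get(prefix, []):
                if load[alt] + rate <= threshold * capacity[alt]:
                    load[link] -= rate
                    load[alt] += rate
                    overrides[prefix] = alt
                    break
    return overrides, load


demand = {"a": 6, "b": 5, "c": 2}
preferred = {"a": "pni", "b": "pni", "c": "pni"}
alternates = {p: ["transit"] for p in demand}
capacity = {"pni": 10, "transit": 100}

overrides, load = allocate(demand, preferred, alternates, capacity)
```

With these sample numbers a single override (prefix `a` to `transit`) brings the PNI back under its threshold; each override would then be handed to the injector as a route.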
![Page 20: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/20.jpg)
Plot (1 day): total egress capacity at PoP, total traffic at PoP, capacity for iface@PoP, demand for iface@PoP
![Page 21: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/21.jpg)
Plot (1 day): total egress capacity at PoP, total traffic at PoP, capacity for iface@PoP, demand for iface@PoP, traffic on iface@PoP with Edge Fabric
Avoid packet drops while maintaining high link utilization
![Page 22: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/22.jpg)
Looking beyond Facebook’s network
BGP (STATIC)
Local Preference
MED
AS path length
Communities
REALITY (DYNAMIC)
Traffic demand changes
Limited capacity
Performance variations
Transient failures
Diagram labels: Best BGP Path, PoP, Facebook’s Network, ?
![Page 23: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/23.jpg)
Performance Routing
Alternative Path Measurements
![Page 24: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/24.jpg)
Collect TCP stats for transactions (RTT, packet loss, throughput)
Diagram labels: Rack (x12), PNI, PX, Transit, Network 1
![Page 25: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/25.jpg)
Collect TCP stats for transactions (RTT, packet loss, throughput)
These stats let us monitor performance only on the primary path
Diagram labels: Rack (x12), PNI, PX, Transit, Network 1
![Page 26: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/26.jpg)
Collect TCP stats for transactions (RTT, packet loss, throughput)
These stats let us monitor performance only on the primary path
Send a very small portion of traffic over alternate paths
Diagram labels: Rack (x12), PNI, PX, Transit, Network 1
![Page 27: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/27.jpg)
Mark random flows with special DSCP values
Diagram labels: Rack (x12), PNI, PX, Transit, Network 1
![Page 28: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/28.jpg)
Mark random flows with special DSCP values
Configure alternate routing tables per DSCP value
Diagram labels: Rack (x12), PNI, PX, Transit, Network 1
![Page 29: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/29.jpg)
Mark random flows with special DSCP values
Configure alternate routing tables per DSCP value
Insert routes into the alternate routing tables
Diagram labels: APM Controller, Rack (x12), PNI, PX, Transit, Network 1
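The three steps above can be sketched as follows. The DSCP values, the 1% sampling fraction, and the table names are hypothetical; the slides do not state which values or fractions are used.

```python
import random

# Hypothetical DSCP values, one per alternate routing table.
ALTERNATE_DSCP = {"alt1": 12, "alt2": 13}
DEFAULT_DSCP = 0


def choose_dscp(rng, sample_fraction=0.01):
    """Pick a DSCP value for a new flow; a small random share gets an alternate value."""
    if rng.random() < sample_fraction:
        return rng.choice(sorted(ALTERNATE_DSCP.values()))
    return DEFAULT_DSCP


def tos_byte(dscp):
    """DSCP occupies the upper six bits of the IP TOS / Traffic Class byte."""
    return dscp << 2


def lookup_table(dscp, tables):
    """Select the routing table for a packet; unmarked traffic uses the main table."""
    return tables.get(dscp, tables[DEFAULT_DSCP])


# Routing tables keyed by DSCP value; contents are placeholders.
tables = {DEFAULT_DSCP: "main", 12: "alt1", 13: "alt2"}

rng = random.Random(0)
marks = [choose_dscp(rng) for _ in range(100_000)]
marked_share = sum(m != DEFAULT_DSCP for m in marks) / len(marks)
```

On Linux, an end host could apply the resulting byte to a socket with `sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp))`; whether the production system marks flows this way is not stated in the talk.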
![Page 30: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/30.jpg)
Interesting Examples
Temporary congestion of the primary path
Plot (1 day): throughput for the primary path, alternate path 1, and alternate path 2
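A comparison like the one in this plot could be computed from per-path throughput samples. The sketch below assumes samples arrive as `(path, throughput)` pairs and flags alternates whose median beats the primary by a margin; the 20% margin, the path names, and the sample values are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def median_by_path(samples):
    """Group (path, throughput) samples and return the median per path."""
    grouped = defaultdict(list)
    for path, throughput in samples:
        grouped[path].append(throughput)
    return {path: median(values) for path, values in grouped.items()}


def better_alternates(samples, primary="primary", margin=1.2):
    """List alternates whose median throughput exceeds margin * primary median."""
    medians = median_by_path(samples)
    base = medians[primary]
    return sorted(
        path for path, value in medians.items()
        if path != primary and value > margin * base
    )


# Made-up samples standing in for a window in which the primary is congested.
samples = [
    ("primary", 4.0), ("primary", 5.0), ("primary", 6.0),
    ("alt1", 9.0), ("alt1", 10.0),
    ("alt2", 5.5),
]
candidates = better_alternates(samples)
```

With these made-up samples only `alt1` clears the margin, so it is the only path reported as better than the primary in that window.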
![Page 31: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/31.jpg)
Public Exchange Performance problem
Peer’s capacity is unknown
Diagram labels: AS 32934, PX, AS 100, AS 200, AS 300, AS 400, with each peer marked "?"
![Page 32: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/32.jpg)
Public Exchange Performance problem
Peer’s capacity is unknown
Diagram labels: AS 32934, PX, AS 100, AS 200, AS 300, AS 400
![Page 33: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/33.jpg)
Path Performance Monitoring Service
Computes each peer’s effective capacity on the PX
HTTP TCP Stats
BGP Routes
Stats Aggregator
Traffic Rates
Capacity limit computation
![Page 34: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/34.jpg)
Public Exchange Performance problem
AS 300 AS 400
AS 32934
AS 100 AS 200
Infer how much traffic to send without overwhelming the peer
PX
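The talk does not say how the capacity limit is computed. One plausible approach, sketched here purely as an assumption, is to bucket observed traffic rates toward a peer and find the highest rate at which a TCP health metric (loss, in this example) stays below a threshold. The function name, bucket size, and threshold are hypothetical.

```python
from statistics import median


def infer_capacity_limit(samples, loss_threshold=0.01, bucket_gbps=1.0):
    """Estimate how much traffic a peer can take before performance degrades.

    samples: iterable of (traffic_rate_gbps, loss_rate) observed toward one peer.
    Scans rate buckets upward and stops at the first bucket whose median loss
    exceeds the threshold. Returns the top edge of the last healthy bucket,
    or None if even the lowest observed bucket is unhealthy.
    """
    buckets = {}
    for rate, loss in samples:
        buckets.setdefault(int(rate // bucket_gbps), []).append(loss)

    limit = None
    for index in sorted(buckets):
        if median(buckets[index]) > loss_threshold:
            break
        limit = (index + 1) * bucket_gbps
    return limit


# Made-up observations: loss rises sharply once the rate passes about 3 Gbps.
observations = [
    (0.5, 0.001), (1.5, 0.002), (2.5, 0.003),
    (3.5, 0.04), (3.7, 0.05), (4.2, 0.08),
]
limit = infer_capacity_limit(observations)
```

The resulting value could then be supplied to the controller as a performance limit for that peer, in the same way an interface capacity is.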
![Page 35: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/35.jpg)
ENHANCE EDGE FABRIC W/ PERFORMANCE
Controller inputs:
Interface Info (SNMP)
Traffic Rates (NetFlow/sFlow)
BGP Routes (BMP)
Policy & Config
Topology Info (FBNet)
Performance Limits
Controller output: Route Overrides (prefix via v.x.y.z), delivered by the BGP Injector to the Peering Routers
Audits: BMP Audit, Netflow Audit, Injector Audit, Route Audit
![Page 36: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018 · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM](https://reader035.vdocuments.site/reader035/viewer/2022063010/5fc32f3b1a508c71bc44b3e7/html5/thumbnails/36.jpg)
Thanks