niky r networking at scale v6 sl - afpif · 03/08/2018  · niky_r_networking_at_scale_v6_sl.key...

Edge Fabric: Steering Oceans of Content to the World. Robel Kitaba, Network Engineer, Facebook

Post on 19-Aug-2020

TRANSCRIPT

Page 1: Niky R Networking at scale V6 SL - AfPIF · 03/08/2018  · Niky_R_Networking_at_scale_V6_SL.key Author: Niky Riga Created Date: 8/21/2018 3:26:11 AM

INFRASTRUCTURE

Edge Fabric: Steering Oceans of Content to the World

Robel Kitaba, Network Engineer, Facebook

Page 2:

Locations are for visualization purposes only and do not reflect the current configuration.

Global Load Balancer: manages ingress traffic

Page 3:

Locations are for visualization purposes only and do not reflect the current configuration.

Latency-based telemetry (SONAR)

Page 4:

[Figure labels: PoP, Backbone, Network, PNI, PX, Transit]

PoP: Point of Presence (colo facilities)

PNI Links: Direct peering with user networks

PX Links: Peering with networks over shared infrastructure

Transit Links: Peering with intermediate networks that provide global reachability

Page 5:

[Plot over 1 day: total egress capacity at PoP, total traffic at PoP]

Page 6:

[Plot over 1 day: total egress capacity at PoP, total traffic at PoP, capacity for iface@PoP, demand for iface@PoP; annotations: >250%, Drops]

Page 8:

Why demand exceeds capacity

Peering with other networks using BGP

BGP (STATIC):
Local Preference
MED
AS Path length
Communities

REALITY (DYNAMIC):
Traffic demand changes
Limited capacity
Performance variations
Transient failures

[Figure labels: POP, best BGP path, Overloaded, Unused]
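
The gap between static BGP and dynamic reality can be illustrated with a minimal Python sketch of BGP-style path selection. The `Route` type, the `egress` label, and the sample values are assumptions for illustration; real BGP has more tie-breakers, and it compares MED only among routes from the same neighboring AS. The point of the sketch is that nothing in the ranking looks at link capacity or current demand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    local_pref: int
    as_path: Tuple[int, ...]
    med: int
    egress: str  # hypothetical label for the egress link, e.g. "PNI"


def best_path(routes: List[Route]) -> Route:
    """Simplified BGP ranking: highest LocalPref, then shortest AS path,
    then lowest MED. Capacity and demand play no part in the choice."""
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


routes = [
    Route("1.2.3.0/24", 500, (100,), 0, "PNI"),
    Route("1.2.3.0/24", 200, (7018, 100), 0, "Transit"),
]
chosen = best_path(routes)
print(chosen.egress)  # PNI
```

However much traffic is destined to the prefix, this ranking keeps returning the PNI route, which is how the best path can end up overloaded while other links sit unused.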

Page 9:

Edge Fabric

Local Edge Controller

"Engineering Egress with Edge Fabric: Steering Oceans of Content to the World", Brandon Schlinker et al., SIGCOMM 2017

Page 11:

LOCAL CONTROLLER’S JOURNEY

V0: Manual interventions to change BGP policy when there were failures in PNIs. Not scalable, too slow, error prone.

V1: Set up MPLS paths from end hosts to PRs in order to choose egress links. Restrictions on hardware.

V2: Use DSCP marking at the end hosts to indicate the egress link. Not scalable, coordination of config, rigid assumptions.

V3: Use GRE tunnels from end hosts to PRs. Coordination of config, vendor bug.

V4: Use BGP injections at PRs. Flexible, dynamic, decouples decisions from PoP architecture.

[Figure labels: LOCAL CONTROLLER, EDGE CLUSTER (racks), PEERING ROUTER, PNI, Transit, PX, Network 1]

Page 12:

BGP INJECTION MODE

Prefix: 1.2.3.0/24

PEERING ROUTER routes:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
TRANSIT: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

Page 13:

BGP INJECTION MODE

Prefix: 1.2.3.0/24

PEERING ROUTER routes:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
TRANSIT: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

EF CONTROLLER routes:
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

Route shown next to the BGP Session:
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

Page 15:

BGP INJECTION MODE

Prefix: 1.2.3.0/24

PEERING ROUTER routes:
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
TRANSIT: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

EF CONTROLLER routes:
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

Routes shown next to the BGP Session (the transit route as learned, then with a raised LocalPref):
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1
Dest 1.2.3.0/24, LocalPref 50000, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

Page 16:

BGP INJECTION MODE

Prefix: 1.2.3.0/24

PEERING ROUTER routes (BGP Session with the EF CONTROLLER):
Injected: Dest 1.2.3.0/24, LocalPref 50000, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1
PNI: Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
TRANSIT: Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

EF CONTROLLER routes:
Dest 1.2.3.0/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1.2.3.0/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1
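
The injection shown on these slides can be sketched in Python as follows. The `Route` type and the `select` and `make_override` helpers are hypothetical names, and the router's decision is reduced to LocalPref and AS path length. Only the route attributes and the LocalPref value of 50000 come from the slides.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

OVERRIDE_LOCAL_PREF = 50000  # value shown on the slides


@dataclass(frozen=True)
class Route:
    dest: str
    local_pref: int
    as_path: Tuple[int, ...]
    nexthop: str
    community: str


def select(routes: List[Route]) -> Route:
    """Simplified router decision: highest LocalPref, then shortest AS path."""
    return max(routes, key=lambda r: (r.local_pref, -len(r.as_path)))


def make_override(route: Route) -> Route:
    """Copy a learned route, changing only LocalPref so the router prefers it."""
    return replace(route, local_pref=OVERRIDE_LOCAL_PREF)


pni = Route("1.2.3.0/24", 500, (100,), "42.1.3.1", "100:1")
transit = Route("1.2.3.0/24", 200, (7018, 100), "201.2.4.12", "7018:1")

rib = [pni, transit]
before = select(rib)                # the PNI route wins on LocalPref
rib.append(make_override(transit))  # the controller injects the override
after = select(rib)                 # the injected route now wins

print(before.nexthop, "->", after.nexthop)  # 42.1.3.1 -> 201.2.4.12
```

Because the override reuses the ordinary BGP decision process, withdrawing the injected route returns the router to its original choice without any change to its configuration.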

Page 18:

Split prefix traffic

Prefix: 1:2400::/24

EF CONTROLLER routes:
Dest 1:2400::/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1:2400::/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

Router routes:
Dest 1:2400::/24, LocalPref 500, ASPath 100, Nexthop 42.1.3.1, Community 100:1
Dest 1:2400::/24, LocalPref 200, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1
Dest 1:2400::/34, LocalPref 50000, ASPath 7018,100, Nexthop 201.2.4.12, Community 7018:1

[Figure labels: PEERING, TRANSIT, 1:2400::/34]
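
Splitting works because routers forward on the longest matching prefix, so an injected more-specific route captures only part of the traffic to the covering prefix. A minimal sketch using Python's standard `ipaddress` module, with the prefixes and next hops from the slide; the `table` layout and `lookup` helper are hypothetical and assume the address matches at least one entry:

```python
import ipaddress

# Covering /24 via the peering next hop, injected /34 via the transit next hop.
table = {
    ipaddress.ip_network("1:2400::/24"): "42.1.3.1",
    ipaddress.ip_network("1:2400::/34"): "201.2.4.12",
}


def lookup(addr: str) -> str:
    """Longest-prefix match over the table above."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    return table[max(matches, key=lambda net: net.prefixlen)]


print(lookup("1:2400::1"))  # inside the /34: 201.2.4.12
print(lookup("1:2401::1"))  # inside the /24 only: 42.1.3.1
```

Destinations inside the more-specific prefix move to the alternate link while the rest of the covering prefix stays on the original path.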

Page 19:

SYSTEM ARCHITECTURE

Controller inputs:
Interface Info (SNMP)
Traffic Rates (Netflow/Sflow)
BGP Routes (BMP)
Policy & Config
Topology Info (FBNet)

Controller -> Route Overrides -> BGP Injector -> Peering Routers (prefix via v.x.y.z)

With audits to make it more robust: BMP Audit, Netflow Audit, Injector Audit, Route Audit
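
One way to picture the controller's role is as a function from these inputs to a set of route overrides. The Python sketch below is an assumption-laden illustration, not the production algorithm: the 95% utilization limit, the largest-prefix-first order, and all names are hypothetical.

```python
from typing import Dict, List

UTIL_LIMIT = 0.95  # assumed utilization threshold, not taken from the slides


def allocate(
    capacity: Dict[str, float],    # interface -> capacity in Gbps (e.g. from SNMP)
    rates: Dict[str, float],       # prefix -> traffic rate in Gbps (e.g. from sFlow)
    routes: Dict[str, List[str]],  # prefix -> egress interfaces, BGP-preferred first
) -> Dict[str, str]:
    """Return overrides (prefix -> interface) that detour traffic away from
    interfaces whose projected load exceeds UTIL_LIMIT * capacity."""
    # Project load as if every prefix used its BGP-preferred interface.
    load = {iface: 0.0 for iface in capacity}
    for prefix, rate in rates.items():
        load[routes[prefix][0]] += rate

    overrides: Dict[str, str] = {}
    for iface in capacity:
        # Consider the prefixes on this interface, largest first.
        candidates = sorted(
            (p for p in rates if routes[p][0] == iface),
            key=lambda p: rates[p],
            reverse=True,
        )
        for prefix in candidates:
            if load[iface] <= UTIL_LIMIT * capacity[iface]:
                break  # the interface is no longer overloaded
            # Move the prefix to its first alternate that has room.
            for alt in routes[prefix][1:]:
                if load[alt] + rates[prefix] <= UTIL_LIMIT * capacity[alt]:
                    load[iface] -= rates[prefix]
                    load[alt] += rates[prefix]
                    overrides[prefix] = alt
                    break
    return overrides


capacity = {"pni": 10.0, "transit": 100.0}
rates = {"1.2.3.0/24": 8.0, "5.6.7.0/24": 4.0}
routes = {
    "1.2.3.0/24": ["pni", "transit"],
    "5.6.7.0/24": ["pni", "transit"],
}
print(allocate(capacity, rates, routes))  # {'1.2.3.0/24': 'transit'}
```

In this example the PNI would carry 12 Gbps against 10 Gbps of capacity, so the sketch detours the larger prefix to transit and leaves the smaller one on its preferred path.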

Page 20:

[Plot over 1 day: total egress capacity at PoP, total traffic at PoP, capacity for iface@PoP, demand for iface@PoP]

Page 21:

[Plot over 1 day: total egress capacity at PoP, total traffic at PoP, capacity for iface@PoP, demand for iface@PoP, traffic on iface@PoP w/ Edge Fabric]

Avoid packet drops while maintaining high link utilization

Page 22:

Looking beyond Facebook's network

BGP (STATIC):
Local Preference
MED
AS Path length
Communities

REALITY (DYNAMIC):
Traffic demand changes
Limited capacity
Performance variations
Transient failures

[Figure labels: POP, Facebook’s Network, Best BGP Path, ?]

Page 23:

Performance Routing

Alternative Path Measurements

Page 26:

[Figure labels: racks, PNI, Transit, PX, Network 1]

Collect TCP stats for transactions (RTT, packet loss, throughput)

This allows us to monitor performance only on the primary path

Send a very small portion of traffic over alternate paths

Page 29:

[Figure labels: APM CONTROLLER, racks, PNI, Transit, PX, Network 1]

Mark random flows with special DSCP values

Configure alternate routing tables per DSCP value

Insert routes into the alternate routing tables
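
The three steps above can be sketched in Python. The DSCP values, the 1% sampling fraction, and the table contents are all assumptions chosen for illustration; the slides do not state which values are used.

```python
import random
from typing import Dict

# Hypothetical DSCP values; the slides do not say which ones are used.
DSCP_DEFAULT = 0
DSCP_ALT1 = 12
DSCP_ALT2 = 13

SAMPLE_FRACTION = 0.01  # assumed size of the "very small portion" of flows


def pick_dscp(rng: random.Random) -> int:
    """Mark a small random fraction of flows for an alternate path."""
    if rng.random() < SAMPLE_FRACTION:
        return rng.choice([DSCP_ALT1, DSCP_ALT2])
    return DSCP_DEFAULT


# One routing table per DSCP value (prefix -> egress). A controller would
# insert the entries of the alternate tables.
tables: Dict[int, Dict[str, str]] = {
    DSCP_DEFAULT: {"1.2.3.0/24": "PNI"},
    DSCP_ALT1: {"1.2.3.0/24": "PX"},
    DSCP_ALT2: {"1.2.3.0/24": "Transit"},
}


def egress_for(prefix: str, dscp: int) -> str:
    """Use the table for this DSCP value, falling back to the default table."""
    table = tables.get(dscp, tables[DSCP_DEFAULT])
    return table.get(prefix, tables[DSCP_DEFAULT][prefix])


rng = random.Random(0)
marks = [pick_dscp(rng) for _ in range(10_000)]
sampled = sum(1 for m in marks if m != DSCP_DEFAULT)
print(f"{sampled} of {len(marks)} flows take an alternate path")
```

Because only a small sample of flows is detoured, TCP statistics can be collected for the alternate paths while almost all traffic stays on the primary path.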

Page 30:

Interesting Examples

Temporary congestion of the primary path

[Plot: throughput over 1 day for the primary path, alternate path 1, and alternate path 2]

Page 31:

Public Exchange Performance problem

Peer’s capacity is unknown

[Figure labels: AS 32934, PX, AS 100, AS 200, AS 300, AS 400, with question marks]

Page 33:

Path Performance Monitoring Service

Computes each peer's effective capacity on the PX

Inputs: HTTP TCP Stats, BGP Routes, Traffic Rates

Stages: Stats Aggregator, Capacity limit computation
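
The slides do not describe how the capacity limit is computed. As one possible illustration, the Python sketch below takes aggregated samples of traffic rate and packet loss toward a peer and returns the highest rate observed before loss first crossed a threshold. The 1% threshold, the sample format, and the function name are assumptions.

```python
from typing import List, Optional, Tuple

LOSS_THRESHOLD = 0.01  # assumed: loss above 1% counts as degraded


def infer_capacity_limit(samples: List[Tuple[float, float]]) -> Optional[float]:
    """samples: (traffic rate toward the peer in Gbps, packet loss fraction).

    Returns the highest healthy rate seen below the lowest degraded rate,
    or None if no degradation was observed."""
    degraded = [rate for rate, loss in samples if loss > LOSS_THRESHOLD]
    if not degraded:
        return None  # nothing suggests a limit yet
    onset = min(degraded)
    healthy = [
        rate for rate, loss in samples
        if loss <= LOSS_THRESHOLD and rate < onset
    ]
    return max(healthy) if healthy else 0.0


samples = [(2.0, 0.001), (4.0, 0.002), (6.0, 0.004), (8.0, 0.03), (9.0, 0.08)]
print(infer_capacity_limit(samples))  # 6.0
```

A limit inferred this way could then be treated like an interface capacity when deciding how much traffic to send toward the peer.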

Page 34:

Public Exchange Performance problem

Infer how much traffic to send without overwhelming the peer

[Figure labels: AS 32934, PX, AS 100, AS 200, AS 300, AS 400]

Page 35:

ENHANCE EDGE FABRIC W/ PERFORMANCE

Controller inputs:
Interface Info (SNMP)
Traffic Rates (Netflow/Sflow)
BGP Routes (BMP)
Policy & Config
Topology Info (FBNet)
Performance Limits

Controller -> Route Overrides -> BGP Injector -> Peering Routers (prefix via v.x.y.z)

Audits: BMP Audit, Netflow Audit, Injector Audit, Route Audit

Page 36:

Thanks