Post on 10-Jun-2020
B4:Experience with a Globally Deployed Software Defined WAN
Why?
• To save money! – WAN hardware and links are over-provisioned
– But this hardware is expensive!
– And Google’s traffic between DCs is increasing rapidly!
Assumptions/Insights
• Control over applications, servers, switches
• Only a few dozen datacenters
• Applications can:
– handle failures
– adapt to changing bandwidth
• Traffic class and priority indicate traffic patterns and importance
Implementation
• Full control over WAN routing – WAN scale SDN deployment
• Managing the links in a smart way – Traffic Engineering (TE)
Step 1: Taking Control over WAN Links
Step 2: Deciding Who Gets Resources
Traffic Engineering (TE)
• Goal: Share bandwidth among competing applications possibly using multiple paths.
• Sharing bandwidth is defined by Google as max-min fairness.
• Basics of max-min fairness:
– No source gets a resource share larger than its demand.
– Sources with unsatisfied demands get an equal share of the resource.
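The two max-min rules above can be sketched as a small allocation routine. This is an illustrative water-filling implementation, not code from B4 itself; the function name and the single-link model are assumptions for the example.

```python
def max_min_fair(capacity, demands):
    """Max-min fair allocation of one link's capacity.

    Walks sources in order of increasing demand: each source gets the
    smaller of its demand and an equal share of what remains, so no
    source exceeds its demand and unsatisfied sources end up with
    equal shares.
    """
    alloc = {}
    remaining = sorted(demands.items(), key=lambda kv: kv[1])
    cap = capacity
    n = len(remaining)
    for i, (src, demand) in enumerate(remaining):
        share = cap / (n - i)        # equal split among sources still unserved
        alloc[src] = min(demand, share)
        cap -= alloc[src]
    return alloc

# Capacity 10 shared by demands 2, 4, 8:
# A's demand (2) is fully met; B and C split the rest, but B only wants 4,
# so B gets 4 and C gets the remaining 4.
print(max_min_fair(10, {"A": 2, "B": 4, "C": 8}))
```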
TE Optimization Algorithm
• Traditional solutions are expensive
• Google’s solution:
– Aggregate flows into flow-groups and tunnel-groups
– 25x faster, and utilizes at least 99% of the bandwidth
• Three steps:
– Tunnel Selection: select tunnels for each flow group (FG)
– Tunnel Group Generation: allocate bandwidth to FGs
– Tunnel Group Quantization: change split ratios in each TG to match the granularity supported by switch hardware
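The quantization step can be illustrated with largest-remainder rounding: per-tunnel split ratios are snapped to the multiples the switch hardware supports while keeping the total at 1.0. A minimal sketch, assuming ratios sum to 1.0 and a hypothetical hardware granularity of 1/4; the function name and rounding scheme are illustrative, not the paper's exact algorithm.

```python
def quantize_splits(splits, granularity=0.25):
    """Round split ratios (summing to 1.0) to multiples of `granularity`,
    preserving the total via largest-remainder rounding."""
    units = int(round(1.0 / granularity))        # e.g. 4 quantized slots per TG
    floors = {t: int(s / granularity) for t, s in splits.items()}
    leftover = units - sum(floors.values())      # slots still to hand out
    # Hand the remaining slots to the tunnels with the largest fractional parts.
    order = sorted(splits,
                   key=lambda t: splits[t] / granularity - floors[t],
                   reverse=True)
    for t in order[:leftover]:
        floors[t] += 1
    return {t: f * granularity for t, f in floors.items()}

# 0.55/0.30/0.15 snaps to 0.50/0.25/0.25 at quarter granularity.
print(quantize_splits({"T1": 0.55, "T2": 0.30, "T3": 0.15}))
```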
TE State and OpenFlow
• B4 switches operate in 3 roles:
1. Encapsulating switch initiates tunnels and splits traffic between them.
2. Transit switch forwards packets based on their outer header.
3. Decapsulating switch terminates tunnels then forwards packets using regular routes.
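The three roles above can be sketched as a toy pipeline: the encapsulating switch adds an outer tunnel header, transit switches look only at that outer header, and the decapsulating switch strips it so the inner packet resumes regular forwarding. Function names and the dict-based packet model are illustrative assumptions, not B4's implementation.

```python
def encapsulate(packet, tunnel_id):
    """Encapsulating switch: initiate the tunnel by adding an outer header."""
    return {"outer_tunnel": tunnel_id, "inner": packet}

def transit_forward(frame, tunnel_routes):
    """Transit switch: choose the next hop from the outer header alone,
    without inspecting the inner packet."""
    return tunnel_routes[frame["outer_tunnel"]]

def decapsulate(frame):
    """Decapsulating switch: strip the outer header; the inner packet is
    then forwarded using regular routes."""
    return frame["inner"]

packet = {"dst": "10.1.2.3", "payload": "data"}
frame = encapsulate(packet, "T7")                 # role 1
hop = transit_forward(frame, {"T7": "switch-B"})  # role 2
restored = decapsulate(frame)                     # role 3
print(hop, restored == packet)
```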
Using TE and shortest path together
Step 3: Results
Link Utilization
Failures
Google conducted experiments to test the recovery time from different types of failures.
Experience with an outage
• During a move of switches from one physical location to another, two switches were accidentally configured with the same ID.
• As a result, the network state never converged to the actual topology.
• The system recovered only after all traffic was stopped, buffers were emptied, and the OFCs were restarted from scratch.