VNF Chain Allocation and Management at Datacenter Scale
Nodir Kodirov, Sam Bayless, Fabian Ruffy, Swati Goswami,
Ivan Beschastnikh, Holger Hoos, Alan Hu, Margo Seltzer
[Figure: tenants, cloud provider, and the Internet]
• Security: firewall, DDoS protection, DPI
• Monitoring: QoE monitor, network stats
• Services: ad insertion, transcoder
• Network optimization: NAT, load balancer, WAN accelerator
# middleboxes ≈ # L2/L3 devices [Sherry et al. SIGCOMM’12]
Network Functions (NF) are useful and widespread
Sherry et al., Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service, SIGCOMM'12
[Figure: example NFs: DDoS protection, carrier-grade NAT, ad insertion, transcoder, BRAS, session border controller, WAN accelerator, IDS, load balancer, DPI, QoE monitor, firewall]
• Elasticity: quickly scale NFs up and down
• Fast upgrades: no need to wait for new hardware
• Quick configuration and recovery: fail over to the backup NF instance
• Outsourcing
Benefits of Virtualized Network Functions (VNF)
Sherry et al., Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service, SIGCOMM’12
Rajagopalan et al., Split/Merge: System Support for Elastic Execution in Virtual Middleboxes, NSDI’13
Martins et al., ClickOS and the Art of Network Function Virtualization, NSDI'14
Outsourcing VNFs to the Cloud
Outsourcing VNF Chains to the Cloud
[Figure: a VNF chain at the cloud provider, shown with tenants and the Internet]
Challenges of outsourcing VNF Chains
How can cloud providers achieve high datacenter utilization?
How can tenants allocate and manage their VNF chains?
Our contributions: API and algorithm
How can cloud providers achieve high datacenter utilization?
How can tenants allocate and manage their VNF chains?
• API to allocate and manage VNF chains
• Three algorithms
  • implement the API, and
  • achieve high datacenter utilization
• Evaluation
  • Simulation: datacenter scale with 1000+ servers
  • Daisy: emulation of chain management at rack scale
• Ongoing work: chain abstraction
N. Kodirov et al, VNF Chain Allocation and Management at Data Center Scale, ANCS 2018
VNF chain: six API primitives with use cases
[Figure: an initial chain of NAT, FW, IDS, and VPN with bandwidth labels on its links (2, 1, 2, 2, 1, 1), shown with the use cases chain scale-out (labels 3, 2, 3, 3, 2, 1), element upgrade (IDS replaced by IDS’), chain expand, …]
cid ⟵ allocate-chain(C, bw)
add-link-bandwidth(a, b, bw, cid)
add-node(f, cid)
remove-link-bandwidth(a, b, bw, cid)
remove-node(f, cid)
remove-e2e-bandwidth(cid, bw)
A graph can be transformed arbitrarily by manipulating individual nodes and edges.
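The idea that the six primitives suffice to transform a chain can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: the slides give the primitive names and argument lists but not their exact semantics, so the class `ChainManager`, the linear-chain reading of `C`, the "shrink every link" reading of `remove-e2e-bandwidth`, and the helper `upgrade_element` are all hypothetical.

```python
class ChainManager:
    """Toy in-memory stand-in for the provider side of the API (hypothetical)."""

    def __init__(self):
        self.chains = {}      # cid -> {"nodes": set, "links": {(a, b): bw}}
        self._next_cid = 1

    def allocate_chain(self, nfs, bw):
        # Assumption: C is an ordered list of NFs and bw applies to every link.
        cid = self._next_cid
        self._next_cid += 1
        links = {(a, b): bw for a, b in zip(nfs, nfs[1:])}
        self.chains[cid] = {"nodes": set(nfs), "links": links}
        return cid

    def add_node(self, f, cid):
        self.chains[cid]["nodes"].add(f)

    def remove_node(self, f, cid):
        chain = self.chains[cid]
        chain["nodes"].discard(f)
        # Removing a node also removes every link that touches it.
        chain["links"] = {e: w for e, w in chain["links"].items() if f not in e}

    def add_link_bandwidth(self, a, b, bw, cid):
        chain = self.chains[cid]
        if a not in chain["nodes"] or b not in chain["nodes"]:
            raise KeyError("both endpoints must exist in the chain")
        chain["links"][(a, b)] = chain["links"].get((a, b), 0) + bw

    def remove_link_bandwidth(self, a, b, bw, cid):
        links = self.chains[cid]["links"]
        remaining = links[(a, b)] - bw
        if remaining < 0:
            raise ValueError("cannot remove more bandwidth than allocated")
        if remaining == 0:
            del links[(a, b)]
        else:
            links[(a, b)] = remaining

    def remove_e2e_bandwidth(self, cid, bw):
        # Assumption: end-to-end removal shrinks every link by bw.
        for a, b in list(self.chains[cid]["links"]):
            self.remove_link_bandwidth(a, b, bw, cid)


def upgrade_element(mgr, cid, old, new):
    """Element upgrade (e.g., IDS to IDS') composed from the primitives."""
    links = dict(mgr.chains[cid]["links"])
    mgr.add_node(new, cid)
    for (a, b), w in links.items():
        if a == old:
            mgr.add_link_bandwidth(new, b, w, cid)
        elif b == old:
            mgr.add_link_bandwidth(a, new, w, cid)
    mgr.remove_node(old, cid)
```

In this model, chain scale-out is a series of `add_link_bandwidth` calls on existing links, and element upgrade is an `add_node`, a few `add_link_bandwidth` calls, and a `remove_node`.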
Scale-out beyond single physical resource capacity
[Figure: the initial chain and its scale-out, with allocate-chain(C, bw) and add-link-bandwidth(a, b, bw, cid) highlighted among the six primitives; the scaled-out chain (NAT, FW, IDS, VPN; link labels 50, 50, 40, 40, 10) is drawn over a topology with a Gateway, ToR1, and ToR2 whose links are labeled 100, 50, and 40]
• Abstract VNF chain: what the tenant requests and operates on
• Concrete VNF chain: the cloud provider’s implementation of the abstract chain
• Advantages of chain abstraction
  • facilitates high DC utilization
  • improves SLA guarantees
Chain Abstraction: Abstract-Concrete VNF Chains
[Figure: one abstract chain (for tenants; NAT, FW, IDS, VPN with link labels 50, 50, 40, 40, 50, 10) implemented as 10× concrete chains (for the cloud provider; link labels 5, 4, 5, 5, 4, 1)]
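The relationship between the abstract chain and its 10 concrete chains can be sketched as an even split of each link's bandwidth. This is an illustrative assumption, not the provider's actual policy: the function name `split_abstract_chain` is hypothetical, and the pairing of the labels 50, 40, and 10 with specific links is made up for the example because the extracted figure does not preserve it.

```python
from fractions import Fraction


def split_abstract_chain(link_bw, k):
    """Evenly divide every abstract link's bandwidth over k concrete chains."""
    if k < 1:
        raise ValueError("need at least one concrete chain")
    return [
        {link: Fraction(bw, k) for link, bw in link_bw.items()}
        for _ in range(k)
    ]


# Hypothetical link-to-label pairing, for illustration only.
ABSTRACT = {("NAT", "FW"): 50, ("FW", "IDS"): 40, ("FW", "VPN"): 10}
CONCRETE = split_abstract_chain(ABSTRACT, 10)
```

Each of the 10 concrete chains then carries 5, 4, and 1 units on the corresponding links, and summing the concrete chains gives back the abstract chain the tenant sees.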
Algorithm inputs: DC topology and chain
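[Figure: datacenter topology with a Gateway, aggregation switches AggSw1 and AggSw2, and top-of-rack switches ToR1 and ToR2; links labeled 100, 40, and 10; servers with 32 cores and 128 GB; switches with 2048 TCAM; chain of NAT, FW, IDS, and VPN with link labels 2, 1, 2, 2, 1, 1]
VNF profiles shown next to the chain: 1/8 core, 1/2 GB; 3/8 core, 1/2 GB; 1/2 core, 2 GB; 1/4 core, 1/2 GB
Expected resource consumption per Gbps of traffic (see the paper for VNF profile generation)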
Palkar et al., E2: A Framework for NFV Applications, SOSP’15
Naik et al., NFVPerf: Online performance monitoring and bottleneck detection for NFV, IEEE NFV-SDN 2016
Nam et al., Probius: Automated Approach for VNF and Service Chain Analysis in Software-Defined NFV, SOSR'18
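Because the profiles are expressed per Gbps of traffic, the demand an NF places on a server scales linearly with the bandwidth it must handle. The sketch below works through that arithmetic. The pairing of each profile with a specific NF (in the order NAT, FW, IDS, VPN) is an assumption, since the extracted slide lists the four profiles without labels; `demand` and `fits` are hypothetical helper names.

```python
from fractions import Fraction as F

# (cores, GB of memory) per Gbps. The NF-to-profile pairing is assumed.
PROFILES = {
    "NAT": (F(1, 8), F(1, 2)),
    "FW": (F(3, 8), F(1, 2)),
    "IDS": (F(1, 2), F(2)),
    "VPN": (F(1, 4), F(1, 2)),
}
SERVER = (32, 128)  # cores, GB, as labeled in the topology figure


def demand(nf, gbps):
    """Cores and memory an NF needs to process `gbps` of traffic."""
    cores, mem = PROFILES[nf]
    return cores * gbps, mem * gbps


def fits(demands, capacity=SERVER):
    """True if the summed demands fit on one server."""
    total_cores = sum(c for c, _ in demands)
    total_mem = sum(m for _, m in demands)
    return total_cores <= capacity[0] and total_mem <= capacity[1]
```

Under these assumed pairings, all four NFs at 10 Gbps each need 12.5 cores and 35 GB and fit on one server, whereas at 40 Gbps they need 50 cores and do not.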
Algorithms for Chain Allocation and Management
• Random (baseline)
  • Consider NFs and servers/switches in random order
  • Attempt the above step n times (e.g., n=100)
  • Choose the shortest path between chain NFs
• NetPack: Random + 3 simple heuristics
  • Consider the chain NFs in a topological order
  • Re-use the same server when allocating consecutive NFs
  • Gradually increase the network scope: rack, cluster, etc.
• VNFSolver: how optimal is NetPack?
  • Constraint-solver based chain allocation algorithm
  • Slow, but complete: finds a solution when one exists
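The difference between the Random baseline and the NetPack heuristics can be illustrated with a much-simplified placement sketch. It is not the authors' implementation: it models only CPU cores, ignores bandwidth, switches, and path selection, and the function names `place_random` and `place_netpack_like` and the input shapes are assumptions made for this example.

```python
import random


def place_random(chain, servers, tries=100, rng=random):
    """Baseline: random NF order, one random server per NF, up to `tries` attempts.

    chain: list of (nf, cores); servers: dict server -> free cores.
    Returns {nf: server} or None; `servers` is not modified.
    """
    for _ in range(tries):
        free = dict(servers)
        nfs = list(chain)
        rng.shuffle(nfs)
        placement = {}
        for nf, cores in nfs:
            host = rng.choice(list(free))
            if free[host] < cores:
                break  # this attempt failed; start over
            free[host] -= cores
            placement[nf] = host
        else:
            return placement
    return None


def place_netpack_like(chain, racks):
    """NetPack-style heuristics: chain order, server reuse, widening scope.

    chain: list of (nf, cores) in topological order.
    racks: dict rack -> dict server -> free cores.
    Returns {nf: (rack, server)} or None; `racks` is not modified.
    """
    free = {r: dict(s) for r, s in racks.items()}
    placement = {}
    prev = None  # (rack, server) that hosts the previous NF
    for nf, cores in chain:
        scopes = []
        if prev is not None:
            rack, server = prev
            scopes.append([(rack, server)])                     # same server
            scopes.append([(rack, s) for s in free[rack]])      # same rack
        scopes.append([(r, s) for r in free for s in free[r]])  # whole datacenter
        host = None
        for scope in scopes:
            host = next(((r, s) for r, s in scope if free[r][s] >= cores), None)
            if host is not None:
                break
        if host is None:
            return None
        free[host[0]][host[1]] -= cores
        placement[nf] = host
        prev = host
    return placement
```

Reusing the previous NF's server keeps consecutive NFs off the network entirely, and widening the scope only when needed keeps the remaining traffic within a rack where possible.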
[Figure: number of allocated chains per algorithm (Random, NetPack, and further labels truncated in extraction) on the E2, Commercial, and Facebook topologies, with values still shown as "?"; label: 10-node]
Palkar et al., E2: A Framework for NFV Applications, SOSP’15
Bayless et al., SAT Modulo Monotonic Theories, AAAI'15
• How good is the datacenter utilization?
  • Evaluate Random, NetPack, and VNFSolver
  • Consider three different datacenter topologies
  • Use five different VNF chains of varying length (2-10)
• How fast is chain allocation?
  • Measure the time it takes to saturate the datacenter
• Does the API reliably implement the use cases?
  • Prototype scale-out and chain upgrade in Daisy
  • Use two different racks and two sources of packet traces
Evaluation: Objectives
Datacenter utilization evaluation
Palkar et al., E2: A Framework for NFV Applications, SOSP'15
NetPack achieves at least 96% of VNFSolver allocations.
Chain allocation time: Random ≲ NetPack ≪ VNFSolver.
NetPack Utilization and Speed
NetPack achieves at least 96% of VNFSolver allocations while being, on average, 82x faster than VNFSolver (optimal) and only up to 54% slower (per chain) than Random (baseline).
Qualitatively similar results hold for the Facebook and Commercial DC topologies with chains of up to 10 nodes (see the paper for details).
• Daisy builds on the Sonata framework
  • Mininet to build the DC topology
  • OVS for switches, and Docker containers for NFs
• Runs on a single Azure VM
  • 64 cores, 432 GB RAM
• Emulates use cases and chain arrivals
  • scale-out and upgrade use cases
  • continuous arrival of tenant chains at rack scale
Feasibility check: does the API work?
Peuster et al., Sonata NFV SDK, github.com/sonata-nfv/son-emu, 2017
VNF Chain scale-out
[Figure: chain scale-out; the scaled-out chain (NAT, FW, IDS, VPN) has link labels 3, 2, 3, 3, 2, 1]
Daisy implements scale-out with no packet drops.
[Plot: throughput (Mbps) during scale-out, starting from the initial chain (NAT, FW, IDS, VPN; link labels 2, 1, 2, 2, 1, 1)]
Daisy: emulate continuous chain arrival
VNFSolver allocated 75 chains (687 Mbps)
NetPack allocated 67 chains (633 Mbps)
Random allocated 61 chains (561 Mbps)
(throughput with iperf-generated packets is precise)
[Plot: aggregate chain throughput (Mbps) over time (s)]
Daisy implements scale-out with no packet drops and element upgrade with at most 1 s of packet drops.
We also emulated continuous chain arrival, where different tenants make chain allocation requests one by one.
Daisy Contributions
• API with six primitives
  • Implements a wide range of chain operations
  • Chain abstraction facilitates full DC utilization
• NetPack algorithm
  • Handles DC-scale allocation with 1000+ servers
  • Achieves at least 96% of VNFSolver's (optimal) allocations while being 82x faster on average
• Daisy prototype
  • Demonstrates feasibility of the API and algorithms
• Ongoing work: chain abstraction
Snapshot of complete work
• Abstract VNF chain: what the tenant requests and operates on
• Concrete VNF chain: the cloud provider’s implementation of the abstract chain
• Advantages of chain abstraction
  • facilitates high DC utilization
  • improves SLA guarantees
• Challenges
  • low latency, packet loss, state synchronization, efficiency loss, hotspots
Chain abstraction challenges
Kodirov et al., VNF Chain Abstraction for Cloud Service Providers, ANCS’18 poster
Problem: Hotspots
[Figure: the abstract chain (for tenants) and its 10× concrete chains (for the cloud provider) placed on the datacenter topology; within a server, traffic passes NIC Rx, the NF(s) running on cores core0 to core-n, and NIC Tx]
• Hotspots in different layers: CPU cores, NIC ports, ToR switch ports, etc.
Sadok et al., A Case for Spraying Packets in Software Middleboxes, HotNets'18
Microbenchmarks with up to 3 chains
OpenNetVM, github.com/sdnfv/openNetVM, 2016
• tNF: tunable NF
  • computes N prime numbers per packet (<50K in our experiments)
  • Throughput: 0.83 Mpps (9900 Mbps with 1500-byte packets)
• NF runs on OpenNetVM
  • Varied the number of concrete chains from one to three
  • Each concrete chain processes a separate flow (5-tuple)
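A tunable NF of this kind can be approximated in a few lines: the per-packet cost is controlled by how many primes are computed before the packet is forwarded. The sketch below is only an illustration in Python, not the OpenNetVM NF used in the experiments; `first_primes`, `tnf`, and `mpps_to_mbps` are hypothetical names, and the conversion helper simply reproduces the packet-rate arithmetic quoted on the slide.

```python
def first_primes(n):
    """Return the first n primes by trial division (deliberately CPU-bound)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        # Only test divisors up to the square root of the candidate.
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def tnf(packet, n):
    """Tunable NF: do work that grows with n, then forward the packet unchanged."""
    first_primes(n)
    return packet


def mpps_to_mbps(mpps, packet_bytes):
    """Convert a packet rate in Mpps to a bit rate in Mbps."""
    return mpps * packet_bytes * 8
```

With the rounded figure of 0.83 Mpps and 1500-byte packets, the conversion gives about 9960 Mbps, in line with the roughly 9.9 Gbps reported.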
[Figure: the abstract chain (tNF at 9.9) implemented as one concrete chain on 1 core (bw = 9.9), as two concrete chains on 2 cores (labeled 4.45 each), or as three concrete chains on 3 cores (3.3 each)]
[Figure: a server in which Flow Director steers traffic to tNF instances of the concrete chains running on cores core0 to core3]
• Tail latencies (99th percentile) in microseconds
  • 1 concrete chain (uses 1 core for 9.9 Gbps): 336 us
  • 2 concrete chains (uses 2 cores for 9.9 Gbps): 182 us
  • 3 concrete chains (uses 3 cores for 9.9 Gbps): 126 us
[Plot: CDF of packet processing latency (microseconds, log scale) for 1, 2, and 3 concrete chains]
Observations and Ongoing Work
• Latency grows in proportion to the core load
• Need a CPU load-aware chain-splitting mechanism
  • N flows to 1 core (N-to-1) for mice flows
  • 1-to-N for elephant flows
• Splitting should happen at multiple levels
  • CPU cores, NIC ports, ToR ports
• Need to support a wide range of NFs
  • Bounded tail latencies are particularly challenging for stateful NFs, such as DPI
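One way to picture the N-to-1 and 1-to-N idea is a greedy assignment of flows to cores by load. The slides describe this mechanism as ongoing work and do not specify it, so everything below is a hypothetical sketch: the function `assign_flows`, the even split of elephant flows, and the least-loaded-core rule are assumptions chosen for illustration, and the sketch ignores state synchronization for stateful NFs.

```python
import math


def assign_flows(flows, core_capacity, num_cores):
    """Assign flows to cores; returns {core: [(flow, share), ...]}.

    flows: dict flow name -> rate (e.g., Gbps).
    A flow above `core_capacity` (an elephant) is split evenly over several
    cores (1-to-N); smaller flows (mice) share the least-loaded core (N-to-1).
    """
    load = [0.0] * num_cores
    plan = {core: [] for core in range(num_cores)}
    # Place the largest flows first so elephants get the emptiest cores.
    for name, rate in sorted(flows.items(), key=lambda kv: -kv[1]):
        parts = max(1, math.ceil(rate / core_capacity))
        share = rate / parts
        for _ in range(parts):
            core = min(range(num_cores), key=load.__getitem__)
            if load[core] + share > core_capacity + 1e-9:
                raise ValueError("not enough core capacity for flow " + name)
            load[core] += share
            plan[core].append((name, share))
    return plan
```

For example, with three cores of capacity 5, a 9-unit elephant flow is split over two cores while two small flows share the third.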
Sadok et al., A Case for Spraying Packets in Software Middleboxes, HotNets’18
Khalid and Akella, Correctness and Performance for Stateful Chained Network Functions, NSDI’19
• API with six primitives
  • Implements a wide range of chain operations
  • Chain abstraction facilitates full DC utilization
• NetPack algorithm
  • Handles DC-scale allocation with 1000+ servers
  • Achieves at least 96% of VNFSolver's (optimal) allocations while being 82x faster on average
• Ongoing work: chain abstraction
  • Need a load-aware chain-splitting mechanism
Conclusion
Thank you!