mobile packet core performance increases on 2nd generation … · 2019-12-24 · next generation of...
TRANSCRIPT
1
Mobile Packet core PerforMance increases on 2nd Generation intel® Xeon® scalable Processors
Contents
01
02
02
04
06
07
08
08
09
10
11
12
13
15
15
16
1.0 Executive Summary
2.0 Test Environment
2.1 Core Network Overview
2.2 Test Infrastructure
2.3 Test Methodology
2.4 Device Under Test (DUT)
3.0 Results
3.1 Intel® Xeon® Gold 6230 Processor: vEPC and eMBB
3.2 Intel® Xeon® Gold 6230N Processor: vEPC and eMBB
3.3 Intel® Xeon® Gold 6230N Processor: vUPF and eMBB
3.4 Intel® Xeon® Gold 6230N Processor: vFWA
3.5 Intel® Xeon® Platinum 8280 Processor: vFWA
4.0 Summary
5.0 Conclusion
6.0 Appendix: Platform Details
7.0 Glossary
1
Intel® Processor Cores/CPU Cores Used/CPU ForwardingThroughput
Virtual Machine Utilization Rate
Use Case
Intel® Xeon® Gold 6230 Processor 20 17.5 212 Gbps on
2 CPU 78% eMBB vEPC
Intel® Xeon® Gold 6230N Processor 20 17.5 211 Gbps on
2 CPU 76% eMBB vEPC
Intel Xeon Gold 6230N Processor 20 19 210 Gbps on
2 CPU 69% eMBB vUPF
Intel Xeon Gold 6230N Processor 20 19 222 Gbps on
1 CPU 63% vFWA
Intel® Xeon® Platinum 8280
Processor28 24
253 Gbps on 1 CPU
73% vFWA
1.0 Executive Summary
This white paper presents performance data for several virtualized evolved packet core (EPC) and 5G Core Network (5GCN) functions running on three SKUs of 2nd Gen Intel® Xeon® Scalable processors, formally codenamed Cascade Lake. Intel engineers tested against typical enhanced Mobile Broadband (eMBB) and fixed wireless access (FWA) traffic models and pipelines. The testing was focused on user plane performance for both control and user plane separation (CUPS)-based EPC implementations and the 5G user plane functions (UPFs). The results demonstrate performance continues to improve steadily in this domain, with mid-range or Intel® Xeon® Gold processors delivering throughput in the range of 100 gigabits-per-second (Gbps) per socket for typical eMBB workloads and in excess of 250 Gbps per socket for FWA workloads. The forwarding throughput and virtual machine (VM) utilization rate for five use cases are summarized in Table 1.
Table 1. Throughput Measurements for Three Intel® Xeon® Scalable Processors Running Various Packet Forwarding Workloads
This high performance is achieved entirely with software, meaning no hardware acceleration or offload of the packet core pipeline was employed. The packets per second and aggregate throughput performance have a very high level of determinism for all use cases, including both EPC and 5G core network (5GCN) pipelines. Moreover, the forwarding function throughput is linear with core counts (i.e., between 2 and 20 cores). In the future, Intel will publish more packet core benchmarks with the next generation of Intel® Xeon® processors and additional use case pipelines.
In addition, the forwarding throughput of 2nd Gen Intel Xeon Scalable processors (Cascade Lake) was approximately 25% higher than those measured on prior generation Intel Xeon Scalable processors (Skylake) for equivalent core count, as described in section 4.0.
2
UTRAN
GERN
SGSN
HSS
PCRFMME
UE
S3
S1-MME
LTE-Uu
S1-U
S10S11
S6a
S4
S12
Gx Rx
SGiServingGateway
PDNGatewayE-UTRAN
Operator’s IPServices
(e.g. IMS, PSS, etc.)
2.0 Test Environment
ASTRI (Applied Science and Telecoms Research Institute - www.astri.org), based in Hong Kong, supplied the 4G EPC and 5GCN stacks used to test the five use cases. ASTRI gave Intel permission to use and modify their source code, which was used to demonstrate typical pipeline performance (via compiled binaries) and share best-known-methods with the communications service providers (CoSPs) and telecom vendor communities.
The test environment for characterizing the Intel Xeon processor-based platforms consisted of standard test equipment (Spirent Landslide). It tested the user plane performance for various call models and use cases, and not an entire EPC or 5GCN.
2.1 Core Network OverviewFigure 1 shows the 4G non-roaming architecture from the European Telecommunications Standards Institute (ETSI) in technical specification (TS) 23.401. The goal of this study was to maximize the performance of the serving gateway (SGW) and packet data network gateway elements (PGW). All other elements interfaced to the device under test (DUT) were either simulated or placed on external servers.
Figure 1. Non-Roaming 4G Architecture
The stack under test was Release 14 CUPS-enabled, as per TS 23.214 (Figure 2). More specifically, the DUT ran the user plane workloads, and the control plane workloads were external to the system.
3
Figure 2. CUPS EPC Architecture
Figure 3 shows the 5G non-roaming reference model from TS 23.501. User plane processing is shown along the bottom of the figure: from the user entity (UE) to the radio access network (RAN), to the N3 interface, to the UPF and N6 IP encapsulation, and to the external data network (DN). All other functions are control plane orientated, such as for managing authentication, mobility, charging, slicing, and policy.
Figure 3. Non-Roaming 5G Architecture
The testing focused on the performance of the UPF, as this is the main forwarding element in the core architecture. Other elements, such as the UE, RAN, and DN, were simulated by testers. The Authentication Management Function (AMF) and Session Management Function (SMF) were used to test the user plane as it established, deactivated, and moved session in response to mobility events.
Gn/Gp-C
S2a-CS2b-C
Gx Gy Gz Gw Sd Gyn Gwn GznS6b
S5/8-C
S12
S1 1-US4-US1-U
Operator’s IPServices
(e.g. IMS, PSS, etc.)S5/8-U
Gn/Gp-U
S2a-U
S2a-U
SGi SGi
Sxa
S11 S4-C
Sxb Sxc
ServingGateway-U
PDNGateway-U TDF-U
ServingGateway-C
TDF-CPDNGateway-C
DN
N14 N15
N5N7
N13
N22 N12 N8 N10
N11
N4N2
N3
N9
N5
N1
NSSF AUSF UDM
AMF SMF PCF AF
UE UPFRAN
4
2.2 Test Infrastructure 2.2.1 4G Harness
The 4G testbed is shown in Figure 4. At the bottom left of the figure, the Spirent Landslide generates uplink traffic. This traffic transits the top-of-rack (ToR) switch and crosses the EPC user plane DUT. The Gi traffic is then terminated in external servers on the bottom right. These servers terminate the Gi uplink and reflect the downlink reply to the bearers; they also sink the downlink after it has traversed the DUT (sGi to S1). This harness is used to stress the system for different uplink/downlink ratios, depending on the desired CoSP use case. The Mobility Management Entity (MME), Serving Gateway Control Plane Function (SGW-C), PDN Gateway Control Plane Function (PGW-C), and Traffic Detection Control Plane Function (TDF-C) are hosted on a separate control plane server, shown in the top right.
Figure 4. 4G Test Harness
For additional clarity, the traffic flows are shown in Figure 5. The uplink traffic (in red) is generated by the Spirent Landslide, and the downlink traffic (in blue) is generated by the reflection server.
5
Figure 5. 4G Test Harness Traffic Flows
2.2.2 5G Harness
The 5G system uses a similar infrastructure model, as shown in Figure 6. Here, the DUT hosts the UPF, and the AMF and SMF functions are hosted by the server shown in the upper right box.
Figure 6. 5G Test Harness
6
2.3 Test Methodology Two use cases using different Intel processors were run across the 4G and 5G harnesses: Enhanced Mobile Broadband (eMBB) and fixed wireless access (FWA). The two pipelines differ as follows:
1. eMBB: All user plane functions switched on for vEPC and 5GCN, as shown in Figures 7 and 8.
2. FWA: No charging, deep packet inspection (DPI), quality of service (QoS), access control lists (ACLs), or mobility.
Figure 7: 4G eMBB Pipeline
Figure 8: 5GCN eMBB Pipeline
The traffic model for the base setup is shown in Table 2. Changes to the base setup for a given test are indicated in the results.
7
Figure 9: Intel Reference Board
Table 2. Base Setup for the Traffic Model
2.4 Device under Test (DUT) During testing that was conducted and concluded in March of 2019, the EPC and 5GCN forwarding functionality ran on an Intel reference board (Figure 9), featuring balanced I/O system with 2x16 PCI Express Gen3 lanes and 96 GB of DRAM (6x16GB) per socket.
UE Count 100KUEs, 2Mbits per UE in connected state on user plane
Bearers 2 bearers per UE:1 default,1 dedicated
Flows 4 flows per bearer, 800Kflows per DUT
IMIX Packet size combination of 64B, 128B, 256B, 512B, 1024B, 1400B
Traffic Profiles
5:5 –50% uplink, 50% downlink
2.5:7.5 –25% uplink, 75% downlink
4:6 –40% uplink, 60% downlink
1:9 –10% uplink, 90% downlink
Pipeline SupportsMobility, Downlink Data notifications, fragmentation/reassembly, network address translation (NAT), control plane-based charging data record (CDR) generation, but not included specifically in test scenario.
Pipeline Does Not Support
IPSec or DPI –These functions are being integrated into future testing
8
3.0 Results
The following descriptions provide a high-level summary of testing of the five use cases. The platform details for the five use cases are shown in Appendix 1.
3.1 Intel® Xeon® Gold 6230 Processor: vEPC and eMBB The VM mapping for the DUT is shown in Figure 10.
0 1 2 3 4 8 9 10 11 12 16 17 18 19 20 24 25 26 27 28
20 60 21 61 22 62 23 63 24 64 25 65 26 66 27 67 28 68 29 69 30 70 31 71 32 72 33 73 34 74 35 75 36 76 27 77 28 78 29 79
7 VM6 VM7 VM8 VM9 VM10 6 8 9 10
Hos
t OS
0 1 2 3 4 8 9 10 11 12 16 17 18 19 20 24 25 26 27 28
0 40 1 41 2 42 3 43 4 44 5 45 6 46 7 47 8 48 9 49 10 50 11 51 12 52 13 53 14 54 15 55 16 56 17 57 18 58 19 59
2 VM1 VM2 VM3 VM4 VM5 1 3 4 5
Hos
t OS
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Figure 10: VM Mapping for the Dual Intel® Xeon® Gold 6230 Processors
Figure 11 shows the Intel reference board delivered consistent performance per VM, with an average forwarding throughput of 21.2 Gbps per VM and 212 Gbps (as measured on S1 aggregate UL + DL) for the entire board.
Figure 11: Dual Intel® Xeon® Gold 6230 Processor Performance: Forwarding Throughput and Packets per Second per Virtual Machine (VM)
9
Figure 12: VM Mapping for the Dual Intel® Xeon® Gold 6230N Processors
Figure 13 shows the Intel reference board delivered consistent performance per VM, with an average forwarding throughput of 21.1 Gbps per VM and 211 Gbps (as measured on S1 aggregate UL + DL) for the entire board.
Figure 13: Dual Intel® Xeon® Gold 6230N Processor Performance: Forwarding Throughput and Packets per Second per Virtual Machine (VM)
Typically, we see about 10% performance increase from 6230 to 6230N as there is a frequency increase from 2.1 to 2.3GHz. It was not possible to show this performance increase in this scenario as the DUT was IO bound.
The packet per second performance was also consistent among VMs, and the CPU core utilization rate varied between 74 and 78% across all VMs.
3.2 Intel® Xeon® Gold 6230N Processor: vEPC and eMBB The VM mapping for the DUT is shown in Figure 12.
0 1 2 3 4 8 9 10 11 12 16 17 18 19 20 24 25 26 27 28
20 60 21 61 22 62 23 63 24 64 25 65 26 66 27 67 28 68 29 69 30 70 31 71 32 72 33 73 34 74 35 75 36 76 27 77 28 78 29 79
7 VM6 VM7 VM8 VM9 VM10 6 8 9 10
Hos
t OS
0 1 2 3 4 8 9 10 11 12 16 17 18 19 20 24 25 26 27 28
0 40 1 41 2 42 3 43 4 44 5 45 6 46 7 47 8 48 9 49 10 50 11 51 12 52 13 53 14 54 15 55 16 56 17 57 18 58 19 59
2 VM1 VM2 VM3 VM4 VM5 1 3 4 5
Hos
t OS
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
10
3.3 Intel® Xeon® Gold 6230N Processor: vUPF and eMBB The VM mapping for the DUT is shown in Figure 14. In this test case, an entire CPU was assigned to a given UPF, and no performance degradation occurred. In other words, there was no penalty on Intel® architecture when scaling up VMs or container sizes from a low number to a high number of cores. The system consistently exhibited small deviation from minimum, average to maximum core utilization rate as traffic was scaled across the infrastructure.
SOCKET 0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
0 40 1 41 2 42 3 43 4 44 5 45 6 46 7 47 8 48 9 49 10 50 11 51 12 52 13 53 14 54 15 55 16 56 17 57 18 58 19 59
0 1 20 2 21 3 22 4 23 5 24 6 25 7 26 8 27 9 28 10 29 11 30 12 31 13 32 14 33 15 34 16 35 17 36 18 37 19 38
VM1
SOCKET 1
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
20 60 21 61 22 62 23 63 24 64 25 65 26 66 27 67 28 68 29 69 30 70 31 71 32 72 33 73 34 74 35 75 36 76 37 77 38 78 39 79
0 1 20 2 21 3 22 4 23 5 24 6 25 7 26 8 27 9 28 10 29 11 30 12 31 13 32 14 33 15 34 16 35 17 36 18 37 19 38
VM2
Figure 14: VM Mapping for the Dual Intel® Xeon® Gold 6230N Processors
Figure 15 shows the Intel reference board delivered consistent performance per VM, with an average forwarding throughput of 105.2 Gbps on VM1 and 210.4 Gbps (as measured on S1 aggregate UL + DL) for the entire board.
Hos
t OS
Hou
se K
eep
ing
VM
1 H
ouse
Hos
t OS
Hou
se K
eep
ing
VM
2 H
ouse
11
Figure 16: VM Mapping for the Dual Intel® Xeon® Gold 6230N Processors
Figure 17 shows the Intel reference board delivered consistent performance for all 12 VMs, with an average forwarding throughput of 18.5 Gbps and 222 Gbps (as measured on S1 aggregate UL + DL) for the entire board.
Figure 15: Dual Intel® Xeon® Gold 6230N Processor Performance: Forwarding Throughput and Packets per Second per Virtual Machine (VM)
The packet per second performance was similar for both VMs, more than 21 million packets per second and CPU core utilization rates averaging at 89% for both VMs.
3.4 Intel® Xeon® Gold 6230N Processor: vFWA The VM mapping for the DUT is shown in Figure 16.
0 1 2 3 4 8 9 10 11 12 16 17 18 19 20 24 25 26 27 28
20 60 21 61 22 62 23 63 24 64 25 65 26 66 27 67 28 68 29 69 30 70 31 71 32 72 33 73 34 74 35 75 36 76 37 77 38 78 39 79
VM1 VM2 VM3 VM4 VM5 VM6 VM7 VM8 VM9 VM10 VM11 VM12
1 4 7 10
2 5 8 11
3 6 9 12
NIC-1 NIC-2 NIC-3 NIC-4 NIC-5 NIC-6From CPU-0
8686 8888 afaf b1b1 dada 1818
0 1 2 3 4 8 9 10 11 12 16 17 18 19 20 24 25 26 27 28
0 40 1 41 2 42 3 43 4 44 5 45 6 46 7 47 8 48 9 49 10 50 11 51 12 52 13 53 14 54 15 55 16 56 17 57 18 58 19 59
Unused
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Hos
t OS
12
Figure 17: Dual Intel® Xeon® Gold 6230N Processor Performance: Forwarding Throughput and Packets per Second per Virtual Machine (VM)
The packet per second performance was consistent across the VMs in the system, and CPU core utilization rates averaged 63% across all VMs with very little variance.
3.5 Intel® Xeon® Platinum 8280 Processor: vFWA The VM mapping for the DUT is shown in Figure 18. As mentioned previously, the vFWA was a simplified pipeline. For this test, 120K user entities (UEs) were run (with just one bearer per UE with two flows) in order to achieve higher throughput. The number of EPC user plane VMs was increased to 12 on Socket1, and NICs were added to drive more capacity.
Figure 18: VM Mapping for the Dual Intel® Xeon® Platinum 8280 Processors
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
28 84 29 85 30 86 31 87 32 88 33 89 34 90 35 91 36 92 37 93 38 94 39 95 40 96 41 97 42 98 43 99 44 100 45 101 46 102 47 103 48 104 49 105 50 106 51 107 52 108 53 109 54 110 55 111
VM1 VM2 VM3 VM4 VM5 VM6 VM7 VM8 VM9 VM10 VM11 VM121 3 5 7 9 11
2 4 6 8 10 12
NIC-1 NIC-2 NIC-3 NIC-4 NIC-5 NIC-6From CPU-0
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
0 56 1 57 2 58 3 59 4 60 5 61 6 62 7 63 8 64 9 65 10 66 11 67 12 68 13 69 14 70 15 71 16 72 17 73 18 74 19 75 20 76 21 77 22 78 23 79 248025 81 26 82 27 83
Unused
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Hos
t OS
Hou
se K
eep
ing
Un
used
Un
used
13
Figure 19 shows the performance on single CPU socket.
• Packet Rate: 31.7 MPPS (8.9 MPPS uplink, 22.8MPPS downlink)
• Aggregate TPT: 253 Gbps (26.7 Gbps uplink, 226 Gbps downlink)
• CPU utilization: 73%
Figure 19: Dual Intel® Xeon® Platinum 8280 Processor Performance: Forwarding Throughput and Packets per Second per Virtual Machine (VM)
Packet Rate Breakdown By Packet Size
14,000,000
12,000,000
10,000,000
8,000,000
6,000,000
4,000,000
2,000,000
16,000,000
64
0
0
128
2,787,735
2,447,183
256
0
0
512
205,517
327,647
1024
1,053,316
848,830
1400
684,861
15,107,390
0
UL Pkt/Sec
DL Pkt/Sec
Throughput Breakdown By Packet Size
140,000
160,000
100,000
120,000
60,000
40,000
80,000
20,000
180,000
64
0
0
128
2,855
2,506
256
0
0
512
842
1,342
1024
8,629
6,954
1400
7,670
169,203
0
UL TPT (Mbps)
DL TPT (Mbps)
Packet Rate Breakdown By Packet Size
18,000,000
16,000,000
14,000,000
12,000,000
10,000,000
8,000,000
6,000,000
4,000,000
2,000,000
20,000,000
64
0
0
128
3,345,282
2,936,620
256
0
0
512
246,620
393,177
1024
1,263,979
1,018,595
1400
821,833
18,128,868
0
UL Pkt/Sec
DL Pkt/Sec
Throughput Breakdown By Packet Size
200,000
150,000
100,000
50,000
250,000
64
0
0
128
3,426
3,007
256
0
0
512
1,010
1,610
1024
10,355
8,344
1400
9,205
203,043
0
UL TPT (Mbps)
DL TPT (Mbps)
14
LTE vEPC User Plane Performance
TPT
(GB
PS)
CPU
Util
izat
ion
Intel® Xeon® Platinum8180 Processor
28C/2.5 GHz,Skylake Family
Intel® Xeon® Gold6130 processor
16C/2.1 GHz,Skylake Family
Intel® Xeon® Gold6132 Processor
14C/2.6 GHz,Skylake Family
Intel® Xeon® Gold6230N Processor
20C/2.3 GHz,Cascade Lake Family
Intel® Xeon® Gold6230 Processor
20C/2.1 GHz,Cascade Lake Family
200
180
160
140
120
220 90
85
80
75
70
65
60
55
50
45
100 40
Figure 20: vEPC Performance for Various Intel® Xeon® Processors
The results demonstrated the tested Intel Xeon processors achieved a very high level of determinism across EPC and 5GCN UPF workloads, and this performance scaled linearly with the number of cores as they were scaled up from a few to many cores. The variation in CPU core utilization was also low. This performance consistency can help CoSPs and telecom equipment manufacturers (TEMs) better predict VNF performance and plan for capacity requirements.
4.0 Conclusion
Figure 20 shows throughput performance increased up to 25% from the Intel® Xeon® Gold 6130 processor (from the Skylake family) to the 2nd Generation Intel Xeon Gold 6230N processor (from the Cascade Lake family). Testing of the vEPC pipeline showed the Intel Xeon Gold 6230N processor delivered 25% more performance than the prior generation Intel Xeon Gold 6130 at a 7% lower CPU utilization rate. The Intel Xeon Gold 6230N processor also achieved similar performance as the Intel® Xeon® Platinum 8280 processor (also from the Cascade Lake family) at a slightly lower utilization rate.
15
5.0 Conclusion
This white paper shows 2nd Gen Intel Scalable processors offer around 25% more performance than predecessor generation processors without any additional platform tuning. The Intel reference board exhibited deterministic performance for 4G and 5G pipelines, even load distribution, and consistent throughput and latency. These results further verify the competitiveness and capability of running packet core workloads in software on Intel architecture without any hardware acceleration and associated orchestration.
6.0 Appendix: Platform Details
Use Cases
Workload vEPC, eMBB vWEPC, eMBB vEPC, eMBB vEPC, eMBB vEPC, eMBB vUPF, eMBB vFWA vFWA
CPU
Intel® Xeon® Platinum
8180 Processor
Intel® Xeon® Gold 6132 Processor
Intel® Xeon® Gold 6130 Processor
Intel® Xeon® Gold 6230 Processor
Intel® Xeon® Gold 6230N Processor
Intel Xeon Gold 6230N
Processor
Intel Xeon Gold 6230N
Processor
Intel® Xeon® Platinum
8280 Processor
Frequency 2.5 GHz 2.6 GHz 2.1 GHz 2.1 GHz 2.3 GHz 2.3 GHz 2.3 GHz 2.7 GHz
Cores 28 14 16 20 20 20 20 28
Sockets 2 2 2 2 2 22, 1 under
test2
NICs Intel® Ethernet Converged Network Adapters XXV710, 2x25G ports
Memory 192 GB: 6x16 GB per CPU
BIOS SE5C620.86B.0D.01.0286.011120190816
16
Term Description
CoSP Communication Service Provider
CUPS Control User Plane Separation
DPI Deep Packet Inspection
EPC Evolved Packet Core
FWA Fixed Wireless Access
QoS Quality of Service
NIC Network Interface Controller
RAN Radio Access Network
SP Service Providers
SR-IOV Single Root I/O Virtualization
TEM Telecom Equipment Manufacturers
VM Virtual Machine
7.0 Glossary
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit intel.com/benchmarks.
Performance results are based on testing as of March 2019 and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.
Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. © 2019 Intel Corporation