Introduction:
Cross-NUMA tests run as part of the OPNFV Plugfest (Gambia) - January 2019.
- VSPERF-Scenarios: V2V, P2P and PVP.
- Workloads: vSwitchd, PMDs and VNF.
- VNF: L2 Forwarding
- vswitch: OVS and VPP.
Testcases Run:
- Frame sizes (bytes): 64, 128, 256, 512, 1024, 1280, 1518
- RFC2544 Throughput Test - NDR.
- Continuous Traffic Test - 100% of line rate offered load.
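For reference when reading the 100% line-rate results, the theoretical maximum frame rate for each tested frame size can be worked out as below. This is a minimal sketch that assumes 10 Gb/s ports: the V2V table mentions 10G virtual ports, but the physical port speed is not stated in this report. `line_rate_fps` is an illustrative name, not a VSPERF function.

```python
# Theoretical maximum Ethernet frame rate for each tested frame size.
# Assumption: 10 Gb/s links; pass a different link_bps for other port speeds.

FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]  # bytes, from the test plan

# Per-frame overhead on the wire that is not part of the frame size:
# 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_fps(frame_size: int, link_bps: float = 10e9) -> float:
    """Return the maximum frames per second at full line rate."""
    bits_on_wire = (frame_size + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


if __name__ == "__main__":
    for size in FRAME_SIZES:
        print(f"{size:>5} B: {line_rate_fps(size):>12,.0f} fps")
```

Under the 10 Gb/s assumption, 64-byte frames give roughly 14.88 million frames per second, which is why the small frame sizes put the most pressure on the PMD cores.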
Testbed:
Node-4 (DUT), Node-5 (Software Traffic Generators) and H/W Traffic Generator.
CPU Topology on DUT
V2V Scenarios
Summary of V2V Scenarios
Scenario | Possible Core-allocations | TGen Ports Info |
---|---|---|
1 | PMDs: 4, 5 (0x30) | 2 Virtual Ports 10G |
2 | PMDs: 22, 23 (0xC00000) | 2 Virtual Ports 10G |
3 | PMDs: 4, 22 (0x400010) | 2 Virtual Ports 10G |
P2P Scenarios
Summary of P2P Scenarios:
Scenario | Possible Core-allocations | DUT Ports, TGen (Hardware) Ports |
---|---|---|
1 | PMDs: 4, 5 (0x30) | DUT: eno5, eno6 TGEN: 5, 6 |
2 | PMDs: 22, 23 (0xC00000) | DUT: eno5, eno6 TGEN: 5, 6 |
3 | PMDs: 4, 22 (0x400010) | DUT: eno5, eno6 TGEN: 5, 6 |
4 | PMDs: 4, 5 (0x30) | DUT: eno5, ens801f2 TGEN: 5, 7 |
5 | PMDs: 22, 23 (0xC00000) | DUT: eno5, ens801f2 TGEN: 5, 7 |
6 | PMDs: 4, 22 (0x400010) | DUT: eno5, ens801f2 TGEN: 5, 7 |
7 | PMDs: 4, 5 (0x30) | DUT: ens801f2, ens802f3 TGEN: 7, 8 |
8 | PMDs: 22, 23 (0xC00000) | DUT: ens801f2, ens802f3 TGEN: 7, 8 |
9 | PMDs: 4, 22 (0x400010) | DUT: ens801f2, ens802f3 TGEN: 7, 8 |
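The hexadecimal values in the scenario tables are CPU bitmasks in which bit N selects core N. The following sketch shows the conversion in both directions. The helper names are illustrative, and the `ovs-vsctl` line is only one common way to apply such a mask to OVS-DPDK; this report does not state how the masks were actually configured.

```python
def cores_to_mask(cores):
    """Build a CPU bitmask with one bit set per core ID."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def mask_to_cores(mask):
    """Return the sorted list of core IDs whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def format_mask(cores):
    """Format the mask the way the scenario tables write it, e.g. 0xC00000."""
    return f"0x{cores_to_mask(cores):X}"


def pmd_mask_command(cores):
    """Assumed way of applying the mask: the OVS other_config:pmd-cpu-mask key."""
    return f"ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask={format_mask(cores)}"


if __name__ == "__main__":
    for cores in ([4, 5], [22, 23], [4, 22]):
        print(cores, format_mask(cores))
    print(pmd_mask_command([4, 22]))
```

Decoding a mask with `mask_to_cores` is a quick way to check that a table row's core list and its mask agree.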
PVP Scenarios
Summary of PVP Scenarios:
Assumptions: Numa-0 (cores 0-21), Numa-1 (cores 22-43), vSwitch Core #: 02
Scenario | Possible Core-allocations (PMDs) | VNF Cores | DUT Ports, TGen (Hardware) Ports |
---|---|---|---|
1 | PMDs: 4, 5, 6, 7 (0xF0) | VNF: 8,9 | DUT: eno5, eno6 TGEN: 5, 6 |
2 | PMDs: 4, 5, 6, 7 (0xF0) | VNF: 22, 23 | DUT: eno5, eno6 TGEN: 5, 6 |
3 | PMDs: 4, 5, 6, 7 (0xF0) | VNF: 8, 22 | DUT: eno5, eno6 TGEN: 5, 6 |
4 | PMDs: 4, 5, 22, 23 (0xC00030) | VNF: 8, 9 | DUT: eno5, ens801f2 TGEN: 5, 7 |
5 | PMDs: 4, 5, 22, 23 (0xC00030) | VNF: 24, 25 | DUT: eno5, ens801f2 TGEN: 5, 7 |
6 | PMDs: 4, 5, 22, 23 (0xC00030) | VNF: 8, 24 | DUT: eno5, ens801f2 TGEN: 5, 7 |
7 | PMDs: 22, 23, 24, 25 (0x3C00000) | VNF: 26, 27 | DUT: ens801f2, ens802f3 TGEN: 7, 8 |
8 | PMDs: 22, 23, 24, 25 (0x3C00000) | VNF: 4, 5 | DUT: ens801f2, ens802f3 TGEN: 7, 8 |
9 | PMDs: 22, 23, 24, 25 (0x3C00000) | VNF: 4, 26 | DUT: ens801f2, ens802f3 TGEN: 7, 8 |
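To see at a glance which PVP scenarios place the VNF on a different NUMA node from the PMDs, the core lists can be mapped to nodes using the table's stated assumption (Numa-0: cores 0-21, Numa-1: cores 22-43). The sketch below is illustrative only; the function and dictionary names are not part of VSPERF, and the PMD list for scenario 8 is taken from its mask (0x3C00000).

```python
# Core-to-node layout, as stated in the PVP table's assumptions.
NUMA_CORE_RANGES = {0: range(0, 22), 1: range(22, 44)}

# (PMD cores, VNF cores) per PVP scenario, copied from the table above.
PVP_SCENARIOS = {
    1: ((4, 5, 6, 7), (8, 9)),
    2: ((4, 5, 6, 7), (22, 23)),
    3: ((4, 5, 6, 7), (8, 22)),
    4: ((4, 5, 22, 23), (8, 9)),
    5: ((4, 5, 22, 23), (24, 25)),
    6: ((4, 5, 22, 23), (8, 24)),
    7: ((22, 23, 24, 25), (26, 27)),
    8: ((22, 23, 24, 25), (4, 5)),
    9: ((22, 23, 24, 25), (4, 26)),
}


def numa_of(core):
    """Return the NUMA node that owns a core ID."""
    for node, cores in NUMA_CORE_RANGES.items():
        if core in cores:
            return node
    raise ValueError(f"core {core} is outside the assumed topology")


def placement(pmd_cores, vnf_cores):
    """Summarise where the PMD and VNF cores sit relative to each other."""
    pmd_nodes = {numa_of(c) for c in pmd_cores}
    vnf_nodes = {numa_of(c) for c in vnf_cores}
    return {
        "pmd_nodes": sorted(pmd_nodes),
        "vnf_nodes": sorted(vnf_nodes),
        "vnf_split_across_nodes": len(vnf_nodes) > 1,
        "vnf_shares_node_with_pmd": bool(pmd_nodes & vnf_nodes),
    }


if __name__ == "__main__":
    for scenario, (pmds, vnf) in PVP_SCENARIOS.items():
        print(scenario, placement(pmds, vnf))
```

This only describes core placement; it says nothing about where the NICs sit or about measured performance.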
Results: V2V
RFC2544 Throughput Test Results
RFC2544 With Loss Verification Throughput Test Results
Continuous Throughput Test Results
Results: P2P
RFC2544 Throughput Test Results
Continuous Throughput Test Results (Max Received Frame Rate at 100% of Line rate offered load)
Results: PVP
RFC2544 Throughput Test Results
Continuous Throughput Test Results (Max Received Frame Rate at 100% of Line rate offered load)
PVP Latency Results
Inferences
Theme: What is expected, and what is unexpected.
V2V:
- Performance differences can be seen for packet sizes up to 1024 bytes.
- A single vCPU serving multiple interfaces performs worse than a CPU on the other NUMA node serving the interfaces. This pattern is also seen in the other (P2P and PVP) scenarios.
- RFC2544 with Loss-Verification is more consistent across runs than RFC2544 without loss verification.
P2P:
- Only the smaller (64 and 128) packet sizes matter. For packet sizes above 128, the throughput performance remains similar.
- Scenarios 2 and 7 can be seen as the worst-case placements, with both PMD-cores running on a different NUMA node than the NICs. As expected, the performance is consistently low for both Scenario-2 and Scenario-7.
- The interesting cases are Scenario-3 and Scenario-9. Here a single pmd-core ends up serving both NICs. This results in poorer performance than Scenarios 2 and 7.
- Scenarios 1, 6, and 8 can be seen as the good cases, where each NIC is served by its own separate PMD-core.
- Scenarios 4 and 5, where one NIC is served by a pmd-core on the same NUMA node and the other NIC by a pmd-core on a different NUMA node, can be seen as average cases: lower performance than Scenarios 1, 6 and 8, but not as low as Scenarios 3, 9, 2, and 7.
- There is no difference in performance between the continuous and RFC2544-throughput traffic tests.
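The P2P groupings above can be reproduced with a simplified model in which each NIC's receive queue goes to a PMD core on the NIC's own NUMA node when one exists, and to a remote core otherwise. This is only a sketch of that reasoning, not the OVS scheduler itself. The NIC-to-node mapping (eno5 and eno6 on Numa-0, ens801f2 and ens802f3 on Numa-1) is an assumption inferred from these inferences, and all names in the code are illustrative.

```python
# Assumed NIC locality, inferred from the inferences rather than stated in the report.
NIC_NUMA = {"eno5": 0, "eno6": 0, "ens801f2": 1, "ens802f3": 1}


def numa_of(core):
    """Numa-0 holds cores 0-21 and Numa-1 holds cores 22-43 (PVP table assumption)."""
    return 0 if core < 22 else 1


def assign_ports(ports, pmd_cores):
    """Give each port one PMD core, preferring cores local to the NIC."""
    next_index = {}  # round-robin position per candidate pool
    assignment = {}
    for port in ports:
        local = [c for c in pmd_cores if numa_of(c) == NIC_NUMA[port]]
        pool = tuple(local or pmd_cores)  # fall back to remote cores if none are local
        i = next_index.get(pool, 0)
        assignment[port] = pool[i % len(pool)]
        next_index[pool] = i + 1
    return assignment


def classify(ports, pmd_cores):
    """Label a scenario with the category used in the inferences."""
    assignment = assign_ports(ports, pmd_cores)
    remote = [p for p, c in assignment.items() if numa_of(c) != NIC_NUMA[p]]
    if len(set(assignment.values())) < len(ports):
        return "one PMD serves both NICs"
    if len(remote) == len(ports):
        return "all NICs served from the remote node"
    if remote:
        return "one NIC served from the remote node"
    return "each NIC has its own local PMD"


if __name__ == "__main__":
    print(classify(("eno5", "eno6"), (4, 22)))      # Scenario 3
    print(classify(("eno5", "ens801f2"), (4, 22)))  # Scenario 6
```

Under these assumptions the model places Scenarios 1, 6 and 8 in the "own local PMD" group, 4 and 5 in the mixed group, 2 and 7 in the all-remote group, and 3 and 9 in the shared-PMD group.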
PVP:
Note: In these scenarios, we ensure there is always at least one PMD on each NUMA node to which a physical NIC is attached. That is, we will not encounter the case of Scenarios 2 and 7 of P2P here.
- Continuous traffic results are more consistent across runs compared to RFC2544-throughput test.
- The inconsistency across runs in the RFC2544 cases can be explained by the way the binary-search algorithm works. This can be used to argue for the importance of an adaptive RFC2544 binary-search algorithm in virtualized environments.
- Due to cross-NUMA traffic flow, Scenarios 2, 3 and 8, as expected, perform worse than the other scenarios.
- When the NICs are mapped to both NUMA nodes, with pmd-cores also present on both, the performance is similar across all placements of the VNF cores. Scenarios 4, 5 and 6 represent these cases. However, among these, Scenario-6 is relatively poorer, as its VNF cores are split across NUMA nodes and the chances are that only one of them would be used effectively.
- Scenarios 1, 7 and 9 are the best cases, with minimal to no cross-NUMA effects.
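The point about the binary search can be illustrated with a small simulation. In a plain binary search, a trial that loses frames lowers the upper bound for good, so a single transient loss changes the final result. The sketch below uses a hypothetical `trial` callback and a made-up DUT with a zero-loss capacity of 80% of line rate. It is not the VSPERF or traffic-generator implementation, and the numbers are not measurements.

```python
def rfc2544_search(trial, lo=0.0, hi=100.0, resolution=0.5):
    """Binary-search the highest offered load (% of line rate) with zero loss.

    `trial(load)` is a hypothetical callback that runs one trial at the given
    load and returns the number of lost frames.
    """
    best = lo
    while hi - lo > resolution:
        load = (lo + hi) / 2
        if trial(load) == 0:
            best = lo = load  # passed: keep searching above this load
        else:
            hi = load         # failed: loads above this are never retried
    return best


def make_dut(capacity, glitch_on_trial=None):
    """Simulated DUT: loses frames above `capacity`, plus one optional transient loss."""
    calls = 0

    def trial(load):
        nonlocal calls
        calls += 1
        if calls == glitch_on_trial:
            return 1  # transient loss unrelated to the offered load
        return 0 if load <= capacity else 1

    return trial


if __name__ == "__main__":
    steady = rfc2544_search(make_dut(capacity=80.0))
    glitchy = rfc2544_search(make_dut(capacity=80.0, glitch_on_trial=2))
    print(f"steady DUT:  {steady:.2f}% of line rate")
    print(f"glitchy DUT: {glitchy:.2f}% of line rate")
```

In this simulation one transient loss on the second trial (at 75% load) caps the result just below 75%, even though the DUT sustains 80%. A loss-verification or adaptive variant would re-test a failed load before lowering the upper bound.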
Generic:
- X-NUMA instantiation is a very realistic scenario. If we seek more realism, we might add a stressor load to a few of the interesting scenarios. This might amplify the effects of X-NUMA deployment.
Observations
V2V Scenarios OVS_PMD and interfaces (virtual) mappings
Scenarios | Mappings |
---|---|
Virtual Interfaces | Bridge trex_br |
Scenario-1 | pmd thread numa_id 0 core_id 4: |
Scenario-2 | pmd thread numa_id 1 core_id 22: |
Scenario-3 | pmd thread numa_id 0 core_id 4: |
PVP Scenarios OVS-PMD and Interfaces (physical and virtual) mappings
Scenario | Mappings |
---|---|
1/2/3 | pmd thread numa_id 0 core_id 4: |
4 | pmd thread numa_id 0 core_id 4: pmd thread numa_id 1 core_id 23: |
5 | pmd thread numa_id 0 core_id 4: pmd thread numa_id 1 core_id 22: |
6 | pmd thread numa_id 0 core_id 4: pmd thread numa_id 1 core_id 23: |
7/8/9 | pmd thread numa_id 1 core_id 22: |
P2P Scenarios OVS-PMDs and Physical-Interface Mappings
Scenario | Mappings |
---|---|
1 | pmd thread numa_id 0 core_id 4: |
2 | pmd thread numa_id 1 core_id 22: |
3 | pmd thread numa_id 0 core_id 4: |
4 | pmd thread numa_id 0 core_id 4: |
5 | pmd thread numa_id 1 core_id 22: |
6 | pmd thread numa_id 0 core_id 4: |
7 | pmd thread numa_id 0 core_id 4: |
8 | pmd thread numa_id 1 core_id 22: |
9 | pmd thread numa_id 0 core_id 4: pmd thread numa_id 1 core_id 22: |
Possible Variations
- Increase the Number of CPUs to 4 for the VNF.
- Phy2phy case (no VNF).
- Try different forwarding VNF
- Different Virtual Switch (VPP)
- RxQ Affinity.
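For the RxQ affinity variation, receive queues can be pinned to specific PMD cores in OVS-DPDK through the `other_config:pmd-rxq-affinity` key on an interface, written as a comma-separated list of `queue:core` pairs. The sketch below only builds the command string. The interface name `dpdk0` and the queue-to-core mapping are hypothetical examples, not values used in these tests.

```python
def rxq_affinity_command(interface, queue_to_core):
    """Build an ovs-vsctl command that pins rx queues of one interface to cores.

    `queue_to_core` maps rx queue IDs to core IDs, e.g. {0: 4, 1: 22}.
    """
    pairs = ",".join(f"{queue}:{core}" for queue, core in sorted(queue_to_core.items()))
    return f'ovs-vsctl set Interface {interface} other_config:pmd-rxq-affinity="{pairs}"'


if __name__ == "__main__":
    # Hypothetical example: queue 0 on core 4 (Numa-0), queue 1 on core 22 (Numa-1).
    print(rxq_affinity_command("dpdk0", {0: 4, 1: 22}))
```

Pinning queues explicitly would remove the automatic queue-to-PMD assignment as a variable when comparing placements.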
Notes on Documentation
- Must view the log files: the qemu threads need to match the intended scenario for the VM.
- Christian created a qemu command (and documentation) - check this for the VM mapping.
- SR: CT's command is only the host
- The qemu command line -smp 2 should do this - simulates two NUMA nodes - need to see how the VM sees its architecture: numactl -H