...
Next, the Node 4 and 5 configurations above were built. PROX first had to be stopped on Node 4, because VSPERF could not install ovs-vanilla while PROX was running. Once the Node 4 ovs-vanilla data path was instantiated by VSPERF, we restarted PROX and viewed interrupt activity again (with no traffic running).
Then, the Node 5 iPerf3 traffic was started (after a date timestamp), and PROX counts were zeroed-out as the traffic began.
Recall the host config is VSWITCH_VHOST_CPU_MAP = [4,5,8,11]
and here PROX observed interrupts on NUMA 0 cores 2, 7, 14, 16, 17 (many!) 17 and 18 in the 10-50usec range. Eventually, all cores 1-15 experienced 2 events in the 10-50usec range (when PROX was allowed to run for about 30 minutes after the 60 second test).
The data plane measurements indicate one second with 45 frame losses (see the very bottom of the figure). Message logging on Nodes 4 and 5 indicated nothing during the 60 second iPerf3 test. Below we see an excerpt of the iPerf3 output from the Server, showing the single second with 45 packet losses.
...
[ 5] 10.00-11.00 sec 59.6 MBytes 500 Mbits/sec 0.011 ms 0/43128 (0%)
...
[ 5] 13.00-14.00 sec 59.5 MBytes 499 Mbits/sec 0.018 ms 0/43068 (0%)
So, we haven't yet seen correlation between PROX interrupt counts and the single loss event, but it would be good to have second-by-second results from PROX in a flat file, and to confirm that we have the correct cores identified in Node 4.
In a follow-up round of testing (Jan 20), we increased the iPerf3 sending rate to 750Mbps, and found good correlation between:
Losses near the beginning of the test:
[ 5] local 10.10.122.25 port 5201 connected to 10.10.124.25 port 59043
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 81.9 MBytes 687 Mbits/sec 0.005 ms 918/60210 (1.5%)
[ 5] 1.00-2.00 sec 89.5 MBytes 751 Mbits/sec 0.011 ms 0/64798 (0%)
[ 5] 2.00-3.00 sec 87.1 MBytes 731 Mbits/sec 0.008 ms 1701/64770 (2.6%)
[ 5] 3.00-4.00 sec 88.2 MBytes 740 Mbits/sec 0.008 ms 891/64735 (1.4%)
[ 5] 4.00-5.00 sec 88.4 MBytes 741 Mbits/sec 0.007 ms 888/64883 (1.4%)
[ 5] 5.00-6.00 sec 89.5 MBytes 751 Mbits/sec 0.008 ms 41/64869 (0.063%)
[ 5] 6.00-7.00 sec 89.2 MBytes 748 Mbits/sec 0.006 ms 16/64618 (0.025%)
[ 5] 7.00-8.00 sec 88.2 MBytes 740 Mbits/sec 0.007 ms 1427/65283 (2.2%)
[ 5] 8.00-9.00 sec 90.5 MBytes 759 Mbits/sec 0.005 ms 0/65554 (0%)
... remainder of the test had 1 second with 3 losses.
Node 4 log entries (much Network Manager activity)
Jan 20 10:40:01 pod12-node4 systemd: Removed slice User Slice of root.
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5382] policy: auto-activating connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5387] policy: auto-activating connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5394] device (ens801f2): Activation: starting connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5397] device (ens801f3): Activation: starting connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5398] device (ens801f2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5404] device (ens801f3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5409] device (ens801f2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5415] device (ens801f3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
and PROX interrupts on Core 5 (two observed over the 60 second test)
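One way to make this kind of correlation repeatable is to map each log entry onto the one-second trial in which it occurred. The following is a minimal Python sketch, not part of the original test procedure. It relies on the epoch timestamp that NetworkManager prints in square brackets; the function name events_per_test_second and the test start time used in the comment below are assumptions for illustration, since the actual iPerf3 start timestamp is not recorded here.

```python
import re
from collections import Counter

# NetworkManager prefixes each message with an epoch timestamp in brackets,
# e.g. "<info> [1548009823.5382] policy: ...". The PID in "NetworkManager[13205]"
# does not match because it has no fractional part.
EPOCH_RE = re.compile(r"\[(\d{10}\.\d+)\]")

# Three lines copied from the Node 4 log above.
SAMPLE_LOG = """\
Jan 20 10:40:01 pod12-node4 systemd: Removed slice User Slice of root.
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5382] policy: auto-activating connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5394] device (ens801f2): Activation: starting connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
"""


def events_per_test_second(log_text, test_start_epoch, duration=60):
    """Count log entries per one-second trial, relative to the test start.

    test_start_epoch is the epoch time at which the traffic started
    (an input the tester must supply, e.g. from the date timestamp).
    """
    counts = Counter()
    for line in log_text.splitlines():
        m = EPOCH_RE.search(line)
        if not m:
            continue  # lines without an epoch stamp (e.g. systemd) are skipped
        offset = float(m.group(1)) - test_start_epoch
        if 0 <= offset < duration:
            counts[int(offset)] += 1
    return counts


if __name__ == "__main__":
    # HYPOTHETICAL start time, chosen only to exercise the function.
    print(events_per_test_second(SAMPLE_LOG, 1548009820.0))
```

The per-second counts can then be compared directly against the iPerf3 intervals that report loss, and against per-second PROX output once that is available.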
Next steps:
- Obtain second-by-second output from PROX, and confirm Core occupation of OVS-Vanilla, etc.
- OVS-DPDK with isolcpus and rcu_nocbs
- and taskset the iPerf process to a core with no interrupts.
iPerf3 example Output
As an example of the iPerf3 loss counting capability, we have the results of one 60 second test below (where each second would be a trial):
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
...
[ 5] 57.00-58.00 sec 59.6 MBytes 500 Mbits/sec 0.004 ms 0/43152 (0%)
[ 5] 58.00-59.00 sec 59.5 MBytes 499 Mbits/sec 0.003 ms 0/43055 (0%)
[ 5] 59.00-60.00 sec 59.7 MBytes 501 Mbits/sec 0.004 ms 0/43267 (0%)
[ 5] 60.00-60.04 sec 0.00 Bytes 0.00 bits/sec 0.004 ms 0/0 (0%)
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-60.04 sec 0.00 Bytes 0.00 bits/sec 0.004 ms 273/2586968 (0.011%)
This particular test exhibited 6 seconds (Trials) with loss during a 60 second Test, or one loss event every 10 seconds on average. However, many of the Trials with loss occur in a pattern of Loss, no-Loss, Loss over three consecutive Trials. Loss-free intervals were 9, 15, 17, 3, and 8 seconds in length. This testing was conducted at about 70% of the RFC2544 Throughput level.
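The per-Trial counting described above can be automated. Below is a minimal Python sketch, assuming the iPerf3 UDP server output format shown on this page; the function names (parse_trials, loss_free_runs, loss_percent) are illustrative and not part of iPerf3 or VSPERF. The sample data is the 750Mbps excerpt from the Jan 20 test, since the full 60 second output is not reproduced here.

```python
import re

# Matches one interval line of iPerf3 UDP server output, e.g.
# "[ 5] 2.00-3.00 sec 87.1 MBytes 731 Mbits/sec 0.008 ms 1701/64770 (2.6%)"
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+.*?\s(\d+)/(\d+)\s+\("
)

# Interval lines copied from the Jan 20 test excerpt on this page.
SAMPLE = """\
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 81.9 MBytes 687 Mbits/sec 0.005 ms 918/60210 (1.5%)
[ 5] 1.00-2.00 sec 89.5 MBytes 751 Mbits/sec 0.011 ms 0/64798 (0%)
[ 5] 2.00-3.00 sec 87.1 MBytes 731 Mbits/sec 0.008 ms 1701/64770 (2.6%)
[ 5] 3.00-4.00 sec 88.2 MBytes 740 Mbits/sec 0.008 ms 891/64735 (1.4%)
[ 5] 4.00-5.00 sec 88.4 MBytes 741 Mbits/sec 0.007 ms 888/64883 (1.4%)
[ 5] 5.00-6.00 sec 89.5 MBytes 751 Mbits/sec 0.008 ms 41/64869 (0.063%)
[ 5] 6.00-7.00 sec 89.2 MBytes 748 Mbits/sec 0.006 ms 16/64618 (0.025%)
[ 5] 7.00-8.00 sec 88.2 MBytes 740 Mbits/sec 0.007 ms 1427/65283 (2.2%)
[ 5] 8.00-9.00 sec 90.5 MBytes 759 Mbits/sec 0.005 ms 0/65554 (0%)
"""


def parse_trials(text):
    """Return (start_second, lost, total) for each one-second interval."""
    trials = []
    for line in text.splitlines():
        m = INTERVAL_RE.search(line)
        if not m:
            continue  # header, separator, or unrelated line
        start, end = float(m.group(1)), float(m.group(2))
        # Keep only one-second Trials; this skips the short tail interval
        # (60.00-60.04) and the end-of-test summary line (0.00-60.04).
        if abs((end - start) - 1.0) > 0.01:
            continue
        trials.append((int(start), int(m.group(3)), int(m.group(4))))
    return trials


def loss_free_runs(trials):
    """Lengths of each run of consecutive Trials without loss."""
    runs, current = [], 0
    for _, lost, _ in trials:
        if lost:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


def loss_percent(lost, total):
    """Loss ratio as a percentage, as printed in the iPerf3 summary."""
    return 100.0 * lost / total


if __name__ == "__main__":
    trials = parse_trials(SAMPLE)
    print("Trials with loss:", [s for s, lost, _ in trials if lost])
    print("Loss-free run lengths:", loss_free_runs(trials))
    print("Total lost:", sum(lost for _, lost, _ in trials))
```

Feeding the complete server output of a 60 second test to parse_trials yields one tuple per Trial, from which the number of Trials with loss and the loss-free interval lengths follow directly. Note that loss_free_runs also reports runs at the very start or end of the test, which may or may not be counted as intervals between loss events.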
Related papers:
https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/SPECTS15NAPIoptimization.pdf
https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/NetSys2015.pdf
...
PROX config:
cd ./samplevnf/VNFs/DPPD-PROX/build
export RTE_SDK='/home/opnfv/dpdk'
export RTE_TARGET=build
and
sudo ./prox -k -f ../config/irq.cfg
-=-=-=-=-=-
./samplevnf/VNFs/DPPD-PROX/config/irq.cfg
...