...

Loss characteristics are a topic for further investigation, and a likely area for expansion of the ETSI NFV specification TST009, clause 11.

...

First tests with PROX monitoring

As a first step, PROX was started on Node 4 after reboot, with nothing else running:

[Image: PROX interrupt display for Node 4 after reboot, with nothing else running]

There is fairly regular interrupt activity in the 5-10 usec bin for all cores, but nothing in the 10-50 usec range after 60 sec.

Next, the Node 4 and 5 configurations above were built. PROX had to be stopped on Node 4 first, because VSPERF could not install ovs-vanilla with PROX running. Once the Node 4 ovs-vanilla data path was instantiated by VSPERF, we restarted PROX and viewed interrupt activity again (with no traffic running).

[Image: PROX interrupt display for Node 4 with the ovs-vanilla data path instantiated and no traffic running]

Then, the Node 5 iPerf3 traffic was started (after a date timestamp), and PROX counts were zeroed-out as the traffic began. 

[Image: PROX interrupt display for Node 4 during the Node 5 iPerf3 traffic]

Recall that the host config is VSWITCH_VHOST_CPU_MAP = [4,5,8,11]; here PROX observed interrupts in the 10-50 usec range on NUMA 0 cores 2, 7, 14, 16, (many) 17 and 18. Eventually, all cores 1-15 experienced 2 events in the 10-50 usec range (when PROX was allowed to run for about 30 minutes after the 60 second test).

The data plane measurements indicate one second with 45 frame losses (see the very bottom of the figure). Message logging on Nodes 4 and 5 indicated nothing during the 60 sec iPerf3 test. Below we see an excerpt of the iPerf3 output from the Server, showing the single second with 45 packet losses.

...

[  5]  10.00-11.00  sec  59.6 MBytes   500 Mbits/sec  0.011 ms  0/43128 (0%)

[  5]  11.00-12.00  sec  59.8 MBytes   501 Mbits/sec  0.010 ms  0/43269 (0%)

[  5]  12.00-13.00  sec  59.6 MBytes   500 Mbits/sec  0.008 ms  45/43220 (0.1%)

[  5]  13.00-14.00  sec  59.5 MBytes   499 Mbits/sec  0.018 ms  0/43068 (0%)

So far, we haven't seen correlation between PROX interrupt counts and the single loss event, but it would be good to have second-by-second results from PROX in a flat file, and to confirm that we have identified the correct cores in Node 4.
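To line up loss events with other second-by-second data, the iPerf3 server output can be reduced to just the intervals that report loss. The following is a minimal sketch, not part of the test tooling: the function names and the regular expression are illustrative assumptions, and the sample text is the excerpt shown above.

```python
import re

# Matches one interval line of iPerf3 UDP server output, for example:
# [  5]  12.00-13.00  sec  59.6 MBytes   500 Mbits/sec  0.008 ms  45/43220 (0.1%)
# Note: the end-of-test summary line has the same shape and would match as well.
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>\d+\.\d+)-(?P<end>\d+\.\d+)\s+sec\s+"
    r".*?\s(?P<jitter>\d+\.\d+)\s+ms\s+(?P<lost>\d+)/(?P<total>\d+)\s+\("
)

def parse_intervals(text):
    """Return one dict per interval line found in iPerf3 server output."""
    rows = []
    for line in text.splitlines():
        m = INTERVAL_RE.search(line)
        if m:
            rows.append({
                "start": float(m.group("start")),
                "end": float(m.group("end")),
                "jitter_ms": float(m.group("jitter")),
                "lost": int(m.group("lost")),
                "total": int(m.group("total")),
            })
    return rows

def intervals_with_loss(rows):
    """Keep only the intervals in which at least one datagram was lost."""
    return [r for r in rows if r["lost"] > 0]

# Sample input: the four lines excerpted from the Server output above.
SAMPLE = """
[  5]  10.00-11.00  sec  59.6 MBytes   500 Mbits/sec  0.011 ms  0/43128 (0%)
[  5]  11.00-12.00  sec  59.8 MBytes   501 Mbits/sec  0.010 ms  0/43269 (0%)
[  5]  12.00-13.00  sec  59.6 MBytes   500 Mbits/sec  0.008 ms  45/43220 (0.1%)
[  5]  13.00-14.00  sec  59.5 MBytes   499 Mbits/sec  0.018 ms  0/43068 (0%)
"""

for row in intervals_with_loss(parse_intervals(SAMPLE)):
    print(f"{row['start']:.2f}-{row['end']:.2f} s: "
          f"{row['lost']} of {row['total']} datagrams lost")
```

The same per-interval records could later be joined with a second-by-second PROX flat file, once one is available.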


In a follow-up round of testing (Jan 20), we increased the iPerf3 sending rate to 750 Mbps, and found good correlation between:

Losses near the beginning of the test:

[  5] local 10.10.122.25 port 5201 connected to 10.10.124.25 port 59043

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams

[  5]   0.00-1.00   sec  81.9 MBytes   687 Mbits/sec  0.005 ms  918/60210 (1.5%)

[  5]   1.00-2.00   sec  89.5 MBytes   751 Mbits/sec  0.011 ms  0/64798 (0%)

[  5]   2.00-3.00   sec  87.1 MBytes   731 Mbits/sec  0.008 ms  1701/64770 (2.6%)

[  5]   3.00-4.00   sec  88.2 MBytes   740 Mbits/sec  0.008 ms  891/64735 (1.4%)

[  5]   4.00-5.00   sec  88.4 MBytes   741 Mbits/sec  0.007 ms  888/64883 (1.4%)

[  5]   5.00-6.00   sec  89.5 MBytes   751 Mbits/sec  0.008 ms  41/64869 (0.063%)

[  5]   6.00-7.00   sec  89.2 MBytes   748 Mbits/sec  0.006 ms  16/64618 (0.025%)

[  5]   7.00-8.00   sec  88.2 MBytes   740 Mbits/sec  0.007 ms  1427/65283 (2.2%)

[  5]   8.00-9.00   sec  90.5 MBytes   759 Mbits/sec  0.005 ms  0/65554 (0%)

... remainder of the test had 1 second with 3 losses.

Node 4 log entries (much Network Manager activity)

Jan 20 10:40:01 pod12-node4 systemd: Removed slice User Slice of root.

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5382] policy: auto-activating connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5387] policy: auto-activating connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5394] device (ens801f2): Activation: starting connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5397] device (ens801f3): Activation: starting connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5398] device (ens801f2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5404] device (ens801f3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5409] device (ens801f2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5415] device (ens801f3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')

and PROX interrupts on Core 5 (two observed over the 60 second test)

[Image: PROX interrupt display showing the interrupts on Core 5]
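The core-level comparison between the two rounds of testing can be written down as a simple set intersection. This is only a sketch of the reasoning: the helper name is hypothetical, the core numbers are the ones reported in the text, and it assumes that the same VSWITCH_VHOST_CPU_MAP was in effect for the follow-up test, which still needs to be confirmed on Node 4.

```python
# Host configuration quoted earlier on this page.
VSWITCH_VHOST_CPU_MAP = [4, 5, 8, 11]

# Cores where PROX reported events in the 10-50 usec bin during each test,
# as stated in the text (first test at 500 Mbits/sec, follow-up at 750 Mbps).
first_test_cores = {2, 7, 14, 16, 17, 18}
follow_up_cores = {5}

def overlap_with_vswitch(observed_cores, vswitch_cores):
    """Return the observed cores that are also in the vSwitch CPU map."""
    return sorted(set(observed_cores) & set(vswitch_cores))

print("first test:", overlap_with_vswitch(first_test_cores, VSWITCH_VHOST_CPU_MAP))
print("follow-up: ", overlap_with_vswitch(follow_up_cores, VSWITCH_VHOST_CPU_MAP))
```

Under that assumption, none of the cores with long interrupts in the first test appear in the CPU map, while Core 5 from the follow-up test does.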


Next steps:

  1. Obtain second-by-second output from PROX, and confirm core occupation of OVS-Vanilla, etc.
  2. Test OVS-DPDK with the isolcpus and rcu_nocbs kernel boot parameters.
  3. Use taskset to pin the iPerf process to a core with no interrupts.
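One possible way to find a candidate core for the last step is to compare two snapshots of /proc/interrupts and look for CPUs whose counters did not change. The sketch below is an assumption about how this could be done, not a procedure used in these tests; the function names are hypothetical, and the two 4-CPU snapshots are made-up illustrative data in the /proc/interrupts layout.

```python
def per_cpu_counts(proc_interrupts_text):
    """Sum the interrupt counters per CPU from the text of /proc/interrupts."""
    lines = proc_interrupts_text.strip().splitlines()
    n_cpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    totals = [0] * n_cpus
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()[:n_cpus]
        # Rows such as ERR: carry a single global counter; skip them.
        if len(fields) == n_cpus and all(f.isdigit() for f in fields):
            for cpu, value in enumerate(fields):
                totals[cpu] += int(value)
    return totals

def quiet_cpus(before_text, after_text):
    """Return CPUs whose total interrupt count is identical in both snapshots."""
    before = per_cpu_counts(before_text)
    after = per_cpu_counts(after_text)
    return [cpu for cpu, (b, a) in enumerate(zip(before, after)) if a == b]

# Illustrative snapshots only (not measured on Node 4 or Node 5).
BEFORE = """\
           CPU0       CPU1       CPU2       CPU3
  0:         45          0          0          0   IO-APIC   2-edge      timer
 24:       1000        200          0         10   PCI-MSI 512000-edge      eth0
LOC:       5000       4000        300        900   Local timer interrupts
ERR:          0
"""
AFTER = """\
           CPU0       CPU1       CPU2       CPU3
  0:         45          0          0          0   IO-APIC   2-edge      timer
 24:       1500        260          0         10   PCI-MSI 512000-edge      eth0
LOC:       6000       4800        300        950   Local timer interrupts
ERR:          0
"""

print("CPUs with no new interrupts:", quiet_cpus(BEFORE, AFTER))
```

On a real host the two snapshots would be read from /proc/interrupts some seconds apart, and a core from the result could then be passed to taskset, for example `taskset -c <core> iperf3 -s`.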

iPerf3 example Output

As an example of the iPerf3 loss counting capability, we have the results of one 60 second test below (where each second would be a trial):

-----------------------------------------------------------

Server listening on 5201

-----------------------------------------------------------

Accepted connection from 10.10.124.25, port 39851

[  5] local 10.10.122.25 port 5201 connected to 10.10.124.25 port 35449

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams

[  5]   0.00-1.00   sec  55.7 MBytes   467 Mbits/sec  0.004 ms  0/40351 (0%)

[  5]   1.00-2.00   sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43161 (0%)

[  5]   2.00-3.00   sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43152 (0%)

[  5]   3.00-4.00   sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43166 (0%)

[  5]   4.00-5.00   sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43139 (0%)

[  5]   5.00-6.00   sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43181 (0%)

[  5]   6.00-7.00   sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43157 (0%)

[  5]   7.00-8.00   sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43172 (0%)

[  5]   8.00-9.00   sec  59.6 MBytes   500 Mbits/sec  0.006 ms  46/43211 (0.11%)

[  5]   9.00-10.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43128 (0%)

[  5]  10.00-11.00  sec  59.5 MBytes   499 Mbits/sec  0.005 ms  124/43196 (0.29%)

[  5]  11.00-12.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43126 (0%)

[  5]  12.00-13.00  sec  59.7 MBytes   501 Mbits/sec  0.011 ms  0/43214 (0%)

[  5]  13.00-14.00  sec  59.5 MBytes   499 Mbits/sec  0.004 ms  0/43106 (0%)

[  5]  14.00-15.00  sec  59.7 MBytes   500 Mbits/sec  0.005 ms  32/43237 (0.074%)

[  5]  15.00-16.00  sec  59.5 MBytes   499 Mbits/sec  0.005 ms  0/43095 (0%)

[  5]  16.00-17.00  sec  59.7 MBytes   501 Mbits/sec  0.005 ms  0/43210 (0%)

[  5]  17.00-18.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43127 (0%)

[  5]  18.00-19.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43189 (0%)

[  5]  19.00-20.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43140 (0%)

[  5]  20.00-21.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43158 (0%)

[  5]  21.00-22.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43142 (0%)

[  5]  22.00-23.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43185 (0%)

[  5]  23.00-24.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43173 (0%)

[  5]  24.00-25.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43159 (0%)

[  5]  25.00-26.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43195 (0%)

[  5]  26.00-27.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43152 (0%)

[  5]  27.00-28.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43155 (0%)

[  5]  28.00-29.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43143 (0%)

[  5]  29.00-30.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43160 (0%)

[  5]  30.00-31.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43141 (0%)

[  5]  31.00-32.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43172 (0%)

[  5]  32.00-33.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  19/43178 (0.044%)

[  5]  33.00-34.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43166 (0%)

[  5]  34.00-35.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  7/43155 (0.016%)

[  5]  35.00-36.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43171 (0%)

[  5]  36.00-37.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43131 (0%)

[  5]  37.00-38.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43168 (0%)

[  5]  38.00-39.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43185 (0%)

[  5]  39.00-40.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43169 (0%)

[  5]  40.00-41.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43160 (0%)

[  5]  41.00-42.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43148 (0%)

[  5]  42.00-43.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43162 (0%)

[  5]  43.00-44.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43176 (0%)

[  5]  44.00-45.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43178 (0%)

[  5]  45.00-46.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43141 (0%)

[  5]  46.00-47.00  sec  59.6 MBytes   500 Mbits/sec  0.006 ms  0/43156 (0%)

[  5]  47.00-48.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43189 (0%)

[  5]  48.00-49.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43142 (0%)

[  5]  49.00-50.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43153 (0%)

[  5]  50.00-51.00  sec  59.5 MBytes   499 Mbits/sec  0.005 ms  45/43150 (0.1%)

[  5]  51.00-52.00  sec  59.7 MBytes   501 Mbits/sec  0.005 ms  0/43210 (0%)

[  5]  52.00-53.00  sec  59.6 MBytes   500 Mbits/sec  0.005 ms  0/43168 (0%)

[  5]  53.00-54.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43158 (0%)

[  5]  54.00-55.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43160 (0%)

[  5]  55.00-56.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43177 (0%)

[  5]  56.00-57.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43150 (0%)

[  5]  57.00-58.00  sec  59.6 MBytes   500 Mbits/sec  0.004 ms  0/43152 (0%)

[  5]  58.00-59.00  sec  59.5 MBytes   499 Mbits/sec  0.003 ms  0/43055 (0%)

[  5]  59.00-60.00  sec  59.7 MBytes   501 Mbits/sec  0.004 ms  0/43267 (0%)

[  5]  60.00-60.04  sec  0.00 Bytes  0.00 bits/sec  0.004 ms  0/0 (0%)

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams

[  5]   0.00-60.04  sec  0.00 Bytes  0.00 bits/sec  0.004 ms  273/2586968 (0.011%)

This particular test exhibited 6 seconds (Trials) with loss during a 60 second Test, or one Trial with loss every 10 seconds on average. However, four of the six Trials with loss occur in a pattern of Loss, no-Loss, Loss over three consecutive Trials (seconds 8-11 and 32-35). Apart from the single loss-free second inside each of those patterns, the loss-free intervals were 8, 3, 17, 15, and 9 seconds in length, in order of occurrence. This testing was conducted at about 70% of the RFC2544 Throughput level.
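The loss-free interval lengths can be derived mechanically from the per-Trial loss counts. The sketch below is illustrative only: the function name is hypothetical, and the loss counts are the six non-zero Lost values from the 60 second test output above, keyed by the starting second of each Trial.

```python
def loss_free_runs(lost_per_trial):
    """Return the lengths of consecutive runs of trials with zero loss."""
    runs, current = [], 0
    for lost in lost_per_trial:
        if lost == 0:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

# Lost datagrams per one-second Trial: starting second -> Lost count.
LOSSES = {8: 46, 10: 124, 14: 32, 32: 19, 34: 7, 50: 45}
lost = [LOSSES.get(second, 0) for second in range(60)]

trials_with_loss = sum(1 for count in lost if count > 0)
print("Trials with loss:", trials_with_loss)
print("Total lost datagrams:", sum(lost))
print("Average spacing (s):", len(lost) / trials_with_loss)
print("Loss-free run lengths (s):", loss_free_runs(lost))
```

The run lengths of 1 in the result are the single loss-free seconds inside the Loss, no-Loss, Loss patterns.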

Related papers:

https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/SPECTS15NAPIoptimization.pdf

https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/NetSys2015.pdf

https://builders.intel.com/docs/networkbuilders/numa_aware_hypervisor_and_impact_on_brocade_vrouter.pdf


PROX config:  

cd ./samplevnf/VNFs/DPPD-PROX/build

export RTE_SDK='/home/opnfv/dpdk'

export RTE_TARGET=build

and

sudo ./prox -k -f ../config/irq.cfg

-=-=-=-=-=-

./samplevnf/VNFs/DPPD-PROX/config/irq.cfg

...