...

Recall that the host config is VSWITCH_VHOST_CPU_MAP = [4,5,8,11]. Here PROX observed interrupts on NUMA 0 cores 2, 7, 14, 16, 17 (many!) 17 and 18 in the 10-50 usec range. Eventually, all cores 1-15 experienced 2 events in the 10-50 usec range (when PROX was allowed to run for about 30 minutes after the 60-second test).
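As a quick cross-check of the core lists above, the following minimal Python sketch compares the vhost CPU map with the cores on which PROX reported interrupts during the 60-second test. The variable names are illustrative, and the observed-core set is simply read off the sentence above (the repeated "17" is counted once).

```python
# Cores assigned to vhost threads, as given by VSWITCH_VHOST_CPU_MAP.
vswitch_vhost_cpu_map = [4, 5, 8, 11]

# Cores where PROX reported 10-50 usec interrupts during the 60-second test
# (assumption: transcribed from the text above; core 17 is listed once).
prox_interrupt_cores = {2, 7, 14, 16, 17, 18}

# Cores that are both in the vhost map and saw interrupts in the test window.
overlap = sorted(set(vswitch_vhost_cpu_map) & prox_interrupt_cores)

if overlap:
    print("vhost cores with PROX interrupts:", overlap)
else:
    print("no vhost core saw a PROX interrupt during the test window")
```

With the values above the overlap is empty, which is consistent with the need to confirm that the right cores have been identified on Node 4.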

The data plane measurements indicate one second with 45 frame losses (see the very bottom of the figure). Message logging on Nodes 4 and 5 indicated nothing during the 60-second iPerf3 test. Below we see an excerpt of the iPerf3 output from the Server, showing the single second with 45 packet losses.

...

So, we haven't yet seen a correlation between PROX interrupt counts and the single loss event, but it would be good to have second-by-second results from PROX in a flat file, and to confirm that we have the correct cores identified in Node 4.

...


In a follow-up round of testing (Jan 20), we increased the iPerf3 sending rate to 750 Mbps and found good correlation between:

Losses near the beginning of the test:

[  5] local 10.10.122.25 port 5201 connected to 10.10.124.25 port 59043

[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams

[  5]   0.00-1.00   sec  81.9 MBytes   687 Mbits/sec  0.005 ms  918/60210 (1.5%)

[  5]   1.00-2.00   sec  89.5 MBytes   751 Mbits/sec  0.011 ms  0/64798 (0%)

[  5]   2.00-3.00   sec  87.1 MBytes   731 Mbits/sec  0.008 ms  1701/64770 (2.6%)

[  5]   3.00-4.00   sec  88.2 MBytes   740 Mbits/sec  0.008 ms  891/64735 (1.4%)

[  5]   4.00-5.00   sec  88.4 MBytes   741 Mbits/sec  0.007 ms  888/64883 (1.4%)

[  5]   5.00-6.00   sec  89.5 MBytes   751 Mbits/sec  0.008 ms  41/64869 (0.063%)

[  5]   6.00-7.00   sec  89.2 MBytes   748 Mbits/sec  0.006 ms  16/64618 (0.025%)

[  5]   7.00-8.00   sec  88.2 MBytes   740 Mbits/sec  0.007 ms  1427/65283 (2.2%)

[  5]   8.00-9.00   sec  90.5 MBytes   759 Mbits/sec  0.005 ms  0/65554 (0%)

... remainder of the test had 1 second with 3 losses.

Node 4 log entries (much Network Manager activity)

Jan 20 10:40:01 pod12-node4 systemd: Removed slice User Slice of root.

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5382] policy: auto-activating connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5387] policy: auto-activating connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5394] device (ens801f2): Activation: starting connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5397] device (ens801f3): Activation: starting connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5398] device (ens801f2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5404] device (ens801f3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5409] device (ens801f2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')

Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5415] device (ens801f3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')

and PROX interrupts on Core 5 (two observed over the 60-second test)

[Image: PROX interrupt counts for Core 5]
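To make this kind of comparison less manual, the iPerf3 interval lines and the NetworkManager log entries can be reduced to flat per-second records. The Python sketch below is one possible approach, not the tooling used for these tests. The function names and regular expressions are assumptions written against the line formats shown above, and the sample text is copied from this page. It does not align iPerf3 test seconds with wall-clock time, since the test start time is not given here.

```python
import csv
import io
import re
from collections import Counter

# Matches iPerf3 UDP server interval lines such as:
# [  5]   0.00-1.00   sec  81.9 MBytes   687 Mbits/sec  0.005 ms  918/60210 (1.5%)
IPERF_RE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>\d+\.\d+)-(?P<end>\d+\.\d+)\s+sec\s+"
    r".*?\s(?P<lost>\d+)/(?P<total>\d+)\s+\("
)

# Matches the epoch timestamp that NetworkManager prints in brackets.
NM_RE = re.compile(r"NetworkManager\[\d+\]: <\w+>\s+\[(?P<epoch>\d+)\.\d+\]")


def parse_iperf_intervals(text):
    """Return (start_s, end_s, lost, total) for each interval line found."""
    rows = []
    for line in text.splitlines():
        m = IPERF_RE.search(line)
        if m:
            rows.append((float(m["start"]), float(m["end"]),
                         int(m["lost"]), int(m["total"])))
    return rows


def nm_events_per_second(text):
    """Count NetworkManager log entries per whole epoch second."""
    counts = Counter()
    for line in text.splitlines():
        m = NM_RE.search(line)
        if m:
            counts[int(m["epoch"])] += 1
    return counts


def intervals_to_csv(rows):
    """Render parsed intervals as CSV text suitable for a flat file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["start_s", "end_s", "lost", "total"])
    writer.writerows(rows)
    return buf.getvalue()


# Sample input copied from the output and log entries shown above.
iperf_sample = """\
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-1.00   sec  81.9 MBytes   687 Mbits/sec  0.005 ms  918/60210 (1.5%)
[  5]   1.00-2.00   sec  89.5 MBytes   751 Mbits/sec  0.011 ms  0/64798 (0%)
[  5]   2.00-3.00   sec  87.1 MBytes   731 Mbits/sec  0.008 ms  1701/64770 (2.6%)
"""
log_sample = """\
Jan 20 10:40:01 pod12-node4 systemd: Removed slice User Slice of root.
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5382] policy: auto-activating connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info>  [1548009823.5387] policy: auto-activating connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)
"""

intervals = parse_iperf_intervals(iperf_sample)
lossy_seconds = [(start, lost) for start, _end, lost, _total in intervals if lost > 0]
nm_counts = nm_events_per_second(log_sample)

print(intervals_to_csv(intervals))
print("seconds with loss:", lossy_seconds)
print("NetworkManager events per epoch second:", dict(nm_counts))
```

Once PROX can also emit second-by-second counts, the same per-second keys could be used to join all three data sets, provided the wall-clock start time of the iPerf3 test is recorded.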


Next steps:

  1. Obtain second-by-second output from PROX, and confirm Core occupation of OVS-Vanilla, etc.
  2. Test OVS-DPDK with the isolcpus and rcu_nocbs kernel parameters.
  3. Use taskset to pin the iPerf process to a core with no interrupts.
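The pinning step in the list above could be scripted roughly as follows. This is a minimal Python sketch: the helper name taskset_command, the core number 20, and the plain "iperf3 -s" invocation are all hypothetical placeholders, and the core actually used would have to be one that PROX shows to be free of interrupts.

```python
import shlex


def taskset_command(core, command):
    """Build an argv list that runs `command` pinned to one core via `taskset -c`."""
    if core < 0:
        raise ValueError("core must be non-negative")
    return ["taskset", "-c", str(core)] + list(command)


# Hypothetical values: core 20 and a bare iperf3 server are placeholders only.
argv = taskset_command(20, ["iperf3", "-s"])

# Show the command line; pass argv to subprocess.run() to actually launch it.
print(shlex.join(argv))
```

Building the argument list separately from launching it makes it easy to log the exact command used for each test run.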

...