...
So far, we have not seen a correlation between PROX interrupt counts and the single loss event. It would help to have second-by-second results from PROX in a flat file, and to confirm that we have identified the correct cores on Node 4.
In a follow-up round of testing (Jan 20), we increased the iPerf3 sending rate to 750 Mbps and found good correlation among the following:
Losses near the beginning of the test:
[ 5] local 10.10.122.25 port 5201 connected to 10.10.124.25 port 59043
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 5] 0.00-1.00 sec 81.9 MBytes 687 Mbits/sec 0.005 ms 918/60210 (1.5%)
[ 5] 1.00-2.00 sec 89.5 MBytes 751 Mbits/sec 0.011 ms 0/64798 (0%)
[ 5] 2.00-3.00 sec 87.1 MBytes 731 Mbits/sec 0.008 ms 1701/64770 (2.6%)
[ 5] 3.00-4.00 sec 88.2 MBytes 740 Mbits/sec 0.008 ms 891/64735 (1.4%)
[ 5] 4.00-5.00 sec 88.4 MBytes 741 Mbits/sec 0.007 ms 888/64883 (1.4%)
[ 5] 5.00-6.00 sec 89.5 MBytes 751 Mbits/sec 0.008 ms 41/64869 (0.063%)
[ 5] 6.00-7.00 sec 89.2 MBytes 748 Mbits/sec 0.006 ms 16/64618 (0.025%)
[ 5] 7.00-8.00 sec 88.2 MBytes 740 Mbits/sec 0.007 ms 1427/65283 (2.2%)
[ 5] 8.00-9.00 sec 90.5 MBytes 759 Mbits/sec 0.005 ms 0/65554 (0%)
... in the remainder of the test, only one 1-second interval showed losses (3 datagrams).
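To make the lossy seconds easier to line up against other logs, the interval lines above can be reduced to a list of (start, end, lost, total) tuples. The following is a minimal sketch, assuming the iPerf3 UDP server output format shown above; the function name lossy_intervals and the min_lost threshold are illustrative and not part of iPerf3.

```python
import re

# Matches iPerf3 UDP server interval lines such as:
# [ 5] 0.00-1.00 sec 81.9 MBytes 687 Mbits/sec 0.005 ms 918/60210 (1.5%)
INTERVAL_RE = re.compile(
    r"\[\s*\d+\]\s+(?P<start>\d+\.\d+)-(?P<end>\d+\.\d+)\s+sec\s+"
    r".*?\s(?P<lost>\d+)/(?P<total>\d+)\s+\("
)


def lossy_intervals(lines, min_lost=1):
    """Return (start, end, lost, total) for intervals with >= min_lost losses."""
    results = []
    for line in lines:
        m = INTERVAL_RE.search(line)
        if not m:
            continue  # skip the header, "connected" line, and anything else
        lost = int(m.group("lost"))
        total = int(m.group("total"))
        if lost >= min_lost:
            results.append(
                (float(m.group("start")), float(m.group("end")), lost, total)
            )
    return results


# Three lines copied from the Jan 20 output above.
sample = [
    "[ 5] 0.00-1.00 sec 81.9 MBytes 687 Mbits/sec 0.005 ms 918/60210 (1.5%)",
    "[ 5] 1.00-2.00 sec 89.5 MBytes 751 Mbits/sec 0.011 ms 0/64798 (0%)",
    "[ 5] 5.00-6.00 sec 89.5 MBytes 751 Mbits/sec 0.008 ms 41/64869 (0.063%)",
]

for start, end, lost, total in lossy_intervals(sample):
    print(f"{start:.2f}-{end:.2f} s: {lost}/{total} datagrams lost")
```

Raising min_lost (for example to 100) would filter out the small residual losses and leave only the bursts seen in the first few seconds.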
Node 4 log entries (considerable NetworkManager activity):
Jan 20 10:40:01 pod12-node4 systemd: Removed slice User Slice of root.
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5382] policy: auto-activating connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5387] policy: auto-activating connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5394] device (ens801f2): Activation: starting connection 'Wired connection 1' (f7b61226-727b-39a3-ba0e-c5eecae22c32)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5397] device (ens801f3): Activation: starting connection 'Wired connection 2' (9fd53d60-d93d-354d-8132-e97e39496541)
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5398] device (ens801f2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5404] device (ens801f3): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5409] device (ens801f2): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> [1548009823.5415] device (ens801f3): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
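The bracketed values in the NetworkManager entries are Unix epoch timestamps with sub-second resolution, so they can be mapped onto the 1-second iPerf3 intervals once the test start time is known. The sketch below is one way to do that; the test start time is not recorded in these notes, so the value used in the usage lines is purely hypothetical, and the helper names nm_epoch and interval_index are illustrative.

```python
import math
import re
from datetime import datetime, timezone

# Matches the "<level> [epoch.fraction]" part of a NetworkManager syslog line.
EPOCH_RE = re.compile(r"<\w+>\s+\[(?P<epoch>\d+\.\d+)\]")


def nm_epoch(line):
    """Return the epoch timestamp embedded in a NetworkManager log line, or None."""
    m = EPOCH_RE.search(line)
    return float(m.group("epoch")) if m else None


def interval_index(event_epoch, test_start_epoch):
    """Return the index of the 1-second test interval containing the event."""
    return math.floor(event_epoch - test_start_epoch)


line = (
    "Jan 20 10:43:43 pod12-node4 NetworkManager[13205]: <info> "
    "[1548009823.5382] policy: auto-activating connection 'Wired connection 1' "
    "(f7b61226-727b-39a3-ba0e-c5eecae22c32)"
)

epoch = nm_epoch(line)
print(datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat())

# HYPOTHETICAL test start time, chosen only to illustrate the calculation.
assumed_test_start = 1548009821.0
print("falls in iPerf3 interval", interval_index(epoch, assumed_test_start))
```

With a real start time taken from the iPerf3 client or server host (and clocks synchronized between the nodes), the same calculation would show which reported interval each NetworkManager state change falls into.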
and PROX interrupts on Core 5 (two observed over the 60-second test).
Next steps:
- Obtain second-by-second output from PROX, and confirm core occupation by OVS-Vanilla, etc.
- Test OVS-DPDK with the isolcpus and rcu_nocbs kernel boot parameters.
- Use taskset to pin the iPerf process to a core with no interrupts.
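The last step could be scripted along the following lines. This is only a sketch: the core number is hypothetical (the quiet core still has to be identified), the helper name pinned_command is illustrative, and the iPerf3 arguments shown are a plain server invocation rather than the exact command line used in these tests.

```python
import shlex


def pinned_command(core, command):
    """Prefix a command with taskset so it runs only on the given CPU core."""
    return ["taskset", "-c", str(core)] + list(command)


# HYPOTHETICAL core number; replace with a core confirmed to be interrupt-free.
quiet_core = 7
cmd = pinned_command(quiet_core, ["iperf3", "-s"])
print(shlex.join(cmd))

# To actually launch the pinned process, something like this could be used:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list keeps it safe to pass directly to subprocess.run without shell quoting issues.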
...