1 | PCIe plugin notifications upon collectd start | - Collectd initial configuration. Start collectd.
- Verify notifications dispatched by PCIe plugin.
| - Collectd started.
- Basic/AER PCIe errors are dispatched after collectd start for all related (NIC) PCIe devices.
| Pass | SA: No clear delimiters between messages in the notifications passed to the collectd exec plugin. |
2 | PCIe plugin dispatches notifications after every collectd restart | - Collectd initial configuration. Start collectd.
- Inject PCIe error.
- Clear PCIe error.
- Restart collectd.
- Repeat steps #2-#4 multiple times.
| 2. Notification about PCIe error is dispatched as set. 3. Notification about PCIe error is dispatched as cleared. 4. Collectd is started, no warnings/errors in syslog. PCIe errors are dispatched after collectd start for all related (NIC) PCIe devices. | Pass | |
3 | PCIe plugin upon raised notification and collectd restart | - Collectd initial configuration. Start collectd.
- Inject PCIe error.
- Restart collectd.
- Clear PCIe error.
- Repeat steps #2-#4 multiple times.
| 2. Notification about PCIe error is dispatched as set. 3. Collectd is started, no warnings/errors in syslog. Notification about PCIe error is dispatched as set. 4. Notification about PCIe error is dispatched as cleared. | Pass | |
4 | PCIe plugin results upon collectd recovery (kill collectd process) | - Collectd initial configuration. Start collectd.
- Inject PCIe error.
- Clear PCIe error.
- Kill collectd.
- Start collectd if it was not started automatically.
- Repeat steps #2-#4 at least one more time.
| 2. Notification about basic/AER PCIe error is dispatched as set. 3. Notification about basic/AER PCIe error is dispatched as cleared. 4. / 5. Collectd is killed and started. Note: if collectd is configured as a service, it may be started automatically. | Pass | |
5 | PCIe plugin upon commented whole "Plugin pcie_errors" section | - Comment out the whole pcie_errors section, including the <Plugin pcie_errors> and </Plugin> lines. Start collectd.
- Inject basic PCIe error.
| - Collectd started. PCIe plugin loaded with default parameters.
- Notification about basic/AER PCIe error is dispatched.
| Fail Internal JIRA filed | SA: Readout's summary - default non native OS support. So, default configuration should be available (Ubuntu is target OS). |
6 | PCIe plugin upon commented content of "Plugin pcie_errors" (<Plugin pcie_errors>...</Plugin>) | - Comment out the content of "Plugin pcie_errors", like:
<Plugin pcie_errors>
# ...
</Plugin>
- Inject basic PCIe error.
| - Collectd started. PCIe plugin loaded with default parameters.
- Notification about basic/AER PCIe error is dispatched.
| Fail Internal JIRA filed | SA: Readout's summary - default non native OS support. So, default configuration should be available (Ubuntu is target OS). |
7 | PCIe plugin upon 'Source' parameter changed | - Configure collectd:
<Plugin pcie_errors>
  Source "sysfs"
  AccessDir "/sys/bus/pci"
</Plugin>
- Inject basic/AER PCIe error.
- Clear injected PCIe error.
- Configure collectd, restart collectd:
<Plugin pcie_errors>
  Source "proc"
  AccessDir "/proc/bus/pci"
</Plugin>
- Inject basic/AER PCIe error.
- Clear injected PCIe error.
- Configure collectd, restart collectd:
<Plugin pcie_errors>
  Source "off"
</Plugin>
- Inject basic/AER PCIe error.
- Clear injected PCIe error.
| - Collectd started.
- Notification about basic/AER PCIe error is dispatched as set.
- Notification about basic/AER PCIe error is dispatched as cleared.
- Collectd started.
- Notification about basic/AER PCIe error is dispatched as set.
- Notification about basic/AER PCIe error is dispatched as cleared.
- Collectd started, plugin not loaded.
- No notification about basic/AER PCIe error is dispatched.
| Pass | |
8 | PCIe plugin upon incorrect device location given ('AccessDir') | - Collectd initial configuration. Start collectd.
- Inject basic/AER PCIe error.
- Change location to an existing but invalid path ("/sys/bus", "/proc/cpuinfo"), restart collectd.
- Inject basic/AER PCIe error.
- Clear injected PCIe error.
- Change location to a non-existent path, restart collectd.
- Inject basic/AER PCIe error.
- Clear injected PCIe error.
| - Collectd started.
- Notification about basic/AER PCIe error is dispatched.
- Collectd started, pcie_errors plugin loaded. Warning about PCIe devices read failure logged.
- No notification about basic/AER PCIe error is dispatched.
- PCIe error is cleared (lspci -s 05:00.0 -vv | grep -e "DevSta:" -A5 -e "Capabilities: \[100").
- Collectd started, plugin not loaded.
- No notification about basic/AER PCIe error is dispatched.
| Fail Internal JIRA filed | |
9 | PCIe plugin errors upon 'ReportMasked' parameter changed | - Collectd initial configuration. Start collectd.
- Inject AER PCIe error:
# cat ue_unsupreq
AER
PCI_ID 05:00.0
UNCOR_STATUS 0x00100000
# ./aer-inject ue_unsupreq
- Clear PCIe error (setpci -s 05:00.0 0x104.l=00100000).
- Set 'ReportMasked true' in collectd.conf. Restart collectd.
- Inject ue_unsupreq AER PCIe error.
- Clear PCIe error.
| - Collectd started.
- Notification about AER PCIe error is not dispatched. Status is changed:
# setpci -s 05:00.0 0x104.l
00100000
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
- Notification about AER PCIe error is dispatched as cleared (00000000, UnsupReq-).
- Collectd started.
- Notification about AER PCIe error is dispatched as set.
- Notification about AER PCIe error is dispatched as cleared.
| Pass | |
10 | PCIe plugin dispatches events according to 'PersistentNotifications' parameter | - Collectd initial configuration. Start collectd.
- Inject basic/AER PCIe error.
- Clear injected errors.
- Set 'PersistentNotifications true' in collectd.conf. Restart collectd.
- Inject basic/AER PCIe error.
- Clear injected errors.
| - Collectd started.
- Notification about PCIe error 'set' dispatched only once when error injected.
- Notification about PCIe error 'cleared' is dispatched only once when error cleared.
- Collectd started.
- Notification about PCIe error 'set' is dispatched every time interval after error was injected.
- Notification about PCIe error 'cleared' is dispatched every time interval after error was cleared.
| Pass | |
11 | PCIe plugin dispatches events every interval when 'PersistentNotifications true' | - Set 'PersistentNotifications true' in collectd.conf. Change time interval to 1 second. Restart collectd.
- Inject basic/AER PCIe error.
- Clear injected errors.
- Repeat test with time interval in range 2-60 seconds.
| - Collectd started.
- Notification about PCIe error 'set' is dispatched every time interval after error was injected.
- Notification about PCIe error 'cleared' is dispatched every time interval after error was cleared.
| Pass | |
12 | PCIe plugin upon 'ReadLog' parameter changed | - Set 'ReadLog true'. Restart collectd.
- Inject basic/AER PCIe error.
- Set 'ReadLog false'. Restart collectd.
- Clear/inject PCIe errors.
| - Collectd started.
- No notification about PCIe error dispatched as syslog is parsed.
- After collectd restart notifications appeared about PCIe errors injected.
- Notification about PCIe error dispatched once injected/cleared.
| Pass | |
13 | PCIe plugin upon 'ReadLog true' parameter upon collectd start and error injected | - Set 'ReadLog true' in collectd.conf. Start collectd.
- Inject a PCIe error.
- Restart collectd.
- Clear injected error.
| - Collectd started.
- PCIe error is raised (lspci/setpci) but not dispatched as syslog is parsed.
- No notifications after collectd started, as non-native OS support is configured with the syslog read option.
Observed: upon startup, all errors are read as with the default configuration, including the previously injected PCIe error, although syslog does not contain any PCIe error.
- Error is cleared, notification is not dispatched.
| Fail Internal JIRA filed | |
14 | PCIe plugin AER Corrected errors notifications | - Collectd initial configuration. Start collectd.
- Inject AER PCIe error:
# cat ce_badtlp
AER
PCI_ID 05:00.0
COR_STATUS 0x00000040
# ./aer-inject ce_badtlp
- Clear injected errors:
setpci -s 05:00.0 0x110.l=0x40
- Repeat test with other PCIe correctable errors: Receiver Error Status (0x00000001), Bad DLLP Status (0x00000080), REPLAY_NUM Rollover (0x00000100), etc.
| - Collectd started.
- Verify register status by setpci -s 05:00.0 0x110.l:
00000040
Verify status by lspci -s 05:00.0 -vv | grep -A9 "Capabilities: \[10":
CESta: RxErr- BadTLP+ BadDLLP- Rollover- Timeout- NonFatalErr-
Notification with correct timestamp, title, severity, address is dispatched as set:
Severity:WARNING
Time:1491226583.117
Host:silpixa00398942
Plugin:pcie_errors
PluginInstance:0000:05:00.0
Type:pcie_error
TypeInstance:correctable
CorrectableError set: Bad TLP Status
- Verify register status by setpci -s 05:00.0 0x110.l:
00000000
Verify status by lspci -s 05:00.0 -vv | grep -A9 "Capabilities: \[10":
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
Notification with correct timestamp, title, severity, address is dispatched as cleared.
| Pass | |
15 | PCIe plugin AER Uncorrected non fatal errors notifications | - Collectd initial configuration. Start collectd.
- Inject AER PCIe uncorrectable non-fatal error:
# cat ue_acsviol
AER
PCI_ID 05:00.0
UNCOR_STATUS 0x00200000
# ./aer-inject ue_acsviol
- Clear injected errors:
setpci -s 05:00.0 0x104.l=0x00200000
| - Collectd started.
- Verify register status by setpci -s 05:00.0 0x104.l:
00200000
Verify status by lspci -s 05:00.0 -vv | grep -A9 "Capabilities: \[10":
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol+
Notification with correct timestamp, title, severity, address is dispatched as set:
Severity:WARNING
Time:1491229811.884
Host:silpixa00398942
Plugin:pcie_errors
PluginInstance:0000:05:00.0
Type:pcie_error
TypeInstance:non_fatal
Uncorrectable(non_fatal)Error set: ACS Violation
- Verify register status by setpci -s 05:00.0 0x104.l:
00000000
Verify status by lspci -s 05:00.0 -vv | grep -A9 "Capabilities: \[10":
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
Notification with correct timestamp, title, severity, address is dispatched as cleared.
| Pass | |
16 | PCIe plugin AER Uncorrected fatal error notifications | - Collectd initial configuration. Start collectd.
- Inject AER PCIe uncorrectable fatal error:
# cat ue_dlp
AER
PCI_ID 05:00.0
UNCOR_STATUS 0x10
# ./aer-inject ue_dlp
- Clear injected errors:
setpci -s 05:00.0 0x104.l=0x10
- Repeat test with other PCIe uncorrectable errors: Surprise Down (0x00000020), Poisoned TLP (0x00001000), Flow Control Protocol (0x00002000), etc.
| - Collectd started.
- Verify register status by setpci -s 05:00.0 0x104.l:
00000010
Verify status by lspci -s 05:00.0 -vv | grep -A9 "Capabilities: \[10":
UESta: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
Notification with correct timestamp, title, severity, address is dispatched as set:
Severity:FAILURE
Time:1491227475.743
Host:silpixa00398942
Plugin:pcie_errors
PluginInstance:0000:05:00.0
Type:pcie_error
TypeInstance:fatal
Uncorrectable(fatal)Error set: Data Link Protocol
- Verify register status by setpci -s 05:00.0 0x104.l:
00000000
Verify status by lspci -s 05:00.0 -vv | grep -A9 "Capabilities: \[10":
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
Notification with correct timestamp, title, severity, address is dispatched as cleared.
| Pass | |
17 | PCIe plugin basic errors notification | - Collectd initial configuration. Start collectd.
- Check PCIe devices for active errors (lspci -s 81:00.0 -vv | grep "DevSta").
- Get errors and try to clear, e.g.:
Get: setpci -s 81:00.0 0xAA.w
Clear: setpci -s 81:00.0 0xAA.w=<read_value>
- Restart collectd.
| - Collectd started.
- Found similar to:
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
- Notification with correct timestamp, title, severity, address is dispatched as cleared.
- PCIe error cleared in previous step is not notified after collectd restart.
| Pass | |
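
The register values and lspci flag strings quoted in tests 9 and 14-16 can be cross-checked offline. The following is a minimal Python sketch, not part of the plugin or of the test tooling: the function name `decode_status` and both tables are hypothetical. Bit masks for the injected errors are taken from the values listed in the table above; the remaining masks are assumed from the standard PCIe AER status register layout, and the flag names mirror the lspci output shown in the expected results.

```python
# Hypothetical helper: turn an AER status register value (as printed by
# "setpci -s <dev> 0x104.l" or "0x110.l") into an lspci-style flag string.

# Uncorrectable Error Status (offset 0x104 in the tests above).
# Masks not quoted in the table are assumed from the AER register layout.
UNCORRECTABLE_FLAGS = [
    ("DLP", 0x00000010),       # Data Link Protocol (test 16)
    ("SDES", 0x00000020),      # Surprise Down
    ("TLP", 0x00001000),       # Poisoned TLP
    ("FCP", 0x00002000),       # Flow Control Protocol
    ("CmpltTO", 0x00004000),
    ("CmpltAbrt", 0x00008000),
    ("UnxCmplt", 0x00010000),
    ("RxOF", 0x00020000),
    ("MalfTLP", 0x00040000),
    ("ECRC", 0x00080000),
    ("UnsupReq", 0x00100000),  # Unsupported Request (test 9)
    ("ACSViol", 0x00200000),   # ACS Violation (test 15)
]

# Correctable Error Status (offset 0x110 in the tests above).
CORRECTABLE_FLAGS = [
    ("RxErr", 0x00000001),
    ("BadTLP", 0x00000040),    # Bad TLP (test 14)
    ("BadDLLP", 0x00000080),
    ("Rollover", 0x00000100),
    ("Timeout", 0x00001000),
    ("NonFatalErr", 0x00002000),
]


def decode_status(register_hex, flags):
    """Return e.g. 'RxErr- BadTLP+ ...' for a hex string such as '00000040'."""
    value = int(register_hex, 16)
    return " ".join(
        f"{name}{'+' if value & mask else '-'}" for name, mask in flags
    )


if __name__ == "__main__":
    print("UESta:", decode_status("00100000", UNCORRECTABLE_FLAGS))
    print("CESta:", decode_status("00000040", CORRECTABLE_FLAGS))
```

Feeding the helper the value read back by setpci and comparing the result with the UESta/CESta line from lspci gives a quick consistency check before looking at the dispatched notification.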