| # | Test case | Steps | Expected result | Actual result | Status | Comments |
|---|---|---|---|---|---|---|
| 1 | RAS plugin notifications upon collectd start with "McelogLogfileEnabled false" | Collectd initial configuration. Set "McelogLogfileEnabled false". Start collectd. Verify notifications dispatched by PCIe plugin. Inject IO error: `echo "CPU 0 BANK 1 STATUS 0x8800000000000E0B" \| ./mce-inject` | 2. Collectd started. 3. Notification that mcelog is connected to server is dispatched. 4. Notification is not dispatched. | | Pass | |
| 2 | RAS plugin notifications upon collectd start with "McelogLogfileEnabled true" | Collectd initial configuration. Verify notifications dispatched by PCIe plugin. | Collectd started. Notification that mcelog is connected to server is dispatched. Other old notifications read from mcelog are dispatched. | | Fail (internal JIRA filed) | |
| 3 | RAS plugin dispatches notifications after every collectd restart | Collectd initial configuration. Start collectd. Inject IO error. Restart collectd. Inject IO error (corrected): `./mce-inject io_err`, where `cat io_err` shows `CPU 0 BANK 1 STATUS 0x8800000000000E0B`. | Collectd started. Notification about IO error is dispatched. Collectd started. Notification about IO error is dispatched. | | Pass | |
| 4 | RAS plugin with mcelog LoadPlugin commented out | Comment out mcelog part. Restart collectd. `#LoadPlugin mcelog #<Plugin mcelog> # ... #</Plugin>` Inject IO error. | 2. No notification dispatched. | 2. No notification dispatched. | Pass | |
| 5 | RAS plugin with mcelog Plugin block commented out (default) | Comment out mcelog part. Restart collectd. `LoadPlugin mcelog #<Plugin mcelog> # ... #</Plugin>` Inject IO error. | 2. Notification is dispatched with correct values for all fields. | Severity:WARNING Time:0.000 Host:silpixa00398942 Plugin:mcelog PluginInstance:BUS Type:gauge TypeInstance:Corrected error DISCLAIMER:Hardware event. This is not a software error. MCEdetails: MCE 0 MCEdetails: CPU 0 BANK 1 MCEdetails: MISC 0 MCEdetails: MCG status: MCEdetails: MCi status: MCEdetails: Corrected error MCEdetails: MCi_MISC register valid MCEdetails: MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout MCEdetails: Running trigger \`bus-error-trigger' MCEdetails: IO MCA reported by root port 0:00:00.0 MCEdetails: Running trigger \`iomca-error-trigger' MCEdetails: STATUS 8800000000000e0b MCGSTATUS 0 MCEdetails: MCGCAP 7000c16 APICID 0 SOCKETID 0 CPUID:CPUID Vendor Intel Family 6 Model 79 GotMachine Check Exception | Fail (internal JIRA filed) | SA: Time is incorrect in notification but valid in "/var/log/mcelog". Hardware event. This is not a software error. MCE 0 CPU 0 BANK 1 MISC 0 TIME 1492529725 Tue Apr 18 16:35:25 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout Running trigger \`bus-error-trigger' IO MCA reported by root port 0:00:00.0 Running trigger \`iomca-error-trigger' STATUS 8800000000000e0b MCGSTATUS 0 MCGCAP 7000c16 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 79 |
| 6 | RAS plugin with mcelog Plugin "McelogLogfile ..." part commented out | Comment out mcelog part. Restart collectd. `LoadPlugin mcelog <Plugin mcelog> McelogClientSocket "/var/run/mcelog-client" McelogClientSocketEnabled true #<McelogLogfile "/var/log/mcelog"> # ... #</McelogLogfile> McelogLogfileEnabled true </Plugin>` Inject IO error. | 2. Notification is dispatched with correct values for all fields. | Same as above. | Fail (internal JIRA filed) | SA: Time is incorrect in notification but valid in "/var/log/mcelog". |
| 7 | RAS plugin with mcelog Plugin Match part commented out | Comment out mcelog part. Restart collectd. `LoadPlugin mcelog <Plugin mcelog> McelogClientSocket "/var/run/mcelog-client" McelogClientSocketEnabled true <McelogLogfile "/var/log/mcelog"> # ... </McelogLogfile> McelogLogfileEnabled true </Plugin>` Inject IO error. | 2. Notification is dispatched with correct values for all fields. | Same as above. | Fail (internal JIRA filed) | SA: Time is incorrect in notification but valid in "/var/log/mcelog". |
| 8 | RAS plugin with all mcelog Plugin fields uncommented (same as default configuration) | Uncomment default mcelog part. Restart collectd. `LoadPlugin mcelog <Plugin mcelog> McelogClientSocket "/var/run/mcelog-client" McelogClientSocketEnabled true <McelogLogfile "/var/log/mcelog"> ... </McelogLogfile> McelogLogfileEnabled true </Plugin>` Inject IO error. | 2. Notification is dispatched with correct values for all fields. | Severity:WARNING Time:1492529930.000 Host:silpixa00398942 Plugin:mcelog PluginInstance:BUS Type:gauge TypeInstance:Corrected error DISCLAIMER:Hardware event. This is not a software error. MCEdetails: MCE 0 MCEdetails: CPU 0 BANK 1 MCEdetails: MISC 0 MCEdetails: MCG status: MCEdetails: MCi status: MCEdetails: Corrected error MCEdetails: MCi_MISC register valid MCEdetails: MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout MCEdetails: Running trigger \`bus-error-trigger' MCEdetails: IO MCA reported by root port 0:00:00.0 MCEdetails: Running trigger \`iomca-error-trigger' MCEdetails: STATUS 8800000000000e0b MCGSTATUS 0 MCEdetails: MCGCAP 7000c16 APICID 0 SOCKETID 0 CPUID:Vendor Intel Family 6 Model 79 GotMachine Check Exception. | Pass | |
| 9 | RAS plugin with the mcelog Plugin Match part that has "IsMandatory false" commented out or removed | Comment out mcelog part. Restart collectd. `LoadPlugin mcelog <Plugin mcelog> McelogClientSocket "/var/run/mcelog-client" McelogClientSocketEnabled true <McelogLogfile "/var/log/mcelog"> <Match> Name "DISCLAIMER" Regex "(Hardware event.*)" Excluderegex "kernel" IsMandatory true </Match> # ... <Match> Name "CPUID" Regex "CPUID (Vendor.*)" Excluderegex "kernel" IsMandatory true </Match> </McelogLogfile> McelogLogfileEnabled true </Plugin>` Inject IO error. | 2. Notification is dispatched with correct values for all fields. | Notification: Severity:FAILURE Time:1492530303.353 Host:silpixa00398942 Plugin:mcelog PluginInstance:other Type:gauge TypeInstance:Uncorrected error DISCLAIMER:Hardware event. This is not a software error. CPUID:Vendor Intel Family 6 Model 79 GotMachine Check Exception. mcelog: Hardware event. This is not a software error. MCE 0 CPU 0 BANK 1 MISC 0 TIME 1492606180 Wed Apr 19 13:49:40 2017 MCG status: MCi status: Corrected error MCi_MISC register valid MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout Running trigger \`bus-error-trigger' IO MCA reported by root port 0:00:00.0 Running trigger \`iomca-error-trigger' STATUS 8800000000000e0b MCGSTATUS 0 MCGCAP 7000c16 APICID 0 SOCKETID 0 CPUID Vendor Intel Family 6 Model 79 | Fail (internal JIRA filed) | SA: Fields in the notification appear to be filtered: the "MCEdetails" part is missing. Error type differs: Corrected vs. Uncorrected. "PluginInstance" is changed to "other". Time differs: mcelog: "TIME 1492530298 Tue Apr 18 16:44:58 2017"; notification: "1492530303.353 Tue Apr 18 16:45:03 IST 2017" (attempt #2: 16:50:49 vs. Tue Apr 18 16:50:53 IST 2017). |
| 10 | RAS plugin correctly reads severity of injected IO errors | Collectd initial configuration. Start collectd. Inject corrected IO error: `./mce-inject io_err`, where `cat io_err` shows `CPU 0 BANK 1 STATUS 0x8800000000000E0B`. Inject uncorrected non-fatal IO error: `./mce-inject io_uncor_err`, where `cat io_uncor_err` shows ? | 2. Notification is dispatched with severity WARNING for corrected error. 3. Notification is dispatched with severity FAILURE for uncorrected error. | 2. Notification is dispatched with severity WARNING for corrected error. 3. Notification is dispatched with severity FAILURE for uncorrected error??? | 2-Pass | SA: How to inject uncorrected non-fatal/fatal errors? |
| 11 | RAS plugin upon memory and IO error injection | Collectd initial configuration. Start collectd. Inject corrected IO error: `./mce-inject io_err`, where `cat io_err` shows `CPU 0 BANK 1 STATUS 0x8800000000000E0B`. Inject corrected memory error. | 2. Notification is dispatched about IO error once. 3. Notification is dispatched about memory corrected error once. | 2. Notification is dispatched about IO error once. 3. Notification is dispatched about memory corrected error every time interval. | Fail (internal JIRA filed) | |
| 12 | RAS plugin events received from different mcelog location | Change mcelog file location in mcelog.conf and collectd.conf. Restart mcelog, restart collectd services. Inject IO error. | Mcelog, collectd are running. Collectd plugins are loaded. Notification is dispatched about IO error. | | Pass | |
| 13 | RAS plugin events received from mcelog-client socket when "McelogClientSocketEnabled" is switched between true and false | Change collectd.conf. Restart collectd. `LoadPlugin mcelog <Plugin mcelog> McelogClientSocket "/var/run/mcelog-client" McelogClientSocketEnabled true <McelogLogfile "/var/log/mcelog"> <Match> Name "Host:silpixa00398942" Regex "(Host.*)" Excluderegex "kernel" IsMandatory true </Match> <Match> Name "MCE details" Regex "(.*)" SubmatchIdx 0 Excluderegex "kernel\|Hardware event\|TIME\|CPUID" IsMandatory false </Match> <Match> Name "Gotmemory" Regex "(Gotmemory.*)" Excluderegex "kernel" IsMandatory true </Match> </McelogLogfile> McelogLogfileEnabled false </Plugin>` Inject corrected memory error (memory errors are sent over the socket). Change to "McelogClientSocketEnabled false". Restart collectd. Inject corrected memory error (memory errors are sent over the socket). | Collectd started. Notification about an error is dispatched with parsing. Collectd started. Notification about an error is dispatched without parsing. | 2. Notification about an IO error is dispatched. 4. Notification about an error is not dispatched. | Fail (internal JIRA filed) | |
| 14 | RAS plugin events received from mcelog file with "McelogClientSocketEnabled false" and "McelogLogfileEnabled true" | Change collectd.conf. Restart collectd. `<Plugin mcelog> McelogClientSocket "/var/run/mcelog-client" McelogClientSocketEnabled false <McelogLogfile "/var/log/mcelog"> <Match> Name "DISCLAIMER" Regex "(Hardware event.*)" Excluderegex "kernel" IsMandatory true </Match> <Match> Name "MCE details" Regex "(.*)" SubmatchIdx 0 Excluderegex "kernel\|Hardware event\|TIME\|CPUID" IsMandatory false </Match> <Match> Name "CPUID" Regex "CPUID (Vendor.*)" Excluderegex "kernel" IsMandatory true </Match> </McelogLogfile> McelogLogfileEnabled true </Plugin>`
|