Anuket Project
RAS Other Executed Tests
Test Environment details:
- Bare Metal, Ubuntu 16.04.2 LTS
Repo/branch used:
- collectd/feat_ras_with_msgparser
Test preconditions:
- Mcelog installed.
- mce-inject tool installed.
- Collectd installed.
- Exec/python collectd plugin configured.
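As an illustration of the last precondition, a notification callback for the collectd python plugin could look like the sketch below. It is only a sketch under stated assumptions: `format_notification` and `on_notification` are hypothetical names, the output layout merely imitates the notification dumps recorded in the test table, and the `collectd` module is importable only when the script is loaded by collectd itself (hence the guarded import).

```python
try:
    import collectd  # provided by collectd's python plugin at runtime
except ImportError:
    collectd = None  # allows the formatter to be exercised standalone

# Numeric severities as defined by collectd (NOTIF_FAILURE, NOTIF_WARNING, NOTIF_OKAY).
SEVERITY_NAMES = {1: "FAILURE", 2: "WARNING", 4: "OKAY"}


def format_notification(n):
    """Render a notification object in a layout similar to the table dumps."""
    severity = SEVERITY_NAMES.get(n.severity, str(n.severity))
    return (
        "Severity:%s Time:%.3f Host:%s Plugin:%s PluginInstance:%s "
        "Type:%s TypeInstance:%s Message:%s"
        % (severity, n.time, n.host, n.plugin, n.plugin_instance,
           n.type, n.type_instance, n.message)
    )


def on_notification(n):
    """Hypothetical callback: write every received notification to the collectd log."""
    collectd.info(format_notification(n))


if collectd is not None:
    collectd.register_notification(on_notification)
```

Loaded through the python plugin, such a callback would log each mcelog notification so that its fields can be compared with `/var/log/mcelog`.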
RAS Other
Collectd configuration (default):
LoadPlugin mcelog
#<Plugin mcelog>
# McelogClientSocket "/var/run/mcelog-client"
# McelogClientSocketEnabled true
# <McelogLogfile "/var/log/mcelog">
# <Match>
# Name "DISCLAIMER"
# Regex "(Hardware event.*)"
# Excluderegex "kernel"
# IsMandatory true
# </Match>
# <Match>
# Name "MCE details"
# Regex "(.*)"
# SubmatchIdx 0
# Excluderegex "kernel|Hardware event|TIME|CPUID"
# IsMandatory false
# </Match>
# <Match>
# Name "ORIGIN"
# Regex "MCA: (.*)[ _][Ee][Rr]{2}"
# SubmatchIdx 1
# Excluderegex "kernel|Hardware event|TIME|CPUID|No Error"
# IsMandatory false
# </Match>
# <Match>
# Name "TIME"
# Regex "TIME ([0-9]*)"
# Excluderegex "kernel"
# IsMandatory false
# </Match>
# <Match>
# Name "CPUID"
# Regex "CPUID (Vendor.*)"
# Excluderegex "kernel"
# IsMandatory true
# </Match>
# </McelogLogfile>
# McelogLogfileEnabled true
#</Plugin>
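To make the effect of the Match rules above easier to follow, the Python sketch below applies the same regular expressions to a sample record. Several things here are assumptions rather than collectd behaviour confirmed by this report: a rule is taken to apply when `Regex` matches and `Excluderegex` does not, `SubmatchIdx` is assumed to default to 1 where the configuration omits it, and the sample record is a hypothetical one assembled from lines visible in the observed notification of test case 8. `RULES`, `SAMPLE`, and `parse_record` are illustrative names only.

```python
import re

# (Name, Regex, Excluderegex, SubmatchIdx) copied from the default configuration.
# SubmatchIdx 1 is an assumed default where the configuration omits it.
RULES = [
    ("DISCLAIMER", r"(Hardware event.*)", r"kernel", 1),
    ("MCE details", r"(.*)", r"kernel|Hardware event|TIME|CPUID", 0),
    ("ORIGIN", r"MCA: (.*)[ _][Ee][Rr]{2}",
     r"kernel|Hardware event|TIME|CPUID|No Error", 1),
    ("TIME", r"TIME ([0-9]*)", r"kernel", 1),
    ("CPUID", r"CPUID (Vendor.*)", r"kernel", 1),
]

# Hypothetical record built from lines seen in the test case 8 notification.
SAMPLE = """\
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 1
TIME 1492529930
Corrected error
MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout
STATUS 8800000000000e0b MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 79
"""


def parse_record(text):
    """Return (name, value) pairs for every rule that accepts a line."""
    fields = []
    for line in text.splitlines():
        for name, regex, exclude, idx in RULES:
            if re.search(exclude, line):
                continue  # line is filtered out for this rule
            match = re.search(regex, line)
            if match:
                fields.append((name, match.group(idx)))
    return fields


if __name__ == "__main__":
    for name, value in parse_record(SAMPLE):
        print("%s: %s" % (name, value))
```

Under these assumptions the `ORIGIN` rule yields `BUS` for the sample record, which matches the `PluginInstance:BUS` value recorded for test case 8.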
Table#1: RAS IO test cases
# | Test Summary | Steps | Expected | Observed | Status | Comments |
---|---|---|---|---|---|---|
1 | RAS plugin notifications upon collectd start with "McelogLogfileEnabled false" | | 2. Collectd started. 3. Notification that mcelog is connected to server dispatched. 4. Notification is not dispatched. | | Pass | |
2 | RAS plugin notifications upon collectd start with "McelogLogfileEnabled true" | | | | Fail Internal JIRA Filed | |
3 | RAS plugin dispatches notifications after every collectd restart | | | | Pass | |
4 | RAS plugin upon mcelog LoadPlugin commented | | 2. No notification dispatched. | 2. No notification dispatched. | Pass | |
5 | RAS plugin upon mcelog Plugin commented (default) | | 2. Notification is dispatched with correct values for all fields. | Severity:WARNING Time:0.000 Host:silpixa00398942 Plugin:mcelog PluginInstance:BUS Type:gauge TypeInstance:Corrected error DISCLAIMER:Hardware event. This is not a software error. MCEdetails: MCE 0 MCEdetails: CPU 0 BANK 1 MCEdetails: MISC 0 MCEdetails: MCG status: MCEdetails: MCi status: MCEdetails: Corrected error MCEdetails: MCi_MISC register valid MCEdetails: MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout MCEdetails: Running trigger `bus-error-trigger' MCEdetails: IO MCA reported by root port 0:00:00.0 MCEdetails: Running trigger `iomca-error-trigger' MCEdetails: STATUS 8800000000000e0b MCGSTATUS 0 MCEdetails: MCGCAP 7000c16 APICID 0 SOCKETID 0 CPUID:CPUID Vendor Intel Family 6 Model 79 GotMachine Check Exception | Fail Internal JIRA Filed | SA: Time is incorrect in notification but valid in "/var/log/mcelog". Hardware event. This is not a software error. |
6 | RAS plugin upon mcelog Plugin "McelogLogfile ..." part commented | | 2. Notification is dispatched with correct values for all fields. | Same as above. | Fail Internal JIRA Filed | SA: Time is incorrect in notification but valid in "/var/log/mcelog". |
7 | RAS plugin upon mcelog Plugin Match part commented | | 2. Notification is dispatched with correct values for all fields. | Same as above. | Fail Internal JIRA Filed | SA: Time is incorrect in notification but valid in "/var/log/mcelog". |
8 | RAS plugin upon mcelog Plugin all fields uncommented (same as default configuration) | | 2. Notification is dispatched with correct values for all fields. | Severity:WARNING Time:1492529930.000 Host:silpixa00398942 Plugin:mcelog PluginInstance:BUS Type:gauge TypeInstance:Corrected error DISCLAIMER:Hardware event. This is not a software error. MCEdetails: MCE 0 MCEdetails: CPU 0 BANK 1 MCEdetails: MISC 0 MCEdetails: MCG status: MCEdetails: MCi status: MCEdetails: Corrected error MCEdetails: MCi_MISC register valid MCEdetails: MCA: BUS error: 0 0 Level-3 Generic Generic IO Request-did-not-timeout MCEdetails: Running trigger `bus-error-trigger' MCEdetails: IO MCA reported by root port 0:00:00.0 MCEdetails: Running trigger `iomca-error-trigger' MCEdetails: STATUS 8800000000000e0b MCGSTATUS 0 MCEdetails: MCGCAP 7000c16 APICID 0 SOCKETID 0 CPUID:Vendor Intel Family 6 Model 79 GotMachine Check Exception. | Pass | |
9 | RAS plugin upon mcelog Plugin commented/removed Match part with "IsMandatory false" | | 2. Notification is dispatched with correct values for all fields. | Notification: Severity:FAILURE mcelog: Hardware event. This is not a software error. | Fail Internal JIRA Filed | SA: Fields in the notification appear to be filtered: the "MCEdetails" part is missing. The error type differs: Corrected vs Uncorrected. "PluginInstance" is changed to "other". The time differs: mcelog reports "TIME 1492530298 Tue Apr 18 16:44:58 2017"; the notification reports "1492530303.353 Tue Apr 18 16:45:03 IST 2017" (attempt #2: 16:50:49 vs Tue Apr 18 16:50:53 IST 2017). |
10 | RAS plugin correctly reads severity of injected IO errors | | 2. Notification is dispatched with severity WARNING for corrected error. 3. Notification is dispatched with severity FAILURE for uncorrected error. | 2. Notification is dispatched with severity WARNING for corrected error. 3. Notification with severity FAILURE for uncorrected error: not verified. | Step 2: Pass | SA: Open question: how to inject uncorrected non-fatal/fatal errors? |
11 | RAS plugin upon memory and IO error injection | | 2. Notification is dispatched about IO error once. 3. Notification is dispatched about memory corrected error once. | 2. Notification is dispatched about IO error once. 3. Notification is dispatched about memory corrected error every time interval. | Fail Internal JIRA Filed | |
12 | RAS plugin events received from different mcelog location | | | | Pass | |
13 | RAS plugin events received from mcelog-client socket upon "McelogClientSocketEnabled false/true" is changed | | | 2. Notification about an IO error is dispatched. 4. Notification about an error is not dispatched. | Fail Internal JIRA Filed | |
14 | RAS plugin events received from mcelog file upon "McelogClientSocketEnabled false" and "McelogLogfileEnabled true" | | 2. Notification about an error is dispatched (read from mcelog file). | | Pass | |
15 | RAS plugin events time detection for error received from mcelog-client socket | | 2. Notification is dispatched within 50 ms. | 2. Notification is dispatched within 33 ms. | Pass | |
16 | RAS plugin events time detection for error received from mcelog file | | 2. Notification is dispatched within 50 ms. | 2. Notification is dispatched within 30 ms. | Pass | |
17 | RAS plugin upon tags configuration failures | | Collectd not started. In all cases an error is recorded in syslog with messages like "Parse error in file ..." | | Pass | |
18 | RAS plugin upon invalid path for mcelog file and socket | | Collectd started. | | Pass | |
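The timestamp discrepancy noted in the comments of test case 9 can be quantified with a few lines of Python. The two values are copied from that comment; `notification_delay` is an illustrative helper name, not part of collectd or mcelog.

```python
from datetime import datetime, timezone

MCELOG_TIME = 1492530298            # "TIME 1492530298" from /var/log/mcelog
NOTIFICATION_TIME = 1492530303.353  # time reported in the dispatched notification


def notification_delay(event_ts, notification_ts):
    """Return the gap between event and notification in seconds (ms precision)."""
    return round(notification_ts - event_ts, 3)


if __name__ == "__main__":
    event = datetime.fromtimestamp(MCELOG_TIME, tz=timezone.utc)
    print("mcelog event (UTC):", event.isoformat())
    print("delay in seconds  :", notification_delay(MCELOG_TIME, NOTIFICATION_TIME))
```

For the first attempt the gap is about 5.35 s; the second attempt in the same comment (16:50:49 vs 16:50:53) differs by about 4 s.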
Table#2: RAS QPI test cases
CPU 0 BANK 2 STATUS 0x8800000000000E0F
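For readers unfamiliar with the STATUS value above, the sketch below splits it into flag bits. It assumes the architectural IA32_MCi_STATUS layout (VAL in bit 63, OVER 62, UC 61, EN 60, MISCV 59, ADDRV 58, PCC 57, MCA error code in bits 15:0) and does not attempt the model-specific decoding that mcelog performs. `decode_status` is a hypothetical helper, not an mcelog or collectd function.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed layout, see lead-in).
STATUS_FLAGS = {
    63: "VAL",
    62: "OVER",
    61: "UC",
    60: "EN",
    59: "MISCV",
    58: "ADDRV",
    57: "PCC",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and the MCA error code."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        "uncorrected": bool((status >> 61) & 1),
        "mca_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    decoded = decode_status(0x8800000000000E0F)
    print(decoded["flags"], hex(decoded["mca_code"]))
```

With this layout the value sets only VAL and MISCV, so the injected record describes a corrected error with a valid MISC register.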
Table#3: RAS CPU test cases
CPU 0 BANK 1 STATUS CORRECTED PCC
Table#4: RAS System test cases
Open question: how to inject any System error.