Memory RAS

Anuket Project


Metrics List & Descriptions:

| Technology/Category | Metric/Feature Name | Data Type | Format Example | Collectd Release | Internal Collectd Version | Collectd Plugin | Description | Dependencies | Limitations | Comments |
|---|---|---|---|---|---|---|---|---|---|---|
| Memory RAS | Memory corrected errors | Int | 51522 | 5.8 | None | mcelog | Number of corrected memory errors since system boot | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Memory corrected errors in 24 hours | Int | 51522 | 5.8 | None | mcelog | Number of corrected memory errors in the previous 24 hours | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Memory uncorrected errors | Int | 51522 | 5.8 | None | mcelog | Number of uncorrected memory errors since system boot | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Memory uncorrected errors in 24 hours | Int | 51522 | 5.8 | None | mcelog | Number of uncorrected memory errors in the previous 24 hours | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Socket | Int | 0 | 5.8 | None | mcelog | Socket number on which the error occurred | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Channel | Char | 0 | 5.8 | None | mcelog | Memory channel; each channel represents a DIMM module | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Memory DIMM | Char | B1 | 5.8 | None | mcelog | Memory DIMM corresponding to the memory used by the cores on which the errors occurred | | | Gets metrics from the mcelog daemon. |
| Memory RAS | Memory Slot | Char | 1 | 5.8 | None | mcelog | Memory slot corresponding to the memory used by the cores on which the errors occurred | | | Gets metrics from the mcelog daemon. |
| Memory RAS | CPU ID | Int | 0 | Future | Future | EDAC | CPU ID of the cores on which the errors occurred. Will be added to the new EDAC plugin. | | | |
| Memory RAS | Memory Page | Hex | 0x12345 | Future | Future | EDAC | Memory page corresponding to the memory used by the cores on which the errors occurred. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Memory Offset | Hex | 0x0 | Future | Future | EDAC | Memory offset within the page. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Memory Row | Hex | 0x12345 | | | | | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Memory Grain | Int | 8 | Future | Future | EDAC | Byte granularity of the error (the error grain). Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Error Syndrome | Hex | 0x6ce3 | Future | Future | EDAC | Error syndrome corresponding to the memory used by the cores on which the errors occurred. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Error Type | Text | | Future | Future | EDAC | Error type. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Error Code | Integer | 0101:0090 | Future | Future | EDAC | Error code reported by EDAC. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | Logging | Log path | | ? | | EDAC | Configurable logging path | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | dimmX or rankX directory info | Varying | | Future | Future | EDAC | Expose the interface files that sysfs provides under the mcX/dimmX or mcX/rankX directories | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | csrowX directory info | Varying | | Future | Future | EDAC | Expose the interface files that sysfs provides under the mcX/csrowX directories | | | Not part of Collectd. Currently available in kernel EDAC logs. |
| Memory RAS | RAS interrupts | Count on each core | [CoreID]:[InterruptCount] | Future | Future | EDAC | Expose the RAS-related interrupts on cores of interest via Collectd | | | Discussion is open on whether this information can be exposed through the plugin. |

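The mcelog rows in the table above (socket, channel, DIMM and the corrected/uncorrected counters, total and per 24 hours) are gathered from the mcelog daemon. The Python sketch below shows one way to turn a per-DIMM memory-error dump into records that carry those fields. It makes several assumptions. First, it assumes the dump looks like the illustrative `SAMPLE` text: a `SOCKET n CHANNEL n DIMM n` header, an optional `DMI_NAME` line, and then `total` and `in 24h` counts under each error class. This layout is taken from mcelog's output and should be checked against the mcelog version you are running. Second, `DimmErrors` and `parse_mcelog_dump` are hypothetical names and are not part of the Collectd mcelog plugin.

```python
import re
from dataclasses import dataclass, field

# Patterns for the ASSUMED mcelog dump layout (verify against your mcelog version).
LOCATION_RE = re.compile(r"^SOCKET (\d+) CHANNEL (\S+) DIMM (\S+)")
TOTAL_RE = re.compile(r"^(\d+) total$")
WINDOW_RE = re.compile(r"^(\d+) in (\S+)$")


@dataclass
class DimmErrors:
    """Error counters for one socket/channel/DIMM location (hypothetical type)."""
    socket: int
    channel: str  # "Char" in the metrics table
    dimm: str     # "Char" in the metrics table
    dmi_name: str = ""
    # e.g. {"corrected memory errors": {"total": 5, "24h": 1}}
    counts: dict = field(default_factory=dict)


def parse_mcelog_dump(text):
    """Split a memory-error dump into one DimmErrors record per location."""
    records = []
    current = None  # location block currently being filled
    section = None  # "corrected memory errors" or "uncorrected memory errors"
    for raw in text.splitlines():
        line = raw.strip()
        loc = LOCATION_RE.match(line)
        if loc:
            current = DimmErrors(int(loc.group(1)), loc.group(2), loc.group(3))
            records.append(current)
            section = None
            continue
        if current is None:
            continue  # ignore anything before the first location header
        if line.startswith("DMI_NAME "):
            current.dmi_name = line[len("DMI_NAME "):].strip('"')
        elif line.endswith("memory errors:"):
            section = line[:-1]  # drop the trailing colon
            current.counts[section] = {}
        elif section is not None:
            total = TOTAL_RE.match(line)
            window = WINDOW_RE.match(line)
            if total:
                current.counts[section]["total"] = int(total.group(1))
            elif window:
                # "3 in 24h" is stored under the key "24h"
                current.counts[section][window.group(2)] = int(window.group(1))
    return records


# Illustrative input only, not captured from a real system.
SAMPLE = """\
SOCKET 0 CHANNEL 1 DIMM 0
DMI_NAME "DIMM_B1"
corrected memory errors:
\t51522 total
\t3 in 24h
uncorrected memory errors:
\t0 total
\t0 in 24h
"""

if __name__ == "__main__":
    for rec in parse_mcelog_dump(SAMPLE):
        print(rec)
```

The parser keys each counter by the window label that the dump prints, for example `24h`. Because of this, other bucket windows would be picked up without any code change.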
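Several EDAC rows (page, offset, grain, syndrome, error code) are described as currently available only in kernel EDAC logs. The Python sketch below shows how a single log line could be split into those fields. It assumes the `page:0x... offset:0x... grain:... syndrome:0x...` style that the kernel EDAC core prints. The sample line is illustrative: it reuses the Format Example values from the table rather than coming from a real log. The fields after ` - ` (area, err_code, socket, and so on) are driver-specific, so their layout is an assumption. `parse_edac_line` is a hypothetical helper and is not part of Collectd.

```python
import re

# ASSUMED layout: "EDAC MC<n>: <count> <CE|UE> <message> on <label> (<key:value ...>)"
HEADER_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<type>CE|UE) (?P<msg>.*?) on "
    r"(?P<label>\S+) \((?P<details>.*)\)"
)
HEX_FIELDS = {"page", "offset", "syndrome"}                  # printed as 0x...
INT_FIELDS = {"grain", "channel", "slot", "socket", "rank"}  # printed in decimal


def parse_edac_line(line):
    """Return a dict of fields from one EDAC log line, or None if it does not match."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    record = {
        "mc": int(m.group("mc")),
        "count": int(m.group("count")),
        "type": m.group("type"),  # CE = corrected, UE = uncorrected
        "message": m.group("msg"),
        "label": m.group("label"),
    }
    for token in m.group("details").split():
        # Split on the first colon only, so "err_code:0101:0090" keeps its full value.
        key, sep, value = token.partition(":")
        if not sep:
            continue  # e.g. the "-" separator between detail groups
        if key in HEX_FIELDS:
            record[key] = int(value, 16)
        elif key in INT_FIELDS and value.isdigit():
            record[key] = int(value)
        else:
            record[key] = value  # area, err_code, ha, ... kept as text
    return record


# Illustrative line; values mirror the Format Example column, not a real log.
SAMPLE_LINE = (
    "EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#1_DIMM#0 "
    "(channel:1 slot:0 page:0x12345 offset:0x0 grain:8 syndrome:0x6ce3 - "
    "area:DRAM err_code:0101:0090 socket:0 ha:0 channel_mask:2 rank:0)"
)

if __name__ == "__main__":
    print(parse_edac_line(SAMPLE_LINE))
```

The sketch keeps `err_code` as text instead of converting it to a number. The `0101:0090` form is two hex groups, so it does not fit the plain "Integer" data type the table lists.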
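The dimmX/rankX and csrowX rows refer to the EDAC interface files that sysfs exposes for each memory controller. The Python sketch below reads the error counters from that tree. It assumes the usual `/sys/devices/system/edac/mc/mcN` layout: `ce_count`/`ue_count` at the controller and csrowX level, and `dimm_ce_count`/`dimm_ue_count` inside dimmX or rankX directories. Older kernels may lack some of these files, and the sketch reports a missing file as `None`. `read_edac_counters` is a hypothetical name. The root path is a parameter so the function can also be pointed at a copy of the tree.

```python
from pathlib import Path

# Usual location of the EDAC memory-controller directories on Linux.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_int(path):
    """Read an integer sysfs attribute; None if the file is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_edac_counters(root=EDAC_MC_ROOT):
    """Collect CE/UE counters per mcX and per dimmX/rankX/csrowX subdirectory."""
    root = Path(root)
    if not root.is_dir():
        return {}  # no EDAC driver loaded, or path not present
    result = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {
            "ce_count": read_int(mc / "ce_count"),
            "ue_count": read_int(mc / "ue_count"),
            "children": {},
        }
        for child in sorted(p for p in mc.iterdir() if p.is_dir()):
            if child.name.startswith(("dimm", "rank")):
                # dimmX and rankX are assumed to carry the same dimm_* counter files.
                ce, ue = "dimm_ce_count", "dimm_ue_count"
            elif child.name.startswith("csrow"):
                ce, ue = "ce_count", "ue_count"
            else:
                continue  # e.g. "power" or other non-memory entries
            entry["children"][child.name] = {
                "ce_count": read_int(child / ce),
                "ue_count": read_int(child / ue),
            }
        result[mc.name] = entry
    return result


if __name__ == "__main__":
    # Prints {} on systems without an EDAC driver loaded.
    print(read_edac_counters())
```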
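The RAS interrupts row proposes one `[CoreID]:[InterruptCount]` value per core, and the table notes that this is still under discussion. On x86 Linux, one possible source is `/proc/interrupts`. The Python sketch below sums selected rows for each CPU column and formats the result in that style. Which rows count as RAS-related is an assumption. The sketch uses `MCE` (machine check exceptions), `THR` (threshold APIC interrupts) and `DFR` (deferred error APIC interrupts, on AMD). It leaves out `MCP` because those entries are polls rather than interrupts. The sample text is illustrative and not captured from a real system, and the function names are hypothetical.

```python
# Rows of /proc/interrupts treated as RAS-related (an ASSUMPTION, see lead-in).
RAS_ROWS = ("MCE", "THR", "DFR")


def parse_ras_interrupts(text, rows=RAS_ROWS):
    """Sum the selected /proc/interrupts rows per CPU; returns {cpu_id: count}."""
    lines = text.splitlines()
    if not lines:
        return {}
    # Header looks like "CPU0 CPU1 ..."; offline CPUs can leave gaps in the IDs.
    cpu_ids = [int(tok[3:]) for tok in lines[0].split() if tok.startswith("CPU")]
    totals = {cpu: 0 for cpu in cpu_ids}
    for line in lines[1:]:
        name, sep, rest = line.strip().partition(":")
        if not sep or name not in rows:
            continue
        # The first len(cpu_ids) fields are per-CPU counts; the rest is a description.
        for cpu, count in zip(cpu_ids, rest.split()[: len(cpu_ids)]):
            totals[cpu] += int(count)
    return totals


def format_ras_interrupts(totals):
    """Render counts in the table's [CoreID]:[InterruptCount] style."""
    return [f"{cpu}:{count}" for cpu, count in sorted(totals.items())]


# Illustrative excerpt, not captured from a real machine.
SAMPLE = """\
           CPU0       CPU1
  0:         21          0   IO-APIC   2-edge      timer
MCE:          1          0   Machine check exceptions
MCP:         17         17   Machine check polls
THR:          0          2   Threshold APIC interrupts
"""

if __name__ == "__main__":
    print(format_ras_interrupts(parse_ras_interrupts(SAMPLE)))
```

On a live system, pass the contents of `/proc/interrupts` instead of `SAMPLE`. To limit the output to the cores of interest, filter the returned dictionary by CPU ID before formatting it.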
Sub-sections:

RAS/mcelog Plugin High Level Design 

Memory RAS Plugin Executed Tests 

RAS Other Executed Tests