Memory RAS

Anuket Project

Metrics List & Descriptions:

Technology/Category

Metric/Feature Name

Data Type

Format Example

Collectd Release

Internal Collectd Version

Collectd Plugin

Description

Dependencies

Limitations

Comments

Memory RAS

Memory corrected errors

Int 

51522

5.8

None

mcelog

Number of corrected memory errors since system boot

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Memory corrected errors in 24 Hours

Int

51522

5.8

None

mcelog

Number of corrected memory errors in the previous 24 hours

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Memory Uncorrected errors

Int

51522

5.8

None

mcelog

Number of uncorrected memory errors since system boot

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Memory Uncorrected errors in 24 Hours

Int

51522

5.8

None

mcelog

Number of uncorrected memory errors in the previous 24 hours

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Socket

Int

0

5.8

None

mcelog

Socket number the error occurred on

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Channel

Char

0

5.8

None

mcelog

Memory channel. Each channel represents a DIMM module

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Memory DIMM

Char

B1

5.8

None

mcelog

Memory DIMM corresponding to the memory used by the cores on which the errors occurred

 

 

Gets metrics from the mcelog daemon.

Memory RAS

Memory Slot

Char

1

5.8

None

mcelog

Memory slot corresponding to the memory used by the cores on which the errors occurred

 

 

Gets metrics from the mcelog daemon.

Memory RAS

CPU ID

Int

0

Future

Future

EDAC

CPU ID of the cores on which the errors occurred. Will be added to new EDAC plugin

 

 

 

Memory RAS

Memory Page

Hex

0x12345

Future

Future

EDAC

Memory page corresponding to the memory used by the cores on which the errors occurred. Will be added to new EDAC plugin

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Memory Offset

Hex

0x0

Future

Future

EDAC

Memory offset in the page. Will be added to new EDAC plugin

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Memory Row

Hex

0x12345

 

 

 

 

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Memory Grain

Int

8

Future

Future

EDAC

The byte granularity of the error, also called the error grain. Will be added to new EDAC plugin

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Error Syndrome

Hex

0x6ce3

Future

Future

EDAC

Memory syndrome corresponding to the memory used by the cores on which the errors occurred. Will be added to new EDAC plugin

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Error Type

Text

 

Future

Future

EDAC

Error type. Will be added to new EDAC plugin

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Error code

Integer

0101:0090

Future

Future

EDAC

Error code put out by EDAC. Will be added to new EDAC plugin

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

Logging

Log path

 

?

 

EDAC

Configurable logging path

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

dimmX or rankX directory info

Varying

 

Future

Future

EDAC

Expose interface files provided by sysfs through mcX/dimmX or rankX directories

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

csrowX directory info

Varying

 

Future

Future

EDAC

Expose interface files provided by sysfs through mcX/csrowX directories

 

 

Not part of Collectd. Currently available with kernel EDAC logs

Memory RAS

RAS interrupts

Count on each core

[CoreID]:[InterruptCount]

Future

Future

EDAC

Expose the RAS-related interrupts on cores of interest via Collectd

 

 

Discussion is open on whether this information can be exposed through the plugin.
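
The RAS interrupts row above describes a per-core count in the form [CoreID]:[InterruptCount]. As a rough illustration only, and not a description of any existing or planned Collectd plugin, the sketch below parses text in the layout of /proc/interrupts and formats the counts that way. It assumes that the x86 rows labelled MCE (machine check exceptions) and MCP (machine check polls) are the RAS-related interrupts of interest; that choice, and the function names parse_ras_interrupts and format_counts, are hypothetical.

```python
def parse_ras_interrupts(text, labels=("MCE", "MCP")):
    """Return {label: {core_id: count}} for the selected /proc/interrupts rows.

    Assumption: MCE and MCP are the RAS-related rows; adjust `labels` as needed.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    # The first line is the column header: CPU0 CPU1 ...
    cpus = [int(tok[3:]) for tok in lines[0].split()
            if tok.startswith("CPU") and tok[3:].isdigit()]
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or label not in labels:
            continue
        # One count per CPU column, followed by a free-text description.
        fields = rest.split()[:len(cpus)]
        result[label] = {cpu: int(n) for cpu, n in zip(cpus, fields) if n.isdigit()}
    return result


def format_counts(per_core):
    """Render {core_id: count} as comma-separated [CoreID]:[InterruptCount] pairs."""
    return ",".join(f"{core}:{count}" for core, count in sorted(per_core.items()))


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as fh:
            parsed = parse_ras_interrupts(fh.read())
    except OSError:
        parsed = {}  # not a Linux host, or the file is not readable
    for label, per_core in parsed.items():
        print(label, format_counts(per_core))
```

Passing the text in as an argument, rather than opening the file inside the parser, keeps the parsing logic testable on hosts that have no /proc/interrupts.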

Sub-sections:

RAS/mcelog Plugin High Level Design 

Memory RAS Plugin Executed Tests 

RAS Other Executed Tests
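
Several EDAC rows in the metrics list refer to interface files that the kernel exposes through sysfs under mcX directories. As a minimal sketch, not part of Collectd, the snippet below reads the per-controller ce_count and ue_count files. It assumes the conventional sysfs location /sys/devices/system/edac/mc and that an EDAC driver is loaded; the function name read_edac_counts is hypothetical.

```python
from pathlib import Path

# Assumption: conventional EDAC sysfs location on Linux.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {"mcX": {"ce_count": int or None, "ue_count": int or None}}."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux host
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None  # file missing or not a plain integer
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

The same pattern could be extended to the csrowX and dimmX or rankX subdirectories, whose exact file names depend on the kernel version and driver.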