Anuket Project

IPMI plugin HLD

SA Legacy – Equivalence

Requirement

1.0 | Supports IPMI versions 1.5 and 2.0
2.0 | IPMI BMC watchdog supported by IPMI events
3.0 | Support sensor threshold and discrete event processing
4.0 | MIB support
5.0 | In-band monitoring

 

 

Overview

A baseboard management controller (BMC) is a specialized service processor that uses sensors to monitor the physical state of a computer, network server or other hardware device, and communicates with the system administrator through an independent connection. The BMC is part of the Intelligent Platform Management Interface (IPMI) architecture and is usually located on the motherboard or main circuit board of the device to be monitored. The sensors of a BMC measure internal physical variables such as temperature, humidity, power-supply voltage, fan speeds, communications parameters and operating system (OS) functions.

If any of these variables strays outside specified limits, the administrator is notified and can take corrective action remotely, for example by power cycling or rebooting the monitored device. In this way, a single administrator can remotely manage numerous servers and other devices simultaneously, which reduces the overall operating cost of the network and helps to ensure its reliability.

IPMI defines two basic types of sensors. Threshold sensors monitor “analog” quantities such as temperature, voltage, or fan speed. Discrete sensors monitor events or states, such as entity presence, software initialization progress, or whether external power is applied to the system. Both threshold and discrete sensors may generate events. Sensor events are stored in the system event log (SEL). Most entries display the SEL record ID, date of event, time of event, sensor group, sensor name, and the sensor event occurrence.

Some timestamps in the SEL may report a date of 1-Jan-1970. Such a timestamp is not necessarily incorrect. It usually indicates a hardware event that occurred before the firmware timestamp was initialized. For example, certain hardware components have their internal clocks reset during a power cycle.
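The 1-Jan-1970 case can be illustrated with a short sketch. It assumes the IPMI 2.0 timestamp convention in which raw values up to 0x20000000 count seconds since the SEL device was initialized and 0xFFFFFFFF marks an unspecified time. The helper name describe_sel_timestamp is hypothetical and is not part of the plugin.

```python
from datetime import datetime, timezone

# Assumption: IPMI 2.0 timestamp convention. Values up to PRE_INIT_MAX are
# seconds since SEL device initialization, not real calendar time.
PRE_INIT_MAX = 0x20000000
UNSPECIFIED = 0xFFFFFFFF


def describe_sel_timestamp(raw: int) -> str:
    """Render a raw 32-bit SEL timestamp for display (hypothetical helper)."""
    if raw == UNSPECIFIED:
        return "unspecified"
    # Interpreted naively as Unix time, small values land on 1 January 1970.
    stamp = datetime.fromtimestamp(raw, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    if raw <= PRE_INIT_MAX:
        return f"{stamp} (pre-init: {raw} s after SEL device start)"
    return stamp


if __name__ == "__main__":
    print(describe_sel_timestamp(90))
    print(describe_sel_timestamp(1700000000))
```

An event logged 90 seconds after the SEL device started is therefore shown with a 1970 date even though the record itself is valid.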

 

Design

ipmi plugin

The ipmi plugin collects information about the sensors provided by the BMC:

 

 

Name | Type | Type Instance | Description | Comment
- | Sensor type | Sensor name | Sensor types and sensor names are not predefined; they are generated for each sensor individually. | Depends on hardware.

 

The ipmi plugin generates a notification when an event is received from the BMC, and the notification carries as much of the provided information as possible:

 

Event | Description | Comment
Threshold SEL event received | Notification is sent when a SEL event is received for a threshold sensor. | Sensor name, sensor type, event severity, event time, event message, the entity the sensor belongs to and the current value are specified.
Discrete SEL event received | Notification is sent when a SEL event is received for a discrete sensor. | Sensor name, sensor type, event severity, event time, event message, the entity the sensor belongs to and the current value are specified.
Sensor added | Notification is sent when a sensor is added for collecting information. | Sensor name and sensor type are specified.
Sensor removed | Notification is sent when a sensor is removed and collection of its information stops. | Sensor name and sensor type are specified.
Sensor not present | Notification is sent when a sensor becomes not present. | Sensor name and sensor type are specified.
Sensor present | Notification is sent when a sensor becomes present. | Sensor name and sensor type are specified.
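As an illustration of how the fields listed for a threshold SEL event could be carried in a notification, here is a minimal sketch. The Notification class is a hypothetical stand-in loosely modeled on a collectd notification, and the keys of the event dictionary are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    """Hypothetical stand-in for a collectd notification."""
    severity: str
    time: float
    message: str
    plugin: str
    type: str
    type_instance: str
    meta: dict = field(default_factory=dict)


def threshold_event_to_notification(event: dict) -> Notification:
    """Map an already decoded SEL event (assumed dict layout) to a notification."""
    return Notification(
        severity=event["severity"],
        time=event["time"],
        message=event["message"],
        plugin="ipmi",
        type=event["sensor_type"],           # Type = sensor type
        type_instance=event["sensor_name"],  # Type Instance = sensor name
        # Entity and current value travel as additional metadata.
        meta={"entity": event["entity"], "value": event["value"]},
    )


if __name__ == "__main__":
    print(threshold_event_to_notification({
        "severity": "WARNING", "time": 1700000000.0,
        "message": "upper non-critical going high",
        "sensor_type": "temperature", "sensor_name": "CPU1 Temp",
        "entity": "processor", "value": 85.0,
    }))
```

Placing the sensor type and name in type and type_instance mirrors the way the plugin identifies its sensor values in the first table of this section.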

 

Plugin configuration

The following configuration options should be supported by the collectd ipmi plugin:

Name | Description | Comment
Interval | The interval, in seconds, at which information about sensors is retrieved. | The Interval option is supported by collectd itself and is defined in the <LoadPlugin> block. No additional functionality needs to be developed in the ipmi plugin to support this option.
Sensor | Selects the sensors to collect information from. | Depends on IgnoreSelected.
IgnoreSelected | If TRUE, all sensors except those listed in Sensor are selected. Otherwise, only the sensors listed in Sensor are selected. | If no configuration is given, the ipmi plugin collects data from all sensors found.
NotifySensorAdd | Enables or disables the notification sent when a sensor appears. | A notification is sent if a sensor appears after the initialization time of one minute.
NotifySensorRemove | Enables or disables the notification sent when a sensor disappears. | A notification is sent if a sensor disappears.
NotifySensorNotPresent | Enables or disables the notification sent when a sensor has been plugged or unplugged. | For example, if the system has dual power supplies and one of them is plugged or unplugged, a notification is sent.
SELEnabled | Enables or disables subscription to SEL events. | If the system event log (SEL) is enabled, the plugin listens for sensor threshold and discrete events. When an event is received, a notification is sent.
SELClearEvent | Enables or disables clearing an event after it has been handled successfully. | If enabled, the plugin deletes the event from the SEL after it has been received and successfully handled. In this case, other tools that are subscribed to SEL events will receive an empty event.
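The interaction between Sensor and IgnoreSelected can be sketched as follows. This is a simplified model of the selection rule described in the table, not the plugin's actual code; the function name is_sensor_selected is hypothetical.

```python
from typing import Iterable


def is_sensor_selected(name: str, sensors: Iterable[str],
                       ignore_selected: bool = False) -> bool:
    """Return True if the sensor should be monitored (simplified model)."""
    listed_names = set(sensors)
    if not listed_names:
        # No Sensor options given: collect data from all sensors found.
        return True
    listed = name in listed_names
    # IgnoreSelected inverts the meaning of the Sensor list.
    return not listed if ignore_selected else listed


if __name__ == "__main__":
    configured = ["some_sensor", "another_one"]
    for sensor in ("some_sensor", "third_sensor"):
        print(sensor, is_sensor_selected(sensor, configured, ignore_selected=False))
```

With IgnoreSelected set to false, only some_sensor and another_one are monitored; setting it to true monitors every sensor except those two.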

 

Here is an example of the plugin configuration section of the collectd.conf file:

  <Plugin ipmi>
    Sensor "some_sensor"
    Sensor "another_one"
    IgnoreSelected false
    NotifySensorAdd false
    NotifySensorRemove true
    NotifySensorNotPresent false
    SELEnabled false
    SELClearEvent false
  </Plugin>

Implementation details

To support IPMI features, the OpenIPMI library is used. OpenIPMI is a callback-based library and provides an API to register handlers. The following diagram describes the process of collecting information about the analog sensors.

Collectd IPMI plugin state diagram.

 

Collectd IPMI plugin callback API registration workflow

The diagram above describes the initialization and handler registration workflow used by the ipmi plugin. After the OpenIPMI library is initialized, the first step is to set up the SMI (system management interface) connection, which is used to create and manage the domain. Once the domain has been created successfully, the domain update handler is called and the ipmi plugin registers entity update handlers. This causes the OpenIPMI driver to scan the sensors on the system, determine which entity each sensor belongs to, and invoke the sensor update handler, which updates the plugin's list of known sensors in the specified domain. Each sensor has a type and is either threshold or discrete. The ipmi plugin filters the sensors and works only with the predefined sensor types.

If the system event log (SEL) feature is enabled, threshold and discrete event handlers are registered. When an event is received from the BMC, the event handler is called and a notification is sent. By default, OpenIPMI rescans the SEL periodically at a configurable interval. To receive events asynchronously, an event handler is also registered for the SMI; when an event arrives, ipmi_domain_reread_sels() is called to force the domain to reread the SELs.
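The callback chain can be pictured with a toy model. It does not use the OpenIPMI API: FakeDomain and IpmiPluginModel are hypothetical classes that only imitate the order in which handlers fire (domain up, entity update, sensor update, SEL event), and the reread_sels method merely stands in for the effect of ipmi_domain_reread_sels().

```python
from typing import Callable, Dict, List


class FakeDomain:
    """Toy stand-in for an OpenIPMI domain; not the real API."""

    def __init__(self, entities: Dict[str, List[str]]):
        self.entities = entities  # entity name -> names of its sensors
        self.event_handlers: List[Callable[[str], None]] = []
        self.sel: List[str] = []  # SEL records, oldest first
        self._seen = 0            # number of records already delivered

    def scan(self, on_entity: Callable[[str, List[str]], None]) -> None:
        """Report every entity and its sensors to the entity handler."""
        for entity, sensors in self.entities.items():
            on_entity(entity, sensors)

    def reread_sels(self) -> None:
        """Deliver SEL records added since the previous read."""
        new_records = self.sel[self._seen:]
        self._seen = len(self.sel)
        for record in new_records:
            for handler in self.event_handlers:
                handler(record)


class IpmiPluginModel:
    """Models the handler registration order described in the text."""

    def __init__(self, sel_enabled: bool):
        self.sel_enabled = sel_enabled
        self.sensors: List[str] = []
        self.notifications: List[str] = []

    def on_domain_up(self, domain: FakeDomain) -> None:
        # Domain created: register the entity handler, which triggers a scan.
        domain.scan(self.on_entity_update)
        if self.sel_enabled:
            domain.event_handlers.append(self.on_event)

    def on_entity_update(self, entity: str, sensors: List[str]) -> None:
        for sensor in sensors:
            self.on_sensor_update(entity, sensor)

    def on_sensor_update(self, entity: str, sensor: str) -> None:
        self.sensors.append(f"{entity}/{sensor}")

    def on_event(self, record: str) -> None:
        self.notifications.append(f"SEL event: {record}")

    def on_smi_event(self, domain: FakeDomain) -> None:
        # Asynchronous path: force a reread instead of waiting for the rescan.
        domain.reread_sels()


if __name__ == "__main__":
    domain = FakeDomain({"cpu0": ["temp"], "psu1": ["voltage", "fan"]})
    plugin = IpmiPluginModel(sel_enabled=True)
    plugin.on_domain_up(domain)
    domain.sel.append("temp upper critical")
    plugin.on_smi_event(domain)
    print(plugin.sensors, plugin.notifications)
```

The model keeps the SEL intact after a reread; deleting handled records would correspond to the SELClearEvent option.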

 

Extended list of supported sensors

 

The current collectd IPMI plugin implementation supports the following types of IPMI sensors:

IPMI sensor type | collectd sensor value type
Voltage | voltage
Temperature | temperature
Fan | fanspeed
Current | current

 

The plugin supports only analog sensors and selects them (that is, builds the list of monitored sensors) based on their sensor type. This is not fully correct: an IPMI sensor may be of a supported type yet provide a discrete value instead of an analog one. The plugin should therefore select sensors based on the kind of value a sensor provides. For this reason, the selection logic has been extended, preserving backward compatibility, to also support the following sensors based on their value type:

IPMI sensor value type | collectd sensor value type
CFM (Cubic Feet per Minute) | flow
Watts | power
Percentage | percent
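The extended selection logic can be sketched as a two-step lookup: the sensor type is tried first, which keeps the existing behaviour, and the value type is used as a fallback. The function name collectd_type_for and the is_analog flag are assumptions made for illustration; the two mappings are copied from the tables above.

```python
from typing import Optional

# Mapping by IPMI sensor type (original behaviour).
SENSOR_TYPE_TO_COLLECTD = {
    "Voltage": "voltage",
    "Temperature": "temperature",
    "Fan": "fanspeed",
    "Current": "current",
}

# Mapping by IPMI sensor value type (extended behaviour).
VALUE_TYPE_TO_COLLECTD = {
    "CFM": "flow",
    "Watts": "power",
    "Percentage": "percent",
}


def collectd_type_for(sensor_type: str, value_type: str,
                      is_analog: bool) -> Optional[str]:
    """Return the collectd value type, or None if the sensor is not monitored."""
    if not is_analog:
        # A sensor of a supported type that reports discrete values is skipped.
        return None
    if sensor_type in SENSOR_TYPE_TO_COLLECTD:
        return SENSOR_TYPE_TO_COLLECTD[sensor_type]
    return VALUE_TYPE_TO_COLLECTD.get(value_type)


if __name__ == "__main__":
    print(collectd_type_for("Temperature", "degrees C", True))
    print(collectd_type_for("Other", "Watts", True))
    print(collectd_type_for("Fan", "RPM", False))
```

Checking the sensor type before the value type means that sensors selected by the original logic continue to map to the same collectd type.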

 

Considerations

Configuration Considerations

Deployment Considerations

If the platform does not have a BMC, the plugin is unloaded at initialization time.

 

API/GUI/CLI Considerations

Equivalence Considerations

Security Considerations

Alarms, events, statistics considerations

Not all metrics are reported, because the ipmi plugin does not support all types of sensors.

The ipmi plugin registers to listen for all types of sensor events received from the system event log (SEL).

Redundancy Considerations

Performance Considerations

Testing Considerations

The timing interval requirement needs to be taken into consideration when conducting tests.

The tests should be carried out on a system under load as well as on a relatively idle system.

 

Other Considerations

Impact

The following table outlines possible impact(s) the deployment of this deliverable may have on the current system.

 

Ref | System Impact Description | Recommendation / Comments
1 | |

 

 

Key Assumptions

The following assumptions apply to the scope specified in this document.

 

Ref | Assumption | Status
1 | |

 

 

Key Exclusions

The following exclusions apply to the scope discussed in this document.

 

Ref | Exclusion | Status
1 | |

 

 

Key Dependencies

The following table outlines the key dependencies associated with this deliverable.

 

Ref | Dependency | Status
1 | OpenIPMI |

 

Issues List

Ref | Issue | Status
1 | |

 

The sample file format is proposed in section 1.1.1.1

The OpenIPMI library supports only up to 1024 open file descriptors (see https://sourceforge.net/p/openipmi/bugs/86/). If it is assigned a larger file descriptor, a buffer overflow occurs. On systems that allow more than 1024 file descriptors (ulimit -n), it is advised to load the ipmi plugin first, before other plugins use up the descriptors below this limit.
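A deployment script could warn about this condition before collectd starts. The following is a minimal sketch that assumes a Unix-like system where Python's resource module is available; the helper names are hypothetical.

```python
OPENIPMI_FD_LIMIT = 1024  # limit stated in the text


def exceeds_openipmi_fd_limit(soft_limit: int) -> bool:
    """True if descriptors above the stated OpenIPMI limit can be handed out."""
    # A negative value is how some platforms report "unlimited".
    return soft_limit < 0 or soft_limit > OPENIPMI_FD_LIMIT


def current_soft_fd_limit() -> int:
    """Read the soft limit that `ulimit -n` reports (Unix only)."""
    import resource
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    if exceeds_openipmi_fd_limit(current_soft_fd_limit()):
        print("Load the ipmi plugin before other plugins in collectd.conf.")
```

The check only reports the risk; the plugin load order still has to be arranged in collectd.conf.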