Anuket Project
IPMI plugin HLD
SA Legacy – Equivalence
Requirement
Ref | Requirement |
1.0 | Supports IPMI versions 1.5 and 2.0 |
2.0 | IPMI BMC watchdog supported by IPMI events |
3.0 | Supports sensor threshold and discrete event processing |
4.0 | MIB support |
5.0 | In-band monitoring |
Overview
A baseboard management controller (BMC) is a specialized service processor that monitors the physical state of a computer, network server or other hardware device using sensors, and communicates with the system administrator through an independent connection. The BMC is part of the Intelligent Platform Management Interface (IPMI) and is usually located on the motherboard or main circuit board of the device to be monitored. The sensors of a BMC measure internal physical variables such as temperature, humidity, power-supply voltage, fan speeds, communications parameters and operating system (OS) functions.
If any of these variables happens to stray outside specified limits, the administrator is notified. That person can then take corrective action through remote control. The monitored device can be power cycled or rebooted as necessary. In this way, a single administrator can remotely manage numerous servers and other devices simultaneously, saving on the overall operating cost of the network and helping to ensure its reliability.
IPMI defines two basic types of sensors. Threshold sensors monitor “analog” quantities such as temperature, voltage, or fan speed. Discrete sensors monitor events or states, such as entity presence, software initialization progress, or whether external power is applied to the system. Both threshold and discrete sensors may generate events, which are stored in the System Event Log (SEL). Most SEL entries show the record ID, the date and time of the event, the sensor group, the sensor name, and the sensor event that occurred. Some timestamps in the SEL may report a date of 1-Jan-1970. Such a timestamp is not necessarily incorrect: it usually indicates a hardware event that occurred before the firmware clock had been initialized. For example, certain hardware components have their internal clocks reset during a power cycle.
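A tool that displays SEL entries can recognize these early timestamps instead of printing them as 1970 dates. The sketch below assumes the pre-init timestamp range defined in the IPMI specification (values from 0x00000000 to 0x20000000 count seconds since SEL initialization rather than since the Unix epoch); the function and macro names are hypothetical and not part of the ipmi plugin or OpenIPMI.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumed from the IPMI specification: timestamps in this range count
 * seconds since SEL initialization, not since 1-Jan-1970. */
#define SEL_PREINIT_TIMESTAMP_MAX 0x20000000u

/* Returns 1 if the SEL timestamp was taken before the firmware clock was set. */
static int sel_timestamp_is_preinit(uint32_t ts)
{
    return ts <= SEL_PREINIT_TIMESTAMP_MAX;
}

/* Formats an SEL timestamp into buf and returns the length of the result:
 * UTC date and time for normal timestamps, a relative offset for
 * pre-init timestamps. */
static int sel_format_timestamp(uint32_t ts, char *buf, size_t len)
{
    if (sel_timestamp_is_preinit(ts))
        return snprintf(buf, len, "pre-init +%lus", (unsigned long)ts);

    time_t t = (time_t)ts;
    struct tm tm_utc;
    if (gmtime_r(&t, &tm_utc) == NULL)
        return snprintf(buf, len, "invalid");
    return (int)strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm_utc);
}
```

For example, a record stamped 42 is shown as `pre-init +42s` rather than as a date in January 1970.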
Design
ipmi plugin
The ipmi plugin collects information about the sensors provided by the BMC:
Name | Type | Type Instance | Description | Comment |
- | Sensor type | Sensor name | Sensor types and sensor names are not predefined; they are generated individually for each sensor. | Depends on the hardware. |
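As a rough illustration of the table above, each reading can be identified with collectd's usual `host/plugin/type-type_instance` layout, where the sensor type becomes the type and the sensor name becomes the type instance. The helper below is a hypothetical sketch, not code from the plugin; the host and sensor names in the usage example are invented.

```c
#include <stdio.h>

/* Hypothetical helper: builds a collectd-style identifier for one IPMI
 * sensor reading. The type instance (sensor name) is omitted if empty. */
static int ipmi_value_identifier(const char *host, const char *sensor_type,
                                 const char *sensor_name, char *buf, size_t len)
{
    if (sensor_name == NULL || sensor_name[0] == '\0')
        return snprintf(buf, len, "%s/ipmi/%s", host, sensor_type);
    return snprintf(buf, len, "%s/ipmi/%s-%s", host, sensor_type, sensor_name);
}
```

For instance, a temperature sensor named `CPU Temp` on host `node1` would be identified as `node1/ipmi/temperature-CPU Temp`.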
The ipmi plugin generates a notification when an event is received from the BMC, and includes in it as much of the information provided by the BMC as possible:
Event | Description | Comment |
Threshold SEL event received | Notification is sent when an SEL event is received for a threshold sensor. | Sensor name, sensor type, event severity, event time, event message, the entity the sensor belongs to, and the current value are specified. |
Discrete SEL event received | Notification is sent when an SEL event is received for a discrete sensor. | Sensor name, sensor type, event severity, event time, event message, the entity the sensor belongs to, and the current value are specified. |
Sensor added | Notification is sent when a sensor is added for data collection. | Sensor name and sensor type are specified. |
Sensor removed | Notification is sent when a sensor is removed and data collection from it stops. | Sensor name and sensor type are specified. |
Sensor not present | Notification is sent when a sensor is no longer present. | Sensor name and sensor type are specified. |
Sensor present | Notification is sent when a sensor becomes present. | Sensor name and sensor type are specified. |
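The notification payloads in the table above differ mainly in whether they carry SEL event details. The sketch below models that with a hypothetical enum and struct (they are not collectd's `notification_t` or any OpenIPMI type) and formats a message for each kind; the event time field is left out for brevity.

```c
#include <stdio.h>

/* Hypothetical notification kinds, one per row of the table above. */
enum ipmi_notif_kind {
    IPMI_NOTIF_THRESHOLD_EVENT,
    IPMI_NOTIF_DISCRETE_EVENT,
    IPMI_NOTIF_SENSOR_ADDED,
    IPMI_NOTIF_SENSOR_REMOVED,
    IPMI_NOTIF_SENSOR_NOT_PRESENT,
    IPMI_NOTIF_SENSOR_PRESENT
};

/* Hypothetical payload; event time is omitted for brevity. */
struct ipmi_notif {
    enum ipmi_notif_kind kind;
    const char *sensor_name;
    const char *sensor_type;
    const char *severity; /* SEL events only */
    const char *entity;   /* SEL events only: entity the sensor belongs to */
    const char *message;  /* SEL events only: decoded event message */
    double value;         /* SEL events only: current sensor value */
};

/* Returns non-zero for the two SEL event kinds, which carry extra fields. */
static int ipmi_notif_is_sel_event(const struct ipmi_notif *n)
{
    return n->kind == IPMI_NOTIF_THRESHOLD_EVENT ||
           n->kind == IPMI_NOTIF_DISCRETE_EVENT;
}

/* Writes a human-readable message into buf; returns the snprintf result. */
static int ipmi_notif_format(const struct ipmi_notif *n, char *buf, size_t len)
{
    static const char *const what[] = {
        "threshold event", "discrete event", "sensor added",
        "sensor removed", "sensor not present", "sensor present"};

    if (ipmi_notif_is_sel_event(n))
        return snprintf(buf, len, "%s [%s]: sensor %s (%s) of entity %s: %s, value %g",
                        what[n->kind], n->severity, n->sensor_name,
                        n->sensor_type, n->entity, n->message, n->value);
    return snprintf(buf, len, "%s: sensor %s (%s)", what[n->kind],
                    n->sensor_name, n->sensor_type);
}
```

Only the two SEL event kinds read the severity, entity, message and value fields, matching the "Comment" column above.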
Plugin configuration
The following configuration options should be supported by the collectd ipmi plugin:
Name | Description | Comment |
Interval | Interval, in seconds, at which information about sensors is retrieved. | The Interval option is handled by collectd itself and is set in the <LoadPlugin> block. No additional functionality needs to be developed in the ipmi plugin to support this option. |
Sensor | Selects the sensors from which information is collected. | Interpretation depends on IgnoreSelected. |
IgnoreSelected | If true, all sensors except those listed in Sensor are selected; otherwise, only the sensors listed in Sensor are selected. | If no configuration is given, the ipmi plugin collects data from all sensors found. |
NotifySensorAdd | Enables or disables sending a notification when a sensor appears. | If a sensor appears after the initial one-minute initialization period, a notification is sent. |
NotifySensorRemove | Enables or disables sending a notification when a sensor disappears. | If a sensor disappears, a notification is sent. |
NotifySensorNotPresent | Enables or disables sending a notification when a sensor is unplugged or plugged in. | For example, on a system with dual power supplies, a notification is sent when one of them is unplugged or plugged in. |
SELEnabled | Enables or disables subscription to SEL events. | If System Event Log (SEL) support is enabled, the plugin listens for sensor threshold and discrete events and sends a notification when an event is received. |
SELClearEvent | Enables or disables clearing an event from the SEL after it has been handled successfully. | If enabled, the plugin deletes the event from the SEL after receiving and successfully handling it. In this case, other tools subscribed to SEL events will receive an empty event. |
Here is an example of the plugin configuration section of the collectd.conf file:
<Plugin ipmi>
Sensor "some_sensor"
Sensor "another_one"
IgnoreSelected false
NotifySensorAdd false
NotifySensorRemove true
NotifySensorNotPresent false
SELEnabled false
SELClearEvent false
</Plugin>
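The Sensor/IgnoreSelected selection rule and the Notify* switches can be sketched as two small predicates. This is a minimal sketch under stated assumptions: the struct, enum and function names are hypothetical, and the 60-second constant simply encodes the one-minute initialization period mentioned for NotifySensorAdd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of the <Plugin ipmi> options. */
struct ipmi_config {
    const char *const *sensors; /* values of the Sensor options */
    size_t sensor_count;
    bool ignore_selected;       /* IgnoreSelected */
    bool notify_add;            /* NotifySensorAdd */
    bool notify_remove;         /* NotifySensorRemove */
    bool notify_not_present;    /* NotifySensorNotPresent */
};

enum sensor_change { SENSOR_APPEARED, SENSOR_DISAPPEARED, SENSOR_PRESENCE_CHANGED };

/* Assumed startup period during which new sensors are added silently. */
#define IPMI_INIT_PERIOD_SECONDS 60.0

/* Returns true if data should be collected from the named sensor. */
static bool ipmi_sensor_selected(const struct ipmi_config *cfg, const char *name)
{
    if (cfg->sensor_count == 0)
        return true; /* no Sensor options: collect from every sensor found */

    bool listed = false;
    for (size_t i = 0; i < cfg->sensor_count; i++) {
        if (strcmp(cfg->sensors[i], name) == 0) {
            listed = true;
            break;
        }
    }
    return cfg->ignore_selected ? !listed : listed; /* IgnoreSelected inverts the list */
}

/* Returns true if a notification should be sent for this sensor change. */
static bool ipmi_should_notify(const struct ipmi_config *cfg,
                               enum sensor_change change, double seconds_since_init)
{
    switch (change) {
    case SENSOR_APPEARED:
        return cfg->notify_add && seconds_since_init > IPMI_INIT_PERIOD_SECONDS;
    case SENSOR_DISAPPEARED:
        return cfg->notify_remove;
    case SENSOR_PRESENCE_CHANGED:
        return cfg->notify_not_present;
    }
    return false;
}
```

With the example configuration (two Sensor entries, IgnoreSelected false, only NotifySensorRemove enabled), only `some_sensor` and `another_one` are collected, and only sensor removals produce notifications.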
Implementation details
To enable support for IPMI features, the OpenIPMI library will be used. OpenIPMI is a callback-based library and provides an API for registering handlers. The following diagram describes the process of collecting information about analog sensors.
Collectd IPMI plugin state diagram.
Collectd IPMI plugin callback API registration workflow
The diagram above describes the initialization and handler-registration workflow used by the ipmi plugin. After the OpenIPMI library is initialized, the first step is to set up the SMI (system management interface) connection, which creates and manages the domain. Once the domain has been created successfully, the domain update handler is called and the ipmi plugin registers entity update handlers for sensors. This causes the OpenIPMI driver to scan the sensors on the system, determine which entity each sensor belongs to, and invoke the sensor update handler, which updates the ipmi plugin's list of known sensors in the specified domain. Each sensor has a type and is either a threshold or a discrete sensor; the ipmi plugin filters sensors and works only with the predefined sensor types described earlier.
If the System Event Log (SEL) feature is enabled, a threshold or discrete event handler is registered; when an event is received from the BMC, the event handler is called and a notification is sent. By default, OpenIPMI scans the SEL periodically, at a configurable interval. To receive events asynchronously, an event handler is registered for the SMI connection, and when an event arrives, ipmi_domain_reread_sels() is called to force the domain to reread the SELs.
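The order of callbacks described above can be simulated without OpenIPMI. In the sketch below, every type and function name is hypothetical and none of them is part of the OpenIPMI API; the `sel_rereads` counter only stands in for calls to ipmi_domain_reread_sels(), and filtering by sensor type is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for OpenIPMI objects and callbacks; these are NOT
 * OpenIPMI types or functions, they only mirror the order of the steps. */
typedef void (*sensor_update_fn)(void *ctx, const char *sensor, bool threshold);
typedef void (*sel_event_fn)(void *ctx, const char *sensor);

struct fake_domain {
    sensor_update_fn on_sensor; /* registered via the entity update handler */
    sel_event_fn on_event;      /* registered only when SELEnabled is true */
    void *ctx;
    int sel_rereads;            /* stands in for ipmi_domain_reread_sels() calls */
};

struct plugin_state {
    bool sel_enabled;
    size_t threshold_sensors;
    size_t discrete_sensors;
    int notifications;
};

/* Sensor update handler: records each sensor reported by the domain scan. */
static void plugin_sensor_update(void *ctx, const char *sensor, bool threshold)
{
    struct plugin_state *st = ctx;
    (void)sensor;
    if (threshold)
        st->threshold_sensors++;
    else
        st->discrete_sensors++;
}

/* Threshold/discrete event handler: one notification per SEL event. */
static void plugin_sel_event(void *ctx, const char *sensor)
{
    struct plugin_state *st = ctx;
    (void)sensor;
    st->notifications++;
}

/* Domain update handler: called once the SMI connection has created the
 * domain; registers the sensor handler and, optionally, the event handler. */
static void plugin_domain_up(struct fake_domain *d, struct plugin_state *st)
{
    d->ctx = st;
    d->on_sensor = plugin_sensor_update;
    d->on_event = st->sel_enabled ? plugin_sel_event : NULL;
}

/* Driver scan: reports every discovered sensor to the registered handler. */
static void domain_scan(struct fake_domain *d, const char *const *names,
                        const bool *is_threshold, size_t n)
{
    for (size_t i = 0; i < n && d->on_sensor != NULL; i++)
        d->on_sensor(d->ctx, names[i], is_threshold[i]);
}

/* Asynchronous SMI event: force a SEL reread instead of waiting for the
 * periodic scan, then deliver the event to the plugin's handler. */
static void smi_event_arrived(struct fake_domain *d, const char *sensor)
{
    if (d->on_event == NULL)
        return; /* SEL support disabled: nothing subscribed */
    d->sel_rereads++;
    d->on_event(d->ctx, sensor);
}
```

The point of the simulation is the dependency chain: sensors are only reported after the domain handler has registered the sensor callback, and SEL events only produce notifications when SELEnabled caused the event handler to be registered.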
Extended list of supported sensors
The current collectd IPMI plugin implementation supports the following types of IPMI sensors:
IPMI Sensor type | collectd value type |
Voltage | voltage |
Temperature | temperature |
Fan | fanspeed |
Current | current |
The plugin supports only analog sensors and selects them (that is, builds the list of monitored sensors) based on their sensor type, which is not a reliable criterion: an IPMI sensor may be of a supported type but provide a discrete value instead of an analog one. The plugin should therefore select sensors based on the kind of value they provide. For this reason, the selection logic has been extended, while preserving backward compatibility, to support the following sensors based on their value type:
IPMI Sensor value type | collectd value type |
CFM (Cubic Feet per Minute) | flow |
Watts | power |
Percentage | percent |
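The two mapping tables above can be combined into one selection function: discard sensors without an analog reading, try the original sensor-type mapping first for backward compatibility, then fall back to the value-type mapping. This is a sketch only; the string keys and the `sensor_info` struct are hypothetical simplifications, whereas the real plugin works with values obtained from OpenIPMI.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified sensor description. */
struct sensor_info {
    const char *ipmi_type; /* IPMI sensor type, e.g. "Temperature" */
    const char *unit;      /* value type of the reading, e.g. "Watts" */
    int is_threshold;      /* non-zero if the sensor provides an analog value */
};

struct type_map { const char *key; const char *type; };

/* Returns the collectd type for a sensor, or NULL if it is not collected. */
static const char *ipmi_collectd_type(const struct sensor_info *s)
{
    static const struct type_map by_sensor_type[] = {
        {"Voltage", "voltage"}, {"Temperature", "temperature"},
        {"Fan", "fanspeed"}, {"Current", "current"}};
    static const struct type_map by_value_type[] = {
        {"CFM", "flow"}, {"Watts", "power"}, {"Percentage", "percent"}};

    if (!s->is_threshold)
        return NULL; /* discrete readings are not collected as metrics */

    /* Original behaviour: select by IPMI sensor type. */
    for (size_t i = 0; i < sizeof by_sensor_type / sizeof by_sensor_type[0]; i++)
        if (strcmp(s->ipmi_type, by_sensor_type[i].key) == 0)
            return by_sensor_type[i].type;

    /* Extension: select by the type of value the sensor provides. */
    for (size_t i = 0; i < sizeof by_value_type / sizeof by_value_type[0]; i++)
        if (strcmp(s->unit, by_value_type[i].key) == 0)
            return by_value_type[i].type;

    return NULL;
}
```

Checking the analog-value condition first reflects the concern raised above: a sensor of a supported type that only reports a discrete value is skipped.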
Considerations
Configuration Considerations
Deployment Considerations
If the platform does not provide a BMC, the plugin is unloaded at initialization time.
API/GUI/CLI Considerations
Equivalence Considerations
Security Considerations
Alarms, events, statistics considerations
Not all metrics are reported, because the ipmi plugin does not support all sensor types.
The ipmi plugin listens for all types of sensor events received from the System Event Log (SEL).
Redundancy Considerations
Performance Considerations
Testing Consideration
The timing interval requirement needs to be taken into consideration when conducting tests.
Tests should be carried out both on a system under load and on a relatively idle system.
Other Considerations
Impact
The following table outlines possible impact(s) the deployment of this deliverable may have on the current system.
Ref | System Impact Description | Recommendation / Comments |
1 | | |
Key Assumptions
The following assumptions apply to the scope specified in this document.
Ref | Assumption | Status |
1 | | |
Key Exclusions
The following exclusions apply to the scope discussed in this document.
Ref | Exclusion | Status |
1 | | |
Key Dependencies
The following table outlines the key dependencies associated with this deliverable.
Ref | Dependency | Status |
1 | OpenIPMI | |
Issues List
Ref | Issue | Status |
1 | | The sample file format is proposed in section 1.1.1.1 |
2 | https://sourceforge.net/p/openipmi/bugs/86/ | The OpenIPMI library supports only up to 1024 open file descriptors; if it is assigned a file descriptor with a higher number, a buffer overflow occurs. On systems that allow more than 1024 open file descriptors (see ulimit -n), it is advisable to load the ipmi plugin first, before other plugins use up the descriptors below this limit. |
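A deployment check for this issue can read the process soft limit with the POSIX getrlimit(RLIMIT_NOFILE) call, which reports the same value as `ulimit -n`. The sketch below assumes that the 1024 limit in the issue means descriptor numbers 0 to 1023; the `OPENIPMI_MAX_FD` macro and the helper names are hypothetical and not defined by OpenIPMI or collectd.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical constant taken from the issue text: descriptors numbered
 * 1024 or higher are assumed to be unsafe for OpenIPMI. */
#define OPENIPMI_MAX_FD 1024

/* Returns true if the soft limit allows descriptors OpenIPMI cannot use. */
static bool fd_limit_exceeds_openipmi(rlim_t soft_limit)
{
    return soft_limit == RLIM_INFINITY || soft_limit > OPENIPMI_MAX_FD;
}

/* Queries the current soft RLIMIT_NOFILE. Returns -1 on error, 1 if the
 * limit exceeds what OpenIPMI supports, 0 otherwise. */
static int check_process_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    return fd_limit_exceeds_openipmi(rl.rlim_cur) ? 1 : 0;
}

/* Returns true if a descriptor number is within the range OpenIPMI supports. */
static bool fd_usable_by_openipmi(int fd)
{
    return fd >= 0 && fd < OPENIPMI_MAX_FD;
}
```

A non-zero result from check_process_fd_limit() only signals that the load-order advice applies; it does not by itself prevent the overflow.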