Anuket Project

PMU – Equivalence

Requirement

1.0 | Use the Linux perf interface to collect data about performance events on a per-core basis
2.0 | Use the jevents library (PMU tools)
3.0 | Report hardware cache events, kernel PMU events, software events, and hardware specific events
4.0 | Should have a configurable interval
5.0 | Should have a configurable list of hardware specific events
6.0 | Provide SNMP support for any collectd values through a PMU MIB

Overview

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. The Linux perf interface provides rich, generalized abstractions over hardware specific capabilities.

PMU Tools

PMU tools is a collection of tools for profiling and performance analysis on Intel CPUs, built on top of Linux perf and the performance counters in the CPU. The tools are developed and maintained at https://github.com/andikleen/pmu-tools. In addition to a number of profiling and performance analysis tools, the package provides the jevents library.

jevents library

jevents is a C library that makes it easier for C programs to access the Linux kernel perf interface. It also includes some examples of how to use the library. The library provides the following features:

  • Resolving symbolic event names using downloaded event files
  • Reading performance counters from ring 3 in C programs
  • Handling the perf ring buffer (for example, to read memory addresses)

 

For more information on jevents see https://github.com/andikleen/pmu-tools/tree/master/jevents.

Design

intel_pmu plugin

The intel_pmu plugin collects information provided by the Linux perf interface. Using this interface, the intel_pmu plugin should collect the following metrics:

 

Name | Type | Type Instance | Description

Kernel PMU events
cpu-cycles | counter | cpu-cycles |
instructions | counter | instructions |
cache-references | counter | cache-references |
cache-misses | counter | cache-misses |
branches | counter | branches |
branch-misses | counter | branch-misses |
bus-cycles | counter | bus-cycles |

Hardware cache events
L1-dcache-loads | counter | L1-dcache-loads |
L1-dcache-load-misses | counter | L1-dcache-load-misses |
L1-dcache-stores | counter | L1-dcache-stores |
L1-dcache-store-misses | counter | L1-dcache-store-misses |
L1-dcache-prefetches | counter | L1-dcache-prefetches |
L1-dcache-prefetch-misses | counter | L1-dcache-prefetch-misses |
L1-icache-loads | counter | L1-icache-loads |
L1-icache-load-misses | counter | L1-icache-load-misses |
L1-icache-prefetches | counter | L1-icache-prefetches |
L1-icache-prefetch-misses | counter | L1-icache-prefetch-misses |
LLC-loads | counter | LLC-loads |
LLC-load-misses | counter | LLC-load-misses |
LLC-stores | counter | LLC-stores |
LLC-store-misses | counter | LLC-store-misses |
LLC-prefetches | counter | LLC-prefetches |
LLC-prefetch-misses | counter | LLC-prefetch-misses |
dTLB-loads | counter | dTLB-loads |
dTLB-load-misses | counter | dTLB-load-misses |
dTLB-stores | counter | dTLB-stores |
dTLB-store-misses | counter | dTLB-store-misses |
dTLB-prefetches | counter | dTLB-prefetches |
dTLB-prefetch-misses | counter | dTLB-prefetch-misses |
iTLB-loads | counter | iTLB-loads |
iTLB-load-misses | counter | iTLB-load-misses |
branch-loads | counter | branch-loads |
branch-load-misses | counter | branch-load-misses |

Software events
cpu-clock | counter | cpu-clock |
task-clock | counter | task-clock |
context-switches | counter | context-switches |
cpu-migrations | counter | cpu-migrations |
page-faults | counter | page-faults |
minor-faults | counter | minor-faults |
major-faults | counter | major-faults |
alignment-faults | counter | alignment-faults |
emulation-faults | counter | emulation-faults |
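
To illustrate how a row of the metrics table maps onto a collectd value, the following minimal sketch builds identifiers in the general collectd form host/plugin-plugin_instance/type-type_instance. The host name "node1" and the use of a core-group label as the plugin instance are assumptions made for illustration only; they are not taken from this design.

```python
def collectd_identifier(host, plugin, plugin_instance, type_, type_instance):
    """Build a collectd value identifier: host/plugin[-instance]/type[-instance]."""
    plugin_part = f"{plugin}-{plugin_instance}" if plugin_instance else plugin
    type_part = f"{type_}-{type_instance}" if type_instance else type_
    return f"{host}/{plugin_part}/{type_part}"


# Hypothetical host name ("node1") and core-group label ("0").
# Type and type instance come from the metrics table.
for event in ("cpu-cycles", "L1-dcache-load-misses", "context-switches"):
    print(collectd_identifier("node1", "intel_pmu", "0", "counter", event))
```

The type is always "counter" and the type instance is the event name, so the event name alone distinguishes the metrics within one core group.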

 

 

Plugin configuration

The following configuration options should be supported by the intel_pmu collectd plugin:

Name | Description | Comment
Interval | The interval, in seconds, at which statistics on monitored events are retrieved | The Interval option is supported by collectd itself and is defined in the <LoadPlugin> block. No additional functionality needs to be developed in the intel_pmu plugin to support this option.
ReportHardwareCacheEvents | Enable/disable monitoring of hardware cache events |
ReportKernelPMUEvents | Enable/disable monitoring of kernel PMU events |
ReportSoftwareEvents | Enable/disable monitoring of software events |
EventList | Path to the hardware events list file for the current CPU | The file can be downloaded with the event_download.py script, which is part of the pmu-tools package.
HardwareEvents | String containing a comma-separated list of hardware specific events to monitor |
Cores | Core groups definition. Monitored metrics are reported only for the configured cores. If this option is omitted, all available cores are monitored. If a group is enclosed in square brackets, each core is added individually to a separate group (that is, statistics are not aggregated). Allowed formats: "0,1,2,3", "0-3", "[0-3]" |

 

Here is an example of the plugin configuration section of collectd.conf file:

  <Plugin intel_pmu>
    ReportHardwareCacheEvents true
    ReportKernelPMUEvents true
    ReportSoftwareEvents true
    EventList "/var/cache/pmu/GenuineIntel-6-55-core.json"
    HardwareEvents "L2_RQSTS.CODE_RD_HIT,L2_RQSTS.CODE_RD_MISS" "L2_RQSTS.ALL_CODE_RD"
    Cores ""
  </Plugin>
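
The Cores formats listed among the configuration options can be interpreted as in the following sketch. It is a minimal illustration in Python, not the plugin's own parser. The function names are hypothetical, and treating an empty string like an omitted option (all cores monitored) is an assumption.

```python
def parse_core_list(text):
    """Expand a string such as "0,1,4-6" into a sorted list of core ids."""
    cores = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty core entry in {text!r}")
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range {part!r}")
            cores.update(range(first, last + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


def parse_cores_option(value):
    """Turn one Cores string into a list of core groups.

    "0-3"   gives one aggregated group:  [[0, 1, 2, 3]]
    "[0-3]" gives one group per core:    [[0], [1], [2], [3]]
    """
    value = value.strip()
    if not value:
        # Assumption: an empty string means "no explicit groups",
        # handled by the caller like an omitted Cores option.
        return []
    if value.startswith("[") and value.endswith("]"):
        return [[core] for core in parse_core_list(value[1:-1])]
    return [parse_core_list(value)]


print(parse_cores_option("0,1,2,3"))
print(parse_cores_option("[0-3]"))
```

Each returned group corresponds to one set of cores whose statistics are aggregated together; the bracketed form simply produces single-core groups.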

Implementation details

The intel_pmu plugin does not introduce its own layer of functionality. It reads the configuration provided by the user and prepares the parameters and data structures needed by the jevents API. The following table shows the correspondence between the plugin's API and the jevents API that is used to configure Linux perf monitoring.

plugin API | jevents API | Description
pmu_config | | Parse the event groups to monitor, as provided by the user in collectd.conf
pmu_init | resolve_event | Resolve hardware specific event names to perf events (perf_event_attr)
pmu_init | setup_event | Set up perf events for monitoring
pmu_read | read_all_events | Read the values of all monitored events
pmu_shutdown | |
 

For more details on the plugin API, see the collectd plugin implementation guide: https://collectd.org/wiki/index.php/Plugin_architecture.

Hardware Specific Events

In addition to the standard groups of events supported by Linux perf (hardware cache, kernel PMU, software), the intel_pmu plugin allows monitoring of hardware specific events. To support this, the plugin will use a feature of the jevents library: resolving symbolic event names using downloaded event files. Before hardware specific event names can be used in the configuration file, the user must download the events list file for the current CPU. This can be done with the event_download.py script, which is part of the pmu-tools package.
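
The idea of resolving a symbolic event name against an event list file can be sketched as follows. This is an illustration in Python, not the jevents implementation. The JSON field names (EventName, EventCode, UMask) and the sample values are assumptions modelled on the style of Intel event list files. The helper names are hypothetical, and only the basic x86 raw encoding (event select in bits 0-7, unit mask in bits 8-15) is handled.

```python
import json

# Illustrative excerpt in the style of an event list file.
# Field names and values are assumptions for this sketch.
SAMPLE_EVENT_FILE = """
[
  {"EventName": "L2_RQSTS.CODE_RD_HIT",  "EventCode": "0x24", "UMask": "0xC4"},
  {"EventName": "L2_RQSTS.CODE_RD_MISS", "EventCode": "0x24", "UMask": "0x24"}
]
"""


def load_events(text):
    """Index the events of a list file by upper-cased symbolic name."""
    return {entry["EventName"].upper(): entry for entry in json.loads(text)}


def resolve_raw_config(events, name):
    """Return the raw config value: event select in bits 0-7, unit mask in bits 8-15."""
    entry = events[name.upper()]
    event_code = int(entry["EventCode"], 16)
    umask = int(entry["UMask"], 16)
    return (umask << 8) | event_code


events = load_events(SAMPLE_EVENT_FILE)
print(hex(resolve_raw_config(events, "L2_RQSTS.CODE_RD_HIT")))  # 0xc424
```

An unknown event name raises KeyError here; a real resolver would also need to handle additional modifiers that this sketch ignores.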

SNMP Support

All metrics collected by the intel_pmu plugin should be available through SNMP. This will be achieved by creating a suitable configuration for the snmp_agent collectd plugin. No additional functionality is needed in the intel_pmu plugin to support SNMP. See the description of the SNMP feature for more details on the snmp_agent plugin.

Considerations

Configuration Considerations


Deployment Considerations

Because the PMU plugin is configured per core, an application to be monitored must be pinned (for example, with taskset) to cores isolated for it, until per-process monitoring support is implemented.
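
As a rough illustration of the pinning step, the sketch below restricts a process to a set of cores using the Linux scheduler affinity calls exposed by Python, which is similar in effect to taskset. The function name is hypothetical. The sketch assumes a Linux host and does not cover isolating the cores from other workloads.

```python
import os


def pin_process(pid, cores):
    """Restrict a process to the given cores and return its resulting affinity.

    A pid of 0 refers to the calling process. Linux only.
    """
    cores = set(cores)
    if not cores:
        raise ValueError("at least one core is required")
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)


# Example: re-apply the current affinity of this process (a no-op pin).
current = os.sched_getaffinity(0)
print(sorted(pin_process(0, current)))
```

Once the application is pinned to, for example, cores 2 and 3, configuring the same cores in the plugin's Cores option limits the reported counters to that application's cores.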

API/GUI/CLI Considerations

Equivalence Considerations

The SNMP MIB used for this plugin is a newly defined MIB.

Security Considerations

Alarms, events, statistics considerations

Certain platform generations will not support all the metrics intended to be collected by the plugin. Unsupported metrics will not be reported.

Redundancy Considerations

Performance Considerations

Not part of telemetry, so performance considerations are not applicable.

Testing Consideration

The timing interval requirement needs to be taken into consideration when conducting tests.

Tests should be carried out on a system under load as well as on a relatively idle system.

Other Considerations

Impact

The following table outlines possible impact(s) the deployment of this deliverable may have on the current system.

 

Ref | System Impact Description | Recommendation / Comments
1 | The plugin can easily exceed the default limit of allowed file descriptors. | 1. Reduce the number of monitored events and/or cores. 2. Increase the limit on the number of open file descriptors allowed.
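
The file descriptor impact can be estimated with a short calculation. The sketch below assumes one perf file descriptor per monitored event per core, which is how per-CPU counters are opened through the Linux perf interface. The function names and the reserved margin of 64 descriptors are hypothetical, chosen only for illustration.

```python
import resource


def required_descriptors(num_events, num_cores):
    """One perf file descriptor is needed per monitored event per core."""
    return num_events * num_cores


def fits_in_limit(num_events, num_cores, soft_limit, reserved=64):
    """Check the estimate against a soft limit.

    `reserved` is a hypothetical margin for the other descriptors
    the process keeps open (logs, sockets, and so on).
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return required_descriptors(num_events, num_cores) + reserved <= soft_limit


# The metrics table lists 42 standard events (7 kernel PMU,
# 26 hardware cache, 9 software); 32 cores is an example value.
print(required_descriptors(42, 32))  # 1344

soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(fits_in_limit(42, 32, soft_limit))
```

With all 42 standard events enabled on 32 cores, the estimate already exceeds a soft limit of 1024, which is why both recommendations (fewer events or cores, or a higher limit) apply.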

Key Assumptions

The following assumptions apply to the scope specified in this document.

 

Ref | Assumption | Status
1 | |

Key Exclusions

The following exclusions apply to the scope discussed in this document.

 

Ref | Exclusion | Status
1 | |

Key Dependencies

The following table outlines the key dependencies associated with this deliverable.

 

Ref | Dependency | Status
1 | libjevents |
2 | Net-SNMP |
