Distinction between metrics and events

For the purposes of Platform Service Assurance, it's important to distinguish between metrics and events as well as how they are measured (from a timing perspective).

...

Statistics in collectd consist of a value list. A value list includes:

Value list	Description	Example	Comment
Values		99.8999	percentage
Value length	the number of values in the data set.
Time	timestamp at which the value was collected.	1475837857	epoch
Interval	interval at which to expect a new value.	10	interval
Host	used to identify the host.	localhost	can be uuid for vm or host… or can give host a name
Plugin	used to identify the plugin.	cpu
Plugin instance (optional)	used to group a set of values together, for example values belonging to a DPDK interface.	0
Type	unit used to measure a value. In other words used to refer to a data set.	percent
Type instance (optional)	used to distinguish between values that have an identical type.	user
meta data	an opaque data structure that enables the passing of additional information about a value list. “Meta data in the global cache can be used to store arbitrary information about an identifier”
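
As a rough illustration of the fields above, the sketch below models a value list as a Python dataclass and builds an identifier string of the form host/plugin-plugin_instance/type-type_instance, which is how collectd commonly names a value. The class name, attribute names and method name are hypothetical and are not part of collectd itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ValueList:
    """Hypothetical model of a collectd value list (illustration only)."""
    values: List[float]
    time: int        # epoch timestamp at which the values were collected
    interval: int    # seconds until a new value is expected
    host: str
    plugin: str
    type: str
    plugin_instance: Optional[str] = None  # optional grouping of values
    type_instance: Optional[str] = None    # optional, separates identical types
    meta: dict = field(default_factory=dict)  # opaque additional information

    def identifier(self) -> str:
        """Build host/plugin[-plugin_instance]/type[-type_instance]."""
        plugin = self.plugin
        if self.plugin_instance:
            plugin += "-" + self.plugin_instance
        type_ = self.type
        if self.type_instance:
            type_ += "-" + self.type_instance
        return f"{self.host}/{plugin}/{type_}"


# Example built from the values shown in the table above.
example = ValueList(values=[99.8999], time=1475837857, interval=10,
                    host="localhost", plugin="cpu", type="percent",
                    plugin_instance="0", type_instance="user")
print(example.identifier())
```

The optional instances are simply omitted from the identifier when they are not set.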

Notifications

Notifications in collectd are generic messages containing:

...

This field counts the number of added filters to the flow director filters logic. This field counts the number of matched filters to the flow director filters logic.

mcelog (RAS memory)

A read plugin that uses mcelog to check for memory Machine Check Exceptions and sends the stats for reported exceptions.

errors

IPMI

A read plugin that reports platform thermals, voltages, fan speed, current, flow, power etc.

The metrics reported are specific to each BMC, so they will change depending on what the BMC supports. This is an example for the S2600WT2R platform.
IPMI defines many types of sensors, but groups them into two main categories: threshold and discrete.
Threshold sensors are “analog”: they have continuous (or mostly continuous) readings, such as fan speed, voltage, or temperature.
Discrete sensors have a set of binary readings that may each be independently zero or one. In some sensors, these may be independent. For instance, a power supply may have both an external power failure and a predictive failure at the same time. In other cases they may be mutually exclusive. For instance, each bit may represent the initialization state of a piece of software.
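
To make the discrete case concrete, the following minimal sketch decodes a discrete reading, treated as a bitmask, into the list of states that are currently asserted. The function name and the state names are hypothetical and chosen only for illustration; the real bit assignments depend on the sensor type and the BMC.

```python
from typing import List, Sequence


def decode_discrete_reading(reading: int, state_names: Sequence[str]) -> List[str]:
    """Return the names of all states whose bit is set in `reading`.

    Bit 0 corresponds to state_names[0], bit 1 to state_names[1], and so on.
    """
    return [name for bit, name in enumerate(state_names)
            if reading & (1 << bit)]


# Hypothetical power supply states; independent bits may be set together.
POWER_SUPPLY_STATES = ["presence_detected", "failure_detected", "predictive_failure"]

print(decode_discrete_reading(0b101, POWER_SUPPLY_STATES))
```

Because each bit is tested on its own, several states can be reported at once, which matches the power supply example above.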

"Memory Thermal Throttling" relates to the memory thermal management system. Based on the DIMM thermal conditions, it may restrict read and write traffic/bandwidth to main memory as a means of controlling power consumption. This metric is measured as a percentage; 0% means no memory throttling occurs. When thermal conditions rise, the memory management system enables throttling and restricts the read or write traffic (e.g. 50%).

The IPMI plugin supports analog sensors of type voltage, temperature, fan and current, as well as analog sensors that have VALUE type WATTS, CFM and percentage (%).
http://openipmi.sourceforge.net/IPMI.pdf
https://www.intel.com/content/dam/support/us/en/documents/motherboards/server/s5400sf/sb/s5400sf_tps_r2_02.pdf

intel_pmu

A read plugin that collects performance monitoring events supported by Intel Performance Monitoring Units (PMUs). The PMU is hardware built inside a processor to measure its performance parameters such as instruction cycles, cache hits, cache misses, branch misses and many others. Performance monitoring events provide facilities to characterize the interaction between programmed sequences of instructions and microarchitectural sub-systems.

The types of events are:
Hardware Events: These instrument low-level processor activity based on CPU performance counters. For example, CPU cycles, instructions retired, memory stall cycles, level 2 cache misses, etc. Some will be listed as Hardware Cache Events.
Software Events: These are low level events based on kernel counters. For example, CPU migrations, minor faults, major faults, etc.
http://www.brendangregg.com/perf.html#Events
Example event names: context-switches OR cs [Software event], L1-icache-prefetch-misses, LLC-loads, LLC-load-misses, LLC-stores, LLC-store-misses, LLC-prefetch-misses, dTLB-loads, dTLB-prefetch-misses

Where collectd is running	Plugin	Plugin Instance	Type	Type Instance	Description	Range	comment	Additional Info
Host/guest	CPU (A read plugin that retrieves CPU usage in nanoseconds or as a percentage)		percent/nanoseconds	idle	Time CPU spends idle.		Can be per CPU or aggregated across all the CPUs. For more info, please see: http://man7.org/linux/man-pages/man1/top.1.html http://blog.scoutapp.com/articles/2015/02/24/understanding-linuxs-cpu-stats Note that jiffies operate on a variable time base, HZ. The default value of HZ should be used (100, yielding a jiffy value of 0.01 seconds) [time(7)]. Also, the actual number of jiffies in each second is subject to system factors, such as use of virtualization. Thus, the percent calculation based on jiffies will nominally sum to 100% plus or minus error.

			percent/nanoseconds	nice	Time the CPU spent running user space processes that have been niced. The priority level of a user space process can be tweaked by adjusting its niceness.
			percent/nanoseconds	interrupt	Time the CPU has spent servicing interrupts.
			percent/nanoseconds	softirq	(apparently) Time spent handling interrupts that are synthesized, and almost as important as Hardware interrupts (above). "In current kernels there are ten softirq vectors defined; two for tasklet processing, two for networking, two for the block layer, two for timers, and one each for the scheduler and read-copy-update processing. The kernel maintains a per-CPU bitmask indicating which softirqs need processing at any given time." [Ref]
			percent/nanoseconds	steal	CPU steal is a measure of the fraction of time that a machine is in a state of “involuntary wait.” It is time for which the kernel cannot otherwise account in one of the traditional classifications like user, system, or idle. It is time that went missing, from the perspective of the kernel. http://www.stackdriver.com/understanding-cpu-steal-experiment/
			percent/nanoseconds
			percent/nanoseconds	system	Time that the CPU spent running the kernel.
			percent/nanoseconds	user	Time CPU spends running un-niced user space processes.
			percent/nanoseconds	wait	The time the CPU spends idle while waiting for an I/O operation to complete
	Interface (A read plugin that retrieves Linux Interface statistics)		if_dropped	in	The total number of received dropped packets.
			if_errors	in	The total number of received error packets.		http://www.onlamp.com/pub/a/linux/2000/11/16/LinuxAdmin.html
			if_octets	in	The total number of received bytes.
			if_packets	in	The total number of received packets.
			if_dropped	out	The total number of transmit packets dropped
			if_errors	out	The total number of transmit error packets. (This is the total of error conditions encountered when attempting to transmit a packet. The code here explains the possibilities, but this code is no longer present in /net/core/dev.c master at present - it appears to have moved to /net/core/net-procfs.c.)
			if_octets	out	The total number of bytes transmitted
			if_packets	out	The total number of transmitted packets
	Memory (A read plugin that retrieves memory usage statistics)		memory	buffered	The amount, in kibibytes, of temporary storage for raw disk blocks.		https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html
			memory	cached	The amount of physical RAM, in kibibytes, used as cache memory.
			memory	free	The amount of physical RAM, in kibibytes, left unused by the system.
			memory	slab_recl	The part of Slab that can be reclaimed, such as caches.		Slab — The total amount of memory, in kibibytes, used by the kernel to cache data structures for its own use
			memory	slab_unrecl	The part of Slab that cannot be reclaimed even when lacking memory		https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html
			memory	total	Total amount of usable RAM, in kibibytes, which is physical RAM minus a number of reserved bits and the kernel binary code.		https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/s2-proc-meminfo.html	This was the only undefined metric in the mem_used calculation below.
			memory	used	mem_used = mem_total - (mem_free + mem_buffered + mem_cached + mem_slab_total);		https://github.com/collectd/collectd/blob/master/src/memory.c#L325
	disk (A read plugin that retrieves disk usage statistics)		disk_io_time	io_time	time spent doing I/Os (ms). You can treat this metric as a device load percentage (Value of 1 sec time spent matches 100% of load).
			disk_io_time	weighted_io_time	measure of both I/O completion time and the backlog that may be accumulating.
			disk_merged	read	the number of operations that could be merged into other, already queued operations, i.e. one physical disk access served two or more logical operations. The higher that number, the better.
			disk_merged	write	the number of operations that could be merged into other, already queued operations, i.e. one physical disk access served two or more logical operations. The higher that number, the better.
			disk_octets	read	the number of octets read from a disk or partition
			disk_octets	write	the number of octets written to a disk or partition
			disk_ops	read	the number of read operations issued to the disk
			disk_ops	write	the number of write operations issued to the disk
			disk_time	read	the average time an I/O operation took to complete. Note from collectd: since this is a little messy to calculate, take the actual values with a grain of salt.
			disk_time	write	the average time an I/O operation took to complete. Note from collectd: since this is a little messy to calculate, take the actual values with a grain of salt.		https://collectd.org/wiki/index.php/Plugin:Disk
			pending_operations		shows queue size of pending I/O operations.		http://lxr.free-electrons.com/source/include/uapi/linux/if_link.h#L43
	Ping (A read plugin that retrieves the RTT for a ping)		ping		Network latency is measured as a round-trip time in milliseconds. An ICMP “echo request” is sent to a host and the time needed for its echo-reply to arrive is measured.		Latency
			ping_droprate		droprate = ((double) (pkg_sent - pkg_recv)) / ((double) pkg_sent);		https://github.com/collectd/collectd/blob/master/src/ping.c#L703
			ping_stddev		if pkg_recv > 1: latency_stddev = sqrt (((((double) pkg_recv) * latency_squared) - (latency_total * latency_total)) / ((double) (pkg_recv * (pkg_recv - 1))));		https://github.com/collectd/collectd/blob/master/src/ping.c#L698
							pkg_recv = # of echo-reply messages received; latency_squared = the sum of latency * latency over received echo-reply messages; latency_total = the total latency for received echo-reply messages
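
The ping_droprate and ping_stddev formulas above can be sketched in Python as follows. This is a minimal illustration, not the plugin's code: the function name is hypothetical, and returning None when fewer than two replies were received is an assumption of this sketch rather than the plugin's documented behaviour.

```python
import math
from typing import List, Optional, Tuple


def ping_stats(pkg_sent: int, latencies_ms: List[float]) -> Tuple[float, Optional[float]]:
    """Return (droprate, latency_stddev) for one measurement interval.

    latencies_ms holds one round-trip time per received echo-reply.
    """
    if pkg_sent <= 0:
        raise ValueError("pkg_sent must be positive")

    pkg_recv = len(latencies_ms)
    latency_total = sum(latencies_ms)
    latency_squared = sum(l * l for l in latencies_ms)  # sum of squares

    droprate = (pkg_sent - pkg_recv) / pkg_sent

    # The sample standard deviation needs at least two replies.
    if pkg_recv > 1:
        stddev = math.sqrt(
            (pkg_recv * latency_squared - latency_total * latency_total)
            / (pkg_recv * (pkg_recv - 1))
        )
    else:
        stddev = None  # assumption of this sketch

    return droprate, stddev


# Four echo requests sent, three replies received.
print(ping_stats(4, [10.0, 12.0, 14.0]))
```

With four requests and replies of 10, 12 and 14 ms, the drop rate is 0.25 and the standard deviation is 2 ms.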


	load (A read plugin that retrieves system load for 1, 5 and 15 mins.)		load	shortterm	load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 1 Minute		http://man7.org/linux/man-pages/man5/proc.5.html
				shortterm	measured CPU and IO utilization for 1 min using /proc/loadavg		https://github.com/collectd/collectd/blob/master/src/load.c
				midterm	load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 5 Minutes
				midterm	measured CPU and IO utilization for 5 mins using /proc/loadavg
				longterm	load average figures giving the number of jobs in the run queue (state R) or waiting for disk I/O (state D) averaged over 15 Minutes
				longterm	measured CPU and IO utilization for 15 mins using /proc/loadavg
	OVS events (A read plugin that retrieves events (like link status changes) from OVS.)		gauge	link_status	Link status of the OvS interface: UP or DOWN
	OVS Stats (A read plugin that retrieves interface stats from OVS.)		if_collisions		Number of collisions.		per interface
			if_rx_octets		Number of received bytes.		http://openvswitch.org/ovs-vswitchd.conf.db.5.pdf
			if_rx_errors	crc	Number of CRC errors.
			if_dropped rx:		Number of packets dropped by RX.
			if_errors rx:		Total number of receive errors, greater than or equal to the sum of the RX errors above.
			if_rx_errors	frame	Number of frame alignment errors.
			if_rx_errors	over	Number of packets with RX overrun.
			if_packets rx:		Number of received packets
			if_tx_octets		Number of transmitted bytes
			if_dropped tx:		Number of packets dropped by TX
			if_errors tx:		Total number of transmit errors, greater than or equal to the sum of the TX errors above.
			if_packets tx:		Number of transmitted packets
			if_packets rx:	1_to_64_packets	The total number of packets (including bad packets) received that were 64 octets in length (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
			if_packets rx:	65_to_127_packets	The total number of packets (including bad packets) received that were between 65 and 127 octets in length inclusive (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
			if_packets rx:	128_to_255_packets	The total number of packets (including bad packets) received that were between 128 and 255 octets in length inclusive (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
		if_packets rx:	256_to_511_packets	The total number of packets (including bad packets) received that were between 256 and 511 octets in length inclusive (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
		if_packets rx:	512_to_1023_packets	The total number of packets (including bad packets) received that were between 512 and 1023 octets in length inclusive (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
		if_packets rx:	1024_to_1522_packets	The total number of packets (including bad packets) received that were between 1024 and 1522 octets in length inclusive (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
		if_packets rx:	1523_to_max_packets	The total number of packets (including bad packets) received that were between 1523 and max octets in length inclusive (excluding framing bits but including FCS octets).		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	1_to_64_packets	The total number of packets transmitted that were 64 octets in length.		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	65_to_127_packets	The total number of packets transmitted that were between 65 and 127 octets in length inclusive		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	128_to_255_packets	The total number of packets transmitted that were between 128 and 255 octets in length inclusive		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	256_to_511_packets	The total number of packets transmitted that were between 256 and 511 octets in length inclusive		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	512_to_1023_packets	The total number of packets transmitted that were between 512 and 1023 octets in length inclusive		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	1024_to_1522_packets	The total number of packets transmitted that were between 1024 and 1522 octets in length inclusive		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	1523_to_max_packets	The total number of packets transmitted that were between 1523 and max octets in length inclusive		supported in OvS v2.6+ and dpdk ports only
		if_multicast	tx_multicast_packets	The number of good packets transmitted that were directed to a multicast address. Note that this number does not include packets directed to the broadcast address.		supported in OvS v2.6+ and dpdk ports only
		if_packets rx:	broadcast_packets	The total number of good packets received that were directed to the broadcast address.		supported in OvS v2.6+ and dpdk ports only
		if_packets tx:	broadcast_packets	The number of good packets transmitted that were directed to the broadcast address.		supported in OvS v2.6+ and dpdk ports only
		if_rx_errors	rx_undersized_errors	The total number of packets received that were less than 64 octets long (excluding framing bits, but including FCS octets) and were otherwise well formed.		supported in OvS v2.6+ and dpdk ports only
		if_rx_errors	rx_oversize_errors	The total number of packets received that were longer than max octets (excluding framing bits, but including FCS octets) and were otherwise well formed.		supported in OvS v2.6+ and dpdk ports only
		if_rx_errors	rx_fragmented_errors	The total number of packets received that were less than 64 octets in length (excluding framing bits but including FCS octets) and had either a bad Frame Check Sequence (FCS) with an integral number of octets (FCS Error) or a bad FCS with a non-integral number of octets (Alignment Error). Note that it is entirely normal for rx_fragmented_errors to increment. This is because it counts both runts (which are normal occurrences due to collisions) and noise hits.		supported in OvS v2.6+ and dpdk ports only
		if_rx_errors	rx_jabber_errors	The total number of jabber packets received that had either a bad Frame Check Sequence (FCS) with an integral number of octets (FCS Error) or a bad FCS with a non-integral number of octets (Alignment Error).		supported in OvS v2.6+ and dpdk ports only
Hugepages (A read plugin that retrieves the number of available and free hugepages on a platform as well as what is available in terms of hugepages per socket.)		memory	used	Number of used hugepages in bytes		total/pernode/both	Virtual memory makes it easy for several processes to share memory [TM1]. “Each process has its own virtual address space, which is mapped to physical memory by the operating system” [TM2]. The process views the virtual memory address space as a contiguous/linear address space, but in reality the virtual addresses need to be mapped to physical addresses; this is typically done by the Memory Management Unit (MMU) on the CPU. “There are two ways to enable the system to manage large amounts of memory: (1) increase the number of page table entries in the hardware memory management unit; (2) increase the page size/use huge pages, to reduce the number of lookups. The first method is expensive, since the hardware memory management unit in a modern processor only supports hundreds or thousands of page table entries. Additionally, hardware and memory management algorithms that work well with thousands of pages (megabytes of memory) may have difficulty performing well with millions (or even billions) of pages. This results in performance issues: when an application needs to use more memory pages than the memory management unit supports, the system falls back to slower, software-based memory management, which causes the entire system to run more slowly. Huge pages are blocks of memory that come in 2MB and 1GB sizes. The page tables used by the 2MB pages are suitable for managing multiple gigabytes of memory, whereas the page tables of 1GB pages are best for scaling to terabytes of memory” [TM3]. More info on virtual memory and TLB lookups can be found at: http://www.tldp.org/LDP/tlk/mm/memory.html https://lwn.net/Articles/253361/ [TM1] http://www.tldp.org/LDP/tlk/mm/memory.html [TM2] https://www.kernel.org/doc/gorman/html/understand/understand007.html [TM3] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-memory-transhuge.html
		memory	free	Number of free hugepages in bytes
		vmpage_number	used	Number of used hugepages (page count)
		vmpage_number	free	Number of free hugepages (page count)
		percent	used	Percentage of hugepages in use
		percent	free	Percentage of hugepages that are free
processes (A read plugin that collects the number of processes, grouped by their state (e.g. running, sleeping, zombies, etc.). In addition to that, it can collect detailed statistics about selected processes, grouped by name.)		fork_rate		the number of threads created since the last reboot		The information comes mainly from /proc/PID/status, /proc/PID/psinfo and /proc/PID/usage.
		ps_state	blocked	the number of processes in a blocked state		https://collectd.org/wiki/index.php/Plugin:Processes
		ps_state	paging	the number of processes in a paging state		http://man7.org/linux/man-pages/man5/proc.5.html
		ps_state	running	the number of processes in a running state
		ps_state	sleeping	the number of processes in a sleeping state
		ps_state	stopped	the number of processes in a stopped state
		ps_state	zombies	the number of processes in a Zombie state
Host only	virt (A read plugin that uses the libvirt virtualization API to gather statistics about virtualized guests on a system directly from the hypervisor, without needing to install a collectd instance on the guest.)		disk_octets	DISK	number of read/write bytes as unsigned long long.
			disk_ops	DISK	number of read/write requests
			disk_time	flush-DISK	total time spent on cache reads/writes in nanoseconds
			if_dropped	INTERFACE	packets dropped on rx/tx as unsigned long long
			if_errors	INTERFACE	rx/tx errors as unsigned long long
			if_octets	INTERFACE	bytes received/transmitted as unsigned long long
			if_packets	INTERFACE	packets received/transmitted as unsigned long long
			memory	actual_balloon	Current balloon value (in kB).		https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainMemoryStatStruct
			memory	rss	Resident Set Size of the process running the domain. This value is in kB
			memory	swap_in	The total amount of data read from swap space (in kB).
			memory	swap_out	Amount of memory written out to swap space
			memory	major_fault	Number of page faults when disk IO was required
			memory	minor_fault	Number of other page faults
			memory	unused	Amount of memory left unused by the system
			memory	available	Amount of usable memory as seen by the domain
			memory	usable	Amount of memory which can be reclaimed by balloon without causing host swapping
			memory	last_update	Timestamp of the last update of statistics
			memory	total	the memory in KBytes used by the domain
			virt_cpu_total		the CPU time used in nanoseconds
			virt_vcpu	VCPU_NR	the CPU time used in nanoseconds per cpu
			cpu_affinity	vcpu_NR-cpu_NR	pinning of domain VCPUs to host physical CPUs.		Value stored is a boolean.
			job_stats	*	Information about progress of a background/completed job on a domain.		Number of metrics depends on job type. Check API documentation for more information: virDomainGetJobStats
			disk_error	DISK_NAME	Disk error code		Metric isn’t dispatched for disk with no errors
			percent	virt_cpu_total	CPU utilization in percentage per domain
			perf	*	Performance monitoring events		Number of metrics depends on libvirt API version. The following perf metrics are available in libvirt API version 2.4. To collect perf metrics, they must be enabled in the domain and supported by the platform.
			perf	perf_alignment_faults	Count of alignment faults
			perf	perf_branch_instructions	Count of branch instructions
			perf	perf_branch_misses	Count of branch misses
			perf	perf_bus_cycles	Count of bus cycles
			perf	perf_cache_misses	the count of cache misses by applications running on the platform
			perf	perf_cache_references	the count of cache hits by applications running on the platform
			perf	perf_cmt	usage of l3 cache in bytes by applications running on the platform
			perf	perf_context_switches	Count of context switches
			perf	perf_cpu_clock	Count of CPU clock time
			perf	perf_cpu_cycles	Count of CPU cycles (total/elapsed)
			perf	perf_cpu_migrations	Count of CPU migrations
			perf	perf_emulation_faults	Count of emulation faults
			perf	perf_instructions	the count of instructions by applications running on the platform
			perf	perf_mbml	bandwidth of memory traffic for a memory controller
			perf	perf_mbmt	total system bandwidth from one level of cache
			perf	perf_page_faults	Count of page faults
			perf	perf_page_faults_maj	Count of major page faults
			perf	perf_page_faults_min	Count of minor page faults
			perf	perf_ref_cpu_cycles	Count of ref CPU cycles
			perf	perf_task_clock	Count of task clock time
			ps_cputime		physical user/system cpu time consumed by the hypervisor
			total_requests	flush-DISK	total flush requests of the block device
			total_time_in_ms	flush-DISK	total time spent on cache flushing in milliseconds
	RDT (A read plugin that provides the last level cache utilization and memory bandwidth utilization) NOTE: it's recommended to configure the interval for this plugin as 1 second	Core number or group of cores	ipc		Number of instructions per clock per core group		per core group	A higher IPC means that the processor can get more work done per unit time, which generally translates to faster application performance [0]. "Ideally every instruction a CPU gets should be read, executed and finished in one cycle, however that is never the case. The processor has to take the instruction, decode the instruction, gather the data (depends on where the data is), perform work on the data, then decide what to do with the result. Moving has never been more complicated, and the ability for a processor to hide latency, pre-prepare data by predicting future events or keeping hold of previous events for potential future use is all part of the plan. All the meanwhile there is an external focus on making sure power consumption is low and the frequency of the processor can scale depending on what the target device actually is" [1]. [0] https://www.nextplatform.com/2016/03/31/examining-potential-hpc-benefits-new-intel-xeon-processors/ [1] http://www.anandtech.com/show/9482/intel-broadwell-pt2-overclocking-ipc/3
			memory_bandwidth	local	Local Memory Bandwidth in Bytes per second, for a specified processor (socket), Resource Monitoring ID (RMID), and a given measurement time over the preceding measurement interval. RMID may be mapped to Threads, Application processes, or VMs by the host OS or hypervisor. Local refers to Memory Bandwidth used by RMID within the specified processor (socket).		Question is whether this metric is always reported in units of MegaBytes per second.	https://software.intel.com/en-us/articles/memory-bandwidth-monitoring-proof-points
			memory_bandwidth	remote	Remote Memory Bandwidth in Bytes per second, for a specified processor (socket), Resource Monitoring ID (RMID), and a given measurement time over the preceding measurement interval. RMID may be mapped to Threads, Application processes, or VMs by the host OS or hypervisor. Remote refers to Memory Bandwidth used by RMID external to the specified processor (socket), such as the other processor in a pair.
			bytes	llc	Last Level Cache occupancy in bytes, for a specified processor (socket), Resource Monitoring ID (RMID), and a given measurement time over the preceding measurement interval. RMID and Class of Service (CLOS) limits may be mapped to Threads, Application processes, or VMs by the host OS or hypervisor.		Question if units are in bytes or kibibits. Al Morton after looking into this, I found that kilobytes is what is used for the cache. In collectd we are reporting it simply in bytes. I updated the 4th column to reflect that as well as the definition.
Host/guest	dpdkstats (A read plugin that retrieves stats from the DPDK extended NIC stats API.)		derive	rx_l3_l4_xsum_error	Number of receive IPv4, TCP, UDP or SCTP XSUM errors.
			errors	flow_director_filter_add_errors	Number of failed added filters		compatible with DPDK 16.04, 16.07 (based on ixgbe, vhost support will be enabled in DPDK 16.11)
				flow_director_filter_remove_errors	Number of failed removed filters
				mac_local_errors	Number of faults in the local MAC.
				mac_remote_errors	Number of faults in the remote MAC.
			if_rx_dropped	rx_fcoe_dropped	Number of Rx packets dropped due to lack of descriptors.
				rx_mac_short_packet_dropped	Number of MAC short packet discard packets received.
				rx_management_dropped	Number of management packets dropped. This register counts the total number of packets received that pass the management filters and then are dropped because the management receive FIFO is full. Management packets include any packet directed to the manageability console (such as RMCP and ARP packets).
				rx_priorityX_dropped	Number of dropped packets received per UP		where X is 0 to 7
			if_rx_errors	rx_crc_errors	Counts the number of receive packets with CRC errors. In order for a packet to be counted in this register, it must be 64 bytes or greater (from <Destination Address> through <CRC>, inclusively) in length.
				rx_errors	Number of errors received
				rx_fcoe_crc_errors	FC CRC Count. Counts the number of packets with good Ethernet CRC and bad FC CRC.
				rx_fcoe_mbuf_allocation_errors	Number of fcoe Rx packets dropped due to lack of descriptors.
				rx_fcoe_no_direct_data_placement
				rx_fcoe_no_direct_data_placement_ext_buff
				rx_fragment_errors	Number of receive fragment errors (frames shorter than 64 bytes from <Destination Address> through <CRC>, inclusively) that have bad CRC (this is slightly different from the Receive Undersize Count register).
				rx_illegal_byte_errors	Counts the number of receive packets with illegal byte errors (such as an illegal symbol in the packet).
				rx_jabber_errors	Number of receive jabber errors. This register counts the number of received packets that are greater than maximum size and have bad CRC (this is slightly different from the Receive Oversize Count register). The packet length is counted from <Destination Address> through <CRC>, inclusively.
				rx_length_errors	Number of packets with receive length errors. A length error occurs if an incoming packet length field in the MAC header doesn't match the packet length.
				rx_oversize_errors	Receive Oversize Error. This register counts the number of received frames that are longer than maximum size as defined by MAXFRS.MFS (from <Destination Address> through <CRC>, inclusively) and have valid CRC.
					rx_illegal_byte_errors	Counts the number of receive packets with illegal bytes errors (such as there is an illegal symbol in the packet).		_priorityX_mbuf_allocation_errors	Number of received packets per UP dropped due to lack of descriptors.		where X is 0 to 7
					rx_q0_errors	Number of errors received for the queue.		if more queues are allocated then you get the errors per Queue
					rx_jabberundersize_errors	Number of receive jabber errorsReceive Undersize Error. This register counts the number of received packets frames that are greater shorter than maximum minimum size and have bad CRC (this is slightly different from the Receive Oversize Count register). The packets length is counted from (64 bytes from <Destination Address> through <CRC>, inclusively), and had a valid CRC.
					if_rx_octets	rx_lengtherror_errorsbytes	Number Counts the number of receive packets with receive length errors. A length error occurs if an incoming packet length field in the MAC header doesn't match the packet length.		error bytes (such as there is an error symbol in the packet). This registers counts all packets received, regardless of L2 filtering and receive enablement.		bug - will move this to errors
						rx_mbuf_allocation_errorsNumber of Rx packets dropped due to lack of descriptors.fcoe_bytes	number of received fcoe bytes
						rx_oversizegood_errorseceive Oversize Errorbytes	Good octets/bytes received count. This register counts the number of received frames that are longer than maximum size as defined by MAXFRS.MFS (from <Destination Address> through <CRC>, inclusively) and have valid CRC. includes bytes received in a packet from the <Destination Address> field through the <CRC> field, inclusively.
						rx_priorityX_mbuf_allocation_errors	Number of received packets per UP dropped due to lack of descriptors.		where X is 0 to 7			rx_q0_errorsq0_bytes	Number of errors bytes received for the queue.		if more queues are allocated then you get the errors per Queueper queue
						rx_undersizetotal_errorsReceive Undersize Error. This register counts the number of received frames that are shorter than minimum size (64 bytes from <Destination Address> through <CRC>, inclusively), and had a valid CRCbytes	Total received octets. This register includes bytes received in a packet from the <Destination Address> field through the <CRC> field, inclusively.
					if_rx_	packets	rx_errorbroadcast_bytespackets	Counts the number of receive packets with error bytes (such as there is an error symbol in the packet). This registers counts all packets received, regardless of L2 filtering and receive enablementNumber of good (non-erred) broadcast packets received.		bug - will move this to errors
							rx_fcoe_bytespackets	number of received fcoe bytes					rx_good_bytes	Good octets/bytes received count. This register includes bytes received in a packet from the <Destination Address> field through the <CRC> field, inclusivelyNumber of FCoE packets posted to the host. In normal operation (no save bad frames) it equals to the number of good packets.
							rx_flow_control_q0xoff_bytespackets	Number of bytes received for the queueXOFF packets received. This register counts any XOFF packet whether it is a legacy XOFF or a priority XOFF. Each XOFF packet is counted once even if it is designated to a few priorities.		per queue
							rx_flow_control_totalxon_bytesTotal received octets. This register includes bytes received in a packet from the <Destination Address> field through the <CRC> field, inclusivelypackets	Number of XON packets received. This register counts any XON packet whether it is a legacy XON or a priority XON. Each XON packet is counted once even if it is designated to a few priorities.
							rx_	rx_broadcastgood_packets	Number of good (non-erred) broadcast packets receivedRx packets (from the network).
							rx_fcoemanagement_packets	Number of FCoE packets posted to the host. In normal operation (no save bad frames) it equals to the number of good packetsmanagement packets received. This register counts the total number of packets received that pass the management filters. Management packets include RMCP and ARP packets. Any packets with errors are not counted, except for the packets that are dropped because the management receive FIFO is full are counted.
							rx_flow_controlmulticast_xoff_packets	Number of XOFF good (non-erred) multicast packets received . This register counts any XOFF packet whether it is a legacy XOFF or a priority XOFF. Each XOFF packet is counted once even if it is designated to a few priorities.(excluding broadcast packets). This register does not count received flow control packets.
							rx_flowpriorityX_control_xonxoff_packets	Number of XON packets received. This register counts any XON packet whether it is a legacy XON or a priority XON. Each XON packet is counted once even if it is designated to a few priorities.		XOFF packets received per UP		where X is 0 to 7
							rx_priorityX_goodxon_packets	Number of good (non-erred) Rx packets (from the network).XON packets received per UP		where X is 0 to 7
							rx_managementq0_packets	Number of management packets received . This register counts the total number of packets received that pass the management filters. Management packets include RMCP and ARP packets. Any packets with errors are not counted, except for the packets that are dropped because the management receive FIFO is full are counted.for the queue.		per queue
							rx_size_1024_to_max_packets	Number of packets received that are 1024-max bytes in length (from <Destination Address> through <CRC>, inclusively). This registers does not include received flow control packets. The maximum is dependent on the current receiver configuration and the type of packet being received. If a packet is counted in receive oversized count, it is not counted in this register. Due to changes in the standard for maximum frame size for VLAN tagged frames in 802.3, packets can have a maximum length of 1522 bytes.
							rx_multicastsize_128_to_255_packets	Number of good (non-erred) multicast packets received (excluding broadcast packets). This register does not count received flow control packets. that are 128-255 bytes in length (from <Destination Address> through <CRC>, inclusively).
							rx_size_256_priorityXto_xoff511_packets	Number of XOFF packets received per UPthat are 256-511 bytes in length (from <Destination Address> through <CRC>, inclusively).		where X is 0 to 7
							rx_size_512_priorityXto_xon1023_packets	Number of XON packets received per UPthat are 512-1023 bytes in length (from <Destination Address> through <CRC>, inclusively).		where X is 0 to 7
							rx_size_q064_packets	Number of packets received for the queuegood packets received that are 64 bytes in length (from <Destination Address> through <CRC>, inclusively).		per queue
							rx_size_102465_to_max127_packets	Number of packets received that are 102465-max 127 bytes in length (from <Destination Address> through <CRC>, inclusively). This registers does not include received flow control packets. The maximum is dependent on the current receiver configuration and the type of packet being received. If a packet is counted in receive oversized count, it is not counted in this register. Due to changes in the standard for maximum frame size for VLAN tagged frames in 802.3, packets can have a maximum length of 1522 bytes.
							rx_total_missed_packets	the total number of rx missed packets, that is is a packet that was correctly received by the NIC but because it was out of descriptors and internal memory, the packet had to be dropped by the NIC itself
							rx_size_128_to_255total_packets	Number of all packets received that are 128-255 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts the total number of all packets received. All packets received are counted in this register, regardless of their length, whether they are erred, but excluding flow control packets.
							rx_size_256_to_511xoff_packets	Number of packets received that are 256-511 bytes in length (from <Destination Address> through <CRC>, inclusively)XOFF packets received. Sticks to 0xFFFF. XOFF packets can use the global address or the station address. This register counts any XOFF packet whether it is a legacy XOFF or a priority XOFF. Each XOFF packet is counted once even if it is designated to a few priorities. If a priority FC packet contains both XOFF and XON, only this counter is incremented.
							rx_size_512_to_1023xon_packets	Number of packets received that are 512-1023 bytes in length (from <Destination Address> through <CRC>, inclusively)XON packets received. XON packets can use the global address, or the station address. This register counts any XON packet whether it is a legacy XON or a priority XON. Each XON packet is counted once even if it is designated to a few priorities. If a priority FC packet contains both XOFF and XON, only the LXOFFRXCNT counter is incremented.
					rxif_sizetx_64errors	tx_packetserrors	Number Total number of good packets received that are 64 bytes in length (from <Destination Address> through <CRC>, inclusively).TX error packets
			rx		if_	tx_	octets	tx_tofcoe_127_packetsbytes	Number of packets received that are 65-127 bytes in length (from <Destination Address> through <CRC>, inclusively)fcoe bytes transmitted
								rxtx_totalgood_missed_packetsthe total number of rx missed packets, that is is a packet that was correctly received by the NIC but because it was out of descriptors and internal memory, the packet had to be dropped by the NIC itselfbytes	counter of successfully transmitted octets. This register includes transmitted bytes in a packet from the <Destination Address> field through the <CRC> field, inclusively.
								rxtx_totalq0_packetsbytes	Number of all packets received. This register counts the total number of all packets received. All packets received are counted in this register, regardless of their length, whether they are erred, but excluding flow control packets.	bytes transmitted by the queue.		per queue
						if_tx_packets	tx_broadcast_packets	Number of XOFF packets received. Sticks to 0xFFFF. XOFF packets can use the global address or the station address. This register counts any XOFF packet whether it is a legacy XOFF or a priority XOFF. Each XOFF packet is counted once even if it is designated to a few priorities. If a priority FC packet contains both XOFF and XON, only this counter is incremented.broadcast packets transmitted count. This register counts all packets, including standard packets, secure packets, FC packets and manageability packets
					rxtx_xonfcoe_packets		Number of XON packets received. XON packets can use the global address, or the station address. This register counts any XON packet whether it is a legacy XON or a priority XON. Each XON packet is counted once even if it is designated to a few priorities. If a priority FC packet contains both XOFF and XON, only the LXOFFRXCNT counter is incremented.fcoe packets transmitted
					iftx_flow_txcontrol_errorstxxoff_errorsTotal number of TX error packetspackets		Link XOFF Transmitted Count
					tx_		flow_	txcontrol_fcoexon_bytesNumber of fcoe bytes transmittedpackets	Link XON Transmitted Count
					tx_good_bytespackets		counter of successfully transmitted octets. This register includes transmitted bytes in a packet from the <Destination Address> field through the <CRC> field, inclusively.Number of good packets transmitted
	tx_q0management_bytespackets		Number of bytes management packets transmitted by the queue.				per queue
	tx_		tx_broadcastmulticast_packets		Number of broadcast multicast packets transmitted. This register counts the number of multicast packets transmitted count. This register counts all packets, including standard packets, secure packets, FC packets and manageability packets.
	tx_priorityX_fcoexoff_packets		Number of fcoe XOFF packets transmitted per UP				where X is 0 to 7
			tx_flow_control_xoff_packets		Link XOFF Transmitted Count					tx_flow_control_priorityX_xon_packets	Link XON Transmitted CountNumber of XON packets transmitted per UP		where X is 0 to 7
	tx_goodq0_packets	Number of good packets transmitted		packets transmitted for the queue. A packet is considered as transmitted if it is was forwarded to the MAC unit for transmission to the network and/or is accepted by the internal Tx to Rx switch enablement logic. Packets dropped due to anti-spoofing filtering or VLAN tag validation (as described in Section 7.10.3.9.2) are not counted.			per queue
	tx_management_packetsNumber of management packets transmitted_size_1024_to_max_packets	Number of packets transmitted that are 1024 or more bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets.
	tx_multicast_size_128_to_255_packets	Number of multicast packets transmitted that are 128-255 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts the number of multicast packets transmitted. This register counts all packets, including standard packets, secure packets, FC packets and manageability packets.
		tx_priorityXsize_xoff_packets	Number of XOFF packets transmitted per UP		where X is 0 to 7			tx_priorityX_xon256_to_511_packets	Number of XON packets transmitted per UP		where X is 0 to 7packets transmitted that are 256-511 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets.
	tx_q0_size_512_to_1023_packets	Number of packets transmitted for the queue. A packet is considered as transmitted if it is was forwarded to the MAC unit for transmission to the network and/or is accepted by the internal Tx to Rx switch enablement logic. Packets dropped due to anti-spoofing filtering or VLAN tag validation (as described in Section 7.10.3.9.2) are not counted.		per queuethat are 512-1023 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets.
	tx_size_1024_to_max64_packets	Number of packets transmitted that are 1024 or more 64 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure secure packets, FC packets, and manageability packets.
	tx_size_12865_to_255127_packets	Number of packets transmitted that are 12865-255 127 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets.
	tx_size_256_to_511total_packets	Number of all packets transmitted. This register counts the total number of all packets transmitted that are 256-511 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packetspackets, FC packets, and manageability packets.
	tx_xoff_packets	Number of XOFF packets transmitted
	tx_xon_packets	Number of XON packets transmitted
	operations	flow_director_added_filters	This field counts the number of added filters to the flow director filters logic.
		txflow_sizedirector_512_to_1023_packetsNumber of packets transmitted that are 512-1023 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packetsmatched_filters	This field counts the number of matched filters to the flow director filters logic.
		txflow_sizedirector_64missed_packetsNumber of packets transmitted that are 64 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, FC packets, and manageability packetsfilters	This field counts the number of missed filters to the flow director filters logic.
		txflow_sizedirector_65_to_127_packetsNumber of packets transmitted that are 65-127 bytes in length (from <Destination Address> through <CRC>, inclusively). This register counts all packets, including standard packets, secure packets, and manageability packets.removed_filters	This field counts the number of removed filters from the flow director filters logic.
	tx_total_packets	Number of all packets transmitted. This register counts the total number of all packets transmitted. This register counts all packets, including standard packets, secure packets, FC packets, and manageability packets.
	tx_xoff_packets	Number of XOFF packets transmitted
	tx_xon_packets	Number of XON packets transmitted
	mcelog (RAS memory) A read plugin that uses mcelog to check for memory Machine Check Exceptions and sends the stats for reported exceptions		errors	corrected_memory_errors	The total number of hardware errors that were corrected by the hardware (e.g. a single-bit data corruption that was correctable using ECC). These errors do not require immediate software action, but are still reported for accounting and predictive failure analysis.		Memory (RAM) errors are among the most common errors in typical server systems. They also scale with the amount of memory: the more memory, the more errors. In addition, large clusters of computers with tens or hundreds (or sometimes thousands) of active machines increase the total error rate of the system.
				uncorrected_memory_errors	The total number of uncorrected hardware errors detected by the hardware. Data corruption has occurred. These errors require software reaction.		http://www.mcelog.org/memory.html
				corrected_memory_errors_in_%s	The total number of hardware errors that were corrected by the hardware in a certain period of time		where %s is a time period such as 24 hours. http://www.mcelog.org/memory.html
				uncorrected_memory_errors_in_%s	The total number of uncorrected hardware errors detected by the hardware in a certain period of time		where %s is a time period such as 24 hours. http://www.mcelog.org/memory.html
Host	IPMI A read plugin that reports platform thermals, voltages, fan speed, current, flow, power etc. The sensors are specific to each BMC, so these will change depending on what the BMC supports. This is an example for the S2600WT2R platform.		percent	MTT CPU2 (Memory Throttling sensor)	IPMI defines many types of sensors, but groups them into two main categories: threshold and discrete. Threshold sensors are "analog": they have continuous (or mostly continuous) readings, such as fan speed, voltage, or temperature. Discrete sensors have a set of binary readings that may each be independently zero or one. In some sensors, these may be independent. For instance, a power supply may have both an external power failure and a predictive failure at the same time. In other cases they may be mutually exclusive. For instance, each bit may represent the initialization state of a piece of software. "Memory Thermal Throttling" is related to the memory thermal management system. Based on the DIMM thermal conditions it may restrict read and write traffic/bandwidth to main memory as a means of controlling power consumption. This metric is measured as a percentage, and 0% means no memory throttling occurs. When thermal conditions rise, the memory management system enables throttling and restricts the read or write traffic (e.g. 50%).		The IPMI plugin supports analog sensors of type voltage, temperature, fan and current, plus analog sensors that have VALUE type WATTS, CFM and percentage (%). http://openipmi.sourceforge.net/IPMI.pdf https://www.intel.com/content/dam/support/us/en/documents/motherboards/server/s5400sf/sb/s5400sf_tps_r2_02.pdf
				MTT CPU1 (Memory Throttling sensor)
				P2 Therm Ctrl %
				P1 Therm Ctrl %
				PS1 Curr Out %
			voltage	BB +3.3V Vbat
				BB +12.0V
			temperature	Agg Therm Mgn 1
				DIMM Thrm Mrgn 4
				DIMM Thrm Mrgn 3
				DIMM Thrm Mrgn 2
				DIMM Thrm Mrgn 1
				P2 DTS Therm Mgn
				P1 DTS Therm Mgn
				P2 Therm Ctrl %
				P1 Therm Ctrl %
				P2 Therm Margin
				P1 Therm Margin
				PS1 Temperature
				LAN NIC Temp
				Exit Air Temp
				HSBP 1 Temp
				I/O Mod Temp
				BB Lft Rear Temp
				BB Rt Rear Temp
				BB BMC Temp
				SSB Temp
				Front Panel Temp
				BB P2 VR Temp
				BB P1 VR Temp
			fan	System Fan 6B
				System Fan 6A
				System Fan 5B
				System Fan 5A
				System Fan 4B
				System Fan 4A
				System Fan 3B
				System Fan 3A
				System Fan 2B
				System Fan 2A
				System Fan 1B
				System Fan 1A
			CFM	System Airflow
			watts	PS1 Input Power
Host	intel_pmu A read plugin that collects performance monitoring events supported by Intel Performance Monitoring Units (PMUs). The PMU is hardware built inside a processor to measure its performance parameters such as instruction cycles, cache hits, cache misses, branch misses and many others. Performance monitoring events provide facilities to characterize the interaction between programmed sequences of instructions and microarchitectural sub-systems.		counter	cpu-cycles	[Hardware event]		The types of events are: Hardware Events: These instrument low-level processor activity based on CPU performance counters. For example, CPU cycles, instructions retired, memory stall cycles, level 2 cache misses, etc. Some will be listed as Hardware Cache Events. Software Events: These are low-level events based on kernel counters. For example, CPU migrations, minor faults, major faults, etc. http://www.brendangregg.com/perf.html#Events
				instructions
				cache-references
				cache-misses
				branches
				branch-misses
				bus-cycles
				cpu-clock	[Software event]
				task-clock
				page-faults
				minor-faults
				major-faults
				context-switches
				cpu-migrations
				alignment-faults
				emulation-faults
				L1-dcache-loads	[Hardware cache event]
				L1-dcache-load-misses
				L1-dcache-stores
				L1-dcache-store-misses
				L1-dcache-prefetches
				L1-dcache-prefetch-misses
				L1-icache-loads
				L1-icache-load-misses
				L1-icache-prefetches
				L1-icache-prefetch-misses
				LLC-loads
				LLC-load-misses
				LLC-stores
				LLC-store-misses
				LLC-prefetches
				LLC-prefetch-misses
				dTLB-loads
				dTLB-load-misses
				dTLB-stores
				dTLB-store-misses
				dTLB-prefetches
				dTLB-prefetch-misses
				iTLB-loads
				iTLB-load-misses
				branch-loads
				branch-load-misses
Host	OVS PMD stats	main thread	counter	emc hits	Number of packets hitting Exact Match Cache.		Plugin instance: pmd thread can be different combinations of <numa_id> and <core_id>. e.g: ovs_pmd_stats-pmd_thread_numa_id_0_core_id_5
		pmd thread _ numa_id #value _ core_id #value		megaflow hits	Number of packets hitting Megaflow Cache.
				avg. subtable lookups per hit	Average number of subtable lookups for every hit.
				miss	Number of packets not matching any existing flow.
				lost	Number of packets destined for user space process but subsequently dropped before reaching userspace.
				polling cycles	Number of cycles used for polling packets.
				processing cycles	Number of cycles used for processing incoming packets.
				avg cycles per packet	Average number of cycles per packet.
				avg processing cycles per packet	Average number of processing cycles per packet.
Host	cpufreq		cpufreq		The realtime speed of the CPU reported from: /sys/devices/system/cpu/cpu<id>/cpufreq/scaling_cur_freq
Host	cpusleep		total_time_in_ms		Sleep is calculated in milliseconds = CLOCK_BOOTTIME - CLOCK_MONOTONIC		CLOCK_MONOTONIC: Clock that cannot be set and represents monotonic time since some unspecified starting point. This clock is not affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the clock), but is affected by the incremental adjustments performed by adjtime(3) and NTP. CLOCK_BOOTTIME (since Linux 2.6.39; Linux-specific): Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended. This allows applications to get a suspend-aware monotonic clock without having to deal with the complications of CLOCK_REALTIME, which may have discontinuities if the time is changed using settimeofday(2). Reference: https://linux.die.net/man/2/clock_gettime
Host	Numa	vmpage_action	interleave_hit
			local_node
			numa_foreign
			numa_hit
			numa_miss
			other_node
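
The cpusleep entry in the table above defines sleep time as CLOCK_BOOTTIME - CLOCK_MONOTONIC, expressed in milliseconds. The following is a minimal Python sketch of that arithmetic, not the plugin's actual code. The function names sleep_time_ms and read_sleep_time_ms are hypothetical, and the clock read assumes a Linux host where Python exposes time.CLOCK_BOOTTIME.

```python
import time
from typing import Optional


def sleep_time_ms(boottime_s: float, monotonic_s: float) -> float:
    """Suspended time in milliseconds: CLOCK_BOOTTIME - CLOCK_MONOTONIC."""
    return (boottime_s - monotonic_s) * 1000.0


def read_sleep_time_ms() -> Optional[float]:
    """Read both clocks once; return None where CLOCK_BOOTTIME is unavailable."""
    if not hasattr(time, "CLOCK_BOOTTIME"):
        return None  # CLOCK_BOOTTIME is Linux-specific
    # Read the monotonic clock first so the difference is not skewed negative.
    mono = time.clock_gettime(time.CLOCK_MONOTONIC)
    boot = time.clock_gettime(time.CLOCK_BOOTTIME)
    return sleep_time_ms(boot, mono)


if __name__ == "__main__":
    print(read_sleep_time_ms())
```

On a host that has never been suspended the two clocks advance together, so the result stays close to zero.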

Events

NOTE: Collectd can generate events based on thresholds for any of the metrics reported in the table above. For more info please see: https://collectd.org/documentation/manpages/collectd.conf.5.shtml#threshold_configuration
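
To illustrate how a threshold turns a metric into an event, here is a minimal Python sketch. It is an illustration only, not collectd's implementation: the classify function is hypothetical, its parameter names merely mirror the WarningMin, WarningMax, FailureMin and FailureMax options described at the link above, and other threshold options (such as hysteresis) are not modelled.

```python
from typing import Optional


def classify(value: float,
             warning_min: Optional[float] = None,
             warning_max: Optional[float] = None,
             failure_min: Optional[float] = None,
             failure_max: Optional[float] = None) -> str:
    """Return the notification severity for a single metric value."""

    def outside(low: Optional[float], high: Optional[float]) -> bool:
        # A bound that is None is treated as "not configured".
        return (low is not None and value < low) or \
               (high is not None and value > high)

    # Failure bounds are checked first because they are the more severe case.
    if outside(failure_min, failure_max):
        return "FAILURE"
    if outside(warning_min, warning_max):
        return "WARNING"
    return "OKAY"


if __name__ == "__main__":
    # Example: CPU "percent" values checked against assumed bounds of 80 and 95.
    for sample in (10.0, 85.0, 99.9):
        print(sample, classify(sample, warning_max=80, failure_max=95))
```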

...

The SNMP interface in collectd provides access to collected metrics through the SNMP Agent plugin. This plugin is an AgentX subagent that receives and handles queries from the SNMP master agent and returns the metrics collected by "read" (collector) plugins. The plugin handles requests only for OIDs specified in the configuration file. To handle an SNMP query, the plugin gets the data from collectd and translates the requested values from collectd's internal format to SNMP format. The plugin is generic and cannot work without configuration. For more details on the configuration file, see <https://github.com/collectd/collectd/pull/2105/files#diff-9fc6980794a396e7288e1bd17c59a358>
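
The lookup-and-translate behaviour described above can be sketched in a few lines of Python. This is a conceptual illustration only and does not reflect the plugin's code or configuration syntax: the OID, the OID_TABLE mapping and the handle_get function are all hypothetical, and rounding to an integer stands in for whatever format translation the real plugin performs. The metric identifier reuses the value list example from earlier on this page (host localhost, plugin cpu, plugin instance 0, type percent, type instance user).

```python
from typing import Dict, Optional, Tuple

# (host, plugin, plugin instance, type, type instance)
Identifier = Tuple[str, str, str, str, str]

# Hypothetical mapping; in the real plugin it comes from the configuration file.
OID_TABLE: Dict[str, Identifier] = {
    "1.3.6.1.4.1.99999.1.1": ("localhost", "cpu", "0", "percent", "user"),
}


def handle_get(oid: str, cache: Dict[Identifier, float]) -> Optional[int]:
    """Answer a query for one OID from the metrics already collected."""
    identifier = OID_TABLE.get(oid)
    if identifier is None:
        return None  # OID not configured, so the request is not handled
    value = cache.get(identifier)
    if value is None:
        return None  # no value has been collected for this metric yet
    return int(round(value))  # illustrative translation to an integer


if __name__ == "__main__":
    collected = {("localhost", "cpu", "0", "percent", "user"): 99.8999}
    print(handle_get("1.3.6.1.4.1.99999.1.1", collected))
    print(handle_get("1.3.6.1.4.1.99999.9.9", collected))
```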

perf

perf_cache_references

the count of cache hits by applications running on the platform

Versions Compared

Old Version 75

New Version Current
