The tables and lists of questions have been created by Sridhar Rao <Sridhar.Rao@spirent.com>
Terminology Definition
Term | What we mean by that? |
Metric | A measurement of a particular characteristic. Examples: percentage of CPU used, amount of bandwidth used, etc. Complete definition can be found here |
Event | A record of something that has happened; a simple immutable fact. Examples: a link has gone down, a packet from a flow is dropped, etc. Complete definition can be found here |
Agent | Software that runs on a node/system that needs to be monitored. |
Client Node | A node that is monitored (Node on which agent runs) |
Server Node | A node that collects metrics and events from the client node. |
Sampling Interval | The time between successive samples, i.e. how frequently metrics are collected and sent. |
Push Mode | Receiving events by subscribing; the source delivers them as they occur. |
Poll Mode | Fetching events by querying the source periodically. |
Writing of Metrics/events | Sending/outputting of metrics or events. |
Reading of Metrics/events | Receiving/reading of measurements. |
Logging of Metrics/events | Logging of a monitored/received metric or event. |
Metric Types (data source types) | Gauge: value stored as-is
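The terminology above defines only the Gauge type, while the parameter table also lists Derive, Counter, and Absolute. As a minimal sketch, assuming these follow the usual collectd/RRDtool data-source semantics (an assumption, not something this page states), the conversion from raw readings to a stored value could look like the following. The function name `to_stored_value` and the `counter_bits` parameter are hypothetical and belong to no agent's API.

```python
def to_stored_value(ds_type, previous, current, interval, counter_bits=32):
    """Convert two raw readings into the value a collector would store.

    Assumed semantics (collectd/RRDtool style, simplified):
      gauge    - the current value is stored as-is
      derive   - rate of change; may be negative
      counter  - rate of change; a decrease is treated as a wraparound
      absolute - the source resets on read, so current / interval is the rate
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if ds_type == "gauge":
        return float(current)
    if ds_type == "absolute":
        return current / interval
    delta = current - previous
    if ds_type == "derive":
        return delta / interval
    if ds_type == "counter":
        if delta < 0:
            # Simplification: assume a single wrap at the given counter width.
            delta += 2 ** counter_bits
        return delta / interval
    raise ValueError(f"unknown data source type: {ds_type}")


# Readings taken 10 seconds apart.
print(to_stored_value("gauge", 10, 42, 10))
print(to_stored_value("derive", 100, 50, 10))
print(to_stored_value("counter", 2**32 - 10, 10, 10))
print(to_stored_value("absolute", 0, 30, 10))
```

The sketch handles a single wrap at a fixed width only; real collectors may distinguish 32-bit from 64-bit counters and counter resets.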
Parameter Table
Parameters\Tools | Collectd | Ceilometer | Monasca | SNAP | node-exporter and other exporters | Sensu | Munin | Telegraf | Nagios | Diamond | Centreon | Icinga | OpenNMS | NSClient++ | Elastic Beats | Riemann | Note: 1. For some parameters the answer could be just YES/NO. 2. For some we may have to provide a description/details. 3. For some we may have to choose from the list [], whereas for others we may append a value to the list. 4. For some parameters, please provide the number of 'actual metrics' provided under that category. For example, collectd would provide 12 metrics for the Processes category. Use NA if not applicable. Use NK if not known. | |
Lowest Sampling Interval | Can go down to nanosecond resolution | |||||||||||||||||
CPU metrics | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time) | idle, system, wait, stolen, user (% & time), util, vcpus | Same as ceilometer or monasca | idle, system, wait, user, nice | ||||||||||||
Disk IO metrics | read and write (bytes, rate, req) | read and write (bytes, rate, req) | read and write (bytes, rate, req) | Same as ceilometer or monasca | read and write (bytes, rate, req) | |||||||||||||
Memory metrics | usage, bandwidth | free, swap, total, used | free, swap, total, used | Same as ceilometer or monasca | free, total, swap, active, dirty, inactive, buffers. | |||||||||||||
Process metrics | IO, SCHED, STATS | btime, ctxt, processes, blocked, running | ||||||||||||||||
Network Interface Metrics | Interface plugin: Standard 4 fields of rx/tx (octets, packets, errors, dropped). Netlink plugin: uses netlink sockets and covers others | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Same as ceilometer or monasca | Rx and Tx. MBs | ||||||||||||
Libvirt Metrics | YES | YES | NO | YES | ||||||||||||||
Container resource usage Monitoring | YES | NO | NO | Docker | Docker | Docker | ||||||||||||
Database support (writing to and monitoring): [InfluxDB, MongoDB, MySQL, PostgreSQL, Carbon (Graphite), Prometheus, RRDCache, Redis, TSDB] | YES for all | MySQL, PostgreSQL, MongoDB - monitoring | InfluxDB, Vertica, MySQL, PostgreSQL, Cassandra - monitoring | Monitoring only | Writing - InfluxDB; Monitoring - All | Monitoring - All | ||||||||||||
Encryption Support | YES | NO | NO | NO | YES | |||||||||||||
Extensibility - multilanguage support [Python, Java, Golang, C/C++, Lua] | YES for all | Java | Java | Java, Python | ||||||||||||||
Interoperability [with other monitoring solutions] | Sensu, StatsD, Telegraf? | Nagios, Zabbix | Ceilometer | Collectd | Nagios | |||||||||||||
Write to Message Queues and protocols (AMQP, Kafka, MQTT, NSQ) | YES for all | AMQP | Kafka | NO | Kafka, MQTT, NSQ | |||||||||||||
Metrics Pub/sub Mode Support | YES | YES | YES | |||||||||||||||
Metrics Req/Resp Mode Support | ||||||||||||||||||
Support for Events (polling, Pushing) | Yes | |||||||||||||||||
Notification Support | YES | YES | YES | NO | ||||||||||||||
Logging Support | YES | YES | YES | YES | ||||||||||||||
Hypervisor metrics | YES | YES | ||||||||||||||||
Log-File Analysis | YES | NO | NO | |||||||||||||||
Other Writing Support: [CSV, HTTP, RRD, UnixSocket] | ALL that are listed. | |||||||||||||||||
Transport Protocol | Depends on the end point it's communicating with | TCP, UDP | ||||||||||||||||
Data-Format [XML, JSON, etc] | JSON, Custom, XML | JSON, XML | JSON | Custom | ||||||||||||||
Data-model | Custom | KVP | KVP | Custom | ||||||||||||||
Hardware: IPMI, Battery, Sensors, | YES for all | IPMI | IPMI | |||||||||||||||
Metric Types: Gauge, Derive, Counter, Absolute | YES for all | Gauge, cumulative, delta | ||||||||||||||||
Language (written) | C | Python | Python | Go | ||||||||||||||
Last-Updated | 2017 | 2017 | 2017 | |||||||||||||||
Commercial Versions? | NO | ? | No | |||||||||||||||
Resource consumption by the agent | ||||||||||||||||||
License | MIT/GPL v2 or later | Apache License, Version 2.0 | Apache License, Version 2.0 | |||||||||||||||
Webserver monitoring [Nginx, Apache] | YES for all | Apache | Apache | Nginx, Apache, Passenger, Varnish | ||||||||||||||
Platforms - OS? | Supports Windows, Linux, FreeBSD... | Linux | Linux | |||||||||||||||
Configuration Tool support [Puppet, Chef, Ansible, Salt] | YES for all | Puppet, Chef | ||||||||||||||||
Server-mode support? | YES | |||||||||||||||||
Other Services Support | statsd, webhooks |
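The push and poll modes defined in the terminology, which the "Metrics Pub/sub Mode Support" and "Support for Events (polling, Pushing)" rows ask about, can be illustrated with a small sketch. The `EventSource` class below is hypothetical and stands in for an agent on a client node; it is not the API of any tool in the table.

```python
from typing import Callable, List


class EventSource:
    """Hypothetical event source on a client node (illustration only)."""

    def __init__(self) -> None:
        self._pending: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Push mode: the server registers once and is called on every event.
        self._subscribers.append(callback)

    def emit(self, event: str) -> None:
        # Keep the event for pollers and deliver it to subscribers right away.
        self._pending.append(event)
        for callback in self._subscribers:
            callback(event)

    def poll(self) -> List[str]:
        # Poll mode: the server asks for everything recorded since its last poll.
        events, self._pending = self._pending, []
        return events


source = EventSource()

pushed: List[str] = []
source.subscribe(pushed.append)   # push-mode consumer

source.emit("link down")
source.emit("packet dropped")

polled = source.poll()            # poll-mode consumer
polled_again = source.poll()      # nothing new since the previous poll
```

In push mode the consumer sees each event as soon as it is emitted, whereas in poll mode the delay is bounded by how often the consumer polls.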
Inference Questions
The Questions | The Answer |
Lowest Interval: Which agent supports the lowest sampling interval, and what is the value? | |
Interoperability: Which agent is 'most interoperable'? (Works with the maximum number of 'servers', i.e. collection nodes) | |
Large-scale deployment: Which agent is ideal for large-scale monitoring (Provide description in a separate page, if needed) | |
Low-footprint: Which agent has the lowest footprint (memory and CPU)? | |
Metrics: Which agent supports the maximum number of metrics? | |
Gaps: Are there any metrics that are relevant to NFV but not supported by any of the agents? | |
Which agent is ideal for real-time analytics? [Support for the maximum number of scalable datastores, visualization tools, and analytics engines] | |
Have any of the agents been used in large-scale real-world deployments? If so, please provide details on the performance. | |
Which agent has the fewest/most dependencies (libraries, OS/kernel versions, etc.)? | |
Which agent provides maximum 'freedom' w.r.t. Licenses (core agent + plugins)? | |
Which agent is best for the following datastores: Influxdb, Graphite, ElasticSearch? | |
Which agents support dynamic configuration? | |