The tables and lists of questions have been created by Sridhar Rao <Sridhar.Rao@spirent.com>
Terminology Definition
Term | What we mean by it |
Metric | A measurement of a particular characteristic. Examples: percentage of CPU used, amount of bandwidth used. Complete definition can be found here |
Event | A record of something that has happened: a simple, immutable fact. Examples: a link has gone down, a packet from a flow has been dropped. Complete definition can be found here |
Agent | Software that runs on a node/system that needs to be monitored. |
Client Node | A node that is monitored (the node on which the agent runs). |
Server Node | A node that collects metrics and events from the client node. |
Sampling Interval | How frequently the metrics are sent. |
Push Mode | Receiving events by subscribing to them. |
Poll Mode | Fetching events by polling. |
Writing of Metrics/events | Sending/outputting of metrics or events. |
Reading of Metrics/events | Receiving/reading of measurements. |
Logging of Metrics/events | Logging of a monitored/received metric or event. |
Metric Types (data source types) | Gauge: value stored as-is |
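The metric types named in the last row (listed in the parameter table as Gauge, Derive, Counter, and Absolute) differ in how a raw reading becomes a stored value. The sketch below shows one common interpretation, modelled on collectd's data source types. The `Sample` class, the `to_stored_value` function, and the `counter_bits` parameter are hypothetical names introduced for illustration, and the wrap-around handling is a simplifying assumption, not any agent's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One raw reading taken by an agent (hypothetical helper type)."""
    time: float   # timestamp in seconds
    value: float  # raw value as read from the system


def to_stored_value(ds_type: str, prev: Optional[Sample], curr: Sample,
                    counter_bits: int = 32) -> float:
    """Convert a raw reading into the value that would be stored."""
    ds_type = ds_type.lower()

    # Gauge: the value is stored as-is, no previous sample needed.
    if ds_type == "gauge":
        return curr.value

    if prev is None:
        raise ValueError(f"{ds_type} needs a previous sample")
    interval = curr.time - prev.time
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")

    # Derive: rate of change between two readings; may be negative.
    if ds_type == "derive":
        return (curr.value - prev.value) / interval

    # Counter: like derive, but a decrease is treated as a wrap-around
    # of a fixed-width counter (assumed width: counter_bits).
    if ds_type == "counter":
        delta = curr.value - prev.value
        if delta < 0:
            delta += 2 ** counter_bits
        return delta / interval

    # Absolute: the source resets to zero on every read, so the current
    # value alone is divided by the elapsed time.
    if ds_type == "absolute":
        return curr.value / interval

    raise ValueError(f"unknown metric type: {ds_type}")


# Usage with made-up readings taken 10 seconds apart.
before = Sample(time=0.0, value=100.0)
after = Sample(time=10.0, value=150.0)
gauge_value = to_stored_value("gauge", None, after)
derive_rate = to_stored_value("derive", before, after)
```

With the sampling interval from the table above set to 10 seconds, the same pair of readings yields a stored value of 150.0 as a gauge and a rate of 5.0 per second as a derive.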
Parameter Table
Parameters\Tools | Collectd | Ceilometer | Monasca | statsd | node-exporter | sensu | munin | telegraf | nagios | diamond | centreon | icinga | OpenNMS | NSClient++ | Elastic Beats | Riemann | Note: 1. For some parameters, the answer can be just YES/NO. 2. For others, a description or details may be needed. 3. For some, choose from the list in [ ]; for others, a value may be appended to the list. 4. For some parameters, please provide the number of 'actual metrics' provided under that category. For example, collectd would provide 12 metrics for the Processes category. Use NA if not applicable. Use NK if not known. | |
Lowest Sampling Interval | 1 sec | |||||||||||||||||
CPU metrics | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time) | Same as ceilometer or monasca | |||||||||||||||
Disk IO metrics | read and write (bytes, rate, req) | read and write (bytes, rate, req) | Same as ceilometer or monasca | |||||||||||||||
Memory metrics | usage, bandwidth | free, swap, total, used | Same as ceilometer or monasca | |||||||||||||||
Process metrics | Same as ceilometer or monasca | |||||||||||||||||
Network Interface Metrics | Interface plugin: Standard 4 fields of rx/tx (octets, packets, errors, dropped). Netlink plugin: uses netlink sockets and covers others | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Same as ceilometer or monasca | ||||||||||||||
Libvirt Metrics | YES - | NO | ||||||||||||||||
Container resource usage Monitoring | YES | NO | NO | Docker | ||||||||||||||
Writing to Databases Support: [InfluxDB, MongoDB, MySQL, PostgreSQL, Carbon (Graphite), Prometheus, RRDCache, Redis, TSDB] | YES for all | MySQL, PostgreSQL, MongoDB | InfluxDB, Vertica, MySQL, PostgreSQL, Cassandra | YES for all | ||||||||||||||
Encryption Support | YES | NO | NO | |||||||||||||||
Extensibility - multilanguage support [Python, Java, Golang, C/C++, Lua] | YES for all | Java | Java | |||||||||||||||
Interoperability [with other monitoring solutions] | Sensu, statsd, telegraf? | Nagios, Zabbix | ceilometer | |||||||||||||||
Write to Message Queues and Protocols [AMQP, Kafka, MQTT, NSQ] | YES for all | AMQP | Kafka | Kafka, MQTT, NSQ | ||||||||||||||
Metrics Pub/sub Mode Support | YES | YES | YES | |||||||||||||||
Metrics Req/Resp Mode Support | ||||||||||||||||||
Support for Events (Polling, Pushing) | ||||||||||||||||||
Notification Support | YES | YES | YES | |||||||||||||||
Logging Support | YES | YES | YES | |||||||||||||||
Hypervisor metrics | YES | |||||||||||||||||
Log-File Analysis | YES | NO | NO | |||||||||||||||
Other Writing Support: [CSV, HTTP, RRD, UnixSocket] | ||||||||||||||||||
Transport Protocol | ||||||||||||||||||
Data-Format [XML, JSON, etc.] | JSON, Custom, XML | JSON, XML | JSON | |||||||||||||||
Data-model | Custom | KVP | KVP | |||||||||||||||
Hardware: [IPMI, Battery, Sensors] | YES for all | IPMI | IPMI | |||||||||||||||
Metric Types: [Gauge, Derive, Counter, Absolute] | YES for all | Gauge, Cumulative, Delta | ||||||||||||||||
Language (written) | C | Python | Python | Go | ||||||||||||||
Last-Updated | 2017 | 2017 | ||||||||||||||||
Commercial Versions? | NO | ? | No | |||||||||||||||
Resource consumption by the agent | ||||||||||||||||||
License | Apache License, Version 2.0 | Apache License, Version 2.0 | ||||||||||||||||
Webserver monitoring [Nginx, Apache] | YES for all | Apache | Apache | |||||||||||||||
Platforms - OS? | Linux | Linux | ||||||||||||||||
Configuration Tool support [Puppet, Chef, Ansible, Salt] | YES for all | Puppet, Chef | ||||||||||||||||
Server-mode support? | YES | |||||||||||||||||
Other Services Support | statsd, webhooks |
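The pub/sub and event rows in the table above rely on the push and poll modes defined in the terminology table. As a minimal in-process sketch of the difference, the example below delivers each event to subscribers as soon as it is recorded (push) and also keeps events queued until a collector asks for them (poll). The `ClientNode` class and its methods are hypothetical and do not correspond to the API of any tool listed here.

```python
from collections import deque
from typing import Callable, Deque, List


class ClientNode:
    """Hypothetical monitored node that records events."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Push mode: register a callback invoked for every new event."""
        self._subscribers.append(callback)

    def poll(self) -> List[str]:
        """Poll mode: return and clear the events recorded since the last poll."""
        events = list(self._queue)
        self._queue.clear()
        return events

    def record_event(self, event: str) -> None:
        """Record an event: queue it for pollers and push it to subscribers."""
        self._queue.append(event)
        for callback in self._subscribers:
            callback(event)


# Usage: a server node could consume the same events in either mode.
node = ClientNode()
pushed: List[str] = []
node.subscribe(pushed.append)        # push: delivered as events occur

node.record_event("link down")
node.record_event("packet dropped")

polled = node.poll()                 # poll: fetched on the collector's schedule
polled_again = node.poll()           # nothing new since the previous poll
```

In push mode the latency is bounded by delivery time, whereas in poll mode it is bounded by the polling period, which is one reason the sampling interval and the delivery mode are listed as separate parameters.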
Inference Questions
The Questions | The Answer |
Lowest Interval: Which agent supports the lowest sampling interval, and what is the value? | |
Interoperability: Which agent is the 'most interoperable' (works with the largest number of 'servers', i.e. collection nodes)? | |
Large-scale deployment: Which agent is ideal for large-scale monitoring? (Provide a description on a separate page, if needed.) | |
Low-footprint: Which agent has the lowest footprint (memory and CPU)? | |
Metrics: Which agent supports the maximum number of metrics? | |
Gaps: Are there any metrics relevant to NFV that are not supported by any of the agents? | |
Which agent is ideal for real-time analytics (support for the largest number of scalable datastores, visualization tools, and analytics engines)? | |
Have any of the agents been used in large-scale real-world deployments? If so, please provide details on the performance. | |
Which agent has the fewest/most dependencies (libraries, OS/kernel versions, etc.)? | |
Which agent provides the maximum 'freedom' with respect to licenses (core agent + plugins)? | |
Which agent is best for the following datastores: InfluxDB, Graphite, Elasticsearch? | |
Which agents support dynamic configuration? | |
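Once the parameter table is filled in, some of these questions can be answered mechanically. The sketch below handles the first one (lowest sampling interval) and skips entries marked NA or NK, following the table's convention. The function name `lowest_sampling_interval` is hypothetical, and the agent names and interval values in the example are placeholders, not measured data.

```python
from typing import Dict, Optional, Tuple, Union

Interval = Union[float, str]  # seconds, or the markers "NA" / "NK"


def lowest_sampling_interval(
    intervals: Dict[str, Interval],
) -> Optional[Tuple[str, float]]:
    """Return (agent, seconds) for the smallest known interval, or None."""
    known = {
        agent: float(value)
        for agent, value in intervals.items()
        if isinstance(value, (int, float))  # ignores "NA" and "NK"
    }
    if not known:
        return None
    best = min(known, key=known.get)
    return best, known[best]


# Placeholder entries for illustration only.
example = {
    "agent_a": 10.0,
    "agent_b": 1.0,
    "agent_c": "NK",
    "agent_d": "NA",
}
answer = lowest_sampling_interval(example)
```

Returning `None` when every entry is NA or NK keeps an unfilled row from being mistaken for an answer.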