...
There are numerous open-source monitoring solutions available, with varying approaches and architectures. In this study, we compare only the 'agent' component of each monitoring solution and do not consider the server-side component(s), because the 'server' can be implemented in many ways. With collectd, for example, it could be a simple collectd-web, a time-series database such as InfluxDB, a telemetry system based on Apache Kafka, etc., and considering all the options would be extremely difficult. Typically, the server-side components include some or all of the following: (a) Metric collection infrastructure: raw-metric receiver, message queues, etc. (b) Metric modifier: add context, perform aggregation, filter, etc. (c) Storage solution. (d) Alarm/alerting system. (e) Visualization/graphing. (f) Publishing.
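To make the split between the agent and the server side more concrete, the following minimal sketch chains stages (a) to (d) from the list above as plain Python functions. It is only an illustration under stated assumptions: the `Sample` shape, the `host name value` line format, the function names, and the threshold are all hypothetical and do not correspond to any of the tools compared in this study.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Sample:
    """One metric sample as an agent might send it (hypothetical shape)."""
    host: str
    name: str
    value: float
    tags: dict = field(default_factory=dict)


def receive(raw_lines):
    """(a) Collection: parse assumed 'host name value' lines into samples."""
    for line in raw_lines:
        host, name, value = line.split()
        yield Sample(host, name, float(value))


def modify(samples, context):
    """(b) Modifier: filter, aggregate per (host, name), and add context."""
    grouped = defaultdict(list)
    for s in samples:
        if s.value < 0:  # filter: drop obviously invalid readings
            continue
        grouped[(s.host, s.name)].append(s.value)
    for (host, name), values in grouped.items():
        # aggregate: average all readings of one series, then tag the result
        yield Sample(host, name, mean(values), dict(context))


def store(samples, storage):
    """(c) Storage: keep the latest aggregated sample per series in a dict."""
    for s in samples:
        storage[(s.host, s.name)] = s
    return storage


def alerts(storage, threshold):
    """(d) Alerting: return the series whose value exceeds the threshold."""
    return sorted(key for key, s in storage.items() if s.value > threshold)


# Lines standing in for what agents on two client nodes would send.
raw = [
    "node1 cpu.util 80",
    "node1 cpu.util 100",
    "node2 cpu.util 20",
    "node2 cpu.util -1",
]
db = store(modify(receive(raw), {"site": "lab"}), {})
print(db[("node1", "cpu.util")].value)
print(alerts(db, threshold=85))
```

Everything before `receive` is the agent's responsibility, which is the part this study compares; in a real deployment each of the functions above would typically be a separate service such as a message queue, a time-series database, or an alerting engine.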
Terminology Definition
Term | What we mean by it |
Metric | A measurement of a particular characteristic. Examples: percentage of CPU used, amount of bandwidth used, etc. Complete definition can be found here |
Event | A record of something that has happened: a simple immutable fact. Examples: a link has gone down, a packet from a flow is dropped, etc. Complete definition can be found here |
Agent | Software that runs on a node/system that needs to be monitored. |
Client Node | A node that is monitored (Node on which agent runs) |
Server Node | A node that collects metrics and events from the client node. |
Sampling Interval | How frequently the metrics are sent. |
Push Mode | Fetching of events by subscribing |
Poll Mode | Fetching of events via polling. |
Writing of Metrics/events | Sending/outputting of metrics or events. |
Reading of Metrics/events | Receiving/reading of measurements. |
Logging of Metrics/events | Logging of a monitored/received metric or event. |
Metric Types (data source types) | Gauge: Value stored as-is |
...
Parameters\Tools | Collectd | Ceilometer Polling agent | Monasca | SNAP | node-exporter and other exporters | sensu | munin | telegraf | nagios | diamond | centreon | icinga | OpenNMS | NSClient++ | Elastic Beats | Riemann | Note: 1. For some parameters, the answer can be just YES/NO. 2. For some, we may have to provide a description/details. 3. For some, we may have to choose from the list [], whereas for others we may append a value to the list. 4. For some parameters, please provide the number of 'actual metrics' provided under that category. For example, collectd would provide 12 metrics for the Processes category. Use NA if not applicable. Use NK if not known. | |
Lowest Sampling Interval - (for transmitting over network) | can go down to a nanosecond resolution (1-sec) | |||||||||||||||||
CPU metrics | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time), util, vcpus | idle, system, wait, stolen, user (% & time) | idle, system, wait, stolen, user (% & time), util, vcpus | Same as ceilometer or monasca | idle, system, wait, user, nice | ||||||||||||
Disk IO metrics | Read and write (bytes, rate, time, sectors) | read and write (bytes, rate, req) | read and write (bytes, rate, req) | read and write (bytes, rate, req) | Same as ceilometer or monasca | read and write (bytes, rate, req) | ||||||||||||
Memory metrics | usage, bandwidth | free, swap, total, used | free, swap, total, used | Same as ceilometer or monasca | free, total, swap, active, dirty, inactive, buffers. | |||||||||||||
Process metrics | I/O, Sched, Statsmemory, CPU-Usage, count. | NO | NO | Same as collectd. | btime, ctxt, processes, blocked, running | |||||||||||||
Network Interface Metrics | Interface plugin: Standard 4 fields of rx/tx (octets, packets, errors, dropped). Netlink plugin: uses netlink sockets and covers others | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Standard 4 fields of rx/tx (octets, packets, errors, dropped). | Same as ceilometer or monasca | Rx and Tx. MBs | ||||||||||||
Libvirt Metrics | YES - | YES | YES | YES | NO | YES | ||||||||||||
Container resource usage Monitoring | YES | NO | NO | Docker | Docker | Docker | ||||||||||||
Databases Support - Writing to and Monitoring : [Influxdb, MongoDb, MySql, PostgreSql, Carbon(graphite), Prometheus, RRDCache, Redis, TSDB] | YES for all | MySql, PostgreSql, MongoDb - monitoring | Influxdb, Vertica, MySql, PostgreSql, Cassandra - monitoring | Monitoring only | Writing - Influxdb; Monitoring - All | Monitoring - All | ||||||||||||
Encryption Support | YES | NO | NO | NO | YES | |||||||||||||
Extensibility - multilanguage support [Python, Java, Golang, C/C++, Lua] | YES for all | Java | Java | Java, Python | ||||||||||||||
Interoperability [with other monitoring solutions] | Sensu, statsd, telegraf? | Nagios, Zabbix | ceilometer | Collectd | Nagios | |||||||||||||
Write to Message Queues and protocols (AMQP, Kafka, MQTT, NSQ) | YES for ALL | AMQP | Kafka | NO | kafka, MQTT, NSQ | |||||||||||||
Metrics Pub/sub Mode Support | YES | YES | YES | |||||||||||||||
Metrics Req/Resp Mode Support | NO | NO | NO | |||||||||||||||
Support for Events (polling, Pushing) | Yes | NO (1) | NO (1) | |||||||||||||||
Notification Support | YES | NO (1) | NO (1) | NO (1) | ||||||||||||||
Logging Support | YES | YES | YES | YES | ||||||||||||||
Hypervisor metrics | YES | YES | ||||||||||||||||
Log-File Analysis | YES | NO | NO | |||||||||||||||
Other Writing Support: [CSV, HTTP, RRD, UnixSocket] | ALL that are listed. | |||||||||||||||||
Transport Protocol | Depends on the end point it's communicating with | TCP, UDP | ||||||||||||||||
Data-Format [XML, JSON, etc] | JSON, Custom, XML | JSON, XML | JSON | Custom | ||||||||||||||
Data-model | Custom | KVP | KVP | Custom | ||||||||||||||
Hardware: IPMI, Battery, Sensors, | YES for all | IPMI | IPMI | |||||||||||||||
Metric Types: Gauge, Derive, Counter, Absolute | YES for all | Gauge, cumulative, delta | ||||||||||||||||
Language (written) | C | Python | Python | Go | ||||||||||||||
Last-Updated | 2017 | 2017 | 2017 | |||||||||||||||
Commercial Versions? | NO | ? | No | |||||||||||||||
Resource consumption by the agent | Binary: 617 KB | |||||||||||||||||
License | MIT/GPL v2 or later | Apache License, Version 2.0 | Apache License, Version 2.0 | |||||||||||||||
Webserver monitoring [Nginx, Apache] | YES for all | Apache | Apache | Nginx, Apache, Passenger, Varnish | ||||||||||||||
Platforms - OS? | Supports windows, linux, freebsd... | Linux | Linux | |||||||||||||||
Configuration Tool support [Puppet, Chef, Ansible, Salt] | YES for all | Puppet, Chef | ||||||||||||||||
Other Services Support | statsd, webhooks |
...