This wiki holds the minutes of discussion topics from the Barometer Weekly Call.
Discussion topics for the “ideal” monitoring agent
- Polling vs Event capture for the monitoring agent
- Platform independent monitor agent
- Network Interfaces
- Kernel events
- VM / Container monitoring
- Common bus for Events / Telemetry / Config
- Common Object model
- Agent configuration
- Performance
- <<50ms and other timing requirements
Decisions
Polling vs Event capture for the monitoring agent <Feb 07 2017>
The scope of polling being discussed is that of the monitoring agent itself (on the node that is being observed). Collectd is configured to run at a particular interval, by default every 10 seconds. The question is: do you let the read plugins poll for stats and events every time the read interval fires?
A. Both polling and event-driven updates should be supported; which one applies depends on the subsystem being monitored. The default would be to leverage event-based systems where they exist, but polling should be supported as a configuration option that the end user can select.
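As an illustration of decision A, here is a minimal sketch of how an agent might choose between the two models per subsystem. The `Subsystem` class, its fields, and the `effective_mode` and `poll_once` helpers are hypothetical names made up for this example; they are not part of collectd or any Barometer code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class Subsystem:
    """Hypothetical description of one monitored subsystem."""
    name: str
    read: Callable[[], float]        # callback used when polling
    supports_events: bool = False    # True if the subsystem can notify the agent
    mode: Optional[str] = None       # user override: "poll", "event", or None


def effective_mode(sub: Subsystem) -> str:
    """Default to events where they exist; honour an explicit user choice."""
    if sub.mode not in (None, "poll", "event"):
        raise ValueError(f"unknown mode {sub.mode!r} for {sub.name}")
    if sub.mode == "event" and not sub.supports_events:
        raise ValueError(f"{sub.name} has no event source")
    if sub.mode is not None:
        return sub.mode
    return "event" if sub.supports_events else "poll"


def poll_once(subsystems: Iterable[Subsystem]) -> Dict[str, float]:
    """Run when the read interval fires; event-driven subsystems are skipped."""
    return {s.name: s.read() for s in subsystems if effective_mode(s) == "poll"}
```

With this shape, a subsystem that can emit events is left alone by the read interval unless the user sets `mode="poll"` for it.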
If we consider the scope from the VIM to the monitoring agent, should polling or event-driven updates be supported within this context?
Fault events should always use a push model, and the mechanism over which events are sent needs to be reliable.
Telemetry can be polled or pushed (it could be polled to spread the load on the collection side).
Network (over)load should be taken into consideration when choosing which model to use (push vs. pull); you don't want to destabilize the network. Push is more scalable overall and is preferred for fault management.
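The requirement that fault events are pushed over a reliable mechanism could be sketched as follows. This is only an illustration under stated assumptions: `ReliableEventPusher` is a hypothetical class, and the `send` callable stands in for whatever transport is eventually chosen; it is assumed to return True only once the collector has acknowledged the event.

```python
from collections import deque
from typing import Callable, Deque


class ReliableEventPusher:
    """Hypothetical push-side buffer: events stay queued until acknowledged."""

    def __init__(self, send: Callable[[dict], bool]) -> None:
        self._send = send                      # assumed: True means acknowledged
        self._pending: Deque[dict] = deque()   # unacknowledged events, in order

    def push(self, event: dict) -> int:
        """Queue a fault event and try to deliver everything pending."""
        self._pending.append(event)
        return self.flush()

    def flush(self) -> int:
        """Deliver in order; stop at the first failure. Returns events left."""
        while self._pending:
            if not self._send(self._pending[0]):
                break                          # keep the event for a later retry
            self._pending.popleft()
        return len(self._pending)
```

Calling `flush()` again later (for example from a timer) retries whatever is still pending, so a temporary collector or network outage delays events rather than dropping them. A real agent would also need to bound the queue and back off between retries to avoid adding to network load.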