This wiki will hold the minutes of discussion topics from the Barometer Weekly Call

DMA Project Proposal

Attached file: DMA Local-agent Introduction.pdf

Service Assurance Project <April 11, 2018>

Attached file: Service Assurance MVP v1.4.pdf

This presentation outlines the Service Assurance (SA) Project, which is currently in development. The SA project will showcase the Barometer container and make the container(s) available.



Providing Sufficient Measurement Context with Results <Jan 30, 2018>

The Barometer Project regards ETSI GS NFV-TST 008 V2.4.1 (2018-01) as defining some of the key metrics and their required Measurement Context.

Measurement Context includes measurement time stamps, measurement scope, and variable parameters (or input factors) required to understand the measurements.

During development of the complementary ETSI Performance Management specification (IFA027), many gaps were identified, but agreement was reached to add text specifying the Measurement Context.

The slide below illustrates the Measurement Context communicated along with measurements, and the difference between Measurement Timestamps, Collection Timestamps, and Reporting Timestamps.

Attached file: Summary Measurement-Collection-PMreporting.pdf
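As a rough illustration of the Measurement Context idea described above, the following Python sketch carries a metric value together with its context and the three kinds of timestamps. All class and field names are hypothetical and are not taken from ETSI GS NFV-TST 008, IFA027, or collectd; the sketch only assumes that a measurement covers an interval and is collected and reported some time afterwards.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class MeasurementContext:
    """Context needed to interpret a measurement (hypothetical field names)."""
    scope: str                     # what was measured, e.g. a host or an interface
    measurement_start: datetime    # measurement timestamp: start of the interval
    measurement_end: datetime      # measurement timestamp: end of the interval
    parameters: dict = field(default_factory=dict)  # variable parameters / input factors


@dataclass
class ReportedMeasurement:
    """A metric value carried together with its context and later timestamps."""
    metric: str
    value: float
    context: MeasurementContext
    collection_time: datetime      # when the collector read the value
    reporting_time: datetime       # when the value was reported onward

    def collection_delay(self) -> timedelta:
        """Time between the end of the measurement and its collection."""
        return self.collection_time - self.context.measurement_end

    def reporting_delay(self) -> timedelta:
        """Time between collection and reporting."""
        return self.reporting_time - self.collection_time
```

Keeping the three timestamps as separate fields lets a consumer tell how stale a value is at each stage, rather than guessing from a single time value.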

It now remains to conduct a gap analysis on the relevant Barometer metrics, to ensure that the Measurement Context is available with the collectd results.

There are three related JIRA Tickets for the Gaps: 

  • BAROMETER-61
  • BAROMETER-89
  • BAROMETER-90

 

DMA Project 

  • Distribute some monitoring and analysis capabilities to the edge
  • Allow faster polling rates locally without creating a bottleneck from transferring large amounts of data to a central site
  • Allow fast remediation of node-local events
  • Project is looking for an upstream community 
  • Would Barometer be a good fit?

 

Attached file: OpenStackSummitSydney_r6 (1).pdf

 

Discussion topics for the “ideal” monitoring agent

  • Polling vs Event capture for the monitoring agent
  • Platform independent monitor agent
  • Network Interfaces
  • Kernel events
  • VM / Container monitoring
  • Common bus for Events / Telemetry / Config
  • Common Object model
  • Agent configuration
  • Performance 
  • <<50ms and other timing requirements

 

Decisions 

 

Polling vs Event capture for the monitoring agent <Feb 07 2017>

 

The scope of polling being discussed is that of the monitoring agent itself (on the node that is being observed). Both should be supported. Collectd is configured to run at a particular interval, by default every 10 seconds. The question is: do you let the read plugins poll for stats and events every time the read interval fires?

A. Both polling and event-driven updates should be supported; which one applies depends on the subsystem being monitored. The default would be to leverage event-based systems where they exist, but polling should be supported as a configuration option that the end user can select.

 

Next, consider the scope from the VIM to the monitoring agent: within this context, should we support polling or event-driven updates?

Fault events should always use a push model, and the mechanism over which events are sent needs to be reliable.

Telemetry can be polled or pushed (it could be polled to spread the load on the collection side).

Network (over)load should be taken into consideration when choosing which model to use (push vs. pull); you don't want to destabilize the network. Push is more scalable overall and is preferred for fault management.
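The decision above (faults pushed as events, telemetry polled on an interval) can be sketched roughly as follows. This is a toy Python model, not collectd code: the class `MonitoringAgent` and its methods are hypothetical names, and the only detail borrowed from the discussion is the 10-second default interval.

```python
import time
from typing import Callable, Dict, List, Tuple


class MonitoringAgent:
    """Toy agent: faults arrive as pushed events, telemetry is polled."""

    def __init__(self, interval_s: float = 10.0) -> None:
        self.interval_s = interval_s                        # read interval in seconds
        self._pollers: Dict[str, Callable[[], float]] = {}  # polled telemetry sources
        self.telemetry: List[Tuple[str, float]] = []        # values gathered by polling
        self.faults: List[Tuple[str, str]] = []             # events received by push

    def register_poller(self, name: str, read: Callable[[], float]) -> None:
        """Register a source that only supports being polled."""
        self._pollers[name] = read

    def on_event(self, name: str, detail: str) -> None:
        """Callback for event-capable subsystems; recorded immediately."""
        self.faults.append((name, detail))

    def poll_once(self) -> None:
        """Read every registered poller once."""
        for name, read in self._pollers.items():
            self.telemetry.append((name, read()))

    def run(self, cycles: int) -> None:
        """Poll for a fixed number of cycles, sleeping between reads."""
        for _ in range(cycles):
            self.poll_once()
            time.sleep(self.interval_s)
```

A subsystem that can emit events would call `on_event` directly, so a fault does not wait for the next read interval; sources without event support are registered with `register_poller` and read each cycle.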

 

Agent configuration  <Feb 14 2017>

Should be able to dynamically:

* Enable, disable, or restart resource monitoring

* Get values/notifications

* Get capabilities

* Get the list of metrics being collected

* Flush the list of metrics

* Set thresholds for resources

* Blacklist resources

* Support some sort of buffering mechanism, which should be configurable

* Get the timing information for the agent and do a timing sync if required
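
One way to picture the dynamic configuration operations listed above is the small Python sketch below. It is not an existing collectd or Barometer interface: the class `AgentConfig` and every method name are hypothetical, "flush" is assumed here to mean draining the buffered values, and restart and timing sync are left out to keep the sketch short.

```python
from typing import Dict, List, Set, Tuple


class AgentConfig:
    """Hypothetical runtime configuration surface for a monitoring agent."""

    def __init__(self, capabilities: List[str], buffer_size: int = 100) -> None:
        self._capabilities = list(capabilities)    # resources the agent can monitor
        self._enabled: Set[str] = set()            # resources currently monitored
        self._blacklist: Set[str] = set()          # resources that must not be monitored
        self._thresholds: Dict[str, float] = {}    # upper limit per resource
        self._buffer: List[Tuple[str, float]] = [] # buffered values awaiting flush
        self.buffer_size = buffer_size             # configurable buffer length

    def get_capabilities(self) -> List[str]:
        return list(self._capabilities)

    def enable(self, resource: str) -> bool:
        """Start monitoring a resource; refuse blacklisted ones."""
        if resource not in self._capabilities:
            raise ValueError(f"unknown resource: {resource}")
        if resource in self._blacklist:
            return False
        self._enabled.add(resource)
        return True

    def disable(self, resource: str) -> None:
        self._enabled.discard(resource)

    def blacklist(self, resource: str) -> None:
        """Stop monitoring a resource and prevent it from being re-enabled."""
        self._blacklist.add(resource)
        self._enabled.discard(resource)

    def list_metrics(self) -> List[str]:
        return sorted(self._enabled)

    def set_threshold(self, resource: str, limit: float) -> None:
        self._thresholds[resource] = limit

    def record(self, resource: str, value: float) -> bool:
        """Buffer a value for a monitored resource; True if it exceeds its threshold."""
        if resource not in self._enabled:
            return False
        self._buffer.append((resource, value))
        if len(self._buffer) > self.buffer_size:
            self._buffer.pop(0)                    # drop the oldest value when full
        limit = self._thresholds.get(resource)
        return limit is not None and value > limit

    def flush(self) -> List[Tuple[str, float]]:
        """Return and clear the buffered values."""
        drained, self._buffer = self._buffer, []
        return drained
```

For example, after `enable("cpu")` and `set_threshold("cpu", 90.0)`, a call to `record("cpu", 95.0)` returns `True`, which a caller could turn into a notification.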