Anuket Project

2021-07-16 AI/ML for NFV Meeting Minutes


Attendees

Sridhar Rao

Rohit Singh Rathaur

Girish

Emma Foley


Sl. No. | Topic | Presenter | Notes
1. Thoth as formal project in Anuket

Will be presented at the next TSC meeting as a project creation request, mainly to capture all the work done here.

An email has been sent asking whether anyone is interested in leading this project.

In Anuket, it is OK for one person to be PTL of more than one project.

2. EUAG White Paper

https://wiki.lfnetworking.org/pages/viewpage.action?pageId=56067017

  1. Key Takeaways: There is too much emphasis on data ("Data First"), i.e. an "I have this data, what can I do with it?" approach. We should instead come up with the "Decisions" that telcos are interested in.
  2. The Problem Statement section includes an assumption:
    1. "it is not a competitive advantage to have access to better AI tools" - Tools are not what matters; it is the models, and models definitely can give a competitive advantage.
  3. Need to be careful about how data and results are shared across competing companies - sharing with open-source projects may be easier than sharing with "competing companies".
    1. "Open-source Licensing Model" for data - Ex: Any model built using (learned from) the data should itself be an open tool.

Where Thoth can contribute to EUAG:

  1. Data Model - For each problem statement ("Decision"), we can propose a data model.
3. Summary of "Gaps" in Existing Failure Prediction Work

https://drive.google.com/file/d/1FdhT4d8QHQR7OqfXhx3UqviGDYW5nfBQ/view?usp=sharing

Summarize and share your comments here: Failure Prediction using AI/ML in NFV Environments

4. Model Enhancement - Options

Both Girish and Rohit have shared their proposals.

Action: Sridhar to review.

Ex: VM Prediction:

  1. VM failure events + infrastructure (platform) metrics + VM-specific (virtual-infrastructure) metrics that are external to the VM - sources are different.
  2. VM failure events + resource-consumption (application) metrics that are internal to the VM - sources are the same.

Hypothesis: cAdvisor (CMN) metrics = collectd (CMN) metrics from the container.
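One way the hypothesis above could be checked is sketched below. It is only an illustration under stated assumptions: each source is assumed to yield a list of (timestamp in seconds, value) samples for the same metric and container, the two sources are assumed to sample at slightly different instants, and the function names, the 5-second alignment tolerance and the 5% threshold are all hypothetical choices, not something agreed in the meeting.

```python
from statistics import mean


def align(series_a, series_b, tolerance_s=5):
    """Pair each sample of series_a with the nearest-in-time sample of series_b.

    Both inputs are lists of (timestamp_seconds, value). Samples of series_a
    with no series_b sample within tolerance_s seconds are dropped.
    """
    b = sorted(series_b)
    pairs = []
    j = 0
    for t_a, v_a in sorted(series_a):
        # Move forward in b while the next sample is at least as close to t_a.
        while j + 1 < len(b) and abs(b[j + 1][0] - t_a) <= abs(b[j][0] - t_a):
            j += 1
        if b and abs(b[j][0] - t_a) <= tolerance_s:
            pairs.append((v_a, b[j][1]))
    return pairs


def mean_relative_difference(pairs):
    """Mean of |a - b| / max(|a|, |b|), ignoring pairs where both values are zero."""
    diffs = [abs(a - b) / max(abs(a), abs(b)) for a, b in pairs if a or b]
    return mean(diffs) if diffs else 0.0


def metrics_agree(series_a, series_b, tolerance_s=5, threshold=0.05):
    """True if the aligned samples differ by at most `threshold` on average."""
    pairs = align(series_a, series_b, tolerance_s)
    return bool(pairs) and mean_relative_difference(pairs) <= threshold


# Made-up sample values, for illustration only.
cadvisor_cpu = [(0, 10.0), (10, 20.0), (20, 30.0)]
collectd_cpu = [(1, 10.2), (11, 19.8), (21, 30.3)]
print(metrics_agree(cadvisor_cpu, collectd_cpu))
```

If the two sources agree within the chosen threshold for each metric of interest, either could be used as the container-level data source; if not, the units and sampling semantics of each metric would need a closer look before the data is combined.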

5. Data Status
  1. EUAG meeting is yet to happen.
  2. Request to LF-IT has been sent - waiting for a response.
  3. Work with Pod18 has not yet started. Barometer includes Ansible playbooks to deploy collectd+ on K8s.
    1. https://github.com/opnfv/barometer/blob/master/docs/release/userguide/installguide.docker.rst
    2. https://github.com/opnfv/barometer/blob/master/docs/release/userguide/installguide.oneclick.rst
6. Project-1 (AlgoSelector)
Still looking for contributors.
7. Project-2 (FailureGen)

Found a contributor, who has already started the work.

  1. Time-varying, load-varying stress-ng
  2. Listing actions in a Linux system that can cause failures
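As a rough illustration of the first item, the sketch below builds a sequence of stress-ng invocations whose CPU load changes over time. It assumes the CPU stressor is the one of interest and relies only on the stress-ng options `--cpu`, `--cpu-load` and `--timeout`; the function names and the ramp-shaped profile are hypothetical and do not describe the actual FailureGen design.

```python
import subprocess


def ramp_profile(low, high, steps):
    """Return CPU load percentages rising from low to high, then back down."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    stride = (high - low) / (steps - 1)
    up = [round(low + i * stride) for i in range(steps)]
    # Mirror the rising part, without repeating the peak value.
    return up + up[-2::-1]


def build_schedule(loads, step_seconds, workers=1):
    """Build one stress-ng command (argv list) per load level."""
    return [
        ["stress-ng", "--cpu", str(workers),
         "--cpu-load", str(load),
         "--timeout", f"{step_seconds}s"]
        for load in loads
    ]


def run_schedule(schedule):
    """Run the commands one after another; requires stress-ng to be installed."""
    for argv in schedule:
        subprocess.run(argv, check=True)


# Print the planned commands without executing them.
for argv in build_schedule(ramp_profile(20, 80, 4), step_seconds=30, workers=2):
    print(" ".join(argv))
```

Building the schedule separately from running it makes it possible to log exactly which load was applied in which interval, which helps when the generated load later has to be mapped to collected metrics.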

8. Failure Prediction Definition - Status

(mapping Failures to Data)

  • Rohit: Node and VM
  • Girish: Container and app

https://docs.google.com/spreadsheets/d/1N9LKZjx117zQHJSLcCFK8dwiOpswWyhZECaNNS6NKHo/edit?usp=sharing

Update this page if any change to the data model is required: Failure Prediction using AI/ML in NFV Environments