2021-07-16 AI/ML for NFV Meeting Minutes

Anuket Project

 

Attendees

@Sridhar Rao

@Rohit Singh Rathaur

@Girish

@Emma Foley

 

Sl. No.

Topic

Presenter

Notes

1

Thoth as formal project in Anuket

 

Will be presented at the next TSC meeting as a project creation request, mainly to capture all the work done here.

A mail has been sent asking for any interest in leading this project.

It is OK in Anuket for a person to be PTL of more than one project.

2

EUAG White-Paper

 

https://wiki.lfnetworking.org/pages/viewpage.action?pageId=56067017

  1. Key Takeaways: There is too much emphasis on data, i.e. a data-first ("I have this data, what can I do with it?") approach. We should come up with the "Decisions" that Telcos are interested in.

  2. The Problem Statement section includes an assumption:

    1. "it is not a competitive advantage to have access to better AI tools" - Tools are not important; it's the models, and models can definitely give a competitive advantage.

  3. Need to be careful about how to share data and results across competing companies - Sharing with open-source projects may be easier than sharing with "competing companies".

    1. "Opensource Licensing Model" for Data - Ex: Any model built using (learning from) the data should be an OPEN tool.

Where Thoth can contribute to EUAG:

  1. Data Model - For each problem statement ("Decision"), we can propose a data model.

3

Summary of "Gaps" in Existing Failure prediction works

@Rohit Singh Rathaur

@Girish

https://drive.google.com/file/d/1FdhT4d8QHQR7OqfXhx3UqviGDYW5nfBQ/view?usp=sharing

Summarize and share your comments here: Failure Prediction using AI/ML in NFV Environments

4

Model Enhancement  - Options

@Rohit Singh Rathaur

@Girish

Both Girish and Rohit have shared their proposals.

Action: Sridhar to review.

Ex: VM Prediction:

  1. VM's Failure Event + Infrastructure (platform) metrics + VM-specific (virtual-infrastructure) metrics that are external to the VM - Sources are different.

  2. VM's Failure Event + Resource-Consumption (application) metrics that are internal to the VM - Sources are the same.

Hypothesis: cAdvisor (CMN) metrics = collectd (CMN) metrics from the container.
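One way the hypothesis above could be checked is to collect the same metric from both tools for the same container and compare the samples that share a timestamp. The following is only a minimal sketch under stated assumptions: the function names, the `{timestamp: value}` input layout, the 5% tolerance, and the sample values are all hypothetical and are not taken from cAdvisor, collectd, or the proposals discussed in the meeting.

```python
from statistics import mean


def align(series_a, series_b):
    """Pair up samples that share a timestamp.

    Both inputs are hypothetical {timestamp: value} dicts, e.g. one built
    from cAdvisor samples and one from collectd samples of the same metric.
    """
    common = sorted(set(series_a) & set(series_b))
    return [(series_a[t], series_b[t]) for t in common]


def mean_relative_diff(pairs):
    """Mean of |a - b| / max(|a|, |b|), skipping pairs where both are zero."""
    diffs = [abs(a - b) / max(abs(a), abs(b)) for a, b in pairs if a or b]
    return mean(diffs) if diffs else 0.0


def metrics_agree(series_a, series_b, tolerance=0.05):
    """Return True if the two series differ by at most `tolerance` on average."""
    pairs = align(series_a, series_b)
    if not pairs:
        raise ValueError("no overlapping timestamps")
    return mean_relative_diff(pairs) <= tolerance


if __name__ == "__main__":
    # Made-up CPU-usage samples, keyed by seconds since the start of the run.
    cadvisor_cpu = {0: 10.0, 10: 20.0, 20: 30.0}
    collectd_cpu = {0: 10.0, 10: 21.0, 30: 50.0}
    print(metrics_agree(cadvisor_cpu, collectd_cpu))
```

In practice the two collectors rarely sample at exactly the same instants, so a real comparison would probably need resampling or nearest-timestamp matching before this kind of check.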

5

Data Status

 

  1. EUAG Meeting is yet to happen

  2. Request to LF-IT has been sent - waiting for a response.

  3. Work with Pod18 - not yet started. Barometer includes Ansible playbooks to deploy collectd+ on K8s.

    1. https://github.com/opnfv/barometer/blob/master/docs/release/userguide/installguide.docker.rst

    2. https://github.com/opnfv/barometer/blob/master/docs/release/userguide/installguide.oneclick.rst

6

Project-1 (AlgoSelector)

 

Still looking for contributors.

7

Project-2 (FailureGen)

 

Found a contributor. Work has already started.

  1. Time-varying, load-varying stress-ng

  2. Listing actions in a Linux system that can cause failures
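As an illustration of the time-varying, load-varying idea, the sketch below builds a sequence of stress-ng invocations whose CPU load rises and then falls. It is a hedged example, not the contributor's actual design: the helper names, the half-sine load shape, and the default worker count and step length are assumptions. Only the stress-ng options `--cpu`, `--cpu-load`, and `--timeout` are relied on.

```python
import math
import shlex


def load_profile(steps, low=10, high=90):
    """Return `steps` CPU-load percentages following a half sine wave.

    The load starts at `low`, peaks at `high` in the middle, and returns
    to `low`. Requires steps >= 2. The shape is an arbitrary choice.
    """
    return [
        round(low + (high - low) * math.sin(math.pi * i / (steps - 1)))
        for i in range(steps)
    ]


def stress_ng_commands(loads, workers=2, step_seconds=30):
    """Build one stress-ng argument list per load level."""
    return [
        ["stress-ng", "--cpu", str(workers),
         "--cpu-load", str(load),
         "--timeout", f"{step_seconds}s"]
        for load in loads
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; running them one after
    # another (e.g. with subprocess.run) would produce the varying load.
    for cmd in stress_ng_commands(load_profile(5)):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review the schedule before putting any real load on a node.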

8

Failure Prediction Definition - Status

(mapping Failures to Data)

@Rohit Singh Rathaur

@Girish

Rohit: Node and VM
Girish: Container and app

https://docs.google.com/spreadsheets/d/1N9LKZjx117zQHJSLcCFK8dwiOpswWyhZECaNNS6NKHo/edit?usp=sharing

Update this page if any change is required for the data model: Failure Prediction using AI/ML in NFV Environments