...

Antitrust policies

...

Thoth: AI/ML for NFV Usecases (An Incubation Project) - Sridhar Rao

Domain: Failure Prediction (for work in this and other domains, please refer to the survey).

Failures of: VMs, Containers/Pods, Nodes, Application* (access), Network-Services.

Each of these failure types requires a separate model.

Existing work: only VM failures have been studied well. All of these works use infrastructure data.

Data Required:

    * Time Series

    * Failure Events

    * Infrastructure data from 'n' days before the failure occurred until the time of the failure. Infrastructure data: CPU, Memory, Interfaces, Storage, H/W, VNF-specific resource consumption.
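The look-back window described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `lookback_window`, the sample layout (a `timestamp` plus a `cpu_pct` field), and the hourly sampling rate are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def lookback_window(samples, failure_time, n_days):
    """Return the samples recorded from n_days before failure_time up to failure_time."""
    start = failure_time - timedelta(days=n_days)
    return [s for s in samples if start <= s["timestamp"] <= failure_time]


# Hypothetical data: ten days of hourly CPU readings for one VM.
t0 = datetime(2021, 1, 1)
samples = [
    {"timestamp": t0 + timedelta(hours=h), "cpu_pct": 10 + h % 5}
    for h in range(240)
]

# Hypothetical failure event seven days into the series, with n = 2 days.
failure_time = t0 + timedelta(days=7)
window = lookback_window(samples, failure_time, n_days=2)
```

Each failure event would yield one such window per infrastructure metric; how the windows are then turned into model features is not specified in these notes.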

How failure is defined:

  1. Nodes - Shutdown/reset
  2. VM - Shutdown/reset
  3. Containers - Shutdown/reset.
  4. Application - Access.
  5. Network-Services - Access.
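One possible way to encode these definitions for labelling failure events is sketched below. The event names (`shutdown`, `reset`, `access_lost`) and the function `is_failure` are hypothetical, and the sketch assumes that "Access" means the application or network service can no longer be reached.

```python
# Failure criteria per entity type, following the list above.
# Event names are illustrative assumptions, not an agreed schema.
FAILURE_CRITERIA = {
    "node": {"shutdown", "reset"},
    "vm": {"shutdown", "reset"},
    "container": {"shutdown", "reset"},
    "application": {"access_lost"},
    "network_service": {"access_lost"},
}


def is_failure(entity_type, event):
    """Return True if the event counts as a failure for the given entity type."""
    return event in FAILURE_CRITERIA.get(entity_type, set())
```

Keeping the criteria in a single table makes it straightforward to label events separately for each entity type, in line with the note that each failure type needs its own model.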

Access to the Data:

  1. Current sources - https://docs.google.com/spreadsheets/d/1QgxlPj8siTLc0ZAggPf1l-GoATqqqOij3GiracwQ3oQ/edit#gid=0
  2. Collaboration - EUAG, OpenInfraLabs - Telemetry WG, Other Researchers.
  3. Generate from Testbed - Academic OpenStack Testbed, Kubernetes Kuberef-RI2 Testbed (pod18 Intel) – Chaos Engineering + Barometer Collectd - Create Data.
  4. Generate Synthetic Data - GANs.

Ongoing Efforts

  1. List the operations that can be performed from within a VM/Container and that can make the VM/Container fail, in order to emulate failure events. Categories: time-based, configuration-changing, stress-ng (supports multiple dimensions).
  2. AlgoSelector - A series of questions is asked to the user about the data and the problem, and the tool suggests which type of algorithm to use: one of Supervised, Unsupervised, or Reinforcement.
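The question-driven selection idea behind AlgoSelector could look roughly like the sketch below. The two questions used here (whether the data carries failure labels, and whether the model learns by interacting with an environment) are assumptions chosen for illustration; the notes do not list the actual questions the tool asks.

```python
def suggest_family(has_labels, learns_by_interaction):
    """Suggest a learning family from two yes/no answers (hypothetical questions)."""
    if learns_by_interaction:
        # The agent acts on an environment and receives feedback.
        return "Reinforcement"
    if has_labels:
        # Historical data is annotated with known outcomes, e.g. failure events.
        return "Supervised"
    # Only raw, unlabelled data is available.
    return "Unsupervised"


# Example: labelled failure events, no interaction with a live environment.
suggestion = suggest_family(has_labels=True, learns_by_interaction=False)
```

A real tool would presumably ask more questions and narrow the suggestion down further; this sketch only shows the top-level branching among the three families named above.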

Artifacts:

  1. Models - Framework Independent (Jupyter Notebooks), Framework Specific (Python files).
  2. Tools - Python files.
  3. Dataset - Kaggle.

Network automation project - Sridhar Rao, Jie Niu

...