Anuket Project

Failure and Anomaly Data Generation

Previous approach that did not work: run Stress-NG as a CNF and load it as much as possible.


Category: Details
Cloud*: Kubernetes cluster
Workload: Nginx, vRouter, ...
Traffic-Generators: iperf, netperf, wrk2, TRex/PROX (SR-IOV)
Impairment-Tool: Litmus

Types of Impairments


smcasey: Run experiments one at a time.

POD_Level

Container Kill
Disk Fill
Pod Autoscaler
Pod CPU Hog Exec
Pod CPU Hog
Pod Delete
Pod Dns Error
Pod Dns Spoof
Pod IO Stress
Pod Memory Hog Exec
Pod Memory Hog
Pod Network Corruption
Pod Network Duplication
Pod Network Latency
Pod Network Loss
Pod Network Partition
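
As an illustration of how one of the pod-level impairments above might be requested from Litmus, the following minimal sketch builds a ChaosEngine manifest for a single experiment as a Python dictionary and prints it as JSON. The namespace, application label, service account, and duration are hypothetical placeholders, not values from this setup. The field names follow the commonly documented `litmuschaos.io/v1alpha1` ChaosEngine layout and should be checked against the installed Litmus version.

```python
import json


def build_chaos_engine(experiment, namespace, app_label, duration_s,
                       service_account="litmus-admin"):
    """Return a ChaosEngine manifest (as a dict) for a single experiment.

    All argument values used in this sketch are hypothetical placeholders.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": f"{experiment}-engine", "namespace": namespace},
        "spec": {
            "engineState": "active",
            # Target application that the impairment is applied to.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            # Exactly one experiment per engine, so impairments never overlap.
            "experiments": [
                {
                    "name": experiment,
                    "spec": {
                        "components": {
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION",
                                 "value": str(duration_s)},
                            ]
                        }
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_chaos_engine("pod-delete", "default", "app=nginx", 60)
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, assuming the matching ChaosExperiment resource and service account already exist in the target namespace.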

NODE_Level

Docker Service Kill
Kubelet Service Kill
Node CPU Hog
Node Drain
Node IO Stress
Node Memory Hog
Node Restart
Node Taint
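
Running the impairments listed above one at a time makes it possible to attribute each stretch of collected data to a single impairment. The sketch below is one possible way to do that, not the project's actual tooling: it runs experiments strictly in sequence, records a start and end timestamp for each, and labels any later timestamp as either an impairment name or "normal". The `run_one` callable is a hypothetical hook that blocks until the named experiment has finished.

```python
import time
from dataclasses import dataclass


@dataclass
class ImpairmentWindow:
    name: str
    start: float  # Unix timestamp when the impairment began
    end: float    # Unix timestamp when it finished


def run_sequentially(experiments, run_one, cooldown_s=0.0,
                     clock=time.time, sleep=time.sleep):
    """Run experiments strictly one after another and record their windows.

    `run_one(name)` is a hypothetical callable that blocks until the named
    experiment has completed (for example, by applying a ChaosEngine and
    waiting for its result).
    """
    windows = []
    for i, name in enumerate(experiments):
        start = clock()
        run_one(name)
        windows.append(ImpairmentWindow(name, start, clock()))
        # Pause between experiments so the system can settle and the
        # recorded windows do not bleed into each other.
        if i < len(experiments) - 1:
            sleep(cooldown_s)
    return windows


def label_for(timestamp, windows):
    """Return the impairment active at `timestamp`, or "normal" if none."""
    for w in windows:
        if w.start <= timestamp <= w.end:
            return w.name
    return "normal"
```

The `clock` and `sleep` parameters are injectable only so the schedule can be dry-run without waiting; in a real run the defaults would be used.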

Metrics
  1. Infrastructure
  2. TGen (to analyze the impact of impairments), Litmus data
  3. CNF - External (kubelet)
  4. CNF - Internal (not feasible in many cases). Note: a cloud-native CNF (CN-CNF) is expected to expose metrics at /metrics (microservices approach).
Metrics-Collection: Prometheus
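
To pull the metrics recorded around one impairment out of Prometheus, a range query over that time window is one option. The sketch below only builds the request URL for the Prometheus HTTP API `query_range` endpoint; it does not contact a server. The base URL and the PromQL expression in the example are assumptions for illustration, not part of this setup.

```python
from urllib.parse import urlencode


def query_range_url(base_url, promql, start_ts, end_ts, step_s=15):
    """Build a Prometheus query_range URL for the window [start_ts, end_ts].

    Timestamps are Unix seconds; `step_s` is the sample resolution in seconds.
    """
    params = {
        "query": promql,
        "start": start_ts,
        "end": end_ts,
        "step": f"{step_s}s",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical server address and query, for illustration only.
    print(query_range_url(
        "http://prometheus.example:9090",
        'rate(container_cpu_usage_seconds_total{namespace="default"}[1m])',
        1644796800,
        1644800400,
    ))
```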
Logs/Events: Kubernetes, System

Logs-Collection: Elasticsearch
Pod: Intel Pod18 (3 master nodes and 2 worker nodes)
Duration: 14 Feb - 28 Feb 2022 (may be delayed due to Intel VPN issues)
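
For the logs collected in Elasticsearch, the entries belonging to one impairment can be selected with a time-range query. The sketch below builds such a query body using the standard Elasticsearch `range` query; it does not send it anywhere. The `@timestamp` field name and the result size are assumptions that depend on how the log shipper indexes the Kubernetes and system logs.

```python
import json
from datetime import datetime, timezone


def logs_query(start_ts, end_ts, time_field="@timestamp", size=1000):
    """Build an Elasticsearch search body for logs within a time window.

    `start_ts` and `end_ts` are Unix seconds; `time_field` is an assumed
    field name and may differ in a given index mapping.
    """
    def iso(ts):
        # ISO 8601 in UTC, which Elasticsearch date fields accept by default.
        return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()

    return {
        "size": size,
        "sort": [{time_field: "asc"}],
        "query": {
            "range": {
                time_field: {"gte": iso(start_ts), "lte": iso(end_ts)}
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(logs_query(1644796800, 1644800400), indent=2))
```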