Anuket Project

Cloud Platform Validation Spec

Notes for contributors:

  • This document is under development and open to all input.
  • This document aims to define all checks that can be run on a cloud platform to validate its state and health.
  • Validation checks ensure that all cloud software components are healthy and configured as described in the PDF, which defines the desired cloud state.
  • Taken together, the validation checks can be considered a kind of ping utility for a cloud platform.
  • Functional tests and performance benchmarks are out of scope for this list.
  • SDV (a sub-project of CIRV) implements, or will implement, all checks defined by this document.

RA1 validation checks

Currently defined Cloud Deployment Types:

  • OOK - OpenStack on Kubernetes
  • OOO - OpenStack on OpenStack
  • OAC - OpenStack as Containers (without Kubernetes)
  • OAV - OpenStack as VMs


Test Suites are used to group multiple checks.


Sl. No. | Test Suite | Test/Check Name | Cloud Deployment Types | Description/Details | Notes/References
1. | platform | pod_health_check | OOK, OAC

Checks the health of all overcloud components running as pods in the Kubernetes cluster (in an OOK deployment).


Pass: All components are healthy.

Fail: One or more components are unhealthy.


2. | storage | ceph_health_check | OOK, OAC, OOO, OAV | Checks the health of all components of the Ceph cluster configured for OpenStack.
3. | observability | prometheus_check | OOK

Checks the health endpoint ("/-/healthy") and the readiness endpoint ("/-/ready") of Prometheus over HTTPS.


Pass: Both the health check and the readiness check succeed.

Fail: Either the health check or the readiness check fails.
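
The pass/fail rule for prometheus_check can be sketched as follows. This is a minimal illustration, not the SDV implementation: the function names, the base URL, and the choice of HTTP 200 as the success criterion are assumptions made for the example. The HTTP getter is injectable so the logic can be exercised without a live Prometheus.

```python
from typing import Callable
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET on url, or 0 if unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except OSError as error:
        # HTTPError carries the status in .code; other errors have none.
        return getattr(error, "code", 0) or 0


def prometheus_check(base_url: str,
                     get_status: Callable[[str], int] = http_status) -> bool:
    """Pass only if both the health and the readiness endpoint return 200."""
    endpoints = ("/-/healthy", "/-/ready")  # endpoints named in this check
    base = base_url.rstrip("/")
    return all(get_status(base + path) == 200 for path in endpoints)
```

The same function could be pointed at Alertmanager, which this list checks through the same two endpoints.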


4. | observability | prometheus_alert_manager_check | OOK | Checks whether Alertmanager is healthy by sending HTTPS requests to its "/-/healthy" and "/-/ready" endpoints.
5. | observability | grafana_check | OOK | Checks whether Grafana is healthy by sending a request to the "/api/health" endpoint.
6. | observability | elasticsearch_check | OOK | Checks the health of the Elasticsearch cluster by sending an HTTPS request to its "/_cluster/health" endpoint.
7. | observability | kibana_check | OOK | Checks the health of the Kibana dashboard using the status reported at the "/api/status" endpoint.
8. | observability | nagios_check | OOK | Checks whether the Nagios API is reachable and returns an HTTP OK status.
9. | observability | elasticsearch_exporter_check | OOK | Checks whether the Elasticsearch exporter is exporting Prometheus metrics at "/metrics".
10. | observability | fluentd_exporter_check | OOK | Checks whether the Fluentd exporter is exporting Prometheus metrics at "/metrics".
11. | network | physical_network_check | OOK, OAC, OOO, OAV | Checks the network mappings in ml2.conf against the PDF.
12. | compute | reserved_vnf_cores_check | OOK, OAC, OOO, OAV | Checks the vcpu_pin_set configuration in Nova against the reserved-cores value required by the PDF.
13. | compute | isolated_cores_check | OOK, OAC, OOO, OAV | Checks the isolcpus configuration against the value required by the PDF.
14. | network | vswitch_pmd_cores_check | OOK, OAC, OOO, OAV | Compares pmd-cpu-mask in the vswitch against the cores required by the PDF.
15. | network | vswitch_dpdk_lcores_check | OOK, OAC, OOO, OAV | Compares dpdk-lcore-mask in the vswitch against the cores required by the PDF.
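
Both vswitch checks compare a CPU mask against a list of cores. A minimal sketch of that comparison is shown here, assuming the mask is a hexadecimal bitmask in which bit N selects core N and the PDF supplies the required cores as a list of core IDs. The function names are hypothetical.

```python
from typing import Iterable, Set


def mask_to_cores(mask: str) -> Set[int]:
    """Expand a hex CPU mask such as "0x6" into the set of core IDs it selects."""
    value = int(mask, 16)  # accepts both "0x6" and "6"
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


def vswitch_cores_check(actual_mask: str, required_cores: Iterable[int]) -> bool:
    """Pass if the mask selects exactly the cores required by the PDF."""
    return mask_to_cores(actual_mask) == set(required_cores)
```

For example, the mask "0x30" has bits 4 and 5 set, so it matches a PDF requirement of cores 4 and 5.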
16. | compute | os_reserved_cores_check | OOK, OAC, OOO, OAV

Calculates os_reserved_cores using the formula:

os_reserved_cores = all_cores - (reserved_vnf_cores + vswitch_pmd_cores + vswitch_dpdk_lcores)

and compares the result against the os_reserved_cores value required by the PDF.
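
The formula above can be sketched in a few lines. This example assumes each term is a set of core IDs, so "+" is read as set union and "-" as set difference; if the PDF stores core counts instead, plain integer arithmetic would apply. The function names are hypothetical.

```python
from typing import Iterable, Set


def os_reserved_cores(all_cores: Iterable[int],
                      reserved_vnf_cores: Iterable[int],
                      vswitch_pmd_cores: Iterable[int],
                      vswitch_dpdk_lcores: Iterable[int]) -> Set[int]:
    """Cores left for the OS once VNF and vswitch cores are taken out."""
    used = (set(reserved_vnf_cores)
            | set(vswitch_pmd_cores)
            | set(vswitch_dpdk_lcores))
    return set(all_cores) - used


def os_reserved_cores_check(actual: Set[int], required: Iterable[int]) -> bool:
    """Pass if the calculated set equals the set required by the PDF."""
    return actual == set(required)
```

On an 8-core host with VNF cores 4-7, PMD core 2 and DPDK lcore 3, the calculation leaves cores 0 and 1 for the OS.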



17. | compute | nova_scheduler_filters_check | OOK, OAC, OOO, OAV

18. | compute | cpu_allocation_ratio_check | OOK, OAC, OOO, OAV

19. | platform | api_version_check | OOK, OAC, OOO, OAV

20. | network | mtu_check | OOK, OAC, OOO, OAV

21. | platform | ntp_check | OOK, OAC, OOO, OAV

22. | network | sriov_vfs_check | OOK, OAC, OOO, OAV

23. | security | pod_linux_capabilities_allowed_check | OOK

24. | security | previleged_pod_allowed_check | OOK

25. | security | pod_host_volume_mount_check | OOK

26. | security | pod_host_network_check | OOK

27. | security | mgmt_api_access_check | OOK

28. | compute | cpu_manager_policy_check | OOK

Checks whether the actual cpu-manager-policy in the kubelet configuration on each node matches the desired cpu-manager-policy defined in the PDF.

If the policy is set to static, the check also verifies that the cpu-manager-reconcile-period and full-pcpus-only values in the PDF match the actual kubelet configuration.

For better CPU affinity, telcos may want to set cpu-manager-policy to static with the required options.

https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/
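
The logic of cpu_manager_policy_check can be sketched as below. The kubelet field names (cpuManagerPolicy, cpuManagerReconcilePeriod, cpuManagerPolicyOptions) follow the KubeletConfiguration file format; the PDF key names and the per-node dictionary layout are assumptions made for this illustration only.

```python
from typing import Dict


def cpu_manager_policy_check(kubelet_configs: Dict[str, dict], pdf: dict) -> bool:
    """Compare each node's kubelet CPU manager settings with the PDF."""
    for config in kubelet_configs.values():
        # The kubelet defaults to the "none" policy when the field is unset.
        policy = config.get("cpuManagerPolicy", "none")
        if policy != pdf["cpu_manager_policy"]:  # hypothetical PDF key
            return False
        if policy == "static":
            period = config.get("cpuManagerReconcilePeriod")
            if period != pdf["cpu_manager_reconcile_period"]:  # hypothetical
                return False
            # Policy options are string-valued in the kubelet configuration.
            options = config.get("cpuManagerPolicyOptions", {})
            wanted = str(pdf["full_pcpus_only"]).lower()  # hypothetical PDF key
            if options.get("full-pcpus-only", "false") != wanted:
                return False
    return True
```

The extra comparisons run only for the static policy, mirroring the description above.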

29. | compute | topology_manager_policy_check | OOK

Check-a: Checks whether the topology-manager-scope (pod or container) set on the kubelet of every node matches the value required by the PDF.

Check-b: Checks whether the topology-manager-policy (none, best-effort, restricted or single-numa-node) set on the kubelet of every node matches the value required by the PDF.

This check passes only if both Check-a and Check-b pass.

To make the best use of the NUMA architecture for performance, telcos may want to configure topology-manager-policy on the kubelets.

https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/
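
A minimal sketch of how Check-a and Check-b combine is given here. The kubelet field names (topologyManagerScope, topologyManagerPolicy) and their defaults follow the KubeletConfiguration file format; the PDF key names are hypothetical.

```python
from typing import Dict


def topology_manager_policy_check(kubelet_configs: Dict[str, dict],
                                  pdf: dict) -> bool:
    """Pass only if scope (Check-a) and policy (Check-b) match on every node."""
    configs = list(kubelet_configs.values())
    # Check-a: the kubelet default scope is "container" when unset.
    check_a = all(
        c.get("topologyManagerScope", "container") == pdf["topology_manager_scope"]
        for c in configs
    )
    # Check-b: the kubelet default policy is "none" when unset.
    check_b = all(
        c.get("topologyManagerPolicy", "none") == pdf["topology_manager_policy"]
        for c in configs
    )
    return check_a and check_b
```

Keeping the two results separate makes it straightforward to report which of the two sub-checks failed.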

30. | network | cni_check | OOK

31. | platform | device_plugin_check | OOK

32. | | service_mesh_check | OOK

33. | | ingress_egress_check | OOK

34. | platform | kubevirt_check | OOK

35. | | helm_check | OOK

36. | platform | readliness_probe_check | OOK | Checks whether a readiness probe is configured for all overcloud components deployed as pods on the undercloud Kubernetes.

For the undercloud Kubernetes to determine that the overcloud is ready, every overcloud component should define its required readiness probe.


Note: This check only verifies that the probes are defined; it does not check the readiness of each pod, which is already covered by pod_health_check.
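
The probe-definition check can be sketched as a scan over pod manifests. This example assumes the pods are available as dictionaries in the standard Kubernetes Pod layout (metadata.name, spec.containers); the function name is hypothetical. Passing "livenessProbe" or "startupProbe" as the probe field would cover the other probe checks in this list.

```python
from typing import List, Tuple


def containers_missing_probe(pods: List[dict],
                             probe: str = "readinessProbe") -> List[Tuple[str, str]]:
    """Return (pod name, container name) pairs that do not define the probe."""
    missing = []
    for pod in pods:
        for container in pod["spec"]["containers"]:
            # Only the presence of the probe is checked, not its result.
            if probe not in container:
                missing.append((pod["metadata"]["name"], container["name"]))
    return missing
```

The check would pass when the returned list is empty.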

37. | platform | startup_probe_check | OOK

38. | platform | liveliness_probe_check | OOK

Checks whether a liveness probe is configured for all overcloud components deployed as pods on the undercloud Kubernetes.

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/


Note: This check only verifies that the probes are defined; it does not check the liveness of each pod, which is already covered by pod_health_check.














RA2 validation checks

Sl. No. | Test Suite | Test/Check Name | Cloud Deployment Types | Description/Details | Notes/References
1. | security | network_policy_check | | Checks whether a default policy is used to deny all ingress and egress traffic, and whether unselected pods are isolated.
2. | security | encryption_check | | Checks whether an external key management system is in use for encryption of secrets.
3. | security | access_control_check | | Checks whether role-based access control (RBAC) is enabled.
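
As an illustration of network_policy_check, the sketch below tests whether a NetworkPolicy manifest is a default deny-all policy. It assumes the manifest is available as a dictionary in the standard Kubernetes NetworkPolicy layout and treats a policy as deny-all when it selects every pod, lists both policy types, and defines no allow rules. The function name is hypothetical, and a full check would also have to confirm that such a policy exists in every namespace.

```python
def is_default_deny_all(policy: dict) -> bool:
    """Return True if the NetworkPolicy denies all ingress and egress traffic."""
    spec = policy.get("spec", {})
    # An empty podSelector selects every pod in the namespace.
    selects_all_pods = spec.get("podSelector") == {}
    # Both directions must be governed by the policy.
    covers_both = {"Ingress", "Egress"} <= set(spec.get("policyTypes", []))
    # No ingress or egress rules means nothing is allowed.
    no_allow_rules = not spec.get("ingress") and not spec.get("egress")
    return selects_all_pods and covers_both and no_allow_rules
```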