Anuket Project
Cloud Platform Validation Spec
Notes for contributors:
This document is under development and open to input from everyone.
This document aims to define all checks that can be run on a cloud platform to validate its state and health.
The validation checks aim to ensure that all cloud software components are healthy and configured as described in the PDF, which captures the desired cloud state.
Taken together, the validation checks act as a kind of ping utility for a cloud platform.
Functional tests and performance benchmarks are out of scope for this list.
SDV (a sub-project of CIRV) implements, or will implement, all checks defined in this document.
RA1 validation checks
Currently defined Cloud Deployment Types:
OOK - OpenStack on Kubernetes
OOO - OpenStack on OpenStack
OAC - OpenStack as Containers (without Kubernetes)
OAV - OpenStack as VMs
Test Suites are used to group multiple checks.
Sl.No. | Test Suite | Test/Check Name | Cloud Deployment Types | Description/Details | Notes/References |
|---|---|---|---|---|---|
1. | platform | pod_health_check | OOK, OAC | Checks the health of all overcloud components running as pods in the Kubernetes cluster (in an OOK deployment). Pass: all components are healthy. Fail: one or more components are unhealthy. | |
2. | storage | ceph_health_check | OOK, OAC, OOO, OAV | Checks the health of all components of the Ceph cluster configured for OpenStack. | |
3. | observability | prometheus_check | OOK | Checks the health endpoint ("/-/healthy") and the readiness endpoint ("/-/ready") of Prometheus over HTTPS. Pass: both the health and the readiness checks pass. Fail: either the health or the readiness check fails. | |
4. | observability | prometheus_alert_manager_check | OOK | Checks whether Alertmanager is healthy by sending HTTPS requests to its "/-/healthy" and "/-/ready" endpoints. | |
5. | observability | grafana_check | OOK | Checks whether Grafana is healthy by sending a request to the "/api/health" endpoint. | |
6. | observability | elasticsearch_check | OOK | Checks the health of the Elasticsearch cluster by sending an HTTPS request to the "/_cluster/health" endpoint. | |
7. | observability | kibana_check | OOK | Checks the health of the Kibana dashboard using the status reported by the "/api/status" endpoint. | |
8. | observability | nagios_check | OOK | Checks whether the Nagios API is reachable and returns HTTP OK (200). | |
9. | observability | elasticsearch_exporter_check | OOK | Checks whether the Elasticsearch exporter exposes Prometheus metrics at "/metrics". | |
10. | observability | fluentd_exporter_check | OOK | Checks whether the Fluentd exporter exposes Prometheus metrics at "/metrics". | |
11. | network | physical_network_check | OOK, OAC, OOO, OAV | Checks the network mappings in ml2.conf against the PDF. | |
12. | compute | reserved_vnf_cores_check | OOK, OAC, OOO, OAV | Checks the vcpu_pin_set configuration in Nova against the reserved-core value required by the PDF. | |
13. | compute | isolated_cores_check | OOK, OAC, OOO, OAV | Checks the isolcpus configuration against the value required by the PDF. | |
14. | network | vswitch_pmd_cores_check | OOK, OAC, OOO, OAV | Evaluates the pmd-cpu-mask of the vSwitch against the cores required by the PDF. | |
15. | network | vswitch_dpdk_lcores_check | OOK, OAC, OOO, OAV | Evaluates the dpdk-lcore-mask of the vSwitch against the cores required by the PDF. | |
16. | compute | os_reserved_cores_check | OOK, OAC, OOO, OAV | Calculates os_reserved_cores using the formula os_reserved_cores = all_cores - (reserved_vnf_cores + vswitch_pmd_cores + vswitch_dpdk_lcores) and compares the result against the os_reserved_cores value required by the PDF. | |
17. | compute | nova_scheduler_filters_check | OOK, OAC, OOO, OAV | ||
18. | compute | cpu_allocation_ratio_check | OOK, OAC, OOO, OAV | ||
19. | platform | api_version_check | OOK, OAC, OOO, OAV | ||
20. | network | mtu_check | OOK, OAC, OOO, OAV | ||
21. | platform | ntp_check | OOK, OAC, OOO, OAV | ||
22. | network | sriov_vfs_check | OOK, OAC, OOO, OAV | ||
23. | security | pod_linux_capabilities_allowed_check | OOK | ||
24. | security | privileged_pod_allowed_check | OOK | ||
25. | security | pod_host_volume_mount_check | OOK | ||
26. | security | pod_host_network_check | OOK | ||
27. | security | mgmt_api_access_check | OOK | ||
28. | compute | cpu_manager_policy_check | OOK | Checks whether the actual … Also, if the policy is set to static, further checks that the desired … | For better CPU affinity, telcos may want to configure the CPU management policy: https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/ |
29. | compute | topology_manager_policy_check | OOK | Check-a: checks whether … Check-b: checks whether … The check passes only if both Check-a and Check-b pass. | To make the best use of the NUMA architecture for performance, telcos may want to configure the Topology Manager policy: https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/ |
30. | network | cni_check | OOK | ||
31. | platform | device_plugin_check | OOK | ||
32. | | service_mesh_check | OOK | | |
33. | | ingress_egress_check | OOK | | |
34. | platform | kubevirt_check | OOK | ||
35. | | helm_check | OOK | | |
36. | platform | readiness_probe_check | OOK | Checks whether a readiness probe is configured for every overcloud component deployed as a pod on the undercloud Kubernetes cluster. | For the undercloud Kubernetes cluster to know that the overcloud is ready, every overcloud component should define a readiness probe. Note: this check only verifies that the probes are defined; it does not check the readiness of each pod, as that is already covered by … |
37. | platform | startup_probe_check | OOK | ||
38. | platform | liveliness_probe_check | OOK | Checks whether a liveness probe is configured for every overcloud component deployed as a pod on the undercloud Kubernetes cluster. | Note: this check only verifies that the probes are defined; it does not check the liveness of each pod, as that is already covered by … |
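The pass/fail rule of prometheus_check (row 3) can be sketched in Python as below. Because Alertmanager exposes the same "/-/healthy" and "/-/ready" endpoints, the same function would also serve prometheus_alert_manager_check (row 4). This is a minimal sketch, not the SDV implementation: the names `http_status` and `prometheus_check` are hypothetical, TLS verification and authentication settings are left at the urllib defaults, and any status other than 200 is assumed to mean failure. The `fetch` parameter lets the rule be exercised without a live server.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request to url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, ...
        return 0


def prometheus_check(base_url: str,
                     fetch: Callable[[str], int] = http_status) -> bool:
    """Hypothetical check: pass only if /-/healthy and /-/ready both return 200.

    Works for Prometheus and Alertmanager, which expose the same endpoints.
    """
    base = base_url.rstrip("/")
    healthy = fetch(base + "/-/healthy") == 200
    ready = fetch(base + "/-/ready") == 200
    return healthy and ready
```

For example, `prometheus_check("https://prometheus.example.com:9090")` (a hypothetical address) returns True only when both endpoints answer 200. Injecting a stub `fetch` keeps the decision logic separate from the transport, so the rule can be tested offline.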
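For grafana_check, elasticsearch_check and kibana_check (rows 5 to 7), a reachable endpoint is not enough by itself, because the response body also reports health. Similarly, the exporter checks (rows 9 and 10) need to look at what "/metrics" actually returns. The Python sketch below evaluates already-parsed response bodies. The pass criteria are assumptions, not part of this spec: Elasticsearch passes only on "green" unless `allow_yellow` is set, Grafana passes when its "database" field is "ok", Kibana passes on an overall level of "available" (8.x response format) or an overall state of "green" (7.x format), and an exporter passes if it exposes at least one sample whose metric name starts with a given prefix. All function names are hypothetical.

```python
def elasticsearch_health_ok(body: dict, allow_yellow: bool = False) -> bool:
    """Evaluate a parsed /_cluster/health response (status: green/yellow/red)."""
    accepted = {"green", "yellow"} if allow_yellow else {"green"}
    return body.get("status") in accepted


def grafana_health_ok(body: dict) -> bool:
    """Evaluate a parsed /api/health response; "database" is "ok" when healthy."""
    return body.get("database") == "ok"


def kibana_status_ok(body: dict) -> bool:
    """Evaluate a parsed /api/status response from Kibana 7.x or 8.x."""
    overall = body.get("status", {}).get("overall", {})
    # 8.x reports a "level"; 7.x reports a colour-coded "state".
    return overall.get("level") == "available" or overall.get("state") == "green"


def exported_metric_names(text: str) -> set[str]:
    """Collect metric names from the Prometheus text exposition format."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        # A sample looks like: name{label="value"} 1.0 [timestamp]
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def exporter_ok(metrics_text: str, prefix: str = "") -> bool:
    """Pass if at least one exported metric name starts with prefix."""
    return any(name.startswith(prefix) for name in exported_metric_names(metrics_text))
```

Whether a "yellow" Elasticsearch cluster (all primary shards allocated, some replicas missing) should pass is a policy decision for the deployment, which is why it is exposed as a parameter rather than hard-coded.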
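Rows 12 to 16 compare CPU sets that come from different configuration formats: Nova's vcpu_pin_set and the kernel isolcpus parameter use CPU lists such as "4-11,^8", while the Open vSwitch pmd-cpu-mask and dpdk-lcore-mask are hexadecimal bit masks. The Python sketch below normalises both into sets of CPU ids and computes os_reserved_cores as a set difference. It is a sketch under stated assumptions, not the SDV code: the helper names are hypothetical, isolcpus flags such as "nohz" or "domain" are not parsed, and because it uses sets, a core that appears in both the PMD and lcore masks is removed once, whereas the arithmetic formula in row 16 would subtract it twice. For disjoint sets the two give the same count.

```python
def parse_cpulist(spec: str) -> set[int]:
    """Parse a CPU list such as "4-11,^8" into a set of CPU ids.

    Supports single ids, inclusive ranges and "^" exclusions (the syntax
    accepted by Nova's vcpu_pin_set). Kernel isolcpus flags are not handled.
    """
    include, exclude = set(), set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        target = exclude if part.startswith("^") else include
        part = part.lstrip("^")
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            target.update(range(start, end + 1))
        else:
            target.add(int(part))
    return include - exclude


def mask_to_cpus(mask: str) -> set[int]:
    """Convert a hex CPU mask such as pmd-cpu-mask "0x3c" into CPU ids."""
    value = int(mask, 16)  # accepts both "0x3c" and "3c"
    return {bit for bit in range(value.bit_length()) if (value >> bit) & 1}


def os_reserved_cores(all_cores: set[int], reserved_vnf: set[int],
                      pmd: set[int], dpdk_lcores: set[int]) -> set[int]:
    """Cores left for the host OS once VNF, PMD and lcore cores are removed."""
    return all_cores - (reserved_vnf | pmd | dpdk_lcores)
```

For example, on a 16-core host with vcpu_pin_set "4-11,^8" (7 cores), pmd-cpu-mask "0x3000" (cores 12 and 13) and dpdk-lcore-mask "0x4000" (core 14), the sketch yields os_reserved_cores {0, 1, 2, 3, 8, 15}: 6 cores, matching 16 - (7 + 2 + 1) from the row 16 formula. That set (or its size) would then be compared with the value required by the PDF.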
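The pod-level checks (row 1, and the probe checks in rows 36 to 38) can be evaluated from the pod objects returned by the Kubernetes API, for example the "items" list of `kubectl get pods -n <namespace> -o json`. The Python sketch below assumes that input. readinessProbe, livenessProbe and startupProbe are the standard container fields of the Kubernetes Pod spec; the function names, and the rule that pods in phase "Succeeded" (completed jobs) count as healthy, are assumptions rather than part of this spec.

```python
def pod_is_healthy(pod: dict) -> bool:
    """Healthy = Running with every container ready (or a completed job)."""
    status = pod.get("status", {})
    phase = status.get("phase")
    if phase == "Succeeded":
        return True  # assumption: finished one-off jobs are not failures
    if phase != "Running":
        return False  # Pending, Failed or Unknown
    containers = status.get("containerStatuses", [])
    return bool(containers) and all(c.get("ready", False) for c in containers)


def unhealthy_pods(pods: list[dict]) -> list[str]:
    """Names of pods that fail pod_is_healthy; an empty list means Pass."""
    return [p.get("metadata", {}).get("name", "?") for p in pods if not pod_is_healthy(p)]


def containers_missing_probe(pods: list[dict], probe: str) -> list[str]:
    """List "pod/container" entries whose spec lacks the given probe field.

    probe is one of "readinessProbe", "livenessProbe" or "startupProbe".
    Only the presence of the probe is checked, not its result.
    """
    missing = []
    for pod in pods:
        pod_name = pod.get("metadata", {}).get("name", "?")
        for container in pod.get("spec", {}).get("containers", []):
            if probe not in container:
                missing.append(f"{pod_name}/{container.get('name', '?')}")
    return missing
```

Keeping the presence check (`containers_missing_probe`) separate from the health check (`unhealthy_pods`) mirrors the split described in the notes of rows 36 and 38: the probe checks only verify that probes are defined, while actual pod health is assessed elsewhere.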
RA2 validation checks
Sl.No. | Test suite | Test/Check Name | Cloud Deployment Types | Description/Details | Notes/References |
|---|---|---|---|---|---|
1. | security | network_policy_check | | Checks whether a default policy denies all ingress and egress traffic, so that pods not selected by any other policy are isolated. | |
2. | security | encryption_check | | Checks whether an external key management system is used to encrypt secrets. | |
3. | security | access_control_check | | Checks whether role-based access control (RBAC) is enabled. | |
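network_policy_check can be sketched against the NetworkPolicy objects of a cluster, for example the "items" of `kubectl get networkpolicy -A -o json`. In Kubernetes, a policy with an empty podSelector selects every pod in its namespace, and a direction listed in policyTypes with no rules for it allows no traffic in that direction. When policyTypes is omitted, Kubernetes assumes Ingress, plus Egress only if the policy has egress rules. The Python sketch below reports namespaces where no such default policies deny both ingress and egress. The function names, and the choice to accept the two directions being covered by two separate policies, are assumptions.

```python
BOTH_DIRECTIONS = {"Ingress", "Egress"}


def denied_directions(policy: dict) -> set[str]:
    """Directions ("Ingress"/"Egress") this policy denies for every pod."""
    spec = policy.get("spec", {})
    selector = spec.get("podSelector") or {}
    if selector.get("matchLabels") or selector.get("matchExpressions"):
        return set()  # selects only some pods, so it is not a namespace default
    types = spec.get("policyTypes")
    if not types:
        # Kubernetes default: Ingress always, Egress only if egress rules exist.
        types = ["Ingress"] + (["Egress"] if spec.get("egress") else [])
    # A listed direction with no rules allows nothing, i.e. denies all traffic.
    return {t for t in types if not spec.get(t.lower())}


def namespaces_missing_default_deny(policies: list[dict],
                                    namespaces: list[str]) -> list[str]:
    """Namespaces whose policies do not deny both ingress and egress by default."""
    denied = {ns: set() for ns in namespaces}
    for policy in policies:
        ns = policy.get("metadata", {}).get("namespace")
        if ns in denied:
            denied[ns] |= denied_directions(policy)
    return sorted(ns for ns, dirs in denied.items() if not BOTH_DIRECTIONS <= dirs)
```

Since NetworkPolicies are additive, other policies in a namespace can still allow specific traffic; the deny-all default only guarantees that pods not selected by any of them stay isolated, which is the property row 1 of this table asks for.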