Data
Failure Type | Failure Parameter | Failure Event | Infrastructure Metrics | Comments
---|---|---|---|---
Links | Link down; link removed | Virtual switch link failure. Reason: hardware failure, interface down (Ref: https://docs.openstack.org/ocata/config-reference/networking/logs.html) | Network interface status; high packet drop, low throughput, excessive latency or jitter; crc-statistics, fabric-link-failure, link-flap, transceiver-power-low | |
VM | Deployment/start failures; post-deployment/start failures | nova-compute.log, nova-api.log, nova-scheduler.log, libvirt.log, qemu/$vm.log, neutron-server.log; glance/cinder - flavor, node and core mapping | CPU: per-core utilization; memory; interface statistics (sent, recv, drops); disk read/write | If possible, infrastructure metrics and syslogs from within the VM should be collected. Deployment/start failures can be the first step. |
Container | Deployment/start failures; post-deployment/start failures | | CPU: per-core utilization; memory; interface statistics (sent, recv, drops); disk read/write | |
Node | A node failure (hardware failure, OS crash, etc.): A) node network connectivity failure; B) nova service failure; C) failure of other OpenStack services | /var/log/nova/nova-compute.log (Ref: https://docs.openstack.org/operations-guide/ops-logging.html). A) Node network connectivity failure; B) nova service failure (e.g., process crashed), detected and restarted by a local watchdog process; C) failure of other OpenStack services: N/A, assuming a redundant/highly available configuration | Interface statistics (sent, recv, drops); hypervisor metrics, Nova server metrics, tenant metrics, message queue metrics; Keystone and Glance metrics | |
Application | Crash/connectivity/non-functional | Application log, e.g., for Apache, the Apache logs (/var/log/apache2) | Packet drops, latency, throughput, saturation, resource usage | Deploy collectd within the application and collect both application logs and infrastructure metrics |
Middleware Services |
...