Investigating issues with Cloud PBX, Severity: major

Incident Report for NUACOM

Postmortem

Summary: One of our database clusters became overloaded, leading to an outage that impacted multiple accounts. During this period, affected users were unable to make or receive calls.

Root Cause: The outage was caused by a significant overload on one of the database cluster hosts. This overload resulted in a bottleneck that prevented the database from processing requests efficiently, disrupting user service.

Resolution: Our engineering team promptly identified the issue and re-routed all heavy queries to other hosts within the same cluster. This action successfully alleviated the load on the affected host, restoring normal operations.

Preventive Measures:

We will review and optimize query distribution across the cluster to prevent future overloads.
We will implement additional monitoring and alerts to detect and address similar issues more quickly.
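
As an illustration only, an alert of the kind described above could fire when a host's load stays above a threshold for several consecutive samples. The sketch below is not NUACOM's monitoring setup; the class name `LoadAlert`, the threshold, and the window size are all hypothetical.

```python
from collections import deque


class LoadAlert:
    """Hypothetical sustained-load alert for a single database host."""

    def __init__(self, threshold, window):
        self.threshold = threshold           # load level considered unhealthy (assumed value)
        self.samples = deque(maxlen=window)  # keeps only the most recent `window` samples

    def observe(self, load):
        """Record one load sample; return True if the alert should fire."""
        self.samples.append(load)
        window_full = len(self.samples) == self.samples.maxlen
        # Fire only when every sample in a full window exceeds the threshold,
        # so a single short spike does not trigger the alert.
        return window_full and min(self.samples) > self.threshold
```

Requiring a full window of high samples trades a slightly slower alert for fewer false alarms; the right window length would depend on how often the host is sampled.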

Next Steps:

Conduct a thorough analysis of the affected queries to identify optimization opportunities.
Consider expanding the cluster capacity or implementing more granular load balancing.
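
One possible form of more granular load balancing is to send each query to the host with the lowest weighted load, counting heavy queries as more expensive than light ones. The following is a minimal sketch under that assumption, not a description of the actual cluster configuration; `Router`, `heavy_cost`, and the host names are hypothetical.

```python
class Router:
    """Hypothetical least-loaded router that weights heavy queries more."""

    def __init__(self, hosts, heavy_cost=5):
        self.load = {host: 0 for host in hosts}  # weighted in-flight load per host
        self.heavy_cost = heavy_cost             # assumed cost of one heavy query

    def route(self, heavy=False):
        """Pick the host with the lowest weighted load and charge the query to it."""
        host = min(self.load, key=self.load.get)
        self.load[host] += self.heavy_cost if heavy else 1
        return host

    def done(self, host, heavy=False):
        """Release the load charged for a finished query."""
        self.load[host] -= self.heavy_cost if heavy else 1
```

With three hosts, a heavy query followed by light ones spreads out instead of piling onto one host:

```python
router = Router(["db-1", "db-2", "db-3"])
router.route(heavy=True)  # goes to db-1
router.route()            # goes to db-2
router.route()            # goes to db-3
```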

We apologize for the inconvenience this outage caused and are committed to preventing similar incidents in the future.

Posted Aug 26, 2024 - 18:24 IST

Resolved

The monitoring system hasn't detected any further issues, so we're closing this incident for now.

Posted Aug 26, 2024 - 15:00 IST

Monitoring

A fix has been implemented and we are monitoring the system.

Posted Aug 26, 2024 - 14:34 IST

Investigating

We are investigating an incident affecting Cloud PBX. We will provide updates via email and Statuspage shortly.

Posted Aug 26, 2024 - 14:19 IST