Degraded connectivity issues between Control plane and workload clusters
Incident Report for Cloudera
Postmortem

We recently implemented an upgrade to our internal proxy service, responsible for directing requests to workloads through the Cluster Connectivity Management (CCM) system. This update aimed to enhance the identification of healthy routes for traffic flow. However, an unforeseen issue impacted this functionality, causing routing through potentially unhealthy paths for selected configurations. The same was fixed by rolling back the release. 

It's important to note that this problem did not appear during testing in our lower environments, where rigorous evaluation precedes production deployments. Since the incident, we have promptly implemented additional safeguards to effectively identify and prevent similar issues from affecting production systems in the future.

We sincerely apologize for any inconvenience this service disruption may have caused. We remain committed to delivering a reliable and robust platform, and appreciate your understanding.

Posted Feb 14, 2024 - 15:23 UTC

Resolved
We observed degraded connectivity issues between control plane and workload for a subset of customers.
This impacted operations like orchestration, auto-scaling and administration of workloads from control plane.
The incident has been resolved now and no action is needed from customers.
A root cause analysis (RCA) will be published within seven business days.
Posted Jan 31, 2024 - 10:00 UTC