Private cluster deployments on our EU control plane were impacted by an issue stemming from our User Management Service (UMS). UMS is responsible for authenticating all calls to the Control Plane. The service did not autoscale as expected to meet the increased traffic on the EU control plane and was eventually overloaded. This delayed responses to authentication calls, which in turn caused a breakdown in our network connectivity (CCM) from the Control Plane to private workloads.
Existing workloads were not impacted; however, new private deployments and operations such as update, upgrade, and delete were affected.
We have implemented corrective measures within the automation to prevent similar occurrences in the future.
We sincerely apologize for any inconvenience this service disruption may have caused. We remain committed to delivering a reliable and robust platform and appreciate your understanding.
Posted Mar 11, 2024 - 13:35 UTC
Resolved
Current Status: Our teams have confirmed that the implemented solution has addressed the issue. If you continue to experience issues, please raise a support case with us. A root cause analysis (RCA) will be published within seven business days.
Posted Feb 28, 2024 - 15:09 UTC
Update
We have made additional progress in isolating the issue today. Please expect further updates by tomorrow.
Posted Feb 27, 2024 - 18:13 UTC
Update
Current Status: We are continuing to work on isolating the issue. Please expect further updates by tomorrow.
Posted Feb 26, 2024 - 23:00 UTC
Investigating
Current Status: Existing workloads should be working as expected. Only new deployments of private CML/CDW/CDF/CDE experiences are failing in the EU Control Plane region.
This is currently being investigated. We expect to have further updates towards the end of business today.
Posted Feb 26, 2024 - 17:50 UTC
This incident affected: Cloudera Data Platform (EU) (DataFlow, Data Engineering, Data Warehouse, Machine Learning).