Control Plane Issues
Incident Report for Cloudera
Postmortem

On Oct 5th, 2023, customers reported that Cloudera Manager instances in their environments were showing UNREACHABLE status for both Data Lakes and Data Hubs. Upon further investigation, the issue was attributed to two independent production changes.

  1. A new software release for connectivity between Control Plane and Workload, which hit an edge case during certain operations. This was addressed by rolling back to the previous software version.
  2. A decrease in the resources allocated to one of our internal systems, which is responsible for storing key secrets. This was addressed by reverting to the previous configuration.

These issues did not manifest in our lower environments, where the changes were tested prior to rolling out to Production. As a mitigation action, our teams are working on adding monitoring and alerting around these corner cases and will introduce additional checks and balances before any resource changes are implemented in our production system.

Posted Oct 16, 2023 - 12:31 UTC

Resolved
The implemented fix has shown positive results and resolved the connectivity issues.
Posted Oct 05, 2023 - 23:07 UTC
Update
The overall control plane has stabilized. We're investigating connectivity issues that are impacting some customers. A potential fix is being implemented to resolve them.
Posted Oct 05, 2023 - 22:34 UTC
Monitoring
We've implemented a fix to our control plane service. We're seeing services recover, albeit with higher latency.
Posted Oct 05, 2023 - 21:26 UTC
Identified
Issues have been identified in an internal control plane service. Multiple services are currently impacted.
Posted Oct 05, 2023 - 20:49 UTC
Investigating
We are investigating problems affecting the US control plane.
Posted Oct 05, 2023 - 19:27 UTC
This incident affected: Cloudera Data Platform (US) (CDP Management Console, Cloudera Observability, CDP IAM, DataFlow, Data Engineering, Data Warehouse, Operational Database, Machine Learning, Data Hub, Data Catalog, Replication Manager).