On January 18, 2023, between 13:00 UTC and 15:00 UTC, the us-west CDP Control Plane was in a degraded state, causing environment creation failures. The teams were notified about this incident immediately. On investigation, the team found that one of the services responsible for FreeIPA management was suffering from resource starvation. Although this service was configured to be highly available, the cascading impact caused more service instances to fail. This eventually led to failures of new environment creation, although existing environments continued to work normally.
As mitigation, the team is adding alerts to monitor for this condition and is working to increase the resources available to this service in an automated manner.
Posted Jan 30, 2023 - 04:11 UTC
Resolved
All services are back up and operating as expected.
Posted Jan 18, 2023 - 16:40 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jan 18, 2023 - 15:18 UTC
Update
The issue appears to be partially resolved.
Posted Jan 18, 2023 - 15:12 UTC
Update
We are continuing to investigate this issue.
Posted Jan 18, 2023 - 15:10 UTC
Investigating
The SRE team is currently investigating an issue with the us-west control plane. This impacts all management operations, including the ability to launch clusters, modify users, etc. However, please note that customer workloads are not impacted.
Posted Jan 18, 2023 - 14:00 UTC
This incident affected: Cloudera Data Platform (US) (CDP Management Console, CDP IAM).