FreeIPA clusters are in an unreachable state
Incident Report for Cloudera
Postmortem

On February 14, 2024, a scheduled monthly Kubernetes maintenance activity unintentionally caused a 30-minute outage for the FreeIPA service within the Control Plane. The service itself remained functional on workloads. The outage stemmed from a bug introduced during the maintenance that affected the Inverting Proxy component, which facilitates communication between the Control Plane and workloads.

The team promptly identified and rectified the bug, restoring full service within approximately 30 minutes. We have also implemented corrective measures in the maintenance automation to prevent similar occurrences in the future.

We sincerely apologize for the inconvenience this incident may have caused.

Posted Feb 24, 2024 - 02:07 UTC

Resolved
Current Status: Our teams have confirmed that the implemented solution has addressed the issue. If you continue to experience issues, please raise a support case with us.

A root cause analysis (RCA) will be published within seven business days.

Customer Experience: Customers may observe FreeIPA service in an unreachable status.

Incident Start time: ~10:53 UTC February 14th, 2024
Posted Feb 14, 2024 - 17:50 UTC
Monitoring
Current Status: Our teams have identified the source of the issue and have implemented a solution, which is now being monitored.
We will have another update within 60 mins.
Customer Experience: Customers may observe FreeIPA service in an unreachable status.
Incident Start time: ~10:53 UTC February 14th, 2024
Posted Feb 14, 2024 - 16:54 UTC
Identified
Current Status: Our teams have identified the source of the issue.
We are working on developing and implementing a solution to restore the service(s).
We will have another update within 60 mins.

Customer Experience: Customers may observe FreeIPA service in an unreachable status.

Incident Start time: ~10:53 UTC February 14th, 2024
Posted Feb 14, 2024 - 16:26 UTC
Update
We are continuing to investigate this issue.
Posted Feb 14, 2024 - 16:19 UTC
Investigating
Current Status: We are currently investigating a potential issue with the FreeIPA service.
Customer Experience: Customers may observe FreeIPA service in an unreachable status.
Posted Feb 14, 2024 - 16:05 UTC
This incident affected: Cloudera Data Platform (US) (DataFlow, Data Engineering, Data Warehouse, Operational Database, Machine Learning, Data Hub, Data Catalog).