The intermittent performance and access issues with the Cloudera Management Console and FreeIPA were triggered by a scheduled service upgrade intended to improve platform stability.
Our investigation determined that a change in a core component introduced a latent configuration issue. This condition did not surface in our testing environments. In production, it prevented dependent services from applying dynamic configuration updates, which led to the outage.
We've taken immediate action to prevent this issue from recurring:
System Fix: The problematic component change was rolled back, and a permanent patch was deployed to restore proper dynamic configuration functionality.
Process Overhaul: We've implemented a more rigorous upgrade process with mandatory validation steps, run at near-production scale, that specifically test for this type of configuration failure.
Enhanced Monitoring: We've significantly improved our monitoring and alerting capabilities to detect these abnormal service behaviours much earlier, enabling a faster response.
We are dedicated to providing a reliable platform and will continue to invest in our infrastructure and processes. Thank you again for your patience and understanding.