On July 6, 2021 at 08:50 PDT, the CDP Control Plane became inaccessible to new sessions. User login completed successfully, but the Control Plane welcome screen did not render and users received a 503 error. This condition persisted for 57 minutes.
Analysis revealed that the active Vault Secrets Store was presenting an expired certificate. Although the Vault Secrets Store certificate had been renewed before the expiration date, the running Vault service did not recognize the new certificate. In previous years, Vault had been restarted for other reasons between the renewal and the expiration date, which masked this process defect. An expired Vault certificate disables access to the Secrets Store. The Audit Service relies on the Secrets Store to function, and auditing is central to our application architecture: by design, a loss of the Audit Service shuts down access to the Control Plane. The blast radius of this incident was limited to Control Plane and cluster management operations. No customer workload cluster was impacted by this incident.
Restarting the Vault Secrets Store Service allowed the service to recognize the new (valid) certificate, which resolved the incident.
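The failure mode described above, a certificate renewed on disk while the running service keeps presenting the old one, can be detected by checking the certificate the live endpoint actually serves rather than the renewed file. The following is a minimal sketch of such a check, not a description of Cloudera's tooling. The function names, the 30-day threshold, and the host in the usage note are all hypothetical; it also assumes the endpoint's issuing CA is trusted by the default trust store of the machine running the check.

```python
import socket
import ssl
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400.0


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the format returned by ssl.getpeercert(),
    for example "Jan  5 09:34:43 2030 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / SECONDS_PER_DAY


def needs_attention(not_after: str, now: datetime,
                    threshold_days: float = 30.0) -> bool:
    """True when the certificate expires within the (assumed) threshold."""
    return days_until_expiry(not_after, now) < threshold_days


def served_not_after(host: str, port: int, timeout: float = 5.0) -> str:
    """Return notAfter of the certificate the live endpoint presents.

    This inspects what the running process serves, not the file on disk.
    With the default context, an already-expired certificate makes the
    handshake raise ssl.SSLCertVerificationError, which is itself a signal.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A scheduled job could call `needs_attention(served_not_after("vault.example.internal", 8200), datetime.now(timezone.utc))` and raise an alert when it returns `True`; the host name here is a placeholder. Running the same check immediately after a renewal would also show whether the service has picked up the new certificate.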
Cloudera has identified the following corrective actions: