Cloudera AI Inference and Model registry Creation Failures
Incident Report for Cloudera
Postmortem

On June 27, 2024, our internal monitoring detected an issue with new deployments and upgrades of the Cloudera AI Inference and Model registry on our US control plane. Further investigation revealed configuration issues between our internal micro-services. Changes were made to address the issue, and corrective actions have been implemented.

We apologize for any inconvenience caused by the service disruption. We are fully committed to providing a reliable and robust platform and truly appreciate your understanding.

Posted Jul 16, 2024 - 06:08 UTC

Resolved
Current Status: Our teams have successfully deployed a fix for the issue and confirmed that it has been resolved. If you are still experiencing issues or have any questions, please raise a support case with us. A root cause analysis (RCA) will be published within seven business days.
Customer Experience: Customers may observe issues when performing new deployments and model registry upgrades.
Incident Start time: 5 PM UTC June 25th, 2024
Incident End time: 5:34 AM UTC June 28th, 2024
Posted Jun 28, 2024 - 11:50 UTC
Update
Current Status: We continue to investigate the issue in the AP and EU regions. The US region is functioning as expected. We will have another update within 60 mins.

Customer Experience: Customers may observe issues when performing new deployments and model registry upgrades.
Incident Start time: 5 PM UTC June 25th, 2024
Posted Jun 28, 2024 - 05:34 UTC
Update
Current Status: Our teams have identified and fixed the issue in the US region. We are still investigating the issue in the AP and EU regions. We will have another update within 60 mins.

Customer Experience: Customers may observe issues when performing new deployments and model registry upgrades.
Incident Start time: 5 PM UTC June 25th, 2024
Posted Jun 28, 2024 - 04:26 UTC
Identified
Current Status: Our teams have identified the source of the issue. We are working on developing and implementing a solution to restore the service(s). We will have another update within 60 mins.

Customer Experience: Customers may observe issues when performing new deployments and model registry upgrades.
Incident Start time: 5 PM UTC June 25th, 2024
Posted Jun 28, 2024 - 03:27 UTC
Investigating
Current Status: We are currently investigating a potential issue with Cloudera AI Inference and Model registry. We will have an update within 60 mins.

Customer Experience: Customers may observe issues when performing new deployments and model registry upgrades.
Incident Start time: 5 PM UTC June 25th, 2024
Posted Jun 28, 2024 - 02:25 UTC
This incident affected: Cloudera Data Platform (AP) (Machine Learning), Cloudera Data Platform (EU) (Machine Learning), and Cloudera Data Platform (US) (Machine Learning).