The service incident was caused by a failure in the running instance of a service we use from our cloud provider to run our AI service. This led to bots not responding as expected, as well as bots being unpublishable, and the AI being untrainable.
With the information gathered in the process of troubleshooting this incident, we will seek to implement more detailed monitoring specifically for this service. Additional monitoring will allow us to quickly and effectively diagnose any issue affecting the service and restore service more expediently if an event such as this should reoccur.
The issue pertained to our AI service which was not functioning as expected. The service has been restarted and is now working again. Root cause analysis to come.
We are seeing reports that bots are not responding as expected, as well as webhooks not firing correctly.
Developers have been notified and are actively investigating.
More information will be added to this post when available.
Please sign in to leave a comment.