To handle cases in which end users send several messages in quick succession (before the bot has responded), new functionality was added to the platform so that it responds appropriately. Combined with an extremely large volume of messages, this functionality caused one of our services to become blocked, which resulted in downtime on the platform.
Mitigation actions & resolution:
To locate the root cause of the issue, additional logging was implemented to improve traceability. These enhanced logs allowed us to find the root cause and will make it easier to pinpoint similar issues in the future.
To resolve the issue, a patch was released to prevent the problem from recurring.
September 29th, 12:30 CEST:
Our developers have deployed the fix for this problem, and we do not expect any more downtime as a result of this issue. More information about the root cause and how the issue was resolved will be added when available.
September 28th, 12:25 CEST:
A service incident took place at 11:15 CEST and lasted until 11:35 CEST. During this time, users may have found the platform slower than usual and bots not responding as expected.
Service has been restored, and our developers are still investigating the issue.
More information will be added to this post when available.