Degraded

Delivery API data is delayed

Dec 14 at 12:08pm CET
Affected services
Delivery API

Resolved
Dec 15 at 10:12am CET

We want to follow up on yesterday's incident.

This was unlike anything we have seen before. The trigger of the incident was a customer performing a re-seed on a medium to large-sized tenant. One schema in particular was, unfortunately, crafted in a way that amplified the load. Almost all source entities triggered that schema, and because the schema had an action that triggered its parent entity, the re-seed cascaded into millions of view generations in just 2 hours.

In this scenario, a lot of the view generation was unnecessary. However, because our processing layer is distributed and scales horizontally, we can't easily detect when a view will be superseded by a newer view just seconds later. We have accepted a level of "over processing", as we have regarded this as the most effective strategy.
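To illustrate what "superseded" means here, the following is a minimal single-process sketch, not a description of our processing layer. It assumes each job can be identified by a view key, and the names `CoalescingQueue`, `view_key` and `version` are hypothetical. In a distributed setup the same bookkeeping would need shared state across workers, which is exactly why it is hard to do cheaply.

```python
from collections import OrderedDict


class CoalescingQueue:
    """Keeps at most one pending job per view key (hypothetical model)."""

    def __init__(self):
        # view_key -> latest requested version, in first-enqueued order
        self._pending = OrderedDict()
        # Number of jobs dropped because a newer one replaced them
        self.superseded = 0

    def enqueue(self, view_key, version):
        if view_key in self._pending:
            # An older job for the same view is still waiting: replace it.
            self.superseded += 1
        # Re-assigning an existing key keeps its original queue position.
        self._pending[view_key] = version

    def dequeue(self):
        """Return the oldest pending (view_key, version) pair, or None."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

    def __len__(self):
        return len(self._pending)


# Three updates to the same parent view collapse into one pending job.
queue = CoalescingQueue()
for version in range(1, 4):
    queue.enqueue("parent-page", version)
queue.enqueue("other-page", 1)
```

In this toy example four enqueue calls leave only two jobs to process; the two intermediate versions of `parent-page` are never generated.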

During the day we tuned our processing layer to run at more than 3 times the speed of Tuesday.

It was not possible to purge the queue without compromising data consistency. We therefore chose the option that offered the safest way back to normal.

Unfortunately, this incident caused a delay for all of our customers yesterday. Such a delay falls short of the service we want to deliver to our customers. We have therefore planned the following mitigating actions:
Improved monitoring and alarming around processing queues and processing time
Isolation of each tenant's processing queue
Detection of potential cascading events
Detection of potential "over processing"
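
As a rough illustration of what queue isolation could look like, here is a minimal sketch that gives each tenant its own queue and serves tenants in round-robin order. It is an assumption-laden toy model, not our planned implementation; `TenantIsolatedScheduler`, the tenant IDs and the job names are all hypothetical.

```python
from collections import deque


class TenantIsolatedScheduler:
    """Round-robin over per-tenant queues (hypothetical model)."""

    def __init__(self):
        self._queues = {}      # tenant_id -> deque of pending jobs
        self._order = deque()  # tenants with pending work, in rotation order

    def enqueue(self, tenant_id, job):
        if tenant_id not in self._queues:
            self._queues[tenant_id] = deque()
            self._order.append(tenant_id)
        self._queues[tenant_id].append(job)

    def next_job(self):
        """Return (tenant_id, job) for the next tenant in rotation, or None."""
        if not self._order:
            return None
        tenant_id = self._order.popleft()
        tenant_queue = self._queues[tenant_id]
        job = tenant_queue.popleft()
        if tenant_queue:
            # Tenant still has work: send it to the back of the rotation.
            self._order.append(tenant_id)
        else:
            del self._queues[tenant_id]
        return tenant_id, job


# One tenant floods the system; another tenant submits a single job.
scheduler = TenantIsolatedScheduler()
for i in range(1000):
    scheduler.enqueue("reseeding-tenant", f"view-{i}")
scheduler.enqueue("small-tenant", "view-a")
first_two = [scheduler.next_job() for _ in range(2)]
```

With a single shared queue, the small tenant's job would wait behind all 1000 jobs. With per-tenant rotation it is served second.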

If you have any questions, then please reach out to support@enterspeed.com.

Emil Rasmussen, CTO

Updated
Dec 14 at 08:39pm CET

The entire queue was processed by 19:31 CET, and we are no longer experiencing any delays in our processing layer.

We are very sorry for the delays. We have several short-term and long-term changes ahead to prevent a similar situation from happening again.

Please reach out at support@enterspeed.com if you have any questions.

Emil Rasmussen, CTO

Updated
Dec 14 at 07:16pm CET

We are on track for our previously estimated end time of 20:00 CET. The latest update indicates that we will be done slightly earlier.

We will update again at 20:45 CET.

Updated
Dec 14 at 05:01pm CET

We are on track for our previously estimated end time of 20:00 CET.

We will update again at 19:30 CET.

Updated
Dec 14 at 03:25pm CET

We are on track for our previously estimated end time of 20:00 CET.

We are very sorry for the delays. We have identified the root cause and will work towards eliminating it in the future.

We will update again at 17:00 CET.

Updated
Dec 14 at 02:30pm CET

We are on track for our previously estimated end time of 20:00 CET.

We will update again at 15:30 CET.

If you have any questions, please reach out to us at support@enterspeed.com.

Updated
Dec 14 at 12:59pm CET

Our current estimate is that the processing queue will be done at 20:00 CET.

We will report with a new update at 14:30 CET.

Updated
Dec 14 at 12:19pm CET

We have identified a large number of processing jobs in our queue. Since 03:07 CET our processing layer has been working through ~8 million events per hour.

(We previously reported an incorrect estimate of when all data will be processed. We are currently increasing our processing capabilities and will report a new estimate in the next update.)

We will continue monitoring and report a new status at 13:15 CET.

We are sorry for the delays.

Created
Dec 14 at 12:08pm CET

The Delivery API data (views processing) is currently experiencing delays.

We are currently investigating a delay in our processing layer. We will update when we have more info (no later than 12:45 CET).