Degraded

Ingest API suffers from a very low volume of intermittent errors

Aug 23, 2024 at 12:36am UTC
Affected services
Ingest API

Resolved
Aug 29, 2024 at 6:56am UTC

Having monitored the metrics since the workaround was implemented yesterday afternoon, we can now resolve the Ingest API degradation issue.

We will publish a full post-mortem once we have debriefed with our cloud service provider.

We are sorry for the inconvenience. Please reach out if you have any questions.

Updated
Aug 28, 2024 at 7:48pm UTC

Our monitoring systems show that we have successfully mitigated the issue. We will continue to monitor and to work with our cloud service provider's support team to investigate the cause of the issue.

We will keep the issue open until we are certain that no further issues affect our customers.

Updated
Aug 28, 2024 at 4:10pm UTC

The first results indicate that our workaround is working and effectively eliminating the 502 errors. We will continue to monitor and evaluate the next steps.

Updated
Aug 28, 2024 at 1:39pm UTC

We are currently working on two different strategies. First, we are trying a workaround that uses a different underlying resource type, and we are now monitoring its effects. Second, our cloud service provider has identified potential internal issues, which have now been escalated to their internal team.

We will keep you updated as we learn more.

Updated
Aug 28, 2024 at 10:23am UTC

Since the last update, we have continued to rule out possible causes and monitor the effects. We are working with our cloud service provider's support team, who are actively investigating the cause of the issue.

Updated
Aug 27, 2024 at 10:13am UTC

We continue to walk through the debugging and mitigation steps, but given the intermittent nature of this issue we cannot determine the effect of each step until a 2-12 hour period has passed.

Please reach out on your Slack support channel or via support@enterspeed.com if you have any questions.

Updated
Aug 26, 2024 at 11:40am UTC

We are trying out various debugging and mitigation strategies. It is proving to be a slow process, as the issue occurs at random and with up to 12 hours between occurrences. This of course has a big negative impact on the feedback cycle for our various initiatives.

We are working together with our cloud service provider's support team to understand and mitigate the issue.

Updated
Aug 25, 2024 at 5:55pm UTC

We are continuing to monitor the situation and will update this page as we learn more.

Updated
Aug 24, 2024 at 5:29pm UTC

We are continuing to monitor the situation and the errors reported are still very low.

We will update as we learn more.

Created
Aug 23, 2024 at 12:36am UTC

Our monitoring systems are reporting a very low number of Ingest API errors. We have many thousands of successful requests every hour, but an estimated 0.1% of Ingest API requests return a 502.

Our current analyses point to an intermittent failure in our cloud partner's load balancer, and we are working together with their support team to further diagnose the issue.

As always, we recommend using a retry strategy when ingesting data into Enterspeed to protect against intermittent network issues.
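
For illustration only, a retry strategy of this kind could look roughly like the Python sketch below. It is not an official client. The function name `ingest_with_retry`, the `send` callable (which performs one ingest request and returns the HTTP status code), the default attempt count and delays, and the decision to also retry 503 and 504 are all assumptions made for the example. It also assumes your ingest requests are safe to repeat.

```python
import random
import time
from typing import Callable

# 502 is the status reported in this incident. Treating 503 and 504 as
# retryable too is an assumption made for this sketch.
RETRYABLE_STATUSES = {502, 503, 504}


def ingest_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs a single ingest request
    and returns its HTTP status code. The last status code seen is returned.
    """
    status = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUSES:
            break
        # Exponential backoff with full jitter: wait a random time between
        # 0 and an upper bound that doubles on each retry, capped at max_delay.
        upper_bound = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, upper_bound))
        status = send()
    return status


if __name__ == "__main__":
    # Simulated request that fails twice with a 502 and then succeeds.
    responses = iter([502, 502, 200])
    print(ingest_with_retry(lambda: next(responses), sleep=lambda _: None))
```

The random jitter spreads retries out so that many clients hitting the same intermittent error do not all retry at the same moment.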

We apologise for any inconvenience. Please reach out via your Slack support channel or via email if you have any questions.