Ingest API suffers from a very low volume of intermittent errors
Resolved
Aug 29 at 08:56am CEST
Having observed the metrics since the workaround was implemented yesterday afternoon, we can now resolve the Ingest API degradation issue.
We will follow up with a full post-mortem once we have debriefed with our cloud service provider.
Sorry for the inconvenience and please reach out if you have any questions.
Affected services
Ingest API
Updated
Aug 28 at 09:48pm CEST
Our monitoring systems show that we have successfully mitigated the issue. We will continue to monitor, engage with our cloud service provider's support team, and investigate the cause of the issue.
We will keep the issue open until we are certain that no further issues affect our customers.
Affected services
Ingest API
Updated
Aug 28 at 06:10pm CEST
The first results indicate that our workaround is working and effectively eliminating the 502 errors. We will continue to monitor and evaluate the next steps.
Affected services
Ingest API
Updated
Aug 28 at 03:39pm CEST
Currently, we are working on two different strategies. First, we are trying a workaround that uses a different underlying resource type, and we are now monitoring its effects. Second, our cloud service provider has identified potential internal issues, which have now been escalated to their internal team.
We will keep you updated as we learn more.
Affected services
Ingest API
Updated
Aug 28 at 12:23pm CEST
Since the last update, we have continued to rule out possible causes and monitor the effects. We are working together with our cloud service provider's support team, who are actively investigating the cause of the issue.
Affected services
Ingest API
Updated
Aug 27 at 12:13pm CEST
We continue to walk through the debugging and mitigation steps, but because of the intermittent nature of this issue we cannot determine the effect of each step until a period of 2-12 hours has passed.
Please reach out on your Slack support channel or via support@enterspeed.com if you have any questions.
Affected services
Ingest API
Updated
Aug 26 at 01:40pm CEST
We are trying out various debugging and mitigation strategies. It is proving to be a slow process, as the issue occurs at random and with up to 12 hours between occurrences. This, of course, has a big negative impact on the feedback cycle for our various initiatives.
We are working together with our cloud service provider's support team to understand and mitigate the issue.
Affected services
Ingest API
Updated
Aug 25 at 07:55pm CEST
We are continuing to monitor the situation and will update this page as we learn more.
Affected services
Ingest API
Updated
Aug 24 at 07:29pm CEST
We are continuing to monitor the situation, and the number of reported errors is still very low.
We will update as we learn more.
Affected services
Ingest API
Created
Aug 23 at 02:36am CEST
Our monitoring systems are reporting a very low number of Ingest API errors. We have many thousands of successful requests every hour, but an estimated 0.1% of Ingest API requests return a 502.
Our current analyses point to an intermittent failure in our cloud partner's load balancer, and we are working together with their support to further diagnose the issue.
As always, we recommend using a retry strategy when ingesting data into Enterspeed to protect against intermittent network issues.
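As an illustration only, a retry strategy of this kind could look like the minimal Python sketch below. It retries on 502 and similar gateway errors using exponential backoff with jitter. The function name `ingest_with_retry`, the `send` callable, the set of retryable status codes, and the delay values are all assumptions made for this example and are not part of the Enterspeed API. It also assumes that repeating the same ingest request is safe in your integration.

```python
import random
import time

# Assumption: transient gateway errors that are worth retrying.
RETRYABLE_STATUS = {502, 503, 504}


def ingest_with_retry(send, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable that performs one ingest
    request and returns the HTTP status code as an int.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status = send()
        # Stop on success, on a non-retryable error, or on the last attempt.
        if status not in RETRYABLE_STATUS or attempt == max_attempts:
            return status
        # Exponential backoff capped at max_delay, with full jitter so that
        # many clients do not retry at the same moment.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, cap))
```

Here `send` would wrap whatever HTTP client call you already use for ingesting; the last status code is returned so the caller can still log or handle a request that keeps failing.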
We apologise for any inconvenience and please reach out via your Slack support channel or via email if you have any questions.
Affected services
Ingest API