Degraded

Certain schemas don't create the expected output

Nov 24 at 02:29pm CET
Affected services
Delivery API

Resolved
Nov 25 at 03:49pm CET

After yesterday's incident, we have worked to understand the cause and scope of the issue.

The issue affected tenants using the partial schema feature who updated content during a roughly 90-minute window. The exact symptoms depended on the specific content and configuration of each customer solution.

Internally we categorise this incident as level 2 "High", and when incidents like this happen we are not satisfied. This type of issue should never happen, but when it does we want to understand both what happened and where our tools or processes failed.

In this incident, two things happened. First, the actual issue causing the service to fail was a simple mistake in the deployed code. A last-minute change made during our peer review process introduced the failure: a comparison-logic mistake that was not detected in the subsequent review and approval.
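To illustrate how easily a comparison-logic mistake of this kind can slip past a reviewer, here is a purely hypothetical sketch. The function names and the `partial` flag are inventions for illustration; the actual Enterspeed code has not been published.

```python
# Hypothetical illustration only -- not the actual Enterspeed code.

def is_partial_schema(schema: dict) -> bool:
    # Intended behaviour: treat a schema as partial when the flag is set.
    return schema.get("partial") is True

def is_partial_schema_buggy(schema: dict) -> bool:
    # A last-minute edit that flips the comparison is easy to miss in a
    # diff: this version now returns True for every NON-partial schema.
    return schema.get("partial") is not True
```

A one-token change like `is` versus `is not` inverts the behaviour for every input, which is why such mistakes normally rely on tests rather than review to be caught.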

This first issue is a simple mistake of the kind software engineers make every day. It very rarely reaches end users, as we have both manual and automated testing procedures. Due to the simplicity of the change, the engineers skipped the manual test for the last iteration of the review process. This is allowed by our process, as we have multiple levels of automated tests. And that brings us to the second issue behind this failure.

Second, the automated tests that give us the confidence to release often and fast did not function as intended. In what we now see as very unfortunate timing, we changed our test suite two days before the failed deployment. This resulted in the tests reporting green while the actual tests failed in the background. We therefore relied on a faulty test. We also want to note that the test suite goes through the same peer review process. This is somewhat speculative, but as the test suite was new, our collective experience with it was not as deep as with most of our other code bases. Whatever the cause, the wrongly configured test suite was the single most important factor in this incident.
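One common way a test suite can "report green" while the real check fails in the background is when failures are raised somewhere the runner never observes, for example inside a worker thread. The following is a hypothetical sketch of that failure mode, not Enterspeed's actual test setup.

```python
# Hypothetical sketch of a silently-passing test runner -- not the
# actual Enterspeed test suite.
import threading

def run_test_in_background(test_fn) -> bool:
    # An exception raised inside a thread does not propagate to the
    # caller; it is only printed to stderr. The runner therefore never
    # sees the failure and reports success unconditionally.
    worker = threading.Thread(target=test_fn)
    worker.start()
    worker.join()
    return True  # "green", regardless of what test_fn actually did

def failing_test():
    assert 1 == 2, "this failure is swallowed by the worker thread"
```

Here `run_test_in_background(failing_test)` still returns `True`, which is why a misconfigured suite can pass for days before anyone notices the assertions are not being checked.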

We work every day to keep Enterspeed running smoothly, and yesterday we failed. We will continue to evaluate our processes and improve where needed. We are very sorry for the inconvenience. If you have further questions or comments, you are always welcome to reach out to support@enterspeed.com.

Emil Rasmussen, CTO

Updated
Nov 24 at 03:57pm CET

After we received reports of failing schemas, we initiated the rollback procedure. The rollback was complete at 15:57 CET.

The error was related to a planned deployment. We are investigating the exact cause of the error and under what circumstances it manifested itself.

We are very sorry for the inconvenience. Don't hesitate to reach out on your normal Slack support channel or via support@enterspeed.com if you need any information or help with your specific tenant.

We will follow up with a more detailed report on what happened and what we are doing to avoid this in the future.

Emil Rasmussen, CTO

Created
Nov 24 at 02:29pm CET

Following a planned deployment, we received reports that certain schemas don't create the expected output.