The Shadow Worrier: For a Future of Only Delightful API Surprises
Written by Simen Svale
We needed a way to introduce changes in our APIs without breaking people’s code. So we made The Shadow Worrier. Here’s the story of how it works and why we made it.
At its best, the Cloud hides all the hard, boring yarn of machinery behind a cover of beautiful and fluffy APIs. It frees us to get on with what we actually want to do. However, as cloud dwellers, we live with worry: APIs that suddenly change in the night, breaking perfectly fine code we wrote months or years back and forcing us into pointless firefighting when we should be getting on with the awesome.
Surprises should always be delightful and lovingly deliberate, like gifts. We had to contend with this rule a year back as we were fixing some completely innocuous inconsistencies in our APIs. It turned out that for every change we made, we broke someone's stuff. We'd reached that humbling, yet annoying milestone in the life of a cloud service where every inconsistency was being relied upon by someone, somewhere.
We detest APIs that go bump in the night. It feels awful to have to inflict that on people using our services. Sanity's backend should feel light and easy yet solid: the Cloud upon which you can safely build your castle.
Our API inconsistencies are now part of our service offering. If anyone relies on some mistake we made and it does not represent a security issue, we aim to maintain it. Our mistakes should never become your problem. Maintaining mistakes, however, poses some challenges for us. We have expansive test suites to help us keep our services working as intended as we change them. However, they can't capture every permutation that arises from real use: by definition, the queries and edge cases our tests don't cover are the ones unknown to us. Those are exactly the ones you shouldn't have to worry about; let us take that toll.
We too like to sleep at night, though! So we built The Shadow Worrier. The Shadow Worrier's purpose is to look out for everyone who depends on some behavior in our APIs. It runs the upcoming version of our APIs in the background; we call this the Shadow Backend. When your apps access our APIs, we service your requests as fast as we can. Then, behind the scenes, the Worrier re-runs your requests on the Shadow Backend. Any discrepancy between the two results means we are about to break a query that we didn't know of and did not cover in our tests. When that happens, we add the newly seen behavior to our test cases and make sure we maintain it going forward. That's how The Shadow Worrier fights unwanted surprises.
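The replay-and-compare idea can be sketched roughly as follows. This is only an illustration in Python, not our actual implementation: `Outcome`, `run`, `shadow_check`, and the two stand-in backends are hypothetical names, and the example borrows the count("banana") case mentioned later in this post.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Outcome:
    """What a backend did with a query: a value, or an error message."""
    value: Any = None
    error: Optional[str] = None


def run(backend: Callable[[str], Any], query: str) -> Outcome:
    """Run a query and capture errors as data, so errors can be compared too."""
    try:
        return Outcome(value=backend(query))
    except Exception as exc:
        return Outcome(error=str(exc))


def shadow_check(query, production, shadow, discrepancies):
    """Serve from production, replay on the shadow, record any difference.

    In a real system the replay would happen after the response has been
    sent; here both calls are made in sequence to keep the sketch short.
    """
    served = run(production, query)
    replayed = run(shadow, query)
    if served != replayed:
        discrepancies.append(
            {"query": query, "production": served, "shadow": replayed}
        )
    return served  # the caller only ever sees the production outcome


# Hypothetical stand-ins for the current and the upcoming API version.
def production_backend(query: str) -> Any:
    if query == 'count("banana")':
        raise ValueError("Invalid function call: count(string)")
    return len(query)


def shadow_backend(query: str) -> Any:
    if query == 'count("banana")':
        return None  # the "fixed" behavior in the upcoming version
    return len(query)


log = []
shadow_check('count("banana")', production_backend, shadow_backend, log)
print(len(log))
```

The design choice worth noting is that an error counts as a result like any other: a query that used to fail and now succeeds is still a behavior change that someone may depend on.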

Does it mean we never fix bugs or introduce API-breaking changes? Of course not! That's where versioned APIs come in. Our goal is that even the weird stuff in any given version of our API is as dependable as the consistent, intended stuff. When you are ready for our fixes, you explicitly upgrade to the newer version and fix any breakage you might encounter in a controlled situation, with help from our migration guides. This way we get to keep moving, and you still get to sleep at night even when we realize the result of that all-important (to you for some inexplicable reason) count("banana") should be null instead of throwing Invalid function call: count(string).
The architecture of The Shadow Worrier and the Shadow Backend is largely made possible by running our backend with Kubernetes. When we're introducing new features, we put these on the shadow branch and create a new deployment internally on the cluster. Then we enable the endpoints to the ingress, which routes traffic to the production clusters as well as The Shadow Worrier. It's a pretty simple setup in terms of operations, but oh so useful.
The Shadow Worrier then produces logs and statistics for every deviation it finds. As more and more people have started to use Sanity, we don't need to run The Shadow Worrier for very long, or very many times, before we have a decent amount of data on how queries behave across versions.
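To give a feel for what such statistics might look like, here is a small sketch that groups recorded deviations by how the outcome changed between versions. The record format, the category names, and the sample data are all made up for illustration; this is not the format The Shadow Worrier actually logs.

```python
from collections import Counter


def classify(production, shadow):
    """Label one deviation by how the outcome changed between versions.

    Each outcome is a hypothetical (kind, payload) pair, where kind is
    either "ok" or "error".
    """
    prod_kind, shadow_kind = production[0], shadow[0]
    if prod_kind == "error" and shadow_kind == "ok":
        return "error_became_value"
    if prod_kind == "ok" and shadow_kind == "error":
        return "value_became_error"
    if prod_kind == "error":
        return "error_message_changed"
    return "value_changed"


def summarize(deviations):
    """Count deviations per category."""
    return Counter(classify(d["production"], d["shadow"]) for d in deviations)


# Made-up sample records, purely for illustration.
sample = [
    {"query": 'count("banana")',
     "production": ("error", "Invalid function call: count(string)"),
     "shadow": ("ok", None)},
    {"query": "query A", "production": ("ok", 1), "shadow": ("ok", 2)},
    {"query": "query B", "production": ("ok", [1]), "shadow": ("ok", [])},
]

for category, count in summarize(sample).most_common():
    print(category, count)
```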
We're not saying we'll never take an API out of service. But when we eventually decide it is time, we intend to give you ample heads up before we do.

So let's talk a little bit about our versioning scheme.
We are not only an API provider. We also consume a lot of APIs, ranging from our platform vendor to marketing automation, analytics, service monitoring, payment processing … I could go on. We got inspired when integrating the API of our payment provider, Stripe, which introduced a very similar scheme back in 2017.
We intend to introduce API improvements piecemeal and continuously. We could have used numbered versions, but at least the way we plan to do this, we would very soon be on v311. Also, you would have to look up the latest version every time you wanted to set up a new project. We like Stripe's approach of simply versioning the API by date. So as I'm writing this, v2019-09-20 is guaranteed to give me all the latest changes. In the future, when you need an API feature, you'll have to look it up in our release log to see which version date you need to get it.
Only backward-incompatible changes are versioned. We might introduce new functionality that we deem purely supplementary, like new API endpoints or pure extensions of the GROQ syntax, without API versioning. We also need a way to beta test changes, and we do so by publishing version X (aka vX). This version contains our experimental, unstable API features. Version X may change at any time, and you must not rely on it. We intend to release new features early to gain experience before we completely lock them in. If you want to ride on the bleeding edge, X is for you.
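As a rough sketch of how date-based versions could be resolved, the snippet below parses a version string and works out which backward-incompatible changes a client pinned to that version would see. Everything here is an assumption made for illustration: the `RELEASE_LOG` entries and their dates are invented, the function names are hypothetical, and treating vX as "all dated changes" ignores the experimental features that only exist in X.

```python
from datetime import date

# Hypothetical release log: the date a backward-incompatible change
# shipped, mapped to a description. These entries are invented examples.
RELEASE_LOG = {
    date(2019, 9, 20): "example breaking change A",
    date(2019, 11, 5): "example breaking change B",
}


def parse_version(version: str):
    """Return "X" for the experimental version, or a date for vYYYY-MM-DD."""
    if not version.startswith("v"):
        raise ValueError(f"version must start with 'v': {version!r}")
    rest = version[1:]
    if rest == "X":
        return "X"
    return date.fromisoformat(rest)  # raises ValueError on a malformed date


def active_changes(version: str):
    """List the breaking changes visible to a client pinned to `version`."""
    parsed = parse_version(version)
    ordered = sorted(RELEASE_LOG.items())
    if isinstance(parsed, str):  # vX: assumed here to include every change
        return [description for _, description in ordered]
    return [description for shipped, description in ordered if shipped <= parsed]


def version_needed(description: str) -> str:
    """Look up the version date a given change requires."""
    for shipped, text in sorted(RELEASE_LOG.items()):
        if text == description:
            return "v" + shipped.isoformat()
    raise KeyError(description)


print(active_changes("v2019-09-20"))
print(version_needed("example breaking change B"))
```

Pinning to a date means a client written against v2019-09-20 keeps seeing exactly the behavior of that date, no matter how many entries are later appended to the log.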
You can read more about our versioning scheme [here] and how to use it with the JS-client (our other clients support it too). We'll soon announce some very subtle, yet useful improvements in GROQ's handling of arrays. This will be our first versioned feature.
We believe that The Shadow Worrier and the new API versioning scheme make a good foundation for providing a sustainable and reliable service for people's structured content in the years to come. We're excited about releasing new features without breaking people's existing code. Because, after all, our primary job is to move, and get out of the way.
