Over the years I’ve become the biggest advocate I know of for deploying software frequently. Really frequently. There are two reasons I’ve come to such an extreme view on the matter. First, the more often you deploy, the higher the quality and reliability of your software – which tends to be counterintuitive. Second, deployment frequency is one of the few metrics in Product Management where Goodhart’s law doesn’t apply 1.
Below is my rationale for how more frequent deployments increase reliability.
When you’re deploying rapidly you know your code is going to customers soon. You don’t want to hurt your customers or cause an embarrassing outage, so you’re going to make sure your code works. When you know you’ll have a chance to do some testing later, you’re more likely to take shortcuts. “I’m pretty sure it works, but I’ll double-check later,” you think to yourself, as you move on to that other thing you needed to check from earlier.
When it is going to be in front of a customer immediately, you’re going to do something more to make sure it works. You might spend more time reading it carefully or manually testing it locally. There is plenty of evidence showing that test-driven development and automated testing are more productive, but whether you choose the proven productive route or not, you’re going to write higher-quality code when you force yourself to deploy frequently.
More frequent deployments mean you make smaller changes at a time. Smaller changes are easier to review, test, and debug. This simple maxim – deploying more often means smaller batches – leads to higher quality. Small batches mean you detect and fix issues earlier, generally before they reach a customer. Even if an issue does reach a customer, it will be easier to fix than if you had taken the time to build up a larger batch of changes. Small batches work because of the concept of batch size in lean software development (which comes from lean manufacturing). My favorite explanation of the power of small batch sizes is in The Principles of Product Development Flow by Donald G. Reinertsen.
A good feedback loop helps. By a good feedback loop, I mean tests running automatically on every push to source control, good monitoring in production, and the ability to use feature flags, canary deployments 2, blue/green deployments 3, and so on. Yet I have found that when you wait for those things before starting frequent deployments, somehow, several months later, you wonder why you still don’t deploy frequently. Usually the answer is that “the features” made us not do it.
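To make “the ability to use feature flags” concrete, here is a minimal sketch of a flag reader in Python. The FEATURE_FLAGS_FILE environment variable and the JSON file layout are assumptions for illustration only; a real system would typically use a flag service or library rather than a local file.

```python
import json
import os


def flag_enabled(name: str, default: bool = False) -> bool:
    """Return True if the named feature flag is on.

    Flags live in a JSON file ({"flag_name": true, ...}) whose path is
    taken from the FEATURE_FLAGS_FILE environment variable (a
    hypothetical convention for this sketch). A missing file or missing
    flag falls back to the default, so a bad flag file cannot take the
    service down.
    """
    path = os.environ.get("FEATURE_FLAGS_FILE", "flags.json")
    try:
        with open(path) as f:
            flags = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default
    return bool(flags.get(name, default))
```

With a reader like this, new code paths can ship dark and be turned on – or rolled back – by editing a file rather than redeploying.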
Yet if you start frequent deployments now, it will happen, as long as you add one more thing: written Post Incident Reviews (PIRs) with action items. PIRs will help you get the feedback loop in place. Not all at once, but the work will start immediately and it will happen (at least as long as you stay the course of frequent deployments). Each incident begets a tiny improvement to the feedback loop: a new test, a new monitor, a faster rollback, and soon things get better. Now, you might say, “But I don’t want to have an outage to make things better.” Nor do I! I am not saying you should try to create outages. Again, at the very least write quality code and deploy small changes. But if you don’t have these things in place, manual testing in a “test environment” will not save you from outages anyway. At best, you’ll get lucky. So take reasonable steps to mitigate outages, but deploy! Deploy! Deploy!
The summary of the above is this: You’ll learn faster. Your errors will be small and fixed before they get big and complex. You’ll learn how to test and monitor production. The bonus is that your work will get to customers faster and you’re going to learn more from them too.
I’ve had this conversation more than a couple of times now, and below are some common rebuttals, especially the ones that come up after a bug or two slips into production.
Here are some fallacies of the “more time to test” argument. First, the premise of taking more time to test implies you’re testing manually. I am a huge advocate of automated tests, but manual testing doesn’t work. Manual testing is arbitrary in practice and doesn’t provide even remotely comprehensive coverage, especially for complex software applications hosted in the cloud 4. Since we’re trying to prevent bugs from being introduced into any part of the system, not just the part you’re working on, it’s important to test the whole system.
Manual testing is also inconsistent. It depends on individual tester knowledge, skill, and experience. I know that you know what to test and you are good at it, but the next person doesn’t have the context and experience that you do. So their testing will be different, and thus inconsistent. As the team and the application scale, the complexity only increases, so “more time to test” only grows more futile.
Manual test environments are different from production. Fundamentals like services and capacity often differ. The actual data in production is usually different from the data in a test environment. Traffic patterns are always different because the customers are different. Those are big differences! So the reality is that production differs from whatever you tested – so why wait?
Nor is a big, risky change a reason to manually test – it’s a reason to deploy! The longer you wait, the bigger your change gets. The bigger your change, the bigger the complexity and risk. “Testing” won’t save you here. Take a step back, break the change into small pieces, and deploy them.
A dedicated test environment does have a narrow range of legitimate uses, but the value of manually testing there is narrower still. Manual testing in a dedicated environment should be the exception – the rare exception – not the default. So if you want to manually test in a dedicated test environment, you should explain why this deployment is so unique that it requires it. And if you don’t have high unit-test coverage for the change you’re attempting to deploy, with mocks for its dependencies, then stop the manual testing, use that time to go write those tests, and then we’ll continue the debate on the value of manual testing.
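As a sketch of what unit tests with mocked dependencies look like, here is a Python example using the standard library’s unittest.mock. The discounted_price function and its client are hypothetical stand-ins for code that calls an external service; they are not from any particular codebase.

```python
import unittest
from unittest import mock


def discounted_price(client, sku: str, discount: float) -> float:
    """Hypothetical code under test: fetches a price from an external
    pricing service (via client) and applies a discount."""
    return client.get_price(sku) * (1 - discount)


class DiscountedPriceTest(unittest.TestCase):
    def test_applies_discount_without_calling_real_service(self):
        # Replace the external dependency with a mock so the test is
        # fast, deterministic, and runnable on every single push.
        client = mock.Mock()
        client.get_price.return_value = 100.0

        self.assertEqual(discounted_price(client, "SKU-1", 0.2), 80.0)
        # Verify the code talked to its dependency as expected.
        client.get_price.assert_called_once_with("SKU-1")
```

Tests like this run in milliseconds on every push, which is exactly the kind of feedback loop that makes frequent deployment safe.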
Thanks to my wife Oksana Willeke and my father Jim Willeke for their feedback on this article.
Goodhart’s law states that “When a measure becomes a target, it ceases to be a good measure.” This is because once a metric is used to measure performance, people will start to focus on optimizing the metric itself, rather than the underlying behavior that the metric is supposed to represent. In this case, I argue that you should generally let them.↩︎
A canary deployment strategy is one that gradually rolls out a new version of a service to users, starting with a small subset and increasing the percentage of users over time. An example of how to do this with Kubernetes is at https://kubernetes.io/docs/concepts/cluster-administration/manage-deployment/#canary-deployments↩︎
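The routing side of a gradual rollout can be sketched with hash-based user bucketing (the function and names below are illustrative, not the Kubernetes mechanism linked above): hashing the user id makes each user’s assignment sticky, and raising canary_percent gradually shifts more users to the new version.

```python
import hashlib


def canary_bucket(user_id: str, canary_percent: int) -> str:
    """Route a user to the canary or stable version of a service.

    Hashing the user id yields a stable, roughly uniform number in
    0-99, so each user consistently sees the same version, and about
    canary_percent of users land on the canary.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"
```

Start with canary_percent at 1, watch the monitors, and walk it up to 100; rolling back is just setting it to 0.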
A blue/green deployment strategy is one that uses two identical production environments, one active and one inactive. New code is deployed to the inactive environment, tested, and then switched to the active environment, with zero downtime to users. An example implementation in Kubernetes is described at https://kubernetes.io/blog/2018/04/30/zero-downtime-deployment-kubernetes-jenkins/#blue-green-deployment↩︎
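The cutover at the heart of blue/green can be sketched in a few lines of Python (the environment names and versions are illustrative): two environments are always running, a single pointer decides which one serves traffic, and switching the pointer is one atomic step that is just as easy to reverse.

```python
# Two identical environments; only one serves live traffic at a time.
environments = {"blue": "app-v1.4", "green": "app-v1.5"}
active = "blue"  # green holds the new release, idle and testable


def serve() -> str:
    """Return the version currently answering customer requests."""
    return environments[active]


def cut_over() -> None:
    """Flip live traffic to the other environment; also the rollback."""
    global active
    active = "green" if active == "blue" else "blue"
```

Because the old environment keeps running untouched, calling cut_over() again is an instant rollback.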
Any microservice application qualifies as “complex” if it lives a while.↩︎