For many organisations, software deployment is a controlled risk event. A change window is agreed. Users are notified of maintenance. The team deploys, monitors for errors, and hopes the rollback procedure works if something goes wrong. For systems that run around the clock or process transactions continuously, this way of operating is both costly and avoidable.
Deployment without downtime is not a luxury feature of large internet companies. It is the expected behaviour of any system where downtime has a real cost, and the patterns that make it possible are well understood and routinely applicable to business systems of ordinary size.
What makes deployment risky is usually not the application code. It is the database. Understanding how to handle both, and what organisational practices make this sustainable, is the substance of this piece.
Why deployment is risky in most systems
In a typical deployment, there is a moment where the old version of the application stops and the new version starts. During that moment, any requests already in progress fail. If the deployment takes thirty seconds, thirty seconds of requests return errors. If something goes wrong and the deployment stalls, the window extends.
The database schema migration compounds this. A migration that alters a table to support a new feature may lock that table during execution. If the migration takes two minutes on a table that receives writes throughout the day, those two minutes mean failed writes or accumulated blocked operations that cascade when the lock releases.
Rollback makes this worse. If a deployment goes wrong and needs to be reversed, rolling back the application code is generally fast. Rolling back a database migration that has already run against production data is often not fast, and sometimes not possible without data loss. The decision to roll back becomes constrained by the database state in ways that make the risk window feel even more precarious.
Blue-green deployments: the simplest pattern
A blue-green deployment maintains two identical production environments. At any given time, one is live and receiving traffic while the other is idle. Deployment consists of deploying to the idle environment, running validation, and then switching traffic from the live environment to the newly deployed one.
When the switch is made at a load balancer, it is almost immediate: incoming requests are simply routed to the new environment. A DNS-based switch is slower, because clients and resolvers keep using cached records until their TTL expires. If the new version has a problem, the rollback is equally fast: route traffic back to the previous environment, which is still running and in a known good state.
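The switch and rollback described above can be sketched in a few lines of Python. This is a minimal illustration, not a real load balancer: the `Environment`, `Router`, and `deploy` names are hypothetical, and the "application" is a plain function standing in for a deployed service.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Environment:
    name: str
    version: str
    handler: Callable[[str], str]  # stands in for the deployed application


class Router:
    """Stands in for the load balancer: all traffic goes to one environment."""

    def __init__(self, environments: Dict[str, Environment], live: str):
        self.environments = environments
        self.live = live

    @property
    def idle(self) -> str:
        # The environment that is not currently receiving traffic.
        return next(name for name in self.environments if name != self.live)

    def handle(self, request: str) -> str:
        return self.environments[self.live].handler(request)

    def switch(self) -> None:
        # Cutover and rollback are the same operation: move one pointer.
        self.live = self.idle


def deploy(router: Router, version: str, handler, validate) -> bool:
    """Deploy to the idle environment; switch only if validation passes."""
    target = router.environments[router.idle]
    target.version, target.handler = version, handler
    if not validate(target):
        return False  # live traffic was never touched
    router.switch()
    return True


router = Router(
    {
        "blue": Environment("blue", "v1", lambda r: f"v1:{r}"),
        "green": Environment("green", "v0", lambda r: f"v0:{r}"),
    },
    live="blue",
)
ok = deploy(
    router,
    "v2",
    lambda r: f"v2:{r}",
    validate=lambda env: env.handler("ping") == "v2:ping",
)
```

After a successful `deploy`, calling `router.switch()` again is the rollback: the previous environment was never modified, so traffic returns to a version that is still running.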
The constraint is database migrations. Blue-green works cleanly for stateless application changes. For database schema changes, the migration has to be compatible with both the old and new versions of the application simultaneously, because the switch happens at the application layer while the database is shared. This constraint is manageable but has to be designed for.
Database migrations as a deployment concern
The pattern that makes database schema changes safe is expand and contract. Rather than making a breaking change in a single migration, the change is made in two phases across two deployments.
In the expand phase, the new column or table is added without removing anything old. The new version of the application code writes to both the old and new structure, and existing data is copied into the new structure so that it is complete. The old version continues to work unchanged. In the contract phase, deployed separately after the first deployment has been verified, the old structure is removed. By the time it is removed, no code is using it.
This approach means migrations are additive rather than breaking. The database is always in a state that both the current and previous versions of the application can use, which is what makes rollback safe. The discipline required is not technical. It is the discipline of making schema changes deliberately across multiple deployments rather than making them all at once and hoping they land cleanly.
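As an illustration of expand and contract, the sketch below replaces a column using Python's built-in `sqlite3` module and an in-memory database. The `customers` table, its `fullname` and `display_name` columns, and the function names are all hypothetical. The backfill statements are an assumption about how existing rows get populated, and the contract step rebuilds the table only because that works on every SQLite version; other databases would typically drop the column directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO customers (fullname) VALUES ('Ada Lovelace')")


def expand(db: sqlite3.Connection) -> None:
    # Deployment 1: purely additive. Old code never mentions the new column,
    # so it keeps working against the expanded schema.
    db.execute("ALTER TABLE customers ADD COLUMN display_name TEXT")
    # Assumed backfill: populate the new column for rows that already exist.
    db.execute(
        "UPDATE customers SET display_name = fullname WHERE display_name IS NULL"
    )


def old_version_insert(db: sqlite3.Connection, name: str) -> None:
    # Previous application version: writes only the old column.
    db.execute("INSERT INTO customers (fullname) VALUES (?)", (name,))


def new_version_insert(db: sqlite3.Connection, name: str) -> None:
    # New application version: writes both structures during the transition.
    db.execute(
        "INSERT INTO customers (fullname, display_name) VALUES (?, ?)",
        (name, name),
    )


def contract(db: sqlite3.Connection) -> None:
    # Deployment 2, run only once no code reads or writes `fullname`.
    # Catch up any rows the old version wrote during the transition.
    db.execute(
        "UPDATE customers SET display_name = fullname WHERE display_name IS NULL"
    )
    # Remove the old column by rebuilding the table without it.
    db.execute(
        "CREATE TABLE customers_new (id INTEGER PRIMARY KEY, display_name TEXT)"
    )
    db.execute("INSERT INTO customers_new SELECT id, display_name FROM customers")
    db.execute("DROP TABLE customers")
    db.execute("ALTER TABLE customers_new RENAME TO customers")


expand(conn)
old_version_insert(conn, "Grace Hopper")  # old code still works after expand
new_version_insert(conn, "Alan Turing")   # new code dual-writes
contract(conn)
conn.commit()

rows = conn.execute("SELECT display_name FROM customers ORDER BY id").fetchall()
```

Between `expand` and `contract`, either application version can run against the schema, which is the property that keeps an application rollback safe.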
Feature flags: decoupling deployment from release
A feature flag separates the deployment of code from the release of functionality. New code is deployed to production but the feature it implements is controlled by a configuration flag that can be enabled or disabled without a new deployment.
This changes the risk profile of deployment substantially. The code is already in production and has been tested against production infrastructure before any users see the feature. If the feature causes problems after being enabled, it can be disabled immediately without a rollback, without a deployment, and without the database state complications that a rollback involves.
Feature flags also enable controlled rollouts: enabling a new feature for 1% of users, measuring its impact, and expanding gradually. This is how risk is managed in systems where a bad deployment can affect many users simultaneously. The feature reaches 100% of users only after it has been observed working correctly at smaller scale.
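A minimal sketch of a flag with a kill switch and a percentage rollout follows. The `Flag` type, the `FLAGS` registry, and the `new_checkout` flag name are hypothetical, and the in-memory dictionary stands in for whatever configuration store a real system would use. Hashing the user identifier into a stable bucket is one common way to keep each user's experience consistent as the percentage grows; it is an assumption here, not something the pattern requires.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class Flag:
    enabled: bool = False     # kill switch: off means off for everyone
    rollout_percent: int = 0  # share of users who see the feature, 0 to 100


# Stands in for a configuration store that can change without a deployment.
FLAGS: Dict[str, Flag] = {"new_checkout": Flag()}


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag_name: str, user_id: str) -> bool:
    flag = FLAGS.get(flag_name)
    if flag is None or not flag.enabled:
        return False  # unknown or switched-off flags fall back to old behaviour
    # A user in bucket b sees the feature once rollout_percent exceeds b,
    # so raising the percentage only ever adds users.
    return bucket(flag_name, user_id) < flag.rollout_percent


# Release to roughly 1% of users, then widen or disable without redeploying.
FLAGS["new_checkout"] = Flag(enabled=True, rollout_percent=1)
```

Setting `FLAGS["new_checkout"].enabled = False` turns the feature off for everyone on the next check, which is the immediate disable path described above.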
The deployment pipeline as infrastructure
Deployment without downtime depends on more than technical patterns. It requires a deployment pipeline that makes those patterns executable reliably: automated tests that run before every deployment, smoke tests that validate the new version before traffic is switched, and monitoring that makes it immediately apparent if the deployment has changed system behaviour.
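The ordering of those checks can be sketched as a small function. Everything here is hypothetical: `run_pipeline` and its parameters are illustrative names, and the lambdas stand in for a real test suite, smoke tests, traffic switch, and monitoring check.

```python
from typing import Callable, List, Tuple

Gate = Tuple[str, Callable[[], bool]]


def run_pipeline(
    gates: List[Gate],
    switch_traffic: Callable[[], None],
    healthy_after_switch: Callable[[], bool],
    roll_back: Callable[[], None],
) -> str:
    # Every gate must pass before any traffic moves.
    for name, check in gates:
        if not check():
            return f"stopped at {name}"
    switch_traffic()
    # Monitoring decides whether the release stands or is reversed.
    if not healthy_after_switch():
        roll_back()
        return "rolled back"
    return "released"


events: List[str] = []
result = run_pipeline(
    gates=[
        ("automated tests", lambda: True),
        ("smoke tests on idle environment", lambda: True),
    ],
    switch_traffic=lambda: events.append("switched"),
    healthy_after_switch=lambda: True,
    roll_back=lambda: events.append("rolled back"),
)
```

The point of the structure is that a failure before the switch costs nothing in production, and a failure after it has a single, rehearsed response.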
Teams that deploy safely, frequently, and with low ceremony have invested in the pipeline as infrastructure. The deployment process is automated, understood by the team, and fast. Because it is reliable, it is used frequently. Because it is used frequently, problems are small and localised. A regression can be traced to a narrow set of recent changes rather than to three weeks of accumulated work.
The alternative is infrequent, manual deployments with long change windows. That concentrates risk rather than distributing it. Each deployment is larger, the team is more anxious, and when something goes wrong, understanding which of the many changes caused it takes longer.
Deployment risk is mostly optional
The patterns that enable deployment without downtime are not exotic. They require deliberate design of the database migration strategy, investment in the deployment pipeline, and the organisational discipline to maintain small, frequent deployments rather than large, infrequent ones.
Teams that make this investment find that deployment ceases to be a source of anxiety. It becomes an ordinary operational activity that happens several times a week, with a known and short resolution path when something goes wrong. The alternative is a deployment window on the calendar that everyone dreads.