#23610 closed Cleanup/optimization (wontfix)
Removing a null constraint can lead to race conditions with migrations
Reported by: | Josh Smeaton | Owned by: | Jacob Walls |
---|---|---|---|
Component: | Documentation | Version: | 1.7 |
Severity: | Normal | Keywords: | migrations |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I think a race condition exists when removing a null constraint that doesn't contain a default. This discussion started with regards to #23609. The flow here was changing a column from having a null to having a default:
IntegerField(null=True) -> IntegerField(default=42)
And the race condition:
- Deploy migration
- Update App Server 1 code
- App Server 2 tries to write to the table, and gets a write error (since it doesn't yet have the default in its models file)
- Update App Server 2 code
For a write heavy table, this could result in many failed writes.
I think we should be clear, and document, that removing a null on a column should always be done in two steps:
- Add a default to the column. Deploy migration. Deploy app code.
- Remove the null and the default from the column. Deploy migration. Deploy app code.
Perhaps there could be a check that ensures that removing a null always requires an existing default to exist.
Change History (5)
comment:1 by , 10 years ago
Component: | Migrations → Documentation |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 3 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
A short note is not enough to explain it without creating confusion. On the other hand I don't think we want to create an extra section for this.
comment:5 by , 3 years ago
FWIW I've been working on a third-party application that resolves this problem by introducing a concept of pre- and post-deployment migrations. We've been using it in production for a few months now.
It has a built-in notion of quorum which allows the following sequence of operation to be taken across multiple instances (e.g. multiple k8s clusters)
- Run the
migrate --pre-deploy
command to alter the column to have itNULL SET DEFAULT 42
- Wait for the N clusters to have completed their application rollout
- Run the
migrate
command to have itNOT NULL
andDROP DEFAULT
I created an issue to add support for this particular case as null constraint removal is not explicitly tested but it should only be matter of adjusting the auto-detector to have a IntegerField(null=True) -> IntegerField(default=42)
change generate an extra PreAlterField
operation meant to run on migrate --pre-deploy
.
There are many cases where old app code won't work after a migration applied (e.g. adding a new column with
NOT NULL
, removing a column, etc.). I reckon we should just document that people are expected to run the new app code right after applying a migration.There are certainly ways around it, but it's important people understand that nothing special is done at the framework level.