Opened 9 months ago
Closed 9 months ago
#35400 closed Cleanup/optimization (invalid)
SECRET_KEY_FALLBACKS documentation can be misleading when running multiple instances of an application
Reported by: | Ryan Siemens | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | 5.0 |
Severity: | Normal | Keywords: | SECRET_KEY_FALLBACKS |
Cc: | Ryan Siemens | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
Hi, I was looking at the documentation for SECRET_KEY_FALLBACKS and I think the documentation's advice for how to rotate keys is a little too simplistic and will lead to unexpected issues with any modestly larger application that runs on multiple boxes (or pods).
"In order to rotate your secret keys, set a new SECRET_KEY and move the previous value to the beginning of SECRET_KEY_FALLBACKS."
Just following this advice can lead to a scenario where your request is routed to a rotated box that has the new SECRET_KEY set, signs the session successfully and returns. A subsequent request could route to an un-rotated box which hasn't received the SECRET_KEY update and fails to validate the session. This diagram aims to illustrate the problem.
I believe the way to handle this situation is to have a two-phase rollout where:
- Phase 1: don't update SECRET_KEY, but update SECRET_KEY_FALLBACKS to ['old_key', 'new_key'] and let all boxes sync. Everything is still being signed/validated with the current, unchanged SECRET_KEY.
- Phase 2: update SECRET_KEY='new_key' and SECRET_KEY_FALLBACKS=['old_key']. Now boxes that aren't rotated can validate sessions signed from boxes with the new_key even though they will still sign with the old_key.
Things can then proceed as normal where the old_key is dropped from SECRET_KEY_FALLBACKS after some time. Visually it looks like this.
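The failure mode and the two-phase rollout can also be checked with a small simulation. This is only a sketch: it uses plain HMAC-SHA256 from the standard library as a stand-in for Django's signing machinery, and the helpers `sign`, `verify` and `accepts`, as well as the per-box dictionaries, are hypothetical names invented for illustration. The one behaviour it assumes is that a box signs with its SECRET_KEY and accepts signatures made with SECRET_KEY or any entry in SECRET_KEY_FALLBACKS.

```python
import hashlib
import hmac


def sign(value, key):
    """Return 'value:signature' using HMAC-SHA256 (a stand-in, not Django's signer)."""
    digest = hmac.new(key.encode(), value.encode(), hashlib.sha256).hexdigest()
    return f"{value}:{digest}"


def verify(signed, secret_key, fallbacks):
    """Accept a signature made with secret_key or with any fallback key."""
    value, _, digest = signed.rpartition(":")
    for key in [secret_key, *fallbacks]:
        expected = hmac.new(key.encode(), value.encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(digest, expected):
            return True
    return False


# Hypothetical settings held by a box at each stage of the rollout.
UNROTATED = {"secret_key": "old_key", "fallbacks": []}
PHASE_1 = {"secret_key": "old_key", "fallbacks": ["old_key", "new_key"]}
PHASE_2 = {"secret_key": "new_key", "fallbacks": ["old_key"]}


def accepts(verifier, signer):
    """Would a box configured like `verifier` accept a value signed by `signer`?"""
    signed = sign("session-data", signer["secret_key"])
    return verify(signed, verifier["secret_key"], verifier["fallbacks"])


# Single-step rotation: a rotated box signs, an un-rotated box rejects.
print(accepts(UNROTATED, PHASE_2))  # False

# Two-phase rollout: every box has reached phase 1 before any box enters
# phase 2, so signatures validate in both directions during the switch.
print(accepts(PHASE_1, PHASE_2))  # True
print(accepts(PHASE_2, PHASE_1))  # True
```

During phase 1 itself nothing changes for signing, since un-rotated and phase 1 boxes both sign with old_key, so a mixed fleet keeps validating throughout.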
My request is to update the documentation to at least indicate that the advice doesn't hold in scenarios where you are running the application across a fleet of boxes.
Change History (2)
comment:1 by , 9 months ago
Summary: | SECRET_KEY_FALLBACKS documentation can be misleading when running a multiple instances of an application → SECRET_KEY_FALLBACKS documentation can be misleading when running multiple instances of an application |
---|
comment:2 by , 9 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Hi Ryan, thank you for this ticket. I've been thinking about it for quite a while.
I think the docs on SECRET_KEY_FALLBACKS specifically are clear enough in terms of what the setting is and what Django uses it for.
I was trying to find existing warnings giving extra considerations that depend on a user's infrastructure and deployments. In general, the docs seem to assume a single-node deployment rather than considering distributed environments. For example, there isn't an explicit warning in the docs around how adding a NOT NULL column on a model can lock up a table and cause deadlocks with a running application.
I could see a new topic for "Deployment considerations for distributed systems" (or something along those lines within the existing Deployment docs) being valuable. This is quite different from what you were suggesting originally, and quite a bit of work. I would also recommend that anyone wanting to write this first go to the forum to plan the outline and content.
For these reasons, I'm going to close this ticket as "invalid", but I welcome more discussion. I'm also very happy to read suggested documentation wording tweaks.