Opened 8 months ago

Closed 8 months ago

#35400 closed Cleanup/optimization (invalid)

SECRET_KEY_FALLBACKS documentation can be misleading when running multiple instances of an application

Reported by: Ryan Siemens
Owned by: nobody
Component: Documentation
Version: 5.0
Severity: Normal
Keywords: SECRET_KEY_FALLBACKS
Cc: Ryan Siemens
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

Hi, I was looking at the documentation for SECRET_KEY_FALLBACKS and I think the documentation's advice for how to rotate keys is a little too simplistic and will lead to unexpected issues with any modestly large application that runs on multiple boxes (or pods).

"In order to rotate your secret keys, set a new SECRET_KEY and move the previous value to the beginning of SECRET_KEY_FALLBACKS."

Just following this advice can lead to a scenario where a request is routed to a rotated box that has the new SECRET_KEY set, which signs the session successfully and returns. A subsequent request could be routed to an un-rotated box that hasn't received the SECRET_KEY update yet and therefore fails to validate the session. This diagram aims to illustrate the problem: https://cdn.zappy.app/f223668c63259e3316cfe0afb6bc97c3.png
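
The failure can be reproduced in a few lines. This is only a minimal sketch: the `Box` class and `_mac` helper are hypothetical, and plain HMAC-SHA256 stands in for Django's signing, assuming only that new signatures use SECRET_KEY and that validation tries SECRET_KEY followed by each entry in SECRET_KEY_FALLBACKS.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def _mac(key: str, value: str) -> str:
    # Simplified stand-in for a signature; not Django's actual signing code.
    return hmac.new(key.encode(), value.encode(), hashlib.sha256).hexdigest()


@dataclass
class Box:
    """Hypothetical model of one application instance and its key settings."""

    secret_key: str
    secret_key_fallbacks: list = field(default_factory=list)

    def sign(self, value: str) -> str:
        # New signatures always use the current SECRET_KEY.
        return _mac(self.secret_key, value)

    def verify(self, value: str, signature: str) -> bool:
        # Validation tries SECRET_KEY first, then each fallback key.
        keys = [self.secret_key, *self.secret_key_fallbacks]
        return any(hmac.compare_digest(_mac(k, value), signature) for k in keys)


# One box has been rotated per the documented advice, the other has not.
rotated = Box("new_key", ["old_key"])
unrotated = Box("old_key")

signature = rotated.sign("session-data")
rotated_accepts = rotated.verify("session-data", signature)
unrotated_accepts = unrotated.verify("session-data", signature)
```

Here `rotated_accepts` is true while `unrotated_accepts` is false, because the un-rotated box has never seen `new_key`.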

I believe the way to handle this situation is to have a two phase rollout where:

  • Phase 1: don't update SECRET_KEY, but update SECRET_KEY_FALLBACKS to ['old_key', 'new_key'] and let all boxes sync. Everything is still being signed/validated with the current, unchanged SECRET_KEY.
  • Phase 2: update SECRET_KEY='new_key' and SECRET_KEY_FALLBACKS=['old_key']. Now boxes that aren't rotated can validate sessions signed from boxes with the new_key even though they will still sign with the old_key.

Things can then proceed as normal, with old_key dropped from SECRET_KEY_FALLBACKS after some time. Visually it looks like this: https://cdn.zappy.app/0edeb791cdc40d9d8e3b7f3918b1b11b.png
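
As a rough check of the two-phase idea, the sketch below models each stage as a settings dict and tests whether a fleet that mixes two adjacent stages can still validate everything it signs. The helper names (`mac`, `accepts`, `fleet_is_consistent`) are hypothetical, and plain HMAC-SHA256 is a simplified stand-in for Django's signing, assuming validation tries SECRET_KEY and then each fallback.

```python
import hashlib
import hmac
from itertools import product


def mac(key, value):
    # Simplified stand-in for a signature; not Django's actual signing code.
    return hmac.new(key.encode(), value.encode(), hashlib.sha256).hexdigest()


def accepts(settings, value, signature):
    # Validation tries SECRET_KEY first, then each SECRET_KEY_FALLBACKS entry.
    keys = [settings["SECRET_KEY"], *settings["SECRET_KEY_FALLBACKS"]]
    return any(hmac.compare_digest(mac(k, value), signature) for k in keys)


# Settings at each stage, using the values proposed in this ticket.
BEFORE = {"SECRET_KEY": "old_key", "SECRET_KEY_FALLBACKS": []}
PHASE_1 = {"SECRET_KEY": "old_key", "SECRET_KEY_FALLBACKS": ["old_key", "new_key"]}
PHASE_2 = {"SECRET_KEY": "new_key", "SECRET_KEY_FALLBACKS": ["old_key"]}


def fleet_is_consistent(a, b):
    """True if boxes running settings a or b accept each other's signatures."""
    return all(
        accepts(verifier, "session-data", mac(signer["SECRET_KEY"], "session-data"))
        for signer, verifier in product((a, b), repeat=2)
    )


phase_1_rollout_ok = fleet_is_consistent(BEFORE, PHASE_1)
phase_2_rollout_ok = fleet_is_consistent(PHASE_1, PHASE_2)
single_step_ok = fleet_is_consistent(BEFORE, PHASE_2)
```

Under these assumptions both phased rollouts stay consistent, while jumping straight from the original settings to the rotated ones does not.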

My request is to update the documentation to at least indicate that the advice doesn't hold in scenarios where you are running the application across a fleet of boxes.

Change History (2)

comment:1 by Ryan Siemens, 8 months ago

Summary: "SECRET_KEY_FALLBACKS documentation can be misleading when running a multiple instances of an application" → "SECRET_KEY_FALLBACKS documentation can be misleading when running multiple instances of an application"

comment:2 by Sarah Boyce, 8 months ago

Resolution: invalid
Status: new → closed

Hi Ryan, thank you for this ticket. I've been thinking about it for quite a while.

I think the docs on SECRET_KEY_FALLBACKS specifically are clear enough in terms of what the setting is and what Django uses it for.

"My request is to update the documentation to at least indicate that the advice doesn't hold in scenarios where you are running the application across a fleet of boxes."

I was trying to find existing warnings that give extra considerations depending on a user's infrastructure and deployments. In general, the docs seem to assume a single-node deployment rather than considering distributed environments. For example, there isn't an explicit warning in the docs about how adding a NOT NULL column to a model can lock up a table and cause deadlocks with a running application.

I could see a new topic for "Deployment considerations for distributed systems" (or something along those lines within the existing Deployment docs) being valuable. This is quite different from what you were suggesting originally, and quite a bit of work. I would also recommend that anyone wanting to write this first go to the forum to plan the outline and content.

For these reasons I'm going to close this ticket as "invalid" but I welcome more discussion. I'm also very happy to read suggested documentation wording tweaks.
