Opened 14 years ago
Last modified 20 months ago
#15855 new Bug
cache_page decorator bypasses any Vary headers set in middleware
Reported by: | Carl Meyer | Owned by: | |
---|---|---|---|
Component: | Core (Cache system) | Version: | |
Severity: | Normal | Keywords: | |
Cc: | django@…, inactivist@…, denis.cornehl@…, someuniquename@…, Fernando Gutiérrez | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
A number of common response middlewares in Django (gzip, sessions, locale, csrf) add to the Vary header on responses, which is checked both by Django's caching system and upstream HTTP caches to determine what requests can safely be served that cached response. Getting the Vary header correct can be quite important, as failure to include it can mean upstream caches show session-private content to users who should not see it.
Since view decorators run on the outgoing response first, before response middleware, the cache_page decorator caches the response before any of the mentioned response middlewares have a chance to add their Vary headers. This means two things: 1) the cache key used won't include the headers the response ought to vary on, and Django may later serve that response to users who really shouldn't get it, and 2) when that cached response is later served to a user, it still won't include the Vary header that it should have, and thus may also be cached wrongly by an upstream HTTP cache.
I can't see a reasonable way of fixing this that maintains the current semantics of the cache_page decorator. The only option I've come up with is to require all users of full-response caching to always include the UpdateCacheMiddleware
in their MIDDLEWARE_CLASSES (in its recommended spot at the top of the list, thus last in response processing), and only do the actual caching there. We'd then have to provide some way to say "don't cache pages unless they are marked" (a setting? ugh), and then the cache_page decorator would just mark the response for later caching (which would override the global don't-cache flag).
Attachments (1)
Change History (25)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
by , 13 years ago
Attachment: | 15855_CSRF_per_view_caching_docs.diff added |
---|
follow-up: 3 comment:2 by , 13 years ago
Easy pickings: | unset |
---|---|
Has patch: | set |
comment:3 by , 13 years ago
UI/UX: | unset |
---|
Replying to idangazit:
I just got bitten by this; the docs about mixing CSRF and per-view caching are misleading. The
vary_on_cookie
decorator doesn't actually solve the fact that te CSRF token which is placed in the template of a cached view.
Having sat with Russ, Alex, and Jannis on the matter, the conclusion is that the docs are simply wrong, and bear updating. To that end, see attached docs patch.
I'm not clear on the issue here - if vary_on_cookie doesn't solve the problem, then how is it solved by using the cache middleware rather than per-page caching? What, exactly, is the issue?
In any case, if vary_on_cookie doesn't solve this problem, then I think this problem deserves its own separate ticket and doesn't belong in this ticket, which is specifically about Vary header issues with the cache_page decorator. Even if the attached doc patch is correct, which is not yet clear to me, applying it would not be a fix for this ticket.
comment:4 by , 13 years ago
There's two problems here:
- The subject of this ticket -- that cache_page has problems because it caches content too early in the process. This is the root problem.
- The fact that the docs currently suggest that @vary_on_header and @cache_page can be used on CSRF pages.
The second problem will be resolved when/if we fix the first problem, but in the interim, the existence of the suggestion is pretty misleading. Idan's patch looks good for that purpose.
follow-up: 6 comment:5 by , 13 years ago
It still hasn't been explained why @vary_on_cookie
and @cache_page
don't work with CSRF pages. Idan had a sentence that looked like it was about to explain it and then stopped. I'm guessing it is do with the cookie being set by the middleware after the page has been cached. Would the documentation be fixed by adding @csrf_protect
into the stack of decorators?
comment:6 by , 13 years ago
Replying to lukeplant:
It still hasn't been explained why
@vary_on_cookie
and@cache_page
don't work with CSRF pages. Idan had a sentence that looked like it was about to explain it and then stopped. I'm guessing it is do with the cookie being set by the middleware after the page has been cached. Would the documentation be fixed by adding@csrf_protect
into the stack of decorators?
Which is why a new ticket should be created, so the actual problem can be clearly identified and dealt with appropriately, without muddying the waters of this ticket. As far as this ticket goes, the current workaround documented in the CSRF docs is correct, and clearly so: vary_on_cookie does in fact add the Vary: Cookie header, which unequivocally fixes the problem of the response not having the Vary: Cookie header. And that's the only problem this ticket is concerned about.
If there is a different problem with the current CSRF docs that needs fixing, then it is a CSRF-specific problem unrelated to this ticket (even if it seems superficially related because it involves the cache_page header).
comment:7 by , 13 years ago
The reason why @vary_on_cookie and @cache_page doesn't work with CSRF pages is exactly what this ticket describes. The new-style CSRF page puts a CSRF token in the HTML. The page needs to be marked @vary_on_cookie because the CSRF token, and thus the cached page content, varies on a per-user basis. However, the @cache_page decorator caches the page before the CSRF middleware has a chance to set the CSRF cookie. As a result, @vary_on_cookie doesn't actually cause anything to vary -- there isn't a cookie to vary on at the time the @cache_page decorator determines the content to be cached.
This wasn't an issue with the old-style CSRF code because the page content was modified by the CSRF middleware. The current docs were accurate for the old CSRF approach, but not for the new, explicitly inserted {% csrf_token %}.
comment:8 by , 13 years ago
Thanks Russell for the clarification. I'm pretty sure it can be fixed by use of @csrf_protect
:
@cache_page(60 * 15) @vary_on_cookie @csrf_protect def my_view(request): pass
But however the docs are fixed on this issue, it doesn't technically address this bug, just one symptom of it.
comment:9 by , 13 years ago
Verified that Luke's fix works on both 1.3.X and trunk. In fact, vary_on_cookie isn't even needed, since csrf_protect patches the vary header -- the docs can just replace vary_on_cookie with csrf_protect.
Went a little ways down the path of an automated test to verify that this works, but since we have tests for all the component pieces already, the test ends up just being a verification that Python does indeed apply decorators in the expected order. Which is kind of a stupid thing to test.
Applying the doc fix to 1.3.X and trunk.
comment:12 by , 13 years ago
Has patch: | unset |
---|
comment:13 by , 13 years ago
Cc: | added |
---|
comment:14 by , 12 years ago
Cc: | added |
---|
comment:15 by , 12 years ago
Accepting this, since we're trying to eliminate the DDN state. There are still significant design decisions needed in terms of how this should be fixed, but I don't think there's any question it should be fixed (or else the cache_page
decorator should be removed). There's no scenario where you want full-response caching that omits key Vary headers.
comment:16 by , 12 years ago
Triage Stage: | Design decision needed → Accepted |
---|
comment:17 by , 12 years ago
Just ran into Pyramid's response callbacks concept, and I think we could consider introducing something like that as a fix for this bug. It would be a new method on HttpResponse
(in Pyramid it's on the request, but that's only because you often don't have access to the actual response yet; in Django you generally do) to register a callback to be executed right before the response is sent out (i.e. after all middleware). So the cache_page
decorator would just register a response callback to do the actual caching, thus the response that would be cached would be the final one, after it's been processed by all middleware. There may be other uses for this feature, but initially it could just be a private API until those uses become clearer.
Presuming cache_page
continued to be generated via decorator_from_middleware
from the cache middlewares, that would imply UpdateCacheMiddleware
would also register a response callback rather than doing the caching immediately. This would be a change in behavior for anyone who does not have UpdateCacheMiddleware
first in their middlewares (as recommended in the docs).
Actually, this would also obviate the need for the UpdateCacheMiddleware
/FetchFromCacheMiddleware
split; we could just go back to recommending the use of a single unified CacheMiddleware
placed last in MIDDLEWARE_CLASSES
, and the actual cached response would still always be the final about-to-be-sent-out response.
Thoughts?
comment:18 by , 10 years ago
Cc: | added |
---|
comment:19 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:21 by , 8 years ago
Needs documentation: | set |
---|
Reviewed PR which seems fine: Vary headers from cache_page decorator & other middlewares are taken into account by cachemiddleware, reg tests ok, new tests added.
Added comments for minor improvements in PR. Needs update in docs IMHO.
comment:23 by , 7 years ago
Cc: | added |
---|
comment:24 by , 20 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
I just got bitten by this; the docs about mixing CSRF and per-view caching are misleading. The
vary_on_cookie
decorator doesn't actually solve the fact that te CSRF token which is placed in the template of a cached view.Having sat with Russ, Alex, and Jannis on the matter, the conclusion is that the docs are simply wrong, and bear updating. To that end, see attached docs patch.