#19705 closed Bug (fixed)
CommonMiddleware handles If-None-Match incorrectly
Reported by: | Aymeric Augustin | Owned by: | Denis Cornehl |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | hirokiky@…, real.human@…, k@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Two middleware check ETags for unmodified responses: CommonMiddleware
and ConditionalGetMiddleware
and they do it inconsistently.
If the response's ETag matches the request's If-None-Match:
ConditionalGetMiddleware
changes the response code to 304, preserving all headers; the content gets removed later onCommonMiddleware
creates a newHttpResponseNotModified
without content and simply restores the cookies.
As a consequence, CommonMiddleware
returns a response without ETag, which is wrong. I detected this with RedBot on a Django site I run. Any site with USE_ETAGS = True
has this problem.
In general, wiping headers sounds like a bad idea. A 304 is supposed to have the same headers as the 200. (Well, the RFC is more complicated, but I think it's the general idea. Future versions of HTTP will likely require the Content-Length not to be 0.)
I believe that CommonMiddleware
should simply generate the ETag and not handle conditional content removal; that's the job of ConditionalGetMiddleware
.
For example, if one is using GzipMiddleware, the correct response chain is:
CommonMiddleware
computes the ETag,GzipMiddleware
compresses the content and modifies the ETag,ConditionalGetMiddleware
uses the modified ETag to decide if the response was modified or not.
This is a good reason to keep "ETag generation" and "Etag checking" concerns separate. The same argument applies to any middleware that sets or modifies ETags.
Unfortunately, CommonMiddleware
is documented to "take care of sending Not Modified responses, if appropriate", so this would be a backwards incompatible change.
Change History (24)
comment:1 by , 12 years ago
Description: | modified (diff) |
---|---|
Summary: | CommonMiddleware handles If-Modified-Since incorrectly → CommonMiddleware handles If-None-Match incorrectly |
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 12 years ago
comment:3 by , 12 years ago
Cc: | added |
---|---|
Has patch: | set |
Owner: | changed from | to
Status: | new → assigned |
I send a pull-request to solve this problem https://github.com/django/django/pull/700.
comment:4 by , 12 years ago
This still leaves a dependency on the order of middleware for correct operation. I don't think Etags should be set or checked in middleware at all. We already have a USE_ETAGS
setting. Why not just return the HttpResponseNotModified
after all middleware has executed, when USE_ETAGS
is True
?
We don't need middleware to create Etags, either. We could hook into content and header assignment for HttpResponse
, so that Etags will always be set or updated whenever content or headers change, when USE_ETAGS
is True
.
Another benefit would be that we remove the repetitive requirement for ALL middleware that alters content or headers having to check the USE_ETAGS
setting and then conditionally recalculate and update the Etag header. It's currently very easy for a middleware author to alter content without realising that they must do this, which means we could be serving responses with stale Etag headers.
comment:5 by , 11 years ago
Cc: | added |
---|
comment:6 by , 11 years ago
Patch needs improvement: | set |
---|
Setting as "patch needs improvement" given the previous comment and the fact that the PR no longer merges cleanly.
comment:7 by , 11 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
I de-assign once time. because I won't try to fix this problem these days. sorry.
comment:8 by , 10 years ago
Since I stumbled on it while working with Django, I want to take care (if possible) of this issue.
What seems to be missing (apart from implementation) is this major design decision (of course there could be ways in between):
Way 1: Cleanup middlewares
- as aaugustin proposed
- change middleware behavior (backwards incompatible)
CommonMiddleware
only calculates the ETag, if setting is setGZipMiddleware
changes it probablyConditionalGetMiddleware
takes care about the response code- perhaps move
http.conditional_content_removal()
, since it then removes the content based on the response code - document usage and needed order of the middlewares
Way 2: remove ETag handling in middlewares
- as mrmachine proposed
- also backwards incompatible
- no ETag / Conditional Handling in middleware
- ETags are always calculated in the request (when the setting is set)
- conditional view processing is also moved, and always done (open on this, what about last-modified handling?)
- perhaps easier to use for users, since it hooks into the response-content
- open question would be, how the view decorators
@etag
and@condition
would work then
comment:9 by , 10 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:10 by , 10 years ago
after talking to apollo13 and mjtamlyn we lean towards way 1, for a first draft.
I'll look into the old PR and improve / finish it up.
comment:11 by , 10 years ago
This will get funny.
There is also some duplication between:
CommonMiddleware
ConditionalGetMiddleware
- the
@condition
decorator
also bugs fixed in only one of the places.
comment:12 by , 10 years ago
Was there any comment from those who prefer option 1 about all middleware that is ever written in the future having to check the value of the USE_ETAGS
setting if the middleware alters content/headers, and recalculate the etag? This seems very likely to not happen in a lot of cases. Not all middleware authors will know or care about etags, and if they write a middleware class that alters content/headers it will become buggy when deployed by other developers who have enabled this setting.
I think that option 1 is just patching the symptom of coupled middleware and repetitive etag handling in Django's own middleware, but doesn't do much for the wider middleware ecosystem or preventing developers from making the same mistakes in their own code.
comment:13 by , 9 years ago
Just rediscovered this issue.
- We refactored the conditional-get handling in 7a40fef17ab7918cbb1ddc3ba080f42b420f7a48. There we merged all the different implementations of conditional-get handling into one place.
- there, based on the original code, a new response is generated as not-modified. Only cookies are kept if they are there.
- I just started working on a new ticket that will deprecate / remove {{USE_ETAGS}} setting, and get the conditional-get handling out of {{CommonMiddleware}}. Then the {{ConditionalGetMiddleware}} will generate a {{ETag}} if there is none. By this middleware just being the last one, many of the edge-cases will be fixed.
Is there any knowledge on if we should keep all the headers in a 304 response? Before the refactor we had places that handled it differently.
At the moment we're returning a new 304 response all the time.
comment:14 by , 9 years ago
Cc: | added |
---|
At the moment there are several interrelated problems with the way Django handles ETags. Briefly:
- As stated in the original report above, we're not returning the correct headers with our
304s
. - Gzipped responses are not getting the benefits of
304s
since their ETags are not compared properly. See tickets #16035 and #26771. - Our processing of conditional requests does not follow the specification's rewritten precedence rules from 2014.
- The
condition()
,vary*()
,cache_control()
, andgzip_page()
decorators work properly in some undocumented orders but not in others.
Here are the changes I would suggest to address these problems:
- Move ETag handling out of
CommonMiddleware
and intoConditionalGetMiddleware
. This change has already been accepted (#26447) and partially implemented (PR 6393). (Though I'm less confident about the deprecation ofUSE_ETAGS
in that ticket.) - Change
GZipMiddleware
to remove the;gzip
token from the incoming ETags inprocess_request()
. That will allow comparisons to work properly. Document this. - Change the
304
headers to match the specification. - Change our conditional view logic to match the (simpler) specified precedence rules.
- Change our use of
GzipFile
to specify a modification time of 0. That will make our gzip output dependent only on the response body, which means that we can usefully compare ETags on gzipped content, which means that we don't have to care about the order of thegzip_page()
decorator. - Document that the
condition()
decorator should be belowvary()
andcache_control()
so that those headers can be set properly.
In more detail:
mrmachine's comment raises a fair point. The fundamental issue here is that Django's architecture strives to be layered and decoupled, but ETags are by definition highly coupled to the response, so it's difficult to isolate them to one layer (whether that be middleware or a decorator).
That said, the specification only requires that the ETag change when the response body changes, and I don't think that's a very common middleware behavior. Only GZipMiddleware
meets that definition among the middleware in core, for example. And if the body is changed, the ETag can be changed in a manner similar to what I'm suggesting for GZipMiddleware
. I do think this needs to be documented, though.
Regarding which headers to send back with 304s
, the specification says this:
The server generating a 304 response MUST generate any of the following header fields that would have been sent in a 200 (OK) response to the same request:
Cache-Control
,Content-Location
,Date
,ETag
,Expires
, andVary
. Since the goal of a 304 response is to minimize information transfer when the recipient already has one or more cached representations, a sender SHOULD NOT generate representation metadata other than the above listed fields unless said metadata exists for the purpose of guiding cache updates (e.g.,Last-Modified
might be useful if the response does not have anETag
field).
So I think we should return those headers from the response, along with the cookies (since that is what Django already does, and since it's apparently very common despite not being mandated by the standard).
I'm able to work on this, but the next step depends on the fate of PR 6393 , since that accomplishes some of the reorganization mentioned above.
comment:15 by , 8 years ago
Another solution to the gzip problem (that is, that we never match a gzipped ETag
because we modify it on the way out but not on the way in) would be to have GZipMiddleware
change the ETag
to a weak ETag
. That would be simple, consistent with the specification, and well-precedented:
- nginx does this (as of version 1.7.3) when it gzips responses with an
ETag
- Rails and Rake have recently switched to using weak
ETags
comment:18 by , 8 years ago
PR for having GZipMiddleware
make ETags
weak to allow conditional responses.
comment:19 by , 8 years ago
PR for having GZipMiddleware set gzip modification times to 0.
I'm not aware of any downside, and the advantage is that ConditionalGetMiddleware
will work (that is, produce 304 Not Modified responses) on gzipped content (e.g. if the order of the middlewares is reversed, or if the gzip_page()
view decorator is used).
This usage is allowed by the specification ("MTIME = 0 means no time stamp is available", Section 2.3.1 of RFC 1952) and is in common use (for example, Java's GZipOutputStream
sets the MTIME to 0).
The discussion below https://code.djangoproject.com/ticket/16035#comment:9 was closed as a duplicate of this ticket.