Opened 3 months ago
Last modified 3 months ago
#35838 new New feature
Support Requests with Transfer-Encoding: Chunked
Reported by: | Klaas van Schelven | Owned by: | |
---|---|---|---|
Component: | HTTP handling | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Carlton Gibson, bcail, Natalia Bidart, Claude Paroz | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Django's request.read() returns 0 bytes when there's no Content-Length header.
i.e. it silently fails.
But not having a Content-Length header is perfectly fine when there's a HTTP/1.1 Transfer-Encoding: Chunked request.
WSGI servers like gunicorn and mod_wsgi are able to handle this just fine, i.e. Gunicorn handles the hexidecimally encoded lengths and just passes you the right chunks and Apache's mod_wsgi does the same I believe.
Discussions/docs over at Gunicorn / mod_wsgi:
- https://github.com/benoitc/gunicorn/issues/1264
- https://github.com/benoitc/gunicorn/issues/605
- https://github.com/benoitc/gunicorn/issues/2947
- https://modwsgi.readthedocs.io/en/develop/configuration-directives/WSGIChunkedRequest.html
The actual single line of code that's problematic is this one: https://github.com/django/django/blob/97c05a64ca87253e9789ebaab4b6d20a1b2370cf/django/core/handlers/wsgi.py#L77
My personal reason I ran into this:
which is basically solved by using a lot of lines to undo the effects of that single line of code:
import django from django.core.handlers.wsgi import WSGIHandler, WSGIRequest os.environ.setdefault('DJANGO_SETTINGS_MODULE', '.... class MyWSGIRequest(WSGIRequest): def __init__(self, environ): super().__init__(environ) if "CONTENT_LENGTH" not in environ and "HTTP_TRANSFER_ENCODING" in environ: # "unlimit" content length self._stream = self.environ["wsgi.input"] class MyWSGIHandler(WSGIHandler): request_class = MyWSGIRequest def my_get_wsgi_application(): # Like get_wsgi_application, but returns a subclass of WSGIHandler that uses a custom request class. django.setup(set_prefix=False) return MyWSGIHandler() application = my_get_wsgi_application()
But I'd rather not solve this in my own application only, but have it be a Django thing. In the Gunicorn links, there's some allusion to "this won't happen b/c wsgi spec", but that seems like a bad reason from my perspective. At the very least request.read() should not just silently give a 0-length answer. And having tools available such that you don't need to make a "MyXXX" hierarchy would also be nice.
That's the bit for "actually getting request.read() to work when behind gunicorn".
There's also the part where this doesn't work for the debugserver. Which is fine, given its limited scope. But an error would be better than silently returning nothing (again).
My solution for that one is the following middleware:
class DisallowChunkedMiddleware: def __init__(self, get_response): self.get_response = get_response def __call__(self, request): if "HTTP_TRANSFER_ENCODING" in request.META and \ request.META["HTTP_TRANSFER_ENCODING"].lower() == "chunked" and \ not request.META.get("wsgi.input_terminated"): # If we get here, it means that the request has a Transfer-Encoding header with a value of "chunked", but we # have no wsgi-layer handling for that. This probably means that we're running the Django development # server, and as such our fixes for the Gunicorn/Django mismatch that we put in wsgi.py are not in effect. raise ValueError("This server is not configured to support Chunked Transfer Encoding (for requests)") return self.get_response(request)
Some links:
- This one seemed related, but probably isn't (it's about forms): https://code.djangoproject.com/ticket/35289
- This seemed related, but is about uwsgi and uses special uwsgi features: https://github.com/btimby/uwsgi-chunked/blob/master/uwsgi_chunked/chunked.py
All of the above is when using WSGI (I did not test/think about ASGI)
Change History (17)
comment:1 by , 3 months ago
Summary: | request.read() returns empty for Rueqests w/ Transfer-Encoding: Chunked → request.read() returns empty for Requests w/ Transfer-Encoding: Chunked |
---|
comment:2 by , 3 months ago
Cc: | added |
---|---|
Component: | Uncategorized → HTTP handling |
comment:3 by , 3 months ago
Description: | modified (diff) |
---|
comment:4 by , 3 months ago
comment:5 by , 3 months ago
Cc: | added |
---|
comment:7 by , 3 months ago
WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.
That's the part that's relevant to the present discussion (I think it's also the only part, but I might be wrong)
What do you take that to say w/ regards to the above? Because e.g. Gunicorn "decodes" just fine (it parses the hex-coded chunk-lenghts and removes those from the stream as seen by Django) as long as you don't expect "decoding" to mean "also adds fake synthetic Content-Length header"
comment:8 by , 3 months ago
Here's the first part of that paragraph:
However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI internal communications. WSGI applications must not generate any “hop-by-hop” headers, attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming “hop-by-hop” headers in the environ dictionary.
There's also this quote, which is actually in a different section of the spec:
Applications and middleware are forbidden from using HTTP/1.1 “hop-by-hop” features or headers...
There's also these three quotes from the issues you listed.
It seems like the idea from the spec is that the WSGI servers would de-chunk the requests, so Django would see the full request, not chunks.
Maybe it would be good for Django to not follow the spec here, but if so, it would probably be good to keep that in mind and explicitly say so in the docs.
comment:9 by , 3 months ago
first part of that paragraph
but:
- "applications must not generate" does not apply here (that's for Response Transfer-Encoding)
- "attempt to use .. generate such headers" same
- "rely on the content" I don't think my proposed direction for Django is relying on anything other than _not_ relying on Content-Length
Regarding the quotes: admittedly I didn't have the full discussions top-of-mind after spending a day debugging. I suppose I also discounted some points I didn't agree with. But if I reread those whole threads, I mostly get the sense of Benoit asking "why can't we just fix this" and it falling on deaf ears... btw the "FWIW" by Graham actually points to him agreeing with Benoit.
By the way, I can't actually find the part of the spec that says Content-Length is required; nor can I find the part that says "if it's not there, just assume 0". I'm using CTRL-F... perhaps it's implied through something else?
having said all that, I have an open point and an open question:
- _even if_ one would say that Django cannot read from a request's body on the application-side of WSGI, as per the WSGI spec, that does not imply Django should just silently throw away the data for that case. Would raising an exception not be much better?
- does the "solution" (workaround) that I proposed (and for my own project: implemented) have any actual downsides other than not following the WSGI spec? I have personal reasons to ask, but the answer would also be generally useful in telling us whether it's a viable general solution
comment:10 by , 3 months ago
you would think that requests w/ chunked transfer-encoding are used often enough for this to have surfaced more broadly
though probably not much by browsers but I ran into this in the context of a non-browser client
comment:11 by , 3 months ago
_even if_ one would say that Django cannot read from a request's body on the application-side of WSGI, as per the WSGI spec, that does not imply Django should just silently throw away the data for that case. Would raising an exception not be much better?
Yes, I agree - if Django can see that there is data, but it's not correct according to the spec, it would be better to raise an exception than silently throw away that data.
does the "solution" (workaround) that I proposed (and for my own project: implemented) have any actual downsides other than not following the WSGI spec? I have personal reasons to ask, but the answer would also be generally useful in telling us whether it's a viable general solution
I don't know - it would need to be analyzed, tested, ... One immediate question to answer would be whether it causes any problems for WSGI servers that implement the spec correctly. (Note: if you think that the WSGI spec does allow for chunked transfer encoding at the Django app level, you can also try to convince people of that.)
One thing you could try is to open a PR that implements the change you want to make in Django, and let people give feedback on it and see if they like it.
comment:12 by , 3 months ago
Cc: | added |
---|---|
Summary: | request.read() returns empty for Requests w/ Transfer-Encoding: Chunked → Support Requests with Transfer-Encoding: Chunked |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → New feature |
I think historically there has been an assumption that Content-Length will always be present (see tickets such as #3057)
As I can't find any reference of support for Transfer-Encoding: Chunked being added, I believe we just don't have support for it and we should treat this as a new feature to Django
Tentatively accepted. We could also repurpose #35289 in order to add Transfer-Encoding: Chunked support more generally and mark this as a duplicate
Will cc a couple of folks involved in previous discussions to see if they agree
comment:13 by , 3 months ago
…add Transfer-Encoding: Chunked support more generally and mark this as a duplicate.
They're certainly both closely related. Django receives the de-chunked request, but then doesn't handle it as expected (because of the missing Content Length header).
...support more generally...
Given that both WSGI and ASGI specify that the server de-chunks the request, I'm not sure what else would be needed here? 🤔
comment:14 by , 3 months ago
Given that both WSGI and ASGI specify that the server de-chunks the request
What does it mean to "de-chunk" the request? Is there a description somewhere I could look at?
comment:15 by , 3 months ago
Given that both WSGI and ASGI specify that the server de-chunks the request, I'm not sure what else would be needed here?
whatever the precise definition (spec) that the first part of that sentence implies... "what else would be needed here" can be (and was) answered by me as "don't silently throw away request bodies when Content-Length is missing"
comment:16 by , 3 months ago
It seems like "de-chunk the request" is another way of saying "decoding any inbound Transfer-Encoding, including chunked encoding if applicable" (from WSGI PEP3333). RFC2616 describes how to decode a chunked request:
length := 0 read chunk-size, chunk-extension (if any) and CRLF while (chunk-size > 0) { read chunk-data and CRLF append chunk-data to entity-body length := length + chunk-size read chunk-size and CRLF } read entity-header while (entity-header not empty) { append entity-header to existing header fields read entity-header } Content-Length := length Remove "chunked" from Transfer-Encoding
This looks to me like the WSGI server should set the Content-Length header, and remove "chunked" from Transfer-Encoding, before passing the request to Django (unless Django and the WSGI servers decide to ignore the spec here).
comment:17 by , 3 months ago
RFC2616 describes how to decode a chunked request:
This looks to me like the WSGI server should set the Content-Length header
That is indeed one interpretation, but I'm not sure it's the only valid one. In the quoted RFC Content-Length is calculated, but the RFC does not say this should be translated into a header by a wsgi server (obviously, since the RFC doesn't deal with WSGI).
Side-note: I found stumbling upon this bug quite surprising: you would think that requests w/ chunked transfer-encoding are used often enough for this to have surfaced more broadly _at some point_, but I couldn't find much more than the things I already linked. Perhaps it's not broadly used? Or perhaps it is, but in contexts where it is Django is always behind some proxy that removes it, and sends the WSGI server (and Django) something "more plain"?