Opened 3 months ago

Last modified 3 months ago

#35838 new New feature

Support Requests with Transfer-Encoding: Chunked

Reported by: Klaas van Schelven Owned by:
Component: HTTP handling Version: 5.0
Severity: Normal Keywords:
Cc: Carlton Gibson, bcail, Natalia Bidart, Claude Paroz Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Klaas van Schelven)

Django's request.read() returns 0 bytes when there's no Content-Length header.
i.e. it silently fails.

But not having a Content-Length header is perfectly fine when there's a HTTP/1.1 Transfer-Encoding: Chunked request.

WSGI servers like gunicorn and mod_wsgi are able to handle this just fine, i.e. Gunicorn handles the hexidecimally encoded lengths and just passes you the right chunks and Apache's mod_wsgi does the same I believe.

Discussions/docs over at Gunicorn / mod_wsgi:

The actual single line of code that's problematic is this one: https://github.com/django/django/blob/97c05a64ca87253e9789ebaab4b6d20a1b2370cf/django/core/handlers/wsgi.py#L77

My personal reason I ran into this:

which is basically solved by using a lot of lines to undo the effects of that single line of code:

import django
from django.core.handlers.wsgi import WSGIHandler, WSGIRequest

os.environ.setdefault('DJANGO_SETTINGS_MODULE', '....


class MyWSGIRequest(WSGIRequest):

    def __init__(self, environ):
        super().__init__(environ)

        if "CONTENT_LENGTH" not in environ and "HTTP_TRANSFER_ENCODING" in environ:
            # "unlimit" content length
            self._stream = self.environ["wsgi.input"]


class MyWSGIHandler(WSGIHandler):
    request_class = MyWSGIRequest


def my_get_wsgi_application():
    # Like get_wsgi_application, but returns a subclass of WSGIHandler that uses a custom request class.
    django.setup(set_prefix=False)
    return MyWSGIHandler()


application = my_get_wsgi_application()

But I'd rather not solve this in my own application only, but have it be a Django thing. In the Gunicorn links, there's some allusion to "this won't happen b/c wsgi spec", but that seems like a bad reason from my perspective. At the very least request.read() should not just silently give a 0-length answer. And having tools available such that you don't need to make a "MyXXX" hierarchy would also be nice.

That's the bit for "actually getting request.read() to work when behind gunicorn".

There's also the part where this doesn't work for the debugserver. Which is fine, given its limited scope. But an error would be better than silently returning nothing (again).

My solution for that one is the following middleware:

class DisallowChunkedMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if "HTTP_TRANSFER_ENCODING" in request.META and \
                request.META["HTTP_TRANSFER_ENCODING"].lower() == "chunked" and \
                not request.META.get("wsgi.input_terminated"):

            # If we get here, it means that the request has a Transfer-Encoding header with a value of "chunked", but we
            # have no wsgi-layer handling for that. This probably means that we're running the Django development
            # server, and as such our fixes for the Gunicorn/Django mismatch that we put in wsgi.py are not in effect.
            raise ValueError("This server is not configured to support Chunked Transfer Encoding (for requests)")

        return self.get_response(request)

Some links:

All of the above is when using WSGI (I did not test/think about ASGI)

Change History (17)

comment:1 by Klaas van Schelven, 3 months ago

Summary: request.read() returns empty for Rueqests w/ Transfer-Encoding: Chunkedrequest.read() returns empty for Requests w/ Transfer-Encoding: Chunked

comment:2 by Carlton Gibson, 3 months ago

Cc: Carlton Gibson added
Component: UncategorizedHTTP handling

comment:3 by Klaas van Schelven, 3 months ago

Description: modified (diff)

comment:4 by Klaas van Schelven, 3 months ago

Side-note: I found stumbling upon this bug quite surprising: you would think that requests w/ chunked transfer-encoding are used often enough for this to have surfaced more broadly _at some point_, but I couldn't find much more than the things I already linked. Perhaps it's not broadly used? Or perhaps it is, but in contexts where it is Django is always behind some proxy that removes it, and sends the WSGI server (and Django) something "more plain"?

comment:5 by bcail, 3 months ago

Cc: bcail added

comment:6 by bcail, 3 months ago

Here's a relevant section of the WSGI spec.

comment:7 by Klaas van Schelven, 3 months ago

WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.

That's the part that's relevant to the present discussion (I think it's also the only part, but I might be wrong)

What do you take that to say w/ regards to the above? Because e.g. Gunicorn "decodes" just fine (it parses the hex-coded chunk-lenghts and removes those from the stream as seen by Django) as long as you don't expect "decoding" to mean "also adds fake synthetic Content-Length header"

Last edited 3 months ago by Klaas van Schelven (previous) (diff)

comment:8 by bcail, 3 months ago

Here's the first part of that paragraph:

However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI internal communications. WSGI applications must not generate any “hop-by-hop” headers, attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming “hop-by-hop” headers in the environ dictionary.

There's also this quote, which is actually in a different section of the spec:

Applications and middleware are forbidden from using HTTP/1.1 “hop-by-hop” features or headers...

There's also these three quotes from the issues you listed.

It seems like the idea from the spec is that the WSGI servers would de-chunk the requests, so Django would see the full request, not chunks.

Maybe it would be good for Django to not follow the spec here, but if so, it would probably be good to keep that in mind and explicitly say so in the docs.

comment:9 by Klaas van Schelven, 3 months ago

first part of that paragraph

but:

  • "applications must not generate" does not apply here (that's for Response Transfer-Encoding)
  • "attempt to use .. generate such headers" same
  • "rely on the content" I don't think my proposed direction for Django is relying on anything other than _not_ relying on Content-Length

Regarding the quotes: admittedly I didn't have the full discussions top-of-mind after spending a day debugging. I suppose I also discounted some points I didn't agree with. But if I reread those whole threads, I mostly get the sense of Benoit asking "why can't we just fix this" and it falling on deaf ears... btw the "FWIW" by Graham actually points to him agreeing with Benoit.

By the way, I can't actually find the part of the spec that says Content-Length is required; nor can I find the part that says "if it's not there, just assume 0". I'm using CTRL-F... perhaps it's implied through something else?

having said all that, I have an open point and an open question:

  1. _even if_ one would say that Django cannot read from a request's body on the application-side of WSGI, as per the WSGI spec, that does not imply Django should just silently throw away the data for that case. Would raising an exception not be much better?
  1. does the "solution" (workaround) that I proposed (and for my own project: implemented) have any actual downsides other than not following the WSGI spec? I have personal reasons to ask, but the answer would also be generally useful in telling us whether it's a viable general solution
Last edited 3 months ago by Klaas van Schelven (previous) (diff)

comment:10 by Klaas van Schelven, 3 months ago

you would think that requests w/ chunked transfer-encoding are used often enough for this to have surfaced more broadly

though probably not much by browsers but I ran into this in the context of a non-browser client

comment:11 by bcail, 3 months ago

_even if_ one would say that Django cannot read from a request's body on the application-side of WSGI, as per the WSGI spec, that does not imply Django should just silently throw away the data for that case. Would raising an exception not be much better?

Yes, I agree - if Django can see that there is data, but it's not correct according to the spec, it would be better to raise an exception than silently throw away that data.

does the "solution" (workaround) that I proposed (and for my own project: implemented) have any actual downsides other than not following the WSGI spec? I have personal reasons to ask, but the answer would also be generally useful in telling us whether it's a viable general solution

I don't know - it would need to be analyzed, tested, ... One immediate question to answer would be whether it causes any problems for WSGI servers that implement the spec correctly. (Note: if you think that the WSGI spec does allow for chunked transfer encoding at the Django app level, you can also try to convince people of that.)

One thing you could try is to open a PR that implements the change you want to make in Django, and let people give feedback on it and see if they like it.

comment:12 by Sarah Boyce, 3 months ago

Cc: Natalia Bidart Claude Paroz added
Summary: request.read() returns empty for Requests w/ Transfer-Encoding: ChunkedSupport Requests with Transfer-Encoding: Chunked
Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

I think historically there has been an assumption that Content-Length will always be present (see tickets such as #3057)
As I can't find any reference of support for Transfer-Encoding: Chunked being added, I believe we just don't have support for it and we should treat this as a new feature to Django

Tentatively accepted. We could also repurpose #35289 in order to add Transfer-Encoding: Chunked support more generally and mark this as a duplicate
Will cc a couple of folks involved in previous discussions to see if they agree

comment:13 by Carlton Gibson, 3 months ago

…add Transfer-Encoding: Chunked support more generally and mark this as a duplicate.

They're certainly both closely related. Django receives the de-chunked request, but then doesn't handle it as expected (because of the missing Content Length header).

...support more generally...

Given that both WSGI and ASGI specify that the server de-chunks the request, I'm not sure what else would be needed here? 🤔

comment:14 by bcail, 3 months ago

Given that both WSGI and ASGI specify that the server de-chunks the request

What does it mean to "de-chunk" the request? Is there a description somewhere I could look at?

comment:15 by Klaas van Schelven, 3 months ago

Given that both WSGI and ASGI specify that the server de-chunks the request, I'm not sure what else would be needed here?

whatever the precise definition (spec) that the first part of that sentence implies... "what else would be needed here" can be (and was) answered by me as "don't silently throw away request bodies when Content-Length is missing"

comment:16 by bcail, 3 months ago

It seems like "de-chunk the request" is another way of saying "decoding any inbound Transfer-Encoding, including chunked encoding if applicable" (from WSGI PEP3333). RFC2616 describes how to decode a chunked request:

       length := 0
       read chunk-size, chunk-extension (if any) and CRLF
       while (chunk-size > 0) {
          read chunk-data and CRLF
          append chunk-data to entity-body
          length := length + chunk-size
          read chunk-size and CRLF
       }
       read entity-header
       while (entity-header not empty) {
          append entity-header to existing header fields
          read entity-header
       }
       Content-Length := length
       Remove "chunked" from Transfer-Encoding

This looks to me like the WSGI server should set the Content-Length header, and remove "chunked" from Transfer-Encoding, before passing the request to Django (unless Django and the WSGI servers decide to ignore the spec here).

comment:17 by Klaas van Schelven, 3 months ago

RFC2616 describes how to ​decode a chunked request:

This looks to me like the WSGI server should set the Content-Length header

That is indeed one interpretation, but I'm not sure it's the only valid one. In the quoted RFC Content-Length is calculated, but the RFC does not say this should be translated into a header by a wsgi server (obviously, since the RFC doesn't deal with WSGI).

Note: See TracTickets for help on using tickets.
Back to Top