Opened 9 months ago
Last modified 9 months ago
#35289 new Bug
Chunked transfer encoding is not handled correctly by MultiPartParser
Reported by: | Kyle Agronick | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | transfer-encoding, chunked, multi-part |
Cc: | bcail, Carlton Gibson | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I am trying to post large files from C#. The C# implementation conforms to the HTTP1.1 protocol and does not allow chunked transfer encoding and a content-length to be set at the same time.
Messages MUST NOT include both a Content-Length header field and a non-identity transfer-coding. If the message does include a non-identity transfer-coding, the Content-Length MUST be ignored." (RFC 2616, Section 4.4)
In the MultiPartParser
a non-existent content-length header is treated as 0:
https://github.com/django/django/blob/main/django/http/multipartparser.py#L96
When the 0 length is read, nothing gets parsed:
https://github.com/django/django/blob/main/django/http/multipartparser.py#L147
If you remove the check in the second link, everything works correctly. It works fine without an explicit content-length.
With Nginx to turn off request buffering for a single request you need to use chunked transfer encoding.
Change History (12)
comment:1 by , 9 months ago
comment:2 by , 9 months ago
It looks like the only place content_length
is used is in MemoryFileUploadHandler
where self.activated = content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE
. Content-length can be spoofed causing the MemoryFileUploadHandler
to read everything into memory. It seems like the complexity of TemporaryFileUploadHandler
and MemoryFileUploadHandler
could be reduced by having one file handler that uses SpooledTemporaryFile(max_size=settings.FILE_UPLOAD_MAX_MEMORY_SIZE)
.
follow-up: 4 comment:3 by , 9 months ago
Cc: | added |
---|
This Daphne issue may be relevant. There's a quote in that issue (from the ASGI spec) that the ASGI server is supposed to dechunk the request - so shouldn't it come to Django as a regular (not chunked) request? Hopefully someone else will weigh in here as well.
What ASGI server are you using?
comment:4 by , 9 months ago
Replying to bcail:
This Daphne issue may be relevant. There's a quote in that issue (from the ASGI spec) that the ASGI server is supposed to dechunk the request - so shouldn't it come to Django as a regular (not chunked) request? Hopefully someone else will weigh in here as well.
What ASGI server are you using?
You can see in the ASGI spec mentioned in that ticket it says "a response that is given to the server with no Content-Length may be chunked as the server sees fit". Which is what is happening. Everything is streaming into Django correctly. It just doesn't know how to handle a non-existent content length.
That seems to be the same issue and I would argue this is a Django issue as ASGI servers are streaming the results correctly. If the ASGI servers were expected to buffer the files you would introduce blocking IO with tempfiles or OOM issues if it all happened in memory. I came across this issue while trying to implement large file uploads >6GB. The less buffering the better.
I am using Daphene in development and Uvicorn in production. My hacky middleware fix with the content-length seems to work on both of them.
comment:5 by , 9 months ago
Cc: | added |
---|
Hello Kyle!
RFC 2616 is obsoleted by RFC 7230 which in turn is obsoleted by RFC 9110 and RFC 9112, and as far as I read in RFC 9112, both headers could co-exist (but Content-Length
should be ignored if Transfer-Encoding
is sent):
Early implementations of Transfer-Encoding would occasionally send both a chunked transfer coding for message framing and an estimated Content-Length header field for use by progress bars. This is why Transfer-Encoding is defined as overriding Content-Length, as opposed to them being mutually incompatible.
But, at the same time, I found an analogous issue for Daphne, and I see that you have just commented on the other referenced Daphne issue, so this may be a valid issue. I will cc/Carlton to confirm before fully triaging.
comment:6 by , 9 months ago
You can see in the ASGI spec mentioned in that ticket it says "a response that is given to the server with no Content-Length may be chunked as the server sees fit". Which is what is happening. Everything is streaming into Django correctly. It just doesn't know how to handle a non-existent content length.
I actually understand that to be talking about responses coming *from* Django (or another ASGI app) back to the ASGI server, not a request going into Django.
comment:7 by , 9 months ago
This comment on the related request-centred Daphne issue suggests that that request body is correctly dechunked before being passed off to Django. The question we had there was "why it's not getting out of Django's HttpRequest._load_post_and_files()
" Kyle's suggestion looks on point: the missing content length isn't an error (IIUC).
I'd guess here we'd would set expected content length to len(request.body)
if the Transfer-Encoding header field was set 🤔
comment:8 by , 9 months ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 5.0 → dev |
Following Carlton's comment and the various links/issues, I think this is ticket should be accepted. Kyle, would you like to prepare a patch?
comment:9 by , 9 months ago
request body is correctly dechunked before being passed off to Django
If a request body is dechunked, should the Transfer-Encoding header still be set to "chunked" and the Content-Length header be missing? Would it make sense to change those if the request body is dechunked?
comment:10 by , 9 months ago
I can make a patch. There are a few ways to fix this. I think the best way should be a new upload file handler that is the default which uses SpooledTemporaryFile
so it can cleanly fall back if there is too much data while being fast for small requests. I think we also need to check DATA_UPLOAD_MAX_MEMORY_SIZE
as we are reading the stream into the temp file and throw a SuspiciousOperation
if we read too much. Looking at the docs it seems FILE_UPLOAD_MAX_MEMORY_SIZE
and DATA_UPLOAD_MAX_MEMORY_SIZE
are named similarly but one controls if we write to disk and one controls if we abort the request.
That is a bit ambitious considering I've never contributed to Django and would probably require deprecating the current handlers. If we don't want to do the ambitious approach, the check in MemoryFileUploadHandler
that does self.activated = content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE
could be self.activated = content_length and content_length <= settings.FILE_UPLOAD_MAX_MEMORY_SIZE
and cause all requests that don't have a content-length to fall back to TemporaryFileUploadHandler
in the default configuration. That would mean MemoryFileUploadHandler
won't work with chunked transfer-encoding without a content-length. Perhaps add a note in the docs since reading an unspecified content-length into memory is dangerous anyways.
Thoughts?
comment:12 by , 9 months ago
Replying to bcail:
The ASGI handler might be relevant.
That is interesting. So it seems like it could be optimized even further. In my use case with the 6GB upload it would create two 6GB files on disk. _load_post_and_files
seems like it should be hooked into that directly with the MultiPartParser
instead of buffering it all to disk and then reading it out but that seems even more ambitious. Another problem I see there is that once it spools to disk it is going to block the event loop because it is doing blocking IO. It should be doing something like asgiref's sync_to_async
.
I can confirm that just by making a middleware that does
everything gets parsed correctly even though the
CONTENT_LENGTH
is much more than 1.Maybe this issue went undetected for so long because WSGI does not have good support for chunked transfer encoding. But ASGI does.