#29510 closed Bug (invalid)
QueryDict.copy() returns closed files when the type of file is TemporaryUploadedFile
Reported by: | Liquid Scorpio | Owned by: | Dan Madere |
---|---|---|---|
Component: | File uploads/storage | Version: | 1.11 |
Severity: | Normal | Keywords: | QueryDict, upload, file |
Cc: | Jeff, Herbert Fortes, Dmytro Litvinov | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When uploaded file size is greater than FILE_UPLOAD_MAX_MEMORY_SIZE
, Django uses TemporaryUploadedFile
to represent the file object. However, when executing .copy()
on a QueryDict
containing such a file, the returned object has the file but it is in closed state (seekable()
is False
).
Expected: File should be present in open state (seekable()
should be True
)
Below is a reproducible example and also contains version details:
Python 2.7.12 (default, Dec 4 2017, 14:50:18) In [18]: import django In [19]: django.VERSION Out[19]: (1, 11, 11, u'final', 0) In [20]: from django.http.request import QueryDict In [21]: from django.core.files.uploadedfile import TemporaryUploadedFile In [22]: d = QueryDict(mutable=True) In [23]: f = TemporaryUploadedFile('test.jpg', 'image/jpeg', 100, 'utf-8') In [24]: f.seekable() Out[24]: True In [25]: d.appendlist('image', f) In [26]: d['image'].seekable() Out[25]: True In [27]: c = d.copy() In [28]: c['image'].seekable() Out[28]: False In [30]:
Change History (21)
comment:1 by , 7 years ago
Component: | Uncategorized → File uploads/storage |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 7 years ago
Description: | modified (diff) |
---|
comment:3 by , 7 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:4 by , 7 years ago
This is not the reported issue, however:
In python 3, the copy.deepcopy
function is not able to copy TemporaryUploadedFiles
because python's tempfile.TemporaryFile
is not serializable. I'm not certain how serious the issue is, or if we want to fix it, but I believe this makes QueryDicts unable to be copied if it contains a TemporaryUploadedFile
obj.
comment:5 by , 7 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:6 by , 7 years ago
Cc: | added |
---|
comment:7 by , 5 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:8 by , 5 years ago
The underlying issue here is a change in Python attempting to pickle the file.
On python 2.7, I can reproduce the reported issue:
In [19]: file = tempfile.NamedTemporaryFile(suffix='.upload') In [20]: copy.deepcopy(file) Out[20]: <closed file '<uninitialized file>', mode '<uninitialized file>' at 0x7f45f2d0b420>
On Python 3.6 I can't reproduce the original issue, instead I get Tim's issue:
In [17]: file = tempfile.NamedTemporaryFile(suffix='.upload') In [18]: copy.deepcopy(file) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-18-36f4ecfb14c1> in <module> ----> 1 copy.deepcopy(file) /usr/local/lib/python3.6/copy.py in deepcopy(x, memo, _nil) 167 reductor = getattr(x, "__reduce_ex__", None) 168 if reductor: --> 169 rv = reductor(4) 170 else: 171 reductor = getattr(x, "__reduce__", None) /usr/local/lib/python3.6/tempfile.py in func_wrapper(*args, **kwargs) 483 @_functools.wraps(func) 484 def func_wrapper(*args, **kwargs): --> 485 return func(*args, **kwargs) 486 # Avoid closing the file as long as the wrapper is alive, 487 # see issue #18879. TypeError: cannot serialize '_io.BufferedRandom' object
Personally I think the 2.7 issue is not as bad as the 3.x since it at least allows for the copy to proceed (while silently closing the file).
My suggestion would be to preemptively check that the objects being copied are serializable or to catch the TypeError and raise a different error indicating the user that the query dict contains a non-seralizable value (like file, etc..)
follow-up: 10 comment:9 by , 5 years ago
Did you explore possible handling of such file objects in QueryDict.__deepcopy__
?
follow-up: 12 comment:10 by , 5 years ago
Replying to Claude Paroz:
Did you explore possible handling of such file objects in
QueryDict.__deepcopy__
?
To keep the file on copy?
If so then yes, we can check value for UploadedFile and create a new instance that points to the same underlying file (although I guess that's more shallow than deep copy).
To have fully deep copy we'd have to create new tmp if is on disk or somehow copy the file stream.
That said, I think "soft" copy should be enough and is more expected behavior for the file itself.
My concern with this approach was if we have any other non file values that are not serializable, would this open the door to having to handle those too?
Guess not since there's only so many types that are handled by QueryDict.
comment:11 by , 2 years ago
Owner: | changed from | to
---|
follow-up: 17 comment:12 by , 2 years ago
Replying to Paulo:
Replying to Claude Paroz:
Did you explore possible handling of such file objects in
QueryDict.__deepcopy__
?
To keep the file on copy?
If so then yes, we can check value for UploadedFile and create a new instance that points to the same underlying file (although I guess that's more shallow than deep copy).
To have fully deep copy we'd have to create new tmp if is on disk or somehow copy the file stream.
After trying it, I don't think either of these suggestions is workable, considering the nature of TemporaryUploadedFile
. It has logic where the file is deleted as soon as it's closed. When copying/streaming the old file's content to the new one, we have a problem where the new file deletes itself immediately.
I stepped back, and pondered why are people copying a QueryDict
anyway, and what do they expect to happen to a TemporaryUploadedFile
inside? https://docs.djangoproject.com/en/dev/ref/request-response/#querydict-objects:
class QueryDict¶ In an HttpRequest object, the GET and POST attributes are instances of django.http.QueryDict, a dictionary-like class customized to deal with multiple values for the same key. This is necessary because some HTML form elements, notably <select multiple>, pass multiple values for the same key. The QueryDicts at request.POST and request.GET will be immutable when accessed in a normal request/response cycle. To get a mutable version you need to use QueryDict.copy().
That leads me to think that people are copying a QueryDict
because it's a nice starting point to have all the query params organized, but they want to modify it, and pass to the template. Doing so would result in a This QueryDict instance is immutable
exception, and the advice above, so they try to copy the QueryDict
before modifying it.
I propose we omit TemporaryUploadedFile
values on deep copy of a QueryDict
, and will open a PR for feedback.
follow-up: 15 comment:14 by , 2 years ago
I stepped back, and pondered why are people copying a QueryDict anyway...
This seems like the right thought to me. 🤔
I have to admit I'm struggling to see the use-case — Folks would have to be making a copy of request.FILES
for this to come up no? Why would you want a mutable copy of that? (Maybe more ☕️ is needed :)
Copying request.GET
in order to edit a query string makes sense. Uploaded files though... 🤔
(Sorry if I'm being dense)
comment:15 by , 2 years ago
Replying to Carlton Gibson:
I stepped back, and pondered why are people copying a QueryDict anyway...
This seems like the right thought to me. 🤔
I have to admit I'm struggling to see the use-case — Folks would have to be making a copy of
request.FILES
for this to come up no? Why would you want a mutable copy of that? (Maybe more ☕️ is needed :)
Thanks Carlton!
Not necessarily, it could also be making a copy of request.POST
, which surprisingly includes the files too. My theory is that most people believe that copying it only includes the query params, because that's the useful part to append more data to, and pass to the template.
Regardless of the use case though, it seems impossible to me to copy a TemporaryUploadedFile
. I don't see any reason people would expect it to actually copy the file.
comment:16 by , 2 years ago
Hey Dan — just to let you know, I'm looking at the request object now, with this half in mind. Will follow-up once I've done that. Thanks
comment:17 by , 2 years ago
This is exactly what happened in my case. I have a code that modifies some attributes of request.data
and then passes it to a serializer. At some point we started to notice This QueryDict instance is immutable
exceptions in our logs. Thus, decided to use built-in method which you are referenced to. Once, replaced direct modifications request.data
with request.data.copy()
, TypeError: cannot pickle '_io.BufferedRandom' object on...
started to araise.
Replying to Dan Madere:
After trying it, I don't think either of these suggestions is workable, considering the nature of
TemporaryUploadedFile
. It has logic where the file is deleted as soon as it's closed. When copying/streaming the old file's content to the new one, we have a problem where the new file deletes itself immediately.
I stepped back, and pondered why are people copying a
QueryDict
anyway, and what do they expect to happen to aTemporaryUploadedFile
inside? https://docs.djangoproject.com/en/dev/ref/request-response/#querydict-objects:
class QueryDict¶ In an HttpRequest object, the GET and POST attributes are instances of django.http.QueryDict, a dictionary-like class customized to deal with multiple values for the same key. This is necessary because some HTML form elements, notably <select multiple>, pass multiple values for the same key. The QueryDicts at request.POST and request.GET will be immutable when accessed in a normal request/response cycle. To get a mutable version you need to use QueryDict.copy().That leads me to think that people are copying a
QueryDict
because it's a nice starting point to have all the query params organized, but they want to modify it, and pass to the template. Doing so would result in aThis QueryDict instance is immutable
exception, and the advice above, so they try to copy theQueryDict
before modifying it.
I propose we omit
TemporaryUploadedFile
values on deep copy of aQueryDict
, and will open a PR for feedback.
comment:18 by , 2 years ago
Hey Dan. Sorry for the pause. Getting there now.
Can I ask you to clarify this:
Not necessarily, it could also be making a copy of request.POST, which surprisingly includes the files too.
I'm looking at % ./runtests.py file_uploads -k test_simple_upload --parallel=1
and % ./runtests.py file_uploads -k test_large_upload --parallel=1
and inspecting the request for the views:
(django) carlton@Carltons-MacBook-Pro tests % ./runtests.py file_uploads -k test_simple_upload --parallel=1 Testing against Django installed in '/Users/carlton/Projects/Django/django/django' Found 1 test(s). Creating test database for alias 'default'... System check identified no issues (0 silenced). <WSGIRequest: POST '/upload/'> POST: <QueryDict: {'name': ['Ringo']}> FILES: <MultiValueDict: {'file_field': [<InMemoryUploadedFile: tests.py (text/x-python)>]}> > /Users/carlton/Projects/Django/django/tests/file_uploads/views.py(31)file_upload_view() -> form_data = request.POST.copy() (Pdb) interact *interactive* >>> request.POST.copy() <QueryDict: {'name': ['Ringo']}> >>> request.FILES.copy() <MultiValueDict: {'file_field': [<InMemoryUploadedFile: tests.py (text/x-python)>]}> >>> request.FILES.copy()["file_field"] <InMemoryUploadedFile: tests.py (text/x-python)> >>> request.FILES.copy()["file_field"].seekable() True
And the same here:
(django) carlton@Carltons-MacBook-Pro tests % ./runtests.py file_uploads -k test_large_upload --parallel=1 Testing against Django installed in '/Users/carlton/Projects/Django/django/django' Found 1 test(s). Creating test database for alias 'default'... System check identified no issues (0 silenced). <WSGIRequest: POST '/verify/'> POST: <QueryDict: {'name': ['Ringo'], 'name_hash': ['a6d6d397a391d876d5aae28cdca9f25071906022'], 'file_field1_hash': ['b84b7b775430232c2ee4e2216db1a591d97b2db0'], 'file_field2_hash': ['fbcf384f1c96e3d24fb25b749b00241a5cc4690f']}> FILES: <MultiValueDict: {'file_field1': [<TemporaryUploadedFile: tmpx56phiic.file1 (application/octet-stream)>], 'file_field2': [<TemporaryUploadedFile: tmp7ipmyal8.file2 (application/octet-stream)>]}> > /Users/carlton/Projects/Django/django/tests/file_uploads/views.py(59)file_upload_view_verify() -> form_data = request.POST.copy() (Pdb) interact *interactive* >>> request.POST.copy() <QueryDict: {'name': ['Ringo'], 'name_hash': ['a6d6d397a391d876d5aae28cdca9f25071906022'], 'file_field1_hash': ['b84b7b775430232c2ee4e2216db1a591d97b2db0'], 'file_field2_hash': ['fbcf384f1c96e3d24fb25b749b00241a5cc4690f']}> >>> request.FILES.copy() <MultiValueDict: {'file_field1': [<TemporaryUploadedFile: tmpx56phiic.file1 (application/octet-stream)>], 'file_field2': [<TemporaryUploadedFile: tmp7ipmyal8.file2 (application/octet-stream)>]}> >>> request.FILES.copy()["file_field1"] <TemporaryUploadedFile: tmpx56phiic.file1 (application/octet-stream)> >>> request.FILES.copy()["file_field1"].seekable() True
So, it's neither the case that request.POST
includes the files data nor that copying request.FILES
leads to the error. 🤔
I do get the error creating the TemporaryUploadedFile
by-hand, so I need to see what's different about the upload flow.
If you were able to provide a (minimal but) full reproduce, it might save a cycle.
Thanks!
comment:19 by , 2 years ago
Presumably you're using DRF's request.data which "includes all parsed content, including file and non-file inputs."
request.POST
is still available there that gives you the QueryDict without the file data.
(Still looking at how I'm able to copy the TemporaryUploadedFile
in the test case.)
comment:20 by , 2 years ago
Resolution: | → invalid |
---|---|
Status: | assigned → closed |
So (ruling out the test client) I've experimented with a real-view, uploading a test file with some POST data as well:
from django.urls import path from django.http import HttpResponse import pprint # With settings to ensure TemporaryUploadedFile: # FILE_UPLOAD_MAX_MEMORY_SIZE=10 def file_upload_view(request): pprint.pprint(request) print("\nPOST:") pprint.pprint(request.POST) print("\nFILES:") pprint.pprint(request.FILES) f = request.FILES.copy() assert f["file1"].seekable() return HttpResponse("Got it!") urlpatterns = [ path("upload", file_upload_view), ]
This passes without issue:
<WSGIRequest: POST '/upload'> POST: <QueryDict: {'key1': ['value1'], 'key2': ['value2']}> FILES: {'file1': <TemporaryUploadedFile: Large Test File.txt (text/plain)>}
Note the...
f = request.FILES.copy() assert f["file1"].seekable()
I've tested back to Python 3.9 and Django 3.2.
The reason this works is that FILES
is a MultiValueDict
not a QueryDict
:
>>> request.FILES.copy() <MultiValueDict: {'file_field1': [<TemporaryUploadedFile: tmpx56phiic.file1 (application/octet-stream)>], 'file_field2': [<TemporaryUploadedFile: tmp7ipmyal8.file2 (application/octet-stream)>]}>
...and so copy()
uses copy.copy()
rather than copy.deepcopy()
.
If you try and deepcopy()
the TemporaryUploadedFile
you indeed get the error, but that's not what Django does.
comment:21 by , 8 months ago
Cc: | added |
---|
I can reproduce with Python 2, even if I'm unsure that the behavior is incorrect:
However, the
deepcopy()
crashes on Python 3 with:TypeError: cannot serialize '_io.BufferedRandom' object
Accepting for further investigation.