#1484 closed defect (duplicate)
[patch] files uploads should be streamed to temporary files
Reported by: | anonymous | Owned by: | Adrian Holovaty |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | major | Keywords: | |
Cc: | Maniac@… | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In the current implementation the content of uploaded files is stored in the request.FILES dictionary.
This makes Django ill suited for any application where one might need to upload files beyond trivial sizes.
The content of the uploaded files should be stored in a temporary files and only the handle to these should be referenced in the request.FILES dictionary.
best,
i.
Attachments (14)
Change History (30)
comment:1 by , 19 years ago
Cc: | added |
---|
comment:2 by , 19 years ago
I have it basically working in my local tree (though not yet suitable for review). To finish it I'd like to hear your opinions:
- Temp files are read from stdin in 8K chunks which seems to me too conservative (and slow). What would be the reasonable amount? I would say 32K but it's based only on some dusty habits :-)
- I don't plan to leave 'read everything in memory' behavior for file uploads in place. However if it makes sense to process file uploads in memory I could make a setting switching between these behaviors.
comment:3 by , 19 years ago
Summary: | files uploads should be streamed to temporary files → [patch] files uploads should be streamed to temporary files |
---|
Here are patches for trunk and magic-removal. It also eliminates at least one copy of the same data from memory and works better than previous parser even without storing files in temp files.
Now items in request.FILES have are accesible as follows:
filefilename - filename, as it was
filecontent-type - content type, as it was
filefile - file-like object (StringIO or TempFile internally)
filecontent - file content that is filled lazily on first access
There is a new setting: STORE_UPLOAD_ON_DISK (feel free to rename it) which is False by default
In other words it's all backwards compatible.
comment:4 by , 19 years ago
2 days from the report to the solution ... this is awesome ... really saves our app ... very very impressed
comment:5 by , 19 years ago
I've made an error in one place that made multipart form submission work only when there are no more fields other than file upload. Corrected patches attached.
comment:6 by , 19 years ago
Found some more bugs working with these patches. New variants also happen to be more reviewable (i.e. diff blocks are less mixed)
comment:7 by , 19 years ago
New patch fix 2 things:
- it now works when multiple file upload controls have identical names
- it survives request.FILES.copy() (file objects don't get copied so it now loads content before copying)
by , 19 years ago
Attachment: | 1484.m-r.5.diff added |
---|
Patch 5 for magic-removal: updated for line count
by , 19 years ago
Attachment: | 1484.m-r.6.diff added |
---|
Patch 5 for magic-removal: corrected settings import
comment:8 by , 19 years ago
I just converted my app to magic-removal (that was most certainly loads of fun!) and found out that I messed up this patch with direct import of constant from settings. Attached is new version that actually works (now tested for real).
Sorry...
comment:9 by , 18 years ago
Version: | → SVN |
---|
Made FieldStorage read in chuncks of 64k, still needs fixing for related objects in MultiValueDict.
Error trace if you are intrested.
{{
Traceback (most recent call last):
File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response
- response = callback(request, *callback_args, callback_kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin
- return view_func(request, *args, kwargs)
File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func
- response = view_func(request, *args, kwargs)
File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage
- errors = manipulator.get_validation_errors(new_data)
File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors
- errors.update(field.get_validation_errors(new_data))
File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in get_validation_errors
- self.run_validator(new_data, validator)
File "/opt/apache/htdocs/python/bigfile/django/forms/init.py" in run_validator
- if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'):
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get
- val = self[key]
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in getitem
- raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self)
File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in repr
- return "<MultiValueDict: %s>" % dict.repr(self)
MemoryError at /admin/storfil/filelist/add/
}}
Tested with a 500meg file.
comment:10 by , 18 years ago
Traceback (most recent call last): File "/opt/apache/htdocs/python/bigfile/django/core/handlers/base.py" in get_response 74. response = callback(request, *callback_args, **callback_kwargs) File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/decorators.py" in _checklogin 54. return view_func(request, *args, **kwargs) File "/opt/apache/htdocs/python/bigfile/django/views/decorators/cache.py" in _wrapped_view_func 40. response = view_func(request, *args, **kwargs) File "/opt/apache/htdocs/python/bigfile/django/contrib/admin/views/main.py" in add_stage 253. errors = manipulator.get_validation_errors(new_data) File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors 58. errors.update(field.get_validation_errors(new_data)) File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in get_validation_errors 351. self.run_validator(new_data, validator) File "/opt/apache/htdocs/python/bigfile/django/forms/__init__.py" in run_validator 337. if self.is_required or new_data.get(self.field_name, False) or hasattr(validator, 'always_test'): File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in get 137. val = self[key] File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __getitem__ 114. raise MultiValueDictKeyError, "Key %r not found in %r" % (key, self) File "/opt/apache/htdocs/python/bigfile/django/utils/datastructures.py" in __repr__ 104. return "<MultiValueDict: %s>" % dict.__repr__(self) MemoryError at /admin/storfil/filelist/add/
comment:11 by , 18 years ago
3013-streaming-file_improved.diff
Reading bouandries done using 64kb chunks in parse_file_uploads to avoid reading a big file on one.
Validation on filefields done using size of temporary file.
File being moved in _save_FIELD_file instead of written from memory.
Removed settings since content is still accessible as new_data[field_name]content
Code cleanup still needed, will do this if the patch is acceptable.
comment:12 by , 18 years ago
In fact I'm against changing the whole interface of working with file content to file-like objects because it's not backwards compatible. Or at least not in this ticket since it decreases its chances to be commited.
comment:13 by , 18 years ago
The problem is you need all those changes I did to be able to upload a large file without a memoryerror, without them this patch has no purpose. But I could rework it to use the setting.
comment:14 by , 18 years ago
Sure, it has purpose! I use it every day since I've written it :-). Your changes are required only if you use manipulators to validate file contents. But it's not required in Django. And in some cases you just don't need a manipulator.
In other words this bug was intended to create a possibility to store uploaded files on disk. But not to convert automatic manipulators, admin and validation to this approach. This is better to have in another ticket because it's easier to review.
comment:16 by , 18 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
#2070 superseeds this ticket.
I'm starting to implement it for my project. I'll submit a patch if all goes well.
BTW, was there a special reason to parse incoming POST data with email package? cgi.FieldStorage seems already smart enough to put binary contents in temp files. I plan to use it.