#11030 closed Uncategorized (wontfix)
File uploads break on non-English filesystem encoding
Reported by: | Honza Král | Owned by: | nobody |
---|---|---|---|
Component: | File uploads/storage | Version: | 1.2 |
Severity: | Normal | Keywords: | file path encoding |
Cc: | david.danier@…, lists@… | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The tests produce:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 43-44: character maps to <undefined>
The fix just converts file paths to bytestrings.
Attachments (1)
Change History (9)
by , 16 years ago
Attachment: | 11030-against-trunk-10686.diff added |
---|
comment:1 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:2 by , 15 years ago
(In [12661]) Fixed #11030: Reverted a change that assumed the file system encoding was utf8, and changed a test to demonstrate how that assumption corrupted uploaded non-ASCII file names on systems that don't use utf8 as their file system encoding (Windows for one, specifically). Thanks for the report to vrehak.
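The corruption described in this changeset can be reproduced without Django. The following is a minimal Python 3 sketch, assuming a file system whose encoding is cp1252 (picked here only as an example of a non-utf8 encoding; the changeset does not name one). The name is encoded as utf8, and the file system then interprets those bytes in its own encoding.

```python
# Python 3 sketch; the cp1252 file system encoding is an assumption for illustration.
name = "Andr\xe9.jpg"  # "André.jpg"

# What code that assumes utf8 would hand to the file system.
utf8_bytes = name.encode("utf-8")

# How a cp1252 file system would interpret those same bytes.
seen_by_fs = utf8_bytes.decode("cp1252")

# ascii() keeps the output printable even on a terminal that is not utf8.
print(ascii(utf8_bytes))
print(ascii(seen_by_fs))
```

The stored name no longer matches the uploaded one, and nothing raises an error along the way.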
comment:3 by , 15 years ago
comment:4 by , 14 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Version: | SVN → 1.2 |
This change broke my code when upgrading from 1.1.1 to 1.2.1, and this was not listed in the documentation…
If I upload a file with the name "André.jpg", these are the different results for django.core.files.storage.FileSystemStorage.path():
input - '/servers/staging/sapoopenid/media' u'avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg'
1.1.1 - '/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xc3\xa9.jpg'
1.2.1 - u'/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg'
Then os.path.exists() is called on the result:
1.1.1
os.path.exists('/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xc3\xa9.jpg')
True
1.2.1
os.path.exists(u'/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/servers/python/lib/python2.6/genericpath.py", line 18, in exists
    st = os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128)
This causes all sorts of trouble. The storage can't even delete files with names like these because exists() is called on them...
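The traceback above suggests that the unicode path is being encoded with the ASCII codec before it reaches os.stat(). A minimal sketch of that step in isolation follows, written for Python 3 (the report is on Python 2.6, where the encoding happens implicitly). The variable name encode_error is only for illustration.

```python
path = "Andr\xe9.jpg"

encode_error = None
try:
    # Explicitly perform the encoding that the traceback shows failing.
    path.encode("ascii")
except UnicodeEncodeError as exc:
    encode_error = exc

print(encode_error)
```

The position differs from the one in the traceback because this path has no directory prefix.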
comment:5 by , 14 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
The correct fix here is to set up the environment for your running code so that unicode can be passed to the file system functions. Django assuming utf-8 is just wrong; some file systems do not use that encoding and thus Django assuming that encoding for uploaded files quietly corrupts file names on those systems. That's worse than a loud error. There is some doc on how to set up the environment for Apache here: http://docs.djangoproject.com/en/dev/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror. That doc does belong in a more prominent place, not buried in a section on a deployment method that is no longer the recommended one, but fixing the doc should be the subject of a different ticket.
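To see what environment the running code actually has, a small diagnostic can help. This is a sketch only, written for Python 3. The function name describe_encoding_environment is hypothetical and not part of Django. It reports the values that commonly decide how unicode paths are encoded.

```python
import locale
import os
import sys


def describe_encoding_environment():
    """Collect settings that influence how unicode paths are encoded (hypothetical helper)."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


for key, value in describe_encoding_environment().items():
    print(key, "=", value)
```

Running this inside the server process, rather than in an interactive shell, matters because the two often have different environments.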
comment:6 by , 14 years ago
Thanks for the info, I've already done the encoding setup now. This was more of a heads-up for people with similar problems, since no mention of this is made in the release notes and upgrading to 1.2 broke running code.
As you said, this info should be somewhere where it gets more attention since it's not even specific to mod_python (I use mod_wsgi).
comment:7 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
Perhaps the low-level-storage API could use smart_str(filename, encoding=sys.getfilesystemencoding()) to solve this issue without having to modify the environment? Anyways I'd love to see the docs somewhere more prominent.
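A rough sketch of the idea in plain Python 3, without smart_str: encode the name with whatever sys.getfilesystemencoding() reports. The helper name encode_for_filesystem and its encoding override are hypothetical, and this is not how Django's storage API behaves.

```python
import sys


def encode_for_filesystem(filename, encoding=None):
    """Encode a text filename with the file system encoding (hypothetical helper)."""
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return filename.encode(encoding)


# The encoding is passed explicitly here so the result does not depend on the host.
print(encode_for_filesystem("Andr\xe9.jpg", "utf-8"))
print(encode_for_filesystem("Andr\xe9.jpg", "cp1252"))
```

If the reported file system encoding cannot represent the name (ASCII, for example), the call still raises UnicodeEncodeError, so this approach would not by itself remove the need for a correctly configured environment.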
comment:8 by , 13 years ago
Cc: | added |
---|
(In [10695]) [1.0.X] Fixed #11030: fixed file uploads on non-utf8 filesystem encoding. Thanks, Honza Kral. Backport of [10693] from trunk.