#5778 closed (fixed)
Email subjects not encoded properly
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Keywords: | ||
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When providing a UTF-8 encoded subject to the EmailMessage class constructor, the subject is sent directly in UTF-8, without being encoded in Quoted-Printable or Base64. When the recipient's MUA is running on a machine using UTF-8, it "works", but for recipients whose machines use ISO-8859-x or another non-UTF-8 charset, the subject appears broken.
I think the problem comes from the implementation of the __setitem__ method of the SafeMIMEText class. It only uses the Header() class when str(force_unicode(val)) raises an exception, which it doesn't do in my case (I suppose because my subject is properly UTF-8 encoded). However, I'd say it should *always* use Header(), which properly turns a UTF-8 string into a quoted-printable string.
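For illustration, here is a minimal sketch of what Header() does with a non-ASCII subject. It is written for Python 3 (the ticket itself concerns Python 2), and the subject text and variable names are only examples, not code from Django.

```python
from email.header import Header, decode_header, make_header

# Example subject containing non-ASCII characters (illustrative only).
subject = "Et ça marche bien é ?"

# Header.encode() returns an RFC 2047 "encoded word": pure ASCII text
# that names the charset, so the MUA does not have to guess it.
encoded = Header(subject, "utf-8").encode()

# A mail client reverses the process like this.
decoded = str(make_header(decode_header(encoded)))

print(encoded)
print(decoded)
```

The encoded form is safe to put on the wire regardless of the recipient's local charset, which is the property the raw UTF-8 subject lacks.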
I'm running Django trunk at r6526.
Don't hesitate to ask for further details if needed.
Attachments (1)
Change History (8)
by , 17 years ago
Attachment: | django-mail-encoding-header-fix added |
---|
comment:1 by , 17 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Yes. Good catch.
I'll have to check the behaviour of Header. This might need some tweaking from memory. The point is that when I was writing the current code, there were times when headers were being pointlessly encoded even when they could be represented directly (particularly ASCII text). So, providing it doesn't try to wrap ASCII up in anything fancy, this is the right fix. Otherwise, we need to check that the data really is non-ASCII before making a Header() out of it.
comment:2 by , 17 years ago
You're right, it does some pointless encoding when the string is pure ASCII, for example:
From: =?utf-8?q?Trivialibre?= <trivialibre@…>
In that case, the =?utf-8?q? stuff is useless.
So, instead of verifying whether the string is Unicode (with force_unicode), the code should probably check whether the string is 7-bit ASCII only. Do you want me to provide an improved fix, or are you going to do it?
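A rough sketch of that check, in Python 3, might look like the following. The helper name `header_value` is hypothetical and this is not the patch attached to the ticket; it only shows the idea of testing against the ASCII codec explicitly and using Header() for everything else.

```python
from email.header import Header


def header_value(val, charset="utf-8"):
    """Hypothetical helper: leave 7-bit ASCII alone, RFC 2047 encode the rest."""
    try:
        # Name the codec explicitly instead of relying on any default encoding.
        val.encode("ascii")
    except UnicodeEncodeError:
        return Header(val, charset).encode()
    return val


plain = header_value("Trivialibre")   # pure ASCII, returned unchanged
fancy = header_value("Ångström")      # non-ASCII, wrapped in an encoded word
print(plain)
print(fancy)
```

With this approach a pure ASCII value such as "Trivialibre" is not wrapped in =?utf-8?q?...?=, while anything outside ASCII still gets encoded.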
comment:3 by , 17 years ago
I was thinking about this some more and I'm not sure I completely understand the problem any longer. force_unicode() forces the input to a unicode object and str() will raise an error for any data that isn't ASCII.
So, for example:
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))  # A UTF-8 bytestring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)
The only way I could see this failing is if somebody had changed Python's default encoding, which is well advertised as being something that shouldn't be done, for exactly this sort of reason.
What is an example of a header string that is causing the problem? And what does sys.getdefaultencoding() return?
comment:4 by , 17 years ago
From a raw Python shell, with PYTHONPATH=/path/to/django:
thomas@toulibre:/srv/www/trivialibre.humanoidz.org$ python
Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.getdefaultencoding()
ascii
>>> from django.utils.encoding import force_unicode
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)
>>> str(force_unicode('Nouvelle question "Et ça marche bien é ?"'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 22: ordinal not in range(128)
So here, it works properly. Now, from a Python shell run using "manage.py shell", still with PYTHONPATH=/path/to/django/:
thomas@toulibre:/srv/www/trivialibre.humanoidz.org$ ./trivialibre/tvl/manage.py shell
Python 2.4.4 (#2, Apr 5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import sys
>>> print sys.getdefaultencoding()
utf-8
>>> from django.utils.encoding import force_unicode
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))
'\xc3\x85ngstr\xc3\xb6m'
>>> str(force_unicode('Nouvelle question "Et ça marche bien é ?"'))
'Nouvelle question "Et \xc3\xa7a marche bien \xc3\xa9 ?"'
>>>
The second string tested above is the one I was using for my tests. But yours also perfectly shows the problem.
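The difference between the two sessions can be mimicked in Python 3 with the sketch below. In Python 2, str() on a unicode object encodes with the interpreter's default encoding; the hypothetical function `survives_implicit_str` imitates that by taking the default encoding as a parameter. It is an illustration only, not Django code.

```python
def survives_implicit_str(text, default_encoding):
    """Mimic Python 2's str(unicode_obj), which encodes with the default encoding.

    Returns True when the conversion succeeds, i.e. when the old check would
    NOT fall back to Header().
    """
    try:
        text.encode(default_encoding)
    except UnicodeEncodeError:
        return False
    return True


subject = "\u00c5ngstr\u00f6m"  # the sample string from comment 3, as text

# Default encoding 'ascii': the conversion fails, so Header() would be used.
with_ascii_default = survives_implicit_str(subject, "ascii")

# Default encoding 'utf-8': the conversion succeeds, so the raw UTF-8 bytes
# would go straight into the header.
with_utf8_default = survives_implicit_str(subject, "utf-8")

print(with_ascii_default, with_utf8_default)
```

This is why a check that depends on the default encoding behaves differently in the plain interpreter and in this "manage.py shell" session.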
comment:5 by , 17 years ago
Oh, that's tricky and not very nice behaviour. :-(
Okay, now I'm convinced. Will fix it in a minute.
comment:6 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Ugly patch that fixes the problem for me