#25986 closed Bug (fixed)
Django crashes on unicode characters in the local part of an e-mail address
Reported by: | Sergei Maertens | Owned by: | Sergei Maertens |
---|---|---|---|
Component: | Core (Mail) | Version: | 1.9 |
Severity: | Normal | Keywords: | |
Cc: | george@…, martin.pajuste@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
With Python 3.5 and Django 1.9, I'm running into trouble with internationalized e-mail addresses. According to RFC 6532 (https://tools.ietf.org/html/rfc6532.html), it is possible to have Unicode characters in an e-mail address, and this RFC *should* be supported in Python 3.5 (https://docs.python.org/3/whatsnew/3.5.html#email). Steps to reproduce are at the bottom of the ticket.
Now, when processing e-mail addresses in various places (such as `django.core.mail.message.sanitize_address`), Django calls the stdlib `email.utils.formataddr`, which doesn't seem to respect this RFC, not even if you explicitly set `EmailPolicy.utf8` (see https://docs.python.org/3/library/email.policy.html#email.policy.EmailPolicy.utf8), as I see no reference to it in the source code.
This stdlib function blatantly calls `address.encode('ascii')`. Luckily, it's quite short (about 40 lines of code, 12 of which are docstring), so I would suggest rolling 'our own' `formataddr` function for the time being. I'll bring this issue up on the Python bug tracker as well.
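A minimal reproduction without Django, using only the stdlib `email.utils.formataddr` (the names and addresses are sample values). It shows that a non-ASCII display name is accepted and encoded, while a non-ASCII address is rejected by the `address.encode('ascii')` call:

```python
from email.utils import formataddr

# A non-ASCII display name is accepted: formataddr RFC 2047-encodes it.
display_ok = formataddr(("López", "jlopez@example.com"))
print(display_ok)  # an "=?utf-8?...?=" encoded-word followed by " <jlopez@example.com>"

# A non-ASCII address is rejected: formataddr calls address.encode('ascii') first.
try:
    formataddr(("dummy", "juan.lópez@abc.com"))
    address_error = None
except UnicodeEncodeError as exc:
    address_error = exc

print(type(address_error).__name__)  # UnicodeEncodeError
```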
Steps to reproduce
Basically shell output:
```
mkdir bug_email
cd bug_email
mkdir bug_email -p python3.5
(bug_email)➜ bug_email python --version
Python 3.5.1
(bug_email)➜ bug_email pip install Django==1.9
Collecting Django==1.9
  Using cached Django-1.9-py2.py3-none-any.whl
Installing collected packages: Django
Successfully installed Django-1.9
(bug_email)➜ bug_email django-admin.py startproject bug_email .
# regular shell is enough to test
(bug_email)➜ bug_email ./manage.py shell
```
Shell session
```
>>> from django.core.mail.message import sanitize_address
>>> sanitize_address(('dummy', u'juan.lópez@abc.com'), 'utf8')
Traceback (most recent call last):
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 69, in handle
    self.run_shell(shell=options['interface'])
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 61, in run_shell
    raise ImportError
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/mail/message.py", line 118, in sanitize_address
    return formataddr((nm, addr))
  File "/usr/lib64/python3.5/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 6: ordinal not in range(128)
>>>
>>> sanitize_address(('dummy', u'juan.lópez@abc.com'), 'idna')
Traceback (most recent call last):
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 69, in handle
    self.run_shell(shell=options['interface'])
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 61, in run_shell
    raise ImportError
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/mail/message.py", line 118, in sanitize_address
    return formataddr((nm, addr))
  File "/usr/lib64/python3.5/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 6: ordinal not in range(128)
>>>
```
Change History (14)
comment:1 by , 9 years ago
Triage Stage: Unreviewed → Accepted
comment:3 by , 9 years ago
The upstream ticket has been closed, with the following comment:
"`formataddr` is part of the legacy interface and has no knowledge of the current policy. So it doesn't support RFC 6532. For that you need to use the new API: just assign your address to the appropriate field, or create a `headerregistry.Address` object."
Looks like this will have to be solved on Django's end after all. I'm not too familiar with the e-mail library yet, but I could look into it when I get some more spare time.
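As a rough sketch of what the upstream comment suggests (sample address values; `email.policy.SMTPUTF8` is the stdlib SMTP policy with `utf8=True`), a `headerregistry.Address` assigned to a header of an `EmailMessage` keeps the local part as raw UTF-8 instead of raising. Such a message can only be delivered over an SMTP connection that supports the SMTPUTF8 extension:

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.policy import SMTPUTF8

# Build the address with the new API instead of formataddr().
to_addr = Address(display_name="dummy", username="juan.lópez", domain="abc.com")

# SMTPUTF8 is the SMTP policy with utf8=True (RFC 6532 style headers).
msg = EmailMessage(policy=SMTPUTF8)
msg["To"] = to_addr

print(str(msg["To"]))  # dummy <juan.lópez@abc.com>

# Serialized with the message's own policy: the local part stays UTF-8.
wire = msg.as_bytes()
print("juan.lópez@abc.com".encode("utf-8") in wire)  # True
```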
comment:4 by , 9 years ago
Cc: added
comment:5 by , 9 years ago
Cc: added
comment:6 by , 9 years ago
Owner: changed
Status: new → assigned
comment:7 by , 9 years ago
Has patch: set
As I was implementing this, I came across some extra information:
- RFC 6532 is a red herring. The actual issue was that the Unicode characters were not properly MIME word-encoded. Supporting that RFC requires an extra mail-server plugin to be enabled, so I took the safe route for Django, where that's not needed.
- There are major differences between Python 2 and 3. This is the case in:
  - the FakeSMTPServer in the test cases, where the `mailfrom` variable might not be MIME-word-encoded all the way;
  - the usage of `Header(<string>, <encoding>)`, where the `str` representation calls the `encode` method on Python 2, while on Python 3 it simply returns the initial `<string>` that was passed in.
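To illustrate the Python 3 side of the `Header` difference (sample strings only; the Python 2 behaviour is not shown), `str()` returns the original text, while `encode()` produces the MIME encoded-word:

```python
from email.header import Header

h = Header("juan.lópez", "utf-8")

# On Python 3, str() hands back the text that was passed in, unencoded.
print(str(h))  # juan.lópez

# encode() produces the RFC 2047 encoded-word form
# (base64 or quoted-printable, whichever is shorter for this input).
print(h.encode())

# encode() also wraps pure-ASCII input in an encoded-word.
print(Header("to", "utf-8").encode())  # =?utf-8?q?to?=
```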
Potential issue:
Because of the difference in `str` representation, the code has been altered to always call the `encode` method. This causes simple ASCII local parts to look garbled; for instance, `to@example.com` becomes `=?utf-8?q?to?=@example.com`. When Django users test the e-mail messages generated by Django, their tests may fail because they don't expect the encoded version. I have not personally confirmed this yet, though.
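One way to avoid that garbling is to check for ASCII before encoding. The sketch below is a hypothetical helper (`encode_localpart` is an illustrative name, not Django's API) that MIME word-encodes only a non-ASCII local part and IDNA-encodes the domain. Keep in mind that, as pointed out later in this ticket, RFC 2047 forbids encoded-words inside an addr-spec, so the encoded non-ASCII addresses are unlikely to be deliverable:

```python
from email.header import Header


def encode_localpart(addr: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: encode a non-ASCII local part, leave ASCII alone."""
    localpart, sep, domain = addr.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in {addr!r}")
    try:
        localpart.encode("ascii")
    except UnicodeEncodeError:
        # Only a non-ASCII local part becomes an RFC 2047 encoded-word.
        localpart = Header(localpart, charset).encode()
    # Non-ASCII domains use IDNA (punycode) rather than RFC 2047.
    domain = domain.encode("idna").decode("ascii")
    return f"{localpart}@{domain}"


print(encode_localpart("to@example.com"))      # to@example.com
print(encode_localpart("juan.lópez@abc.com"))  # an "=?utf-8?...?=" word, then "@abc.com"
```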
comment:8 by , 9 years ago
Summary: RFC 6532 support for e-mail → Django crashes on unicode characters in the local part of an e-mail address
comment:10 by , 9 years ago
Patch needs improvement: unset
Remarks were processed, up for review again (if the build succeeds).
comment:12 by , 5 months ago
Does anyone know of an email server or service where this sort of non-ASCII mailbox works?
Sorry to unearth this ancient ticket. I'm hoping one of the original participants or cc's might have more info.
This ticket attempted to add support for non-ASCII characters in the local part (username) of an email address: e.g., `j.lópez@example.com`. But the implementation used an RFC 2047 encoded-word (`=?utf-8?b?ai5sw7NwZXo=?=@example.com`). RFC 2047 specifically prohibits that in an addr-spec (email address). And I've been unable to find any email systems that actually support this behavior. At best, the message bounces, and sometimes it just disappears without a trace.
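For reference, decoding the encoded-word from that example with the stdlib `email.header.decode_header` shows that it simply wraps the UTF-8 bytes of the local part `j.lópez` in base64:

```python
from email.header import decode_header

word = "=?utf-8?b?ai5sw7NwZXo=?="

# decode_header returns (bytes, charset) pairs for encoded-words.
[(raw, charset)] = decode_header(word)
print(raw, charset)         # b'j.l\xc3\xb3pez' utf-8
print(raw.decode(charset))  # j.lópez
```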
If anyone has an example where RFC 2047 encoded-word email addresses work, I'll put some effort into preserving the current behavior as part of updating to Python's modern email API (#35581). Or if nobody is actually using this successfully, I'm going to treat it as a bug.
More info:
- RFC 2047 section 5(3) specifies that: "An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'." (And I can't find any later RFC that removes this restriction.)
- I can't find any email software or services that support RFC 2047 in an email localpart. The closest is Exim, which can convert a localpart to/from an A-Label (an RFC 5891 IDNA encoding, a.k.a. "punycode": `j.xn--lpez-qqa@example.com`). But that's a different encoding, and this behavior seems to be unique to Exim.
- From what I can tell, the only spec compliant way to use a non-ASCII localpart involves keeping utf-8 un-encoded in the headers, and then sending with the SMTPUTF8 extension (RFC 6530 / RFC 6531 / RFC 6532). There doesn't seem to be any spec that allows a non-ASCII localpart with 7-bit header encoding.
- The Python behavior seems to be a bug. Note that the same Python code also converts a non-ASCII domain name to an RFC 2047 encoded-word, which is definitely not deliverable: `juan@lópez.example.mx` → `juan@=?utf-8?q?l=C3=B3pez?=.example.mx`. This is an acknowledged bug in Python's email package.
(Incidentally, RFC 2047 encoded words are allowed in a display-name: `J. López <jlopez@example.com>` → `J. =?utf-8?q?L=C3=B3pez?= <jlopez@example.com>`. That worked before and after this ticket, and won't be changed by #35581.)
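The distinction drawn in the list above can be sketched as a small hypothetical check (`ascii_transport_form` is an illustrative name, not an existing API): a non-ASCII domain has a 7-bit form via IDNA A-labels, while a non-ASCII local part has none and would need SMTPUTF8. It relies on Python's built-in `idna` codec, which implements IDNA 2003 rather than the RFC 5891 (IDNA 2008) rules, so treat it as an approximation:

```python
from typing import Optional


def ascii_transport_form(addr: str) -> Optional[str]:
    """Hypothetical check: return a 7-bit form of addr, or None if only SMTPUTF8 can carry it."""
    localpart, sep, domain = addr.rpartition("@")
    if not sep:
        raise ValueError(f"no '@' in {addr!r}")
    try:
        localpart.encode("ascii")
    except UnicodeEncodeError:
        # No spec allows a non-ASCII local part in 7-bit headers.
        return None
    # A non-ASCII domain can be converted to A-labels ("punycode").
    return f"{localpart}@{domain.encode('idna').decode('ascii')}"


print(ascii_transport_form("juan@lópez.example.mx"))  # juan@xn--lpez-qqa.example.mx
print(ascii_transport_form("j.lópez@example.com"))    # None
```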
comment:13 by , 5 months ago
It's been a long time, but I *think* this had to do with the test suite of a third-party package (I want to say django-yubin?) that started failing because of the Python 2 -> 3 differences in behaviour, and that made me dive into it. I've never actually come across such an email address. For a recent project, I looked up whether non-ASCII characters are possible in the address part and reported back that no additional work was needed, because they shouldn't be allowed in the first place.
Likely I assumed at the time that the "new" Python behaviour was correct.
comment:14 by , 5 months ago
Thanks, Sergei, that's very helpful. I'll plan to remove this behavior.
Reporting the issue in Python would be the first step. Could you please post the ticket number here as soon as it's done?