Opened 9 years ago

Closed 8 years ago

Last modified 2 months ago

#25986 closed Bug (fixed)

Django crashes on unicode characters in the local part of an e-mail address

Reported by: Sergei Maertens
Owned by: Sergei Maertens
Component: Core (Mail)
Version: 1.9
Severity: Normal
Keywords:
Cc: george@…, martin.pajuste@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

With Python 3.5 and Django 1.9 I'm running into trouble with internationalized e-mail addresses. According to RFC 6532 it is possible to have unicode characters in the e-mail address: https://tools.ietf.org/html/rfc6532.html and this RFC *should* be supported in Python 3.5: https://docs.python.org/3/whatsnew/3.5.html#email. Steps to reproduce are at the bottom of the ticket.

Now, for validating e-mail addresses in various places, Django calls the stdlib email.utils.formataddr, which doesn't seem to respect this RFC, not even if you explicitly set EmailPolicy.utf8 (see https://docs.python.org/3/library/email.policy.html#email.policy.EmailPolicy.utf8), as I see no reference to that policy in the source code.

This function in the stdlib blatantly calls address.encode('ascii'). Luckily, it's quite short, and I would suggest rolling 'our own' formataddr function (for the time being). I'll bring this issue up on the Python bug tracker as well. I think it's possible, as it's a relatively simple function of only 40 LoC, 12 of which are docstring.

Steps to reproduce
Basically shell output:

mkdir bug_email
cd bug_email
mkvirtualenv bug_email -p python3.5
(bug_email)  bug_email  python --version
Python 3.5.1
(bug_email)  bug_email  pip install Django==1.9
Collecting Django==1.9
  Using cached Django-1.9-py2.py3-none-any.whl
Installing collected packages: Django
Successfully installed Django-1.9
(bug_email)  bug_email  django-admin.py startproject bug_email .

# regular shell is enough to test
(bug_email)  bug_email  ./manage.py shell

Shell session

>>> from django.core.mail.message import sanitize_address
>>> sanitize_address(('dummy', u'juan.lópez@abc.com'), 'utf8')
Traceback (most recent call last):
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 69, in handle
    self.run_shell(shell=options['interface'])
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 61, in run_shell
    raise ImportError
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/mail/message.py", line 118, in sanitize_address
    return formataddr((nm, addr))
  File "/usr/lib64/python3.5/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 6: ordinal not in range(128)
>>> 
>>> sanitize_address(('dummy', u'juan.lópez@abc.com'), 'idna')
Traceback (most recent call last):
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 69, in handle
    self.run_shell(shell=options['interface'])
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 61, in run_shell
    raise ImportError
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/mail/message.py", line 118, in sanitize_address
    return formataddr((nm, addr))
  File "/usr/lib64/python3.5/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 6: ordinal not in range(128)
>>> 
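
The failure above can be reproduced without Django. The following is a minimal sketch, assuming only the standard library; the helper name try_formataddr is hypothetical and exists only for illustration. It shows that email.utils.formataddr accepts an ASCII address but raises for a non-ASCII local part.

```python
from email.utils import formataddr


def try_formataddr(name, address):
    """Return the formatted address, or the UnicodeEncodeError that was raised.

    Hypothetical helper for illustration; not part of Django or the stdlib.
    """
    try:
        return formataddr((name, address))
    except UnicodeEncodeError as exc:
        # formataddr() requires the address itself to be pure ASCII.
        return exc


ascii_result = try_formataddr("dummy", "juan.lopez@abc.com")
unicode_result = try_formataddr("dummy", "juan.lópez@abc.com")

print(ascii_result)
print(type(unicode_result).__name__)
```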

Change History (14)

comment:1 by Claude Paroz, 9 years ago

Triage Stage: Unreviewed → Accepted

Reporting the issue in Python would be the first step. Could you please report here the ticket number as soon as it is done?

comment:2 by Sergei Maertens, 9 years ago

Python bug tracker issue: http://bugs.python.org/issue25955

comment:3 by Sergei Maertens, 9 years ago

The upstream ticket has been closed, with the following comment:

formataddr is part of the legacy interface and has no knowledge of the current policy. So it doesn't support RFC 6532. For that you need to use the new API: just assign your address to the appropriate field, or create a headerregistry.Address object.

I'm in the process of rewriting the docs to make all of this clear, but, well, I'm slow...

Looks like this will have to be solved on Django's end after all. I'm not too familiar with the e-mail library yet, but I could look into it when I get some more spare time.
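
For reference, the "new API" route suggested upstream might look roughly like the sketch below. It is not the patch that was eventually merged into Django; the helper name build_message is hypothetical, and the choice of policy.SMTPUTF8 is an assumption made here so that the header keeps raw UTF-8 as described in RFC 6532.

```python
from email import policy
from email.headerregistry import Address
from email.message import EmailMessage


def build_message(display_name, username, domain):
    """Build a message whose To header comes from an Address object.

    Hypothetical helper for illustration only.
    """
    # policy.SMTPUTF8 sets utf8=True, so non-ASCII header text is kept
    # as raw UTF-8 instead of being converted to encoded-words.
    msg = EmailMessage(policy=policy.SMTPUTF8)
    msg["To"] = Address(display_name=display_name, username=username, domain=domain)
    return msg


msg = build_message("dummy", "juan.lópez", "abc.com")
to_header = str(msg["To"])
raw = msg.as_bytes()

print(to_header)
```

Serializing this way only produces the message; actually delivering it would still depend on a mail server that supports the SMTPUTF8 extension.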


comment:4 by George Marshall, 9 years ago

Cc: george@… added

comment:5 by Martin Pajuste, 9 years ago

Cc: martin.pajuste@… added

comment:6 by Sergei Maertens, 8 years ago

Owner: changed from nobody to Sergei Maertens
Status: new → assigned

comment:7 by Sergei Maertens, 8 years ago

Has patch: set

As I was implementing this, some extra information:

  • The RFC 6532 is a red herring. The actual issue was that the unicode characters were not properly MIME word-encoded. To support said RFC, an extra mail-server plugin must be enabled, so I went the safe way for Django where that's not needed.
  • There are major differences between Python 2 and 3. This is the case in:
    • the FakeSMTPServer in the testcases, where the mailfrom variable might not be MIME-word-encoded all the way.
    • the usage of Header(<string>, <encoding>), where the str representation calls the encode method on Python 2 and on Python 3 it simply returns the initial <string> that was passed in.

Potential issue:
Because of the difference in str representation, the code has been altered to always call the encode method. This causes simple ASCII local parts to look garbled, for instance: to@example.com becomes =?utf-8?q?to?=@example.com. When Django users test the e-mail messages generated by Django, they may have failing tests because they're not expecting the encoded version. I have not personally confirmed this yet though.
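
The str-versus-encode difference described above can be seen with a short sketch on Python 3, using only the stdlib email.header.Header class. The variable names are illustrative.

```python
from email.header import Header

local_part = Header("to", "utf-8")

# On Python 3, str() returns the text that was passed in.
as_str = str(local_part)

# encode() always produces an RFC 2047 encoded-word for a utf-8 charset,
# even when the text is plain ASCII.
as_encoded = local_part.encode()

print(as_str + "@example.com")
print(as_encoded + "@example.com")
```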

comment:8 by Sergei Maertens, 8 years ago

Summary: RFC 6532 support for e-mail → Django crashes on unicode characters in the local part of an e-mail address

comment:9 by Tim Graham, 8 years ago

Patch needs improvement: set

Left comments for improvement on the PR.

comment:10 by Sergei Maertens, 8 years ago

Patch needs improvement: unset

Remarks were processed, up for review again (if the build succeeds).

comment:11 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assigned → closed

In ec009ef1:

Fixed #25986 -- Fixed crash sending email with non-ASCII in local part of the address.

On Python 3, sending emails failed for addresses containing non-ASCII
characters due to the usage of the legacy Python email.utils.formataddr()
function. This is fixed by using the proper Address object on Python 3.

comment:12 by Mike Edmunds, 2 months ago

Does anyone know of an email server or service where this sort of non-ASCII mailbox works?

Sorry to unearth this ancient ticket. I'm hoping one of the original participants or cc's might have more info.

This ticket attempted to add support for non-ASCII characters in the local part (username) of an email address: e.g., j.lópez@example.com . But the implementation used an RFC 2047 encoded-word (=?utf-8?b?ai5sw7NwZXo=?=@example.com). RFC 2047 specifically prohibits that in an addr-spec (email address). And I've been unable to find any email systems that actually support this behavior. At best, the message bounces—and sometimes it just disappears without a trace.

If anyone has an example where RFC 2047 encoded-word email addresses work, I'll put some effort into preserving the current behavior as part of updating to Python's modern email API (#35581). Or if nobody is actually using this successfully, I'm going to treat it as a bug.

More info:

  • RFC 2047 section 5(3) specifies that: "An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'." (And I can't find any later RFC that removes this restriction.)
  • I can't find any email software or services that support RFC 2047 in an email localpart. The closest is Exim, which can convert a localpart to/from an A-Label (an RFC 5891 IDNA encoding, a.k.a. "punycode": j.xn--lpez-qqa@example.com). But that's a different encoding, and this behavior seems to be unique to Exim.
  • From what I can tell, the only spec compliant way to use a non-ASCII localpart involves keeping utf-8 un-encoded in the headers, and then sending with the SMTPUTF8 extension (RFC 6530 / RFC 6531 / RFC 6532). There doesn't seem to be any spec that allows a non-ASCII localpart with 7-bit header encoding.
  • The Python behavior seems to be a bug. Note that the same Python code also converts a non-ASCII domain name to an RFC 2047 encoded-word, which is definitely not deliverable: juan@lópez.example.mx → juan@=?utf-8?q?l=C3=B3pez?=.example.mx. This is an acknowledged bug in Python's email package.

(Incidentally, RFC 2047 encoded words are allowed in a display-name: J. López <jlopez@example.com> → J. =?utf-8?q?L=C3=B3pez?= <jlopez@example.com>. That worked before and after this ticket, and won't be changed by #35581.)
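
As a rough illustration of the two encodings that are deliverable without SMTPUTF8, the sketch below converts a non-ASCII domain to A-labels and encodes a non-ASCII display-name. It is a sketch under assumptions: the helper name to_ascii_domain is hypothetical, and Python's built-in "idna" codec implements the older IDNA 2003 rules (RFC 3490), not RFC 5891. The legacy formataddr may also split the encoded-word differently from the modern API output quoted above.

```python
from email.utils import formataddr


def to_ascii_domain(domain):
    """Encode a domain name as ASCII A-labels.

    Hypothetical helper; uses the stdlib "idna" codec (IDNA 2003).
    """
    return domain.encode("idna").decode("ascii")


# Non-ASCII domain: each label is converted to its "xn--" form.
ascii_domain = to_ascii_domain("lópez.example.mx")

# Non-ASCII display-name with an ASCII addr-spec: only the name is
# turned into an RFC 2047 encoded-word; the address is left alone.
header_value = formataddr(("J. López", "jlopez@example.com"))

print("juan@" + ascii_domain)
print(header_value)
```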

comment:13 by Sergei Maertens, 2 months ago

It's been a long time, but I *think* this had to do with a test suite of a third party package (I want to say django-yubin?) that started failing because of the Python 2 -> 3 differences in behaviour, and that made me dive into it. I've never actually come across such an email, and for some project recently looked up the address-part-with-non-ascii-characters possibility and reported to them that no additional work is needed because it shouldn't be allowed in the first place.

Likely I assumed at the time that the "new" Python behaviour was correct.

comment:14 by Mike Edmunds, 2 months ago

Thanks, Sergei, that's very helpful. I'll plan to remove this behavior.
