Opened 9 years ago

Closed 8 years ago

Last modified 2 months ago

#25986 closed Bug (fixed)

Django crashes on unicode characters in the local part of an e-mail address

Reported by: Sergei Maertens Owned by: Sergei Maertens
Component: Core (Mail) Version: 1.9
Severity: Normal Keywords:
Cc: george@…, martin.pajuste@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With Python 3.5 and Django 1.9 I'm running into trouble with internationalized e-mail addresses. According to RFC 6532 it is possible to have unicode characters in the e-mail address: https://tools.ietf.org/html/rfc6532.html and this RFC *should* be supported in Python 3.5: https://docs.python.org/3/whatsnew/3.5.html#email. Steps to reproduce are at the bottom of the ticket.

Now, for validating e-mail addresses in various places, Django calls stdlib mail.formataddr, which doesn't seem to respect this RFC - not even if you explicitly set the EmailPolicy.utf8 (see https://docs.python.org/3/library/email.policy.html#email.policy.EmailPolicy.utf8) as I see no reference to that in the source code.

This function in the stdlib blatantly calls address.encode('ascii'). Luckily, it's quite short, and I would suggest rolling 'our own' formataddr function (for the time being). I'll bring this issue up on the Python bug tracker as well. I think it's possible, as it's a relative simple function of only 40 LoC, of which 12 lines docstring.

Steps to reproduce
Basically shell output:

mkdir bug_email
cd bug_email
mkdir bug_email -p python3.5
(bug_email)  bug_email  python --version
Python 3.5.1
(bug_email)  bug_email  pip install Django==1.9
Collecting Django==1.9
  Using cached Django-1.9-py2.py3-none-any.whl
Installing collected packages: Django
Successfully installed Django-1.9
(bug_email)  bug_email  django-admin.py startproject bug_email .

# regular shell is enough to test
(bug_email)  bug_email  ./manage.py shell

Shell session

>>> from django.core.mail.message import sanitize_address
>>> sanitize_address(('dummy', u'juan.lópez@abc.com'), 'utf8')
Traceback (most recent call last):
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 69, in handle
    self.run_shell(shell=options['interface'])
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 61, in run_shell
    raise ImportError
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/mail/message.py", line 118, in sanitize_address
    return formataddr((nm, addr))
  File "/usr/lib64/python3.5/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 6: ordinal not in range(128)
>>> 
>>> sanitize_address(('dummy', u'juan.lópez@abc.com'), 'idna')
Traceback (most recent call last):
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 69, in handle
    self.run_shell(shell=options['interface'])
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/management/commands/shell.py", line 61, in run_shell
    raise ImportError
ImportError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/bbt/coding/.virtualenvs/bug_email/lib/python3.5/site-packages/django/core/mail/message.py", line 118, in sanitize_address
    return formataddr((nm, addr))
  File "/usr/lib64/python3.5/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xf3' in position 6: ordinal not in range(128)
>>> 

Change History (14)

comment:1 by Claude Paroz, 9 years ago

Triage Stage: UnreviewedAccepted

Reporting the issue in Python would be the first step. Could you please report here the ticket number as soon as it is done?

comment:2 by Sergei Maertens, 9 years ago

Python bug tracker issue: http://bugs.python.org/issue25955

comment:3 by Sergei Maertens, 9 years ago

The upstream ticket has been closed, with the following comment:

formataddr is part of the legacy interface and has no knowledge of the current policy. So it doesn't support RFC 6532. For that you need to use the new API: just assign your address to the appropriate field, or create a headerregistry.Address object.

Looks like this will have to be solved on Django's end after all. I'm not too familiar with the e-mail library yet, but I could look into it when I get some more spare time.

Last edited 8 years ago by Tim Graham (previous) (diff)

comment:4 by George Marshall, 9 years ago

Cc: george@… added

comment:5 by Martin Pajuste, 9 years ago

Cc: martin.pajuste@… added

comment:6 by Sergei Maertens, 8 years ago

Owner: changed from nobody to Sergei Maertens
Status: newassigned

comment:7 by Sergei Maertens, 8 years ago

Has patch: set

As I was implementing this, some extra information:

  • The RFC 6532 is a red herring. The actual issue was that the unicode characters were not properly MIME word-encoded. To support said RFC, an extra mail-server plugin must be enabled, so I went the safe way for Django where that's not needed.
  • There are major differences between Python 2 and 3. This is the case in:
    • the FakeSMTPServer in the testcases, where the mailfrom variable might not be MIME-word-encoded all the way.
    • the usage of Header(<string>, <encoding), where the str representation calls the encode method on Python 2 and on Python 3 it simply returns the initial <string> that was passed in.

Potential issue:
Because of the difference in str representation, that has been altered to always call the encode method. This causes simple ascii local parts to look garbled, for instance: to@example.com becomes =?utf-8?q?to?=@example.com. When django users test the e-mail messages generated by Django, they may have failing tests because they're not expecting the encoded version. I have not personally confirmed this yet though.

comment:8 by Sergei Maertens, 8 years ago

Summary: RFC 6532 support for e-mailDjango crashes on unicode characters in the local part of an e-mail address

comment:9 by Tim Graham, 8 years ago

Patch needs improvement: set

Left comments for improvement on the PR.

comment:10 by Sergei Maertens, 8 years ago

Patch needs improvement: unset

Remarks were processed, up for review again (if the build succeeds).

comment:11 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assignedclosed

In ec009ef1:

Fixed #25986 -- Fixed crash sending email with non-ASCII in local part of the address.

On Python 3, sending emails failed for addresses containing non-ASCII
characters due to the usage of the legacy Python email.utils.formataddr()
function. This is fixed by using the proper Address object on Python 3.

comment:12 by Mike Edmunds, 2 months ago

Does anyone know of an email server or service where this sort of non-ASCII mailbox works?

Sorry to unearth this ancient ticket. I'm hoping one of the original participants or cc's might have more info.

This ticket attempted to add support for non-ASCII characters in the local part (username) of an email address: e.g., j.lópez@example.com . But the implementation used an RFC 2047 encoded-word (=?utf-8?b?ai5sw7NwZXo=?=@example.com). RFC 2047 specifically prohibits that in an addr-spec (email address). And I've been unable to find any email systems that actually support this behavior. At best, the message bounces—and sometimes it just disappears without a trace.

If anyone has an example where RFC 2047 encoded-word email addresses work, I'll put some effort into preserving the current behavior as part of updating to Python's modern email API (#35581). Or if nobody is actually using this successfully, I'm going to treat it as a bug.

More info:

  • RFC 2047 section 5(3) specifies that: "An 'encoded-word' MUST NOT appear in any portion of an 'addr-spec'." (And I can't find any later RFC that removes this restriction.)
  • I can't find any email software or services that support RFC 2047 in an email localpart. The closest is Exim, which can convert a localpart to/from an A-Label (an RFC 5891 IDNA encoding, a.k.a. "punycode": j.xn--lpez-qqa@example.com). But that's a different encoding, and this behavior seems to be unique to Exim.
  • From what I can tell, the only spec compliant way to use a non-ASCII localpart involves keeping utf-8 un-encoded in the headers, and then sending with the SMTPUTF8 extension (RFC 6530 / RFC 6531 / RFC 6532). There doesn't seem to be any spec that allows a non-ASCII localpart with 7-bit header encoding.
  • The Python behavior seems to be a bug. Note that the same Python code also converts a non-ASCII domain name to an RFC 2047 encoded-word, which is definitely not deliverable: juan@lópez.example.mxjuan@=?utf-8?q?l=C3=B3pez?=.example.mx. This is an acknowledged bug in Python's email package.

(Incidentally, RFC 2047 encoded words are allowed in a display-name: J. López <jlopez@example.com>J. =?utf-8?q?L=C3=B3pez?= <jlopez@example.com>. That worked before and after this ticket, and won't be changed by #35581.)

comment:13 by Sergei Maertens, 2 months ago

It's been a long time, but I *think* this had to do with a test suite of a third party package (I want to say django-yubin?) that started failing because of the Python 2 -> 3 differences in behaviour, and that made me dive into it. I've never actually come across such an email, and for some project recently looked up the address-part-with-non-ascii-characters possibility and reported to them that no additional work is needed because it shouldn't be allowed in the first place.

Likely I assumed at the time that the "new" Python behaviour was correct.

comment:14 by Mike Edmunds, 2 months ago

Thanks, Sergei, that's very helpful. I'll plan to remove this behavior.

Note: See TracTickets for help on using tickets.
Back to Top