#31784 closed Bug (fixed)
Emails name over 75 characters are incompatible with the latest versions of python.
Reported by: | Nick Orr | Owned by: | Florian Apolloner |
---|---|---|---|
Component: | Core (Mail) | Version: | 2.2 |
Severity: | Normal | Keywords: | |
Cc: | Joachim Jablon, Florian Apolloner | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In the process of sending an email the addresses is sanatized:
django/core/mail/message.py:98 => def sanitize_address(addr, encoding)
The Name portion is encoded via the Header class email.header.Header.encode which will introduce newlines at 75 characters.
Unfortunately the most recent python security update no longer allows that to happen. So when Address(nm, addr_spec=addr) is called in sanitize_address a new error is raised from the Python Standard library.
The update to python can be found here: https://github.com/python/cpython/commit/f91a0b6df14d6c5133fe3d5889fad7d84fc0c046#diff-3c5a266cd05e7d4173bf110ee93edd16
Essentially Django can no longer send emails with names longer then 75 chracters.
Change History (22)
comment:1 by , 4 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Summary: | When sending emails with a name over 75 characters long new lines are introduced which is incompatible with the latest minor versions of python 3.6/7/8 → Emails name over 75 characters are incompatible with the latest minor versions of python 3.6/7/8. |
comment:2 by , 4 years ago
Summary: | Emails name over 75 characters are incompatible with the latest minor versions of python 3.6/7/8. → Emails name over 75 characters are incompatible with the latest versions of python. |
---|
comment:3 by , 4 years ago
RFC 2822:
Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters
Instead of Django being ok with feeding an underlying Python library data that will cause an exception as with all things Python it seems like Django would be better off getting out of the way and letting the users shoot themselves in the foot.
I'd suggest that this call:
nm = Header(nm, encoding).encode()
Include the actual RFC line length limit of 998 in either the class instantiation or the encode method as either of those accept maxlinelen as a kwarg.
Or as a non-changing compromise it could be hived off into setting with the default being 75 as it stands now. That would allow developers to overwrite the setting if needed. Though it would also add yet another setting for people to know about.
Either way the current implementation is likely to cause a great deal of pain as developers find out that an email addressed to: "Django with a really long name for reasons that make sense in the context of a project <noreply@…>" are inexplicably crashing in core Python code they've never seen before.
Most of us would rather send a wonky, if technically correct, email versus just having that email vanish into the ether.
follow-up: 6 comment:4 by , 4 years ago
Hi Nick.
If this is an issue, surely it's an issue in the Python stdlib? Is there a discussion there that we can point to?
What do the Python core devs says when you quote that line of RFC 2822?
Since this is a security sensitive issue, we can't bypass the stdlib implementation without being 100% sure that the judgement is correct. (If we were able to make that case, surely it would be convincing for the stdlib implementation too?)
Maybe there's an issue for Django here, but we need to pursue the Python course first. (Perhaps folks are already raising it there...?)
I hope that makes sense.
comment:5 by , 4 years ago
Include the actual RFC line length limit of 998 in either the class instantiation or the encode method as either of those accept maxlinelen as a kwarg.
Maybe we could do this.
comment:6 by , 4 years ago
https://github.com/python/cpython/pull/19007/files
This issue is caused by a recently issued patch for Python as linked above. As far as I can tell the standard library does not really care about the length in this context. And you are correct we wouldn't want to bypass using the stdlib here we might just not want to insert newlines when they aren't necessarily required.
I haven't brought it up with the Python folks but from their point of view I'm sure they see their implementation as fine. More so since the headerregistry.py file in question has this in its preamble:
Eventually HeaderRegistry will be a public API, but it isn't yet, and will probably change some before that happens.
Just for clarity this is the line in question in Django. https://github.com/django/django/blob/master/django/core/mail/message.py#L99
comment:8 by , 4 years ago
Resolution: | invalid |
---|---|
Status: | closed → new |
comment:9 by , 4 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:10 by , 4 years ago
@Nick: How are you generating those emails in the first place currently? I nowhere see a possibility to specify an actual name in https://docs.djangoproject.com/en/3.0/topics/email/
I also cannot reproduce it with:
In [12]: e = EmailMessage('subject', 'content', 'from@example.com', ['to@example.com' * 99]) In [13]: e.message() Out[13]: <django.core.mail.message.SafeMIMEText at 0x7fc900437050> In [14]: e.message().as_bytes() Out[14]: b'Content-Type: text/plain; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nSubject: subject\nFrom: from@example.com\nTo: \n to@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.comto@example.com\nDate: Tue, 14 Jul 2020 20:41:37 -0000\nMessage-ID: <159475929766.21330.4162456845040318158@apollo13>\n\ncontent'
I assume you are manually injecting to
headers?
comment:11 by , 4 years ago
Oh, it needs a non-ascii character to trigger encoding in the first place:
from django.conf import settings settings.configure() from django.core.mail.message import EmailMessage e = EmailMessage('subject', 'content', 'from@example.com', ['"TestUser ä%s" <to@example.com>' % ('0' * 100)]) print(e.message().as_string())
raises
Traceback (most recent call last): File "/home/florian/sources/django.git/django/core/mail/message.py", line 62, in forbid_multi_line_headers val.encode('ascii') UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 10: ordinal not in range(128) During handling of the above exception, another exception occurred: Traceback (most recent call last): File "testing.py", line 7, in <module> print(e.message().as_string()) File "/home/florian/sources/django.git/django/core/mail/message.py", line 242, in message self._set_list_header_if_not_empty(msg, 'To', self.to) File "/home/florian/sources/django.git/django/core/mail/message.py", line 397, in _set_list_header_if_not_empty msg[header] = value File "/home/florian/sources/django.git/django/core/mail/message.py", line 154, in __setitem__ name, val = forbid_multi_line_headers(name, val, self.encoding) File "/home/florian/sources/django.git/django/core/mail/message.py", line 65, in forbid_multi_line_headers val = ', '.join(sanitize_address(addr, encoding) for addr in getaddresses((val,))) File "/home/florian/sources/django.git/django/core/mail/message.py", line 65, in <genexpr> val = ', '.join(sanitize_address(addr, encoding) for addr in getaddresses((val,))) File "/home/florian/sources/django.git/django/core/mail/message.py", line 107, in sanitize_address parsed_address = Address(nm, username=localpart, domain=domain) File "/usr/lib64/python3.7/email/headerregistry.py", line 37, in __init__ raise ValueError("invalid arguments; address parts cannot contain CR or LF") ValueError: invalid arguments; address parts cannot contain CR or LF
comment:12 by , 4 years ago
This patch:
diff --git a/django/core/mail/message.py b/django/core/mail/message.py index 607eb4af0b..2a70528644 100644 --- a/django/core/mail/message.py +++ b/django/core/mail/message.py @@ -96,8 +96,8 @@ def sanitize_address(addr, encoding): nm, address = addr localpart, domain = address.rsplit('@', 1) - nm = Header(nm, encoding).encode() # Avoid UTF-8 encode, if it's possible. + # TODO: Is anything non-ascii even allowed in the local part? try: localpart.encode('ascii') except UnicodeEncodeError: @@ -105,7 +105,11 @@ def sanitize_address(addr, encoding): domain = punycode(domain) parsed_address = Address(nm, username=localpart, domain=domain) - return str(parsed_address) + if nm: + display_name = Header(parsed_address.display_name, encoding).encode() + return f'{display_name} <{parsed_address.addr_spec}>' + else: + return '' if parsed_address.addr_spec=='<>' else parsed_address.addr_spec class MIMEMixin:
creates To headers as follows:
Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: subject From: from@example.com To: =?utf-8?q?TestUser=2C_=C3=A4000000000000000000000000000000000000000000000000?= =?utf-8?q?0000000000000000000000000000000000000000000000000000?= <to@example.com>, =?utf-8?q?TestUser=2C_=C3=A4000000000000000000000000000000000000000000000000?= =?utf-8?q?0000000000000000000000000000000000000000000000000000?= <to@example.com> Date: Tue, 14 Jul 2020 22:17:04 -0000 Message-ID: <159476502450.14868.2537973479953610602@apollo13> content
which looks correct at a first glance.
comment:13 by , 4 years ago
I somewhat feel that the security fix in Python is weird. As far as I understand you are supposed to pass in properly encoded names; but if you don't encode them with linebreaks who does it later on? (noone I feel)
comment:14 by , 4 years ago
Btw, does using Address
offer us much over just keeping our code and manually concatenating the display_name, local_part and domain? Our code already checks for header injections if any header where to contain a newline in the first place…
comment:16 by , 4 years ago
@felixmm I appreciate the ping but I’m afraid I’m not familiar enough with the intricacies of that part of the email RFCs to be really of any help. Florian’s patch seems to make quite some sense given the bug, I’m not sure of a compelling reason not to go in this direction.
I’ll review the PR :)
comment:17 by , 4 years ago
Owner: | changed from | to
---|---|
Triage Stage: | Accepted → Ready for checkin |
name-addr
like other message headers has line length limit (see RFC 2822) on the other hand it cannot contain CR and LF newline characters because they will be vulnerable for header injection attacks (see issue39073). As far as I'm aware there is not much we can do to keep it safe and allowed for thename-addr
longer than 78 chars.