Opened 5 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#36119 closed Bug (fixed)

Attaching email file to email fails if the attachment is using 8bit Content-Transfer-Encoding

Reported by: Trenton H
Owned by: Gregory Mariani
Component: Core (Mail)
Version: 5.1
Severity: Normal
Keywords: compat32
Cc: Mike Edmunds
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

If an email file is attached to an email, as in the code snippet below, sending fails because of non-ASCII content in the attachment.

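from django.core.mail import EmailMessage

email = EmailMessage(
    subject="subject",
    body="body",
    to=["someone@somewhere.com"],  # "to" must be a list or tuple
)
email.attach_file(original_file)  # original_file: path to the .eml file being attached
n_messages = email.send()

Traceback (most recent call last):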
  File "/usr/src/paperless/src/documents/signals/handlers.py", line 989, in email_action
    n_messages = email.send()
                 ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 301, in send
    return self.get_connection(fail_silently).send_messages([self])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 136, in send_messages
    sent = self._send(message)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 156, in _send
    from_email, recipients, message.as_bytes(linesep="\r\n")
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 148, in as_bytes
    g.flatten(self, unixfrom=unixfrom, linesep=linesep)
  File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 286, in _handle_multipart
    g.flatten(part, unixfrom=False, linesep=self._NL)
  File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 372, in _handle_message
    g.flatten(msg.get_payload(0), unixfrom=False, linesep=self._NL)
  File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
    self._write(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
    self._dispatch(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
    meth(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 446, in _handle_text
    super(BytesGenerator,self)._handle_text(msg)
  File "/usr/local/lib/python3.12/email/generator.py", line 263, in _handle_text
    self._write_lines(payload)
  File "/usr/local/lib/python3.12/email/generator.py", line 156, in _write_lines
    self.write(line)
  File "/usr/local/lib/python3.12/email/generator.py", line 420, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)

Attachments (1)

problem.eml (3.3 KB) - added by Trenton H 5 weeks ago.

Change History (17)

by Trenton H, 5 weeks ago

Attachment: problem.eml added

comment:1 by Tim Graham, 5 weeks ago

Trenton, it would be helpful if you could provide more information about this, because I'm not sure we have any triagers who are experts in this area. Where is Django at fault, and how can we fix it?

comment:2 by Trenton H, 5 weeks ago

Assuming SMTP is configured for a project:

from django.core.mail import EmailMessage

email = EmailMessage(
    subject="subject",
    body="body",
    to=["someone@somewhere.com"],  # "to" must be a list or tuple
)
email.attach_file("problem.eml")
# or
email.attach_file("problem.eml", "message/rfc822")
n_messages = email.send()

I would expect this to work without issue, but as shown above, Django appears to either assume ASCII or use the standard library in a way that assumes ASCII. As a user, I would not expect to need extra steps or processing for this code to work; it should simply attach the file instead of crashing.

I don't know enough about the internals to say how it should be fixed. Perhaps by checking the headers of attached messages? Or by defaulting to UTF-8 somewhere?

comment:3 by Sarah Boyce, 5 weeks ago

Cc: Mike Edmunds added

comment:4 by Mike Edmunds, 5 weeks ago

Keywords: compat32 added
Triage Stage: Unreviewed → Accepted

This came up in the forum. Here's a minimal case to reproduce:

from django.core.mail import EmailMessage

# Content of message to attach, using 8bit CTE with raw utf-8:
att = """\
Subject: attachment
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

¡8-bit content!
""".encode()

email = EmailMessage(to=["test@example.com"])
email.attach("attachment.eml", att, "message/rfc822")

email.message().as_bytes()
# ...python3.12/email/generator.py", line 409, in write
#    self._fp.write(s.encode('ascii', 'surrogateescape'))
#                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# UnicodeEncodeError: 'ascii' codec can't encode character '\xa1' in position 0: ordinal not in range(128)

The problem is that EmailMessage._create_mime_attachment() uses message_from_string(force_str(content)) to convert the attachment content to a Python Message object. But message_from_string() doesn't properly handle Unicode characters. (See python/cpython#83565. This dates back to Python 2 when strings couldn't include Unicode.)
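The failing chain in the traceback (_handle_multipart, then _handle_message, then _handle_text, then write) can be reproduced with the standard library alone. A minimal sketch, assuming that wrapping the parsed message in email.mime.message.MIMEMessage inside a MIMEMultipart is close enough to what Django's own MIME wrappers do (a simplification, not Django's actual code):

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Same 8bit attachment as above, already decoded to str, roughly as
# force_str() would hand it to message_from_string().
att_text = """\
Subject: attachment
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

¡8-bit content!
"""

outer = MIMEMultipart()
outer.attach(MIMEText("body"))
# The parsed payload holds a real "¡" character, not a surrogate escape.
outer.attach(MIMEMessage(message_from_string(att_text)))

error = None
try:
    # BytesGenerator ends up calling str.encode("ascii", "surrogateescape").
    outer.as_bytes()
except UnicodeEncodeError as exc:
    error = exc

print(repr(error))
```

Because the str payload contains a real non-ASCII character rather than a surrogate escape, BytesGenerator falls back to the text path and the ASCII encode fails.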

A working alternative is message_from_bytes():

from email import message_from_string, message_from_bytes

message_from_string(att.decode()).as_bytes()
# UnicodeEncodeError: 'ascii' codec can't encode character '\xa1' in position 0: ordinal not in range(128)

message_from_bytes(att).as_bytes()
# b'Subject: attachment\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n\n\xc2\xa18-bit content!\n'

So the simplest fix is probably to change Django to use message_from_bytes(force_bytes(content)).
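A rough sketch of that change, outside Django. The helper names to_bytes() and rfc822_attachment() are hypothetical; to_bytes() only approximates force_bytes() (str to UTF-8 bytes), and the Message check loosely mirrors the "parse only if it isn't already a Message" logic described above:

```python
from email import message_from_bytes
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def to_bytes(content):
    """Hypothetical stand-in for force_bytes(): str is encoded as UTF-8."""
    if isinstance(content, bytes):
        return content
    return str(content).encode("utf-8")


def rfc822_attachment(content):
    """Hypothetical helper: wrap raw email content as a message/rfc822 part."""
    if isinstance(content, Message):
        return MIMEMessage(content)
    # message_from_bytes() keeps non-ASCII bytes as surrogate escapes,
    # which BytesGenerator writes back out unchanged.
    return MIMEMessage(message_from_bytes(to_bytes(content)))


att = """\
Subject: attachment
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

¡8-bit content!
""".encode()

outer = MIMEMultipart()
outer.attach(MIMEText("body"))
outer.attach(rfc822_attachment(att))
raw = outer.as_bytes()  # no UnicodeEncodeError
```

This relies on BytesParser decoding input as ASCII with surrogateescape: BytesGenerator recognizes those surrogates and emits the original bytes when the policy allows 8bit content (the compat32 default), so str input works too once it has been converted to bytes.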

(This would also be fixed by upgrading Django to use Python's modern email APIs, #35581.)

comment:5 by Gregory Mariani, 5 weeks ago

I can give it a try, but #35581 looks close to being released? Any advice on how to start?

comment:6 by Mike Edmunds, 5 weeks ago

#35581 is stalled waiting for some Python core library fixes, so won't get merged before Django 6.0 at the earliest. I think it's worth fixing this bug before then.

Maybe start by adding a new, failing test case in django/tests/mail/tests.py. (That's a long file, and not particularly well organized, but most of the attachment-related tests are kind of grouped together in the middle, so maybe somewhere near them.) You could probably even use the minimal example from my earlier comment. (Or make it even more minimal: neither to nor Subject are relevant to the bug.)

Once that test fails, I'm pretty sure you can fix it in django/core/mail/message.py by changing message_from_string(force_str(content)) to message_from_bytes(force_bytes(content)) (and fixing up the imports). And then running all the tests to see if that breaks anything else. (I wouldn't expect it to, but I'm often wrong, so it's good we have tests.)

comment:7 by Gregory Mariani, 5 weeks ago

Owner: set to Gregory Mariani
Status: new → assigned

comment:8 by Gregory Mariani, 5 weeks ago

The patch should be done. CI is running and I can't stay awake to see it finish.

comment:9 by Gregory Mariani, 5 weeks ago

Has patch: set

comment:10 by Mike Edmunds, 3 weeks ago

Triage Stage: Accepted → Ready for checkin

comment:11 by Mike Edmunds, 3 weeks ago

Triage Stage: Ready for checkin → Accepted

comment:12 by Sarah Boyce, 3 weeks ago

Patch needs improvement: set

comment:13 by Gregory Mariani, 3 weeks ago

Patch needs improvement: unset

@sarah I think it's OK now.

comment:14 by Sarah Boyce, 3 weeks ago

Triage Stage: Accepted → Ready for checkin

comment:15 by Sarah Boyce <42296566+sarahboyce@…>, 3 weeks ago

Resolution: fixed
Status: assigned → closed

In 89e28e13:

Fixed #36119 -- Fixed UnicodeEncodeError when attaching a file with 8bit Content-Transfer-Encoding.

comment:16 by Sarah Boyce <42296566+sarahboyce@…>, 3 weeks ago

In 2146bd1:

[5.2.x] Fixed #36119 -- Fixed UnicodeEncodeError when attaching a file with 8bit Content-Transfer-Encoding.

Backport of 89e28e13ecbf9fbcf235e16d453c08bbf2271244 from main.
