Opened 5 weeks ago

Closed 11 days ago

#36023 closed Bug (fixed)

Update content_disposition_header to handle control chars.

Reported by: Alex Vandiver Owned by: Alex Vandiver
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

content_disposition_header does not correctly handle newlines and other control characters in values that are given to it.

See https://github.com/django/django/pull/18890 for a fix.

Change History (7)

comment:1 by Natalia Bidart, 5 weeks ago

Component: UncategorizedHTTP handling
Resolution: needsinfo
Status: newclosed
Type: UncategorizedBug
Version: 5.1dev

Hello Alex, thank you for taking the time to create this report. I have seen the PR but I think this ticket need more background information about why this change is needed. What I mean is that the PR shows the *what*, but in order to accept the ticket, we need to understand the *why*.

I'll close as needsinfo but please reopen when you can provide a use case describing the need/issue being solved. A small Django project or a way to trigger the bug would also help. Thanks again!

comment:2 by Alex Vandiver, 4 weeks ago

Resolution: needsinfo
Status: closednew

Headers cannot contain newline characters and other control characters. The current code passes them through unchanged, and filenames should not be assumed to be free from them. For instance:

def index(request):
    filename = "foo\nbar.pdf"
    return HttpResponse(
        "Some PDF content",
        headers={"Content-Disposition": content_disposition_header(True, filename)},
    )

...will produce:

Traceback (most recent call last):
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/zulipdev/newline-disposition/newlines/views.py", line 7, in index
    return HttpResponse(
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/http/response.py", line 374, in __init__
    super().__init__(*args, **kwargs)
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/http/response.py", line 115, in __init__
    self.headers = ResponseHeaders(headers)
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/http/response.py", line 41, in __init__
    self[header] = value
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/http/response.py", line 87, in __setitem__
    value = self._convert_to_charset(value, "latin-1", mime_encode=True)
  File "/srv/zulip-py3-venv/lib/python3.10/site-packages/django/http/response.py", line 61, in _convert_to_charset
    raise BadHeaderError(
django.http.response.BadHeaderError: Header values can't contain newlines (got 'attachment; filename="foo\nbar.pdf"')

The PR switches that to correctly percent-encode them, so that they are can be passed as an HTTP header.

comment:3 by JaeHyuckSa, 4 weeks ago

Owner: set to Alex Vandiver
Status: newassigned

in reply to:  2 comment:4 by Sarah Boyce, 4 weeks ago

Triage Stage: UnreviewedAccepted

Replying to Alex Vandiver:

filenames should not be assumed to be free from them.

I guess it is possible control characters might be in a filename but very rare. This is also mentioned here: https://datatracker.ietf.org/doc/html/rfc6266#section-4.3

Recipients SHOULD strip or replace character sequences that are known to cause confusion both in user interfaces and in filenames, such as control characters and leading and trailing whitespace.

comment:5 by Alex Vandiver, 4 weeks ago

We ran into this in the context of Zulip; it stores files uploaded by arbitrary web browsers, and attempts to serve the files back to them with the same filenames they were uploaded as. In our production deploy, we observe filenames with newlines on an infrequent but non-zero frequency.

I agree that non-newline control characters is a bit more of a "did you really mean that," but filtering them should be the job of the application calling the function. The function should either produce the requested output correctly, or throw an exception -- not corrupt the HTTP response. Given that it's a SHOULD, not a MUST, I feel it's reasonable to produce the requested output.

comment:6 by Sarah Boyce, 12 days ago

Triage Stage: AcceptedReady for checkin

comment:7 by Sarah Boyce <42296566+sarahboyce@…>, 11 days ago

Resolution: fixed
Status: assignedclosed

In 8914b57:

Fixed #36023 -- Handled controls chars in content_disposition_header.

To use the simple filename="..." form, the value must conform to the
official grammar from RFC6266[1]:

filename-parm = "filename" "=" value
value = <value, defined in [RFC2616], Section 3.6>

; token | quoted-string

The quoted-string definition comes from RFC 9110[2]:

`

quoted-string = DQUOTE *( qdtext / quoted-pair ) DQUOTE
qdtext = HTAB / SP / %x21 / %x23-5B / %x5D-7E / obs-text

The backslash octet ("\") can be used as a single-octet quoting
mechanism within quoted-string and comment constructs. Recipients that
process the value of a quoted-string MUST handle a quoted-pair as if
it were replaced by the octet following the backslash.

quoted-pair = "\" ( HTAB / SP / VCHAR / obs-text )

A sender SHOULD NOT generate a quoted-pair in a quoted-string except
where necessary to quote DQUOTE and backslash octets occurring within
that string.
`

That is, quoted strings are able to express horizontal tabs, space
characters, and everything in the range from 0x21 to 0x7e, expect for
0x22 (") and 0x5C (\), which can still be expressed but must be
escaped with their own \.

We ignore the case of obs-text, which is defined as the range
0x80-0xFF, since its presence is there for permissive parsing of
accidental high-bit characters, and it should not be generated by
conforming implementations.

Transform this character range into a regex and apply it in addition
to the "is ASCII" check. This ensures that all simple filenames are
expressed in the simple format, and that all filenames with newlines
and other control characters are properly expressed with the
percent-encoded filename*=...form.

[1]: https://datatracker.ietf.org/doc/html/rfc6266#section-4.1
[
2]: https://datatracker.ietf.org/doc/html/rfc9110#name-quoted-strings

Note: See TracTickets for help on using tickets.
Back to Top