Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#21574 closed Bug (fixed)

Different behaviour in Python 2 and 3 when normalizing newlines with django.utils.text.normalize_newlines

Reported by: Vajrasky Kok
Owned by: Vajrasky Kok
Component: Utilities
Version: dev
Severity: Normal
Keywords:
Cc: sky.kok@…
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

Python 3.3:

>>> from django.utils.text import normalize_newlines
>>> normalize_newlines("abc\r\ndef")
'abc\ndef'
>>> normalize_newlines(b"abc\r\ndef")
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/sky/Code/python/env/django/lib/python3.3/site-packages/django/utils/functional.py", line 213, in wrapper
    return func(*args, **kwargs)
  File "/home/sky/Code/python/env/django/lib/python3.3/site-packages/django/utils/text.py", line 252, in normalize_newlines
    return force_text(re.sub(r'\r\n|\r|\n', '\n', text))
  File "/usr/lib64/python3.3/re.py", line 170, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: can't use a string pattern on a bytes-like object

Python 2.7:

>>> from django.utils.text import normalize_newlines
>>> normalize_newlines(u"abc\r\ndef")
u'abc\ndef'
>>> normalize_newlines("abc\r\ndef")
u'abc\ndef'

I can produce the patch, but I need to know which behaviour is at fault here: Python 2 or Python 3? Should Python 2 reject binary input, or should Python 3 accept it?

Change History (4)

comment:1 by Vajrasky Kok, 11 years ago

Cc: sky.kok@… added
Owner: changed from nobody to Vajrasky Kok
Status: new → assigned

comment:2 by Vajrasky Kok, 11 years ago

Alternatively, normalize_newlines could return bytes when given bytes and a string when given a string. But I believe this would break backward compatibility. My vote is for banning bytes.
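
For illustration, the type-preserving alternative described in this comment might look roughly like the sketch below. The name normalize_newlines_same_type is hypothetical and not part of Django; it only assumes the standard-library re module and the pattern shown in the traceback.

```python
import re


def normalize_newlines_same_type(text):
    """Hypothetical variant: bytes in, bytes out; text in, text out."""
    if isinstance(text, bytes):
        # A bytes pattern and a bytes replacement are needed for bytes input.
        return re.sub(br'\r\n|\r|\n', b'\n', text)
    return re.sub(r'\r\n|\r|\n', '\n', text)


print(repr(normalize_newlines_same_type(b"abc\r\ndef")))
print(repr(normalize_newlines_same_type("abc\r\ndef")))
```

Under Python 2, where "abc\r\ndef" is a byte string, a function like this would return a byte string instead of the u'abc\ndef' shown in the description, which is presumably the backward-compatibility break in question.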

comment:3 by Baptiste Mispelon <bmispelon@…>, 11 years ago

Resolution: fixed
Status: assigned → closed

In 2c837233f5de7d5e309833e39782c7a208a03880:

Fixed #21574 -- Handle bytes consistently in utils.text.normalize_newlines.

All input is now coerced to text before being normalized.
This changes nothing under Python 2 but it allows bytes
to be passed to the function without a TypeError under Python 3
(bytes are assumed to be utf-8 encoded text).

Thanks to trac user vajrasky for the report.
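
The behaviour described in the commit message above (coerce to text first, then normalize) can be sketched as follows for Python 3. This is not the committed code: to_text is a hypothetical stand-in for Django's force_text, and normalize_newlines_sketch is an illustrative name; the sketch only assumes that bytes are UTF-8 encoded, as the commit message states.

```python
import re


def to_text(value, encoding='utf-8'):
    """Hypothetical stand-in for force_text: decode bytes, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def normalize_newlines_sketch(text):
    """Coerce the input to text, then replace CRLF and lone CR with LF."""
    return re.sub(r'\r\n|\r|\n', '\n', to_text(text))


print(repr(normalize_newlines_sketch(b"abc\r\ndef")))
print(repr(normalize_newlines_sketch("abc\r\ndef")))
```

With this approach the return value is always text, whatever the input type. Bytes that are not valid UTF-8 would raise a UnicodeDecodeError in this sketch.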

comment:4 by Baptiste Mispelon <bmispelon@…>, 11 years ago

In db41778e8ccbbba19954c3b47853b8520ab263a1:

Removed unnecessary call to force_text in utils.html.clean_html.

Refs #21574
