Opened 7 years ago

Closed 6 years ago

#29086 closed Bug (wontfix)

Add a warning when saving bytestrings to CharFields

Reported by: Collin Anderson Owned by: nobody
Component: Database layer (models, ORM) Version: 2.0
Severity: Release blocker Keywords:
Cc: Collin Anderson Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Collin Anderson)

Django 1.11 behavior (python 3):

o = MyModel()
o.myfield = b'123'
o.save()
o.refresh_from_db()
o.myfield == "123"

Django 2.0 behavior:

o = MyModel()
o.myfield = b'123'
o.save()
o.refresh_from_db()
o.myfield == "b'123'"

Could we restore the old behavior, maybe with a deprecation warning? Either that or raise an error?

Change History (13)

comment:1 by Collin Anderson, 7 years ago

Description: modified (diff)

comment:2 by Collin Anderson, 7 years ago

Cc: Collin Anderson added

comment:3 by Tim Graham, 7 years ago

How does that mistake (assigning bytes) happen? I doubt that it's very common but am open to being persuaded otherwise.

comment:4 by Simon Charette, 7 years ago

As I've mentioned in #28748's PR I feel like this should be the the correct behavior in a Python 3 only code base and this was probably caused by an overlooked case during the porting of the affected code bases from Python 2 to Python 3.

In other words, if you are passing binary data to an interface expecting text you should expect bad things to happen -- Python 3 is stricter about this concept and it's a good thing IMHO.

Now, this change should at least have been mentioned in the 2.0 release notes. The fact it wasn't makes me think there might other instances changed by #27795 related commits that might have introduced similar regressions.

comment:5 by Jon Dufresne, 7 years ago

Now, this change should at least have been mentioned in the 2.0 release notes.

The release notes say:

To support native Python 2 strings, older Django versions had to accept both bytestrings and unicode strings. Now that Python 2 support is dropped, bytestrings should only be encountered around input/output boundaries (handling of binary fields or HTTP streams, for example). You might have to update your code to limit bytestring usage to a minimum, as Django no longer accepts bytestrings in certain code paths.

For example, reverse() now uses str() instead of force_text() to coerce the args and kwargs it receives, prior to their placement in the URL. For bytestrings, this creates a string with an undesired b prefix as well as additional quotes (str(b'foo') is "b'foo'"). To adapt, call decode() on the bytestring before passing it to reverse().

Does that cover what you're looking for?

comment:6 by Collin Anderson, 7 years ago

"Django no longer accepts bytestrings in certain code paths"

"Python 3 is stricter about this concept and it's a good thing IMHO." - Right, it raises an error when you pass bytes, it seems to me Django should do the same thing.

It seems to me the current behavior is really bad for projects that are upgrading. Maybe the force_text() calls (#27795) need to be bytes_not_allowed() or something. (Too late to raise an error for 2.0, but maybe we could start raising a warning?)

Version 0, edited 7 years ago by Collin Anderson (next)

comment:7 by Jon Dufresne, 7 years ago

It seems to me the current behavior is really bad for projects that are upgrading, because there's no warning or error, just silent bad data. Maybe the force_text() calls (#27795) need to be bytes_not_allowed() or something. (Too late to raise an error for 2.0, but maybe we could start raising a warning?)

This idea of warn/error when mixing bytes with text is actually built into Python with the `-b` and `-bb` options. I've used it to great success in my project to discover mishandling of bytes and text. I don't think Django needs to duplicate this functionality, but maybe the release notes could be modified to recommend using -b to help discover missed cases when upgrading.

comment:8 by Simon Charette, 7 years ago

Thanks for release note mention Jon, I completely missed it when I skimmed through the docs.

+1 on mentioning the -b option!

comment:9 by Tim Graham, 7 years ago

Resolution: wontfix
Status: newclosed
Summary: Bytestrings on CharFields getting saved as "b'data'"Add a warning when saving bytestrings to CharFields

PR to mention -b in the release notes.

comment:10 by GitHub <noreply@…>, 7 years ago

In c10cb97:

Refs #29086 -- Doc'd how to detect bytestring mistakes.

comment:11 by Tim Graham <timograham@…>, 7 years ago

In ddc4982:

[2.0.x] Refs #29086 -- Doc'd how to detect bytestring mistakes.

Backport of c10cb9716f8fd7398a8206cd8b33ed2f03065f85 from master

comment:12 by Adam, 6 years ago

Resolution: wontfix
Status: closednew

I'd like to re-open this for discussion, if possible.

Ideally, saving bytes to a CharField wouldn't give a warning. It would just trust that the developer knows what they're doing. Namely, that they took care of the decode step, and to just pass the bytes right onto the database.

I have a situation here where I have a CharField, I have some byte data, I know it's encoded in the same manner as the destination varchar field in the database. All these things are known to me. And yet, Django attempts to coerce the byte value into a string (adding a "b" in front and wrapping the thing in quotes as it goes), just before it's converted back into bytes when its sent to the database. That doesn't really make sense.

The best course of action here is to just trust the developer. If the CharField value is bytes, leave it be.

comment:13 by Mariusz Felisiak, 6 years ago

Resolution: wontfix
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top