#13758 closed

MySQLdb utf8_bin and django causes UnicodeDecodeError — at Initial Version

Reported by:	sam.vevang@…	Owned by:	nobody
Component:	Database layer (models, ORM)	Version:	dev
Severity:	Normal	Keywords:	utf8_bin MySQLdb collation unicode bytestring
Cc:		Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	yes	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

Issue:
I have a model with a FileField. When I delete instances of that model that have unicode characters in their filenames, I get a UnicodeDecodeError:

'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)

I finally traced the problem back to my database collation: utf8_bin. I chose utf8_bin so I could order the strings in a case-sensitive manner. FYI, with a utf8_bin collation MySQLdb does not return Python unicode strings; it returns UTF-8 bytestrings. For a brief description of that issue see:
http://code.djangoproject.com/ticket/8340#comment:4

The traceback shows the exception being raised in
"django/db/models/fields/files.py" in get_prep_value (line 248).
FileField is a subclass of Field, but it uses the same MySQL column
type (varchar) as CharField. However, FileField and CharField have
completely different implementations of get_prep_value.

Here is CharField's implementation:

    def to_python(self, value):
        if isinstance(value, basestring) or value is None:
            return value
        return smart_unicode(value)

    def get_prep_value(self, value):
        return self.to_python(value)

Here is FileField's:

    def get_prep_value(self, value):
        "Returns field's value prepared for saving into a database."
        # Need to convert File objects provided via a form to unicode for database insertion
        if value is None:
            return None
        return unicode(value)

My experiments showed that if I replace FileField's implementation
of get_prep_value with CharField's, the exception goes away. The
issue is that Python's default encoding is ascii, so calling
unicode() on a UTF-8 bytestring that contains non-ASCII bytes raises
UnicodeDecodeError. The CharField implementation simply checks
whether the value is an instance of basestring and, if so, passes it
through unchanged.
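
To make the difference concrete, here is a minimal sketch in Python 3 terms (the ticket itself is about Python 2). It is not Django code: the two function names and the filename are hypothetical, and the explicit ascii decode stands in for what Python 2's unicode() does implicitly when handed a bytestring.

```python
# Python 3 sketch of the Python 2 behaviour described above.
# Assumption: Python 2's unicode(bytestring) decodes with the default
# encoding (ascii); here that decode is written out explicitly.


def filefield_style_prep(value):
    """Always coerce to text, in the spirit of FileField.get_prep_value."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # Stand-in for Python 2's unicode(value) on a bytestring.
        return value.decode("ascii")
    return str(value)


def charfield_style_prep(value):
    """Pass strings (text or bytes) and None through untouched,
    in the spirit of CharField.to_python."""
    if value is None or isinstance(value, (str, bytes)):
        return value
    return str(value)


# Hypothetical stored filename: 18 ASCII characters followed by a
# non-ASCII one, encoded as UTF-8 the way a utf8_bin column would
# hand it back through MySQLdb.
stored_name = "uploads/documents/\u00e9.txt".encode("utf-8")

# The pass-through variant returns the bytestring unchanged.
passed_through = charfield_style_prep(stored_name)

# The coercing variant fails on the first non-ASCII byte.
try:
    filefield_style_prep(stored_name)
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
```

With this made-up filename the failing byte is 0xc3 at offset 18, the same shape of error as the one reported. Note that passing the bytestring through only avoids the decode; it does not turn the value into a unicode string.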
