#307 closed defect (invalid)

Use unicode strings u"bla-bla" in SQL-queries for compatibility with national languages

Reported by:	mordaha@…	Owned by:	Adrian Holovaty
Component:	Metasystem	Version:
Severity:	trivial	Keywords:	unicode strings in sql queries
Cc:		Triage Stage:	Unreviewed
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

Use unicode strings in SQL queries for compatibility with national languages. When you pass an SQL query as a Python unicode string, the database backend (MySQLdb) automatically converts it from the Python encoding to the MySQL connection encoding.

I found this in meta/fields.py (it may occur in other places too):

def get_db_prep_lookup(self, lookup_type, value):
        ...skip...

        elif lookup_type in ('contains', 'icontains'):
            return ["%%%s%%" % prep_for_like_query(value)]
            # the line above should be:
            # return [u"%%%s%%" % prep_for_like_query(value)] # using unicode
        elif lookup_type == 'iexact':

Without that u prefix, queries like field__contains=unicode_string_with_national_characters return nothing.

Change History (3)

comment:1 by hugo <gb@…>, 19 years ago

Hey, say hello to a can of worms :-)

The problem isn't really solved by just passing in unicode strings. What actually happens depends heavily on the backend, on the server settings, and on the DBAPI implementation used. And you can't just do u"" string interpolation: strings within Django are always bytestrings encoded in utf-8, so to get the unicode version of the data you would have to use prep_for_like_query(value).decode('utf-8').
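A minimal sketch of that decode-then-interpolate idea, written for Python 3, where bytes plays the role of the old utf-8 bytestring and str plays the role of u"". The names escape_like and contains_pattern are hypothetical; escape_like is only a stand-in for prep_for_like_query, whose real behaviour is not shown in this ticket. Unlike the one-liner above, this sketch decodes before escaping.

```python
def escape_like(text):
    # Hypothetical stand-in for prep_for_like_query: escape the LIKE
    # wildcards so they match literally. The real helper may differ.
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def contains_pattern(value):
    # Accept either a utf-8 bytestring or text, and always build the
    # '%value%' pattern as text so both sides of the interpolation match.
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return "%%%s%%" % escape_like(value)


# A utf-8 encoded bytestring, as the comment says Django holds internally.
raw = "привет".encode("utf-8")
pattern = contains_pattern(raw)
```

Whether the resulting text pattern then reaches the database intact still depends on the driver and connection settings discussed in the rest of this comment.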

BTW: MySQL never sees any unicode directly, it only sees utf-8 encoded strings. So if we pass u"" strings to the MySQL driver, the driver code re-encodes them as utf-8 and passes that along to your database. And hopefully your database is running with a utf-8 charset, because otherwise it might break on any char that's not in your home encoding.
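To illustrate the "home encoding" point with plain Python only (no database involved), here is a small sketch. The helper encode_for_connection is hypothetical and merely imitates a driver encoding text for a connection charset; what a real server does with mismatched bytes is not modelled.

```python
def encode_for_connection(value, charset):
    # Hypothetical helper: encode text the way a driver might before
    # sending it over a connection that uses the given charset.
    return value.encode(charset)


text = "Grüße, привет"

# utf-8 can represent every character, so this always succeeds.
utf8_bytes = encode_for_connection(text, "utf-8")

# latin-1 covers the German characters but not the Cyrillic ones,
# so encoding the same text fails.
try:
    encode_for_connection(text, "latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

The same character also produces different bytes under different charsets, which is why client and server have to agree on one.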

PostgreSQL has something similar: with SET client_encoding we could tell the database that all our client data is encoded in utf-8, and the database should then convert it into the native database encoding. With sqlite it's different: it always stores utf-8 strings and returns u"" strings through the Python DBAPI implementation. Except when it doesn't: for example, if you hook up converters/transformations, those will receive and send utf-8 encoded bytestrings and not unicode strings.
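The sqlite behaviour can be checked with the standard library sqlite3 module. This is a sketch for Python 3, so "unicode string" means str and "bytestring" means bytes; the column type name mytext and the function record_and_decode are made up for the example.

```python
import sqlite3

seen_types = []


def record_and_decode(raw):
    # Converters are handed the raw stored value; record its type and
    # decode it ourselves.
    seen_types.append(type(raw))
    return raw.decode("utf-8")


# Register a converter for a made-up declared column type.
sqlite3.register_converter("mytext", record_and_decode)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE t (plain TEXT, converted mytext)")
conn.execute("INSERT INTO t VALUES (?, ?)", ("grüße", "grüße"))

# 'plain' comes back as text directly; 'converted' goes through the converter.
plain, converted = conn.execute("SELECT plain, converted FROM t").fetchone()
conn.close()
```

Here plain arrives as text without any extra work, while the converter for the second column is called with utf-8 encoded bytes and has to decode them itself.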

Maybe the right way would be to go for utf-8 client encoding in the database drivers and to make sure that we always pass them utf-8 strings (or unicode strings if the driver accepts that). But then we would have to require users to set up their databases with utf-8 encoding, because otherwise they will sooner or later get unicode encoding/decoding errors in the database connection.
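One way to picture "always pass them utf-8 strings" is a single normalising function at the boundary to the driver. The sketch below is written for Python 3 and the name to_utf8 is hypothetical; it is not something Django provides in this form.

```python
def to_utf8(value):
    # Hypothetical boundary helper: return utf-8 encoded bytes whether
    # the caller passed text or an already encoded bytestring.
    if isinstance(value, bytes):
        # Validate rather than trust: raises UnicodeDecodeError if the
        # bytes are not valid utf-8 (for example latin-1 data).
        value.decode("utf-8")
        return value
    return value.encode("utf-8")


encoded = to_utf8("привет")
```

Failing early on bytes that are not valid utf-8 moves the error to the place where the bad data enters, instead of somewhere inside the database connection.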

comment:2 by Adrian Holovaty, 19 years ago

priority:	high → normal
Severity:	critical → normal

comment:3 by anonymous, 19 years ago

priority:	normal → lowest
Resolution:	→ invalid
Severity:	normal → trivial
Status:	new → closed

OK, I will always use .encode('utf8').
