Opened 19 years ago

Closed 19 years ago

#307 closed defect (invalid)

Use unicode strings u"bla-bla" in SQL-queries for compatibility with national languages

Reported by: mordaha@…
Owned by: Adrian Holovaty
Component: Metasystem
Version:
Severity: trivial
Keywords: unicode strings in sql queries
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Use unicode strings in SQL queries for compatibility with national languages. When you pass an SQL query as a Python unicode string, the database backend (MySQLdb) automatically converts it from the Python encoding to the MySQL connection encoding.

I found this in meta/fields.py (it may occur in other places too):

def get_db_prep_lookup(self, lookup_type, value):
    ...skip...

    elif lookup_type in ('contains', 'icontains'):
        return ["%%%s%%" % prep_for_like_query(value)]
        # the line above should be:
        # return [u"%%%s%%" % prep_for_like_query(value)]  # using unicode
    elif lookup_type == 'iexact':

Without that u, queries like field_contains=unicode_string_with_national_characters return nothing.

Change History (3)

comment:1 by hugo <gb@…>, 19 years ago

Hey, say hello to a can of worms :-)

The problem isn't really solved by just passing in unicode strings - what happens depends heavily on the backend, the server settings, and the DBAPI implementation used. And you can't just do u"" string interpolation - strings within Django are always bytestrings encoded in utf-8, so to get the unicode version of the data you would have to use prep_for_like_query(value).decode('utf-8').
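A minimal sketch of that decode step, written for Python 3, where the bytes-to-text conversion has to be explicit (in Python 2, u"" interpolation with a bytestring attempted an implicit ASCII decode instead). The `prep_for_like_query` below is a hypothetical stand-in that escapes LIKE wildcards, not Django's actual helper, and `like_pattern` is an illustrative name.

```python
def prep_for_like_query(value: bytes) -> bytes:
    # Hypothetical stand-in: escape backslash and the LIKE wildcards % and _.
    return (
        value.replace(b"\\", b"\\\\")
        .replace(b"%", b"\\%")
        .replace(b"_", b"\\_")
    )


def like_pattern(value: bytes) -> str:
    # Decode the utf-8 bytestring first, then interpolate into a text pattern.
    return "%%%s%%" % prep_for_like_query(value).decode("utf-8")


raw = "Привет".encode("utf-8")  # utf-8 bytestring, as described in the comment

# An ASCII decode cannot handle non-ASCII bytes.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

pattern = like_pattern(raw)
print(ascii_ok, pattern)
```

Decoding with the encoding the data is actually in (utf-8 here) is what makes the interpolation safe; guessing the wrong source encoding would either fail or silently produce the wrong text.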

BTW: MySQL never sees any unicode directly, it only sees utf-8 encoded strings - so if we pass u"" strings to the MySQL driver, the driver code re-encodes them as utf-8 and passes that along to your database. And hopefully your database is running with the utf-8 charset, because otherwise it might break on any character that's not in your home encoding.
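The re-encoding step can be modelled in plain Python 3 without a database. This is only a sketch: `encode_for_connection` is a hypothetical function standing in for whatever the driver does internally, and the table and column names are made up.

```python
def encode_for_connection(query: str, charset: str) -> bytes:
    # Hypothetical model of a driver turning a text query into wire bytes.
    return query.encode(charset)


query = "SELECT * FROM t WHERE name LIKE '%Привет%'"

# utf-8 can represent every character, so this always succeeds.
utf8_bytes = encode_for_connection(query, "utf-8")

# A single-byte "home encoding" such as latin-1 has no Cyrillic letters.
try:
    encode_for_connection(query, "latin-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

print(latin1_failed)
```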

PostgreSQL has something similar: with SET client_encoding we could tell the database that all our client data is encoded in utf-8, and the database should then convert it into the native database encoding. With sqlite it's different: it always stores utf-8 strings and returns u"" strings with the Python DBAPI implementation. Except when it doesn't - for example if you hook up converters/transformations, because those will receive and send utf-8 encoded bytestrings and not unicode strings.
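The sqlite behaviour can be observed with the `sqlite3` module from today's Python 3 standard library, which may differ in detail from the DBAPI implementation available when this comment was written. In this sketch the declared type `mytext`, the table `t`, and the converter name are all made up for illustration.

```python
import sqlite3

seen_types = []


def record_and_decode(raw):
    # Converters are handed the raw stored bytes, not a text string.
    seen_types.append(type(raw))
    return raw.decode("utf-8")


# Register a converter for the made-up declared column type "mytext".
sqlite3.register_converter("mytext", record_and_decode)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE t (plain TEXT, converted mytext)")
conn.execute("INSERT INTO t VALUES (?, ?)", ("Привет", "Привет"))
plain, converted = conn.execute("SELECT plain, converted FROM t").fetchone()
conn.close()

print(type(plain), seen_types, converted)
```

The plain TEXT column comes back as text directly, while the converter for the second column sees bytes and has to decode them itself.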

Maybe the right way would be to go for utf-8 client encoding in the database drivers and to make sure that we always pass them utf-8 strings (or unicode strings if the driver accepts that). But then we would have to require users to set up their databases with utf-8 encoding, because otherwise they will sooner or later get unicode encoding/decoding errors in the database connection.
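One way to picture the "always pass utf-8" rule is a single normalising helper applied to every value before it reaches the driver. This is a Python 3 sketch only; `to_utf8` and its `source_encoding` parameter are hypothetical names, not part of Django.

```python
def to_utf8(value, source_encoding="utf-8"):
    # Text is encoded directly; bytes are decoded from their known source
    # encoding first, so malformed input fails here and not in the driver.
    if isinstance(value, str):
        return value.encode("utf-8")
    return value.decode(source_encoding).encode("utf-8")


print(to_utf8("Привет"))
print(to_utf8("é".encode("latin-1"), source_encoding="latin-1"))
```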

comment:2 by Adrian Holovaty, 19 years ago

priority: high → normal
Severity: critical → normal

comment:3 by anonymous, 19 years ago

priority: normal → lowest
Resolution: invalid
Severity: normal → trivial
Status: new → closed

OK, I will always use .encode('utf8')
