Opened 19 years ago
Closed 19 years ago
#307 closed defect (invalid)
Use unicode strings u"bla-bla" in SQL-queries for compatibility with national languages
Reported by: | Owned by: | Adrian Holovaty | |
---|---|---|---|
Component: | Metasystem | Version: | |
Severity: | trivial | Keywords: | unicode strings in sql queries |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Use unicode strings in SQL queries for compatibility with national languages. When you pass an SQL query as a Python unicode string, the database backend (MySQLdb) automatically converts it from the Python encoding to the MySQL connection encoding.
I found it in meta/fields.py (it may be in some other places too):
    def get_db_prep_lookup(self, lookup_type, value):
        ...skip...
        elif lookup_type in ('contains', 'icontains'):
            return ["%%%s%%" % prep_for_like_query(value)]
            # above string must be:
            # return [u"%%%s%%" % prep_for_like_query(value)]  # using unicode
        elif lookup_type == 'iexact':
Without that u prefix, queries like field_contains=unicode_string_with_national_characters return nothing.
Change History (3)
comment:1 by , 19 years ago
comment:2 by , 19 years ago
priority: | high → normal |
---|---|
Severity: | critical → normal |
comment:3 by , 19 years ago
priority: | normal → lowest |
---|---|
Resolution: | → invalid |
Severity: | normal → trivial |
Status: | new → closed |
OK, I will always use .encode('utf8').
Hey, say hello to a can of worms :-)
The problem isn't really solved by just passing in unicode strings. What actually happens depends heavily on the backend, the server settings and the DBAPI implementation used. And you can't just do u"" string interpolation: strings within Django are always bytestrings encoded in utf-8, so to get the unicode version of the data you would have to use prep_for_like_query(value).decode('utf-8').
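To make the decode-before-interpolating point concrete, here is a minimal sketch. It is written for Python 3, where `bytes` plays the role of the utf-8 bytestring and `str` the role of a u"" string. The body of `prep_for_like_query` below is a hypothetical stand-in that escapes LIKE wildcards, not the actual helper from meta/fields.py, and `like_pattern` is a name invented for the example.

```python
def prep_for_like_query(value: bytes) -> bytes:
    # Hypothetical stand-in for the helper named in the ticket:
    # escape the backslash and the LIKE wildcards % and _.
    return (
        value.replace(b"\\", b"\\\\")
        .replace(b"%", b"\\%")
        .replace(b"_", b"\\_")
    )


def like_pattern(value: bytes) -> str:
    # Decode the utf-8 bytestring first, then interpolate into a
    # unicode template, so bytes and text are never mixed.
    return "%%%s%%" % prep_for_like_query(value).decode("utf-8")


pattern = like_pattern("приве%т".encode("utf-8"))
print(pattern)
```

The pattern comes out as a unicode string with the literal % escaped, so whether the driver accepts it is still the backend-dependent question raised above.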
BTW: MySQL never sees any unicode directly, it only sees utf-8 encoded strings. So if we pass u"" strings to the MySQL driver, the driver code re-encodes them as utf-8 and passes that along to your database. Hopefully your database is running with the utf-8 charset, because otherwise it might break on any character that's not in your home encoding.
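The re-encoding step can be modelled in a few lines. This is only a sketch of the behaviour described above, not MySQLdb's actual code: `encode_for_connection` is a hypothetical function, and the table and column names are made up. It is written for Python 3, where `str` is the unicode type.

```python
def encode_for_connection(sql: str, charset: str = "utf-8") -> bytes:
    # Models what a driver has to do before sending a unicode query:
    # turn it into bytes in the connection charset.
    return sql.encode(charset)


query = "SELECT * FROM people WHERE name LIKE '%привет%'"

# utf-8 can represent every character, so this always succeeds.
utf8_bytes = encode_for_connection(query)

# A narrower charset fails as soon as a character falls outside it.
try:
    encode_for_connection(query, "latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(len(utf8_bytes), latin1_ok)
```

The second call is the "break on any char that's not in your home encoding" case: the failure shows up at the encoding step, before the query ever reaches the server.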
PostgreSQL has something similar: with SET client_encoding we could tell the database that all our client data is encoded in utf-8, and the database should then convert it into the native database encoding. With sqlite it's different: it always stores utf-8 strings and returns u"" strings through the Python DBAPI implementation. Except when it doesn't: for example if you hook up converters/transformations, because those will receive and send utf-8 encoded bytestrings and not unicode strings.
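The sqlite behaviour can be checked with the `sqlite3` module from the Python 3 standard library, which may differ in detail from the pysqlite versions current when this ticket was filed. In this sketch the table, the declared type `mytext` and the converter are made up for the example.

```python
import sqlite3

seen_types = []


def record_and_decode(raw):
    # Converters are handed the raw stored bytes, not a unicode string.
    seen_types.append(type(raw))
    return raw.decode("utf-8")


# Register a converter for a made-up declared column type.
sqlite3.register_converter("mytext", record_and_decode)

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE notes (plain TEXT, converted mytext)")
conn.execute("INSERT INTO notes VALUES (?, ?)", ("привет", "привет"))
plain, converted = conn.execute(
    "SELECT plain, converted FROM notes"
).fetchone()
conn.close()

print(type(plain), seen_types)
```

The plain TEXT column comes back as a unicode string directly, while the converter for the second column sees utf-8 bytes and has to decode them itself.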
Maybe the right way would be to go for utf-8 client encoding in the database drivers and to make sure that we always pass them utf-8 strings (or unicode strings if the driver accepts that). But then we would have to require the users to set up their databases with utf-8 encoding, because otherwise they will sooner or later get unicode encoding/decoding errors in the database connection.
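One way to picture the "always pass utf-8" rule is a single normalising helper at the boundary to the driver. This is a sketch only: `to_utf8` and its `source_encoding` parameter are hypothetical and not part of Django, and the code is written for Python 3 (`str` for unicode, `bytes` for bytestrings).

```python
def to_utf8(value, source_encoding="utf-8"):
    """Return value as utf-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, str):
        return value.encode("utf-8")
    # Bytes are decoded first, which both validates them and lets
    # data in another known encoding be transcoded to utf-8.
    return value.decode(source_encoding).encode("utf-8")


from_text = to_utf8("naïve")
from_latin1 = to_utf8("naïve".encode("latin-1"), source_encoding="latin-1")

# Bytes that are not valid in the declared encoding are rejected here
# instead of surfacing later inside the database connection.
try:
    to_utf8(b"\xff")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(from_text == from_latin1, rejected)
```

Doing the check in one place moves the encoding/decoding errors mentioned above to the point where the data enters the query code, but it does not remove the need for the database itself to be set up for utf-8.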