#3454 closed (fixed)
sqlite backend is using row_factory when it should be using text_factory
Reported by: | (removed) | Owned by: | Adrian Holovaty |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Keywords: | unicode-branch | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
currently, sqlite has
    def utf8rowFactory(cursor, row):
        def utf8(s):
            if type(s) == unicode:
                return s.encode("utf-8")
            return s
        return [utf8(r) for r in row]
for row_factory. The problem here is that it rebuilds every record regardless of whether the utf8 conversion is required. Doing
Database.text_factory = lambda s:s.decode("utf-8")
limits the conversion to just TEXT objects.
This is a bit faster. That said, I'm wondering why the conversion is forced at all: sqlite stores data in utf8, so if
Database.text_factory = str
were set, the whole decoding/encoding would be bypassed, and the native encoding (utf8) would be passed back.
In terms of performance, the gains from using Database.text_factory = lambda s:s.decode("utf-8") depend on the column types: the greater the number of non-text fields, the greater the gain.
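To illustrate why the gain depends on the column types, here is a minimal sketch using the Python 3 standard-library sqlite3 module rather than the Python 2 code discussed in this ticket. It assumes text_factory is set on a connection and receives bytes, as it does in Python 3; the table and the counting_text_factory helper are hypothetical names made up for the illustration.

```python
import sqlite3

# Sketch only: Python 3 stdlib sqlite3, not the Django backend code.
conn = sqlite3.connect(":memory:")

calls = []  # records every value handed to the text factory


def counting_text_factory(raw):
    # In Python 3 the callable receives the raw UTF-8 bytes of a TEXT value.
    calls.append(raw)
    return raw.decode("utf-8")


conn.text_factory = counting_text_factory

# Hypothetical table: two non-text columns and one TEXT column.
conn.execute("CREATE TABLE t (id INTEGER, score REAL, name TEXT)")
conn.execute("INSERT INTO t VALUES (?, ?, ?)", (1, 2.5, "café"))

row = conn.execute("SELECT id, score, name FROM t").fetchone()

print(row)         # the integer and float come back untouched
print(len(calls))  # the factory ran once, for the TEXT column only
```

A row_factory, by contrast, is called once per row and would have to inspect all three values to find the one that needs converting.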
The real gain comes from turning off the encode/decode and using str directly (underlying utf8): it gives the same gain from avoiding the extra inspection, and it also avoids all the conversion work.
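A rough sketch of the "no conversion" variant, again using Python 3's sqlite3 as a stand-in. The assumption here is that Python 3's bytes plays the role that str (a byte string) played in the Python 2 code this ticket is about; the table name is hypothetical.

```python
import sqlite3

# Sketch only: Python 3 stdlib sqlite3, not the Django backend code.
conn = sqlite3.connect(":memory:")

# Hand back TEXT values as the raw bytes sqlite stores, with no decode step.
# (In the Python 2 setting of this ticket the equivalent factory was str.)
conn.text_factory = bytes

conn.execute("CREATE TABLE t (name TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("café",))

raw = conn.execute("SELECT name FROM t").fetchone()[0]

print(type(raw))  # bytes
print(raw)        # the UTF-8 encoded form of the inserted text
```

The caller then gets utf8 bytes straight from the database, which is the trade-off described for raw sql queries.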
The only downside to either change that I can see is that raw sql queries would return str instead of sqlite's unicode. Not really sure whether this is an actual issue, however (I don't see any other such limitation in the backends).
Patch is attached for the encode/decode variant; unless there are good reasons not to, I would just bypass the encoding/decoding entirely.
Attachments (1)
Change History (7)
by , 18 years ago
comment:1 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:2 by , 18 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Design decision needed → Accepted |
I don't think the use of decode() in this patch is correct. s.decode('utf-8') is for converting a UTF-8 encoded string into a unicode object, but I suspect you want to be converting unicode objects into UTF-8 for storage purposes (that was what we were doing previously).
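For reference, the two directions look like this in Python 3 terms. This is only an illustration and assumes the usual mapping from the Python 2 types in this ticket: unicode corresponds to Python 3 str, and the Python 2 byte string str corresponds to bytes.

```python
# Sketch only, in Python 3 terms.
text = "café"                      # text object (Python 2: unicode)

encoded = text.encode("utf-8")     # text -> UTF-8 bytes, e.g. for storage
decoded = encoded.decode("utf-8")  # UTF-8 bytes -> text, e.g. after reading

print(type(encoded), type(decoded))
print(decoded == text)             # the round trip gives back the same text
```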
We cannot bypass the encoding step entirely, because there is no guarantee at all that internal strings will be UTF-8 encoded streams of bytes (or immediately convertible to such by Python). We are going to move to unicode everywhere outside of the internal/external interfaces (which will be conversion points), but that hasn't happened yet.
Other than that, the motivation behind the patch looks like a good find. Thanks.
comment:3 by , 18 years ago
Yeah, that ought to be s.encode('utf-8').
Would suggest sticking a note in the sqlite backend code saying that once unicode is the default, these conversions should just be disabled, since sqlite already spits back unicode.
comment:4 by , 17 years ago
Keywords: | unicode-branch added |
---|
This ticket has become a non-issue in the unicode branch (no converter is needed at all). Will close it when that branch is merged into trunk.
comment:5 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
use text_factory instead of row_factory for unicode conversion