Opened 19 years ago
Closed 18 years ago
#952 closed defect (wontfix)
[patch] Allow for database client encoding configuration from project settings
Reported by: | Owned by: | Adrian Holovaty | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | normal | Keywords: | |
Cc: | Maniac@…, jm.bugtracking@… | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
This allows the user to define the client encoding of his database and get it automatically used throughout Django.
Includes Postgres and MySQL implementations.
Attachments (4)
Change History (65)
by , 19 years ago
Attachment: | encoding.diff added |
---|
comment:1 by , 19 years ago
The second diff fixes a typo in postgres driver. This supercedes and closes #814
comment:2 by , 19 years ago
please keep in mind that the DEFAULT_CHARSET can be changed, so it would be better to not introduce a new setting DATABASE_ENCODING, but just to use the DEFAULT_ENCODING and put some hook into the backends that will do database specific deeds with that (the postgresql backend can then change 'utf-8' to 'unicode').
comment:3 by , 19 years ago
This would require storing a table of encoding names for every driver. Besides this configuration setting is useful. For example - you might be using charset A for your database - maybe for legacy reasons - and charset B for the middleware and the frontend.
I thought that it would be both simpler and more flexible to allow the developer to set up his database settings separately (most ASCII-speaking developers won't need it anyway). But reusing DEFAULT_CHARSET is neat.
comment:4 by , 19 years ago
Besides creating the translation tables per driver would require parsing http://www.iana.org/assignments/character-sets for every possible character set name and alias, finding out all of the mappings for these names within PostgreSQL and MySQL and storing these tables in Django sources. These tables would probably exceed the line count of the drivers themselves. Is this really necessary?
comment:5 by , 19 years ago
you can't use different encodings for Django and your database backend, because Django never sends Unicode strings but allways sends bytestrings encoded in DEFAULT_ENCODING and so the database driver can't know what to do with it besides passing it along. So the DEFAULT_ENCODING will allways have to match DATABASE_ENCODING, so it's a much better idea to directly tie them together.
comment:6 by , 19 years ago
Point taken, I agree. How many encodings should be included in the dicts per driver if we decide to handle it that way? IANA list is just insanely long, especially including all the possible aliases.
comment:7 by , 19 years ago
Hugo, this is exactly what this patch is intended to do (as far as I understand). If you set client encoding for database this is the encoding which database expects on client regardless of its internal encoding. If you have legacy database in, say, windows-1251 (cyrillic) and want to use your Django app modern proper way with utf-8 then you're in trouble since noone tells your database to convert windows-1251 to utf-8 upon SELECT.
The separate setting is required because you can't use DEFAULT_CHARSET directly for SET ENCODING in database because databases do the lame thing with using some non standard names for encodings. So DATABASE_ENCODING is merely intended as a 'translation' of DEFAULT_CHARSET for your database driver. Doing it automatically would require a huge mapping table in a form
(('utf-8','unicode'),('windows-1251','cp1251') ... )
for each backend as Julik pointed. I, for example, wouldn't start implementing a new backend if I knew it would require to match some several dozens strings manually. On the contrary each Django user would wrk with maybe one or two legacy encodings.
Julik, may be document it in the comment to the setting since it's not entirely obvious.
comment:8 by , 19 years ago
Ok, we can do it the other way.
Let's have a dict of, say, 6 most common encodings per driver (unicode, JIS, cp-1251 and maybe some more). If the user is using some DEFAULT_CHARSET which is not covered by the dict and does NOT specify DATABASE_ENCODING an exception is raised along the lines of "You are using some charset but you also need to tell your database to support it, and Django doesn't know how to do it for you". Otherwise, if the DEFAULT_CHARSET is mapped by a dict then we just use it straight away without requiring any bothering with DATABASE_ENCODING.
Will that be reasonable?
comment:9 by , 19 years ago
sounds reasonable to me - in most cases users won't need to change that setting, as they will just stick with one of the standard encodings. And for unusual situations you can specify what it should do.
comment:10 by , 19 years ago
Ok, I'm cooking the new patch. I think DATABASE_ENCODING be better renamed DATABASE_CLIENT_CHARSET (if we already have default charset) and still kept in the settings.py but left empty.
comment:11 by , 19 years ago
Cc: | added |
---|---|
Keywords: | i18n added |
Ok, this is the patch. I constrained the dicts to the absolute minimum. There is also a unicode flag that you can set in the driver as well as a charset flag, I don't really know how to deal with these properly at the moment - if Hugo can lend a hand would be great. But it worksforme now.
I also renamed the variable and added bailing if this can't be set up (because it's really wrong when it's not being taken care of explicitly).
comment:12 by , 19 years ago
Hm, wait a second. Let's see. Is it DEFAULT_ENCODING as the sys.defaultencoding or the DEFAULT_CHARSET defined in Django?
by , 19 years ago
Attachment: | corrected.diff added |
---|
Why I never get it right the first time I wonder
comment:13 by , 19 years ago
Cc: | added; removed |
---|---|
Keywords: | encodings database i18n added; encodings%2Bdatabase%2Bi18n removed |
Summary: | %5Bpatch%5D+Allow+for+database+client+encoding+configuration+from+project+settings → [patch] Allow for database client encoding configuration from project settings |
comment:14 by , 19 years ago
Component: | Translations → Database wrapper |
---|---|
Owner: | changed from | to
repair spam damage at least a bit
comment:15 by , 19 years ago
Component: | Database wrapper → Documentation |
---|---|
milestone: | Version 0.91 → Version 1.0 |
Owner: | changed from | to
Severity: | minor → major |
Type: | → defect |
Version: | SVN → new-admin |
comment:16 by , 19 years ago
Type: | defect |
---|
comment:17 by , 19 years ago
Type: | → Submit changes |
---|
comment:18 by , 19 years ago
Component: | Documentation → Contrib apps |
---|---|
milestone: | Version 1.0 → Version 0.93 |
Owner: | changed from | to
priority: | low → lowest |
Type: | Submit changes → enhancement |
Version: | new-admin |
comment:19 by , 19 years ago
Version: | → Contrib apps |
---|
comment:20 by , 19 years ago
Type: | enhancement → defect |
---|
comment:21 by , 18 years ago
Component: | Contrib apps → Cache system |
---|---|
milestone: | Version 0.93 → Version 1.0 |
Owner: | changed from | to
priority: | lowest → high |
Severity: | major → blocker |
Type: | defect → enhancement |
comment:22 by , 18 years ago
milestone: | Version 1.0 → Version 0.92 |
---|---|
priority: | high → low |
Severity: | blocker → major |
Type: | enhancement → defect |
Version: | Contrib apps → new-admin |
Hello. nice site!
comment:23 by , 18 years ago
Type: | defect |
---|
comment:24 by , 18 years ago
Type: | → Submit changes |
---|
comment:25 by , 18 years ago
Type: | Submit changes → defect |
---|
comment:26 by , 18 years ago
Type: | defect |
---|
comment:27 by , 18 years ago
Type: | → defect |
---|
comment:28 by , 18 years ago
Cc: | added; removed |
---|---|
Keywords: | empty.com added; None removed |
Summary: | [patch] Allow for database client encoding configuration from project settings → empty.com |
comment:29 by , 18 years ago
Cc: | removed |
---|---|
Component: | Cache system → Database wrapper |
Keywords: | empty.com removed |
milestone: | Version 0.92 |
Owner: | changed from | to
priority: | low → normal |
Severity: | major → normal |
Summary: | empty.com → [patch] Allow for database client encoding configuration from project settings |
Type: | defect → enhancement |
Version: | new-admin → SVN |
comment:30 by , 18 years ago
Cc: | added |
---|---|
Keywords: | empty.com added |
Summary: | [patch] Allow for database client encoding configuration from project settings → empty.com |
comment:31 by , 18 years ago
Cc: | removed |
---|---|
Keywords: | empty.com removed |
Summary: | empty.com → [patch] Allow for database client encoding configuration from project settings |
comment:32 by , 18 years ago
milestone: | → normal |
---|
comment:33 by , 18 years ago
Type: | enhancement |
---|
comment:34 by , 18 years ago
Type: | → defect |
---|
comment:35 by , 18 years ago
Type: | defect |
---|
comment:36 by , 18 years ago
Type: | → defect |
---|
comment:37 by , 18 years ago
Type: | defect |
---|
comment:38 by , 18 years ago
Type: | → defect |
---|
comment:39 by , 18 years ago
Type: | defect |
---|
comment:40 by , 18 years ago
Type: | → defect |
---|
comment:41 by , 18 years ago
Type: | defect |
---|
comment:42 by , 18 years ago
Type: | → defect |
---|
comment:43 by , 18 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
This is a tricky one.
Whatever we choose is going to change when we deal with the whole sticky unicodification aspect... so for now, I'm going to punt and mark this WONTFIX with the understanding that we'll fix it in a comprehensive manner in the future. For the record, the answer now is to just make sure your database's default encoding *does* match, which hopefully isn't too much of a nasty hack.
comment:44 by , 18 years ago
Type: | defect |
---|
comment:45 by , 18 years ago
Type: | → defect |
---|
comment:46 by , 18 years ago
Type: | defect |
---|
comment:47 by , 18 years ago
Type: | → Submit changes |
---|
comment:48 by , 18 years ago
Type: | Submit changes |
---|
comment:50 by , 18 years ago
Cc: | added |
---|
Jacon, this ticket was mentioned in a thread on django-developers but I suppose those messages are easy to miss so I repost the relevant bits here.
The ongoing unicodification will actually require support for legacy encoded DBs and this ticket looks like a best place to do this. So may be instead of wontfixing it it's better to rework patches for the new vision (which means just dropping encoding into DEFAULT_CHARSET leaving data from DB in unicode)?
comment:52 by , 18 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
comment:55 by , 18 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:56 by , 18 years ago
Triage Stage: | Accepted → Design decision needed |
---|
We have a bit of chaos here ... Tickets #3370, #1356 and probably #952 all are about this problem, all are accepted, and #3370 and #1356 have very similar patches. I ask everybody to continue discussion in django-developers ("unicode issues in multiple tickets"), and I ask the authors of these three tickets to work together to find out how to proceed.
As long as it's not clear which path to take, I mark all bugs as "design decision needed." (I assume that the other reviews were not aware of the competing tickets.)
http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf
comment:59 by , 18 years ago
Please ACCEPT this commit, it would help. I had also this error ; I modified structure of the tables and set UTF8 like suggested there, and I don't have it anymore.
I am using MySQL 5.0.24. Maybe this bug is not your fault, but it wouldn't cost anything to add the encoding declaration in the source... and that would solve some problems. But not all that exist with accents...
(about ticket #2896)
comment:60 by , 18 years ago
Cc: | added; removed |
---|
comment:61 by , 18 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
In light of the changes in the unicode branch, this setting is no longer needed. All our supported database backends that can handle variable server encodings take care of that information transparently. We just hand them unicode strings or UTF-8 bytestrings and it should work transparently.
So this is wontfix and the root problem will go away once unicode branch merges into trunk.
Necessary changes