Opened 19 years ago

Closed 18 years ago

#952 closed defect (wontfix)

[patch] Allow for database client encoding configuration from project settings

Reported by: me@… Owned by: Adrian Holovaty
Component: Database layer (models, ORM) Version: dev
Severity: normal Keywords:
Cc: Maniac@…, jm.bugtracking@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

This allows the user to define the client encoding of his database and get it automatically used throughout Django.
Includes Postgres and MySQL implementations.

Attachments (4)

encoding.diff (2.6 KB ) - added by me@… 19 years ago.
Necessary changes
encoding.2.diff (3.1 KB ) - added by me@… 19 years ago.
necessary changes
database_client_charset.diff (5.8 KB ) - added by me@… 19 years ago.
Automagic added
corrected.diff (5.8 KB ) - added by me@… 19 years ago.
Why I never get it right the first time I wonder

Download all attachments as: .zip

Change History (65)

by me@…, 19 years ago

Attachment: encoding.diff added

Necessary changes

by me@…, 19 years ago

Attachment: encoding.2.diff added

necessary changes

comment:1 by me@…, 19 years ago

The second diff fixes a typo in postgres driver. This supercedes and closes #814

comment:2 by hugo, 19 years ago

please keep in mind that the DEFAULT_CHARSET can be changed, so it would be better to not introduce a new setting DATABASE_ENCODING, but just to use the DEFAULT_ENCODING and put some hook into the backends that will do database specific deeds with that (the postgresql backend can then change 'utf-8' to 'unicode').

comment:3 by me@…, 19 years ago

This would require storing a table of encoding names for every driver. Besides this configuration setting is useful. For example - you might be using charset A for your database - maybe for legacy reasons - and charset B for the middleware and the frontend.

I thought that it would be both simpler and more flexible to allow the developer to set up his database settings separately (most ASCII-speaking developers won't need it anyway). But reusing DEFAULT_CHARSET is neat.

comment:4 by me@…, 19 years ago

Besides creating the translation tables per driver would require parsing http://www.iana.org/assignments/character-sets for every possible character set name and alias, finding out all of the mappings for these names within PostgreSQL and MySQL and storing these tables in Django sources. These tables would probably exceed the line count of the drivers themselves. Is this really necessary?

comment:5 by hugo, 19 years ago

you can't use different encodings for Django and your database backend, because Django never sends Unicode strings but allways sends bytestrings encoded in DEFAULT_ENCODING and so the database driver can't know what to do with it besides passing it along. So the DEFAULT_ENCODING will allways have to match DATABASE_ENCODING, so it's a much better idea to directly tie them together.

comment:6 by me@…, 19 years ago

Point taken, I agree. How many encodings should be included in the dicts per driver if we decide to handle it that way? IANA list is just insanely long, especially including all the possible aliases.

comment:7 by Maniac <Maniac@…>, 19 years ago

Hugo, this is exactly what this patch is intended to do (as far as I understand). If you set client encoding for database this is the encoding which database expects on client regardless of its internal encoding. If you have legacy database in, say, windows-1251 (cyrillic) and want to use your Django app modern proper way with utf-8 then you're in trouble since noone tells your database to convert windows-1251 to utf-8 upon SELECT.

The separate setting is required because you can't use DEFAULT_CHARSET directly for SET ENCODING in database because databases do the lame thing with using some non standard names for encodings. So DATABASE_ENCODING is merely intended as a 'translation' of DEFAULT_CHARSET for your database driver. Doing it automatically would require a huge mapping table in a form

  (('utf-8','unicode'),('windows-1251','cp1251') ... )

for each backend as Julik pointed. I, for example, wouldn't start implementing a new backend if I knew it would require to match some several dozens strings manually. On the contrary each Django user would wrk with maybe one or two legacy encodings.

Julik, may be document it in the comment to the setting since it's not entirely obvious.

comment:8 by me@…, 19 years ago

Ok, we can do it the other way.
Let's have a dict of, say, 6 most common encodings per driver (unicode, JIS, cp-1251 and maybe some more). If the user is using some DEFAULT_CHARSET which is not covered by the dict and does NOT specify DATABASE_ENCODING an exception is raised along the lines of "You are using some charset but you also need to tell your database to support it, and Django doesn't know how to do it for you". Otherwise, if the DEFAULT_CHARSET is mapped by a dict then we just use it straight away without requiring any bothering with DATABASE_ENCODING.

Will that be reasonable?

comment:9 by hugo, 19 years ago

sounds reasonable to me - in most cases users won't need to change that setting, as they will just stick with one of the standard encodings. And for unusual situations you can specify what it should do.

comment:10 by me@…, 19 years ago

Ok, I'm cooking the new patch. I think DATABASE_ENCODING be better renamed DATABASE_CLIENT_CHARSET (if we already have default charset) and still kept in the settings.py but left empty.

by me@…, 19 years ago

Automagic added

comment:11 by me@…, 19 years ago

Cc: me@… added
Keywords: i18n added

Ok, this is the patch. I constrained the dicts to the absolute minimum. There is also a unicode flag that you can set in the driver as well as a charset flag, I don't really know how to deal with these properly at the moment - if Hugo can lend a hand would be great. But it worksforme now.

I also renamed the variable and added bailing if this can't be set up (because it's really wrong when it's not being taken care of explicitly).

comment:12 by me@…, 19 years ago

Hm, wait a second. Let's see. Is it DEFAULT_ENCODING as the sys.defaultencoding or the DEFAULT_CHARSET defined in Django?

by me@…, 19 years ago

Attachment: corrected.diff added

Why I never get it right the first time I wonder

comment:13 by anonymous, 19 years ago

Cc: me@… added; me%40julik.nl removed
Keywords: encodings database i18n added; encodings%2Bdatabase%2Bi18n removed
Summary: %5Bpatch%5D+Allow+for+database+client+encoding+configuration+from+project+settings[patch] Allow for database client encoding configuration from project settings

comment:14 by anonymous, 18 years ago

Component: TranslationsDatabase wrapper
Owner: changed from anonymous to Adrian Holovaty

repair spam damage at least a bit

comment:15 by Greg, 18 years ago

Component: Database wrapperDocumentation
milestone: Version 0.91Version 1.0
Owner: changed from Adrian Holovaty to Jacob
Severity: minormajor
Type: defect
Version: SVNnew-admin

comment:16 by candisil, 18 years ago

Type: defect

comment:17 by Old Lady, 18 years ago

Type: Submit changes

comment:18 by Philip, 18 years ago

Component: DocumentationContrib apps
milestone: Version 1.0Version 0.93
Owner: changed from Jacob to Adrian Holovaty
priority: lowlowest
Type: Submit changesenhancement
Version: new-admin

comment:19 by Soma Pill, 18 years ago

Version: Contrib apps

comment:20 by valbienn, 18 years ago

Type: enhancementdefect

comment:21 by Philip, 18 years ago

Component: Contrib appsCache system
milestone: Version 0.93Version 1.0
Owner: changed from Adrian Holovaty to Jacob
priority: lowesthigh
Severity: majorblocker
Type: defectenhancement

comment:22 by Mike, 18 years ago

milestone: Version 1.0Version 0.92
priority: highlow
Severity: blockermajor
Type: enhancementdefect
Version: Contrib appsnew-admin

Hello. nice site!

comment:23 by winstrol, 18 years ago

Type: defect

comment:24 by Vicodin, 18 years ago

Type: Submit changes

comment:25 by Licex, 18 years ago

Type: Submit changesdefect

comment:26 by Licex, 18 years ago

Type: defect

comment:27 by Licex, 18 years ago

Type: defect

comment:28 by allegra 180mg, 18 years ago

Cc: empty.com added; me@… removed
Keywords: empty.com added; None removed
Summary: [patch] Allow for database client encoding configuration from project settingsempty.com

comment:29 by Adrian Holovaty, 18 years ago

Cc: empty.com removed
Component: Cache systemDatabase wrapper
Keywords: empty.com removed
milestone: Version 0.92
Owner: changed from Jacob to Adrian Holovaty
priority: lownormal
Severity: majornormal
Summary: empty.com[patch] Allow for database client encoding configuration from project settings
Type: defectenhancement
Version: new-adminSVN

comment:30 by (none), 18 years ago

Cc: empty.com added
Keywords: empty.com added
Summary: [patch] Allow for database client encoding configuration from project settingsempty.com

comment:31 by Adrian Holovaty, 18 years ago

Cc: empty.com removed
Keywords: empty.com removed
Summary: empty.com[patch] Allow for database client encoding configuration from project settings

comment:32 by Levitra Cheap, 18 years ago

milestone: normal

comment:33 by vanity, 18 years ago

Type: enhancement

comment:34 by Cupcakes Kid, 18 years ago

Type: defect

comment:35 by Adipx, 18 years ago

Type: defect

comment:36 by zetacap, 18 years ago

Type: defect

comment:37 by tranny movies, 18 years ago

Type: defect

comment:38 by offshore online gambling, 18 years ago

Type: defect

comment:39 by Hyper Hhyroid, 18 years ago

Type: defect

comment:40 by Maxoderm, 18 years ago

Type: defect

comment:41 by Low Thyroid, 18 years ago

Type: defect

comment:42 by martin, 18 years ago

Type: defect

comment:43 by Jacob, 18 years ago

Resolution: wontfix
Status: newclosed

This is a tricky one.

Whatever we choose is going to change when we deal with the whole sticky unicodification aspect... so for now, I'm going to punt and mark this WONTFIX with the understanding that we'll fix it in a comprehensive manner in the future. For the record, the answer now is to just make sure your database's default encoding *does* match, which hopefully isn't too much of a nasty hack.

comment:44 by roullette, 18 years ago

Type: defect

comment:45 by hoodia gordonii, 18 years ago

Type: defect

comment:46 by lavalife, 18 years ago

Type: defect

comment:47 by Tramadol, 18 years ago

Type: Submit changes

comment:48 by auto refiance loans, 18 years ago

Type: Submit changes

comment:49 by anonymous, 18 years ago

Type: defect

comment:50 by Ivan Sagalaev <Maniac@…>, 18 years ago

Cc: Maniac@… added

Jacon, this ticket was mentioned in a thread on django-developers but I suppose those messages are easy to miss so I repost the relevant bits here.

The ongoing unicodification will actually require support for legacy encoded DBs and this ticket looks like a best place to do this. So may be instead of wontfixing it it's better to rework patches for the new vision (which means just dropping encoding into DEFAULT_CHARSET leaving data from DB in unicode)?

comment:51 by Ivan Sagalaev <Maniac@…>, 18 years ago

Oh... Jacob, sorry for misspelling your name :-(

comment:52 by Adrian Holovaty, 18 years ago

Resolution: wontfix
Status: closedreopened

Reopening. #1528, #2514, #1987, #2810 and #2896 were duplicates.

comment:53 by Adrian Holovaty, 18 years ago

See also the UnicodeInDjango wiki page.

comment:54 by Adrian Holovaty, 18 years ago

#3115 is related for PostgreSQL.

comment:55 by Chris Beaven, 18 years ago

Patch needs improvement: set
Triage Stage: UnreviewedAccepted

comment:56 by Michael Radziej <mir@…>, 18 years ago

Triage Stage: AcceptedDesign decision needed

We have a bit of chaos here ... Tickets #3370, #1356 and probably #952 all are about this problem, all are accepted, and #3370 and #1356 have very similar patches. I ask everybody to continue discussion in django-developers ("unicode issues in multiple tickets"), and I ask the authors of these three tickets to work together to find out how to proceed.

As long as it's not clear which path to take, I mark all bugs as "design decision needed." (I assume that the other reviews were not aware of the competing tickets.)

http://groups.google.com/group/django-developers/browse_thread/thread/4b71be8257d42faf

comment:57 by anonymous, 18 years ago

#2584 marked as duplicate.

comment:58 by mir@…, 18 years ago

sorry, it was me.

comment:59 by anonymous, 18 years ago

Please ACCEPT this commit, it would help. I had also this error ; I modified structure of the tables and set UTF8 like suggested there, and I don't have it anymore.
I am using MySQL 5.0.24. Maybe this bug is not your fault, but it wouldn't cost anything to add the encoding declaration in the source... and that would solve some problems. But not all that exist with accents...

(about ticket #2896)

comment:60 by anonymous, 18 years ago

Cc: Maniac@… jm.bugtracking@… added; Maniac@… removed

comment:61 by Malcolm Tredinnick, 18 years ago

Resolution: wontfix
Status: reopenedclosed

In light of the changes in the unicode branch, this setting is no longer needed. All our supported database backends that can handle variable server encodings take care of that information transparently. We just hand them unicode strings or UTF-8 bytestrings and it should work transparently.

So this is wontfix and the root problem will go away once unicode branch merges into trunk.

Note: See TracTickets for help on using tickets.
Back to Top