#3976 closed (wontfix)
[unicode] Unicode models
Reported by: | Owned by: | Adrian Holovaty | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Keywords: | ||
Cc: | Malcolm Tredinnick, Maniac@… | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Models has to output their data from __unicode__ method, not __str__ as it is used now. I have a working code that does roughly this: changes current __str__ to __unicode__ and defines a __str__ as unicode(self).encode(settings.DEFAULT_CHARSET).
The big part is documenting the change: users should use __unicode__ instead of __str__ for their model classes. If there's no principal objections I could make a patch.
Change History (4)
comment:1 by , 18 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 18 years ago
I think this is creating unnecessary work for the developer when it won't be required in a number of cases.
Only for data that is known to be in ASCII. But for more interesting data I'd say in most cases some kind of attention is required because people use model fields in __str__
and these fields now changed their type from str to unicode. Thus developers should anyway check all their __str__
's and:
- either replace it with
__unicode__
changing all '...' to u'...' inside - or add .encode('utf-8') to self.field_name's
We can encourage people (in docs) to use the first approach since it's better for consistency and for performance (slightly). And we can provide a default __str__
for models that looks like this:
def __str__(self): if hasattr(self, '__unicode__'): return unicode(self).encode('utf-8') # current __str__ implementation with class name
Leaving models with only __str__
asks for troubles since simple unicode(instance) will try to decode it without utf-8 and there's no guaranty that everyone and everywhere will use smart_unicode.
What do you think?
P.S. I agree on changing settings.DEFAULT_CHARSET to utf-8 everywhere except HTTP IO.
P.P.S. I don't know if it's better to discuss this in django-developers?
comment:3 by , 18 years ago
Their code is already broken if they are sending around random bytestrings. We won't make it any less broken by interpreting the str() strings as UTF-8 bytestrings. In fact, we will catch the errors faster. People will also possibly want to write {{{unicode_}} methods; that's a given. But we shouldn't force them if they don't want to. The developer knows (or should know) how their class behaves. We leave it up to them to make a decision.
I understand the change you want to make (it's exactly the same as the StrAndUnicode class, for example), but I don't think it's a good idea/necessary to do it by default.
comment:4 by , 18 years ago
Ok, agreed on not prematurely fixing current __str__
's.
People will also possibly want to write {{{unicode}} methods
Yes. But they will want to use str(obj)
somewhere and it won't automatically pick up their unicode. This is why I'm suggesting an addition in current Model.str to pick it up and convert using smart_unicode. It won't break anything and is useful.
I think this is creating unnecessary work for the developer when it won't be required in a number of cases.
Firstly, models can quite happily use the str() method. It just has to return UTF-8 bytestrings. So many models will not need any changes at all. It's not really practical to say that we are only going to handle unicode strings. We also have to be able to handle bytestrings -- we just have to have a convention about the encoding of those bytestrings and for consistency with the rest of Python, we'll use UTF-8.
Secondly, encoding to settings.DEFAULT_CHARSET is wrong (it's something Django does a lot at the moment, but I'm slowly burying all of them in the unicode branch), because that will return a bytestring that is not necessarily UTF-8 and you will have no way of knowing its encoding. So we would have inconsistent internal bytestring encoding. The key phrase in the unicode branch is consistency. :-)
Remember that
Model.__str__
is not only used for template output. It can also be used in logs or other forms of output that is not necessarily controlled by Django. We should only convert to DEFAULT_CHARSET at the last minute (e.g. Template.render()).We do need to document a couple of coding practices and assumptions (e.g. all bytestrings passed to Django are assumed to be in UTF-8). However, it shouldn't be too painful.
I'm going to close this as wontfix, because I think the key point is not quite right. I'm keeping a list of doc updates required (it's only two points at the moment) and will update the Wiki page shortly so that we don't lose them.