#924 closed defect (duplicate)
[patch] String filters (lower, upper, capfirst etc.) don't work with international strings
Reported by: | Owned by: | hugo | |
---|---|---|---|
Component: | contrib.admin | Version: | |
Severity: | normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
String filters operate on 8-bit strigns but Python's useful string functions only work with international characters in unicode strings. UTF8ed strings are not changed.
Patch follows.
This, however, will break the patch in ticket 393 where it assumes a string passed to a filter to be 8-bit str. But I beleive this whole thing should be unicode.
Attachments (5)
Change History (24)
by , 19 years ago
Attachment: | 924.1.diff added |
---|
comment:1 by , 19 years ago
Actually this will break much more, I think. We should decide on wether Django should use byte strings or should use unicode strings internally (I would really prefer to change to unicode-only before 1.0, because that's a much saner environment internally than the current utf-8 bytestrings) and then do it completely for the whole system. With your patch programmers would have to keep in mind where there code is running - is it in a filter, it sees unicode, is it outside, it sees utf-8 strings.
So for the filters, I would say that if a filter needs the unicode handling for it's string function to work correctly, it should itself convert to unicode and convert back - at least for so long as we keep Django running on utf-8 bytestrings.
comment:2 by , 19 years ago
I can see that it would break something only if people make custom filters that do explicit str(). But it will be broken anyway in the long run as you suggest unicodisation. So why not earlier? Also I think that such thing can (and should) be made step-by-step instead of making it in the form of a huge patch, but this is a personal preference :-)
Considering you proposal to change specific functions... This means that this rather badly readable conversion will be in 16 places.
comment:3 by , 19 years ago
BTW... I should confess that I never run any test because I just don't know how :-(. Could someone point me to a doc or describe shortly how to run Django's tests?
comment:4 by , 19 years ago
The breakage is somewhere else: you will call code in your filter code. Not all filters are necessarily just using local code. If you call code from outside, that code - if it is Django code - will currently assume that strings are utf-8 encoded bytestrings. But your filters will handle unicode strings, so you will either have to do conversion yourself (so you could have done them in the first place already instead of changing the filter interface) or you will pass them on unchanged and then code outside will break.
A -1 from me on this change. Unicode vs. utf-8 needs to be handled consistently and the current - mostly consistent - way is to have utf-8 bytestrings and to convert to unicode yourself. It's not as if every filter _requires_ unicode strings, it's only those that use functions that themselves are unicode aware, so in the current situation it would be "the right thing to do"[tm] to add the conversions and back-conversions to exactly those filters.
The other thing - possible unicodeization - would first need a round of discussion on the list and quite some talking over, because it's a major step. And if it is done, it should be done _complete_. But that's most definitely outside the scope of this ticket ;-)
Running tests: you need a settings file that has the database set (and it should be a specific testing database, as tables are created content added and removed) and then you can go to django_src/tests and do ./runtests with that settings file active.
comment:5 by , 19 years ago
I got your point. My thought were that if a filter accepts a str and uses str's methods then it will be happy with unicode also since these types are very similar by interface. But I haven't really looked if some filter does something specific to str...
Ok. I'll redo the patch with a decorator converting from and to DEFAULT_CHARSET and hook it to every string function that needs it.
comment:6 by , 19 years ago
Ouch... I really should have looked more closely before. There are plenty of explicit str(value) all over the filters code. Then it won't be that fast and simple :-)
comment:7 by , 19 years ago
BTW, string functions do everywhere str(value) whereas it would be better, imho, to do repr(value). That way one could use string filters on model instances in templates. Is there a reason for str() here?
comment:8 by , 19 years ago
Maniac, see #710; models are supposed to use __str__
for their string values now, and __repr__
goes back to being a proper representation useful for, e.g., debugging. (The old way will still work, since str()
will just grab the value of __repr__
if there is no __str__
.)
comment:9 by , 19 years ago
Here's the patch as discussed with Hugo.
Just one glitch. There is a filter 'title' which uses regexp ([a-z])'([A-Z]). Does anyone knows if it can be internationalised? I mean, are there general 'capital letter' and 'small letter' in Python regexp?
comment:10 by , 19 years ago
Forgot to add... The patch is real and working. Mentioned glitch is not breaking anything it just doesn't work for non-ascii as it never was.
comment:11 by , 19 years ago
Summary: | String filters (lower, upper, capfirst etc.) don't work with international strings → [patch] String filters (lower, upper, capfirst etc.) don't work with international strings |
---|
Just read the FAQ. Adding [patch] to summary. Sorry for spamming :-(
comment:12 by , 19 years ago
Uhm... Not wanting to be annoying, just need to clarify... This ticket was inactive for some time, is it just lack of time or there's something broken in my patch that should be fixed?
comment:13 by , 19 years ago
Owner: | changed from | to
---|
by , 19 years ago
Attachment: | 924.3.diff added |
---|
Updated to current trunk (was broken by fixing #1110)
comment:14 by , 19 years ago
There is an open discussion on wether we switch to full unicode - so this patch won't be needed. Either this or a full unicodefication should go in.
comment:15 by , 19 years ago
Hugo, you said in this very ticket earlier that unicodization is out of scope of it. And this it's true because that thing is huge. Then I made this small patch that completely solves one specific problem that exists today (people already do step on it as seen in django-users). Commiting it won't hurt anyone and will make the world a little better until the bright unicode future comes. Isn't it? :-)
comment:16 by , 19 years ago
Actually unicodefication is out of the scope of this _ticket_, but not so huge as not to do it :-)
At the moment unicodefication is under analysis - if the result is that we will do it, it will definitely be something in the near future. So it wouldn't make sense to apply this to later rip it out again. If the result is to not do it, I am all for adding this as an alternative. But please let us first analyse wether to do unicodefication or not, and if doing it, on the time frame in which to do it.
comment:17 by , 19 years ago
Unicodification is a Good Thing and even Python-3000 is going to make all strings unicode. Better do it now guys, or you are risking to get buried in bug reports some day. =)
comment:18 by , 18 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Closing in favor of #2489.
Patch: operate on unicode strings