Opened 11 years ago

Closed 10 years ago

#22171 closed Cleanup/optimization (fixed)

sanitize_separators throws away too many thousand separators

Reported by: Klaas van Schelven Owned by: nobody
Component: Internationalization Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Users in countries that use symbols other than the period (e.g. comma) as a decimal separator, may still occasionally use the period as a decimal separator. (Because they've come to expect that computers want input with a period). If the period coincides with the THOUSANDS_SEPARATOR, and USE_THOUSANDS_SEPARATOR is turned on, this results in the decimal separator basically being thrown out, but still with a validated result.

Example shell output (THOUSANDS_SEPARATOR == '.'; USE_THOUSANDS_SEPARATOR = True)):

>>> from django.utils.translation import activate
>>> activate('nl-nl')
>>> 
>>> class F(forms.Form):
...   f = forms.DecimalField(max_digits=15, decimal_places=2)
... 
>>> f = F({'f': '10.10'})
>>> print f.is_valid()
True
>>> print f.cleaned_data['f']
1010

Why does this happen? django/utils/formats.py has a function sanitize_separators, that basically tosses all thousands separators even if they do not actually separate thousands

My solution would be to only toss thousands separators which have at least 3 digits on their right hand side.

Admittedly this is not a full solution: the value 100.000 is still ambiguous if we assume that users may mean either "thousands separator" or "decimal separator" with the period. But at least it does not silently convert 10.00 (which doesn't mean 1000 to anyone) into 1000. This solution also happens to cover all scenarios where cents are involved, which

A better solution would not validate "10.00" at all (since it does not contain either a decimal separator nor a thousands separator separating thousands). However, the current interface of sanitize_operators does not allow for it: it attempts to sanitize towards a string w/ period, and does not allow us to complain if the input is a string w/ period, but we wouldn't want to allow that as a valid input.

The/a solution for Django 1.4 is attached. I imagine it may be a bit more hairy in the up-to-date version, as this supports full unicode for the input, and I'm not entirely sure on the interaction of regex & unicode.

I've reproduced this in 1.4, but can see in the repo that as of today the code still experiences this problem.

Attachments (2)

thousands.diff (817 bytes ) - added by Klaas van Schelven 11 years ago.
22171-2.diff (1.8 KB ) - added by Claude Paroz 11 years ago.
Alternate patch

Download all attachments as: .zip

Change History (11)

by Klaas van Schelven, 11 years ago

Attachment: thousands.diff added

comment:1 by Klaas van Schelven, 11 years ago

... I cannot edit ...

Finishing the sentence above:

This solution also happens to cover all scenarios where cents are involved, which happens to be an important subclass of dealing with Decimals (Money!)

comment:2 by Claude Paroz, 11 years ago

Strange, I cannot reproduce your shell example. However, I can see that 12.210.10 is cleaned to 12.210.10, so no cleaning happens in that case. That last issue could be fixed by:

         if decimal_separator in value:
-            value, decimals = value.split(decimal_separator, 1)
+            value, decimals = value.rsplit(decimal_separator, 1)
             parts.append(decimals)

Can you check once more?

comment:3 by wim@…, 11 years ago

Hi Klaas,

I tried to reproduce the error and failed (Django 1.4.10 and Django 1.4.3). Although I recall having encountered such a weird situation before. Maybe my os was set to Dutch then?

Good idea to raise an error when invalid input has been entered.

comment:4 by Klaas van Schelven, 11 years ago

I forgot to include localize=True in my code example. A more complete shell example (that could form the basis for a unit test) is this one:

from django.conf import settings
from django.utils.translation import activate, get_language
from django import forms
from django.utils.formats import sanitize_separators

old_THOUSAND_SEPARATOR = settings.THOUSAND_SEPARATOR
old_USE_THOUSAND_SEPARATOR = settings.USE_THOUSAND_SEPARATOR
old_language = get_language()

settings.THOUSAND_SEPARATOR == '.'
settings.USE_THOUSAND_SEPARATOR = True

activate('nl-nl')

class F(forms.Form):
    f = forms.DecimalField(max_digits=15, decimal_places=2, localize=True)

f = F({'f': '10.10'})
f.is_valid() # True

assert f.cleaned_data['f'] != 1010
assert sanitize_separators('10.10') != '1010'

settings.THOUSAND_SEPARATOR = old_THOUSAND_SEPARATOR
settings.USE_THOUSAND_SEPARATOR = old_USE_THOUSAND_SEPARATOR
activate(old_language)

Both assertions fail for me on an out-of-the box installation of Django 1.6

comment:5 by Claude Paroz, 11 years ago

Component: UncategorizedInternationalization
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: 1.4master

Sorry, I misread part of the original summary.
So basically, with DECIMAL_SEPARATOR = ',' and THOUSAND_SEPARATOR = '.', you'd like that 10.10 still return 10.10 because the thousand separator is not in the right position. I see your point, but would like to limit this special casing only when THOUSAND_SEPARATOR == '.', because in some countries grouping is not necessarily by 3 (India, China), but for them the thousand separator is not the dot.

by Claude Paroz, 11 years ago

Attachment: 22171-2.diff added

Alternate patch

comment:6 by Claude Paroz, 11 years ago

Has patch: set

in reply to:  5 comment:7 by Klaas van Schelven, 11 years ago

Replying to claudep:

you'd like that 10.10 still return 10.10 because the thousand separator is not in the right position.

Strictly speaking a validation error would be even more correct, but I'd certainly consider returning 10.10 a major improvement over returning 1010.

comment:8 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

Claude, code looks fine to me if you think your approach is best. SQLite tests pass on Python 2.7 and 3.2.

comment:9 by Claude Paroz <claude@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 132d0e516e5881cb3f6445e9d84e01c9f2123254:

Fixed #22171 -- Improved sanitize_separators cleverness

Thanks Klaas van Schelven for the report and Tim Graham for the
review.

Note: See TracTickets for help on using tickets.
Back to Top