Opened 14 years ago
Closed 4 years ago
#14317 closed Bug (fixed)
numberformat.format produces wrong results
Reported by: | Anssi Kääriäinen | Owned by: | Bartosz Grabski |
---|---|---|---|
Component: | Internationalization | Version: | dev |
Severity: | Normal | Keywords: | Localization, number formatting |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The problem:
>>> from django.utils.numberformat import format >>> print format(0.000000001, ',', 2) 1e-10,2
It can be fixed with using '%.2f' % number (the format string can be constructed dynamically), and then replacing the decimal separator and doing the grouping if needed. As a bonus, it is faster that way.
Attachments (3)
Change History (20)
by , 14 years ago
Attachment: | ticket14317.diff added |
---|
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
Needs tests: | set |
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Design decision needed |
The proposed patch above does not work correctly. Decimal(10100) is formatted in floating point precision.
Formatting every type passed in in a way that always results in correct formatting is messy, and will perform poorly. For example large float passed in needs to either be casted to decimal, and after that back to string, or the exponent form needs to be transformed directly to string, which is slow and error prone. To make things worse, the formatting will change depending on underlying C library (http://docs.python.org/library/functions.html#float).
So, to solve this ticket, one possibility is just check that if unicode(number) has e in it, return it directly. This way everything else will work, giving right result, except for large floats (will return 1.2e+100), small floats (will return 1.2e-100), or small decimals (again will return 1.2e-100). It is possible to fix the small numbers by using '%.(decimal_pos)f', but this will do rounding. This means that decimals and floats are rounded, while strings are not. Still, it is not wise to use '%.(decimal_pos)f' when handling large numbers (Decimal(10100) or large numbers passed in as strings) will return the number in floating point precision.
As said, it is possible to get almost any case to work if enough special casing is done. But it will require much code and will probably be much slower than the current code, at least for some cases.
I would say the best solution at least for now is to return the number in exponent format if that is what unicode(number) returns. Then document that this is how things work. This is backwards incompatible as the expectation is currently that any number is formatted in non-exponent format. However, this is not how things actually work. Still one possibility is to cast everything as float and if the result has e in it, return it directly. After that continue as now. This way number formatting is most consistent...
Some code to show the problems:
from decimal import Decimal as d >>> str(10000000000000000000000000000000000000.0) '1e+37' # Wrong result. >>> str(10000000000000000000000000000000000000) '10000000000000000000000000000000000000' # ints work correctly >>> str(d('10000000000000000000000000000000000000.000000000001')) '10000000000000000000000000000000000000.000000000001' # as do decimals >>> '%.2f' % d('10000000000000000000000000000000000000.000000000001') '9999999999999999538762658202121142272.00' # formatting large decimals using '%.nf' is bad idea. >>> str(d('0.00000000000000000000000000001')) '1E-29' # small decimals do not work correctly >>> '%.2f' % d('0.00000000000000000000000000001') '0.00' # for small decimals this is a nice way >>> '%.2f' % d('0.666') '0.67' # and we get rounding >>> str(d('0.666')) '0.666' # but if we want convert this string to 2 decimal_pos format, it will be messy with rounding >>> float('inf') 'inf' # just a reminder that there are still more possibilities...
Still one more thing: when considering alternatives, it is good to remember that this is very performance sensitive area. After all the standard template rendering benchmark is to render large table of numbers, and for a reason too.
comment:3 by , 14 years ago
I can’t duplicate this error. I've used Django vr.1.3 and vr.1.2 to achieve the same error, but in all cases I’ve drawn correct results.
>>> from django.utils.numberformat import format >>> print format(0.000000001, ',', 2) 1e-10,00
comment:4 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:7 by , 13 years ago
Easy pickings: | set |
---|---|
Triage Stage: | Design decision needed → Accepted |
I am marking this accepted as there is a definite bug:
In [1]: from django.utils.numberformat import format In [2]: format(0.00000000000099, ',', 2) Out[2]: u'9,9e'
An idea for solving this ticket without too much complications: large numbers are returned in exponent format if str(number) contains 'e' in it - this is, floats get returned in exponent format, other numbers in decimal format. Small numbers are truncated to the decimal places (rounding happens except for strings). So, you would get something like this
> format('0.666', ',', 2) OUT: 0,66 > format(0.666, ',', 2) OUT: 0,67 > format(decimal(1e37), ',', 2) OUT: 10000000000000000000000000000000000000,00 > format(float(1e37), ',', 2) OUT: 1e37 > format(float(1e-37), ',', 2) OUT: 0,00 > format(decimal(1e-37), ',', 2) OUT: 0,00
I think the above might be straightforward to code and would get rid of the worst inconsistencies. Comments?
If somebody wants a somewhat easy ticket to work on, I am willing to let go of my ownership of this ticket...
comment:8 by , 12 years ago
Owner: | changed from | to
---|
I will let go of this ticket, I currently have no time to work on this issue.
comment:9 by , 12 years ago
Owner: | changed from | to
---|
I have created a pull request for this here. This pull request does not treat '0.666' and 0.666 differently. i.e. they are both formatted as 0.67. Please review and let me know.
Thanks!
comment:10 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:11 by , 12 years ago
Version: | 1.2 → 1.5 |
---|
comment:12 by , 12 years ago
Owner: | changed from | to
---|
by , 12 years ago
Attachment: | diff_14317_rvarshney.patch added |
---|
by , 12 years ago
Attachment: | ticket14317.2.diff added |
---|
comment:14 by , 11 years ago
Easy pickings: | unset |
---|---|
Needs tests: | unset |
Haven't reviewed the patch in detail, but the tests contain the u''
prefix on some strings which is a syntax error on Python 3.2. It also no longer applies cleanly.
comment:15 by , 6 years ago
Owner: | changed from | to
---|
comment:16 by , 6 years ago
Version: | 1.5 → master |
---|
comment:17 by , 4 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
Resolution: | → fixed |
Status: | assigned → closed |
Cannot reproduce the original bug report in master. I believe two commits addressed this:
- the security patch 9cc6a60040b0f64f8ea066dd215176d4bd16621d
- the followup optimization for decimals smaller than (i.e. more digits than) 1e-200 in 614250d90c011fffb63e6cc460aa881a206b8d53
After this patch, the working of number_format is as follows:
Before this patch:
The main disadvantage of this patch is that large int, decimal and string numbers will be returned in exponent format, whereas before they were returned in correct format. Also, small numbers passed as strings when not using number_pos would be correctly returned. Now they are returned in exponent format.
So, all in all, the behavior is more consistent now, as any number passed in, no matter in what type, will be returned in same format. On the other hand, this means loss of correct formatting for some number types. If correct formatting for large numbers passed as strings or decimals is wanted, it gets a bit messy to achieve that and correct decimal_pos handling at the same time. It would at least require different formatting paths for strings and other types.
A simple fix for the bug described in the original report is to test if 'e' is in unicode(number) and if it is, return the e format directly...
Oh, BTW after this patch number formatting should be a bit faster (0% to almost 50% depending on what is being formatted). 1/3 of time is used in testing settings. (It seems a bit strange to me that number_format is testing any settings, but that is another matter.)