Opened 14 years ago

Closed 4 years ago

#14317 closed Bug (fixed)

numberformat.format produces wrong results

Reported by: Anssi Kääriäinen Owned by: Bartosz Grabski
Component: Internationalization Version: dev
Severity: Normal Keywords: Localization, number formatting
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The problem:

>>> from django.utils.numberformat import format
>>> print format(0.000000001, ',', 2)
1e-10,2

It can be fixed with using '%.2f' % number (the format string can be constructed dynamically), and then replacing the decimal separator and doing the grouping if needed. As a bonus, it is faster that way.

Attachments (3)

ticket14317.diff (4.7 KB ) - added by Anssi Kääriäinen 14 years ago.
diff_14317_rvarshney.patch (2.9 KB ) - added by lukenio 12 years ago.
copy of https://github.com/rvarshney/django/commit/fda8ffc71d69eeee1721e5f595fc48e8abaab84c
ticket14317.2.diff (5.4 KB ) - added by Florian Apolloner 12 years ago.

Download all attachments as: .zip

Change History (20)

by Anssi Kääriäinen, 14 years ago

Attachment: ticket14317.diff added

comment:1 by Anssi Kääriäinen, 14 years ago

After this patch, the working of number_format is as follows:

  • if given too large number, will return it in exponent format
  • if given too small a number and no decimal_pos, will return it in exponent format
  • if given decimal_pos, will return the number correctly rounded

Before this patch:

  • if given too large number, would return it in exponent format (possibly mangled with thousand separator), it the number was float, else in correct format.
  • if given too small number and number_pos, would return the number in 1e-10,2, which is a bug
  • if given too small number and no number_pos, would return the number in exponent format (possibly mangled with thousand separator)
  • never did any rounding.

The main disadvantage of this patch is that large int, decimal and string numbers will be returned in exponent format, whereas before they were returned in correct format. Also, small numbers passed as strings when not using number_pos would be correctly returned. Now they are returned in exponent format.

So, all in all, the behavior is more consistent now, as any number passed in, no matter in what type, will be returned in same format. On the other hand, this means loss of correct formatting for some number types. If correct formatting for large numbers passed as strings or decimals is wanted, it gets a bit messy to achieve that and correct decimal_pos handling at the same time. It would at least require different formatting paths for strings and other types.

A simple fix for the bug described in the original report is to test if 'e' is in unicode(number) and if it is, return the e format directly...

Oh, BTW after this patch number formatting should be a bit faster (0% to almost 50% depending on what is being formatted). 1/3 of time is used in testing settings. (It seems a bit strange to me that number_format is testing any settings, but that is another matter.)

comment:2 by Anssi Kääriäinen, 14 years ago

Has patch: set
Needs documentation: set
Needs tests: set
Patch needs improvement: set
Triage Stage: UnreviewedDesign decision needed

The proposed patch above does not work correctly. Decimal(10100) is formatted in floating point precision.

Formatting every type passed in in a way that always results in correct formatting is messy, and will perform poorly. For example large float passed in needs to either be casted to decimal, and after that back to string, or the exponent form needs to be transformed directly to string, which is slow and error prone. To make things worse, the formatting will change depending on underlying C library (http://docs.python.org/library/functions.html#float).

So, to solve this ticket, one possibility is just check that if unicode(number) has e in it, return it directly. This way everything else will work, giving right result, except for large floats (will return 1.2e+100), small floats (will return 1.2e-100), or small decimals (again will return 1.2e-100). It is possible to fix the small numbers by using '%.(decimal_pos)f', but this will do rounding. This means that decimals and floats are rounded, while strings are not. Still, it is not wise to use '%.(decimal_pos)f' when handling large numbers (Decimal(10100) or large numbers passed in as strings) will return the number in floating point precision.

As said, it is possible to get almost any case to work if enough special casing is done. But it will require much code and will probably be much slower than the current code, at least for some cases.

I would say the best solution at least for now is to return the number in exponent format if that is what unicode(number) returns. Then document that this is how things work. This is backwards incompatible as the expectation is currently that any number is formatted in non-exponent format. However, this is not how things actually work. Still one possibility is to cast everything as float and if the result has e in it, return it directly. After that continue as now. This way number formatting is most consistent...

Some code to show the problems:

from decimal import Decimal as d
>>> str(10000000000000000000000000000000000000.0)
'1e+37' # Wrong result.
>>> str(10000000000000000000000000000000000000)
'10000000000000000000000000000000000000' # ints work correctly
>>> str(d('10000000000000000000000000000000000000.000000000001'))
'10000000000000000000000000000000000000.000000000001' # as do decimals
>>> '%.2f' % d('10000000000000000000000000000000000000.000000000001')
'9999999999999999538762658202121142272.00' # formatting large decimals using '%.nf' is bad idea.
>>> str(d('0.00000000000000000000000000001'))
'1E-29' # small decimals do not work correctly
>>> '%.2f' % d('0.00000000000000000000000000001')
'0.00' # for small decimals this is a nice way
>>> '%.2f' % d('0.666')
'0.67' # and we get rounding
>>> str(d('0.666'))
'0.666' # but if we want convert this string to 2 decimal_pos format, it will be messy with rounding
>>> float('inf')
'inf' # just a reminder that there are still more possibilities...

Still one more thing: when considering alternatives, it is good to remember that this is very performance sensitive area. After all the standard template rendering benchmark is to render large table of numbers, and for a reason too.

comment:3 by _roman, 14 years ago

I can’t duplicate this error. I've used Django vr.1.3 and vr.1.2 to achieve the same error, but in all cases I’ve drawn correct results.

>>> from django.utils.numberformat import format
>>> print format(0.000000001, ',', 2)
1e-10,00

comment:4 by Julien Phalip, 14 years ago

Severity: Normal
Type: Bug

comment:5 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:6 by Aymeric Augustin, 13 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:7 by Anssi Kääriäinen, 13 years ago

Easy pickings: set
Triage Stage: Design decision neededAccepted

I am marking this accepted as there is a definite bug:

In [1]: from django.utils.numberformat import format

In [2]: format(0.00000000000099, ',', 2)
Out[2]: u'9,9e'

An idea for solving this ticket without too much complications: large numbers are returned in exponent format if str(number) contains 'e' in it - this is, floats get returned in exponent format, other numbers in decimal format. Small numbers are truncated to the decimal places (rounding happens except for strings). So, you would get something like this

> format('0.666', ',', 2)
OUT: 0,66
> format(0.666, ',', 2)
OUT: 0,67
> format(decimal(1e37), ',', 2)
OUT: 10000000000000000000000000000000000000,00
> format(float(1e37), ',', 2)
OUT: 1e37
> format(float(1e-37), ',', 2)
OUT: 0,00
> format(decimal(1e-37), ',', 2)
OUT: 0,00

I think the above might be straightforward to code and would get rid of the worst inconsistencies. Comments?

If somebody wants a somewhat easy ticket to work on, I am willing to let go of my ownership of this ticket...

comment:8 by anonymous, 12 years ago

Owner: changed from Anssi Kääriäinen to anonymous

I will let go of this ticket, I currently have no time to work on this issue.

comment:9 by varshney.ruchi@…, 12 years ago

Owner: changed from anonymous to rvarshney

I have created a pull request for this here. This pull request does not treat '0.666' and 0.666 differently. i.e. they are both formatted as 0.67. Please review and let me know.
Thanks!

https://github.com/django/django/pull/348

comment:10 by lukenio, 12 years ago

Owner: changed from rvarshney to anonymous
Status: newassigned

comment:11 by lukenio, 12 years ago

Version: 1.21.5

comment:12 by lukenio, 12 years ago

Owner: changed from anonymous to lukenio

by Florian Apolloner, 12 years ago

Attachment: ticket14317.2.diff added

comment:13 by Florian Apolloner, 12 years ago

Attached a mergeable version (including the tests) from the PR.

comment:14 by Tim Graham, 11 years ago

Easy pickings: unset
Needs tests: unset

Haven't reviewed the patch in detail, but the tests contain the u'' prefix on some strings which is a syntax error on Python 3.2. It also no longer applies cleanly.

comment:15 by Bartosz Grabski, 6 years ago

Owner: changed from lukenio to Bartosz Grabski

comment:16 by Bartosz Grabski, 6 years ago

Version: 1.5master

comment:17 by Jacob Walls, 4 years ago

Needs documentation: unset
Patch needs improvement: unset
Resolution: fixed
Status: assignedclosed

Cannot reproduce the original bug report in master. I believe two commits addressed this:

Note: See TracTickets for help on using tickets.
Back to Top