Opened 9 years ago

Last modified 8 months ago

#25762 new Cleanup/optimization

Optimize numberformat.format

Reported by: Jaap Roes Owned by:
Component: Utilities Version: dev
Severity: Normal Keywords:
Cc: Johannes Maron, Sarah Boyce, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

numberformat.format is used a lot by Django and in most cases it's called with a number. Yet the current implementation is pretty much built for strings.

I've refactored the function to be up to 4x faster on my machine (and in the worst case it performs roughly the same). Attached is the benchmark script I've used.

Attachments (1)

bench_numberformat.py (2.6 KB ) - added by Jaap Roes 9 years ago.
Benchmark for numberformat.format

Download all attachments as: .zip

Change History (17)

by Jaap Roes, 9 years ago

Attachment: bench_numberformat.py added

Benchmark for numberformat.format

comment:1 by Tim Graham, 9 years ago

Triage Stage: UnreviewedAccepted

I'll mention djangobench in case you want to adapt any of your testing scripts so they can live on.

comment:2 by Jaap Roes, 9 years ago

Ah nice, I'll look into adding some benchmarks to djangobench

comment:3 by Jaap Roes, 9 years ago

By the way, while refactoring this code I was wondering if supporting NUMBER_GROUPING other than 3 is even useful. None of the locales shipped with Django define anything other than 3, and the other related configuration options (THOUSAND_SEPARATOR and USE_THOUSAND_SEPARATOR) all refer to grouping by 3 anyway.

It would also be nice if number formatting on strings was deprecated. It seems that Django already tries to call it with numbers in the places it's used, and there's no real check if the strings that are passed into the function are even number like.

comment:4 by Tim Graham, 9 years ago

Patch needs improvement: set

Left comments for improvement on the pull request. Maybe you'd like to raise the other questions on the DevelopersMailingList. I'm not sure about the answers myself.

comment:5 by Tim McCurrach, 5 years ago

Owner: changed from nobody to Tim McCurrach
Status: newassigned

comment:6 by Tim McCurrach, 5 years ago

The approach used in the linked PR (https://github.com/django/django/pull/5668/files) actually has a problem. The following approach is used to avoid the rounding behaviour of the builtin format function:

                # If `number` is 10.888 and `decimal_pos` is 2 return 10.88 not 10.89.
                format_string = '.%sf' % (decimal_pos + 1)

and the string is then truncated to decimal_pos decimals places. However, for a number such as 10.999, this will still be rounded 11.00. This is inconsistent with the behaviour of the rest of the function, which truncates and doesn't round.

We therefore can't use the bultin format to do any kind of decimal formatting as there doesn't appear (as it currently stands) to be an option to truncate with it, however the speed advantages it offers can still be used for the common case of seperating a number into thousands.

Using the test cases in bench_number_format.py linked above, an improved function performs anywhere between 2x and 5x faster for most cases, and again in the worst case is roughly the same (fluctuating between +2% and -2% compared with the original function).

I'll add a new PR.

comment:7 by Johannes Maron, 5 years ago

Cc: Johannes Maron added

comment:8 by Johannes Maron, 4 years ago

Patch needs improvement: unset

comment:9 by Johannes Maron, 4 years ago

Needs tests: set
Patch needs improvement: set

comment:10 by Jacob Walls, 4 years ago

New PR

comment:11 by Jacob Walls, 4 years ago

Needs tests: unset

comment:12 by Jacob Walls, 4 years ago

Patch needs improvement: unset

comment:13 by Mariusz Felisiak, 3 years ago

Patch needs improvement: set

comment:14 by Tim McCurrach, 18 months ago

Owner: Tim McCurrach removed
Status: assignednew

comment:15 by Sarah Boyce, 16 months ago

Cc: Sarah Boyce added

comment:16 by Ülgen Sarıkavak, 8 months ago

Cc: Ülgen Sarıkavak added
Note: See TracTickets for help on using tickets.
Back to Top