Context Navigation

← Previous Ticket
Next Ticket →

#21573 closed Cleanup/optimization (fixed)

Cache regular expression for normalizing newlines in django.utils.text.normalize_newlines

Reported by:	Vajrasky Kok	Owned by:	Vajrasky Kok
Component:	Utilities	Version:	dev
Severity:	Normal	Keywords:	performance
Cc:	sky.kok@…	Triage Stage:	Ready for checkin
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	yes	UI/UX:	no

Description

This is the method:

def normalize_newlines(text):
    return force_text(re.sub(r'\r\n|\r|\n', '\n', text))
normalize_newlines = allow_lazy(normalize_newlines, six.text_type)

It should be:

_newlines_re = re.compile(r'\r\n|\r|\r')
def normalize_newlines(text):
    return force_text(_newlines_re.sub('\n', text))
normalize_newlines = allow_lazy(normalize_newlines, six.text_type)

Things to be considered:

Do we use this method often in Django? If not, caching may not be necessary.
From http://docs.python.org/2/library/re.html#re.compile: The compiled versions of the most recent patterns passed to re.match(), re.search() or re.compile() are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions.
The performance gain maybe negligible.

Change History (6)

comment:1 by Vajrasky Kok, 11 years ago

Cc:	sky.kok@… added
Has patch:	set
Owner:	changed from nobody to Vajrasky Kok
Status:	new → assigned

Here is the PR: https://github.com/django/django/pull/2046

The unit test for normalize_newlines can be found in #21572.

comment:2 by Baptiste Mispelon, 11 years ago

I ran a small benchmark [1] with four variants:

The original function
The same function but with a simpler regexp (r'\r\n|\r' instead of r'\r\n|\r|\n')
The original function with a compiled regexp
A function using str.replace instead of re.sub

I tried each variant on 3 different sizes of text: small (~20 lines), medium (~200 lines), and big (~2000 lines).

Here are the results (less is better):

	Small	Medium	Big
original	19.2 usec	141 usec	1.65 msec
simple_re	7.42 usec	14.8 usec	135 usec
compiled_re	6.15 usec	13.8 usec	146 usec
str_replace	5.29 usec	16.4 usec	130 usec

The str.replace looks like a clear winner to me so unless there are objections, I'll commit that one.

[1] https://gist.github.com/bmispelon/7841287

comment:3 by Baptiste Mispelon, 11 years ago

Triage Stage:	Unreviewed → Accepted

(Oops, forgot to change the status of the ticket)

comment:4 by Vajrasky Kok, 11 years ago

Objection. I got different result in my machine (32 bit) with Fedora Core 18.

On Python 3.4 (Yeah!!!!):

Function	Small	Medium	Big
original:	95 usec	625 usec	7.52 msec
simple_re:	33.6 usec	57 usec	281 usec
compiled_re:	27.2 usec	50.7 usec	273 usec
str_replace:	27.1 usec	51 usec	303 usec

But on Python 2.7, the compiled_re is still the winner:

Function	Small	Medium	Big
original:	34.3 usec	231 usec	2.48 msec
simple_re:	17.6 usec	49.6 usec	343 usec
compiled_re:	14.6 usec	46.5 usec	342 usec
str_replace:	14 usec	53.3 usec	414 usec

comment:5 by Baptiste Mispelon, 11 years ago

Triage Stage:	Accepted → Ready for checkin

Alright, let's go with compiled_re then (they're all in the same ballpark anyway).

comment:6 by Baptiste Mispelon <bmispelon@…>, 11 years ago

Resolution:	→ fixed
Status:	assigned → closed

In a7cf48a2b7878bf98879f2963ab9a7d0e6493d9a:

Fixed #21573 -- Improved performance of utils.text.normalize_newlines.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#21573 closed Cleanup/optimization (fixed)

Cache regular expression for normalizing newlines in django.utils.text.normalize_newlines

Description

Change History (6)

comment:1 by Vajrasky Kok, 11 years ago

comment:2 by Baptiste Mispelon, 11 years ago

comment:3 by Baptiste Mispelon, 11 years ago

comment:4 by Vajrasky Kok, 11 years ago

comment:5 by Baptiste Mispelon, 11 years ago

comment:6 by Baptiste Mispelon <bmispelon@…>, 11 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us