Opened 11 years ago

Last modified 9 months ago

#20757 new New feature

A more Object-Oriented URLResolver

Reported by: Flavio Curella Owned by:
Component: Core (URLs) Version: dev
Severity: Normal Keywords:
Cc: flavio.curella@…, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, the RegexURLResolver is populated using tuples.

Because of that, code that tries to inspect the resolver will end up containg awkward indexing like this (semplified code for clarity's sake):

def get_urlformat(urlname):
    """
    Given a URL name, returns the URL as a string suitable for string.format.

    Example::

    urlpatterns = patterns('',
        url(r'^extra/(?P<extra>\w+)/$', empty_view, name="named-url2"),
    )

    >>> get_urlformat('named-url2')
    'extra/%(extra)s/'
    """

    resolver = get_resolver()
    return resolver.reverse_dict[urlname][0][0][0]

My proposal is to replace tuples with a tuple-like object whose elements can be accessed by using attribute names. That way, the above method could become:

def get_urlformat(urlname):
    """
    Given a URL name, returns the URL as a string suitable for string.format.

    Example::

    urlpatterns = patterns('',
        url(r'^extra/(?P<extra>\w+)/$', empty_view, name="named-url2"),
    )

    >>> get_urlformat('named-url2')
    'extra/%(extra)s/'
    """

    resolver = get_resolver()
    urlbit = resolver.reverse_dict[urlname].urlbits[0]
    return urlbit.format

I realize this is mostly aesthetic, and there could be performance implications since the URLResolver is probably the most hit codepath, so I appreciate every kind of opinion.

The attached patch is still a draft, it definitely needs more extensive test coverage, but it's a start to get the idea.

Attachments (4)

patch_commit_44e7837a1706.patch (6.9 KB ) - added by Flavio Curella 11 years ago.
patch_commit_8a8468f33b16.patch (7.0 KB ) - added by Flavio Curella 11 years ago.
handle IndexError correctly
patch_commit_86a42e712a0e.patch (7.1 KB ) - added by Flavio Curella 11 years ago.
patch_commit_f8ce5686938b.patch (4.8 KB ) - added by Flavio Curella 11 years ago.
use namedtuples

Download all attachments as: .zip

Change History (14)

by Flavio Curella, 11 years ago

comment:1 by Flavio Curella, 11 years ago

Cc: flavio.curella@… added
Component: UncategorizedCore (URLs)
Type: UncategorizedNew feature

comment:2 by Flavio Curella, 11 years ago

Needs tests: set

comment:3 by Flavio Curella, 11 years ago

Patch needs improvement: set

by Flavio Curella, 11 years ago

handle IndexError correctly

by Flavio Curella, 11 years ago

comment:4 by Baptiste Mispelon, 11 years ago

Triage Stage: UnreviewedAccepted

Hi,

Python already has a namedtuple class in its collections module which is what you should be using for this case [1].
In theory, a namedtuple is a drop-in replacement for a regular tuple so backwards-compatibility should be assured.

However, I'm worried about the performance cost of such a replacement: as you noted, RegexURLResolver is a pretty critical component of django.

From a quick search, there's nothing in our documentation that explains how to extend the URL resolver (or why you'd want to).
I've also never done it myself either so I don't see exactly what we'd gain with this change.

So, to sum up, I'm -0 on the idea of using namedtuples, provided there's no major performance hit.

On the other hand, I'd definitely be +1 on documenting the process of extending the URL resolver and its machinery.

[1] http://docs.python.org/2/library/collections.html?highlight=collections.namedtuple#collections.namedtuple

by Flavio Curella, 11 years ago

use namedtuples

comment:5 by Flavio Curella, 11 years ago

I've updated the patch to use namedtuples and rebased against 7523e784633b7757fbc82df58f80b197eeed988a.

Re performance: according to the Python docs 'Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples.'.

I'd still like to run some profiling to have a better idea of the impact of this change. Memory consumption might be the same, but instantiation could have an overhead.

comment:6 by Ulrich Petri, 10 years ago

I've done a quick benchmark. The code and raw results can be found here: https://gist.github.com/ulope/a63aff42f51b08181f72#file-results-txt

Here is a table of the results (n=1000000):

Python operation datastructure time
CPy 2.6 instantiate tuple 0.243867874146
instantiate namedtuple 0.791953086853
access tuple 0.342396020889
access namedtuple 1.09635186195
CPy 2.7 instantiate tuple 0.300357103348
instantiate namedtuple 0.817892074585
access tuple 0.397746801376
access namedtuple 1.12735199928
CPy 3.3 instantiate tuple 0.19853064499329776
instantiate namedtuple 0.8839332649949938
access tuple 0.2663196460052859
access namedtuple 1.1236915799963754
CPy 3.4 instantiate tuple 0.2100249359937152
instantiate namedtuple 0.707535210007336
access tuple 0.2940493019996211
access namedtuple 0.9200455860118382
PyPy 2.3 instantiate tuple 0.00525689125061
instantiate namedtuple 0.00653004646301
access tuple 0.00449419021606
access namedtuple 0.00840997695923

From this it seems that namedtuple indeed is noticeably more expensive than a plain tuple (except, unsurprisingly, on PyPy). However as the absolute times for namedtuple are still quite short even at one million iterations I think the impact on Django's URL resolvers should be negligible.

comment:7 by Marten Kenbeek, 10 years ago

Owner: changed from nobody to Marten Kenbeek
Status: newassigned

comment:8 by Marten Kenbeek, 10 years ago

Has patch: unset
Needs tests: unset
Patch needs improvement: unset

comment:9 by Mariusz Felisiak, 10 months ago

Owner: Marten Kenbeek removed
Status: assignednew

comment:10 by Ülgen Sarıkavak, 9 months ago

Cc: Ülgen Sarıkavak added
Note: See TracTickets for help on using tickets.
Back to Top