#2977 closed enhancement (fixed)
Better handling of regular expressions for reverse urlresolver
Reported by: | Chris Beaven | Owned by: | Malcolm Tredinnick |
---|---|---|---|
Component: | Core (Other) | Version: | |
Severity: | normal | Keywords: | url reverse |
Cc: | albertpeschar+djangotrac@…, mattimustang@…, dcwatson@…, alexander.solovyov@…, daybreaker12@…, wbyoung@…, django@…, eric@…, beau@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
I spent a bit of time looking over the urlresolvers.py today. I put together a patch fixing some problems with it.
Firstly, regular expressions should be set up outside the function so they only have to be compiled once (fix 1).
A TODO:
was to make the resolver handle recursive parenthesis. After a bit of thinking (and actually making it work like that) I realised that is actually not what you'd need to do. Recursion is not necessary because the outermost matched parenthesis will be replaced with the given argument anyway.
What was needed was a rethink of choosing between named/unnamed groups. It should be named groups only (if any exist) for each whole regular expression (fix 2).
It does need to handle non-grouping parenthesis recursively though (fix 3).
Other expression extension notations (apart from named groups, of course) should be ignored (fix 4).
Probably rarely needed, but it now also handles pipes in regular expressions (fix 5).
Attachments (6)
Change History (36)
by , 18 years ago
Attachment: | reverse_urlresolver.patch added |
---|
by , 18 years ago
Attachment: | reverse_urlresolver.2.patch added |
---|
comment:1 by , 18 years ago
This looks good, but I'm having trouble figuring out exactly how some of it works. Could you perhaps add some tests that illustrate how the current resolver fails? The right place for those is source:django/trunk/tests/urlpatterns_reverse/
comment:2 by , 18 years ago
Er, that should be: source:django/trunk/tests/regressiontests/urlpatterns_reverse
by , 18 years ago
Attachment: | new_reverse_urlresolver.patch added |
---|
complete rewrite of most of the reverse code!
comment:3 by , 18 years ago
All right, my previous implementation had a few fatal flaws I realised. So I have rewritten everything.
A major change is that the RegexURLPattern objects get a compiled matcher with them, which will greatly speed up reverse lookups.
The rewrite was mostly necessary to correctly handle nested parenthesis, which it now does using a basic tokenizer. There is a small change in logic, which was a bug in the previous one: if a RE has both named and non-named groups, only the named groups should be parsed (this is in line with how the forward lookups work and what documentation says).
A few more tests have been added.
comment:4 by , 18 years ago
Also, the original code (Adrian's) handled argument passing incorrectly when used with a URLResolver. The resolver is should check both itself and its children using one set of arguments.
comment:5 by , 18 years ago
Hi, i just looked briefly on the patch. You should use a helper function like adrian did, currently the regexp will be parsed, even if it is never used.
comment:6 by , 18 years ago
Thanks for the advice.
The difference is that the parsed REs inside of RegexURLPattern
s are cached, and I'm mostly sure that the RegexURLPattern
s themselves are cached, so everyone wins.
comment:7 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:8 by , 18 years ago
Patch needs improvement: | set |
---|
Should work with a mix of named and unnamed arguments
comment:9 by , 18 years ago
Patch needs improvement: | unset |
---|
Actually, it shouldn't. The current behaviour of reverse
is a bug since forward URL resolving doesn't accept both positional and named args (it drops the positional ones)
comment:10 by , 18 years ago
This ticket is a good example of why separate issues should be in separate bugs. You are trying to do many things at once and it makes it hard to split out the good from the not so good and discuss each issue in isolation. That being said, let's work with what you've done (since it's an impressive effort):
Fixes 1, 3 and 5 are all fine. They're just small cleanups. That can be a single patch.
Fix 2 is a design decision and I think you're making the wrong decision. Allowing keyword and positional args to mix (not nest, but be in the same reg-exp) is fine.
Fix 4, I need to think about some more.
I'll commit portions and make a decision on the rest in the next couple of days. No need to write a new patch, I can see what you're doing in the current code and it will be easy to tease it apart.
comment:11 by , 18 years ago
Yea, I should have written separate patches. Next time ;)
Re fix 2, I'm probably blowing hot air, you already said it's going stay as-is in the other ticket but I'm still not sure why you think mixing keyword and positional arguments is useful if it doesn't work for a forward resolution.
In the form of a question: Why should a reverse
match a RegexURLPattern
which won't call its view the same arguments which reverse
received?
Fix 4 is because I didn't want to tackle the difficult types of extension notation, notably lookahead/behind assertions and named backreferences. Note, the patch also handles non-grouping parenthesis.
comment:13 by , 17 years ago
Cc: | added |
---|
comment:14 by , 17 years ago
I'm not sure that you're sufficiently sneaky when removing special characters from URLs...
For example, you're using the regexes
{{{re_unused = re.compile(r'(?<\\)[$?*+()]')
re_special = re.compile(r'
([.+*()$])') # Characters from the IETF URL standard, RFC 1738.
}}}
I'm worried that re_unused
won't match the $
in r'\\$'
even though it should. Also, re_special
will match \.
in r'\\.'
even though it shouldn't.
Also, you don't translate unsafe characters from regexes back to normal strings. That might be the correct behavior--I'm not sure what Django/Apache does with a URL like '/words%20with%20space'
. If %20
becomes a space before it enters the url lookup system, then urls with unsafe characters are possible and we should be able to reverse them (making sure that they get encoded before they're actually included in a webpage). If it's kept in its encoded form, then we shouldn't have to worry about regexes with unsafe characters.
I just created ticket #4594 with my patch. Let's figure out how to deal with these corner cases and maybe we can pull out the part of your code that deals with this so Malcolm will finally get around to applying the various parts. :-)
by , 17 years ago
Attachment: | new_reverse_urlresolver.2.patch added |
---|
comment:15 by , 17 years ago
I caved in and undid fix 2 (but made it so that if there is a named group inside of a non-named group then it'll ignore the non-named group).
The slashes should be much more solid.
Unsafe characters from regex are unencoded nicely.
by , 17 years ago
Attachment: | new_reverse_urlresolver.3.patch added |
---|
comment:16 by , 17 years ago
Unsafe characters from regex are now only unencoded from sections not being replaced by arguments
I handled the somewhat rare case of non-argument sections containing character sets ([aeiou]
). It just chooses the first character in there. Perhaps overkill, but I do know that some people prefer to write their special characters this way rather than escaping them (eg [$]
rather than \$
).
comment:17 by , 17 years ago
Owner: | changed from | to
---|---|
Summary: | [patch] Better handling of regular expressions for reverse urlresolver → Better handling of regular expressions for reverse urlresolver |
Triage Stage: | Design decision needed → Accepted |
Okay, this is on my review list now. Maybe avoid making nay major changes for a couple of days so that I have a chance to read through it without changes happening underneath. Will let you know if I find anything disturbing.
comment:18 by , 17 years ago
Cc: | added |
---|
comment:19 by , 17 years ago
Cc: | added |
---|
comment:20 by , 17 years ago
Cc: | added |
---|
comment:21 by , 17 years ago
Cc: | added |
---|
comment:22 by , 17 years ago
Cc: | added |
---|---|
Patch needs improvement: | set |
this patch does not work for me. Even after manually applying it to latest trunk it fails with the following complex regexp:
IP4_RE = r'(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}' IP6_HEX_RE = r'(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}' IP6_HEX_COMPRESSED_RE = r'((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)' IP6_IP4_COMPAT_RE = r'((?:[0-9A-Fa-f]{1,4}:){6,6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}' IP6_IP4_COMPAT_COMPRESSED_RE = r'((?:[0-9A-Fa-f]{1,4}(?::[0-9A-Fa-f]{1,4})*)?)::((?:[0-9A-Fa-f]{1,4}:)*)(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}' IP6_RE = '%s|%s|%s|%s' % (IP6_HEX_RE, IP6_HEX_COMPRESSED_RE, IP6_IP4_COMPAT_RE, IP6_IP4_COMPAT_COMPRESSED_RE) IP6_RE_COMPILED = re.compile('^(%s|%s|%s|%s)$' % (IP6_HEX_RE, IP6_HEX_COMPRESSED_RE, IP6_IP4_COMPAT_RE, IP6_IP4_COMPAT_COMPRESSED_RE)) IP_RE = "(%s|%s)" % (IP4_RE, IP6_RE) urlpatterns = patterns('', url(r'^(?P<ipsupernetwork>%s)/$' % IP_RE, ipnetwork_list, IPNETWORK_LIST, name='ipnetwork_list'), )
Running django.core.urlresolvers.reverse results in:
In [6]: reverse('ipnetwork_list', urlconf='nong.ip.urls',args=(), kwargs={'ipsupernetwork': '10.0.0.0'}) --------------------------------------------------------------------------- sre_constants.error Traceback (most recent call last) /srv/dev/mpf/nong/third-party/django/tests/<ipython console> /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in reverse(viewname, urlconf, args, kwargs) 399 def reverse(viewname, urlconf=None, args=None, kwargs=None): 400 args = args or [] 401 kwargs = kwargs or {} --> 402 return iri_to_uri(u'/' + get_resolver(urlconf).reverse(viewname, *args, **kwargs)) 403 /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in reverse(self, lookup_view, *args, **kwargs) 383 except (ImportError, AttributeError): 384 raise NoReverseMatch --> 385 if lookup_view in self.reverse_dict: 386 return u''.join([reverse_helper(part.regex, *args, **kwargs) for part in self.reverse_dict[lookup_view]]) 387 raise NoReverseMatch /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in _get_reverse_dict(self) 317 318 def _get_reverse_dict(self): --> 319 if not self._reverse_dict and hasattr(self.urlconf_module, 'urlpatterns'): 320 for pattern in reversed(self.urlconf_module.urlpatterns): 321 if isinstance(pattern, RegexURLResolver): /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in _get_urlconf_module(self) 353 except AttributeError: 354 try: --> 355 self._urlconf_module = __import__(self.urlconf_name, {}, {}, ['']) 356 except ValueError, e: 357 # Invalid urlconf_name, such as "foo.bar." (note trailing period) /srv/dev/mpf/nong/third-party/django/tests/nong/ip/urls.py 13 (r'^url_tag/', include('regressiontests.templates.urls')), 14 15 # django built-in views 16 (r'^views/', include('regressiontests.views.urls')), 17 ) /srv/dev/mpf/nong/third-party/django/django/conf/urls/defaults.py in url(regex, view, kwargs, name, prefix) 28 raise ImproperlyConfigured('Empty URL pattern view name not permitted (for pattern %r)' % regex) 29 if prefix: 30 view = prefix + '.' + view ---> 31 return RegexURLPattern(regex, view, kwargs, name) 32 /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in __init__(self, regex, callback, default_args, name) 239 # callable object (view). 240 self.regex = re.compile(regex, re.UNICODE) --> 241 self.reverse_regex_lookup = ReverseRegexLookup(regex) 242 if callable(callback): 243 self._callback = callback /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in __init__(self, text) 194 class ReverseRegexLookup(object): 195 def __init__(self, text): --> 196 self.tokens, self.minimum_arguments, self.error = tokenize(text) 197 198 def check(self, args=[], kwargs={}): /srv/dev/mpf/nong/third-party/django/django/core/urlresolvers.py in tokenize(text) 179 if isinstance(bit, list): 180 # Build the regex here so it only has to be compiled once. --> 181 bit = re.compile('%s$' % build_re(bit)) 182 count += 1 183 else: /usr/local/lib/python2.4/sre.py in compile(pattern, flags) 178 def compile(pattern, flags=0): 179 "Compile a regular expression pattern, returning a pattern object." --> 180 return _compile(pattern, flags) 181 182 def purge(): /usr/local/lib/python2.4/sre.py in _compile(*key) 225 p = sre_compile.compile(pattern, flags) 226 except error, v: --> 227 raise error, v # invalid expression 228 if len(_cache) >= _MAXCACHE: 229 _cache.clear() error: multiple repeat
by , 17 years ago
Attachment: | django-plus-sign-fix.patch added |
---|
Fix for problem with plus signs (see #4883) and dots (tried to fix also dollar signs and '' but didn't work--don't know why). Generated with svn diff.
comment:23 by , 17 years ago
Owner: | changed from | to
---|
comment:24 by , 17 years ago
Keywords: | url reverse added |
---|
comment:25 by , 17 years ago
Cc: | added |
---|
comment:26 by , 17 years ago
I started to solve this problems, too. See #6934. But this patch seems to be more complete. I would like to see it merged into SVN, but unfortunately I can not apply even the patch against the current HEAD.
comment:27 by , 17 years ago
Cc: | added |
---|
comment:28 by , 17 years ago
Cc: | added |
---|
I hit this the other day too; looking forward to this being applied! :)
comment:29 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [8760]) A rewrite of the reverse URL parsing: the reverse() call and the "url" template tag.
This is fully backwards compatible, but it fixes a bunch of little bugs. Thanks
to SmileyChris and Ilya Semenov for some early patches in this area that were
incorporated into this change.
comment:30 by , 16 years ago
I intentionally didn't take the pipe-handling portion of this patch. That's actually a lot trickier than it looks when the alternatives contain different parameter values. The infrastructure is more or less in place to handle that, but I was having a lot of trouble making it stable, so somebody can address that after 1.0.
small fix