#3553 closed (wontfix)
urlpatterns uses urllib.quote/unquote should use urllib.quote/unquote_plus
Reported by: | Owned by: | Graham King | |
---|---|---|---|
Component: | Core (Other) | Version: | 0.95 |
Severity: | Keywords: | sprintsept14 | |
Cc: | Triage Stage: | Design decision needed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When using the + character to represent spaces in urls it is impossible to differentiate between a literal + and an escaped %2B in the url.
Example
urlpatterns = (r'/search/(?<query>.*)$,app.search)
def search(request, query): print "query: "+query
Observed
Visiting http://example.com/search/foo%2B+bar causes django to print foo++bar
Expected
Visitin http://example.com/search/foo%2B+bar should cause django to print foo+ bar
Solution
I assume that django is using urllib.quote/unqoute Changing it to use urllib.quote_plus and urllib.unqoute_plus would produce the expected behavior and not have any side affects
Attachments (1)
Change History (13)
comment:1 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:2 by , 18 years ago
probably the best method is to have a init param default to urllib.un/quote and be able to specify any func... (excuse the misterminology python newb)
comment:3 by , 17 years ago
Component: | Uncategorized → Core framework |
---|---|
Keywords: | sprintsept14 added |
I believe all of this changed with the Unicode merge (since various helper functions had to be written to handle non-ASCII data); can someone confirm whether this is still an issue?
comment:4 by , 17 years ago
Triage Stage: | Design decision needed → Accepted |
---|
comment:5 by , 17 years ago
Verified against trunk r7264, using:
def search(request, query): response = HttpResponse() response.write("query: %s" % query) return response urlpatterns = patterns('', (r'q/(?P<query>.*)$', search),)
does print "query: foo++bar" for /q/foo%2B+bar
comment:7 by , 16 years ago
milestone: | 1.0 → post-1.0 |
---|
comment:8 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:9 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:10 by , 16 years ago
Status: | new → assigned |
---|---|
Triage Stage: | Accepted → Design decision needed |
I've uploaded a patch which solves this, but read on.
On the dev server it's a one liner.
On mod_python and fastcgi, for a path of '/q/foo%2B+bar' the environment hands you '/q/foo++bar' in PATH_INFO. I've had to use REQUEST_URI (req.unparsed_uri in mod_python) instead of PATH_INFO.
If those smart people at mod_python and fastcgi/flup are converting the %2B and + this way, I don't feel comfortable doing it differently. I'm setting this back to 'Design Decision Needed' for advice.
For what it's worth I've tested the mod_python version of this patch on a medium to large work project, and it doesn't seem to cause any problems.
I've just noticed this patch seems to break a whole bunch of unit tests. Oops. Investigating.
by , 16 years ago
Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass
comment:11 by , 16 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
Attached a patch that will allow you to do this, but I don't recommend you do. It will probably break mod_rewrite. The answer is don't put the query in the path. Change your URL structure to be /search?q=foo%2B+bar.
+ encodes a space in a _query_ (coming from a form), not in a _path_. In a path %20 should be used. Create a document called 'my test.html' on a local Apache install and try browsing to http://localhost/my+test.html -> 404, replace the + with %20 and it works.
When Apache hands Django the PATH_INFO, the %2B has been correctly converted to a +, and the other plus is left as it because, in the path, it doesn't mean anything special. So foo%2B+bar correctly becomes foo++bar
Look at last.fm, where there is both a band called '44' and one called '+44':
http://www.last.fm/music/44 --> 44
http://www.last.fm/music/+44 --> 44
http://www.last.fm/music/%2B44 --> 44
http://www.last.fm/music/%252B44 --> +44
They had to double encode it, replacing the % with %25. That's what you'll have to do to use + as a character and a space in the path. Otherwise shift the query to the query component or the URL.
Well it would have a side-effect of breaking anyone's existing code if their code relied on the current behaviour...