Opened 18 years ago

Closed 16 years ago

Last modified 16 years ago

#3553 closed (wontfix)

urlpatterns uses urllib.quote/unquote should use urllib.quote/unquote_plus

Reported by: justin-django@… Owned by: Graham King
Component: Core (Other) Version: 0.95
Severity: Keywords: sprintsept14
Cc: Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When using the + character to represent spaces in urls it is impossible to differentiate between a literal + and an escaped %2B in the url.

Example

urlpatterns = (r'/search/(?<query>.*)$,app.search)
def search(request, query):
  print "query: "+query

Observed

Visiting http://example.com/search/foo%2B+bar causes django to print foo++bar

Expected

Visitin http://example.com/search/foo%2B+bar should cause django to print foo+ bar

Solution

I assume that django is using urllib.quote/unqoute Changing it to use urllib.quote_plus and urllib.unqoute_plus would produce the expected behavior and not have any side affects

Attachments (1)

3553.diff (6.4 KB ) - added by Graham King 16 years ago.
Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass

Download all attachments as: .zip

Change History (13)

comment:1 by Chris Beaven, 18 years ago

Triage Stage: UnreviewedDesign decision needed

Well it would have a side-effect of breaking anyone's existing code if their code relied on the current behaviour...

comment:2 by justin-django@…, 18 years ago

probably the best method is to have a init param default to urllib.un/quote and be able to specify any func... (excuse the misterminology python newb)

comment:3 by James Bennett, 17 years ago

Component: UncategorizedCore framework
Keywords: sprintsept14 added

I believe all of this changed with the Unicode merge (since various helper functions had to be written to handle non-ASCII data); can someone confirm whether this is still an issue?

comment:4 by Jacob, 17 years ago

Triage Stage: Design decision neededAccepted

comment:5 by loppear, 17 years ago

Verified against trunk r7264, using:

def search(request, query):
    response = HttpResponse()
    response.write("query: %s" % query)
    return response

urlpatterns = patterns('', (r'q/(?P<query>.*)$', search),)

does print "query: foo++bar" for /q/foo%2B+bar

comment:6 by Matt McClanahan, 16 years ago

milestone: 1.0
Needs tests: set

Verified against [8776].

comment:7 by Jacob, 16 years ago

milestone: 1.0post-1.0

comment:8 by anonymous, 16 years ago

Owner: changed from nobody to anonymous
Status: newassigned

comment:9 by Graham King, 16 years ago

Owner: changed from anonymous to Graham King
Status: assignednew

comment:10 by Graham King, 16 years ago

Status: newassigned
Triage Stage: AcceptedDesign decision needed

I've uploaded a patch which solves this, but read on.
On the dev server it's a one liner.

On mod_python and fastcgi, for a path of '/q/foo%2B+bar' the environment hands you '/q/foo++bar' in PATH_INFO. I've had to use REQUEST_URI (req.unparsed_uri in mod_python) instead of PATH_INFO.
If those smart people at mod_python and fastcgi/flup are converting the %2B and + this way, I don't feel comfortable doing it differently. I'm setting this back to 'Design Decision Needed' for advice.

For what it's worth I've tested the mod_python version of this patch on a medium to large work project, and it doesn't seem to cause any problems.
I've just noticed this patch seems to break a whole bunch of unit tests. Oops. Investigating.

by Graham King, 16 years ago

Attachment: 3553.diff added

Uses urllib.unquote_plus for dev server, mod_python and fastcgi, and all the tests pass

comment:11 by Graham King, 16 years ago

Resolution: wontfix
Status: assignedclosed

Attached a patch that will allow you to do this, but I don't recommend you do. It will probably break mod_rewrite. The answer is don't put the query in the path. Change your URL structure to be /search?q=foo%2B+bar.

+ encodes a space in a _query_ (coming from a form), not in a _path_. In a path %20 should be used. Create a document called 'my test.html' on a local Apache install and try browsing to http://localhost/my+test.html -> 404, replace the + with %20 and it works.

When Apache hands Django the PATH_INFO, the %2B has been correctly converted to a +, and the other plus is left as it because, in the path, it doesn't mean anything special. So foo%2B+bar correctly becomes foo++bar

Look at last.fm, where there is both a band called '44' and one called '+44':
http://www.last.fm/music/44 --> 44
http://www.last.fm/music/+44 --> 44
http://www.last.fm/music/%2B44 --> 44
http://www.last.fm/music/%252B44 --> +44

They had to double encode it, replacing the % with %25. That's what you'll have to do to use + as a character and a space in the path. Otherwise shift the query to the query component or the URL.

comment:12 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

Note: See TracTickets for help on using tickets.
Back to Top