#14955 closed Uncategorized (fixed)
URLField validation should use HEAD requet instead of GET
Reported by: | Claude Paroz | Owned by: | Jannis Leidel |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When an URLField has verify_exists to True, it checks that the page exists with urrlib2.urlopen. Doing a GET request (urllib2 default) is not necessary whereas a simple HEAD request should be sufficient to know if the page exists.
Attachments (3)
Change History (15)
by , 14 years ago
Attachment: | urrlib2_head_request.diff added |
---|
comment:1 by , 14 years ago
Note that tests are already done in forms/tests/fields.py (e.g. test_urlfield_3).
comment:2 by , 14 years ago
milestone: | → 1.3 |
---|---|
Triage Stage: | Unreviewed → Ready for checkin |
I've applied it cleanly, tested it in a demo app and run the tests OK.
Marking it as RFC.
comment:3 by , 14 years ago
What about websites that don't allow HEAD request? There are some sites that only allow GET and POST.
How about HEAD first fall back to GET?
comment:4 by , 14 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | set |
Triage Stage: | Ready for checkin → Accepted |
Yeah, you are probably right. Even if it adds a supplementary request for each not found resource, a fallback to a GET request is probably more robust.
comment:5 by , 14 years ago
comment:6 by , 14 years ago
Owner: | removed |
---|---|
Patch needs improvement: | unset |
Thanks Luke for the useful insight. The patch now falls back to GET when response code is 405 or 501.
comment:7 by , 14 years ago
Class should not be created inside the method; classes are some of the most expensive (memorywise) objects in CPython, and can be slow to gc.
by , 14 years ago
Attachment: | urllib2_head_request_3.diff added |
---|
Put HeadRequest class at root level
comment:8 by , 14 years ago
Thanks Alex for your input. I've modified the patch to get HeadRequest class out of the method.
comment:9 by , 14 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:11 by , 14 years ago
Easy pickings: | unset |
---|---|
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
After thinking about this, I wonder if it's worth falling back to GET for any failed request? I've found a variety of IIS servers running ASP.NET that are responding with 403 errors or simply timing out on HEAD requests. Any thoughts? Should I open another ticket about this?
Use HEAD HTTP request instead of GET