Opened 17 years ago

Closed 14 years ago

Last modified 13 years ago

#4992 closed Uncategorized (fixed)

Alter cache key based on GET parameters

Reported by: anonymous Owned by: nobody
Component: Core (Cache system) Version: dev
Severity: Normal Keywords:
Cc: miracle2k, jim@…, dane.springmeyer@…, sylvain.pasche@…, taavi@…, apasotti@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Why GET requests are not cached by CacheMiddleware?
For example paginator accepts page numer as a GET parametr and pages other than first were not cached in any way.
I changed this policy on my own in django source, but I think it should be merged in to django, maybe as an option.
Patch changing it is attached to this ticket.

Attachments (3)

diff (1.3 KB ) - added by anonymous 17 years ago.
patch
cache_by_request_full_path.diff (3.9 KB ) - added by PeterKz 16 years ago.
Cache by request.get_full_path() instead of request.path
cache_by_request_full_path_v2.diff (7.0 KB ) - added by Thomas Güttler 14 years ago.
Updated patch of PeterKz to apply cleanly to trunk. Added tests

Download all attachments as: .zip

Change History (30)

by anonymous, 17 years ago

Attachment: diff added

patch

comment:1 by Jacob, 17 years ago

Resolution: invalid
Status: newclosed

GET requests are cached by the cache middleware. If you're having difficulties making it work, try asking on the django-users mailing list.

Oh, and you don't want to use this patch. It's gonna break your site pretty bad.

comment:2 by Collin Grady <cgrady@…>, 17 years ago

Resolution: invalid
Status: closedreopened
Summary: Cache GET request.Alter cache key based on GET parameters

Talking with the submitter on IRC, I think there was a miscommunication in his original summary. He's not saying that it won't cache GET requests at all, he wants the cache keys to vary based on the GET params, so you can cache a view and still use GET params.

For instance, with the current cache middleware, /?page=2, and /?page=3 would skip the cache entirely, while they could be cached based on the page=2 or page=3 bit as well.

The attached patch isn't a good way to do it, but I think the idea has merit :)

comment:3 by James Bennett, 17 years ago

Probably the best way to get different cache keys from different GET parameters is to use an md5 of the contents of request.GET (so something like md5.new(str(request.GET)).hexdigest()).

comment:4 by Simon G. <dev@…>, 17 years ago

Triage Stage: UnreviewedDesign decision needed

comment:5 by Jacob, 17 years ago

Component: UncategorizedCache system

comment:6 by Jacob, 17 years ago

FWIW, a simple str(request.GET) isn't good enough -- if this ticket gets accepted, I'd expect /?a=1&b=2 to have the same cache key as /?b=2&a=1, so there's sorting business that needs to happen, too.

I'm currently -0 on this, though. For one, it increases the time to calculate caching in the most common case. On top of that, though GET with a query string is supposed to be cacheable, RFC 2616 notes that "some applications have traditionally used GETs [...] with query URLs [...] to perform operations with significant side effects" and suggests that caches treat such requests carefully (§13.9). I'm inclined to do the same.

comment:7 by Daniel Pope <dan@…>, 16 years ago

What about just request.get_full_path()?

HTTP caches don't consider /?a=1&b=2 to have the same cache key as /?b=2&a=1; they treat the URL as opaque. We know different, but developers shouldn't be surprised about having to use the same key for internal caching that would be used externally.

comment:8 by Julian Bez, 16 years ago

milestone: post-1.0

comment:9 by PeterKz, 16 years ago

If I understand http://www.w3.org/DesignIssues/Axioms.html correctly you should treat URL:s as opaque and get three different cache objects for:

  • example.com/list/
  • example.com/list/?a=1&b=2
  • example.com/list/?b=2&a=1

It is likely that there won't be many permutations for multiple URL parameters as they are typically constructed in a form controlled by the application developer.

Implementing this would likely give a decent performance boost for Django apps that filter large lists based on query parameters.

comment:10 by PeterKz, 16 years ago

Design-wise it would be great if the developer could be in control of which parameters that require a new cache item. After all, it is the developer who knows which parameters are likely to influence the result returned to the client. The currenct view cache decorator could be complemented like this:

Vary by the entire URL (should be default behaviour):

@cache_page(60 * 15)
@vary_by_param("*")
def slashdot_this(request):
    ...

Only vary by values for parameter a and b (ignore everything else):

@cache_page(60 * 15)
@vary_by_param(["a","b"])
def slashdot_this(request):
    ...

comment:11 by Daniel Pope <dan@…>, 16 years ago

Restricting this to a subset of the GET parameters would only reduce the number of duplicate copies in the cache. As such I think it's more beneficial to provide transparent caching with a simple decorator API. The reason we need to use request.get_full_path() is that the current behaviour allows Django to serve the wrong page for requests with a query string. Trying to reduce the number of duplicate entries in the cache to the bare minimum is more of a wishlist item.

Rather than building all of the possible ways you could generate cache keys into decorators, why not just allow cache_page to take a callable?

eg.

@cache_page(key=lambda request: request.path)
def slashdot_this(request):
  ...

This should still default to request.get_full_path() so that the default behaviour works for all views.

comment:12 by PeterKz, 16 years ago

get_full_path() seems to be the way to go. It returns the path and query, but ignores the fragment identifier. So, this would be a minor patch that improves performance for many sites.

by PeterKz, 16 years ago

Cache by request.get_full_path() instead of request.path

comment:13 by PeterKz, 16 years ago

Has patch: set

Uploaded patch for the current trunk.

comment:14 by Jim Garrison, 16 years ago

Cc: jim@… added

comment:15 by springmeyer, 16 years ago

Cc: dane.springmeyer@… added

comment:16 by sylvain, 16 years ago

Cc: sylvain.pasche@… added

comment:17 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:18 by Mikhail Korobov, 15 years ago

I've implemented something like Daniel Pope 's suggestion and opened separate ticket for that: #11269.

comment:19 by anonymous, 15 years ago

Cc: taavi@… added

comment:20 by miracle2k, 15 years ago

Cc: miracle2k added
Resolution: fixed
Status: reopenedclosed

comment:21 by miracle2k, 15 years ago

Resolution: fixed
Status: closedreopened

Sorry. This has happened to me before. Is it a bug in Trac?

comment:22 by Malcolm Tredinnick, 14 years ago

Triage Stage: Design decision neededAccepted

Even squid recently (last year) changed their default caching behaviour to respect GET params, so they think the web has moved far enough to udnerstand side-effects methods. We should do this now.

comment:23 by Thomas Güttler, 14 years ago

Cc: hv@… added

by Thomas Güttler, 14 years ago

Updated patch of PeterKz to apply cleanly to trunk. Added tests

comment:24 by elpaso66, 14 years ago

Cc: apasotti@… added

comment:25 by Jannis Leidel, 14 years ago

Triage Stage: AcceptedReady for checkin

comment:26 by Jannis Leidel, 14 years ago

Resolution: fixed
Status: reopenedclosed

In [15705]:

Fixed #4992 -- Respect the GET request query string when creating cache keys. Thanks PeterKz and guettli for the initial patch.

comment:27 by Thomas Güttler, 13 years ago

Cc: hv@… removed
Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset
Note: See TracTickets for help on using tickets.
Back to Top