#7460 closed (fixed)
cache tag calculates invalid key for memcached
Reported by: | trbs | Owned by: | Jacob |
---|---|---|---|
Component: | Core (Cache system) | Version: | dev |
Severity: | Keywords: | memcached, space, problem, with, in, key | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
The cache node from the cache tag calculates it's key from the fragment_name appended by all the vary-on keys in the context joined together by ':'
cache_key = u':'.join([self.fragment_name] + \ [force_unicode(resolve_variable(var, context)) for var in self.vary_on])
This however can lead to a situation where one uses an object name for one of the vary-on fields which returns a string with a space in it.
For example:
{% load cache %} {% cache 7200 sidebar blogname category %} ... {% endcache %}
Where category is the data model object for a category. As the same peace of template (DRY) is also used on the frontpage and other pages one cannot be sure that category is a valid object so you cannot use 'category.slug' (which will likely be the default answer to this ticket) Cause this will raise an None has no attribute 'slug' error.
But since category yields a string value there's can be a space in there which memcached does not like for it's keys.
[ERROR@1213562932.703199] mcm_validate_key_func():3443: memcache(4) protocol error: isspace(3) returned true for character in key
In this example the key will become: "sidebar:myweblog:Some Category" where the space between some and category will be the problem yielding the memcache error above.
The attached patch modifies the cache templatetag to guaranty that the key does not contain spaces. This seemed to be the best place, as the cache tag is used from the a template and not from code.
I've also thought about maybe handling this in the cache backend for memcached, which at first hand seemed another valid place as that also converts any non-ascii key to ascii ignoring all other characters. But i don't like a backend to start changing things for the user which can lead to unexpected results.
(Granted this already is the case in the memcached backend because of .encode('ascii', 'ignore') which will break an ignorant users caching when he using non-ascii characters in the key string. For example, when using localized texts as key strings. But that's another matter :) )
Attachments (5)
Change History (25)
by , 17 years ago
Attachment: | cache_replace_space_with_underscore.diff added |
---|
comment:1 by , 16 years ago
milestone: | → 1.0 |
---|---|
Triage Stage: | Unreviewed → Accepted |
follow-up: 3 comment:2 by , 16 years ago
comment:3 by , 16 years ago
Replying to magneto:
given the nature of the unicode DB fields and other crazy non-memcachable key chars, wouldn't 'slugify' be a bit more robust here?
(i guess the only draw back is there may be a slight chance for one slugged key for multiple input keys)
slugify would impose to much of a structure for the cache-key imo.
cause the key should be case-sensitive, allow for underscores, alphanumerics and special characters.
for instance, an email address should be a valid cache key.
comment:4 by , 16 years ago
Patch needs improvement: | set |
---|
in Changeset 5718 some changes to the memcache backend where made to 'Fix some problems with Unicode usage and caching'
but it looks to me that the 'add' method of memcached was 'forgotten' in this commit.
where other methods use smart_str and/or value.encode, 'add' still does the imo wrong thing with key.encode and creates a very big potential problem for unicode users.
i also changed my view on where the patch should be, my first patch was in the wrong place. other cache backends
may very well have no problem at all with spaces inside cache keys. for instance a simple dict-based backend would allow spaces without problems.
so the replace of spaces to underscores should be inside the memcache.py backend, this way there's no difference between the backends.
by , 16 years ago
Attachment: | django_memcache_backend_7460.diff added |
---|
comment:5 by , 16 years ago
uploaded new patch which fixes both spaces in keys and cache.add for memcached backend
comment:6 by , 16 years ago
This approach has the usual bug with naive escaping of this nature: if I insert a key for foo_bar
, the value will be retrieved when foo bar
(containing) a space is asked for. And vice-versa. Basically, all the spaced versions are colliding with similar strings containing underscores.
comment:7 by , 16 years ago
We could percent-encode all keys. How about that? You could throw everything in there, spaces, unicode, and it gets encoded to something that memcached can handle.
comment:8 by , 16 years ago
The more I look at it, the more I dislike this approach. Everybody using memcache would be paying an extra processing penalty for one edge case. We've never allowed spaces in keys for memcache and there's really no reason to do so now. The thing generating the bad key should be responsible for generating a valid key instead. Fix it at the source, rather than penalising everybody for one thing misbehaving (since we control the misbehaving one thing).
comment:9 by , 16 years ago
Triage Stage: | Accepted → Design decision needed |
---|
I agree with you Malcolm, maybe encoding the keys for everyone is overkill.
I think it's a design decision now if we want to encode the keys for the cache tag only or never ever touch them, like it was the policy before (is this stated in the docs?). In the latter case, this ticket should be wontfix.
comment:10 by , 16 years ago
sorry that two slightly separate issue's got rounded up in this one ticket.
the smart_str was placed there to fix issues with unicode, if we remove all encoding then memcached would be as good as useless for unicode sites. and you can hardly say that having a unicode key is 'bad', then again if having a simple space in a key is bad then i guess unicode can be considered bad as well :)
i also agree that we should not put in overhead for a relatively small corner case. but imho i feel that the source is not generating a bad-key, it's just that memcached has this severe limitation.
so maybe we could still 'fix' the spaces case in the cache_tag itself, cause i think it should be proper there to use spaces. and since it already does a lot of processing on what's going to be the key it will not hurt in terms of performance.
then in memcache only fix the 'add' method to use the same smart_str as the other methods so it's consistent.
comment:11 by , 16 years ago
Triage Stage: | Design decision needed → Accepted |
---|
The add
method for memcache should be a separate ticket. This ticket is about the cache tag generating an invalid key.
As for the cache tag, the solution is simple and I really don't understand need for all the discussion. When generating a cache key, make sure it doesn't include spaces, so that it can be inserted into memcache and make sure it doesn't collide with any other unique key. In other words, generate a valid key when constructing the cache key. There's no edge case or need for design decision (the ticket was already accepted. Triagers should not change that) or anything. There's a simple, unambiguous bug that should be fixed.
Unicode, spaces or anything else aren't broken or bad and we aren't saying don't use them. It's a simple fact that the memcache API doesn't take spaces in the keys, so opinions on that are irrelevant to the issue. The fact that we are generating a key with spaces makes this a bug in whatever is generating that key, since the requirement is "no spaces" for memcache. So let's just fix the actual bug and move on: generate a spaceless, unique key when creating the key for the cache tag. Do it in such a way that other, similar tags might be able to use the same code if/when they are written.
by , 16 years ago
Attachment: | cachetag.diff added |
---|
comment:12 by , 16 years ago
This version now has a build_cache_key
function handling the key generation for cache tags. It generates a spaceless, unique key. You cannot insert a different string and get the same key.
comment:13 by , 16 years ago
i've created a new ticket for the 'add' methods inconsistency here #8410
julian's patch looks good
comment:14 by , 16 years ago
milestone: | 1.0 → post-1.0 |
---|
I'm going to punt this to post-1.0; as Malcolm pointed out, the fact that memcached has rules about keys means that you should be careful choosing your keys, so any behind-the-scenes wizardry to save you from yourself can wait.
comment:15 by , 16 years ago
milestone: | post-1.0 → 1.0 |
---|
I don't think we can assume the template designers understand the subtleties of memcached, so I think making the cache tag Do The Right Thing is something we should do. So un-punting.
I'd like to see a patch without the single-use build_cache_key function -- there's no reason not to inline it. Also, you should use something like urlencode to generate the key; that'll make sure to deal with bad chars other than spaces.
comment:16 by , 16 years ago
Owner: | changed from | to
---|
by , 16 years ago
Attachment: | django_templatetags_cache-key-quote.diff added |
---|
comment:17 by , 16 years ago
added new patch with a minimal change to cachetag so it uses urlquote to encode vary-on's to something every caching backend should be able to handle.
comment:18 by , 16 years ago
Five different patches and not a single one has a test to try and demonstrate that the problem is fixed. :-(
comment:19 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
given the nature of the unicode DB fields and other crazy non-memcachable key chars, wouldn't 'slugify' be a bit more robust here?
(i guess the only draw back is there may be a slight chance for one slugged key for multiple input keys)