Opened 16 years ago
Closed 15 years ago
#9180 closed (duplicate)
Low-level cache interface incorrectly tries to typecast bytestring
Reported by: | Paul Smith | Owned by: | nobody |
---|---|---|---|
Component: | Core (Cache system) | Version: | 1.1 |
Severity: | | Keywords: | |
Cc: | django.9180@…, Gonzalo Saavedra, lvscar, Oliver Beattie | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The low-level Django cache API encodes basestrings as UTF-8 when setting, and decodes basestrings as Unicode when getting.
If you're trying to store a string of bytes in the cache -- for instance, the raw bytes of an image -- the set operation will possibly modify the data by encoding it, and the get operation will potentially raise a DjangoUnicodeDecodeError if the codec can't decode bytes in the string.
    from django.core.cache import cache
    from django.utils.encoding import DjangoUnicodeDecodeError

    # The simplest possible GIF: a 43-byte, 1x1-pixel transparent image
    EMPTY_GIF_BYTES = 'GIF89a\x01\x00\x01\x00\xf0\x00\x00\xb0\x8cZ\x00\x00\x00!\xf9\x04\x00\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;'

    cache.set('empty_gif', EMPTY_GIF_BYTES)
    try:
        cache.get('empty_gif')
    except DjangoUnicodeDecodeError:
        print 'Tried to decode GIF bytestring as a Unicode string'
    else:
        print 'Got the raw GIF bytestring'
A workaround is to wrap the bytestring in a one-tuple when storing, and to return the single element of that tuple when retrieving.

    def raw_cache_set(key, value, timeout_seconds=None):
        # Store a one-tuple so the value is not encoded as a string
        cache.set(key, (value,), timeout=timeout_seconds)

    def raw_cache_get(key):
        value = cache.get(key)
        if value is not None:
            # Unwrap the original bytestring
            return value[0]

    raw_cache_set('empty_gif', EMPTY_GIF_BYTES)
    assert raw_cache_get('empty_gif') == EMPTY_GIF_BYTES
One possible fix would be to expose a raw keyword argument boolean to cache.get, cache.set, cache.add, cache.get_many that would conditionally skip any smart_unicode or other encoding/decoding logic when storing or retrieving from the cache.
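To make the proposal concrete, here is a minimal sketch of how such a raw flag might behave. ToyCache is a hypothetical in-memory stand-in, not Django's cache backend, and the raw parameter is the keyword proposed in this ticket, not an existing one. The sketch is written for Python 3, where the text/bytes split is explicit, rather than the Python 2 used elsewhere in this ticket.

```python
class ToyCache:
    """Hypothetical stand-in for a cache backend that stores bytes."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, raw=False):
        # Without raw, text is encoded to UTF-8 before storage.
        if not raw and isinstance(value, str):
            value = value.encode("utf-8")
        self._store[key] = value

    def get(self, key, default=None, raw=False):
        value = self._store.get(key)
        if value is None:
            return default
        # Without raw, stored bytes are decoded as UTF-8 text,
        # which fails for arbitrary binary data.
        return value if raw else value.decode("utf-8")


cache = ToyCache()
gif_bytes = b"GIF89a\x01\x00\x01\x00\xf0\x00\x00\xb0\x8cZ"  # not valid UTF-8

# With raw=True the bytes round-trip untouched.
cache.set("empty_gif", gif_bytes, raw=True)
assert cache.get("empty_gif", raw=True) == gif_bytes

# The default path still tries to decode and fails on binary data.
try:
    cache.get("empty_gif")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
```

Keeping raw off by default would leave existing callers unaffected, which matches the backwards-compatibility goal mentioned for the attached patch.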
Attachments (1)
Change History (9)
comment:1 by , 16 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 16 years ago
comment:3 by , 16 years ago
Has patch: | set |
---|
I've tested 9180.diff using the memcache library; I'd appreciate it if someone could test it with cmemcache so we could get this to ready for checkin.
comment:4 by , 16 years ago
Cc: | added |
---|
by , 16 years ago
Patch that fixes this issue, is backwards-compatible & includes a regression test
comment:5 by , 16 years ago
Cc: | added |
---|
comment:6 by , 15 years ago
Cc: | added |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Version: | 1.0 → 1.1 |
Django 1.1 lacks the patch!
I applied it to Django 1.1, with cmemcache as my cache backend.
It makes the django.core.cache.cache.get method work well with pickled data.
comment:7 by , 15 years ago
Cc: | added |
---|
Any update on the checkin status of this patch? The use case I came across this on was storing zlib-compressed data in the cache. Since that needs to be preserved as raw bytes, it's not possible to store and retrieve without further processing (thus lengthening the compressed data).
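The size cost mentioned above can be illustrated with a short sketch. It assumes base64 as the "further processing" step; the comment does not name a specific encoding, and the payload is made up for illustration.

```python
import base64
import zlib

payload = b"example cache payload " * 50  # made-up data
compressed = zlib.compress(payload)

# A text-safe wrapper such as base64 survives encoding and decoding,
# but the result is longer than the compressed bytes it wraps.
text_safe = base64.b64encode(compressed)

# The original data is still recoverable after both steps.
assert zlib.decompress(base64.b64decode(text_safe)) == payload
print(len(compressed), len(text_safe))
```

Base64 emits four output bytes for every three input bytes, so part of the space saved by compression is given back.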
comment:8 by , 15 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
See also #5589. I would greatly prefer the raw kwarg, as using the 1-tuple solution would mean that Python pickles are actually being cached, which would mean that the cache can only be primed by Python code - and when using something like memcached, that's not always the case. For example, I often put output from imagemagick into memcached; right now, that's inaccessible from django.core.cache.cache due to this issue.
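A small sketch of the interoperability concern, assuming the backend serializes the 1-tuple with pickle (pickle.dumps here stands in for whatever the backend actually does, and the image bytes are a placeholder):

```python
import pickle

image_bytes = b"\x89PNG\r\n\x1a\n fake image payload"  # placeholder data

# What the 1-tuple workaround would effectively place in the cache:
stored = pickle.dumps((image_bytes,))

# The stored value is a pickle, not the original bytes ...
assert stored != image_bytes
# ... so only code that can unpickle it gets the payload back.
assert pickle.loads(stored)[0] == image_bytes
```

A non-Python producer would write the plain image bytes instead, which a reader expecting a pickled tuple could not unwrap.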