Context Navigation

← Previous Ticket
Next Ticket →

#17896 closed Cleanup/optimization (fixed)

Implement file hash computation as a separate method of staticfiles' CachedFilesMixin

Reported by:	mkai	Owned by:	nobody
Component:	contrib.staticfiles	Version:	dev
Severity:	Normal	Keywords:	CachedStaticFilesStorage, hash, md5
Cc:	Jannis Leidel	Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	yes
Easy pickings:	no	UI/UX:	no

Description

I'm currently extending django-cumulus (Rackspace Cloud Files storage backend) and S3BotoStorage (Amazon S3 storage backend) to use hashed file names using the CachedFilesMixin of contrib.staticfiles. That came down to (as intended, I suppose!):

class MyCachedCloudFilesStorage(CachedFilesMixin, CloudFilesStorage):
    pass

Now I noticed that both Amazon and Rackspace include file hashes for each file in their API responses. It the response (e. g. a file list) is cached appropriately, the post-processing of the CachedFilesMixin can be sped up dramatically by just getting the hash from the cached response instead of calculating it from the file (which is expensive because it reads the file over the network).

I propose the attached patch so that backend authors can implement their own method of getting the hash for a file:

diff --git a/django/contrib/staticfiles/storage.py b/django/contrib/staticfiles/storage.py
index c000f97..e59b987 100644
--- a/django/contrib/staticfiles/storage.py
+++ b/django/contrib/staticfiles/storage.py
@@ -65,6 +65,13 @@ class CachedFilesMixin(object):
                 compiled = re.compile(pattern)
                 self._patterns.setdefault(extension, []).append(compiled)
 
+    def get_file_hash(self, name, content=None):
+        """Gets the MD5 hash of the file."""
+        md5 = hashlib.md5()
+        for chunk in content.chunks():
+            md5.update(chunk)
+        return md5.hexdigest()[:12]
+
     def hashed_name(self, name, content=None):
         parsed_name = urlsplit(unquote(name))
         clean_name = parsed_name.path.strip()
@@ -79,13 +86,9 @@ class CachedFilesMixin(object):
                 return name
         path, filename = os.path.split(clean_name)
         root, ext = os.path.splitext(filename)
-        # Get the MD5 hash of the file
-        md5 = hashlib.md5()
-        for chunk in content.chunks():
-            md5.update(chunk)
-        md5sum = md5.hexdigest()[:12]
+        file_hash = self.get_file_hash(clean_name, content)
         hashed_name = os.path.join(path, u"%s.%s%s" %
-                                   (root, md5sum, ext))
+                                   (root, file_hash, ext))

Attachments (1)

cachedfilesmixin_custom_file_hash.diff (1.3 KB ) - added by mkai 13 years ago.

Download all attachments as: .zip

Change History (6)

by mkai, 13 years ago

Attachment:	cachedfilesmixin_custom_file_hash.diff added

comment:1 by Preston Holmes, 13 years ago

Is there a reason you wouldn't just subclass the mixin and make your changes there?

comment:2 by mkai, 13 years ago

Yes, of course, and I'm already doing this. I needed to copy/paste the get_hash_name method in full, though, so my storage class was blown up from the above (2 lines) to 45 lines of duplicated code.

I figure that when overwriting the mixin, customizing the file hash computation would be a common change, especially when one is using a slow, remote storage and wants to avoid having to open each file (which takes about a second for each file on Rackspace Cloud Files, for example).

comment:3 by Jannis Leidel, 13 years ago

Triage Stage:	Unreviewed → Accepted

Sounds like a good idea to me.

comment:4 by Jannis Leidel, 13 years ago

Patch needs improvement:	set

FWIW, the content parameter to the method needs to be checked before iterating over it.

comment:5 by Jannis Leidel, 13 years ago

Resolution:	→ fixed
Status:	new → closed

Fixed in 5f75ac91df2ef1c19946f65e08aa50165f061cd4.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#17896 closed Cleanup/optimization (fixed)

Implement file hash computation as a separate method of staticfiles' CachedFilesMixin

Description

Attachments (1)

Change History (6)

by mkai, 13 years ago

comment:1 by Preston Holmes, 13 years ago

comment:2 by mkai, 13 years ago

comment:3 by Jannis Leidel, 13 years ago

comment:4 by Jannis Leidel, 13 years ago

comment:5 by Jannis Leidel, 13 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us