#29994 closed Cleanup/optimization (fixed)
Document performance issues in FileBasedCache
Reported by: | Mateusz Konieczny | Owned by: | Sarah Boyce |
---|---|---|---|
Component: | Documentation | Version: | 2.1 |
Severity: | Normal | Keywords: | |
Cc: | Sarah Boyce | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
https://docs.djangoproject.com/en/2.1/topics/cache/#filesystem-caching is not mentioning any problems with FileBasedCache
According to https://github.com/grantjenks/python-diskcache#diskcache-disk-backed-cache "Unfortunately the file-based cache in Django is essentially broken. The culling method is random and large caches repeatedly scan a cache directory which slows linearly with growth. Can you really allow it to take sixty milliseconds to store a key in a cache with a thousand items?"
From my checking it seems to not be reported so far.
According to https://github.com/grantjenks/python-diskcache/issues/93#issuecomment-442580191 "The deficiencies of FileBasedCache are too well known to require enumeration" but it would be nice to warn also developers unaware of this problems.
Change History (17)
comment:1 by , 6 years ago
Needs documentation: | set |
---|
comment:2 by , 6 years ago
comment:3 by , 6 years ago
Needs documentation: | unset |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
The Django documentation avoids endorsing third-party packages because of the burden of curation this would present. I think we would need a consensus on the DevelopersMailingList to do that.
comment:4 by , 6 years ago
From #11260 - ""I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.""
Would it be OK to submit patch that for start will add something like "Note that this caching strategy is useful only for development and should not be used in production due to a poor performance" ?
comment:5 by , 6 years ago
I'm not sure. I think ticket:11260#comment:9 is interesting: "I have Django sites of tens of thousands of pages running for over 2 years using the above patches, so your statements about filesystem caching not a serious strategy are irrelevant." The patch for #11260 is very simple and seems worth considering if it makes file-based caching usable in production. Do you have an opinion on that, Grant?
comment:6 by , 6 years ago
I looked at the patch in #11260 but as far as I can tell it only adds a scenario where when _max_enries
is set to 0 or None then no culling ever takes place. I'm not sure how anteater_sa
handles cache size in that case. I suppose if you can guarantee that the filesystem cache will never grow beyond a certain size then that's a reasonable strategy. It's more of a persistent dictionary though.
I certainly think filesystem caching has serious use cases. I myself have used diskcache for an ecommerce site for several years with tens of thousands of pages. I started with the built-in file-based cached but later chose sqlite3 (used by diskcache) for better performance.
comment:7 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:8 by , 2 years ago
The _cull function in Filesystem caching does still slow down with a large number of files. I added a warning to the docs.
comment:9 by , 2 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
comment:10 by , 2 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:11 by , 2 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
comment:12 by , 2 years ago
Patch needs improvement: | set |
---|
comment:13 by , 2 years ago
Cc: | added |
---|---|
Owner: | removed |
Status: | assigned → new |
comment:14 by , 22 months ago
Owner: | set to |
---|---|
Patch needs improvement: | unset |
Status: | new → assigned |
comment:15 by , 22 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
Related, and much older issue: https://code.djangoproject.com/ticket/11260
I agree that the file-based caching docs could be improved to described the limitations. I would also welcome a link to the Disk Cache project (disclaimer: I'm the author.)
For the Django developers, please don't think Disk Cache is a criticism of Django's file-based cache. I love Django! The Disk Cache project is thousands of lines of code with many non-Django users. I don't think the code nor design is appropriate for inclusion in Django.