Opened 7 years ago
Closed 7 years ago
#28977 closed New feature (fixed)
Change Local Memory Cache to Use LRU
Reported by: | Grant Jenks | Owned by: | Grant Jenks |
---|---|---|---|
Component: | Core (Cache system) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Adam Johnson, josh.smeaton@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The current local memory cache (locmem) in Django uses a pseudo-random culling strategy. Rather than evicting at random, the OrderedDict data type can be used to implement an LRU eviction policy. functools.lru_cache already demonstrates this kind of approach, and Python 3 provides OrderedDict.move_to_end and OrderedDict.popitem, which ease the implementation.
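As a rough illustration (not the actual patch), the LRU bookkeeping that OrderedDict makes possible could look something like this; MAX_ENTRIES is just a placeholder limit:

from collections import OrderedDict

MAX_ENTRIES = 300  # placeholder; the real backend would read this from its options
_cache = OrderedDict()

def lru_get(key, default=None):
    try:
        value = _cache[key]
    except KeyError:
        return default
    _cache.move_to_end(key, last=False)  # a hit promotes the key to the MRU end
    return value

def lru_set(key, value):
    _cache[key] = value
    _cache.move_to_end(key, last=False)  # new or updated keys also become MRU
    if len(_cache) > MAX_ENTRIES:
        _cache.popitem()  # popitem() (last=True) evicts from the LRU end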
I'm willing to work on a pull request that changes locmem to use an LRU eviction strategy, but I wanted to check first whether it would be accepted. I did some research to find a good reason for locmem's random culling strategy but did not find one.
There's also a bit of prior art at https://pypi.python.org/pypi/django-lrucache-backend.
Change History (13)
comment:1 by , 7 years ago
Triage Stage: | Unreviewed → Someday/Maybe |
---|
comment:2 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 7 years ago
I have created the example changes at https://github.com/grantjenks/django/tree/ticket_28977
The only thing I remain unsure about is whether django.utils.synch.RWLock needs to be deprecated.
View the commit at https://github.com/grantjenks/django/commit/b06574f6713d4b7d367d7a11e0268fb62f5fd1d1
I will ask for feedback on the developers mailing list next.
comment:4 by , 7 years ago
Has patch: | set |
---|---|
Triage Stage: | Someday/Maybe → Accepted |
Pull request created at https://github.com/django/django/pull/9555
Django developers mailing list thread at https://groups.google.com/forum/#!topic/django-developers/Gz2XqtoYmNk
Summarizing the two responses: Josh Smeaton (author of django-lrucache-backend, the project cited in the initial post) was in favor of providing a better default option, and Adam Johnson was also +1. Josh and Adam were also interested in providing a way to disable cache key validation. I'm +0 on that change so long as the default is to validate the key, but I'd rather not make those changes myself.
I did some very simple benchmarking locally:
Here's the performance of cache.get for the current implementation:
$ python manage.py shell
Python 3.6.3 (default, Oct 5 2017, 22:47:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from django.core.cache import cache
In [2]: cache.set(b'foo', b'bar')
In [3]: %timeit cache.get(b'foo')
14.2 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And here's the performance of cache.get for the new implementation:
$ python manage.py shell
Python 3.6.3 (default, Oct 5 2017, 22:47:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from django.core.cache import cache
In [2]: cache.set(b'foo', b'bar')
In [3]: %timeit cache.get(b'foo')
6.29 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I also tried Josh's django-lrucache-backend implementation:
$ python manage.py shell
Python 3.6.3 (default, Oct 5 2017, 22:47:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [3]: from django.core.cache import caches
In [4]: cache = caches['local']
In [5]: cache.set(b'foo', b'bar')
In [6]: %timeit cache.get(b'foo')
10.1 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
It's not a great benchmark, but it's encouraging. The new implementation appears to be faster than both the current implementation and django-lrucache-backend. I haven't profiled, but I would guess collections.OrderedDict is very fast, as is threading.RLock, both of which are introduced by these changes.
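For what it's worth, the raw costs of those two primitives can be sanity-checked with a couple of ad-hoc timeit snippets like the following (illustrative only, not part of the benchmark above):

import timeit

# Cost of an OrderedDict lookup.
print(timeit.timeit(
    "d[b'foo']",
    setup="from collections import OrderedDict; d = OrderedDict({b'foo': b'bar'})",
    number=1_000_000,
))

# Cost of an uncontended RLock acquire/release cycle.
print(timeit.timeit(
    "lock.acquire(); lock.release()",
    setup="import threading; lock = threading.RLock()",
    number=1_000_000,
))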
comment:5 by , 7 years ago
Josh asked on django-developers whether I would run more extensive benchmarks against the current changes. Here are the results for the locmem backend using the python-diskcache benchmark harness. Note that these were generated with only one process, as locmem does no cross-process synchronization.
The current implementation benchmark results:
========= ========= ========= ========= ========= ========= ========= =========
Timings for locmem
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88986     16526  12.159us  19.789us  28.849us 154.734us    1.266s
      set      9030         0  14.067us  15.974us  30.994us 150.919us 134.134ms
   delete       983         0  11.921us  13.828us  26.941us  32.663us  12.563ms
    Total     98999                                                      1.413s
========= ========= ========= ========= ========= ========= ========= =========
And the new implementation benchmark results:
========= ========= ========= ========= ========= ========= ========= =========
Timings for locmem
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get     88986     16526   5.007us   5.245us  14.067us  80.109us 449.330ms
      set      9030         0   5.960us   7.153us  16.689us 111.103us  58.852ms
   delete       983         0   4.053us   5.007us   8.821us  18.835us   4.200ms
    Total     98999                                                  512.381ms
========= ========= ========= ========= ========= ========= ========= =========
Displayed are four timing percentiles: Median (50th), 90th, 99th, and Max (100th). The results are better across the board, with a roughly 50% speedup typical.
comment:6 by , 7 years ago
One more set of benchmarks. The results above were measured in a single-process, single-thread environment, so the locks never contend with one another for access. I thought it would be useful to observe performance under more contention; my development machine has eight logical processors, so here are the results in a single-process, eight-thread environment.
The current implementation:
========= ========= ========= ========= ========= ========= ========= =========
Timings for locmem
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712648     85412  14.067us 863.075us   2.461ms  14.337ms  160.973s
      set     71226         0 204.563us 238.180us 282.288us   7.419ms   12.356s
   delete      8118         0 201.941us 235.081us 281.811us 410.080us    1.395s
    Total    791992                                                    174.724s
========= ========= ========= ========= ========= ========= ========= =========
The new implementation:
========= ========= ========= ========= ========= ========= ========= =========
Timings for locmem
-------------------------------------------------------------------------------
   Action     Count      Miss    Median       P90       P99       Max     Total
========= ========= ========= ========= ========= ========= ========= =========
      get    712637     84222 146.151us 165.224us 201.225us   9.591ms  106.855s
      set     71241         0 149.012us 167.847us 204.802us   9.604ms   10.875s
   delete      8114         0 144.958us 163.078us 195.026us 552.893us    1.200s
    Total    791992                                                    118.930s
========= ========= ========= ========= ========= ========= ========= =========
Overall, the results display less variation, but the median "get" time is roughly 10x slower (14 microseconds vs. 146 microseconds). Despite that large slowdown at the median, the total time for "get" operations is ~33% lower (160 seconds vs. 106 seconds). The "set" and "delete" operations have comparable performance.
The LRU eviction policy turns every read into a write, which makes the django.utils.synch.RWLock used in the current implementation pointless. The RWLock could give multiple readers concurrent access, but only at the cost of additional overhead. That is why, in the "no contention" scenario, the new threading.RLock has less overhead. In the "high contention" scenario, the RWLock's concurrent reads give the current implementation an advantage at the median but a larger penalty at the 99th percentile.
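To make the locking tradeoff concrete, here is a rough sketch (assumed shape, not the literal patch) of what the new read path looks like: a single threading.RLock guards the OrderedDict, and even a cache hit mutates the ordering.

import threading
from collections import OrderedDict

_lock = threading.RLock()
_cache = OrderedDict()

def get(key, default=None):
    with _lock:  # exclusive for readers and writers alike; no shared read mode
        try:
            value = _cache[key]
        except KeyError:
            return default
        _cache.move_to_end(key, last=False)  # the hit reorders the dict, so it is a write
        return value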
I think these tradeoffs are easy to accept. Note that the benchmark does not include a miss-rate penalty; the LRU eviction policy will almost certainly improve the miss rate compared with the current random eviction policy.
comment:7 by , 7 years ago
Cc: | added |
---|
comment:8 by , 7 years ago
Cc: | added |
---|
comment:9 by , 7 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:10 by , 7 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Some test coverage is missing as noted on the PR.
comment:12 by , 7 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
I'm not sure how to evaluate the pros and cons of that change. I guess the idea should be raised on the DevelopersMailingList.