#5133 closed Uncategorized (fixed)
Memcached connections get left open (in certain circumstances).
Reported by: | Jacob | Owned by: | nobody |
---|---|---|---|
Component: | Core (Cache system) | Version: | dev |
Severity: | Normal | Keywords: | memcached |
Cc: | Michael Malone, rajesh.dhawan@…, harm.verhagen+django@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
In a few cases Django under mod_python can leave dangling connections to memcached open. We've had trouble tracking down the circumstances under which this happens, but when it does it can lead to memcached servers hitting their connection limits, which means caching stops. Nasty.
The attached patch should take care of this, but I'm not 100% sure it's the right approach.
Attachments (1)
Change History (23)
by , 17 years ago
Attachment: | memcached-cleanup-connections.patch added |
---|
comment:1 by , 17 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 17 years ago
comment:4 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Please try the latest version of python-memcached (which is 1.4.0 as of this writing). I believe that the error was there and that it has been resolved between 1.3.9 and 1.4.0.
comment:5 by , 17 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
I'm sorry, but I've been using the latest version of memcached (1.2.2) and python-memcached (1.4.0), when I watch these problems.
comment:6 by , 17 years ago
hans: Indeed, my apologies. I looked at 1.4.0 during the sprint and must have missed it.
here's a patch against python-memcached 1.4.0 that removes threadlocals and doesn't leave any dangling connections for us. The threadlocals bit appeared somewhere in 1.3.x and has been the source for problems for us too. You might also look at cmemcache. This ticket should definitely remain open until we figure something out.
comment:7 by , 17 years ago
Thanks, your patch seems to solve our problem. Our Site is running for 24 hours and we didn't had to restart memcached or apache.
follow-up: 9 comment:8 by , 17 years ago
Improved patch (really just nitpick): http://dpaste.com/hold/41946/
comment:9 by , 17 years ago
Removing thread locals gives me lots of bad errors after running okay for a few minutes:
Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py", line 82, in get_response response = callback(request, *callback_args, **callback_kwargs) File "/usr/lib/python2.5/site-packages/shotserver04/websites/views.py", line 93, in details preload_foreign_keys(browsers, browser_group=True) File "/usr/lib/python2.5/site-packages/shotserver04/common/object_cache.py", line 165, in preload_foreign_keys [getattr(instance, field_id) for instance in instances])) File "/usr/lib/python2.5/site-packages/shotserver04/common/object_cache.py", line 92, in get_many [KEY_FORMAT % (model._meta.db_table, 'id', id) for id in id_list]) File "/usr/lib/python2.5/site-packages/django/core/cache/backends/memcached.py", line 41, in get_many return self._cache.get_multi(map(smart_str,keys)) File "/usr/lib/python2.5/site-packages/python_memcached-1.41-py2.5.egg/memcache.py", line 717, in get_multi retvals[prefixed_to_orig_key[rkey]] = val # un-prefix returned key. KeyError: 'platforms_operatingsystem:id=6'
However, memcached-cleanup-connections.patch seems to fix the open connections problem on my system.
- Django trunk r7519
- memcached 1.1.12-1
- python-memcached-1.41
comment:10 by , 16 years ago
I just saw this happen on a client's production site after we updated the memcache client from 1.34 to 1.43 (server is 1.1.12). We ended up applying Jacob's original patch because it looked the least intrusive, but it feels like something that python-memcache should be fixing rather than Django having to do manual connection management (my only concern with Jacob's patch). We elected not to go with Matt's patch because it basically undoes something that was included in python-memcache 1.36, which was included to fix some (other) multi-threading issues, according to their ChangeLog.
I've filed a bug upstream with the python memcache providers, so we'll see if anything happens.
comment:11 by , 16 years ago
Cc: | added |
---|
comment:12 by , 16 years ago
milestone: | → 1.0 |
---|
Just heard back from the python-memcached maintainer and current behaviour is intended, to allow for more efficient pooling. This problem may well end up being because we're not getting the same thread-id each time inside the same mod_python process or something similar. Since we probably don't want to do inter-thread pooling (for the same reason we don't do database connection pooling), I'd suggest we go with Jacob's original patch for now as a way to manage the connection lifetimes in a well-defined fashion.
Not going to apply this immediately, in case anybody else on the list has bright ideas, but barring any good objections, I think this would be worth applying before 1.0. It's a "bug" in the sense that we're not cooperating well with memcache and problems that only appear when you're wildly successful and/or under high load are not a lot of fun to see.
comment:13 by , 16 years ago
If python-memcached can't handle being used in a multithreaded environment, which I recollect as the case when I looked at the code many moons ago, it isn't just a mod_python issue, but would also be an issue were mod_wsgi, fastcgi, paste server, cherrypy wsgi server were used in a multithreaded configuration. I suspect that python-memcached may also have a lot of issues if any of those hosting mechanisms were to kill off threads when they feel like they are no longer needed and then create new ones later on. This would possibly be an issue as not sure it would do cleanup of state when a thread no longer becomes used. Anyway, that later point is an issue for python-memcached and not Django unless Django had similar issues with retaining state for threads beyond their lifetime.
comment:14 by , 16 years ago
Cc: | added |
---|
comment:15 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
(In [8418]) Fixed #5133 -- Explicitly close memcached connections after each request
(similar to database connection management). We can't effectively manage the
lifecycle by pooling connections and recent versions of python-memcache can
lead to connection exhaustion in some quite reasonable setups.
comment:16 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
The patch takes care of connections kept open, but it introduces another problem - the need to open one or more tcp connections every single request.
With a simple loop, you can make a system run out of sockets easily - after a socket is closed, that port cannot be reused for an eternity, ranging from 1 minute to 4 depending on OS. If enough sockets get stuck in TIME_WAIT state, the server simply fails to connect to memcached and start serving everything from db again - that's not something you want to see on a site with sufficient traffic to need a memcached installation.
In my opinion, the cure is worse than the disease. There's an easy workaround available for the original problem: restart workers after a certain amount of requests. With max-request=500 on a 5 threads deamon process (mod_wsgi, times 20 processes), we never go over 100 connections on our memcached server, started with the default cap of 1024 connections. If you run mod_python, use MaxRequestsPerChild.
My current solution is to just noop the whole fix with one line in any .py: django.core.cache.backends.memcached.CacheClass.close = lambda x: None. It might be an idea to make it configurable so people can choose between disconnect after every request and keep it open until process restart.
comment:17 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
This bug was fixed many months ago, please file a newbug if you believe there is a new issue.
comment:18 by , 14 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
Severity: | → Normal |
Type: | → Uncategorized |
I have/had the following problem:
With memcached enabled, our site (dms.bildung.hessen.de) does not respond anymore after 10 - 20 minutes. Restarting memcached solves this problem for the next minutes. Therefor it's not a problem with apache or the program logic.
Therefore I tried the patch above. But it not solve our problem.
In the end I use the file-cache with no problems.