#23830 closed Cleanup/optimization (worksforme)
Make expired session clearing on database backend scale better
Reported by: | Adam Johnson | Owned by: | Adam Johnson |
---|---|---|---|
Component: | contrib.sessions | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
In two large Django projects now I've had to swap out the default clear_expired implementation on the database SessionStore for a chunked approach, since a single delete across the whole table became a bottleneck query. It would be better if the clear_expired implementation that ships with Django's database backend were already chunked, so Django apps scale better.
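For illustration, a minimal sketch of the kind of chunked approach I mean (the clear_expired_chunked helper and CHUNK_SIZE constant are hypothetical names, not the actual patch):

```python
from django.contrib.sessions.models import Session
from django.utils import timezone

CHUNK_SIZE = 10000  # hypothetical tuning knob

def clear_expired_chunked():
    """Delete expired sessions in bounded batches instead of one big DELETE."""
    while True:
        # Grab at most CHUNK_SIZE primary keys of expired sessions.
        pks = list(
            Session.objects.filter(expire_date__lt=timezone.now())
            .values_list("pk", flat=True)[:CHUNK_SIZE]
        )
        if not pks:
            break
        # Each DELETE touches a bounded number of rows, keeping the
        # transaction and index locks short.
        Session.objects.filter(pk__in=pks).delete()
```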
Change History (7)
comment:1 by , 10 years ago
Status: | new → assigned |
---|---|
comment:2 by , 10 years ago
PR: https://github.com/django/django/pull/3520
Adapted from some production code.
Tests pass, but there is naturally no test for >10k sessions. I tried one out and of course it was very slow. We could make the chunk size an extra setting, and that would make it testable?
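Something like this is what I mean by making the chunk size a setting (a rough sketch; SESSION_CLEAR_EXPIRED_CHUNK_SIZE is an invented name, not an existing Django setting):

```python
from django.conf import settings

# Hypothetical setting so tests can shrink the chunk size;
# fall back to a large default in production.
CHUNK_SIZE = getattr(settings, "SESSION_CLEAR_EXPIRED_CHUNK_SIZE", 10000)
```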
comment:3 by , 10 years ago
The patch is a partial improvement because it does not instantiate all the objects. However, 'IN' query statements with a huge number of ids in them are not exactly fast (to say it nicely) in most databases, so I think it won't be faster in general (not tested, though).
Actually, in this specific case a DELETE FROM session WHERE expire_date < '20XX-XX-XX' would deliver a huge performance boost: the expire_date field is indexed in the database, and the query does not create a long-running transaction in which the records are deleted one by one.
So I would propose a solution which constructs a DELETE FROM ... WHERE statement and executes it.
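Roughly something like this (a hypothetical clear_expired_raw helper, not tested; django_session is the default table name):

```python
from django.db import connection
from django.utils import timezone

def clear_expired_raw():
    """Delete all expired sessions in a single indexed DELETE."""
    with connection.cursor() as cursor:
        # expire_date is indexed, so the WHERE clause is cheap to
        # evaluate; the DELETE itself still touches every expired row,
        # and bypassing the ORM means no model signals fire.
        cursor.execute(
            "DELETE FROM django_session WHERE expire_date < %s",
            [timezone.now()],
        )
```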
comment:4 by , 10 years ago
Giving the patch a second thought, it breaks possible pre_delete and post_delete signal handlers, as they are not triggered this way.
The same problem applies to my own proposal above.
So I think your patch is not a solution.
On the other hand, how does it happen that there are so many expired sessions to remove? Can't you just run the session cleanup more often?
comment:5 by , 10 years ago
I don't think the manual chunking solution proposed on the PR is acceptable; it makes the code quite scary.
Also, when I tested clearsessions, I do get a single query:
QUERY = 'DELETE FROM "django_session" WHERE "django_session"."expire_date" < %s' - PARAMS = (u'2014-11-20 16:13:19.188990',); args=(u'2014-11-20 16:13:19.188990',)
I wonder if you have a FK to Session or a signal handler that's preventing fast-path deletion; see the QuerySet.delete() docs.
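For example, a receiver like this (hypothetical, for illustration) is enough to take QuerySet.delete() off the fast path, because Django must then fetch each row to send it through the signal instead of issuing the single DELETE above:

```python
from django.contrib.sessions.models import Session
from django.db.models.signals import post_delete
from django.dispatch import receiver

# Merely connecting a receiver for Session disables fast-path deletion.
@receiver(post_delete, sender=Session)
def log_session_removal(sender, instance, **kwargs):
    print("session deleted:", instance.session_key)
```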
comment:6 by , 10 years ago
Resolution: | → worksforme |
---|---|
Status: | assigned → closed |
comment:7 by , 10 years ago
I can confirm there were no signal handlers or foreign keys causing the fast deletion path to be skipped. The problem (at my previous employer's, Memrise) was literally the DELETE FROM ... WHERE query touching too many rows in one operation and locking the index(es) whilst it removed them. The session table was seeing relatively high writes (a memcached fleet handled reads), as there were a number of features mutating session data.
Actually, at one point clearsessions made the site go down briefly. The daily cron task had not run for a couple of weeks due to misconfiguration, so when it did run, the DELETE query was removing around half the table. This was all in MySQL, but I'm sure Postgres would behave similarly with such a large table.
If you don't want to introduce chunked deletion into Django core, I understand; this problem must only occur on such high-traffic sites. I'm now using tools like pt-archiver ( http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html ) for such cleanup operations anyway. At least I had my first go at submitting a Django patch, thanks!