Opened 19 years ago

Closed 13 years ago

Last modified 5 years ago

#17 closed New feature (wontfix)

Metasystem optimization: Share select_related in memory

Reported by: Adrian Holovaty Owned by: Philippe Raoult
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: feature caching
Cc: miracle2k, Mike Scott, dcramer@…, brosner@…, paching@…, wbyoung@…, jesse.lovelace@…, calvin@…, David Larlet, haavikko@…, matt@…, romme@…, django@…, Trevor Caira, datavortex@…, Michael Malone, Alexander Koshelev, xiterrex@…, Craig de Stigter, eallik@…, dan.fairs@…, miloslav.pojman@…, James Pic Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When using select_related, each cache is stored separately in memory. For
example, each Choice object here has its Poll cached, but each of those caches
is a separate object in memory. It would require less memory if the caches were
references to the same Poll object.

>>> from djangomodels.polls import choices
>>> choice_list = choices.get_list(poll__id__exact=90, select_related=True)
>>> id(choice_list[0]._poll_cache)
-156344020
>>> id(choice_list[1]._poll_cache)
-156344084

Attachments (9)

patch (4.0 KB ) - added by (removed) 17 years ago.
rough stab at instance caching
patch2.diff (4.0 KB ) - added by David Cramer <dcramer@…> 17 years ago.
corrected an issue with pk_val not being set
instance-caching-2.patch (20.9 KB ) - added by (removed) 17 years ago.
instance-caching-v2
instance-caching-3.patch (11.2 KB ) - added by (removed) 17 years ago.
instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.
5737_instance_caching.diff (8.6 KB ) - added by Brian Rosner <brosner@…> 17 years ago.
updated patch to #5737 with some minor edits
instance-caching-4.patch (12.7 KB ) - added by (removed) 17 years ago.
split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).
instance-caching-5.patch (12.8 KB ) - added by (removed) 17 years ago.
add in a class attr able to disable instance caching for that model.
17_no_tests.diff (10.8 KB ) - added by Philippe Raoult 17 years ago.
cleanup and comments. Not working but comments welcome
17.diff (19.5 KB ) - added by Philippe Raoult 17 years ago.
added tests, and style issues

Download all attachments as: .zip

Change History (73)

comment:1 by Here, 18 years ago

Type: enhancementdefect

comment:2 by Gary Wilson <gary.wilson@…>, 18 years ago

Triage Stage: UnreviewedAccepted

Barring any gotchas, I don't think anyone would argue with this.

comment:3 by (removed), 17 years ago

Cc: ferringb@… added

Worth noting this also will speed things up pretty nicely for mass instance creation- Model.init

Downside to such instance caching is that currently with django, you can get away with modifying an instance and decide to not commit the changes- if it's a shared instance, that no longer flies. Have a patch locally that does caching, but had a few failures test wise.

Still any interest? If so, can dig it out and update it.

comment:4 by Chris Beaven, 17 years ago

Can't harm to attach your patch to this ticket, even if it still needs work.

by (removed), 17 years ago

Attachment: patch added

rough stab at instance caching

comment:5 by (removed), 17 years ago

Offhand, that patch breaks a couple of tests; screws up a couple of tests that rely on checking the # of queries to see if things behaved (those tests are slightly broke imo anyways, since they're not checking *what* the queries were about, just that the db was touched N times).

Basically consists of two changes;
change 1: modify Model metaclass overriding default call to do a weakref cache lookup, returning the instance if it's already available; additionally, modify the save method to update the cache for new instances post commit so that they're reused.

change 2: mangle fields.ReverseSingleRelatedObjectDescriptor inserting a cache lookup; did it this way since bastardizing the query api didn't seem wise, but changing record_inst.foreign_key to try and use cached instances (thus avoiding hitting the db) *is* useful.

Mind you, been a while since I've done serious testing against this, but recall change #2 being particularly useful for select_related; reduction of 32s to 23s for 53k record set joining in around 20 unique foreign keys. 14.6k records with a simple display, reduction from 11.8s to 5.4s; mainly via again avoiding the overkill of select_related, and via dodging Model.init; *very* heavy function call runtime wise.

Fair warning also; works for my own uses, but definitely needs heavier testing (and unittests). Basically is a proof of concept.

comment:6 by Chris Beaven, 17 years ago

Has patch: set
Needs tests: set
Patch needs improvement: set

comment:7 by David Cramer <dcramer@…>, 17 years ago

Cc: dcramer@… added

We are running this live on curse-gaming.com now -- in a brief set of tests I ran we gained 30% speed and 80% more efficiency on memory use in our apps. We ran into one issue w/ a set of models doing .select_related() (at a depth of 2) but it wasn't serious for us (as we don't do 2nd depth select_related's.. i dont recommend anyone to do it) and this change is VERY important for load optimization.

comment:8 by Brian Rosner <brosner@…>, 17 years ago

Cc: brosner@… added

by David Cramer <dcramer@…>, 17 years ago

Attachment: patch2.diff added

corrected an issue with pk_val not being set

comment:9 by David Cramer <dcramer@…>, 17 years ago

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError

in reply to:  9 comment:10 by (removed), 17 years ago

re: patch2.diff, that's broke- note

self.__instance_cache__.setdefault(getattr(self, self._meta.pk.attname, self))

that setdefault's a None in place. Should be

self.__instance_cache__[getattr(self, self._meta.pk.attname)] = self

instead; Forces the cache to reuse the last saved version whenever the next request comes in.

Aside from that, patch has issues with json/xml serialization bits which basically just requires disabling instance caching for generation of the 'detached' instance that haven't yet been saved.

Replying to David Cramer <dcramer@gmail.com>:

There is still an issue with delete objects. We are implementing patch #1796 as I was told it may help resolve this issue (and we also needed to resolve the issue it addresses).

The current problem is it's trying to .delete() it twice (at least it seems) so when the second time comes around there is no id attribute set so it throws an AssertionError

Got a test case? I'm offline for the next 36 hours, but have got a local patch (will post when the system is online) passing all unittests at this point- haven't seen anything delete related in local fixing.

by (removed), 17 years ago

Attachment: instance-caching-2.patch added

instance-caching-v2

comment:11 by anonymous, 17 years ago

Cc: paching@… added

by (removed), 17 years ago

Attachment: instance-caching-3.patch added

instance caching v3; add appropriate cache cleaning to Model.delete, and tweak queries delete_objects to prune the cache while it's wiping pk instances. Only potential issue I'm aware of at this point is if there are multiple pk's for a record, which is screwy behaviour for django in general.

comment:12 by anonymous, 17 years ago

Cc: wbyoung@… added

comment:13 by anonymous, 17 years ago

Cc: jesse.lovelace@… added

comment:14 by Brian Rosner <brosner@…>, 17 years ago

I have updated this patch to trunk. I made some minor changes to some naming of variables and comments. The test suite errors with 4 errors related to data from the instance being a byte-string and not unicode. I searched through all the code and could not find why. This doesn't have anything to do with how the files are being saved or accessed? Perhaps it is just some setting on my side. Once I can grab some more time I will write up some more tests for this patch to ensure it works better.

by Brian Rosner <brosner@…>, 17 years ago

Attachment: 5737_instance_caching.diff added

updated patch to #5737 with some minor edits

comment:15 by Brian Rosner <brosner@…>, 17 years ago

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

in reply to:  15 comment:16 by (removed), 17 years ago

Patch needs improvement: unset

Replying to Brian Rosner <brosner@gmail.com>:

Oh and I wanted to mention that I removed some changes in areas that really don't pertain to what this ticket is trying to accomplish. While the changes were better they are not really related.

Only unrelated changes (patch wise) you removed were conversion of range in delete to xrange; those might as well stay (already tweaking that area of the file).

The tests changes that were dropped (forced conversion of a few test args up front) however *are* required- run the tests, it'll hork. They're needed- that's more a sign that the misc fields are screwed up since they're not forcing unicode conversion, but that's a seperate mess only exposed by tests.

Either way, a version of this is maintained in a bzr branch at http://pkgcore.org/~ferringb/bzr/instance-caching/ ; using it for curse these days, so I keep it reasonably within trunk (week lag is usually the most).

Related to that, please don't do misc var renames when correcting for bit rot- not trying to be possessive, moreso it's extra noise when trying to nail down exactly what was updated between patch versions, and changes the api the functionality provides when kwargs are mangled.

Finally, dropping the "patch needs improvement"; not really relevant at this point (my earlier comments have long since been fixed).

by (removed), 17 years ago

Attachment: instance-caching-4.patch added

split against 5737; preserving test updates (thus tests pass), and xrange updates (as said, modifying 4 lines away).

by (removed), 17 years ago

Attachment: instance-caching-5.patch added

add in a class attr able to disable instance caching for that model.

comment:17 by Simon G. <dev@…>, 17 years ago

Needs tests: unset
Triage Stage: AcceptedReady for checkin

This looks good, tests and everything, and would be a fantastic improvment, so I'm bumping this to RFC.

comment:18 by Jacob, 17 years ago

Needs documentation: set
Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

This is a very good idea, but isn't quite ready for prime-time. Some things that need fixing:

  • The class attribute feels hackish, and is badly named. I'd prefer something simpler like Model.objects.select_related(cache=False) -- any reason not to do this?
  • A note needs to be added to the docs about this behavior so that people don't get bitten by stale select related stuff.
  • There's a lot of dense code here without docstrings/comments; I can barely follow what's going on. Some more explanation of what's happening would be very nice with a feature of this complexity.


comment:19 by Philippe Raoult, 17 years ago

The class attribute allows very fine-grained control over caching: you can select which classes will be cached. But a cache=False argument to select_related would be a nice complement.

comment:20 by Philippe Raoult, 17 years ago

Owner: changed from nobody to Philippe Raoult

by Philippe Raoult, 17 years ago

Attachment: 17_no_tests.diff added

cleanup and comments. Not working but comments welcome

comment:21 by Philippe Raoult, 17 years ago

My patch addresses cleans up and documents the doc, and adds way more tests. It also removed unrelated changes that were present in the original patch. It's still subject to jacob's points 1 and 2, namely how to configure the cache and lack of user doc. Note that by default the cache is disabled so it's safe.

by Philippe Raoult, 17 years ago

Attachment: 17.diff added

added tests, and style issues

comment:22 by Philippe Raoult, 17 years ago

Keywords: feature caching added

comment:23 by Philippe Raoult, 17 years ago

Status: newassigned

comment:24 by James Bennett, 17 years ago

See also #3369, which was closed in favor of this ticket.

comment:25 by Bastian Kleineidam <calvin@…>, 17 years ago

Cc: calvin@… added

comment:26 by Philippe Raoult, 17 years ago

See #5514 for a use case (besides the performance/memory gains).

comment:27 by David Larlet, 17 years ago

Cc: larlet@… added

comment:28 by Michael Axiak, 17 years ago

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).
  2. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

in reply to:  28 ; comment:29 by anonymous, 17 years ago

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

  1. Cache cleanup (garbage collection) -- Maybe we can attach a cleanup of the local cache to the response signal. This should be thought through though.

The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

in reply to:  29 comment:30 by Michael Axiak, 17 years ago

Replying to anonymous:

Replying to axiak:

I was talking with PhiR in the IRC channel. I would just like to mention two issues:

  1. Thread safety -- for python > 2.3 we can wrap the storage in threading.local to avoid issues. Though maybe making it thread-aware could lead to other enhancements (and lots of trouble).

Each thread gets it's own view of the cache- trying to allow instance sharing across threads means that thread 1 (which is doing an edit that will ultimately fail) is screwing with the data thread 2 (which is doing a simple view rendering) uses, meaning bad mojo... and if someone suggests locking to try and duck that, I wedgie them, that's even worse mojo ;)

As I understand it the storage is part of the class object, which is global, no? So currently the patch does not behave like this and this will need to be addressed.

[...]
The cache is a WeakValueDictionary... meaning weak ref'ing of the instances. This means there is no garbage collection issues- if a cycle exists to hold it in memory, it would exist without this code anyways.

Ah, didn't see that.

comment:31 by Philippe Raoult, 17 years ago

If you are interested in this ticket you should check out:

http://code.djangoproject.com/wiki/DjangoSpecifications/Core/SingleInstance

If you notice something that this page doesn't mention please add it.

comment:32 by anonymous, 16 years ago

milestone: 1.0
Version: queryset-refactor

comment:33 by anonymous, 16 years ago

milestone: 1.0
Version: queryset-refactor

comment:34 by oliver@…, 16 years ago

Cc: oliver@… added

comment:35 by mrts, 16 years ago

milestone: post-1.0
Version: SVN

Not in scope for 1.0.

comment:36 by Matti Haavikko, 16 years ago

Cc: haavikko@… added

comment:37 by anonymous, 16 years ago

Cc: Mike Scott added

comment:38 by mattrussell, 16 years ago

Cc: matt@… added

comment:39 by (removed), 16 years ago

Cc: ferringb@… removed

comment:40 by RommeDeSerieux, 16 years ago

Cc: romme@… added

comment:42 by David Cramer, 16 years ago

Ive worked up a quick "SingletonModel" piece of code. It allows you to define a model, by inheriting from SingletonModel, as a Singleton. It works almost identically to this patch, barring the removal of some unneeded code.

http://github.com/dcramer/django-singletons/tree/master

comment:43 by David Cramer, 16 years ago

Just a note, django-singletons was renamed to django-idmapper: http://github.com/dcramer/django-idmapper/tree/master

My mistake for the incorrect terminology. It *only* provides an Identity Mapper, just as this patch does.

comment:44 by Greg Wogan-Browne, 16 years ago

Cc: django@… added

comment:45 by Trevor Caira, 16 years ago

Cc: Trevor Caira added

comment:46 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:47 by David Larlet, 16 years ago

Cc: David Larlet added; larlet@… removed

comment:48 by datavortex, 15 years ago

Cc: datavortex@… added

comment:49 by Michael Malone, 15 years ago

Cc: Michael Malone added

comment:50 by Alexander Koshelev, 15 years ago

Cc: Alexander Koshelev added

comment:51 by Guillermo Gutiérrez, 15 years ago

Cc: xiterrex@… added

comment:52 by miracle2k, 15 years ago

Cc: miracle2k added

comment:53 by Craig de Stigter, 14 years ago

Cc: Craig de Stigter added

comment:54 by Erik Allik, 14 years ago

Cc: eallik@… added

comment:55 by Dan Fairs, 14 years ago

Cc: dan.fairs@… added

comment:56 by Miloslav Pojman, 14 years ago

Cc: miloslav.pojman@… added

comment:57 by Łukasz Rekucki, 14 years ago

Severity: normalNormal
Type: defectCleanup/optimization

comment:58 by Jannis Leidel, 14 years ago

Type: Cleanup/optimizationNew feature

This is really a new feature.

comment:59 by Oliver Beattie, 14 years ago

Cc: oliver@… removed

comment:60 by Jacob, 13 years ago

Easy pickings: unset
Resolution: wontfix
Status: assignedclosed
UI/UX: unset

I've spoken to Russ, and he concurs with my decision that this would fundamentally change the ORM, and it can't be part of Django core.

However, it's been proven that this is possible (albeit tricky) in external packages -- JohhnyCache, for example. Concrete suggestions about how to make things like Johnny easier are super-welcome, but for clarity those should get fresh new tickets.

comment:61 by James Pic, 6 years ago

Do you remember the discussion you had with Russ about this ?

Is this good for re-opening ?

  • The ORM is much cleaner now than 7 years ago,
  • Many great ORMs have a builtin Identity Map, why can't Django's ?

Identity Map is a common programming pattern:

Last edited 6 years ago by James Pic (previous) (diff)

comment:62 by James Pic, 6 years ago

Cc: James Pic added

comment:64 by James Pic, 6 years ago

Thanks for the heads up Tim.

Last edited 5 years ago by James Pic (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top