#17001 closed New feature (fixed)
Allow usage of custom querysets in prefetch_related
Reported by: | Anssi Kääriäinen | Owned by: | loic84 |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | prefetch_related |
Cc: | marc.tamlyn@…, AndrewIngram, k.sikora@…, ethan.jucovy@…, as@…, mszamot@…, jonatan.lundin@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
#16973 introduced prefetch_related. I would like to extend it so that one can use custom querysets for the fetching. This also requires the ability to define the attribute name to use for the prefetched objects, as it is not valid to return filtered objects from related_manager.all(). There is also the possibility that one wants to fetch multiple objects of the same type with prefetch_related (young authors to one attribute, old authors to another).
The proposed API is following:
# The normal usage does not change SomeModel.objects.prefetch_related('rel_model', 'rel_model__another_rel') # You can pass R objects to prefetch_related # The syntax for R is # R(lookup_path, to_attr=None, qs=None) # Thus: SomeModel.objects.prefetch_related(R('rel_model'), R('rel_model__another_rel')) # is equivalent to the first call. # One can use to_attr alone: SomeModel.objects.prefetch_related(R('rel_model', to_attr='foo_lst'), ...) # custom queryset SomeModel.objects.prefetch_related( R('rel_model', to_attr='foo_lst', qs=RelModel.objects.order_by('foo')), ... ) # If qs is defined, to_attr must be defined also: SomeModel.objects.prefetch_related(R('rel_model', qs=RelModel.objects.order_by('-pk'))) -> Error: Thou shalt not use qs without to_attr # One can use the to_attr in nested calls: qs = SomeModel.objects.prefetch_related( R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'), R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst') ) # This will result in every object O in foo_lst to have attribute anot_lst containing # related objects to O, fetched from the above given qs. # Going still deeper would need to use syntax: qs = qs.prefetch_related( R('foo_lst__anot_lst__third_model', qs=ThirdModel.objects.order_by('-pk'), to_attr='bar_lst'), )
Attachments (2)
Change History (54)
comment:1 by , 13 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
comment:2 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
@akaariai:
Accepted on the basis that the feature in general would be nice.
However, I haven't looked at the code at all. I doubt that I will have time to look at this before 1.4 is released. Part of me would actually like to leave the current version of prefetch_related()
to get some testing in the wild before we build on top of it.
comment:3 by , 13 years ago
I added an update for django 1.4 (tests still need to be upgraded) ...however, the question I still have is, the added attribute (to_attr) is a list, and I suppose it should be/act as a QuerySet
instead, in my most humble opinion. Shouldn't it?
comment:4 by , 12 years ago
What is the reasoning behind this in the R object's init ?
if qs is not None and not to_attr: raise ValueError('When custom qs is defined, to_attr ' 'must also be defined')
In my use case, I really need the custom prefetch to go on the same attribute and this prevents me from doing so.
Does not having a 'to_attr' with queryset break something or it is just for safety ( preventing accidents )?
comment:5 by , 12 years ago
If I recall correctly it is there so you can't shadow an existing queryset. This patch will fetch the results as a list, not as a queryset when you use a custom qs. So, if you use custom qs to fetch authors for book, then book.authors.all() wouldn't work, the items would be in book.authors, and book.authors would be a list.
In general, we want this feature in Django, but the current API isn't something wanted in Django...
comment:6 by , 12 years ago
Cc: | added |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
I would love this feature!
by , 12 years ago
Attachment: | #17001-prefetch_extensions.diff added |
---|
Fixed issue with multiple level prefetch
comment:9 by , 12 years ago
Cc: | added |
---|
comment:10 by , 11 years ago
Would be great to have this! What is the status of this feature request currently?
comment:12 by , 11 years ago
The current status of this feature is fully and accurately reflected in the description above.
There's a patch, but it needs improvements, tests, and documentation.
Since it's a fairly large and complex feature, gaining the support of a core developer will be critical to complete the patch and have it committed.
comment:13 by , 11 years ago
Cc: | added |
---|
comment:14 by , 11 years ago
Cc: | added |
---|
comment:15 by , 11 years ago
The status of this ticket highlights everything that is wrong with how Django is being developed currently. 22 months! and still no movement on this!!!
comment:16 by , 11 years ago
The tone of your comment highlight everything that is wrong with the consumer approach to open-source :)
If this issue bothers you so much, why don't you put your energy into resolving it rather than angry comments?
Seriously — your comment makes me *less* likely to contribute to this issue; I don't feel like helping someone who shouts on volunteers.
comment:17 by , 11 years ago
Note that this ticket is totally dependant on finding a good API. The functionality was already implemented using R() objects, the problem is that the API wasn't usable enough for merge. So, find a nice API and it is very likely that this feature will be merged.
comment:18 by , 11 years ago
ML discussion on proposed alternative API: https://groups.google.com/forum/?fromgroups=#!topic/django-developers/BSyGOR7cMEs
follow-up: 20 comment:19 by , 11 years ago
Lets accept the proposed API. There weren't any objections, and at least two core developers (me + mjtamlyn) support this API. The proposed API feels more natural than the R() thing, and it should be powerful enough.
The accepted API is to add an overloaded form of .prefetch_related(). The current way works as always, that is .prefetch_related('somemodel', 'somemodel__someothermodel')
will continue to work. The new alternative is .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age'))
.
Chaining to prefetched custom queryset isn't allowed, that is in the above example you can't do .prefetch_related('authors_by_age__books')
. If you need books prefetched to authors_by_age, then doing .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age').prefetch_related('books'))
should work.
The prefetch happens directly into a list. That is, obj.authors_by_age.filter(...) doesn't work. The obj.authors_by_age attribute is a list instance, containing the related objects. Not creating a QueyrSet for each obj is a big performance optimization (IIRC around 5x performance difference when fetching small amount of related objects in a large queryset). Fetching directly into a list should also make the implementation cleaner.
For the first implementation making sure fetching basic querysets works well should be enough. For example, if it turns out it will be hard to support prefetching values() querysets or generic relations, then throwing an error and documenting what doesn't work seems good enough.
comment:20 by , 11 years ago
Replying to akaariai:
For the first implementation making sure fetching basic querysets works well should be enough. For example, if it turns out it will be hard to support prefetching values() querysets or generic relations, then throwing an error and documenting what doesn't work seems good enough.
Agreed. If it's trickier than just getting this patch in needs to be we can always delay it for a future patch.
comment:21 by , 11 years ago
I am interested in this feature, so please don't take the following as a criticism, I am just trying to add a feedback from a Django user perspective.
I am trying to understand, what is wrong with the initial implementation using the R objects?
We already have “F” and “Q” objects for “advanced” queries, so why not “R” objects for some other “advanced” stuff?
The code looks less clear when you begin to mix object names and various parameters in prefetch_related()
, and .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age').prefetch_related('books'))
seems really weird and inconsistent with the current possibility of prefetch_related('authors_by_age__books')
.
Just to take the original example for comparison:
qs = SomeModel.objects.prefetch_related( R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'), R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst') )
How does it translates in the new API since chaining is not allowed?
follow-up: 23 comment:22 by , 11 years ago
This follows some discussion on IRC with loic and akaariai...
Perhaps a more keywordy syntax?
To express that in the comment above:
qs = SomeModel.objects.prefetch_related( foo_list=Prefetch('rel_model', RelModel.objects.order_by('-pk')), anot_lst=Prefetch('foo_lst__another_model', AnotModel.objects.order_by('-pk')), )
This also means you can avoid the repeated QS-copy cost for adding multiple prefetches.
And, as loic pointed out, you can get the "terse" syntax simply using:
from .... import Prefetch as R
comment:23 by , 11 years ago
Cc: | added |
---|
Replying to FunkyBob:
To express that in the comment above:
qs = SomeModel.objects.prefetch_related( foo_list=Prefetch('rel_model', RelModel.objects.order_by('-pk')), anot_lst=Prefetch('foo_lst__another_model', AnotModel.objects.order_by('-pk')), )
This syntax seems very nice, I like how it is even more concise without to_attr
.
comment:24 by , 11 years ago
I also like the kwargs based syntax. Maybe we should change the API to that?
One thing to consider is that kwargs aren't ordered. So, the API must not assume ordering there.
comment:25 by , 11 years ago
I am more than happy to help with testing on this (or anything else that might need doing). There is no shortage of ideas on this; some(one/people) that can make a decision on this just need to do so....
Hate to point this out; can't imagine this (ongoing) discussion happening with a rails feature.... something would have been added already (it may well not be perfect or even right)...
comment:26 by , 11 years ago
I have discussed this issue with multiple people on #django-dev. It seems kwargs based API is favoured. Lets do that.
The syntax is:
qs = SomeModel.objects.prefetch_related( rel_pk_ordered=Prefetch('rel_model', RelModel.objects.order_by('-pk')), )
This means that the objects fetched will have rel_pk_ordered each, containing objects through relation 'rel_model' using given queryset. It is not possible to chain prefetches to custom prefetches, instead one should do:
qs = SomeModel.objects.prefetch_related( rel_pk_ordered=Prefetch('rel_model', RelModel.objects.order_by('-pk').prefetch_related(...)), )
that is, issue the prefetch in the given custom queryset.
comment:28 by , 11 years ago
Cc: | added |
---|
comment:29 by , 11 years ago
Cc: | removed |
---|
comment:30 by , 11 years ago
Cc: | added |
---|
comment:31 by , 11 years ago
Cc: | added |
---|
comment:32 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Version: | 1.3 → master |
comment:33 by , 11 years ago
WIP: https://github.com/loic/django/compare/prefetch_related.
It's rather experimental and still needs tons of work with tests and docs, but if anyone want to give it a try, that would help.
Prefetch
takes two parameters: lookup
and an optional queryset
.
Manager
backed lookups with*args
:
# Traditional string lookup: Restaurant.objects.prefetch_related('pizzas__toppings') # Equivalent with Prefetch: Restaurant.objects.prefetch_related(Prefetch('pizzas__toppings')) # Lookup with custom queryset: Restaurant.objects.prefetch_related( Prefetch('pizzas__toppings', queryset=Toppings.object.filter(veggie=True)))
list
backed lookups with**kwargs
:
# Traditional lookup stored in a list (significantly faster, see #20577 and link below) Restaurant.objects.prefetch_related(pizza_list=Prefetch('pizzas')) # With custom QuerySet. Pizza.objects.prefetch_related( toppings_list=Prefetch('toppings', queryset=Toppings.object.filter(veggie=True)) # Multiple level using nested prefetch_related: pizza_queryset = Pizza.objects.prefetch_related( Prefetch('toppings', queryset=Toppings.object.filter(veggie=True))) Restaurant.objects.prefetch_related( pizza_list=Prefetch('pizzas', queryset=pizza_queryset)) # Alternatively, multiple level using kwarg lookups (nested prefetch_related actually rewrite to that): Restaurant.objects.prefetch_related( pizza_list=Prefetch('pizzas'), pizza_list__toppings_list=Prefetch('pizza_list__toppings', queryset=Toppings.object.filter(veggie=True))
- Mixing both approaches:
Restaurant.objects.prefetch_related( 'pizza_list__toppings_list' 'pizza_list': Prefetch('pizza'), )
- Flattening is not supported, it doesn't seem trivial to implement, maybe for another ticket?
Restaurant.objects.prefetch_related( toppings_list=Prefetch('pizzas__toppings') )
Additionally, here is a quick and dirty benchmark that compares lookups backed by list
vs Manager
: https://gist.github.com/loic/7198193.
comment:34 by , 11 years ago
As @akaariai pointed out in #comment:24, kwargs
are not ordered; yet ordering matters a fair bit.
For example my example from the previous comment:
Restaurant.objects.prefetch_related( 'pizza_list__toppings_list', pizza_list=Prefetch('pizza'), )
This wouldn't work without some sorting in the implementation.
I then found out that *args
ordering is something we actually test for, even though it doesn't seem documented. This implicit ordering allows for @properties
to masquerade as managers.
Sorting by lookup like in my patch, would obviously mess that up.
We could:
- Do the sorting based on the lookup path and add an
after
argument to thePrefetch
object, so you could doprefetch_related(Prefetch('primary_house__occupants', after='houses__rooms'), 'houses__rooms')
. This would actually be backward incompatible for whomever already rely on the implicit ordering of*args
, since you need to opt-in for a customafter
.
- Require multiple calls to
prefetch_related()
when ordering matters:prefetch_related(pizza_list=Prefetch('pizza')).prefetch_related('pizza_list__toppings_list')
. IMO this adds yet another construct to an already complicated API.
- Go back to having
to_attr
as aPrefetch
argument rather than akwarg
, and finally document that ordering of*args
actually matters. This is backward compatible, and it also makes for a cleaner implementation. I was actually on the verge of adding it (undocumented), only for the purpose of cleaning up the current implementation.
For the record, I favor 3) by a big margin.
Please note that while prefetch_related('foo__bar__baz', 'foo__bar')
and prefetch_related('foo__bar', 'foo__bar__baz')
were equivalent, it's not necessarily true anymore now that we can do prefetch_related(Prefetch('foo__bar', queryset=...), 'foo__bar__baz')
, so the actual ordering becomes a more prominent concern than before.
comment:35 by , 11 years ago
In the light of @akaariai's comment concerning kwargs the only other alternative I could think of was to rely on a Prefetch
instance creation counter to sort kwargs
but as @loic84 pointed out in IRC it would be quite fragile.
I'd say I also favor the to_attr
approach since it looks like the most sane way to avoid chaining prefetch_related
to declare an ordering.
comment:36 by , 11 years ago
I'm happy to have to_attr
back as well, especially if it makes the code cleaner. This is a good smell.
comment:37 by , 11 years ago
Needs tests: | unset |
---|
to_attr
is back, cleaned up implementation and added a fair amount of tests. I'll tackle the docs ASAP.
Made a PR https://github.com/django/django/pull/1826 to ease with review.
comment:38 by , 11 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
Updated the PR with docs, and even more tests.
This should be ready for deep review.
comment:39 by , 11 years ago
@akaariai wrote a great summary of the status of this patch on the django-developers mailing list: https://groups.google.com/d/msg/django-developers/LhXpbpSOHlg/n8xKrpFcvyMJ.
Feedback would definitely help to move this ticket forward.
comment:40 by , 11 years ago
Tackled the multidb issue described on the ML post and rebased to the latest master.
comment:41 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:42 by , 11 years ago
Hi guys,
How would i integrate this patch in Django 1.4.5 as i need this in my existing project running on production.
Thanks,
comment:43 by , 11 years ago
@abdulwahid24 it shouldn't be too hard to backport this patch manually, just note that the get_prefetch_queryset
methods used to be called get_prefetch_query_set
, other than that I don't think the code for prefetching has significantly changed since 1.4.
comment:44 by , 11 years ago
Hi loic84,
I was trying it since morning but i found it very hard to build backport manually, i tried to integrated all changes related to "get_prefetch_queryset" and i did changes as per your changes https://code.djangoproject.com/changeset/f51c1f590085556abca44fd2a49618162203b2ec in following files.
1.django/contrib/contenttypes/generic.py
2.django/db/models/init.py
3.django/db/models/query.py
4.django/db/models/fields/related.py
also add one more file as it require for LOOKUP_SEP
5.django/db/models/constants.py
It works fine with following query
Person.objects.filter(profileperson_typename='Speaker').prefetch_related(Prefetch('event_speaker', queryset=Event.objects.all())).
But when i run
Person.objects.filter(profileperson_typename='Speaker').prefetch_related(Prefetch('event_speaker',queryset=Event.objects.latest('start_time')))
gives error
AttributeError: 'Event' object has no attribute '_add_hints here 'Event' is my table name.
I stuck here and failed to resolve this.
Could you help me out to overcome this issue.
Note:
I just want to fire a query which return its related data in queryset only so that i can avoid for loop to access the related data as "[obj.relate_data for obj in objects]" to achieve optimization.
comment:45 by , 11 years ago
@abdulwahid QuerySet.latest()
doesn't return a QuerySet
instance, so you can't pass it as the queryset
kwarg of prefetch_related
, maybe you meant order_by('-start_time')
instead?
I'm afraid I can't help you much more than that here, trac tickets aren't meant as a support channel, you can find me on #django on IRC if you want.
Ok, I think I have a ready patch for this. There are some TODO comments still there (a part of them are personal TODOs, some are real TODOs). And this still needs documentation.
About the documentation, I don't know if we should mention that it is possible to chain prefetches in this manner:
The problem is that it is really, really, easy to do this instead:
Note that the lookup path in the second R-object isn't referring to foo_lst, but instead ref_model, so we will fetch the ref_models another time. I would even go so far as to explicitly prohibit the foo_lst
__
another_model usage, as it is confusing as hell.Chaining prefetches can be done by using custom querysets which have prefetches. The only problem is that if you get a queryset which already has prefetches, you can't chain to the existing ones. One idea is to have a 'name' and 'chain_to' kwargs for R objects, and then you could chain by the name.
My suggestion is that lets leave this for 1.5. I would explicitly prohibit the foo_lst
__
another_model way of chaining. It can be resurrected if need be. Then there is still the usual naming issues.