Opened 13 years ago

Closed 11 years ago

Last modified 9 years ago

#17001 closed New feature (fixed)

Allow usage of custom querysets in prefetch_related

Reported by: Anssi Kääriäinen Owned by: loic84
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: prefetch_related
Cc: marc.tamlyn@…, AndrewIngram, k.sikora@…, ethan.jucovy@…, as@…, mszamot@…, jonatan.lundin@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

#16973 introduced prefetch_related. I would like to extend it so that one can use custom querysets for the fetching. This also requires the ability to define the attribute name to use for the prefetched objects, as it is not valid to return filtered objects from related_manager.all(). There is also the possibility that one wants to fetch multiple objects of the same type with prefetch_related (young authors to one attribute, old authors to another).

The proposed API is following:

# The normal usage does not change
SomeModel.objects.prefetch_related('rel_model', 'rel_model__another_rel')
# You can pass R objects to prefetch_related
# The syntax for R is
# R(lookup_path, to_attr=None, qs=None)
# Thus:
SomeModel.objects.prefetch_related(R('rel_model'), R('rel_model__another_rel'))
# is equivalent to the first call.
# One can use to_attr alone:
SomeModel.objects.prefetch_related(R('rel_model', to_attr='foo_lst'), ...)
# custom queryset
SomeModel.objects.prefetch_related(
    R('rel_model', to_attr='foo_lst', qs=RelModel.objects.order_by('foo')),
    ...
)
# If qs is defined, to_attr must be defined also:
SomeModel.objects.prefetch_related(R('rel_model', qs=RelModel.objects.order_by('-pk')))
-> Error: Thou shalt not use qs without to_attr
# One can use the to_attr in nested calls:
qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)
# This will result in every object O in foo_lst to have attribute anot_lst containing
# related objects to O, fetched from the above given qs.
# Going still deeper would need to use syntax:
qs = qs.prefetch_related(
   R('foo_lst__anot_lst__third_model', qs=ThirdModel.objects.order_by('-pk'), to_attr='bar_lst'),
)

Attachments (2)

prefetch_extensions.diff (35.3 KB ) - added by Anssi Kääriäinen 13 years ago.
Now with settings.DEBUG removed
#17001-prefetch_extensions.diff (25.5 KB ) - added by German M. Bravo 11 years ago.
Fixed issue with multiple level prefetch

Download all attachments as: .zip

Change History (54)

comment:1 by Anssi Kääriäinen, 13 years ago

Has patch: set
Needs documentation: set

Ok, I think I have a ready patch for this. There are some TODO comments still there (a part of them are personal TODOs, some are real TODOs). And this still needs documentation.

About the documentation, I don't know if we should mention that it is possible to chain prefetches in this manner:

qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)

The problem is that it is really, really, easy to do this instead:

qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('ref_model__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)

Note that the lookup path in the second R-object isn't referring to foo_lst, but instead ref_model, so we will fetch the ref_models another time. I would even go so far as to explicitly prohibit the foo_lst__another_model usage, as it is confusing as hell.

Chaining prefetches can be done by using custom querysets which have prefetches. The only problem is that if you get a queryset which already has prefetches, you can't chain to the existing ones. One idea is to have a 'name' and 'chain_to' kwargs for R objects, and then you could chain by the name.

My suggestion is that lets leave this for 1.5. I would explicitly prohibit the foo_lst__another_model way of chaining. It can be resurrected if need be. Then there is still the usual naming issues.

by Anssi Kääriäinen, 13 years ago

Attachment: prefetch_extensions.diff added

Now with settings.DEBUG removed

comment:2 by Luke Plant, 13 years ago

Triage Stage: UnreviewedAccepted

@akaariai:

Accepted on the basis that the feature in general would be nice.

However, I haven't looked at the code at all. I doubt that I will have time to look at this before 1.4 is released. Part of me would actually like to leave the current version of prefetch_related() to get some testing in the wild before we build on top of it.

comment:3 by German M. Bravo, 12 years ago

I added an update for django 1.4 (tests still need to be upgraded) ...however, the question I still have is, the added attribute (to_attr) is a list, and I suppose it should be/act as a QuerySet instead, in my most humble opinion. Shouldn't it?

Last edited 12 years ago by German M. Bravo (previous) (diff)

comment:4 by iu_long, 12 years ago

What is the reasoning behind this in the R object's init ?

if qs is not None and not to_attr:
      raise ValueError('When custom qs is defined, to_attr '
              'must also be defined')

In my use case, I really need the custom prefetch to go on the same attribute and this prevents me from doing so.
Does not having a 'to_attr' with queryset break something or it is just for safety ( preventing accidents )?

comment:5 by Anssi Kääriäinen, 12 years ago

If I recall correctly it is there so you can't shadow an existing queryset. This patch will fetch the results as a list, not as a queryset when you use a custom qs. So, if you use custom qs to fetch authors for book, then book.authors.all() wouldn't work, the items would be in book.authors, and book.authors would be a list.

In general, we want this feature in Django, but the current API isn't something wanted in Django...

comment:6 by Marc Tamlyn, 12 years ago

Cc: marc.tamlyn@… added
Needs tests: set
Patch needs improvement: set

I would love this feature!

comment:7 by anonymous, 12 years ago

Pretty neat stuff!

comment:8 by ebertti@…, 12 years ago

Great feature!!! I will love this!!!

by German M. Bravo, 11 years ago

Fixed issue with multiple level prefetch

comment:9 by AndrewIngram, 11 years ago

Cc: AndrewIngram added

comment:10 by naktinis@…, 11 years ago

Would be great to have this! What is the status of this feature request currently?

comment:11 by anonymous, 11 years ago

It would be a life saver for me! What is the status of this feature?

comment:12 by Aymeric Augustin, 11 years ago

The current status of this feature is fully and accurately reflected in the description above.

There's a patch, but it needs improvements, tests, and documentation.

Since it's a fairly large and complex feature, gaining the support of a core developer will be critical to complete the patch and have it committed.

comment:13 by Karol Sikora, 11 years ago

Cc: k.sikora@… added

comment:14 by ethan.jucovy@…, 11 years ago

Cc: ethan.jucovy@… added

comment:15 by chokosabe@…, 11 years ago

The status of this ticket highlights everything that is wrong with how Django is being developed currently. 22 months! and still no movement on this!!!

comment:16 by Aymeric Augustin, 11 years ago

The tone of your comment highlight everything that is wrong with the consumer approach to open-source :)

If this issue bothers you so much, why don't you put your energy into resolving it rather than angry comments?

Seriously — your comment makes me *less* likely to contribute to this issue; I don't feel like helping someone who shouts on volunteers.

comment:17 by Anssi Kääriäinen, 11 years ago

Note that this ticket is totally dependant on finding a good API. The functionality was already implemented using R() objects, the problem is that the API wasn't usable enough for merge. So, find a nice API and it is very likely that this feature will be merged.

comment:19 by Anssi Kääriäinen, 11 years ago

Lets accept the proposed API. There weren't any objections, and at least two core developers (me + mjtamlyn) support this API. The proposed API feels more natural than the R() thing, and it should be powerful enough.

The accepted API is to add an overloaded form of .prefetch_related(). The current way works as always, that is .prefetch_related('somemodel', 'somemodel__someothermodel') will continue to work. The new alternative is .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age')).

Chaining to prefetched custom queryset isn't allowed, that is in the above example you can't do .prefetch_related('authors_by_age__books'). If you need books prefetched to authors_by_age, then doing .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age').prefetch_related('books')) should work.

The prefetch happens directly into a list. That is, obj.authors_by_age.filter(...) doesn't work. The obj.authors_by_age attribute is a list instance, containing the related objects. Not creating a QueyrSet for each obj is a big performance optimization (IIRC around 5x performance difference when fetching small amount of related objects in a large queryset). Fetching directly into a list should also make the implementation cleaner.

For the first implementation making sure fetching basic querysets works well should be enough. For example, if it turns out it will be hard to support prefetching values() querysets or generic relations, then throwing an error and documenting what doesn't work seems good enough.

in reply to:  19 comment:20 by Marc Tamlyn, 11 years ago

Replying to akaariai:

For the first implementation making sure fetching basic querysets works well should be enough. For example, if it turns out it will be hard to support prefetching values() querysets or generic relations, then throwing an error and documenting what doesn't work seems good enough.

Agreed. If it's trickier than just getting this patch in needs to be we can always delay it for a future patch.

comment:21 by alexpirine, 11 years ago

I am interested in this feature, so please don't take the following as a criticism, I am just trying to add a feedback from a Django user perspective.

I am trying to understand, what is wrong with the initial implementation using the R objects?

We already have “F” and “Q” objects for “advanced” queries, so why not “R” objects for some other “advanced” stuff?

The code looks less clear when you begin to mix object names and various parameters in prefetch_related(), and .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age').prefetch_related('books')) seems really weird and inconsistent with the current possibility of prefetch_related('authors_by_age__books').

Just to take the original example for comparison:

qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)

How does it translates in the new API since chaining is not allowed?

comment:22 by FunkyBob, 11 years ago

This follows some discussion on IRC with loic and akaariai...

Perhaps a more keywordy syntax?

To express that in the comment above:

qs = SomeModel.objects.prefetch_related(
    foo_list=Prefetch('rel_model', RelModel.objects.order_by('-pk')),
    anot_lst=Prefetch('foo_lst__another_model', AnotModel.objects.order_by('-pk')),
)

This also means you can avoid the repeated QS-copy cost for adding multiple prefetches.

And, as loic pointed out, you can get the "terse" syntax simply using:

from .... import Prefetch as R

in reply to:  22 comment:23 by alexpirine, 11 years ago

Cc: as@… added

Replying to FunkyBob:

To express that in the comment above:

qs = SomeModel.objects.prefetch_related(
    foo_list=Prefetch('rel_model', RelModel.objects.order_by('-pk')),
    anot_lst=Prefetch('foo_lst__another_model', AnotModel.objects.order_by('-pk')),
)

This syntax seems very nice, I like how it is even more concise without to_attr.

comment:24 by Anssi Kääriäinen, 11 years ago

I also like the kwargs based syntax. Maybe we should change the API to that?

One thing to consider is that kwargs aren't ordered. So, the API must not assume ordering there.

comment:25 by chokosabe, 11 years ago

I am more than happy to help with testing on this (or anything else that might need doing). There is no shortage of ideas on this; some(one/people) that can make a decision on this just need to do so....

Hate to point this out; can't imagine this (ongoing) discussion happening with a rails feature.... something would have been added already (it may well not be perfect or even right)...

comment:26 by Anssi Kääriäinen, 11 years ago

I have discussed this issue with multiple people on #django-dev. It seems kwargs based API is favoured. Lets do that.

The syntax is:

qs = SomeModel.objects.prefetch_related(
    rel_pk_ordered=Prefetch('rel_model', RelModel.objects.order_by('-pk')),
)

This means that the objects fetched will have rel_pk_ordered each, containing objects through relation 'rel_model' using given queryset. It is not possible to chain prefetches to custom prefetches, instead one should do:

qs = SomeModel.objects.prefetch_related(
    rel_pk_ordered=Prefetch('rel_model', RelModel.objects.order_by('-pk').prefetch_related(...)),
)

that is, issue the prefetch in the given custom queryset.

comment:27 by Marc Tamlyn, 11 years ago

+1 to that syntax, seems a good middle ground.

comment:28 by Ryan Hiebert, 11 years ago

Cc: Ryan Hiebert added

comment:29 by Ryan Hiebert, 11 years ago

Cc: Ryan Hiebert removed

comment:30 by mszamot@…, 11 years ago

Cc: mszamot@… added

comment:31 by jonatan.lundin@…, 11 years ago

Cc: jonatan.lundin@… added

comment:32 by loic84, 11 years ago

Owner: changed from nobody to loic84
Status: newassigned
Version: 1.3master

comment:33 by loic84, 11 years ago

WIP: https://github.com/loic/django/compare/prefetch_related.

It's rather experimental and still needs tons of work with tests and docs, but if anyone want to give it a try, that would help.

Prefetch takes two parameters: lookup and an optional queryset.

  1. Manager backed lookups with *args:
    # Traditional string lookup:
    Restaurant.objects.prefetch_related('pizzas__toppings')

    # Equivalent with Prefetch:
    Restaurant.objects.prefetch_related(Prefetch('pizzas__toppings'))

    # Lookup with custom queryset:
    Restaurant.objects.prefetch_related(
        Prefetch('pizzas__toppings', queryset=Toppings.object.filter(veggie=True)))
  1. list backed lookups with **kwargs:
    # Traditional lookup stored in a list (significantly faster, see #20577 and link below)
    Restaurant.objects.prefetch_related(pizza_list=Prefetch('pizzas'))

    # With custom QuerySet.
    Pizza.objects.prefetch_related(
        toppings_list=Prefetch('toppings', queryset=Toppings.object.filter(veggie=True))

    # Multiple level using nested prefetch_related:
    pizza_queryset = Pizza.objects.prefetch_related(
        Prefetch('toppings', queryset=Toppings.object.filter(veggie=True)))
    Restaurant.objects.prefetch_related(
        pizza_list=Prefetch('pizzas', queryset=pizza_queryset))

    # Alternatively, multiple level using kwarg lookups (nested prefetch_related actually rewrite to that):
    Restaurant.objects.prefetch_related(
        pizza_list=Prefetch('pizzas'),
        pizza_list__toppings_list=Prefetch('pizza_list__toppings', queryset=Toppings.object.filter(veggie=True))
  1. Mixing both approaches:
    Restaurant.objects.prefetch_related(
        'pizza_list__toppings_list',
         pizza_list=Prefetch('pizza'), 
    )

  1. Flattening is not supported, it doesn't seem trivial to implement, maybe for another ticket?
Restaurant.objects.prefetch_related(
    toppings_list=Prefetch('pizzas__toppings')
)

Additionally, here is a quick and dirty benchmark that compares lookups backed by list vs Manager: https://gist.github.com/loic/7198193.

Last edited 11 years ago by loic84 (previous) (diff)

comment:34 by loic84, 11 years ago

As @akaariai pointed out in #comment:24, kwargs are not ordered; yet ordering matters a fair bit.

For example my example from the previous comment:

    Restaurant.objects.prefetch_related(
        'pizza_list__toppings_list',
         pizza_list=Prefetch('pizza'), 
    )

This wouldn't work without some sorting in the implementation.

I then found out that *args ordering is something we actually test for, even though it doesn't seem documented. This implicit ordering allows for @properties to masquerade as managers.

Sorting by lookup like in my patch, would obviously mess that up.

We could:

  1. Do the sorting based on the lookup path and add an after argument to the Prefetch object, so you could do prefetch_related(Prefetch('primary_house__occupants', after='houses__rooms'), 'houses__rooms'). This would actually be backward incompatible for whomever already rely on the implicit ordering of *args, since you need to opt-in for a custom after.
  1. Require multiple calls to prefetch_related() when ordering matters: prefetch_related(pizza_list=Prefetch('pizza')).prefetch_related('pizza_list__toppings_list'). IMO this adds yet another construct to an already complicated API.
  1. Go back to having to_attr as a Prefetch argument rather than a kwarg, and finally document that ordering of *args actually matters. This is backward compatible, and it also makes for a cleaner implementation. I was actually on the verge of adding it (undocumented), only for the purpose of cleaning up the current implementation.

For the record, I favor 3) by a big margin.

Please note that while prefetch_related('foo__bar__baz', 'foo__bar') and prefetch_related('foo__bar', 'foo__bar__baz') were equivalent, it's not necessarily true anymore now that we can do prefetch_related(Prefetch('foo__bar', queryset=...), 'foo__bar__baz'), so the actual ordering becomes a more prominent concern than before.

comment:35 by Simon Charette, 11 years ago

In the light of @akaariai's comment concerning kwargs the only other alternative I could think of was to rely on a Prefetch instance creation counter to sort kwargs but as @loic84 pointed out in IRC it would be quite fragile.

I'd say I also favor the to_attr approach since it looks like the most sane way to avoid chaining prefetch_related to declare an ordering.

comment:36 by Marc Tamlyn, 11 years ago

I'm happy to have to_attr back as well, especially if it makes the code cleaner. This is a good smell.

comment:37 by loic84, 11 years ago

Needs tests: unset

to_attr is back, cleaned up implementation and added a fair amount of tests. I'll tackle the docs ASAP.

Made a PR https://github.com/django/django/pull/1826 to ease with review.

comment:38 by loic84, 11 years ago

Needs documentation: unset
Patch needs improvement: unset

Updated the PR with docs, and even more tests.

This should be ready for deep review.

comment:39 by loic84, 11 years ago

@akaariai wrote a great summary of the status of this patch on the django-developers mailing list: https://groups.google.com/d/msg/django-developers/LhXpbpSOHlg/n8xKrpFcvyMJ.

Feedback would definitely help to move this ticket forward.

comment:40 by loic84, 11 years ago

Tackled the multidb issue described on the ML post and rebased to the latest master.

comment:41 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In f51c1f590085556abca44fd2a49618162203b2ec:

Fixed #17001 -- Custom querysets for prefetch_related.

This patch introduces the Prefetch object which allows customizing prefetch
operations.

This enables things like filtering prefetched relations, calling select_related
from a prefetched relation, or prefetching the same relation multiple times
with different querysets.

When a Prefetch instance specifies a to_attr argument, the result is stored
in a list rather than a QuerySet. This has the fortunate consequence of being
significantly faster. The preformance improvement is due to the fact that we
save the costly creation of a QuerySet instance.

Thanks @akaariai for the original patch and @bmispelon and @timgraham
for the reviews.

comment:42 by abdulwahid24@…, 11 years ago

Hi guys,

How would i integrate this patch in Django 1.4.5 as i need this in my existing project running on production.

Thanks,

comment:43 by loic84, 11 years ago

@abdulwahid24 it shouldn't be too hard to backport this patch manually, just note that the get_prefetch_queryset methods used to be called get_prefetch_query_set, other than that I don't think the code for prefetching has significantly changed since 1.4.

comment:44 by abdulwahid, 11 years ago

Hi loic84,

I was trying it since morning but i found it very hard to build backport manually, i tried to integrated all changes related to "get_prefetch_queryset" and i did changes as per your changes https://code.djangoproject.com/changeset/f51c1f590085556abca44fd2a49618162203b2ec in following files.
1.django/contrib/contenttypes/generic.py
2.django/db/models/init.py
3.django/db/models/query.py
4.django/db/models/fields/related.py

also add one more file as it require for LOOKUP_SEP
5.django/db/models/constants.py

It works fine with following query

Person.objects.filter(profileperson_typename='Speaker').prefetch_related(Prefetch('event_speaker', queryset=Event.objects.all())).

But when i run

Person.objects.filter(profileperson_typename='Speaker').prefetch_related(Prefetch('event_speaker',queryset=Event.objects.latest('start_time')))

gives error

AttributeError: 'Event' object has no attribute '_add_hints here 'Event' is my table name.

I stuck here and failed to resolve this.

Could you help me out to overcome this issue.

Note:

I just want to fire a query which return its related data in queryset only so that i can avoid for loop to access the related data as "[obj.relate_data for obj in objects]" to achieve optimization.

comment:45 by loic84, 11 years ago

@abdulwahid QuerySet.latest() doesn't return a QuerySet instance, so you can't pass it as the queryset kwarg of prefetch_related, maybe you meant order_by('-start_time') instead?

I'm afraid I can't help you much more than that here, trac tickets aren't meant as a support channel, you can find me on #django on IRC if you want.

comment:46 by abdulwahid, 11 years ago

Thanks for you help.

comment:47 by Anssi Kääriäinen <akaariai@…>, 11 years ago

In 7bbb6958dcb6a450f926e5bb49b07391f801aef1:

Allowed custom querysets when prefetching single valued relations

The original patch for custom prefetches didn't allow usage of custom
queryset for single valued relations (along ForeignKey or OneToOneKey).
Allowing these enables calling performance oriented queryset methods like
select_related or defer/only.

Thanks @akaariai and @timgraham for the reviews. Refs #17001.

comment:48 by Loic Bistuer <loic.bistuer@…>, 10 years ago

In bdf3473e64879d196e79fe8832cdacc6efb68150:

Fixed #22650 -- Fixed regression on prefetch_related.

Regression from f51c1f59 when using select_related then prefetch_related
on the reverse side of an O2O:

Author.objects.select_related('bio').prefetch_related('biobooks')

Thanks Aymeric Augustin for the report and tests. Refs #17001.

comment:49 by Loic Bistuer <loic.bistuer@…>, 10 years ago

In 870b0a1f8689aa8fd4806623262228b8ac3a88a5:

Fixed the ordering of prefetch lookups so that latter lookups can refer to former lookups.

Thanks Anssi Kääriäinen and Tim Graham for the reviews. Refs #17001 and #22650.

comment:50 by Loic Bistuer <loic.bistuer@…>, 10 years ago

In 24a41ecc3590f8d92d085d4a4caa5f8bf962ceae:

[1.7.x] Fixed #22650 -- Fixed regression on prefetch_related.

Regression from f51c1f59 when using select_related then prefetch_related
on the reverse side of an O2O:

Author.objects.select_related('bio').prefetch_related('biobooks')

Thanks Aymeric Augustin for the report and tests. Refs #17001.

Backport of bdf3473e64 from master

comment:51 by Loic Bistuer <loic.bistuer@…>, 10 years ago

In 0fa1aeb8d8142bd6bceb0c9a36c8e42eeca21770:

[1.7.x] Fixed the ordering of prefetch lookups so that latter lookups can refer to former lookups.

Thanks Anssi Kääriäinen and Tim Graham for the reviews. Refs #17001 and #22650.

Backport of 870b0a1f86 from master

comment:52 by Simon Charette <charette.s@…>, 9 years ago

In 5d240b0:

Refs #17001 -- Added a test for custom prefetch related queryset on generic relations.

Note: See TracTickets for help on using tickets.
Back to Top