Opened 22 months ago

Closed 22 months ago

Last modified 17 months ago

#34413 closed New feature (duplicate)

Variant of Prefetch but for the earliest/latest related object

Reported by: Willem Van Onsem Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Willem Van Onsem)

A frequently asked feature that seems to be missing is fetch the earlest/latest related object for each item. Indeed, we can for example work with a subquery to fetch *the primary key* of the earliest/latest related object, but not that object itself.

It turns out however that automating this is not that complicated. What we need to do is (automatically) construct a reverse filter, so if we want the latest Comment of each Post, we make a queryset that makes a Comment.objects.filter(post_id=OuterRef('pk')), then we convert that into a subquery that will, for each Post fetch the primary key of the latest comment, and slightly alter the logic that is already used to for prefetching to then prefetch all the Comments for these primary keys, and add attributes to the corresponding Posts.

I made a small proof-of-concept that should normally work for (most) cases: it allows one to use an arbitrary queryset and specify an ordering (or use the ordering already in the queryset or by the model). It can also span over multiple layers. It requires to *always* specify a to_attr, since, especially when spanning over multiple relations, the default to_attr would require double underscores.

The most ugly part is that we work with an annotation that is then accessible by the user, so it is not somehow hidden. Technically we could remove the attribute, or do something extra in the ORM to prevent exposing this attribute.

I did not yet check what querysets will be problematic. For example a sliced queryset would (very likely) not work, and likely most/all restrictions already in place for a the queryset of a Prefetch object are still applicable.

A simple demonstration of the PrefetchEarliest could be:

User.objects.prefetch_related(PrefetchEarliest('groups', to_attr='first_group', Group.objects.order_by('name'))  		

Attachments (1)

prefetching_the_earliest_latest_related_object_.patch (8.1 KB ) - added by Willem Van Onsem 22 months ago.

Download all attachments as: .zip

Change History (11)

comment:1 by Willem Van Onsem, 22 months ago

Needs tests: set

comment:2 by Willem Van Onsem, 22 months ago

Description: modified (diff)

comment:3 by Simon Charette, 22 months ago

Resolution: duplicate
Status: assignedclosed

Pretty sure this is a duplicate of #26780 fixed in the soon to be released 4.2.

Since 242499f2dc2bf24a9a5c855690a2e13d3303581a Prefetch(queryset) supports slicing through the use of filtering over partitioned rank (RANK() OVER (PARTITION BY <join_field> ORDER BY <order_by>) so earliest and latest related objects (or any slice of related objects for that matter) can be prefetched.

User.objects.prefetch_related(
    Prefetch("groups", queryset=Group.objects.order_by('name')[:1], to_attr="first_group")
)

comment:4 by Willem Van Onsem, 22 months ago

Description: modified (diff)

in reply to:  3 comment:5 by Willem Van Onsem, 22 months ago

Replying to Simon Charette

What would happen if we span over multiple relations, like:

User.objects.prefetch_related(
    Prefetch("groups__permissions", queryset=Permission.objects.order_by('codename')[:1], to_attr="first_permission")
)

comment:6 by Willem Van Onsem, 22 months ago

Description: modified (diff)

comment:7 by Simon Charette, 22 months ago

Nothing special really, there's one query per depth of prefetching and Prefetch.queryset is always for the trailing relation of the lookup so something along the lines of

SELECT auth_group.*
FROM auth_group
JOIN user_groups ON (user_groups.group_id = auth_group.id)
WHERE user_groups.user_id IN ($user_ids)

then

SELECT * FROM (
    SELECT
        auth_permission.*,
        RANK() OVER (
            PARTITION BY auth_group_permissions.group_id
            ORDER BY auth_permission.name
        ) _prefetch_rank
    FROM auth_permission
    JOIN auth_group_permissions ON (auth_permission.id = auth_group_permissions.permission_id)
    WHERE auth_group_permissions.group_id IN ($groups_ids)
) qualify
WHERE _prefetch_rank <= 1

You can confirm it's working as expected by checking out the latest 4.2 beta and reporting any issues you might encounter.

Last edited 22 months ago by Simon Charette (previous) (diff)

in reply to:  7 comment:8 by Willem Van Onsem, 22 months ago

Replying to Simon Charette
Well that was what I thought, the semantics differ: the Prefetch will in this case fetch the earliest Permission on each Group prefetched for that user, whereas the PrefetchEarliest would annotate the earliest Permission to each User. So it limits "bandwidth", and annotates at the same layer. The two therefore don't seem duplicates if I understand correctly?

comment:9 by Simon Charette, 22 months ago

Not sure I understand the rationale here, what does it even mean to annotate the earliest Permission to each User since User relates to Permission through groups and that's how prefetch_related operates.

Why not use a Subquery annotation instead of a prefetch for this particular case?

User.objects.annotate(
    first_permission_id=Permission.objects.filter(
        groups__users=OuterRef('pk')
    ).order_by('name').values('id')[:1]
)
SELECT
    user.*,
    (
        SELECT permission.id
        FROM permission
        JOIN group_permissions ON (group_permissions.permission_id = permission.id)
        JOIN user_groups ON (user_groups.group_id = group_permissions.group_id)
        WHERE user_groups.user_id = user.id
        ORDER BY permission.name
        LIMIT 1
    ) first_permission
FROM user

Isn't this report more about the ability of subquery to return more than one column so the following is possible

SELECT
    user.*,
    first_permission.*
FROM user,
    (
        SELECT permission.*
        FROM permission
        JOIN group_permissions ON (group_permissions.permission_id = permission.id)
        JOIN user_groups ON (user_groups.group_id = group_permissions.group_id)
        WHERE user_groups.user_id = user.id
        ORDER BY permission.name
        LIMIT 1
    ) first_permission

and less about fetching earliest and latest slices of nested relationships through prefetch_related?

Last edited 22 months ago by Simon Charette (previous) (diff)

comment:10 by Gordon Wrigley, 17 months ago

I would like to do singular prefetches, which seems kinda like what this is asking for.

With regard to:

    User.objects.prefetch_related(
        Prefetch("groups", queryset=Group.objects.order_by('name')[:1], to_attr="first_group")
    )

The downside of this is "first_group" is actually a 1 element list. I was imagining maybe it could be just the item when there is a single item slice, or failing that a "flat=True" option on Prefetch.

Note: See TracTickets for help on using tickets.
Back to Top