Opened 3 months ago

Closed 3 months ago

Last modified 3 months ago

#35810 closed New feature (wontfix)

Provide `Select` class for `select_related` (like `Prefetch` exists for `prefetch_related`)

Reported by: Bart van Andel Owned by:
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords: query optimization
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Use case:

Given a model Chat with many linked Messages:

class Chat(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    users = models.ManyToManyField(User)

    def last_message(self):
        return self.jobmatch_set.latest("created_at")

    def last_message_using_prefetched_set(self):
        return max(self.message_set.all(), key=lambda item: item.created_at, default=None)

    def last_message_using_prefetched_single_item_sets(self):
        assert hasattr(self, "_last_message_set"), "_last_message_set not populated":
        return self._last_message_set[0] if len(self._last_message_set) > 0 else None

class Message(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    chat = models.ForeignKey(Chat)

We'd like to load a list of chats with the latest message of each chat. This doesn't seem to be easily achievable at the moment. Example:

# Valid, but causes an additional trip for each chat:
chats = Chat.objects.all()
last_messages = [chat.last_message for chat in chats]

# Invalid:
chats = Chat.objects.select_related("last_message")

# Also invalid:
chats = Chat.objects.prefetch_related("last_message")

# Valid approach (2 queries), which may however load a huuuuge amount of data, so in general this should be avoided
# Also, there is no guarantee that the prefetched set is not filtered from the perspective of the last message getter property
chats = Chat.objects.prefetch_related("message_set")
last_messages = [chat.last_message_using_prefetched_set for chat in chats]

# Workaround (2 queries) using prefetch to get the single latest (if exist) message per chat:
chats = Chat.objects.prefetch_related(
    Prefetch(
        "chat_set",
        queryset=Chat.objects.order_by("-created_at")[:1],
        to_attr="_last_match_set",
    ),
)
last_messages = [chat.last_message_using_prefetched_single_item_sets for chat in chats]

I do appreciate that there are probably very good reasons why the invalid calls in the example above won't be able to work, unless maybe with some still-to-be-invented annotations.

Now, the Prefetch class was introduced quite recently to even make the above possible. It can only be used to prefetch _sets of items_ though, not _single values_, so using something like .latest("created_at") is out of the question.

Could something like that also be implemented to support this syntax? Basically the same effect as Prefetch, except with a single output (or None).

chats = Chat.objects.select_related(
    Select(
        "chat_set",
        queryset=Chat.objects.latest("created_at"),
        to_attr="last_match",
    ),
)

Note: I couldn't find a similar question in the existing ticket list; if I missed something, I do apologize.

Change History (2)

comment:1 by Bart van Andel, 3 months ago

Note: I tried to optimize our use case like this (pseudo code):

if self.message_set.is_loaded:
    return max(self.message_set.all(), key=lambda item: item.created_at, default=None)
    # or, more clearly, but also not possible:
    return self.message_set.latest("created_at", trust_cached_data=True)
else:
    return self.message_set.latest("created_at")

# or, even more clearly, just get the value directly, using preloaded data if it exists:
return self.message_set.latest("created_at", trust_cached_data=True)

There is no is_loaded or trust_cached_data (or equivalents) however, so this obviously won't work. Also, obviously this should be used with extreme care, but then again, when optimizing you sometimes need to be careful and creative anyway.

comment:2 by Simon Charette, 3 months ago

Resolution: wontfix
Status: newclosed

I'm struggling to find it now but I know the idea of allow custom classes to be provided to select_related was already discussed somewhere in the past years.

I think the question here is less about whether we'd want to support such pattern but how exactly it should be achieved.

For example, for reverse relationships it could be done using LEFT JOIN LATERAL on backends that support it

chats = Chat.objects.select_related(
    Select(
        "messages",
        queryset=Message.objects.order_by("created_at")[:1],
        to_attr="lastest_message",
    ),
)
SELECT
    chat.*,
    latest_message.*
FROM chat
LEFT JOIN LATERAL (
    SELECT *
    FROM message
    WHERE chat_id = chat.id
    ORDER BY created_at DESC
    LIMIT 1
) latest_message

I'm not sure what the best way to express this through the ORM should be though. It feels like it could also be something that is expressed through a Subquery that doesn't make use of values to limit to a single field

chats = Chat.objects.annotate(
    lastest_message=Message.objects.filter(
       chat_id=OuterRef("id"),
   ).order_by("created_at")[:1]
)

In all cases, just like any new large feature request, it should be discussed on the forum before hand to gather consensus.

Last edited 3 months ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top