Opened 4 months ago

Closed 4 months ago

Last modified 4 months ago

#35587 closed New feature (wontfix)

Add QuerySet.partition(*args, **kwargs)

Reported by: Micah Cantor
Owned by:
Component: Database layer (models, ORM)
Version: 5.0
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

A common task with a Django model is to partition its instances into two sets: those selected by some filters and those that are not. Naively, the following utility script can accomplish this with QuerySet.filter() and QuerySet.exclude():

from django.db.models import QuerySet
from django.db.models.manager import BaseManager

def partition(self, *args, **kwargs):
    filtered = self.filter(*args, **kwargs)
    excluded = self.exclude(*args, **kwargs)
    return filtered, excluded

QuerySet.partition = partition
BaseManager.partition = partition

For instance, if we have a Book model, we can divide its instances into fiction and nonfiction:

fiction, nonfiction = Book.objects.partition(genre="fiction")

Obtaining two separate QuerySets is often helpful if we want to add further filters, ordering, or prefetches to one set but not the other.

Adding this method to Django would be a helpful utility, and could also be implemented more efficiently than my own naive implementation. It would be difficult for me to suggest a better implementation without a deeper understanding of the implementations of filter() and exclude().

Change History (1)

comment:1 by Simon Charette, 4 months ago

Resolution: wontfix
Status: new → closed

I don't think it's worth extending the QuerySet API with a method that can be emulated through various means (with different semantics) and that would encourage the idea that the returned sets of objects will always be mutually exclusive. This is not a guarantee that the ORM can provide, for a few reasons.

First, the querysets are going to hit the database serially, so they won't be executed against the same snapshot; an object could change between the two queries in a way that makes it appear in both partitions. Second, while the ORM goes to great lengths to make exclude() the complement of filter(), it has a few known bugs that could also manifest themselves in these scenarios.
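The snapshot problem can be reproduced without Django. The following is a minimal sketch using the standard-library sqlite3 module; the book table, its rows, and the update issued between the two queries are all assumptions made for illustration, with the update standing in for a concurrent writer. It is not the SQL that the ORM would generate.

```python
import sqlite3

# Hypothetical in-memory table standing in for a Book model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, genre TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO book (id, genre) VALUES (?, ?)",
    [(1, "fiction"), (2, "nonfiction"), (3, "fiction")],
)

# First query: the "filter" side of the partition.
filtered = {
    row[0]
    for row in conn.execute("SELECT id FROM book WHERE genre = 'fiction'")
}

# Simulated concurrent writer: book 1 changes genre before the second query.
conn.execute("UPDATE book SET genre = 'nonfiction' WHERE id = 1")

# Second query: the "exclude" side, evaluated against the newer table state.
excluded = {
    row[0]
    for row in conn.execute("SELECT id FROM book WHERE NOT (genre = 'fiction')")
}

# Book 1 was seen by both queries, so the two "partitions" overlap.
overlap = filtered & excluded
print(overlap)  # {1}
```

Because each query observes the table at a different moment, book 1 lands in both result sets, which is the overlap described above.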

You are likely better off with a single query that uses an annotation as the Python-level predicate for partitioning:

from itertools import filterfalse
from operator import attrgetter
from django.db.models import Q

def partition(self, *args, **kwargs):
    queryset = self.annotate(_partition_predicate=Q(*args, **kwargs))
    predicate = attrgetter("_partition_predicate")
    return filter(predicate, queryset), filterfalse(predicate, queryset)

But that doesn't allow chaining, which, for the aforementioned reasons, I believe is not achievable.
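To illustrate the Python-level split on its own, here is a rough sketch that partitions already-fetched rows in a single pass. The partition_rows helper and the SimpleNamespace objects are hypothetical stand-ins for annotated model instances; only the _partition_predicate attribute name is taken from the snippet above.

```python
from operator import attrgetter
from types import SimpleNamespace


def partition_rows(rows, predicate):
    """Split rows into (matching, non_matching) lists in one pass."""
    matching, non_matching = [], []
    for row in rows:
        # Each row goes to exactly one list, so the lists never overlap.
        (matching if predicate(row) else non_matching).append(row)
    return matching, non_matching


# Hypothetical stand-ins for instances returned by one annotated query.
rows = [
    SimpleNamespace(title="Dune", _partition_predicate=True),
    SimpleNamespace(title="Cosmos", _partition_predicate=False),
    SimpleNamespace(title="Emma", _partition_predicate=True),
]

fiction, nonfiction = partition_rows(rows, attrgetter("_partition_predicate"))
print([book.title for book in fiction])     # ['Dune', 'Emma']
print([book.title for book in nonfiction])  # ['Cosmos']
```

Both lists come from the same fetched rows, so they are disjoint by construction. They are plain lists rather than QuerySets, though, so no further filters, ordering, or prefetches can be chained onto them.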

Last edited 4 months ago by Simon Charette