Opened 4 years ago
Closed 3 years ago
#32043 closed Bug (needsinfo)
OuterRef with possible NULL generates value = NULL instead of value IS NULL in COUNT() group by.
Reported by: | Illia Petrov | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | OuterRef, ORM, SQL, Subquery |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Let's imagine we have two models(db - PostgreSQL):
class Category(models.Model): name = models.CharField(max_length=255) class Tag(models.Model): PRIORITY_CHOICES = ( (1, "1"), (2, "2"), (3, "3"), ) category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True, blank=True) priority = models.PositiveSmallIntegerField(default=PRIORITY_CHOICES[0][0], choices=PRIORITY_CHOICES)
Then we want to get all tags for category_id=3 or category_id=None with the number of tags with priority=3 grouped by categories(remember that we have null=True in the category field, so category_id could be empty):
from django.db.models import OuterRef, Subquery, Q, Count import pprint q = Q(category_id=3)|Q(category_id=None) queryset = Tag.objects.filter(q, priority=3) subquery = Tag.objects.filter( priority=OuterRef('priority'), category_id=OuterRef('category_id') ).values('priority').annotate(priority_count=Count('*')) result = queryset.annotate( priority_count=Subquery(subquery.values('priority_count'), output_field=IntegerField()) ).values('id', 'category_id', 'priority', 'priority_count') pprint.pprint(result)
Output result:
<QuerySet [{'id': 28, 'category_id': None, 'priority': 3, 'priority_count': None}, {'id': 39, 'category_id': 3, 'priority': 3, 'priority_count': 3}, {'id': 40, 'category_id': 3, 'priority': 3, 'priority_count': 3}, {'id': 41, 'category_id': 3, 'priority': 3, 'priority_count': 3}]>
Result SQL query:
SELECT "tag"."id", "tag"."category_id", "tag"."priority", (SELECT COUNT(*) AS "priority_count" FROM "tag" U0 WHERE (U0."priority" = ("tag"."priority") AND U0."category_id" = ("tag"."category_id")) GROUP BY U0."priority") AS "priority_count" FROM "tag" WHERE (("tag"."category_id" = 3 OR "tag"."category_id" IS NULL) AND "tag"."priority" = 3)
Problem:
In output we can see priority_count == None for category_id == None.
The reason is clear, since in subquery we have U0."category_id" = ("tag"."category_id"))
So for now we ended up using UNION for separate queries with category_id=None and category_id=3.
Question:
Is it possible to do it without a union using Django ORM?
Will it be fixed in future?
Change History (6)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
comment:2 by , 4 years ago
Description: | modified (diff) |
---|
comment:3 by , 4 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:4 by , 3 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Hi Simon, thanks for the idea, but it isn't working with Django 3.2.4, filtering by annotated 'outer_category_id' in the subquery raises an error
AttributeError: 'ResolvedOuterRef' object has no attribute 'get_lookup'
category_id is an annotated value in the inner query in my case.
Probably this is a bug?
comment:5 by , 3 years ago
Type: | Uncategorized → Bug |
---|---|
Version: | 2.0 → 3.2 |
comment:6 by , 3 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Denis, this is a separate issue, see #31714.
Hello there, from what I can see the generated SQL matches what your specifying through your queryset construction.
Could you provide an example of the SQL you're trying to generate, its expected result, and how you're achieving it using unions?
It's hard to tell without these details but I think you can achieve what you're after by annotating your inner queryset with
OuterRef('category_id')
and changing thecategory_id=OuterRef('category_id')
criteria accordinglyIf that's the case then I fear we cannot change the current behaviour without causing massive backward incompatibilities.
For example
filter(nullable_field=F('nullable_field'))
doesn't result inWHERE nullable_field = nullable_field OR (nullable_field IS NULL and nullable_field IS NULL)
so making this change only for resolvedOuterRef
would make the current behaviour even more inconsistent. The ORM does a relatively good job at abstracting the triple-boolean logic of SQL when provided literal values (e.g.None
handling inexclude()
) but it doesn't deal with nullable expressions for performance reasons (hard to determine if complex expressions are nullable andOR
clauses required forIS NULL
makes the job of the query planner harder).