Opened 4 years ago

Closed 3 years ago

#32043 closed Bug (needsinfo)

OuterRef with possible NULL generates value = NULL instead of value IS NULL in COUNT() group by.

Reported by: Illia Petrov Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: OuterRef, ORM, SQL, Subquery
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Illia Petrov)

Let's imagine we have two models(db - PostgreSQL):

class Category(models.Model):
     name = models.CharField(max_length=255)

class Tag(models.Model):
    PRIORITY_CHOICES = (
       (1, "1"),
       (2, "2"),
       (3, "3"),
    )
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True, blank=True)
    priority = models.PositiveSmallIntegerField(default=PRIORITY_CHOICES[0][0], choices=PRIORITY_CHOICES)

Then we want to get all tags for category_id=3 or category_id=None with the number of tags with priority=3 grouped by categories(remember that we have null=True in the category field, so category_id could be empty):

from django.db.models import OuterRef, Subquery, Q, Count
import pprint

q = Q(category_id=3)|Q(category_id=None)
queryset = Tag.objects.filter(q, priority=3)
subquery = Tag.objects.filter(
     priority=OuterRef('priority'), 
     category_id=OuterRef('category_id')
).values('priority').annotate(priority_count=Count('*'))

result = queryset.annotate(
     priority_count=Subquery(subquery.values('priority_count'), output_field=IntegerField())
).values('id', 'category_id', 'priority', 'priority_count')
pprint.pprint(result)

Output result:

<QuerySet [{'id': 28, 'category_id': None, 'priority': 3, 'priority_count': None}, {'id': 39, 'category_id': 3, 'priority': 3, 'priority_count': 3}, {'id': 40, 'category_id': 3, 'priority': 3, 'priority_count': 3}, {'id': 41, 'category_id': 3, 'priority': 3, 'priority_count': 3}]>

Result SQL query:

SELECT "tag"."id",
       "tag"."category_id",
       "tag"."priority",
       (SELECT COUNT(*) AS "priority_count"
        FROM "tag" U0
        WHERE (U0."priority" = ("tag"."priority") AND
               U0."category_id" = ("tag"."category_id"))
        GROUP BY U0."priority") AS "priority_count"
FROM "tag"
WHERE (("tag"."category_id" = 3 OR "tag"."category_id" IS NULL) AND
        "tag"."priority" = 3)

Problem:
In output we can see priority_count == None for category_id == None.
The reason is clear, since in subquery we have U0."category_id" = ("tag"."category_id"))
So for now we ended up using UNION for separate queries with category_id=None and category_id=3.

Question:
Is it possible to do it without a union using Django ORM?
Will it be fixed in future?

Change History (6)

comment:1 by Illia Petrov, 4 years ago

Description: modified (diff)

comment:2 by Illia Petrov, 4 years ago

Description: modified (diff)

comment:3 by Simon Charette, 4 years ago

Resolution: needsinfo
Status: newclosed

Hello there, from what I can see the generated SQL matches what your specifying through your queryset construction.

Could you provide an example of the SQL you're trying to generate, its expected result, and how you're achieving it using unions?

It's hard to tell without these details but I think you can achieve what you're after by annotating your inner queryset with OuterRef('category_id') and changing the category_id=OuterRef('category_id') criteria accordingly

subquery = Tag.objects.annotate(
    outer_category_id=OuterRef('category_id'),
).filter(
     Q(category=F('outer_category_id')) | Q(category=None, outer_category_id=None),
     priority=OuterRef('priority'), 
).values('priority').annotate(priority_count=Count('*'))

If that's the case then I fear we cannot change the current behaviour without causing massive backward incompatibilities.

For example filter(nullable_field=F('nullable_field')) doesn't result in WHERE nullable_field = nullable_field OR (nullable_field IS NULL and nullable_field IS NULL) so making this change only for resolved OuterRef would make the current behaviour even more inconsistent. The ORM does a relatively good job at abstracting the triple-boolean logic of SQL when provided literal values (e.g. None handling in exclude()) but it doesn't deal with nullable expressions for performance reasons (hard to determine if complex expressions are nullable and OR clauses required for IS NULL makes the job of the query planner harder).

Version 1, edited 4 years ago by Simon Charette (previous) (next) (diff)

comment:4 by Denis Verbin, 3 years ago

Resolution: needsinfo
Status: closednew

Hi Simon, thanks for the idea, but it isn't working with Django 3.2.4, filtering by annotated 'outer_category_id' in the subquery raises an error

AttributeError: 'ResolvedOuterRef' object has no attribute 'get_lookup'

category_id is an annotated value in the inner query in my case.

Probably this is a bug?

comment:5 by Denis Verbin, 3 years ago

Type: UncategorizedBug
Version: 2.03.2

comment:6 by Mariusz Felisiak, 3 years ago

Resolution: needsinfo
Status: newclosed

Denis, this is a separate issue, see #31714.

Note: See TracTickets for help on using tickets.
Back to Top