Opened 4 weeks ago

Last modified 4 weeks ago

#36025 assigned Bug

__range lookup in conditional aggregate with subquery annotation does not use annotated related fields

Reported by: Aashay Amballi Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords: ORM
Cc: Aashay Amballi, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I'm encountering an issue with a Django ORM operation that uses the __range filter on related fields. Here is the relevant model setup and operation:

class Project(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()
    start_date = models.DateField()
    end_date = models.DateField()

class LaborRecord(models.Model):
    actual_hours = models.DecimalField(max_digits=5, decimal_places=2)
    billable_hours = models.DecimalField(max_digits=5, decimal_places=2)
    object_id = models.PositiveIntegerField()
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    content_object = GenericForeignKey("content_type", "object_id")


class WorkOrder(models.Model):
    class Status(models.TextChoices):
        IN_PROGRESS = 'st_in_progress', _('In Progress')
        NEW = 'st_new', _('New')
        OPEN = 'st_open', _('Open')
        CANCELLED = 'st_cancelled', _('Cancelled')
        COMPLETED = 'st_completed', _('Completed')
        REJECTED = 'st_rejected', _('Rejected')

    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name="project_work_orders", null=True)
    name = models.CharField(max_length=100)
    description = models.TextField()
    due_date = models.DateTimeField()
    estimated_labor = models.DecimalField(max_digits=5, decimal_places=2, null=True)
    labor_records = GenericRelation('LaborRecord')
    status = models.CharField(max_length=20, choices=Status.choices, default=Status.NEW)

ORM Operation:

model_contentype_id = ContentType.objects.get_for_model(WorkOrder).id
labor_hour_sq = LaborRecord.objects.filter(object_id=OuterRef("pk"), content_type_id=model_contentype_id).values("object_id")
billable_hours_sq = labor_hour_sq.annotate(billable_labor_hours=Sum("billable_hours")).values("billable_labor_hours")
actual_hours_sq = labor_hour_sq.annotate(actual_labor_hours=Sum("actual_hours")).values("actual_labor_hours")

queryset = Project.objects.first().project_work_orders.all()

filter_condition = Q(
    ~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), 
    Q(due_date__isnull=True) | Q(due_date__date__range=(F('project__start_date'), F('project__end_date')))
)

query = queryset.annotate(
    billable_labor_hours=Subquery(billable_hours_sq),
    actual_labor_hours=Subquery(actual_hours_sq),
).aggregate(
    out_of_bound_count=Count("id", filter=filter_condition), 
    planned_hours=Sum("estimated_labor", filter=~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0), 
    completed_hours=Sum("actual_labor_hours", filter=Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0)
)

When running this operation, the following error occurs:

django.db.utils.ProgrammingError: missing FROM-clause entry for table "app_1_project"
LINE 1: ...LL OR ("__col3" AT TIME ZONE 'UTC')::date BETWEEN "app_1_pro...

The SQL generated by this operation is:

SELECT COUNT("__col1") FILTER (
    WHERE (NOT ("__col2" IN ('completed', 'closed', 'sch_closed', 'cancelled', 'rejected')) 
    AND ("__col3" IS NULL OR ("__col3" AT TIME ZONE 'UTC')::date BETWEEN "app_1_project"."start_date" AND "app_1_project"."end_date"))
), 
COALESCE(SUM("__col4") FILTER (WHERE NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected'))), 0),
COALESCE(SUM("actual_labor_hours") FILTER (WHERE "__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')), 0)
FROM (
    SELECT (
        SELECT SUM(U0."billable_hours") AS "billable_labor_hours"
        FROM "app_1_laborrecord" U0
        WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
        GROUP BY U0."object_id"
    ) AS "billable_labor_hours",
    (
        SELECT SUM(U0."actual_hours") AS "actual_labor_hours"
        FROM "app_1_laborrecord" U0
        WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
        GROUP BY U0."object_id"
    ) AS "actual_labor_hours",
    "app_1_workorder"."id" AS "__col1",
    "app_1_workorder"."status" AS "__col2",
    "app_1_workorder"."due_date" AS "__col3",
    "app_1_workorder"."estimated_labor" AS "__col4"
    FROM "app_1_workorder"
    INNER JOIN "app_1_project"
    ON ("app_1_workorder"."project_id" = "app_1_project"."id")
    WHERE "app_1_workorder"."project_id" = 1
) subquery

Initially, it seemed like a problem with alias field generation for annotated fields within the __range filter.

Explicitly using __gte and __lte instead of using __range resolved the issue. Below is the example for it and the SQL generated by the ORM operation

filter_condition = Q(~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']),Q(due_date__isnull=True) | Q(due_date__date__gte=F('project_start_date'), due_date__date__lte=F('project_end_date')))
SELECT COUNT("__col1") FILTER (WHERE (NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')) AND ("__col3" IS NULL OR (("__col3" AT TIME ZONE 'UTC')::date >= ("__col4") AND ("__col3" AT TIME ZONE 'UTC')::date <= ("__col5"))))),
       COALESCE(SUM("__col6") FILTER (WHERE NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected'))), 0),
       COALESCE(SUM("actual_labor_hours") FILTER (WHERE "__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')), 0)
  FROM (
        SELECT (
                SELECT SUM(U0."billable_hours") AS "billable_labor_hours"
                  FROM "app_1_laborrecord" U0
                 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
                 GROUP BY U0."object_id"
               ) AS "billable_labor_hours",
               (
                SELECT SUM(U0."actual_hours") AS "actual_labor_hours"
                  FROM "app_1_laborrecord" U0
                 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
                 GROUP BY U0."object_id"
               ) AS "actual_labor_hours",
               "app_1_project"."start_date" AS "project_start_date",
               "app_1_project"."end_date" AS "project_end_date",
               "app_1_workorder"."id" AS "__col1",
               "app_1_workorder"."status" AS "__col2",
               "app_1_workorder"."due_date" AS "__col3",
               "app_1_project"."start_date" AS "__col4",
               "app_1_project"."end_date" AS "__col5",
               "app_1_workorder"."estimated_labor" AS "__col6"
          FROM "app_1_workorder"
          LEFT OUTER JOIN "app_1_project"
            ON ("app_1_workorder"."project_id" = "app_1_project"."id")
         WHERE "app_1_workorder"."project_id" = 1
       ) subquery

as you can see it created alias columns __col4 for project__start_date and __col5 for project__end_date. but for the __range it was directly trying to fetch from the table/model.

I tried annotating the project start and end dates as follows with __range filter and it didn't help either. Despite explicitly annotating the fields, the generated SQL remains unchanged from the original. Django does not recognize or utilize the aliased/annotated fields:

filter_condition = Q(~Q(status__in=['completed', 'closed', 'sch_closed', 'cancelled', 'rejected']), Q(due_date__isnull=True) | Q(due_date__date__range=(F('project_start_date'), F('project_end_date'))))
queryset.annotate(
    billable_labor_hours=Subquery(billable_hours_sq),
    actual_labor_hours=Subquery(actual_hours_sq),
    project_start_date=F("project__start_date"),
    project_end_date=F("project__end_date")
).aggregate(
    out_of_bound_count=Count("id", filter=filter_condition), 
    planned_hours=Sum("estimated_labor", filter=~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0), 
    completed_hours=Sum("actual_labor_hours", filter=Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0)
)

The issue seems related to ongoing discussions in Django's ticket #33929, but further investigation is needed to confirm a direct link.

Change History (7)

comment:1 by Aashay Amballi, 4 weeks ago

Summary: Django ORM `__range` Filter Fails to Use Annotated Date Fields in SQL GenerationDjango ORM `__range` Filter Fails to Use Annotated Related Fields in SQL Generation

comment:2 by Sarah Boyce, 4 weeks ago

Triage Stage: UnreviewedAccepted

I think I can replicate, on SQLite I get an error: django.db.utils.OperationalError: no such column: appname_project.start_date

I think I can simplify the example a bit

class Project(models.Model):
    start_date = models.DateField()
    end_date = models.DateField()


class WorkOrder(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name="project_work_orders", null=True)
    due_date = models.DateTimeField()

Then I get the same error with:

WorkOrder.objects.annotate(
    project_id_subquery=Subquery(Project.objects.filter(id=OuterRef("project_id")).values("id")),
).aggregate(
    out_of_bound_count=Count("id", filter=Q(due_date__date__range=(F('project__start_date'), F('project__end_date')))),
    project_id_subquery_sum=Sum("project_id_subquery"),
)

comment:3 by Sarah Boyce, 4 weeks ago

Summary: Django ORM `__range` Filter Fails to Use Annotated Related Fields in SQL Generation__range lookup in conditional aggregate with subquery annotation does not use annotated related fields

comment:4 by Simon Charette, 4 weeks ago

Cc: Simon Charette added

An important lead here is likely this statement

Explicitly using __gte and __lte instead of using __range resolved the issue.

If that's truly the case then it means that the likely culprit is how RangeLookup stores its resolved right-hand-side as a tuple(Col(...), Col(...)) which trips resolvable detection (hasattr(expr, "resolve_expression")) that happens when alias relabeling takes place (see Expression.relabeled_clone).

Expression.relabeled_clone, which is called when subqueries are involved and tables need to be re-aliases to avoid conflicts, relies on self.get_source_expression to walk through the expression tree and make sure all nested references to tables are re-aliases. If you look at Lookup.get_source_expressions though you'll notice that the right-hand-side is excluded if it's not a compilable (this should likely be a resolvable check instead). Since RangeLookup.rhs is a tuple and thus not considered a resolvable it falls on its face and isn't considered for re-labeling.

We have a larger problem here where lookups that allow containers of potentially resolvable members as right-hand-side (e.g. __range, __in) that are not considered as resolvable themselves that we should address but the immediate solution for RangeLookup appears to be wrapping it's right-hand-side in an ExpressionList at resolving time.

Version 0, edited 4 weeks ago by Simon Charette (next)

comment:5 by Simon Charette, 4 weeks ago

Some related tickets of interest here are #22288 and #16487 which added support for tuple[Resolvable, Resolvable] but missed the re-aliasing support.

comment:6 by Gregory Mariani, 4 weeks ago

Can't reproduce with the minimal example. I get:
{'out_of_bound_count': 0, 'project_id_subquery_sum': 3}
with that script:

project1 = Project.objects.create(
    start_date='2024-01-01',
    end_date='2024-06-30')

project2 = Project.objects.create(
    start_date='2024-07-01',
    end_date='2024-12-31')

due_date1 = timezone.make_aware(timezone.datetime(2024, 4, 15, 10, 0, 0))
due_date2 = timezone.make_aware(timezone.datetime(2024, 10, 1, 12, 0, 0))

work_order1 = WorkOrder.objects.create(
    project=project1,
    due_date=due_date1,
)

work_order2 = WorkOrder.objects.create(
    project=project2,
    due_date=due_date1,
)

result = WorkOrder.objects.annotate(
    project_id_subquery=Subquery(Project.objects.filter(
        id=OuterRef("project_id")).values("id")),
).aggregate(
    out_of_bound_count=Count("id", filter=Q(due_date__date__range=(
        F('project__start_date'), F('project__end_date')))),
    project_id_subquery_sum=Sum("project_id_subquery"),
)
print(result)

comment:7 by Simon Charette, 4 weeks ago

Has patch: set
Owner: set to Simon Charette
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top