Opened 4 weeks ago
Last modified 4 weeks ago
#36025 assigned Bug
__range lookup in conditional aggregate with subquery annotation does not use annotated related fields
Reported by: | Aashay Amballi | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | ORM |
Cc: | Aashay Amballi, Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I'm encountering an issue with a Django ORM operation that uses the __range
filter on related fields. Here is the relevant model setup and operation:
class Project(models.Model): name = models.CharField(max_length=100) description = models.TextField() start_date = models.DateField() end_date = models.DateField() class LaborRecord(models.Model): actual_hours = models.DecimalField(max_digits=5, decimal_places=2) billable_hours = models.DecimalField(max_digits=5, decimal_places=2) object_id = models.PositiveIntegerField() content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE) content_object = GenericForeignKey("content_type", "object_id") class WorkOrder(models.Model): class Status(models.TextChoices): IN_PROGRESS = 'st_in_progress', _('In Progress') NEW = 'st_new', _('New') OPEN = 'st_open', _('Open') CANCELLED = 'st_cancelled', _('Cancelled') COMPLETED = 'st_completed', _('Completed') REJECTED = 'st_rejected', _('Rejected') project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name="project_work_orders", null=True) name = models.CharField(max_length=100) description = models.TextField() due_date = models.DateTimeField() estimated_labor = models.DecimalField(max_digits=5, decimal_places=2, null=True) labor_records = GenericRelation('LaborRecord') status = models.CharField(max_length=20, choices=Status.choices, default=Status.NEW)
ORM Operation:
model_contentype_id = ContentType.objects.get_for_model(WorkOrder).id labor_hour_sq = LaborRecord.objects.filter(object_id=OuterRef("pk"), content_type_id=model_contentype_id).values("object_id") billable_hours_sq = labor_hour_sq.annotate(billable_labor_hours=Sum("billable_hours")).values("billable_labor_hours") actual_hours_sq = labor_hour_sq.annotate(actual_labor_hours=Sum("actual_hours")).values("actual_labor_hours") queryset = Project.objects.first().project_work_orders.all() filter_condition = Q( ~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), Q(due_date__isnull=True) | Q(due_date__date__range=(F('project__start_date'), F('project__end_date'))) ) query = queryset.annotate( billable_labor_hours=Subquery(billable_hours_sq), actual_labor_hours=Subquery(actual_hours_sq), ).aggregate( out_of_bound_count=Count("id", filter=filter_condition), planned_hours=Sum("estimated_labor", filter=~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0), completed_hours=Sum("actual_labor_hours", filter=Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0) )
When running this operation, the following error occurs:
django.db.utils.ProgrammingError: missing FROM-clause entry for table "app_1_project" LINE 1: ...LL OR ("__col3" AT TIME ZONE 'UTC')::date BETWEEN "app_1_pro...
The SQL generated by this operation is:
SELECT COUNT("__col1") FILTER ( WHERE (NOT ("__col2" IN ('completed', 'closed', 'sch_closed', 'cancelled', 'rejected')) AND ("__col3" IS NULL OR ("__col3" AT TIME ZONE 'UTC')::date BETWEEN "app_1_project"."start_date" AND "app_1_project"."end_date")) ), COALESCE(SUM("__col4") FILTER (WHERE NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected'))), 0), COALESCE(SUM("actual_labor_hours") FILTER (WHERE "__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')), 0) FROM ( SELECT ( SELECT SUM(U0."billable_hours") AS "billable_labor_hours" FROM "app_1_laborrecord" U0 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id")) GROUP BY U0."object_id" ) AS "billable_labor_hours", ( SELECT SUM(U0."actual_hours") AS "actual_labor_hours" FROM "app_1_laborrecord" U0 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id")) GROUP BY U0."object_id" ) AS "actual_labor_hours", "app_1_workorder"."id" AS "__col1", "app_1_workorder"."status" AS "__col2", "app_1_workorder"."due_date" AS "__col3", "app_1_workorder"."estimated_labor" AS "__col4" FROM "app_1_workorder" INNER JOIN "app_1_project" ON ("app_1_workorder"."project_id" = "app_1_project"."id") WHERE "app_1_workorder"."project_id" = 1 ) subquery
Initially, it seemed like a problem with alias field generation for annotated fields within the __range
filter.
Explicitly using __gte
and __lte
instead of using __range
resolved the issue. Below is the example for it and the SQL generated by the ORM operation
filter_condition = Q(~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']),Q(due_date__isnull=True) | Q(due_date__date__gte=F('project_start_date'), due_date__date__lte=F('project_end_date')))
SELECT COUNT("__col1") FILTER (WHERE (NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')) AND ("__col3" IS NULL OR (("__col3" AT TIME ZONE 'UTC')::date >= ("__col4") AND ("__col3" AT TIME ZONE 'UTC')::date <= ("__col5"))))), COALESCE(SUM("__col6") FILTER (WHERE NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected'))), 0), COALESCE(SUM("actual_labor_hours") FILTER (WHERE "__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')), 0) FROM ( SELECT ( SELECT SUM(U0."billable_hours") AS "billable_labor_hours" FROM "app_1_laborrecord" U0 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id")) GROUP BY U0."object_id" ) AS "billable_labor_hours", ( SELECT SUM(U0."actual_hours") AS "actual_labor_hours" FROM "app_1_laborrecord" U0 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id")) GROUP BY U0."object_id" ) AS "actual_labor_hours", "app_1_project"."start_date" AS "project_start_date", "app_1_project"."end_date" AS "project_end_date", "app_1_workorder"."id" AS "__col1", "app_1_workorder"."status" AS "__col2", "app_1_workorder"."due_date" AS "__col3", "app_1_project"."start_date" AS "__col4", "app_1_project"."end_date" AS "__col5", "app_1_workorder"."estimated_labor" AS "__col6" FROM "app_1_workorder" LEFT OUTER JOIN "app_1_project" ON ("app_1_workorder"."project_id" = "app_1_project"."id") WHERE "app_1_workorder"."project_id" = 1 ) subquery
as you can see it created alias columns __col4
for project__start_date
and __col5
for project__end_date
. but for the __range
it was directly trying to fetch from the table/model.
I tried annotating the project start and end dates as follows with __range
filter and it didn't help either. Despite explicitly annotating the fields, the generated SQL remains unchanged from the original. Django does not recognize or utilize the aliased/annotated fields:
filter_condition = Q(~Q(status__in=['completed', 'closed', 'sch_closed', 'cancelled', 'rejected']), Q(due_date__isnull=True) | Q(due_date__date__range=(F('project_start_date'), F('project_end_date'))))
queryset.annotate( billable_labor_hours=Subquery(billable_hours_sq), actual_labor_hours=Subquery(actual_hours_sq), project_start_date=F("project__start_date"), project_end_date=F("project__end_date") ).aggregate( out_of_bound_count=Count("id", filter=filter_condition), planned_hours=Sum("estimated_labor", filter=~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0), completed_hours=Sum("actual_labor_hours", filter=Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0) )
The issue seems related to ongoing discussions in Django's ticket #33929, but further investigation is needed to confirm a direct link.
Change History (7)
comment:1 by , 4 weeks ago
Summary: | Django ORM `__range` Filter Fails to Use Annotated Date Fields in SQL Generation → Django ORM `__range` Filter Fails to Use Annotated Related Fields in SQL Generation |
---|
comment:2 by , 4 weeks ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 4 weeks ago
Summary: | Django ORM `__range` Filter Fails to Use Annotated Related Fields in SQL Generation → __range lookup in conditional aggregate with subquery annotation does not use annotated related fields |
---|
comment:4 by , 4 weeks ago
Cc: | added |
---|
An important lead here is likely this statement
Explicitly using
__gte
and__lte
instead of using__range
resolved the issue.
If that's truly the case then it means that the likely culprit is how RangeLookup
stores its resolved right-hand-side as a tuple(Col(...), Col(...))
which trips resolvable detection (hasattr(expr, "resolve_expression")
) that happens when alias relabeling takes place (see Expression.relabeled_clone
).
Expression.relabeled_clone
, which is called when subqueries are involved and tables need to be re-aliases to avoid conflicts, relies on self.get_source_expression
to walk through the expression tree and make sure all nested references to tables are re-aliases. If you look at Lookup.get_source_expressions
though you'll notice that the right-hand-side is excluded if it's not a compilable (this should likely be a resolvable check instead). Since RangeLookup.rhs
is a tuple
and thus not considered a resolvable it falls on its face and isn't considered for re-labeling.
We have a larger problem here where lookups that allow containers of potentially resolvable members as right-hand-side (e.g. __range
, __in
) that are not considered as resolvable themselves that we should address but the immediate solution for RangeLookup
appears to be wrapping it's right-hand-side in an ExpressionList
at resolving time.
comment:5 by , 4 weeks ago
comment:6 by , 4 weeks ago
Can't reproduce with the minimal example. I get:
{'out_of_bound_count': 0, 'project_id_subquery_sum': 3}
with that script:
project1 = Project.objects.create( start_date='2024-01-01', end_date='2024-06-30') project2 = Project.objects.create( start_date='2024-07-01', end_date='2024-12-31') due_date1 = timezone.make_aware(timezone.datetime(2024, 4, 15, 10, 0, 0)) due_date2 = timezone.make_aware(timezone.datetime(2024, 10, 1, 12, 0, 0)) work_order1 = WorkOrder.objects.create( project=project1, due_date=due_date1, ) work_order2 = WorkOrder.objects.create( project=project2, due_date=due_date1, ) result = WorkOrder.objects.annotate( project_id_subquery=Subquery(Project.objects.filter( id=OuterRef("project_id")).values("id")), ).aggregate( out_of_bound_count=Count("id", filter=Q(due_date__date__range=( F('project__start_date'), F('project__end_date')))), project_id_subquery_sum=Sum("project_id_subquery"), ) print(result)
comment:7 by , 4 weeks ago
Has patch: | set |
---|---|
Owner: | set to |
Status: | new → assigned |
I think I can replicate, on SQLite I get an error:
django.db.utils.OperationalError: no such column: appname_project.start_date
I think I can simplify the example a bit
Then I get the same error with: