Opened 7 years ago
Last modified 10 months ago
#29214 new Bug
Invalid SQL generated when annotating a subquery with an outerref to an annotated field.
Reported by: | Oskar Persson | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | queryset annotations |
Cc: | Mariusz Felisiak, Simon Charette, Chetan Khanna | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
An SQL-error is raised when updating a queryset which includes an annotation created using a subquery, which in turn also uses an annotation.
class Recursive(models.Model): name = models.CharField(max_length=10) link = models.IntegerField() parent = models.ForeignKey('self', models.CASCADE, null=True) from django.db.models import OuterRef, Subquery, F rec1 = Recursive.objects.create(name="r1", link=1) rec2 = Recursive.objects.create(name="r2", link=1) rec3 = Recursive.objects.create(name="r11", parent=rec1, link=2) latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id') Recursive.objects.filter(parent__isnull=False) \ .annotate(parent_link=F('parent__link')) \ .annotate(p=Subquery(latest_parent.values('pk')[:1])) \ .update(parent_id=F('p'))
Traceback (most recent call last): File "tests.py", line 88, in test_update_annotated_using_related_queryset .update(parent_id=F('p')) File "/git/django/django/db/models/query.py", line 647, in update rows = query.get_compiler(self.db).execute_sql(CURSOR) File "/git/django/django/db/models/sql/compiler.py", line 1204, in execute_sql cursor = super(SQLUpdateCompiler, self).execute_sql(result_type) File "/git/django/django/db/models/sql/compiler.py", line 899, in execute_sql raise original_exception OperationalError: no such column: T2.link
I've tried this in SQLite and MySQL in Django 1.11.11 and Django 2.0 using Python 3.6.0. All raising similar errors.
Attachments (1)
Change History (25)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|---|
Summary: | SQL-error when updating after annotating using subquery → Invalid SQL when updating after annotating using subquery |
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 6 years ago
Cc: | added |
---|
follow-up: 9 comment:5 by , 6 years ago
Summary: | Invalid SQL when updating after annotating using subquery → Invalid SQL generated when annotating a subquery with an outerref to an annotated field. |
---|
#30009 was a duplicate that wasn't relying on update()
.
comment:6 by , 6 years ago
As mentioned in #30009 this has something to do with alias quoting and is probably related to Subquery.resolve_expression
's alteration of external_alias
as a comment mentions it's done to prevent quoting.
comment:7 by , 5 years ago
Version: | 1.11 |
---|
comment:8 by , 5 years ago
Version: | → master |
---|
comment:9 by , 5 years ago
Replying to Simon Charette:
#30009 was a duplicate that wasn't relying on
update()
.
I don't think this issue is strictly related to annotate
. Possibly, its the update()
clause. If we remove the update()
clause, the query runs just fine.
latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id') Recursive.objects.filter(parent__isnull=False) \ .annotate(parent_link=F('parent__link')) \ .annotate(p=Subquery(latest_parent.values('pk')[:1])) \ .values()
which produces the following SQL:
SELECT "recursive_model"."id", "recursive_model"."name", "recursive_model"."link", "recursive_model"."parent_id", T2."link" AS "parent_link", ( SELECT U0."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ORDER BY U0."id" ASC LIMIT 1 ) AS "p" FROM "recursive_model" INNER JOIN "recursive_model" T2 ON ("recursive_model"."parent_id" = T2."id") WHERE "recursive_model"."parent_id" IS NOT NULL LIMIT 21
Also the query on https://code.djangoproject.com/ticket/30009 seems to run just fine on current master. (I had to tweak it a bit since the otherwise the ORM complained about multiple columns being returned)
query:
Task.objects.annotate( top_case_id=Coalesce(F('case__parent_case__parent_case_id'), F('case__parent_case_id'), F('case_id')), subject=Subquery(Subject.objects.filter(case_id=OuterRef('top_case_id')).values('id'), output_field=IntegerField()) ).all()
corresponding SQL:
SELECT "ticket29214_task"."id", "ticket29214_task"."num", "ticket29214_task"."case_id", COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id") AS "top_case_id", ( SELECT U0."id" FROM "ticket29214_subject" U0 WHERE U0."case_id" = COALESCE(T3."parent_case_id", "ticket29214_case"."parent_case_id", "ticket29214_task"."case_id") ) AS "subject" FROM "ticket29214_task" LEFT OUTER JOIN "ticket29214_case" ON ("ticket29214_task"."case_id" = "ticket29214_case"."id") LEFT OUTER JOIN "ticket29214_case" T3 ON ("ticket29214_case"."parent_case_id" = T3."id") LIMIT 21
(No extra quoting aroung "T3" as mentioned in the description)
Database used was PostgreSQL and SQLs are picked from shell_plus
of django_extensions
.
comment:10 by , 5 years ago
Cc: | added |
---|
Chetan did you reproduce the update()
crash against latest master as well? There has been a few tweaks to subquery aliases quoting in the past weeks.
Could you provide the generated problematic UPDATE
SQL since it's missing from the ticket?
comment:11 by , 5 years ago
Cc: | added |
---|
Yes. On running the same query, I get the following SQL and traceback:
latest_parent = Recursive.objects.filter(link=OuterRef('parent_link')).order_by('id') Recursive.objects.filter(parent__isnull=False) \ .annotate(parent_link=F('parent__link')) \ .annotate(p=Subquery(latest_parent.values('pk')[:1])) \ .update(parent_id=F('p'))
SQL:
UPDATE "recursive_model" SET "parent_id" = ( SELECT U0."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ORDER BY U0."id" ASC LIMIT 1 ) WHERE "recursive_model"."id" IN ( SELECT V0."id" FROM "recursive_model" V0 INNER JOIN "recursive_model" V1 ON (V0."parent_id" = V1."id") WHERE V0."parent_id" IS NOT NULL )
Traceback:
ProgrammingError: missing FROM-clause entry for table "t2" LINE 1: ...."id" FROM "recursive_model" U0 WHERE U0."link" = T2."link" ...
Unfortunately I haven't looked deeper, but I am planning to during this weekend. I'll report if I get something else.
comment:12 by , 5 years ago
Thanks!
Something suspicious here is the usage of F('parent__link')
which should be disallowed per https://code.djangoproject.com/ticket/14104#comment:1. We don't support JOIN
s in UPDATE
unless I'm mistaken so I suspect the JOIN
'ed T2
table is simply get pruned by SQLUpdateCompiler
.
The only way to add support for such query would be to make OuterRef('parent__link')
result in a JOIN
within the subquery, notice how the filter(parent__isnull=False)
predicate results in a subquery instead of an INNER JOIN
like it does when .update
is not used in comment:9.
I assume we'd want do something similar for OuterRef
that include a __
so the resulting query is
latest_parent = Recursive.objects.filter(link=OuterRef('parent__link')).order_by('id') Recursive.objects.filter( parent__isnull=False ).update(parent_id=Subquery(latest_parent.values('pk')[:1])
UPDATE "recursive_model" SET "parent_id" = ( SELECT U0."id" FROM "recursive_model" U0 WHERE U0."link" IN ( SELECT U1."link" FROM "recursive_model" U1 WHERE U1."id" = "recursive_model"."parent_id" ) ORDER BY U0."id" ASC LIMIT 1 ) WHERE "recursive_model"."id" IN ( SELECT V0."id" FROM "recursive_model" V0 INNER JOIN "recursive_model" V1 ON (V0."parent_id" = V1."id") WHERE V0."parent_id" IS NOT NULL )
comment:13 by , 5 years ago
That comment was really insightful, thank you! :)
If you don't mind confirming, then this is what I could get from the comment:
Whenever there is a filter()
clause, preceding an update()
clause, and containing fields from related models joined via __
, the ORM gives back a SQL with an INNER JOIN
inside a subquery. And for this issue, we want to have a similar functionality in the OuterRef
that contain __
.
If so, I think I have created a test from our test suite that might serve as a starting point.
comment:14 by , 5 years ago
Chetan, almost correct. The query will not necessarily involve generating an INNER JOIN
but it happens to do so in this case.
Most of the logic lives in SQLUpdateCompiler but the idea is that if the backend supports it the whole filtering part of the queryset will be pushed down to a subquery, JOIN
s included, and filtered against using pk IN
.
This will be way trickier to implement for subqueries and I'm starting to wonder if we didn't get it wrong in the first place by making OuteRef('outerqs__lookup')
result in a new JOIN
in the outer query instead of encapsulating it in the subquery. The way we do it right now breaks aggregation in subtle ways #29214 and happens to break update as well as we came to discover.
If the JOIN
was always encapsulated in the subquery in the first place it would solve this ticket, #29214 and I suspect it would make implementing #28296 way easier.
comment:15 by , 5 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Yes, it sure is tricky!
https://github.com/django/django/pull/12546
I have prepared a patch that should work for the update case at least. And sorry for prolonging this, I had my university exams in the middle and the implementation didn't come to me at once but I still wanted to give it a shot. I've tried to follow whatever was said in https://code.djangoproject.com/ticket/29214#comment:12 as closely as possible. Awaiting review :)
comment:16 by , 5 years ago
Patch needs improvement: | set |
---|
comment:17 by , 5 years ago
Sorry, I forgot that. I got occupied with my GSoC application since the submissions have already started.
comment:19 by , 3 years ago
Is there a way I can help with this issue? I have just reported #32839 and I am eager to resolve the problem, but I can see here that there is already a work in progress.
comment:20 by , 3 years ago
Absolutely! If you want to use the previous patch as a starting point, be sure to credit the original author with Co-authored-by: Name <email>
in the commit message. Assign yourself the ticket when you intend to get going.
comment:21 by , 3 years ago
Ok, I have super simple solution, but I would like to pre check if this is a possible way to go. Can we use lower case table aliases?
I have applied the following monkeypatch, and everything works fine for my case:
Query.alias_prefix = 't' Query.subq_aliases = frozenset(['t']) def bump_prefix(self, outer_query): ... alphabet = ascii_lowercase ... Query.bump_prefix = bump_prefix
comment:22 by , 3 years ago
No it's just working around the issue and will break on backends that follow the SQL standard with regards to unquoted aliases (e.g. Oracle).
comment:23 by , 2 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:24 by , 10 months ago
Well, I came here from #32839 and believe that it better explains the issue. I also see a 'missing FROM-clause entry for table "T5" ' error when using an annotated field in a subquery with OuterRef which is joined multiple times in the query and gets the name of T5. When I checked the raw query I saw that the table joined as T5 but used in the subquery as "T5" and then raised an error.
Reproduced at feb683c4c2c5ecfb61e4cb490c3e357450c0c0e8.