#22429 closed Bug (fixed)
Wrong SQL generated when using ~Q and F
Reported by: | sipp11 | Owned by: | superemily |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Release blocker | Keywords: | |
Cc: | loic@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I found this problem on 1.6.2 and also on 1.8.0alpha (master as of today commit b82f30785ff0f9fedf38fc79624a9064a903de4a)
How to reproduce:
create school app. Then add models and run a query.
models.py
Code highlighting:
class School(models.Model): school_id = models.CharField('School ID', max_length=10, unique=True) class Student(models.Model): school = models.ForeignKey(School, related_name='students') prefix = models.CharField('Prefix', max_length=3, blank=True, default='') student_id = models.CharField('Student ID', max_length=15) class Classroom(models.Model): school = models.ForeignKey(School, related_name='classrooms') students = models.ManyToManyField(Student, related_name='classroom', blank=True)
Then go to django shell
Code highlighting:
>>> Student.objects.filter(~Q(classroom__school=F('school'))) Traceback (most recent call last): File "<console>", line 1, in <module> File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 116, in __repr__ data = list(self[:REPR_OUTPUT_SIZE + 1]) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 141, in __iter__ self._fetch_all() File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 961, in _fetch_all self._result_cache = list(self.iterator()) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 265, in iterator for row in compiler.results_iter(): File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/sql/compiler.py", line 698, in results_iter for rows in self.execute_sql(MULTI): File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/sql/compiler.py", line 784, in execute_sql cursor.execute(sql, params) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/utils.py", line 74, in execute return super(CursorDebugWrapper, self).execute(sql, params) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/utils.py", line 59, in execute return self.cursor.execute(sql, params) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/utils.py", line 94, in __exit__ six.reraise(dj_exc_type, dj_exc_value, traceback) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/utils.py", line 59, in execute return self.cursor.execute(sql, params) File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/sqlite3/base.py", line 480, in execute return Database.Cursor.execute(self, query, params) OperationalError: no such column: U1.student_id >>>
Apparently SQL was missing U1 declaration.
Code highlighting:
>>> print Student.objects.filter(~Q(classroom__school=F('school'))).query SELECT "school_student"."id", "school_student"."school_id", "school_student"."prefix", "school_student"."student_id" FROM "school_student" WHERE NOT ("school_student"."id" IN ( SELECT U1."student_id" AS "student_id" FROM "school_student" U0 INNER JOIN "school_classroom_students" U2 ON ( U0."id" = U2."student_id" ) INNER JOIN "school_classroom" U3 ON ( U2."classroom_id" = U3."id" ) WHERE U3."school_id" = (U0."school_id")) )
Above is information I got from testing with master (1.8.0alpha)
This is another trackback from 1.6.2, https://dpaste.de/Yzja, which is in pretty much the same.
Change History (11)
comment:1 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 1.6 → master |
comment:2 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 11 years ago
Has patch: | set |
---|---|
Severity: | Normal → Release blocker |
Here's a PR. Since it's a regression 1.6, it qualifies for a backport there so a mention in the 1.6.4 release notes would be good.
comment:4 by , 11 years ago
I updated the patch from @superemily: https://github.com/loic/django/compare/review22429
We discussed this with @akaariai on IRC a couple of days ago, he'll do a final review.
The release notes will need to be pushed to 1.6.5, if that doesn't make it on time for 1.6.4.
comment:5 by , 11 years ago
Cc: | added |
---|
comment:6 by , 11 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
RFC pending final review as noted above.
comment:7 by , 11 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
I don't think the patch is correct. Patched code works a little better than current code, but I am sure it will fail for more complex queries.
Below is some background information needed for understanding how the code is failing.
When an exclude() filter condition needs to be pushed into a subquery, Django proceeds as follows:
- Construct a new query (that is, start from a freshly created Query object) for the filter condition using build_filter(). The new Query object is going to be used as subquery.
- In build_filter() the F() object is added to the query first
- Then, the joins for the lookup path are generated
- End result is that
Q(classroom__school=F('school'))
generates first one join for the F('school'), then needed joins for theclassroom__school
part
- Next, joins from the beginning of the newly generated query are trimmed. The aim is to find a split position as deep as possible in the new query. (That is, we try to add all possible joins to the original query, and remove as many joins as possible from the start of the new query).
- The code that does the trimming assumes that the joins generated for the
classroom__school
path are the first joins generated for the query, that is, the join paths used forclasroom__school
are first joins in self.tables. As noted above, this is not true when using F() objects. - End result is that due to the incorrect assumption, the code ends up trimming the join for F('school'), thus the reference to unused alias.
The proposed fix doesn't actually correct the join trimming logic to do the right thing. The trimming logic in beginning of trim_start() still might unref joins that it shouldn't.
I tried to fix this by changing the order of join generation for F() objects and the lookup path, that is first generate the joins needed for lookup path, then joins needed for the F() object. This results in numerous failures, so I didn't investigate that option further.
I am going to investigate how to make the join trimming a bit more reliable, that is, make sure it trims only the correct tables. I am afraid the changes are going to look somewhat ugly.
comment:8 by , 11 years ago
Patch needs improvement: | unset |
---|
Another try available from https://github.com/akaariai/django/compare/ticket_22429
The idea this time is to explicitly communicate the joins generated for the lookup path to trim_start(). This is done in an ugly way, that is the joins generated are stored in self._lookup_joins in build_filter(). Works, but this isn't pretty.
When the joins generated for the lookup path are known in trim_start() the code can operate on the tables used for the lookup path, and thus avoid trimming wrong joins. I think the fix is correct, but extra eyes are very welcome.
comment:9 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Hi,
I can indeed reproduce this on master and 1.6 but the example code seems to work in 1.5 (at least it doesn't trigger any traceback).
Using
git bisect
, I found that commit b4492a8ca4a7ae4daa3a6b03c3d7a845fad74931 is the one that introduced the change.Thanks.