Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#22429 closed Bug (fixed)

Wrong SQL generated when using ~Q and F

Reported by: sipp11 Owned by: superemily
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords:
Cc: loic@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I found this problem on 1.6.2 and also on 1.8.0alpha (master as of today commit b82f30785ff0f9fedf38fc79624a9064a903de4a)

How to reproduce:
create school app. Then add models and run a query.

models.py

Code highlighting:

class School(models.Model):
    school_id = models.CharField('School ID', max_length=10,
                                 unique=True) 


class Student(models.Model):
    school = models.ForeignKey(School, related_name='students')
    prefix = models.CharField('Prefix', max_length=3, blank=True, default='')
    student_id = models.CharField('Student ID', max_length=15)


class Classroom(models.Model):
    school = models.ForeignKey(School, related_name='classrooms')
    students = models.ManyToManyField(Student, related_name='classroom', blank=True)

Then go to django shell

Code highlighting:

>>> Student.objects.filter(~Q(classroom__school=F('school')))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 116, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 141, in __iter__
    self._fetch_all()
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 961, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/query.py", line 265, in iterator
    for row in compiler.results_iter():
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/sql/compiler.py", line 698, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/models/sql/compiler.py", line 784, in execute_sql
    cursor.execute(sql, params)
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/utils.py", line 74, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/utils.py", line 59, in execute
    return self.cursor.execute(sql, params)
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/utils.py", line 94, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/utils.py", line 59, in execute
    return self.cursor.execute(sql, params)
  File "/Users/sipp11/.virtualenvs/psis/lib/python2.7/site-packages/Django-1.8.dev20140413012625-py2.7.egg/django/db/backends/sqlite3/base.py", line 480, in execute
    return Database.Cursor.execute(self, query, params)
OperationalError: no such column: U1.student_id
>>>

Apparently SQL was missing U1 declaration.

Code highlighting:

>>> print Student.objects.filter(~Q(classroom__school=F('school'))).query
SELECT "school_student"."id", "school_student"."school_id", "school_student"."prefix", "school_student"."student_id" 
FROM "school_student" 
WHERE NOT ("school_student"."id" IN (
      SELECT U1."student_id" AS "student_id" 
      FROM "school_student" U0
      INNER JOIN "school_classroom_students" U2 ON ( U0."id" = U2."student_id" ) 
      INNER JOIN "school_classroom" U3 ON ( U2."classroom_id" = U3."id" ) 
      WHERE U3."school_id" = (U0."school_id"))
)

Above is information I got from testing with master (1.8.0alpha)

This is another trackback from 1.6.2, https://dpaste.de/Yzja, which is in pretty much the same.

Change History (11)

comment:1 by Baptiste Mispelon, 11 years ago

Triage Stage: UnreviewedAccepted
Version: 1.6master

Hi,

I can indeed reproduce this on master and 1.6 but the example code seems to work in 1.5 (at least it doesn't trigger any traceback).

Using git bisect, I found that commit b4492a8ca4a7ae4daa3a6b03c3d7a845fad74931 is the one that introduced the change.

Thanks.

comment:2 by superemily, 11 years ago

Owner: changed from nobody to superemily
Status: newassigned

comment:3 by Tim Graham, 11 years ago

Has patch: set
Severity: NormalRelease blocker

Here's a PR. Since it's a regression 1.6, it qualifies for a backport there so a mention in the 1.6.4 release notes would be good.

comment:4 by loic84, 11 years ago

I updated the patch from @superemily: https://github.com/loic/django/compare/review22429

We discussed this with @akaariai on IRC a couple of days ago, he'll do a final review.

The release notes will need to be pushed to 1.6.5, if that doesn't make it on time for 1.6.4.

comment:5 by loic84, 11 years ago

Cc: loic@… added

comment:6 by Tim Graham, 11 years ago

Triage Stage: AcceptedReady for checkin

RFC pending final review as noted above.

comment:7 by Anssi Kääriäinen, 11 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

I don't think the patch is correct. Patched code works a little better than current code, but I am sure it will fail for more complex queries.

Below is some background information needed for understanding how the code is failing.

When an exclude() filter condition needs to be pushed into a subquery, Django proceeds as follows:

  • Construct a new query (that is, start from a freshly created Query object) for the filter condition using build_filter(). The new Query object is going to be used as subquery.
    • In build_filter() the F() object is added to the query first
    • Then, the joins for the lookup path are generated
    • End result is that Q(classroom__school=F('school')) generates first one join for the F('school'), then needed joins for the classroom__school part
  • Next, joins from the beginning of the newly generated query are trimmed. The aim is to find a split position as deep as possible in the new query. (That is, we try to add all possible joins to the original query, and remove as many joins as possible from the start of the new query).
  • The code that does the trimming assumes that the joins generated for the classroom__school path are the first joins generated for the query, that is, the join paths used for clasroom__school are first joins in self.tables. As noted above, this is not true when using F() objects.
  • End result is that due to the incorrect assumption, the code ends up trimming the join for F('school'), thus the reference to unused alias.

The proposed fix doesn't actually correct the join trimming logic to do the right thing. The trimming logic in beginning of trim_start() still might unref joins that it shouldn't.

I tried to fix this by changing the order of join generation for F() objects and the lookup path, that is first generate the joins needed for lookup path, then joins needed for the F() object. This results in numerous failures, so I didn't investigate that option further.

I am going to investigate how to make the join trimming a bit more reliable, that is, make sure it trims only the correct tables. I am afraid the changes are going to look somewhat ugly.

comment:8 by Anssi Kääriäinen, 11 years ago

Patch needs improvement: unset

Another try available from https://github.com/akaariai/django/compare/ticket_22429

The idea this time is to explicitly communicate the joins generated for the lookup path to trim_start(). This is done in an ugly way, that is the joins generated are stored in self._lookup_joins in build_filter(). Works, but this isn't pretty.

When the joins generated for the lookup path are known in trim_start() the code can operate on the tables used for the lookup path, and thus avoid trimming wrong joins. I think the fix is correct, but extra eyes are very welcome.

comment:9 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 5e1f4656b98816c96a1cc051224c1b699db480e0:

Fixed #22429 -- Incorrect SQL when using ~Q and F

comment:10 by Anssi Kääriäinen <akaariai@…>, 11 years ago

In 402fc4f6c9921f56eecbb6e3fc69154270be6468:

[1.7.x] Fixed #22429 -- Incorrect SQL when using ~Q and F

Backport of 5e1f4656b9 from master

comment:11 by Anssi Kääriäinen <akaariai@…>, 11 years ago

In 0e3704963693ddfaaa525054c42333f6ffdb4b47:

[1.6.x] Fixed #22429 -- Incorrect SQL when using ~Q and F

Backpatch of 5e1f4656b98816c96a1cc051224c1b699db480e0 from master.

Conflicts:

django/db/models/sql/query.py
tests/queries/models.py
tests/queries/tests.py

Note: See TracTickets for help on using tickets.
Back to Top