Opened 10 years ago
Closed 4 years ago
#23797 closed Bug (fixed)
SQL generated for negated F() expressions is incorrect
Reported by: | Suriya Subramanian | Owned by: | Jacob Walls |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.7 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Consider the following model definition.
from django.db import models class Rectangle(models.Model): length = models.IntegerField(null=True) width = models.IntegerField(null=True)
We make the following queries: Q1) Rectangles that are squares. Q2a) Rectangles that are not squares (length != width). Q2b) Rectangles that are not squares (width != length). Queries Q2a and Q2b semantically mean the same. However, the ORM generates different SQL for these two queries.
import django django.setup() from django.db.models import F from testapp.models import Rectangle print '--- Q1: Get the rectangles that are squares' print Rectangle.objects.filter(length=F('width')).values('pk').query print '--- Q2a: Get the rectangles that are not squares' print Rectangle.objects.exclude(length=F('width')).values('pk').query print '--- Q2b: Get the rectangles that are not squares' print Rectangle.objects.exclude(width=F('length')).values('pk').query
The generated SQL is
--- Q1: Get the rectangles that are squares SELECT "testapp_rectangle"."id" FROM "testapp_rectangle" WHERE "testapp_rectangle"."length" = ("testapp_rectangle"."width") --- Q2a: Get the rectangles that are not squares SELECT "testapp_rectangle"."id" FROM "testapp_rectangle" WHERE NOT ("testapp_rectangle"."length" = ("testapp_rectangle"."width") AND "testapp_rectangle"."length" IS NOT NULL) --- Q2b: Get the rectangles that are not squares SELECT "testapp_rectangle"."id" FROM "testapp_rectangle" WHERE NOT ("testapp_rectangle"."width" = ("testapp_rectangle"."length") AND "testapp_rectangle"."width" IS NOT NULL)
Note the asymmetry between Q2a and Q2b. These queries will return different results.
Reddit user /u/charettes set up this useful SQL fiddle with the above mentioned schema and test values: http://sqlfiddle.com/#!12/c8283/4 Here's my reddit post on this issue: http://www.reddit.com/r/django/comments/2lxjcc/unintuitive_behavior_of_f_expression_with_exclude/
Change History (14)
comment:1 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 10 years ago
After a quick test, it seems the WHERE ("testapp_rectangle"."length" = ("testapp_rectangle"."width")) IS NOT true
approach will not work that well. At least PostgreSQL wont use indexes for a query select * from foobar where (id > 1 and id < 20) is true
, but will use the index for select * from foobar where (id > 1 and id < 20)
. This tells me PostgreSQL's optimizer will not handle the is true / is not true conditions well.
If somebody is able to provide a somewhat clean solution to this, I am all for fixing this.
comment:3 by , 10 years ago
Is it the case that filter()
and exclude()
should produce the complement of each other? If that is not the case, the obvious queries would be WHERE (length = width)
and WHERE NOT (length = width)
.
If the results should be complimentary, couldn't we use WHERE (length = width)
for the filter and WHERE (length = width)) IS NOT true
for the exclude. The exclude would be slower, but at least the results would be consistent.
comment:4 by , 10 years ago
@akaariai what about adding another IS NOT NULL
clause here based on value
we should end up with NOT ((length = width) AND (length IS NULL) AND (width IS NULL))
?
If the query planer use the index when the generated constraint is NOT ((length = width) AND (length IS NULL))
then it should also be able to use it in this case.
comment:5 by , 10 years ago
Cc: | added |
---|
comment:6 by , 10 years ago
Yes, .exclude() should be the complement of .filter(). As said in comment:2 the WHERE (length = width)) IS NOT true
way is just way too slow at least on PostgreSQL (always a sequential scan according to my tests). Going for the NOT ((length = width) AND (length IS NULL) AND (width IS NULL)) should likely work.
I'd like to do this in the Lookup class. We can't do this directly in as_sql() as we don't have enough context there. Maybe a new method get_extra_restriction(self, is_negated)
could work. The method returns an expression (a Lookup instance or WhereNode instance) that will be ANDed to the lookup's main condition. I'll try to find the time to write a small proof of concept.
comment:7 by , 10 years ago
I have to take back what I said earlier. Unfortunately we have backwards compatibility requirements for the pre-Lookup classes way of writing custom lookups. So, we actually don't necessarily have a lookup class at the point where we want to add the (width IS NULL) condition. So, for now we have to do this directly in build_filter() method.
comment:8 by , 10 years ago
Just for the record, the (x=y) is true
syntax is not supported on Oracle (or Mysql, IIRC). It's a database, it doesn't really handle logic.
comment:9 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:10 by , 4 years ago
Has patch: | set |
---|
Looks like there was consensus here around the proper query:
NOT ((length = width) AND (length IS NULL) AND (width IS NULL))
as well as its location in build_query
.
I submitted a patch that builds this query after checking if the comparison value is an instance of Col
.
https://github.com/django/django/pull/13118
comment:11 by , 4 years ago
Patch needs improvement: | set |
---|
comment:12 by , 4 years ago
Patch needs improvement: | unset |
---|
Updated patch to address review comments.
It is worth mentioning the patch only addresses the case where both columns are nullable. If the source column is not nullable and the target column of the F() expression is nullable, exclude() still does not return records where the F() expression evaluates to NULL.
I'm open to thoughts on whether this ticket should be left open to address that unhandled case, or whether another ticket should track it.
comment:13 by , 4 years ago
Updated the patch to handle the case I mentioned above, so no need to split tickets.
Unfortunately this will be messy to fix. It is true that .exclude() should produce the complement of .filter(), and that doesn't happen here.
One possible solution is to use
WHERE ("testapp_rectangle"."length" = ("testapp_rectangle"."width")) IS NOT true
. This should match both null and false values (the .filter() query could be written asWHERE ("testapp_rectangle"."length" = ("testapp_rectangle"."width")) IS true
. I have no idea how well IS NOT true is optimized, and if it is available for all backends.Another possibility is to also filter F() NULL values, but as said that will be messy.