Opened 10 years ago

Closed 4 years ago

#23797 closed Bug (fixed)

SQL generated for negated F() expressions is incorrect

Reported by: Suriya Subramanian Owned by: Jacob Walls
Component: Database layer (models, ORM) Version: 1.7
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following model definition.

from django.db import models

class Rectangle(models.Model):
    length = models.IntegerField(null=True)
    width = models.IntegerField(null=True)

We make the following queries: Q1) Rectangles that are squares. Q2a) Rectangles that are not squares (length != width). Q2b) Rectangles that are not squares (width != length). Queries Q2a and Q2b semantically mean the same. However, the ORM generates different SQL for these two queries.

import django
django.setup()

from django.db.models import F
from testapp.models import Rectangle

print '--- Q1: Get the rectangles that are squares'
print Rectangle.objects.filter(length=F('width')).values('pk').query

print '--- Q2a: Get the rectangles that are not squares'
print Rectangle.objects.exclude(length=F('width')).values('pk').query

print '--- Q2b: Get the rectangles that are not squares'
print Rectangle.objects.exclude(width=F('length')).values('pk').query

The generated SQL is

--- Q1: Get the rectangles that are squares
SELECT "testapp_rectangle"."id" FROM "testapp_rectangle" WHERE "testapp_rectangle"."length" = ("testapp_rectangle"."width")
--- Q2a: Get the rectangles that are not squares
SELECT "testapp_rectangle"."id" FROM "testapp_rectangle" WHERE NOT ("testapp_rectangle"."length" = ("testapp_rectangle"."width") AND "testapp_rectangle"."length" IS NOT NULL)
--- Q2b: Get the rectangles that are not squares
SELECT "testapp_rectangle"."id" FROM "testapp_rectangle" WHERE NOT ("testapp_rectangle"."width" = ("testapp_rectangle"."length") AND "testapp_rectangle"."width" IS NOT NULL)

Note the asymmetry between Q2a and Q2b. These queries will return different results.

Reddit user /u/charettes set up this useful SQL fiddle with the above mentioned schema and test values: http://sqlfiddle.com/#!12/c8283/4 Here's my reddit post on this issue: http://www.reddit.com/r/django/comments/2lxjcc/unintuitive_behavior_of_f_expression_with_exclude/

Change History (14)

comment:1 by Anssi Kääriäinen, 10 years ago

Triage Stage: UnreviewedAccepted

Unfortunately this will be messy to fix. It is true that .exclude() should produce the complement of .filter(), and that doesn't happen here.

One possible solution is to use WHERE ("testapp_rectangle"."length" = ("testapp_rectangle"."width")) IS NOT true. This should match both null and false values (the .filter() query could be written as WHERE ("testapp_rectangle"."length" = ("testapp_rectangle"."width")) IS true. I have no idea how well IS NOT true is optimized, and if it is available for all backends.

Another possibility is to also filter F() NULL values, but as said that will be messy.

comment:2 by Anssi Kääriäinen, 10 years ago

After a quick test, it seems the WHERE ("testapp_rectangle"."length" = ("testapp_rectangle"."width")) IS NOT true approach will not work that well. At least PostgreSQL wont use indexes for a query select * from foobar where (id > 1 and id < 20) is true, but will use the index for select * from foobar where (id > 1 and id < 20). This tells me PostgreSQL's optimizer will not handle the is true / is not true conditions well.

If somebody is able to provide a somewhat clean solution to this, I am all for fixing this.

comment:3 by Suriya Subramanian, 10 years ago

Is it the case that filter() and exclude() should produce the complement of each other? If that is not the case, the obvious queries would be WHERE (length = width) and WHERE NOT (length = width).

If the results should be complimentary, couldn't we use WHERE (length = width) for the filter and WHERE (length = width)) IS NOT true for the exclude. The exclude would be slower, but at least the results would be consistent.

comment:4 by Simon Charette, 10 years ago

@akaariai what about adding another IS NOT NULL clause here based on value we should end up with NOT ((length = width) AND (length IS NULL) AND (width IS NULL))?

If the query planer use the index when the generated constraint is NOT ((length = width) AND (length IS NULL)) then it should also be able to use it in this case.

comment:5 by Simon Charette, 10 years ago

Cc: Simon Charette added

comment:6 by Anssi Kääriäinen, 10 years ago

Yes, .exclude() should be the complement of .filter(). As said in comment:2 the WHERE (length = width)) IS NOT true way is just way too slow at least on PostgreSQL (always a sequential scan according to my tests). Going for the NOT ((length = width) AND (length IS NULL) AND (width IS NULL)) should likely work.

I'd like to do this in the Lookup class. We can't do this directly in as_sql() as we don't have enough context there. Maybe a new method get_extra_restriction(self, is_negated) could work. The method returns an expression (a Lookup instance or WhereNode instance) that will be ANDed to the lookup's main condition. I'll try to find the time to write a small proof of concept.

comment:7 by Anssi Kääriäinen, 10 years ago

I have to take back what I said earlier. Unfortunately we have backwards compatibility requirements for the pre-Lookup classes way of writing custom lookups. So, we actually don't necessarily have a lookup class at the point where we want to add the (width IS NULL) condition. So, for now we have to do this directly in build_filter() method.

comment:8 by Shai Berger, 10 years ago

Just for the record, the (x=y) is true syntax is not supported on Oracle (or Mysql, IIRC). It's a database, it doesn't really handle logic.

comment:9 by Jacob Walls, 4 years ago

Owner: changed from nobody to Jacob Walls
Status: newassigned

comment:10 by Jacob Walls, 4 years ago

Has patch: set

Looks like there was consensus here around the proper query:
NOT ((length = width) AND (length IS NULL) AND (width IS NULL))

as well as its location in build_query.

I submitted a patch that builds this query after checking if the comparison value is an instance of Col.
https://github.com/django/django/pull/13118

Version 0, edited 4 years ago by Jacob Walls (next)

comment:11 by Mariusz Felisiak, 4 years ago

Patch needs improvement: set

comment:12 by Jacob Walls, 4 years ago

Patch needs improvement: unset

Updated patch to address review comments.

It is worth mentioning the patch only addresses the case where both columns are nullable. If the source column is not nullable and the target column of the F() expression is nullable, exclude() still does not return records where the F() expression evaluates to NULL.

I'm open to thoughts on whether this ticket should be left open to address that unhandled case, or whether another ticket should track it.

comment:13 by Jacob Walls, 4 years ago

Updated the patch to handle the case I mentioned above, so no need to split tickets.

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 512da9d5:

Fixed #23797 -- Fixed QuerySet.exclude() when rhs is a nullable column.

Note: See TracTickets for help on using tickets.
Back to Top