Opened 14 years ago

Closed 11 years ago

Last modified 11 years ago

#14056 closed Bug (fixed)

Wrong query generated when using reverse foreign key

Reported by: premalshah Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: foreign key, reverse lookup
Cc: dtyschenko@…, Łukasz Rekucki, Seán Hayes Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Here is the model setup

from django.db import models

class Base(models.Model):
    id = models.AutoField(primary_key=True)
    field1 = models.CharField(max_length=11)
    
class Child(models.Model):
    id = models.AutoField(primary_key=True)
    obj = models.ForeignKey(Base)
    field1 = models.CharField(max_length=11)
    field2 = models.IntegerField()

Here are some sample queries and the corresponding sql statements generated and sql statements expected

Query 1

Base.objects.filter(child__obj=1).query

Generated SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` WHERE `apptest_base`.`id` = 1;

Expected SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE `apptest_child`.`obj_id` = 1;

Query 2
Base.objects.filter(child__obj=1, child__field1='a').query

Generated SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` T4 
ON (`apptest_base`.`id` = T4.`obj_id`) WHERE (`apptest_base`.`id` = 1  AND T4.`field1` = a );

Expected SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE (`apptest_child`.`obj_id` = 1  AND `apptest_child`.`field1` = a );

Query 3

Base.objects.filter(child__obj=1, child__field1='a').order_by('child__field2', 'child__obj').query

Generated SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` LEFT OUTER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) INNER JOIN `apptest_child` T4 
ON (`apptest_base`.`id` = T4.`obj_id`) WHERE (`apptest_base`.`id` = 1  AND T4.`field1` = a ) 
ORDER BY `apptest_child`.`field2` ASC, `apptest_base`.`id` ASC;

Expected SQL

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` LEFT OUTER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE (`apptest_child`.`obj_id` = 1  AND 
`apptest_child`.`field1` = a ) ORDER BY `apptest_child`.`field2` ASC, `apptest_child`.`obj_id` ASC;

Problems:

1) Query 1 is missing a join.

2) Query 2 is using apptest_base.id in the where clause instead of apptest_child.obj_id. The results from the generated sql and expected sql would be the same. However, indexes created on the apptest_child table to optimize the query cannot be used hence leading to a slow query.

3) Query 3 has the same problem as query 2 in the where clause as well as the order by clause. In addition to that, it also performs an extra join on the apptest_child table which gives different results from the generated sql and the expected sql statements.

I believe that this happens because of the reverse foreign key field obj. When child__obj is used, the query generation logic does not know that we are referencing the obj field on the Child table and not the id field on the Base table. If we could explicitly specify child__obj_id, it would help, but thats not an options with the current database model api code.

Attachments (1)

14056_join_by_reverse_relation.diff (7.1 KB ) - added by Alexey Smolsky 14 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 by premalshah, 14 years ago

This bug also applies to django version 1.0 - 1.2

comment:2 by premalshah, 14 years ago

Version: 0.961.0

Clarifications on the version #:
I've changed the version to 1.0 since that is what Im using right now and also because lot of people on the django IRC channel complained about the version #. But I have seen the same behavior in 0.96, 1.1 and 1.2.

Clarifications for Query 1:

Most people on the django IRC channel said "The join is not required and so django removes it." To concentrate on issues with Query 2 and 3, I would agree.

Clarifications for Query 2:

There is a multi-column index on fields obj_id, field1 which speeds up the query quite a bit. Query 2 is a query optimization issue.

Evidence for wrong query results for Query 3:

Please add this to the class called "Base"

    def __str__(self):
        return self.field1

Now execute this:

b = Base(field1='ABC')
b.save()
c1 = Child(obj=b, field1='a', field2=1)
c2 = Child(obj=b, field1='b', field2=2)
c3 = Child(obj=b, field1='c', field2=3)
c1.save()
c2.save()
c3.save()

Base.objects.filter(child__obj=1, child__field1='a').order_by('child__field2', 'child__obj')
>>> [<Base: ABC>, <Base: ABC>, <Base: ABC>]

The actual result should be
>>> [<Base: ABC>]

comment:3 by di0, 14 years ago

Most people on the django IRC channel said "The join is not required and so django removes it." To concentrate on issues with Query 2 and 3, I would agree.

It's black magic. Black magic is bad. You've told to use child model, but ORM does not.

comment:4 by di0, 14 years ago

Cc: dtyschenko@… added

comment:5 by Łukasz Rekucki, 14 years ago

Cc: Łukasz Rekucki added

comment:6 by anonymous, 14 years ago

I agree the current behavior is incorrect because:

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` WHERE `apptest_base`.`id` = 1;

will return ALL Base objects with an id of 1, whereas:

SELECT `apptest_base`.`id`, `apptest_base`.`field1` FROM `apptest_base` INNER JOIN `apptest_child` 
ON (`apptest_base`.`id` = `apptest_child`.`obj_id`) WHERE `apptest_child`.`obj_id` = 1;

will only return Base objects with an id of 1 who are referenced to by a Child model, which is what 'Base.objects.filter(childobj=1)' implies. So a join is in fact required.

comment:7 by Seán Hayes, 14 years ago

My above comment was in reference to Query 1.

comment:8 by Seán Hayes, 14 years ago

Cc: Seán Hayes added

comment:9 by Russell Keith-Magee, 14 years ago

Triage Stage: UnreviewedAccepted

Regarding query 1 -- the join *is* required, but it's being incorrectly optimized. This optimization is only valid when it is known that child IDs will shadow parent IDs. This will always happen for model inheritance, but that isn't what is happening in the sample models.

The remaining queries all seem to be consequences of the same thing. Fix Q1, and the rest should follow.

Although it doesn't look like it on first inspection, I suspect this may be closely related to #11319.

comment:10 by Alexey Smolsky, 14 years ago

Has patch: set
Keywords: reverse lookup added
Version: 1.0SVN

Got an idea, that trim optimization of the join, which has been got by the simple reverse relation, -- is incorrect, because first "real table" doesn't contain any relation to second one, so it can't get sought-for field value.

Will attach possible straight-forward solution with tests in the next comment.

I'm not sure of this ticket is related to #11319. It seems that to_field attribute is partially broken.

by Alexey Smolsky, 14 years ago

comment:11 by Julien Phalip, 14 years ago

Severity: Normal
Type: Bug

comment:12 by Demetrius Cassidy <dcassidy36@…>, 14 years ago

Easy pickings: unset
milestone: 1.31.4
UI/UX: unset

comment:13 by Jacob, 13 years ago

milestone: 1.4

Milestone 1.4 deleted

comment:14 by Anssi Kääriäinen, 11 years ago

The generated queries on master are:

SELECT "queries_base"."id", "queries_base"."field1"
  FROM "queries_base"
 INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" )
 WHERE "queries_basechild"."obj_id" = 1 

SELECT "queries_base"."id", "queries_base"."field1"
  FROM "queries_base"
 INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" )
 WHERE ("queries_basechild"."field1" = a  AND "queries_basechild"."obj_id" = 1 )

SELECT "queries_base"."id", "queries_base"."field1"
  FROM "queries_base"
 INNER JOIN "queries_basechild" ON ( "queries_base"."id" = "queries_basechild"."obj_id" )
 WHERE ("queries_basechild"."field1" = a  AND "queries_basechild"."obj_id" = 1 )
 ORDER BY "queries_basechild"."field2" ASC, "queries_base"."id" ASC

These look correct to me. The last one has INNER JOIN instead of the description's expected LEFT OUTER JOIN - but INNER JOIN is correct here as the query can't produce results if no rows match the join condition.

Also, the last one has ORDER BY ..., queries_base.id instead of "queries_base"."obj_id". But this is correct -the join is INNER JOIN, so obj_id and id have always equivalent values.

There is one bug though, doing Base.objects.order_by('basechild__field2', 'basechild__obj').query will generate a LEFT JOIN, but use base's id. This doesn't necessarily have the same value as obj_id as the join is LEFT JOIN. A fix coming soon.

comment:15 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 905409855c6b69f613190fcc2d8bd8bf5e1580b5:

Fixed #14056 -- Made sure LEFT JOIN aren't trimmed in ORDER BY

If LEFT JOINs are required for correct results, then trimming the join
can lead to incorrect results. Consider case:

TBL A: ID | TBL B: ID A_ID

1 1 1
2

Now A.order_by('ba') did use a join to B, and B's a_id column. This
was seen to contain the same value as A's id, and so the join was
trimmed. But this wasn't correct as the join is LEFT JOIN, and for row
A.id = 2 the B.a_id column is NULL.

comment:16 by Andrew Godwin <andrew@…>, 11 years ago

In cea720450485e0871daa7f9477fdf2bff5a5b821:

Fixed #14056 -- Made sure LEFT JOIN aren't trimmed in ORDER BY

If LEFT JOINs are required for correct results, then trimming the join
can lead to incorrect results. Consider case:

TBL A: ID | TBL B: ID A_ID

1 1 1
2

Now A.order_by('ba') did use a join to B, and B's a_id column. This
was seen to contain the same value as A's id, and so the join was
trimmed. But this wasn't correct as the join is LEFT JOIN, and for row
A.id = 2 the B.a_id column is NULL.

Note: See TracTickets for help on using tickets.
Back to Top