Opened 14 years ago
Closed 14 years ago
#15009 closed (invalid)
Generated sql for "or"-ing two filters over multiple joins is wrong
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.2 |
Severity: | Keywords: | ||
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In the following data model:
FirstRelated <- m2m -> BaseModel <- m2m -> SecondRelated <- fk <- ThirdRelated
I am doing a filter on the BaseModel with an "or" between:
- a condition on the ThirdRelated
- a condition on the FirstRelated
as BaseModel.objects.filter(Q(seconds__thirds__condition=True) | Q(firsts__condition=True))
.
Each of the two filters behaves correctly when executed, but when "or"-ed some records are returned more than once.
The problem is related to the generated SQL, because 'or'-ing two conditions make the joins optional, but then if a condition fails a record can be returned because of the other condition.
The solution probably would be to include a DISTINCT() clause when 'or'-ing the conditions.
Here (and attached, too) you have the models and the tests for a failing example. I leave out the wrong_or/settings.py
because they are all standard.
File wrong_or/wrong/models.py
:
from django.db import models class First(models.Model): condition = models.BooleanField() class Second(models.Model): pass class BaseModel(models.Model): firsts = models.ManyToManyField(First) seconds = models.ManyToManyField(Second) class Third(models.Model): second = models.ForeignKey(Second, related_name='thirds') condition = models.BooleanField()
File wrong_or/wrong/tests.py
:
from django.db import models from django.test import TestCase from wrong_or.wrong.models import BaseModel, First, Second, Third class SimpleTest(TestCase): def test_wrong(self): bo = BaseModel.objects.create() f = First.objects.create(condition=True) bo.firsts.add(f) s1 = Second.objects.create() t1 = Third.objects.create(second=s1, condition=False) s2 = Second.objects.create() t2 = Third.objects.create(second=s2, condition=False) bo.seconds.add(s1, s2) q1 = models.Q(firsts__condition=True) # match the only BaseModel q2 = models.Q(seconds__thirds__condition=True) # match nothing self.assertTrue(BaseModel.objects.filter(q1).exists()) self.assertEqual(1, BaseModel.objects.filter(q1).count()) self.assertFalse(BaseModel.objects.filter(q2).exists()) # A filter should not duplicate records. # This assertion will fail, because there are two copies of # the 'bo' record. self.assertEqual(1, BaseModel.objects.filter(q1|q2).count())
Attachments (2)
Change History (5)
by , 14 years ago
by , 14 years ago
comment:1 by , 14 years ago
follow-up: 3 comment:2 by , 14 years ago
Ok, you are right. I always try to read fully the documentation, but a simple search today showed me this:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#distinct
... (Could you mark this as RTFM???)
comment:3 by , 14 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Replying to Saverio Trioni <strioni@flumotion.com>:
... (Could you mark this as RTFM???)
Invalid is as close as we have to that.
I'm going to mark this wontfix. You're correct that results will be returned more than once, but that's a leaky abstraction born of the relational nature of SQL. There are many places and queries where you can get duplicate results -- all you need to do is traverse more than one join. Although your dataset evidently doesn't reveal this, each of your standalone queries would be prone to duplicate results, under the right conditions.