#28292 closed Bug (invalid)
ORing on ManyToMany to same model can result in duplicate items in queryset
Reported by: | Till Theato | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
One of my models has two ManyToManyFields to another model:
class Bar(models.Model): created = models.DateTimeField(auto_now_add=True) class Foo(models.Model): rel1 = models.ManyToManyField('Bar', related_name='foo_rel1') rel2 = models.ManyToManyField('Bar', related_name='foo_rel2')
I have to query all instances of Foo that point to the same Bar via any of the both relationships. However when an instance of Foo had multiple relationships to unrelated Bar instances, I got duplicated instances in the resulting queryset:
b1 = Bar.objects.create() b2 = Bar.objects.create() b3 = Bar.objects.create() f = Foo.objects.create() f.rel1.set([b2,b3]) f.rel2.set([b1]) assert Foo.objects.filter(rel2=b1).count() == 1 # Good assert Foo.objects.filter(rel1=b1).count() == 0 # Good qs = Foo.objects.filter(Q(rel1=b1) | Q(rel2=b1)) assert qs.count() == 1 # AssertionError, qs.count() == 2 print(qs.values('rel1', 'rel2')) # <QuerySet [{'rel1': 2, 'rel2': 1}, {'rel1': 3, 'rel2': 1}]>
As the last output shows, one entry per relationship in rel1 is returned.
Change History (5)
comment:1 by , 7 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 7 years ago
In the original code no ManyToMany fields are included in the values call. I added this only for documentation purposes, but that appears to make no sense. Sorry about that.
The actual problem lies one line ahead in the failing assertion: The count of the queryset is 2 instead of 1 as expected.
comment:3 by , 7 years ago
I believe you need to use QuerySet.distinct():
By default, a
QuerySet
will not eliminate duplicate rows. In practice, this is rarely a problem, because simple queries such asBlog.objects.all()
don’t introduce the possibility of duplicate result rows. However, if your query spans multiple tables, it’s possible to get duplicate results when aQuerySet
is evaluated. That’s when you’d usedistinct()
.
comment:4 by , 7 years ago
I am aware of Queryset.distinct(), however I could not believe that the observed behavior was expected. Before reporting this bug I already tried using distinct on the original query where the problem occurred, but this did not help because I calculate a cumulative sum in an annotation of the query using a custom Func:
class CumSum(Func): function = 'sum' template = '%(function)s(%(expressions)s) OVER (ORDER BY created)'
The annotation happens before distinctness is enforced and therefore the CumSum of entries after duplication is wrong.
Looks like I will have to pull that out of DB then – if the reported problem is really expected behavior.
comment:5 by , 7 years ago
I'll admit that I don't have a deep understanding of the ORM. If you can offer a patch showing why Django is at fault, feel free to reopen the ticket.
This is documented behavior. See the warning in the QuerySet.values() doc: