#31535 closed Bug (invalid)
Calling annotate with a case statement that references a recursive many to many relationship creates duplicates.
Reported by: | David Cooke | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
So I have a Challenge model that has an unlocks field which is a recursive, asymmetric many to many field with a reverse name of unlocked_by. I call annotate on it to add an "unlocked" field which checks if the challenges unlocked_by contains any ids in a list of solved challenges, doing this inserts all the solved challenges into the queryset that annotate is being called on. There's an example of this here: https://github.com/david-cooke/djangobug/blob/master/bug/tests.py
The sql being run is
SELECT "bug_challenge"."id", "bug_challenge"."name", CASE WHEN "bug_challenge_unlocks"."from_challenge_id" IN (5, 1) THEN True ELSE False END AS "unlocked" FROM "bug_challenge" LEFT OUTER JOIN "bug_challenge_unlocks" ON ("bug_challenge"."id" = "bug_challenge_unlocks"."to_challenge_id")
so it seems like the left outer join is adding the duplicates although thats needed for the case statement.
It is possible to work around this by calling value_list(flat=True).distinct('pk') however isn't compatible with more complicated examples such as https://github.com/ractf/core/blob/master/challenge/views.py#L50
Change History (3)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|
comment:2 by , 5 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Summary: | Calling annotate with a case statement that references a recursive many to many relationship on the same model as the queryset will create duplicates → Calling annotate with a case statement that references a recursive many to many relationship creates duplicates. |
comment:3 by , 5 years ago
The ORM is able to optimize .filter(unlocked_by__in=solved_challenges)
into an INNER JOIN
but it cannot do the same for annotations because they could be doing some special handling of NULL
and it would require some pretty sophisticated introspection to determine it's safe to do. It would also be backward incompatible at this point. You should rely on Subquery
or even Exists
in your reported case instead.
This is documented behavior (see the warning in the QuerySet.values() docs):
You can use
Subquery()
or.distinct(...)
to avoid duplicate.