Opened 18 years ago

Last modified 4 months ago

#2361 assigned Bug

QuerySet.filter(m2mfield__isnull=False) may return duplicates

Reported by: daniel.tietze@… Owned by: Norbert Stüken
Component: Database layer (models, ORM) Version: dev
Severity: normal Keywords:
Cc: Fabio Sangiovanni Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

My Django installation is SVN Revision: 3238

Filtering models by a ManyToMany field does not appear to work.
See Google groups thread http://groups.google.com/group/django-users/browse_thread/thread/d0d799a45cb92d35

I have two classes -- Blog and Submission:

class Blog( models.Model ):
    entries = models.ManyToManyField( Submission )

class Submission( models.Model ):
  [... whatever ]

I want to fetch a list of all Blog instances which have at least one
Submission , i.e. entries.count() > 0.

The suggested

Blog.objects.filter(entries__isnull = False)

(suggested by Malcolm Tredinnick) returns:

psycopg.ProgrammingError: FEHLER:  Spalte
m2m_models_blog__entries.entries existiert nicht

SELECT
"models_blog"."id","models_blog"."title","models_blog"."description","models_blog"."region_id","models_blog"."regionname","models_blog"."date_submitted","models_blog"."author_id","models_blog"."visible"
FROM "models_blog" LEFT OUTER JOIN "models_blog_entries" AS
"m2m_models_blog__entries" ON "models_blog"."id" =
"m2m_models_blog__entries"."blog_id" WHERE
("m2m_models_blog__entries"."entries" IS NOT NULL) 

m2m_models_blogentries is an alias for models_blog_entries -- I
*have* this table, but it has no column "entries". This is what it looks like:

# \d models_blog_entries

Tabelle »public.models_blog_entries«
    Spalte     |   Typ   |                              Attribute
---------------+---------+---------------------------------------------------------------------
 id            | integer | not null default
nextval('public.models_blog_entries_id_seq'::text)
 blog_id       | integer | not null
 submission_id | integer | not null
Indexe:
    »models_blog_entries_pkey« PRIMARY KEY, btree (id)
    »models_blog_entries_blog_id_key« UNIQUE, btree (blog_id,
submission_id)
Fremdschlüssel-Constraints:
    »models_blog_entries_blog_id_fkey« FOREIGN KEY (blog_id)
REFERENCES models_blog(id)
    »models_blog_entries_submission_id_fkey« FOREIGN KEY
(submission_id) REFERENCES models_submission(id)

Also, if I do this:

Blog.objects.filter(entries__blog__id__isnull = False)

I get a list of Blogs which have entries, however I have duplicates in
the list, one for each Submission which is in a M2M relationship (i.e.
each Blog is shown n times if it has n entries).

Attachments (1)

2361-test.diff (600 bytes ) - added by Tim Graham 9 years ago.

Download all attachments as: .zip

Change History (24)

comment:1 by anonymous, 18 years ago

Can you please attach a model file that fails. The example fragment you have posted works correctly for me (with three different databases, so it's not backend dependent). Without a failing example this will not be easy to fix.

A couple of side notes about things not relevant to the problem at hand: You seem to have called your applcation "models". This is certainly going to cause problems because doing anything like from models import ... inside the models application is ambiguous (you will have a models.py file in there as well). Also, the multiple results you are seeing is completely unrelated and correct behaviour: there are multiple results in the database so we return them all. That is why distinct() exists for QuerySets.

comment:2 by mir@…, 18 years ago

Hi Daniel,

in the meantime a lot of stuff in m2m fields has changed. Has this bug been resolved, or is it still there?

comment:3 by Chris Beaven, 18 years ago

Resolution: invalid
Status: newclosed

Closing. If anyone's having this problem still, please reopen.

comment:4 by dangerman200k, 9 years ago

Easy pickings: unset
Resolution: invalid
Status: closednew
UI/UX: unset

As far as I can tell I am having this issue too, reopened.

Unlike Daniel I can run Blog.objects.filter(entries__isnull = False) but I get the same result Daniel described as such:
"I get a list of Blogs which have entries, however I have duplicates in
the list, one for each Submission which is in a M2M relationship (i.e.
each Blog is shown n times if it has n entries)."

I can lookup something like Blog.objects.filter(entries__name = 'blah') just fine, but the isnull is returning the duplicate results and that's what I want to use...

comment:5 by Tim Graham, 9 years ago

Summary: Filtering on count of ManyToManyFieldQuerySet.filter(m2mfield__isnull=False) may return duplicates
Triage Stage: UnreviewedAccepted
Type: defectBug

I'm not positive this can be fixed and shouldn't just be solved by using .distinct() but if that's the case, maybe the documentation could be clarified.

I'm attaching a test which currently passes but demonstrates that duplicates are returned in the results.

by Tim Graham, 9 years ago

Attachment: 2361-test.diff added

comment:6 by Calvin DeBoer, 7 years ago

Owner: changed from Adrian Holovaty to Calvin DeBoer
Status: newassigned

comment:7 by Almaz Kunpeissov, 4 years ago

Not relevant anymore.

Model.objects.filter(ManyToManyField__isnull=False) In Django 3.1 works properly.

Tested here:
https://github.com/almazkun/django_ticket_2361

However, in Django 1.11 shows duplicates as described above.

comment:8 by Almaz Kunpeissov, 4 years ago

Resolution: fixed
Status: assignedclosed

comment:9 by Mariusz Felisiak, 4 years ago

Resolution: fixed
Status: closednew

Almaz, thanks for checking this ticket, however it still doesn't work for me. Have you checked test attached by Tim? It returns duplicates on master:

>>> Item.objects.filter(tags__isnull=False)
[<Item: four>, <Item: one>, <Item: one>, <Item: two>, <Item: two>]

comment:10 by Aleks-Daniel Jakimenko-Aleksejev, 2 years ago

Hit the same bug in 3.2.13 (and was *very* surprised!). Adding .distinct() helps, but it feels like a workaround.

comment:11 by Mariusz Felisiak, 2 years ago

Owner: Calvin DeBoer removed
Status: newassigned

comment:12 by Mariusz Felisiak, 2 years ago

Status: assignednew

comment:13 by Norbert Stüken, 2 years ago

Owner: set to Norbert Stüken
Status: newassigned

comment:14 by Norbert Stüken, 2 years ago

I was able to reproduce the problem in Django 4.2.dev20220923141503 and have adapted and extended Tim 's test a bit:

    def test_ticket_2361(self):
        """
        Prove that QuerySet.filter(m2mfield__isnull=False) may return duplicates.
        """
        # Tags without items are returned, but several times if they link to multiple tags.
        self.assertQuerysetEqual(
            Item.objects.filter(tags__isnull=False),
            ['<Item: four>', '<Item: one>', '<Item: one>', '<Item: two>', '<Item: two>'],
            transform=repr
        )

        # Adding distinct helps, but feels like a workaround
        self.assertQuerysetEqual(
            Item.objects.filter(tags__isnull=False).distinct(),
            ['<Item: four>', '<Item: one>', '<Item: two>'],
            transform=repr
        )

To finally close the ticket after 16 years, it needs a decision from the Django team with the following options:

  1. We change the ORM in such a way that it automatically executes a distinct() in the described edge case. This would be a breaking change.
  2. We don't change anything in the Django code, but add a note to the Django documentation that in this particular case the unexpected behavior may occur and can be fixed with a distinct() .
  3. Another solution is found.

comment:15 by Carlton Gibson, 2 years ago

Hi Norbert. Thanks for looking at this!

An option may be to document the current situation whilst also waiting for the other solution, so 2 and 3.

Would you like to prepare a suggestion for 2?

comment:16 by Norbert Stüken, 2 years ago

Hi Carlton,

it took me a while to work through the contributor documentation, but here are my suggested changes:
https://github.com/django/django/compare/main...stueken:django:ticket_2361

The first commit proves the mentioned behavior in Django, the second adds a note with an example to the documentation.

Could you have a look at it?

comment:17 by Mariusz Felisiak, 2 years ago

The current behavior is already documented in "Spanning multi-valued relationship" section. I think there is no need to document the same thing twice 🤔.

comment:18 by Norbert Stüken, 2 years ago

Hi Mariusz,

both examples address potential duplicates when using m2m relationships. However, the already documented example at the end of the "Spanning multi-valued relationship" section states that the duplicates are yielded since multiple filters are chained and therefore resulting in multiple joins compared to using only one filter.

My example does not address multiple filters, but shows the quite unexpected behavior of m2m relations using just one filter which is the reason this ticket was created.
I can't follow this behavior out of the example in "Spanning multi-valued relationships".

As stated, in my opinion, both examples show different screnarios of how m2m relationships can yield duplicates. Maybe linking both examples to each other and explaining their difference would make it clearer.

comment:19 by Carlton Gibson, 2 years ago

FWIW I also didn't read the "Spanning multi-valued relationships" filter(A).filter(B) vs filter(A, B) example as entailing the filter(m2m__isnull) example here.
(I'd have to sit down with a piece of paper to see why that's equivalent. 🙂)

Maybe expanding that section though, so it's on topic (by section title at least)?

comment:20 by Norbert Stüken, 2 years ago

Thanks for the quick replies!

I like your example namings, very clear (could be used for example namings):

  • Example 1: filter(A).filter(B) vs filter(A, B)
  • Example 2: filter(m2m__isnull)

I think, both examples are placed correctly under the respective headings and would possibly be missed if put below the other section:

  • Making Queries > Retrieving objects > Lookups that span relationships > Spanning multi-valued relationships --> Note with example 1 for possible duplicates when using multiple filters
  • Making Queries > Related objects > Many-to-many relationships --> Note with example 2 for possible duplicates when using isnull in a filter

So if the examples address different causes for duplicate entries in m2m relationships, I would prefer to link the examples to each other.

comment:21 by Fabio Sangiovanni, 15 months ago

Cc: Fabio Sangiovanni added

comment:22 by Csirmaz Bendegúz, 4 months ago

I think this is a well-known Django "gotcha", distinct() has a big performance impact so I would definitely advise against it.

comment:23 by Simon Charette, 4 months ago

FWIW the most straightforward way to work around this issue today is to use Exists instead

Blog.objects.filter(
    Exists(Entry.objects.filter(blog=OuterRef("pk")))
)

Unfortunately the framework doesn't allow you to register transforms on related fields lookups otherwise this could be as simple as

Blog.objects.filter(entries__exists=True)

Here's a very stale and early attempt at playing this concept if anyone is interested in trying.

My thoughts at the time was that allowing such transforms could greatly reduce a lot of the boilerplate associated with multi-valued relationships and filtering in hope to eventually resolve #2361.

For example

Blog.objects.filter(entries__exists=Q(published=True))
# Instead of
Blog.objects.filter(
    Exists(Entry.objects.filter(blog=OuterRef("pk"), published=True)))
)

Blog.objects.filter(entries__count__gte=10)
# Instead of
Blog.objects.annotate(
    entries__count=Count("entries")
).filter(entries__count__gte=10)
Last edited 4 months ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top