Opened 12 years ago

Closed 12 years ago

#19837 closed Bug (fixed)

Regression in QuerySet.exclude and many to many

Reported by: Tim Graham Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The refactorings in #10790 appears to have cause an issue with QuerySet.exclude and many to many fields.

Attached is a regression test which demonstrates the problem. The test passes using checkout f811649710fb51e48217e9a78991735977decfd8, but fails against the next changeset (the ticket I mentioned above) through master.

In the SQL query from the test, you can see that the "WHERE NOT" clause compares program IDs to a subquery that retrieves identifier IDs: queries_program"."id" IN (SELECT U1."identifier_id

SELECT "queries_identifier"."id", "queries_identifier"."name"
FROM "queries_identifier"
LEFT OUTER JOIN "queries_program" ON (
    "queries_identifier"."id" = "queries_program"."identifier_id"
) WHERE NOT ((
    "queries_program"."id" IN (
        SELECT U1."identifier_id"
        FROM "queries_program" U1 
        INNER JOIN "queries_channel_programs" U2 ON (U1."id" = U2."program_id")
        WHERE (U2."channel_id" = 1  AND U1."identifier_id" IS NOT NULL)
    ) AND NOT ("queries_program"."identifier_id" IS NULL)
    AND "queries_program"."id" IS NOT NULL 
    AND "queries_program"."id" IS NOT NULL
)) ORDER BY "queries_identifier"."name" ASC

Attachments (1)

19837-test.diff (1.9 KB ) - added by Tim Graham 12 years ago.

Download all attachments as: .zip

Change History (6)

by Tim Graham, 12 years ago

Attachment: 19837-test.diff added

comment:1 by Russell Keith-Magee, 12 years ago

Severity: NormalRelease blocker

comment:2 by Carl Meyer, 12 years ago

Just to be extra-clear for anyone scanning release blockers for 1.5: the patch for #10790 is not present on the 1.5.x branch, so while this is a release blocker, it only blocks 1.6, not 1.5.

comment:3 by Anssi Kääriäinen, 12 years ago

A patch is available from https://github.com/akaariai/django/compare/ticket_19837.

The regression was caused by using different split position in the inner and outer query, that is the trimmed_prefix in split_exclude() didn't match the query generated by set_start().

The patch does a bit larger refactor. I do think the patch now does something that can be reasoned to be correct, whereas before in master the correctness was based on wishful thinking.

Compared to 1.5.x the split_exclude() is more complex. In 1.5 the complexity however was handled by trim_joins(), and if you look at trim_joins() now and in 1.5 there is clear improvement. At least now the complexity is confined into split_exclude().

The patched code seems to generate a little bit more efficient queries. Compare patched:

SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id"
  FROM "queries_tag"
 WHERE NOT (
    ("queries_tag"."parent_id" IN (
        SELECT U2."tag_id"
          FROM "queries_annotation" U2
         WHERE (U2."name" = a1  AND U2."tag_id" IS NOT NULL)
    ) AND "queries_tag"."parent_id" IS NOT NULL))
 ORDER BY "queries_tag"."name" ASC

to 1.5

SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id"
  FROM "queries_tag"
  LEFT OUTER JOIN "queries_tag" T2 ON ("queries_tag"."parent_id" = T2."id")
 WHERE NOT (
    ("queries_tag"."parent_id" IN (
         SELECT U2."tag_id"
           FROM "queries_annotation" U2
          WHERE (U2."name" = a1  AND U2."tag_id" IS NOT NULL)
     ) AND "queries_tag"."parent_id" IS NOT NULL AND T2."id" IS NOT NULL))
 ORDER BY "queries_tag"."name" ASC

(from queries.test_tickets_8921_9188(), print(Tag.objects.exclude(parent__annotation__name="a1").query))

the extra join to queries_tag T2 seems non-necessary. If there isn't anything in T2 for the join, then queries_tag.parent_id is referencing something that doesn't exists in the DB, and this is a foreign key constraint violation. Thus the trim is legal here. And of course if something exists, then the whole join is just noise.

Also, compare:

SELECT "queries_identifier"."id", "queries_identifier"."name"
  FROM "queries_identifier"
  LEFT OUTER JOIN "queries_program" ON ("queries_identifier"."id" = "queries_program"."identifier_id")
 WHERE NOT (("queries_program"."id" IN (
     SELECT U2."program_id"
       FROM "queries_channel_programs" U2
      WHERE (U2."channel_id" = 1  AND U2."program_id" IS NOT NULL)
    ) AND NOT ("queries_program"."identifier_id" IS NULL)
  AND "queries_program"."id" IS NOT NULL)) ORDER BY "queries_identifier"."name" ASC

to

SELECT "queries_identifier"."id", "queries_identifier"."name"
  FROM "queries_identifier"
 WHERE NOT (("queries_identifier"."id" IN (
     SELECT U1."identifier_id"
       FROM "queries_program" U1
      INNER JOIN "queries_channel_programs" U2 ON (U1."id" = U2."program_id")
      WHERE (U2."channel_id" = 1  AND U1."identifier_id" IS NOT NULL)
    ) AND "queries_identifier"."id" IS NOT NULL))
 ORDER BY "queries_identifier"."name" ASC

(this is from this ticket's test)

Here the subquery's join is moved to the original query, and thus the subquery is simpler. The hope is that this is more efficient to execute. In addition the original query's join is now reusable. Sadly the join is LEFT OUTER with a NOT NULL where condition. It would be nice to fix that, but not this ticket's problem...

comment:4 by Tim Graham, 12 years ago

The patch fixes the issue in the app I discovered this in.

comment:5 by Anssi Kääriäinen <akaariai@…>, 12 years ago

Resolution: fixed
Status: newclosed

In b4492a8ca4a7ae4daa3a6b03c3d7a845fad74931:

Fixed #19837 -- Refactored split_exclude() join generation

The refactoring mainly concentrates on making sure the inner and outer
query agree about the split position. The split position is where the
multijoin happens, and thus the split position also determines the
columns used in the "WHERE col1 IN (SELECT col2 from ...)" condition.

This commit fixes a regression caused by #10790 and commit
69597e5bcc89aadafd1b76abf7efab30ee0b8b1a. The regression was caused
by wrong cols in the split position.

Note: See TracTickets for help on using tickets.
Back to Top