Opened 12 years ago
Closed 12 years ago
#19837 closed Bug (fixed)
Regression in QuerySet.exclude and many to many
Reported by: | Tim Graham | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Release blocker | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The refactorings in #10790 appears to have cause an issue with QuerySet.exclude and many to many fields.
Attached is a regression test which demonstrates the problem. The test passes using checkout f811649710fb51e48217e9a78991735977decfd8, but fails against the next changeset (the ticket I mentioned above) through master.
In the SQL query from the test, you can see that the "WHERE NOT" clause compares program IDs to a subquery that retrieves identifier IDs: queries_program"."id" IN (SELECT U1."identifier_id
SELECT "queries_identifier"."id", "queries_identifier"."name" FROM "queries_identifier" LEFT OUTER JOIN "queries_program" ON ( "queries_identifier"."id" = "queries_program"."identifier_id" ) WHERE NOT (( "queries_program"."id" IN ( SELECT U1."identifier_id" FROM "queries_program" U1 INNER JOIN "queries_channel_programs" U2 ON (U1."id" = U2."program_id") WHERE (U2."channel_id" = 1 AND U1."identifier_id" IS NOT NULL) ) AND NOT ("queries_program"."identifier_id" IS NULL) AND "queries_program"."id" IS NOT NULL AND "queries_program"."id" IS NOT NULL )) ORDER BY "queries_identifier"."name" ASC
Attachments (1)
Change History (6)
by , 12 years ago
Attachment: | 19837-test.diff added |
---|
comment:1 by , 12 years ago
Severity: | Normal → Release blocker |
---|
comment:2 by , 12 years ago
comment:3 by , 12 years ago
A patch is available from https://github.com/akaariai/django/compare/ticket_19837.
The regression was caused by using different split position in the inner and outer query, that is the trimmed_prefix in split_exclude() didn't match the query generated by set_start().
The patch does a bit larger refactor. I do think the patch now does something that can be reasoned to be correct, whereas before in master the correctness was based on wishful thinking.
Compared to 1.5.x the split_exclude() is more complex. In 1.5 the complexity however was handled by trim_joins(), and if you look at trim_joins() now and in 1.5 there is clear improvement. At least now the complexity is confined into split_exclude().
The patched code seems to generate a little bit more efficient queries. Compare patched:
SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id" FROM "queries_tag" WHERE NOT ( ("queries_tag"."parent_id" IN ( SELECT U2."tag_id" FROM "queries_annotation" U2 WHERE (U2."name" = a1 AND U2."tag_id" IS NOT NULL) ) AND "queries_tag"."parent_id" IS NOT NULL)) ORDER BY "queries_tag"."name" ASC
to 1.5
SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id" FROM "queries_tag" LEFT OUTER JOIN "queries_tag" T2 ON ("queries_tag"."parent_id" = T2."id") WHERE NOT ( ("queries_tag"."parent_id" IN ( SELECT U2."tag_id" FROM "queries_annotation" U2 WHERE (U2."name" = a1 AND U2."tag_id" IS NOT NULL) ) AND "queries_tag"."parent_id" IS NOT NULL AND T2."id" IS NOT NULL)) ORDER BY "queries_tag"."name" ASC
(from queries.test_tickets_8921_9188(), print(Tag.objects.exclude(parent__annotation__name="a1").query))
the extra join to queries_tag T2 seems non-necessary. If there isn't anything in T2 for the join, then queries_tag.parent_id is referencing something that doesn't exists in the DB, and this is a foreign key constraint violation. Thus the trim is legal here. And of course if something exists, then the whole join is just noise.
Also, compare:
SELECT "queries_identifier"."id", "queries_identifier"."name" FROM "queries_identifier" LEFT OUTER JOIN "queries_program" ON ("queries_identifier"."id" = "queries_program"."identifier_id") WHERE NOT (("queries_program"."id" IN ( SELECT U2."program_id" FROM "queries_channel_programs" U2 WHERE (U2."channel_id" = 1 AND U2."program_id" IS NOT NULL) ) AND NOT ("queries_program"."identifier_id" IS NULL) AND "queries_program"."id" IS NOT NULL)) ORDER BY "queries_identifier"."name" ASC
to
SELECT "queries_identifier"."id", "queries_identifier"."name" FROM "queries_identifier" WHERE NOT (("queries_identifier"."id" IN ( SELECT U1."identifier_id" FROM "queries_program" U1 INNER JOIN "queries_channel_programs" U2 ON (U1."id" = U2."program_id") WHERE (U2."channel_id" = 1 AND U1."identifier_id" IS NOT NULL) ) AND "queries_identifier"."id" IS NOT NULL)) ORDER BY "queries_identifier"."name" ASC
(this is from this ticket's test)
Here the subquery's join is moved to the original query, and thus the subquery is simpler. The hope is that this is more efficient to execute. In addition the original query's join is now reusable. Sadly the join is LEFT OUTER with a NOT NULL where condition. It would be nice to fix that, but not this ticket's problem...
comment:5 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Just to be extra-clear for anyone scanning release blockers for 1.5: the patch for #10790 is not present on the 1.5.x branch, so while this is a release blocker, it only blocks 1.6, not 1.5.