#34993 closed Uncategorized (duplicate)
Unexpected behaviour when filtering across mult-value (aka tomany) joins.
Reported by: | Gordon Wrigley | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.1 |
Severity: | Normal | Keywords: | |
Cc: | Julius Seporaitis | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
My overall questions here are:
1: does the below all look correct / expected
2: how should I be writing this queryset
3: what are the general guidelines I should follow when doing this kind of thing
I have some models the important bit's of which look like this
DataExport pass DataImport from_data_export = ForeignKey( DataExport, related_name="to_data_imports", null=True, ) Run data_import = ForeignKey( DataImport, related_name="runs", null=False, ) wet = BooleanField(null=False) succeeded = BooleanField(null=True)
The basic idea here is stuff gets exported out of one part of the system.
And then will be imported into another part of the system.
And those imports happen in runs that can be dry/wet and success/fail.
I'm trying to write a query to find exports that do not have successful wet import runs and the SQL it's producing is surprising in a variety of ways.
Below I'm going to show a series of querysets, with the associated SQL and some comments.
This is all on Django 4.1.13, I couldn't find anything in release notes that would suggest any of this has changed since then.
DataExport.objects.exclude( to_data_imports__runs__wet=True, to_data_imports__runs__succeeded=True, )
SELECT <snip> FROM "approval_dataexport" WHERE NOT ( EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" U1 INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id") WHERE ( U2."succeeded" AND U1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) AND EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" U1 INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id") WHERE ( U2."wet" AND U1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) )
This is the most concise expression of intent. And so it's not working as desired is a trap for beginners. But it is consistent with where the docs say "conditions in a single exclude() call will not necessarily refer to the same item".
DataExport.objects.exclude( Q( to_data_imports__runs__wet=True, to_data_imports__runs__succeeded=True, ) )
SELECT <snip> FROM "approval_dataexport" WHERE NOT ( EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" U1 INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id") WHERE ( U2."succeeded" AND U1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) AND EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" U1 INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id") WHERE ( U2."wet" AND U1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) )
AFAIK the docs don't say anything either way on what this should do.
DataExport.objects.filter( ~Q( to_data_imports__runs__wet=True, to_data_imports__runs__succeeded=True, ) )
SELECT <snip> FROM "approval_dataexport" WHERE NOT ( EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" U1 INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id") WHERE ( U2."succeeded" AND U1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) AND EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" U1 INNER JOIN "data_import_run" U2 ON (U1."id" = U2."data_import_id") WHERE ( U2."wet" AND U1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) )
I am surprised this didn't have the desired result.
DataExport.objects.exclude( to_data_imports__in=DataImport.objects.filter( runs__wet=True, runs__succeeded=True, ) )
SELECT <snip> FROM "approval_dataexport" WHERE NOT ( EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" V1 WHERE ( V1."id" IN ( SELECT U0."id" FROM "retain_dataimport" U0 INNER JOIN "data_import_run" U1 ON (U0."id" = U1."data_import_id") WHERE (U1."succeeded" AND U1."wet") ) AND V1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) )
This one does what I want but the nested query structure seems excessive, particularly the dual FROM "retain_dataimport"
, I'm not sure if the planner will do a good job of that.
DataExport.objects.exclude( to_data_imports__runs__in=Run.objects.filter( wet=True, succeeded=True, ) )
SELECT <snip> FROM "approval_dataexport" WHERE NOT ( EXISTS( SELECT 1 AS "a" FROM "retain_dataimport" V1 INNER JOIN "data_import_run" V2 ON (V1."id" = V2."data_import_id") WHERE ( V2."id" IN ( SELECT U0."id" FROM "data_import_run" U0 WHERE (U0."succeeded" AND U0."wet") ) AND V1."from_data_export_id" = ("approval_dataexport"."id") ) LIMIT 1 ) )
Basically the same as the above except now the dual selects are on data_import_run.
DataExport.objects.exclude( Exists( Run.objects.filter( data_import__from_data_export__id=OuterRef('id'), wet=True, succeeded=True, ) ) )
SELECT <snip> FROM "approval_dataexport" WHERE NOT ( EXISTS( SELECT 1 AS "a" FROM "data_import_run" U0 INNER JOIN "retain_dataimport" U1 ON (U0."data_import_id" = U1."id") WHERE ( U1."from_data_export_id" = ("approval_dataexport"."id") AND U0."succeeded" AND U0."wet" ) LIMIT 1 ) )
This one is starting to look decent
Change History (4)
comment:1 by , 13 months ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Version: | 4.2 → 4.1 |
comment:2 by , 13 months ago
Cc: | added |
---|
comment:3 by , 13 months ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
comment:4 by , 13 months ago
Could you direct me to that documention? Are you referring to the second note here https://docs.djangoproject.com/en/4.2/topics/db/queries/#spanning-multi-valued-relationships
Should my mental model be "always use subqueries when I want two fields off the same multi value relation"?
You didn't address the deeply nested and redundant inner query behaviour, would you like me to reraise that somewhere else?
This was discussed many times, e.g. #27936. It's recommended and documented to use subqueries in such cases. In the future, start by asking questions on support channels before creating a ticket.
Docs cannot describe every possible configuration,
exclude(filters)
works asexclude(Q(filters))
.