#34845 closed Cleanup/optimization (duplicate)
Slow query when using exclude
Reported by: | Amin Aminian | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | exclude, filter |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi!
I have a code snippet like this:
qs = Order.objects.exclude( payment__isnull=True )
Which creates the following SQL:
SELECT "order"."id", ... FROM "order" WHERE NOT (EXISTS (SELECT 1 AS "a" FROM "order" U0 LEFT OUTER JOIN "payment" U1 ON (U0."id" = U1."payable_id" AND (U1."payable_type_id" = 30)) WHERE (U1."id" IS NULL AND U0."id" = ("order"."id")) LIMIT 1)) LIMIT 10
(This is part of original query)
In my large DB, this section increases response time about 10s.
By changing exclude
to filter
, we will not have subquery anymore:
qs = Order.objects.filter( payment__isnull=False )
And the SQL:
SELECT "order"."id", ... FROM "order" INNER JOIN "payment" ON ("order"."id" = "payment"."payable_id" AND ("payment"."payable_type_id" = 30)) WHERE "payment"."id" IS NOT NULL LIMIT 10
And with this, I don't face high response time anymore and everything is fine.
Why Django uses subquery when using exclude
?
I just wanted to make sure it was intended.
Change History (9)
follow-up: 3 comment:1 by , 15 months ago
comment:2 by , 15 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:3 by , 15 months ago
Replying to David Sanders:
Hello,
Trac isn't the right place to get help using Django, please refer to one of our friendly communities where someone will be able to help you: https://www.djangoproject.com/community/
Thanks!
Thanks for your response David.
But I wasn't looking for help, I think it can be a performance issue ...
comment:4 by , 15 months ago
So there's a couple of things here:
You asked:
Why Django uses subquery when using exclude ?
Questions are usually interpreted as seeking help ;) Please refer all questions to the community and not Trac ;)
I just wanted to make sure it was intended.
We're aware of some possible performance issues related to the way Django renders NOT EXISTS
, eg: https://code.djangoproject.com/ticket/34597
comment:5 by , 15 months ago
PS: Please read the thread for #34597 as there may be some useful information for you. Be aware that there was a small patch for that which looks like it will be released in the upcoming 5.0 release.
follow-up: 9 comment:7 by , 15 months ago
Amin, you haven't provided the full picture here.
I'm not sure if you're using FilteredRelation
, extra
, or a third party application that does so implicitly for you but the ORM won't insert a AND ("payment"."payable_type_id" = 30)
in the JOIN
from a simple
class Order(models.Model): pass class Payment(models.Model): order = models.ForeignKey(Order) qs = Order.objects.exclude( payment__isnull=True )
Something else is at play here and I'm pretty sure you are running into a lack of split exclude handling optimisations for __null
when dealing with FilteredRelation
or something else (e.g. third-party app) that makes use of it and not #34597 directly.
comment:8 by , 15 months ago
To add to the above, assuming you are using annotate(payment=FilteredRelation("payments", Q(payments__payable_type=payable_type)))
or an equivalent, the reason why the ORM cannot simply use a LEFT OUTER JOIN
instead of NOT EXISTS
when doing exclude(payment__isnull=False)
is that it lacks introspection capabilities into FilteredRelation.condition
to be sure that the condition added to the JOIN
wouldn't interfere.
In a sense
INNER JOIN "payment" ON ("order"."id" = "payment"."payable_id" AND ("payment"."payable_type_id" = 30)) WHERE "payment"."id" IS NOT NULL
is a tautology. The WHERE "payment"."id" IS NOT NULL
part is unnecessary given the usage of an INNER JOIN
already ensures that "payment"."id" IS NOT NULL
since primary keys cannot be null.
Normally the ORM would be able to know that "payment"."id"
cannot be null as it's a primary key but I suspect the usage of a FilteredRelation
prevents the ORM from knowing.
What #34597 did is switch the query from using "order"."id" NOT IN (subquery using LEFT JOIN)
to NOT EXISTS(subquery using LEFT JOIN referring to "order"."id" )
so if you can confirm that your query is faster with NOT IN
than with EXISTS
it would be great to have another data point added to #34597.
comment:9 by , 15 months ago
Replying to Simon Charette:
Amin, you haven't provided the full picture here.
I'm not sure if you're using
FilteredRelation
,extra
, or a third party application that does so implicitly for you but the ORM won't insert aAND ("payment"."payable_type_id" = 30)
in theJOIN
from a simple
Hi Simon,
You are right, I've not provided full detail.
My models is sth like this:
class Order(models.Model); ... payment = GenericRelation( "Payment", related_query_name="payment", object_id_field="payable_id", content_type_field="payable_type", ) class Payment(models.Model): ... payable_type = models.ForeignKey( ContentType, on_delete=models.PROTECT, verbose_name=_("payable type") ) payable_id = models.PositiveIntegerField(verbose_name=_("payable id")) payable_object = GenericForeignKey("payable_type", "payable_id")
Also note that the ORM the I provided earlier, is not the full ORM, because I don't know if I'm allowed to share that here or not (it's for company).
The thing was that I had a slow query, which was taking about 15s, and fixed that just by using filter
instead of exclude
, so I just provided that section of the query.
Hello,
Trac isn't the right place to get help using Django, please refer to one of our friendly communities where someone will be able to help you: https://www.djangoproject.com/community/
Thanks!