Opened 14 months ago

Closed 14 months ago

Last modified 14 months ago

#34845 closed Cleanup/optimization (duplicate)

Slow query when using exclude

Reported by: Amin Aminian Owned by: nobody
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords: exclude, filter
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi!

I have a code snippet like this:

qs = Order.objects.exclude(
    payment__isnull=True
)

Which creates the following SQL:

SELECT "order"."id",
       ...
FROM "order"
WHERE NOT (EXISTS
             (SELECT 1 AS "a"
              FROM "order" U0
              LEFT OUTER JOIN "payment" U1 ON (U0."id" = U1."payable_id"
                                                                  AND (U1."payable_type_id" = 30))
              WHERE (U1."id" IS NULL
                     AND U0."id" = ("order"."id"))
              LIMIT 1))
LIMIT 10

(This is part of original query)

In my large DB, this section increases response time about 10s.

By changing exclude to filter, we will not have subquery anymore:

qs = Order.objects.filter(
    payment__isnull=False
)

And the SQL:

SELECT "order"."id",
    ...
FROM "order"
INNER JOIN "payment" ON ("order"."id" = "payment"."payable_id"
                                            AND ("payment"."payable_type_id" = 30))
WHERE "payment"."id" IS NOT NULL
LIMIT 10

And with this, I don't face high response time anymore and everything is fine.

Why Django uses subquery when using exclude ?

I just wanted to make sure it was intended.

Change History (9)

comment:1 by David Sanders, 14 months ago

Hello,

Trac isn't the right place to get help using Django, please refer to one of our friendly communities where someone will be able to help you: https://www.djangoproject.com/community/

Thanks!

comment:2 by David Sanders, 14 months ago

Resolution: invalid
Status: newclosed

in reply to:  1 comment:3 by Amin Aminian, 14 months ago

Replying to David Sanders:

Hello,

Trac isn't the right place to get help using Django, please refer to one of our friendly communities where someone will be able to help you: https://www.djangoproject.com/community/

Thanks!

Thanks for your response David.

But I wasn't looking for help, I think it can be a performance issue ...

comment:4 by David Sanders, 14 months ago

So there's a couple of things here:

You asked:

Why Django uses subquery when using exclude ?

Questions are usually interpreted as seeking help ;) Please refer all questions to the community and not Trac ;)

I just wanted to make sure it was intended.

We're aware of some possible performance issues related to the way Django renders NOT EXISTS, eg: https://code.djangoproject.com/ticket/34597

comment:5 by David Sanders, 14 months ago

PS: Please read the thread for #34597 as there may be some useful information for you. Be aware that there was a small patch for that which looks like it will be released in the upcoming 5.0 release.

comment:6 by David Sanders, 14 months ago

Resolution: invalidduplicate

Possible duplicate of #34597

comment:7 by Simon Charette, 14 months ago

Amin, you haven't provided the full picture here.

I'm not sure if you're using FilteredRelation, extra, or a third party application that does so implicitly for you but the ORM won't insert a AND ("payment"."payable_type_id" = 30) in the JOIN from a simple

class Order(models.Model):
   pass

class Payment(models.Model):
    order = models.ForeignKey(Order)

qs = Order.objects.exclude(
    payment__isnull=True
)

Something else is at play here and I'm pretty sure you are running into a lack of split exclude handling optimisations for __null when dealing with FilteredRelation or something else (e.g. third-party app) that makes use of it.

Version 1, edited 14 months ago by Simon Charette (previous) (next) (diff)

comment:8 by Simon Charette, 14 months ago

To add to the above, assuming you are using annotate(payment=FilteredRelation("payments", Q(payments__payable_type=payable_type))) or an equivalent, the reason why the ORM cannot simply use a LEFT OUTER JOIN instead of NOT EXISTS when doing exclude(payment__isnull=False) is that it lacks introspection capabilities into FilteredRelation.condition to be sure that the condition added to the JOIN wouldn't interfere.

In a sense

INNER JOIN "payment" ON ("order"."id" = "payment"."payable_id"
                                            AND ("payment"."payable_type_id" = 30))
WHERE "payment"."id" IS NOT NULL

is a tautology. The WHERE "payment"."id" IS NOT NULL part is unnecessary given the usage of an INNER JOIN already ensures that "payment"."id" IS NOT NULL since primary keys cannot be null.

Normally the ORM would be able to know that "payment"."id" cannot be null as it's a primary key but I suspect the usage of a FilteredRelation prevents the ORM from knowing.

What #34597 did is switch the query from using "order"."id" NOT IN (subquery using LEFT JOIN) to NOT EXISTS(subquery using LEFT JOIN referring to "order"."id" ) so if you can confirm that your query is faster with NOT IN than with EXISTS it would be great to have another data point added to #34597.

in reply to:  7 comment:9 by Amin Aminian, 14 months ago

Replying to Simon Charette:

Amin, you haven't provided the full picture here.

I'm not sure if you're using FilteredRelation, extra, or a third party application that does so implicitly for you but the ORM won't insert a AND ("payment"."payable_type_id" = 30) in the JOIN from a simple

Hi Simon,
You are right, I've not provided full detail.
My models is sth like this:

class Order(models.Model);
    ...
    payment = GenericRelation(
            "Payment",
            related_query_name="payment",
            object_id_field="payable_id",
            content_type_field="payable_type",
    )


class Payment(models.Model):
    ...
    payable_type = models.ForeignKey(
        ContentType, on_delete=models.PROTECT, verbose_name=_("payable type")
    )
    payable_id = models.PositiveIntegerField(verbose_name=_("payable id"))
    payable_object = GenericForeignKey("payable_type", "payable_id")

Also note that the ORM the I provided earlier, is not the full ORM, because I don't know if I'm allowed to share that here or not (it's for company).

The thing was that I had a slow query, which was taking about 15s, and fixed that just by using filter instead of exclude, so I just provided that section of the query.

Note: See TracTickets for help on using tickets.
Back to Top