Context Navigation

← Previous Ticket
Next Ticket →

#34597 new Cleanup/optimization

Queryset (split) exclude's usage of Exists is significantly slower than subquery

Reported by:	Lorand Varga	Owned by:	nobody
Component:	Database layer (models, ORM)	Version:	3.2
Severity:	Normal	Keywords:
Cc:	Simon Charette, David Sanders	Triage Stage:	Accepted
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

My app has 2 models (making things simple so debugging is easier):

class Blog(models.Model):
  title = models.CharField(max_length=60, blank=True)
  is_published = models.BooleanField(default=generate_random_bool, db_index=True)

class Translation(models.Model):
  blog = models.ForeignKey('Blog', on_delete=models.CASCADE, related_name='translation')

There is a reverse foreign key relation between translation and Blog. Also a blog can have multiple translations (or none).

I have this code block:

from django.db.models import Q
qs = Blog.objects.filter(Q(is_published=True) & ~Q(translation=None))
print(qs.query)

django 2.2 ORM generates this output:

SELECT 
  "blog"."id", 
  "blog"."title", 
  "blog"."is_published"
FROM 
  "blog" 
WHERE 
  (
    "blog"."is_published" = True 
    AND NOT (
      "blog"."id" IN (
        SELECT 
          U0."id" 
        FROM 
          "blog" U0 
          LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = ("blog"."id")
          )
      )
    )
  )

For the same code block the django 3.2+ ORM (tried also the latest version of django 4.2.1) it generates:

SELECT 
  "blog"."id", 
  "blog"."title", 
  "blog"."is_published"
FROM 
  "blog" 
WHERE 
  (
    "blog"."is_published" 
    AND NOT (
      EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "blog" U0 
          LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = "blog"."id" 
            AND "blog"."id" = "blog"."id"
          ) 
        LIMIT 
          1
      )
    )
  )

which is a whole lot slower (at least on the same postgresql 13.7 instance).
The django 2.2 version executes in a couple of seconds while the newer version executes in half an hour.

I think (but am not completely sure, might be terribly wrong) that the issue was introduced here:
https://github.com/django/django/pull/13300/files

Environment:
asgiref==3.7.1
backports.zoneinfo==0.2.1
Django==3.2.19
model-bakery==1.11.0
psycopg2-binary==2.8.6
pytz==2023.3
sqlparse==0.4.4
typing_extensions==4.6.1

Db environment:
PostgreSQL 13.7 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit

Bug does not manifest in django 2.2.12.
Bug manifests in django 3.2 and django 4.2.

Change History (14)

comment:1 by Simon Charette, 18 months ago

Cc:	Simon Charette added
Resolution:	→ needsinfo
Severity:	Release blocker → Normal
Status:	new → closed
Type:	Bug → Cleanup/optimization
Version:	4.2 → 3.2

More likely to be related to #32143 (8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1) which switched to using EXISTS over NOT IN and was already part of Django 3.2.

AFAIK this is the only report in three years about this change making things slower and the fact it jumped from a couple of seconds to hours makes me think there might be something else at play here (lack of analyze?). FWIW this change was advised by this article and examples in the wild of NOT IN performing poorly.

Without more details such as the Postgres execution plans (use EXPLAIN on the query) it's very hard to provide any guidance here so I'll close as needsinfo for now as I cannot reproduce any form of slowdown with the provided model.

The AND "blog"."id" = "blog"."id" part seems definitely fishy though so it'd be great to see if you get fast results without it as that appear to be a bug. You can try it out by doing

Blog.objects.filter(
    Q(is_published=True) & ~Exists(Blog.objects.filter(translation=None, id=OuterRef("id")))
)

I'd also be curious to hear if the following is faster as it should be equivalent

Blog.objects.filter(
    Q(is_published=True) & Exists(Translation.objects.filter(id=OuterRef("id")))
)

In the mean time you can use ~Q(id__in=Blog.objects.filter(translation=None, id=OuterRef("id"))) instead of ~Q(translation=None) to restore the previous behaviour if you're blocked.

Last edited 18 months ago by Simon Charette (previous) (diff)

comment:2 by Lorand Varga, 18 months ago

Resolution:	needsinfo
Status:	closed → new

Adding more information:

Django 2.2 behavior:

EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" = True 
    AND "test_app_blog"."api_has_translation" = False 
    AND NOT (
      "test_app_blog"."id" IN (
        SELECT 
          U0."id" 
        FROM 
          "test_app_blog" U0 
          LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = ("test_app_blog"."id")
          )
      )
    )
  ) 
ORDER BY 
  "test_app_blog"."updated_at" DESC

Sort  (cost=1047131.99..1047357.72 rows=90293 width=1985) (actual time=888.530..888.533 rows=4 loops=1)
  Sort Key: test_app_blog.updated_at DESC
  Sort Method: quicksort  Memory: 33kB
  ->  Index Scan using test_app_blog_is_published_4b47c652_uniq on test_app_blog  (cost=0.42..995257.82 rows=90293 width=1985) (actual time=72.414..888.496 rows=4 loops=1)
        Index Cond: (is_published = true)
        Filter: ((NOT api_has_translation) AND (NOT (SubPlan 1)))
        Rows Removed by Filter: 191941
        SubPlan 1
          ->  Nested Loop Left Join  (cost=0.71..8.57 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=179902)
                Join Filter: (u0.id = u1.blog_id)
                Filter: (u1.id IS NULL)
                Rows Removed by Filter: 0
                ->  Index Only Scan using test_app_blog_pkey on test_app_blog u0  (cost=0.42..2.44 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=179902)
                      Index Cond: (id = test_app_blog.id)
                      Heap Fetches: 27765
                ->  Index Scan using test_app_translation_51c6d5db on test_app_translation u1  (cost=0.29..6.07 rows=5 width=8) (actual time=0.001..0.001 rows=0 loops=179902)
                      Index Cond: (blog_id = test_app_blog.id)
Planning Time: 0.500 ms
Execution Time: 888.606 ms

Django 3.2 (and 4.2) behavior:

EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" 
    AND NOT "test_app_blog"."api_has_translation" 
    AND NOT (
      EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "test_app_blog" U0 
          LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = "test_app_blog"."id" 
            AND "test_app_blog"."id" = "test_app_blog"."id"
          ) 
        LIMIT 
          1
      )
    )
  ) 
ORDER BY 
  "test_app_blog"."updated_at" DESC

 Nested Loop Anti Join  (cost=1001.15..131927.02 rows=180586 width=1985) (actual time=208264.409..3034435.256 rows=4 loops=1)
   Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id = test_app_blog.id))
   Rows Removed by Join Filter: 16853800739
   ->  Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on test_app_blog  (cost=0.42..107115.43 rows=180586 width=1985) (actual time=1.080..17454.750 rows=179902 loops=1)
         Filter: (is_published AND (NOT api_has_translation))
         Rows Removed by Filter: 27456
   ->  Materialize  (cost=1000.73..21651.35 rows=1 width=4) (actual time=0.002..9.009 rows=93684 loops=179902)
         ->  Gather Merge  (cost=1000.73..21651.34 rows=1 width=4) (actual time=6.024..192.937 rows=194102 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Merge Left Join  (cost=0.71..20651.20 rows=1 width=4) (actual time=0.046..87.573 rows=64701 loops=3)
                     Merge Cond: (u0.id = u1.blog_id)
                     Filter: (u1.id IS NULL)
                     Rows Removed by Filter: 19931
                     ->  Parallel Index Only Scan using test_app_blog_pkey on test_app_blog u0  (cost=0.42..13961.37 rows=86361 width=4) (actual time=0.018..32.796 rows=69119 loops=3)
                           Heap Fetches: 33004
                     ->  Index Scan using test_app_translation_51c6d5db on test_app_translation u1  (cost=0.29..6082.91 rows=60175 width=8) (actual time=0.025..32.500 rows=58545 loops=3)
 Planning Time: 0.882 ms
 Execution Time: 3034436.784 ms

As can be seen, there is a huge increase in execution time.
I can't vouch there isn't something else going on here also but currently the only thing changed is the django version.
Anything else in the environment is left untouched. (And django2.2 generated query is performant, while django 3.2 query is awfull).
I've managed to use your advice and modify the query to force the subquery to use IN but that leads to synthax that is more complex, while the django 2.2 synthax is a lot simpler: ~Q(translation=None)

The only synthax that manages (in django 3.2) to replicate the query correctly (and performance wise as in django2.2) seems to be

Blog.objects.filter(is_published=True, api_has_translation=False).exclude(id__in=Blog.objects.filter(translation=None, id=OuterRef("id")))

While not terrible I do think something has been broken in django since the ~Q(translation=None) seems a lot cleaner and more ORMish.

Thanks,
Lorand.

Last edited 18 months ago by Lorand Varga (previous) (diff)

comment:3 by Simon Charette, 18 months ago

Summary:	Django ORM query SQL generation changed (and is apparently a lot slower) → Queryset (split) exclude's usage of Exists is significantly slower than subquery

Thank you for providing query plans, this is helpful.

While it's undeniable that in this particular case the query is slower, Postgres wiki itself advise against using NOT IN so there is certainly something else to blame here.

The first thing that is suspicious to me is that the new query is using parallel workers which are know to be causing some slowdown when not appropriately configured. For example, if running the first query gets you close to your work_mem then the second one will definitely will cause spills to disk and could explain the significant difference in execution duration. What I see from the plan is that Postgres believes that it will be faster to parallelize the query but when it tries to materialize the results returned by workers it takes ages.

Tuning this parameter is definitely out of scope for this ticket as it dips into user / DBA support territory but it'd be great to confirm that it's the reason you are encountering this issue.

I've managed to use your advice and modify the query to force the subquery to use IN but that leads to synthax that is more complex, while the django 2.2 synthax is a lot simpler: ~Q(translation=None)

While not terrible I do think something has been broken in django since the ~Q(translation=None) seems a lot cleaner and more ORMish.

I've never argue otherwise, it was solely meant as a way to unblock you from upgrading from a Django version that has been unsupported for months and that I assumed you were eager to move away from.

As for what ORMish means that is open for interpretation. I would argue that ~Q(translation=None) is extremely ambiguous given how None doesn't map 1:1 to SQL NULL semantics that the objective of an ORM is to abstract SQL operations to do the right thing most of the time while providing escape hatches for the remaining cases. In this particular case I've provided you a reasoning about why these changes were made so hopefully it's clear to you why we need substantial evidence before reverting these changes given they have already been part of two LTS without prior reports of performance degradation.

In order to determine what might the origin your issue I'd ask you to run the following queries while preceding them with a SET max_parallel_workers_per_gather = 0

EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" 
    AND NOT "test_app_blog"."api_has_translation" 
    AND NOT (
      EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "test_app_blog" U0 
          LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = "test_app_blog"."id" 
          ) 
        LIMIT 
          1
      )
    )
  ) 
ORDER BY 
  "test_app_blog"."updated_at" DESC

EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" 
    AND NOT "test_app_blog"."api_has_translation" 
    AND EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "test_app_translation" U1
        WHERE
          U1."blog_id" = "test_app_blog"."id" 
        LIMIT 
          1
      )
    )
  ) 
ORDER BY 
  "test_app_blog"."updated_at" DESC

comment:4 by David Sanders, 18 months ago

EDIT: Actually please disregard this, after some investigation I found the materialise is being forced by the conditions in the query, not the lack of memory.

The MATERIALIZE in the expensive plan indicates that Postgres doesn't think it has enough memory to do something more efficient. (Sometimes Postgres will choose something less optimal in order to materialise.)

Try turning off materialisation to see what plan it uses:

SET enable_material = 'off';

If it's able to use a more efficient plan then you could try adjusting your memory settings/adding more RAM.

Last edited 18 months ago by David Sanders (previous) (diff)

comment:5 by David Sanders, 18 months ago

Cc:	David Sanders added

comment:6 by David Sanders, 18 months ago

Posting some investigation here for charettes & others:

I suspect the additional redundant condition in the inner query that references the outer query is causing unwanted materialisation:

AND "test_app_blog"."id" = "test_app_blog"."id"

When I removed this – no more materialisation.

Additionally: Not sure why this is being added to the query but if you remove Q(is_published=True) so that you simply have Blog.objects.filter(~Q(translation=None)) then this redundant condition is no longer added.

I tested with the reporter's models on latest main & I'd wager this is something we can/may want to address.

Version 0, edited 18 months ago by David Sanders (next)

comment:7 by Simon Charette, 18 months ago

Thanks for the investigation David!

The AND "blog"."id" = "blog"."id" is definitely something we want to elide and was mentioned in comment:1 as a bug but the reporter said that they were also getting poor results with the provided ORM examples that explicitly don't have this criteria (e.g. ~Exists(Blog.objects.filter(translation=None, id=OuterRef("id"))) so I suggest we wait to see how well the SQL examples of comment:3 (that also don't have this criteria) perform for the reporter.

In other words, the AND "blog"."id" = "blog"."id" part is definitely a bug we want to solve but if we are going to address this issue it'd be great to confirm it's the only factor to blame.

Last edited 18 months ago by Simon Charette (previous) (diff)

comment:8 by David Sanders, 18 months ago

but the reporter said that they were also getting poor results with the provided ORM examples that explicitly don't have this criteria (e.g. ~Exists(Blog.objects.filter(translation=None, id=OuterRef("id")))

Interesting I tried that and the plan was better, though I don't have any data to work with.

Am eager to hear what the results are.

comment:9 by Lorand Varga, 18 months ago

Simon, I want to say that I really appreciate your support and patience on this. If I've been too annoying in my comments, it was not on purpose - I'm actually really curios if there is an underlying issue or not with django (since there are a lot of interactions going on). I'm not insisting for any revert unless needed and your arguments were solid and I'm very grateful for your support.

Getting back to the needy greedy details, turning off the workers does not improve anything.
The NOT(Exists):

SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" 
    AND NOT "test_app_blog"."api_has_translation" 
    AND NOT (
      EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "test_app_blog" U0 
          LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = "test_app_blog"."id" 
          ) 
        LIMIT 
          1
      )
    )
  ) 
ORDER BY 
  "test_app_blog"."updated_at" DESC
  
  
 Nested Loop Anti Join  (cost=1.13..132780.57 rows=180586 width=1985) (actual time=211403.273..3034890.176 rows=4 loops=1)
   Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id = test_app_blog.id))
   Rows Removed by Join Filter: 16853800739
   ->  Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on test_app_blog  (cost=0.42..107115.43 rows=180586 width=1985) (actual time=1.080..1356.688 rows=179902 loops=1)
         Filter: (is_published AND (NOT api_has_translation))
         Rows Removed by Filter: 27456
   ->  Materialize  (cost=0.71..22504.89 rows=1 width=4) (actual time=0.002..9.083 rows=93684 loops=179902)
         ->  Merge Left Join  (cost=0.71..22504.88 rows=1 width=4) (actual time=0.016..140.731 rows=194102 loops=1)
               Merge Cond: (u0.id = u1.blog_id)
               Filter: (u1.id IS NULL)
               Rows Removed by Filter: 59794
               ->  Index Only Scan using test_app_blog_pkey on test_app_blog u0  (cost=0.42..15170.42 rows=207267 width=4) (actual time=0.007..67.559 rows=207358 loops=1)
                     Heap Fetches: 33518
               ->  Index Scan using test_app_translation_51c6d5db on test_app_translation u1  (cost=0.29..6077.43 rows=59809 width=8) (actual time=0.005..30.515 rows=59794 loops=1)
 Planning Time: 0.936 ms
 Execution Time: 3034891.393 ms
(16 rows)

The Exists does work:

SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" 
    AND NOT "test_app_blog"."api_has_translation" 
    AND EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "test_app_translation" U1
        WHERE
          U1."blog_id" = "test_app_blog"."id" 
        LIMIT 
          1
      )
    )
ORDER BY 
  "test_app_blog"."updated_at" DESC
  
Sort  (cost=21037.38..21066.08 rows=11479 width=22) (actual time=63.818..63.820 rows=4 loops=1)
  Sort Key: test_app_blog.updated_at DESC
  Sort Method: quicksort  Memory: 25kB
  ->  Nested Loop  (cost=1501.37..20263.31 rows=11479 width=22) (actual time=17.974..63.805 rows=4 loops=1)
        ->  HashAggregate  (cost=1500.95..1632.70 rows=13175 width=4) (actual time=17.018..20.483 rows=13256 loops=1)
              Group Key: u1.blog_id
              Batches: 1  Memory Usage: 1425kB
              ->  Index Only Scan using test_app_translation_51c6d5db on test_app_translation u1  (cost=0.29..1351.42 rows=59809 width=4) (actual time=0.007..6.721 rows=59794 loops=1)
                    Heap Fetches: 735
        ->  Index Scan using test_app_blog_pkey on test_app_blog  (cost=0.42..1.47 rows=1 width=22) (actual time=0.003..0.003 rows=0 loops=13256)
              Index Cond: (id = u1.blog_id)
              Filter: (is_published AND (NOT api_has_translation))
              Rows Removed by Filter: 1
Planning Time: 0.703 ms
Execution Time: 63.872 ms

Also played with various values for work_mem and haven't seen any change in the postgres planning.

I want to also mention that David is on to something.

SET enable_material='off';
EXPLAIN ANALYZE
SELECT 
  "test_app_blog"."id", 
  "test_app_blog"."created_at", 
  "test_app_blog"."updated_at", 
  "test_app_blog"."is_published", 
  "test_app_blog"."api_has_translation" 
FROM 
  "test_app_blog" 
WHERE 
  (
    "test_app_blog"."is_published" 
    AND NOT "test_app_blog"."api_has_translation" 
    AND NOT (
      EXISTS(
        SELECT 
          (1) AS "a" 
        FROM 
          "test_app_blog" U0 
          LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") 
        WHERE 
          (
            U1."id" IS NULL 
            AND U0."id" = "test_app_blog"."id" 
          ) 
        LIMIT 
          1
      )
    )
  ) 
ORDER BY 
  "test_app_blog"."updated_at" DESC;


 Sort  (cost=143578.39..144029.85 rows=180585 width=22) (actual time=691.697..691.802 rows=4 loops=1)
   Sort Key: test_app_blog.updated_at DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Hash Anti Join  (cost=21645.24..125693.23 rows=180585 width=22) (actual time=306.930..691.769 rows=4 loops=1)
         Hash Cond: (test_app_blog.id = u0.id)
         ->  Index Scan using test_app_blog_is_published_4b47c652_uniq on test_app_blog  (cost=0.42..101768.51 rows=180586 width=22) (actual time=0.684..304.445 rows=179902 loops=1)
               Index Cond: (is_published = true)
               Filter: (NOT api_has_translation)
               Rows Removed by Filter: 12043
         ->  Hash  (cost=21644.81..21644.81 rows=1 width=4) (actual time=283.265..283.369 rows=194102 loops=1)
               Buckets: 262144 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 6145kB
               ->  Gather Merge  (cost=1000.73..21644.81 rows=1 width=4) (actual time=6.321..245.263 rows=194102 loops=1)
                     Workers Planned: 2
                     Workers Launched: 2
                     ->  Merge Left Join  (cost=0.71..20644.67 rows=1 width=4) (actual time=0.067..105.796 rows=64701 loops=3)
                           Merge Cond: (u0.id = u1.blog_id)
                           Filter: (u1.id IS NULL)
                           Rows Removed by Filter: 19931
                           ->  Parallel Index Only Scan using test_app_blog_pkey on test_app_blog u0  (cost=0.42..13961.37 rows=86361 width=4) (actual time=0.030..40.520 rows=69119 loops=3)
                                 Heap Fetches: 33632
                           ->  Index Scan using test_app_translation_51c6d5db on test_app_translation u1  (cost=0.29..6077.43 rows=59809 width=8) (actual time=0.032..42.470 rows=59627 loops=3)
 Planning Time: 2.751 ms
 Execution Time: 692.507 ms

Not sure yet why my postgres seems restricted (I remember it has "enough" RAM) but will get back with a comment once I find out more.

comment:10 by Simon Charette, 18 months ago

Thank you for your report Lorand and for providing all these details. I'm also curious about what might be causing these slowdowns.

Until we figure out what might be to blame for the materialization issues I started looking into the unnecessary AND "blog"."id" = "blog"."id" generation by generating the exact change in terms of SQL generation that 8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1 resulted in against the test suite.

The full result can be found in this gist and the issue manifests itself in two tests

I should have some time tomorrow to dig into the issue further.

Last edited 18 months ago by Simon Charette (previous) (diff)

comment:11 by Simon Charette, 18 months ago

I drafted a PR for the harmful constant expression comparison in the mean time and I'll see if there is a way to do saner NULL handling so no LEFT JOIN for the sole purpose of doing an IS NULL is necessary. I think it's optimizable when detecting a filter_rhs is None.

comment:12 by Mariusz Felisiak, 18 months ago

Triage Stage:	Unreviewed → Accepted

Tentatively accepted.

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 17 months ago

In 1c4f5f3:

Refs #32143 -- Removed superflous constraints on excluded query.

The outer query reference is not necessary when alias can be reused and
can even be harmful by confusing query planers.

Refs #34597.

comment:14 by Simon Charette, 9 months ago

Had a discussion with Postgres major contributor about this very issue and a query planer optimization in Postgres 16+ that might make the EXISTS solution worth investigating.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#34597 new Cleanup/optimization

Queryset (split) exclude's usage of Exists is significantly slower than subquery

Description

Change History (14)

comment:1 by Simon Charette, 18 months ago

comment:2 by Lorand Varga, 18 months ago

comment:3 by Simon Charette, 18 months ago

comment:4 by David Sanders, 18 months ago

comment:5 by David Sanders, 18 months ago

comment:6 by David Sanders, 18 months ago

comment:7 by Simon Charette, 18 months ago

comment:8 by David Sanders, 18 months ago

comment:9 by Lorand Varga, 18 months ago

comment:10 by Simon Charette, 18 months ago

comment:11 by Simon Charette, 18 months ago

comment:12 by Mariusz Felisiak, 18 months ago

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 17 months ago

comment:14 by Simon Charette, 9 months ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us