Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#33973 closed Uncategorized (needsinfo)

Performance regression when moving from 3.1 to 3.2

Reported by: Marc Parizeau Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: performance regression
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I am seing a sharp increase in execution time for some of my queries when moving from Django 3.1 to 3.2. And the performance hit appears to be the same for Django 4.

My backend is Postgres 14.2.

My Django project has forums for different types of content. The forum app consists essentially of 5 tables:

  1. a Post table that contains forum posts (essentially a text field);
  2. a Thread table where rows point to a specific content and post;
  3. a FollowUp table where rows point to a specific thread and post;
  4. a ThreadEntry table where rows point to a thread, a user, and the last seen thread post for this user;
  5. a FollowUpEntry table where rows point to a followup, a user, and the last seen followup post for this user.

Here is an example query that executes 2 times slower on 3.2 than on 3.1:

Content.objects.all().annotate(
  has_unread_posts=Greatest(
    # a content is unread if at least one condition is true
    Exists(
      # a thread has never been read (has no user entry)
      Thread.objects.filter(
        content=OuterRef('pk'),
      ).exclude(threadentry__user=user)
    ),
    Exists(
      # a thread entry is not up-to-date
      ThreadEntry.objects.filter(
        thread__content=OuterRef('pk'),
        user=user,
      ).exclude(post=F('thread__post'))
    ),
    Exists(
      # a followup has never been read
      FollowUp.objects.filter(
        thread__content=OuterRef('pk')
      ).exclude(followupentry__user=user)
    ),
    Exists(
      # a followup entry is not up-to-date
      FollowUpEntry.objects.filter(
        followup__thread__content=OuterRef('pk'),
        user=user,
      ).exclude(post=F('followup__post'))
    ),
  )
).filter(
  has_unread_posts=True,
).order_by(
  'course__uid',
  '-version__start',
).select_related(
  'course',
  'version',
)

Course and Version are other tables related to Content.

I want to know with this query, for each content, whether or not there is something new in the corresponding forum for a given user. There is something new if any one of the following condition is true:

  1. there exists a thread for which the user has no thread entry (an entry is added when the thread is first read by the user);
  2. there exists a user thread entry for which the last read post is not up to date with the current thread post (the thread owner has modified the post since);
  3. there exists a followup for which the user has no followup entry (an entry is added when the followup is first read by the user);
  4. there exists a user followup entry for which the last read post is not up to date with the followup post (the followup owner has modified the post since).

On my machine, just by changing the Django version using pip, and nothing else, this query takes about 1 second of execution on Django 3.1.14, and a little more than 2 seconds on Django 3.2.15, so about a 2x increase. Here are the current table sizes for my forum app:

  • Thread: ~30K
  • FollowUp: ~46K
  • ThreadEntry: ~1.3M
  • FollowUpEntry: ~4.5M
  • Post: ~103K

And there is 33 Content rows.

Am I the only one observing such performance regressions with Django 3.2? On other more complex queries that contain nested subqueries, I have seen up to 30x execution time increases.

Did something major happen in SQL generation from 3.1 to 3.2?

Am I doing something wrong? How can this happen?

Any help on understanding what is going on with Django 3.2 would be much appreciated.

Best regards,

Change History (8)

comment:1 by Marc Parizeau, 2 years ago

After more digging, it appears that the two worst culprits in my query are the 1st and 3rd Exists. The 1st is about 10x slower on 3.2.15 than on 3.1.14. The 3rd is about 6x slower on 3.2.15 than on 3.1.14. The execution times for the 2nd and 4th Exists are about the same for 3.2 and 3.1.

If I use only the 1st Exists in my query, by commenting out the 3 others, and by adding a False for the Greatest requirement of min 2 args, I obtain the following explanations (I also removed the order_by and select_related clauses to simplify the query as much as possible):

On Django 3.1.14:

time 0.03233041700000072
Sort  (cost=25581.33..25581.33 rows=1 width=1207)
  Sort Key: content_course.uid, content_version.start DESC
  ->  Nested Loop Left Join  (cost=0.14..25581.32 rows=1 width=1207)
        Join Filter: (content_content.version_id = content_version.id)
        ->  Nested Loop Left Join  (cost=0.14..13437.94 rows=1 width=1202)
              Join Filter: (content_content.course_id = content_course.id)
              ->  Nested Loop  (cost=0.14..13436.58 rows=1 width=1124)
                    ->  Seq Scan on content_teacher  (cost=0.00..7.65 rows=1 width=8)
                          Filter: (user_id = 59)
                    ->  Index Scan using content_semester_pkey on content_content  (cost=0.14..12150.00 rows=1 width=1124)
                          Index Cond: (id = content_teacher.content_id)
                          Filter: ((id <> 58) AND GREATEST((hashed SubPlan 8), false))
                          SubPlan 8
                            ->  Seq Scan on forum_thread v0_1  (cost=12114.13..13913.92 rows=14672 width=8)
                                  Filter: (NOT (hashed SubPlan 7))
                                  SubPlan 7
                                    ->  Bitmap Heap Scan on forum_threadentry u1_1  (cost=75.76..12097.65 rows=6592 width=8)
                                          Recheck Cond: (user_id = 59)
                                          Filter: (thread_id IS NOT NULL)
                                          ->  Bitmap Index Scan on forum_threadentry_user_id_585e5649  (cost=0.00..74.11 rows=6625 width=0)
                                                Index Cond: (user_id = 59)
              ->  Seq Scan on content_course  (cost=0.00..1.16 rows=16 width=86)
        ->  Seq Scan on content_version  (cost=0.00..1.24 rows=24 width=12)
        SubPlan 2
          ->  Bitmap Heap Scan on forum_thread v0  (cost=12139.37..13583.67 rows=587 width=0)
                Recheck Cond: (content_id = content_content.id)
                Filter: (NOT (hashed SubPlan 1))
                ->  Bitmap Index Scan on forum_question_semester_id_e4f29334  (cost=0.00..25.09 rows=1174 width=0)
                      Index Cond: (content_id = content_content.id)
                SubPlan 1
                  ->  Bitmap Heap Scan on forum_threadentry u1  (cost=75.76..12097.65 rows=6592 width=8)
                        Recheck Cond: (user_id = 59)
                        Filter: (thread_id IS NOT NULL)
                        ->  Bitmap Index Scan on forum_threadentry_user_id_585e5649  (cost=0.00..74.11 rows=6625 width=0)
                              Index Cond: (user_id = 59)

On Django 3.2.15:

time 0.30839854199999195
Sort  (cost=62.30..62.31 rows=1 width=1207)
  Sort Key: content_course.uid, content_version.start DESC
  ->  Nested Loop Left Join  (cost=0.41..62.29 rows=1 width=1207)
        ->  Nested Loop Left Join  (cost=0.28..40.58 rows=1 width=1202)
              ->  Nested Loop  (cost=0.14..39.68 rows=1 width=1124)
                    ->  Seq Scan on content_teacher  (cost=0.00..7.65 rows=1 width=8)
                          Filter: (user_id = 59)
                    ->  Index Scan using content_semester_pkey on content_content  (cost=0.14..28.99 rows=1 width=1124)
                          Index Cond: (id = content_teacher.content_id)
                          Filter: ((id <> 58) AND GREATEST((SubPlan 3), false))
                          SubPlan 3
                            ->  Nested Loop Anti Join  (cost=0.43..18547.19 rows=909 width=0)
                                  ->  Seq Scan on forum_thread v0_1  (cost=0.00..1799.79 rows=1174 width=8)
                                        Filter: (content_id = content_content.id)
                                  ->  Index Only Scan using forum_threadentry_thread_id_user_id_cd21c4a5_uniq on forum_threadentry u1_1  (cost=0.43..55.66 rows=33 width=8)
                                        Index Cond: ((thread_id = v0_1.id) AND (user_id = 59))
              ->  Index Scan using content_course_pkey on content_course  (cost=0.14..0.82 rows=1 width=86)
                    Index Cond: (id = content_content.course_id)
        ->  Index Scan using content_version_pkey on content_version  (cost=0.14..0.82 rows=1 width=12)
              Index Cond: (id = content_content.version_id)
        SubPlan 1
          ->  Nested Loop Anti Join  (cost=0.43..18547.19 rows=909 width=0)
                ->  Seq Scan on forum_thread v0  (cost=0.00..1799.79 rows=1174 width=8)
                      Filter: (content_id = content_content.id)
                ->  Index Only Scan using forum_threadentry_thread_id_user_id_cd21c4a5_uniq on forum_threadentry u1  (cost=0.43..55.66 rows=33 width=8)
                      Index Cond: ((thread_id = v0.id) AND (user_id = 59))

So it appears that the SQL strategies of the two Django versions are indeed not the same. And it also appears that the regression stems from the implementation of the exclude(threadentry__user=user) clause (reverse traversal of the thread entry FK).

Hope this info can help a gourou isolate the problem source (SQL optimization is out of my league).

Last edited 2 years ago by Marc Parizeau (previous) (diff)

comment:2 by Marc Parizeau, 2 years ago

So the exact Django query that lead to the above results is:

Content.objects.all().annotate(
        unread=Greatest(
            Exists(
                # a thread has no user entry (0.03s on 3.1.14 vs 0.30s on 3.2.15)
                Thread.objects.filter(
                    content=OuterRef('pk'),
                ).exclude(threadentry__user=user)
            ),
            False,
        )
    ).filter(
        unread=True,
    )

Which produces the following SQL.

On Django 3.1.14:

SELECT "content_content"."id", "content_content"."created", "content_content"."description", "content_content"."disabled", "content_content"."modified", "content_content"."objectives", "content_content"."published", "content_content"."title", "content_content"."course_id", "content_content"."version_id", "content_content"."exercise_weight", "content_content"."midterm_weight", "content_content"."final_weight", "content_content"."project_weight", "content_content"."forum_weight", "content_content"."welcome_video", "content_content"."navigate_video", "content_content"."forum_video", "content_content"."lambda_fct_id", "content_content"."dl_key_version", "content_content"."dl_key_token", "content_content"."exam_key_token", "content_content"."lectures_url", 
EXISTS(
    SELECT V0."id", V0."created", V0."disabled", V0."modified", V0."title", V0."content_id", V0."owner_id", V0."post_id", V0."tag", V0."start", V0."uid", V0."question", V0."important", V0."resolved", V0."authorized", V0."votes", V0."mptt_id", V0."linked_content_type_id", V0."linked_content_id" FROM "forum_thread" V0 WHERE (
        V0."content_id" = "content_content"."id" AND NOT (
            V0."id" IN (
                SELECT U1."thread_id" FROM "forum_threadentry" U1 WHERE (
                    U1."user_id" = 59 AND U1."thread_id" IS NOT NULL
                )
            )
        )
    )
) AS "unread" FROM "content_content" LEFT OUTER JOIN "content_course" ON (
    "content_content"."course_id" = "content_course"."id"
) LEFT OUTER JOIN "content_version" ON (
    "content_content"."version_id" = "content_version"."id"
) WHERE EXISTS(
    SELECT V0."id", V0."created", V0."disabled", V0."modified", V0."title", V0."content_id", V0."owner_id", V0."post_id", V0."tag", V0."start", V0."uid", V0."question", V0."important", V0."resolved", V0."authorized", V0."votes", V0."mptt_id", V0."linked_content_type_id", V0."linked_content_id" FROM "forum_thread" V0 WHERE (
        V0."content_id" = "content_content"."id" AND NOT (
            V0."id" IN (
                SELECT U1."thread_id" FROM "forum_threadentry" U1 WHERE (
                    U1."user_id" = 59 AND U1."thread_id" IS NOT NULL
                )
            )
        )
    )
) ORDER BY "content_course"."uid" ASC, "content_version"."start" DESC

On Django 3.2.15:

SELECT "content_content"."id", "content_content"."created", "content_content"."description", "content_content"."disabled", "content_content"."modified", "content_content"."objectives", "content_content"."published", "content_content"."title", "content_content"."course_id", "content_content"."version_id", "content_content"."exercise_weight", "content_content"."midterm_weight", "content_content"."final_weight", "content_content"."project_weight", "content_content"."forum_weight", "content_content"."welcome_video", "content_content"."navigate_video", "content_content"."forum_video", "content_content"."lambda_fct_id", "content_content"."dl_key_version", "content_content"."dl_key_token", "content_content"."exam_key_token", "content_content"."lectures_url", 
EXISTS(
    SELECT (1) AS "a" FROM "forum_thread" V0 WHERE (
        V0."content_id" = "content_content"."id" AND NOT (
            EXISTS(
                SELECT (1) AS "a" FROM "forum_threadentry" U1 WHERE (
                    U1."user_id" = 59 AND U1."thread_id" = V0."id"
                ) LIMIT 1
            )
        )
    ) LIMIT 1
) AS "unread" FROM "content_content" LEFT OUTER JOIN "content_course" ON (
    "content_content"."course_id" = "content_course"."id"
) LEFT OUTER JOIN "content_version" ON (
    "content_content"."version_id" = "content_version"."id"
) WHERE EXISTS(
    SELECT (1) AS "a" FROM "forum_thread" V0 WHERE (
        V0."content_id" = "content_content"."id" AND NOT (
            EXISTS(
                SELECT (1) AS "a" FROM "forum_threadentry" U1 WHERE (
                    U1."user_id" = 59 AND U1."thread_id" = V0."id"
                ) LIMIT 1
            )
        )
    ) LIMIT 1
) ORDER BY "content_course"."uid" ASC, "content_version"."start" DESC}}}

Again, hope this helps!
Version 4, edited 2 years ago by Marc Parizeau (previous) (next) (diff)

comment:3 by Mariusz Felisiak, 2 years ago

Did something major happen in SQL generation from 3.1 to 3.2?

Each version of Django includes many bugfixes and new ORM features that affect the generated SQL.

Can you reproduce the same regression on Django 4.1 and the current main branch? Can you bisect a regression?

comment:4 by Marc Parizeau, 2 years ago

Here is the "explanation" for Django 4.0.7:

time 0.3011052919999955
Sort  (cost=62.30..62.31 rows=1 width=1207)
  Sort Key: content_course.uid, content_version.start DESC
  ->  Nested Loop Left Join  (cost=0.41..62.29 rows=1 width=1207)
        ->  Nested Loop Left Join  (cost=0.28..40.58 rows=1 width=1202)
              ->  Nested Loop  (cost=0.14..39.68 rows=1 width=1124)
                    ->  Seq Scan on content_teacher  (cost=0.00..7.65 rows=1 width=8)
                          Filter: (user_id = 59)
                    ->  Index Scan using content_semester_pkey on content_content  (cost=0.14..28.99 rows=1 width=1124)
                          Index Cond: (id = content_teacher.content_id)
                          Filter: ((id <> 58) AND GREATEST((SubPlan 3), false))
                          SubPlan 3
                            ->  Nested Loop Anti Join  (cost=0.43..18547.19 rows=909 width=0)
                                  ->  Seq Scan on forum_thread v0_1  (cost=0.00..1799.79 rows=1174 width=8)
                                        Filter: (content_id = content_content.id)
                                  ->  Index Only Scan using forum_threadentry_thread_id_user_id_cd21c4a5_uniq on forum_threadentry u1_1  (cost=0.43..55.66 rows=33 width=8)
                                        Index Cond: ((thread_id = v0_1.id) AND (user_id = 59))
              ->  Index Scan using content_course_pkey on content_course  (cost=0.14..0.82 rows=1 width=86)
                    Index Cond: (id = content_content.course_id)
        ->  Index Scan using content_version_pkey on content_version  (cost=0.14..0.82 rows=1 width=12)
              Index Cond: (id = content_content.version_id)
        SubPlan 1
          ->  Nested Loop Anti Join  (cost=0.43..18547.19 rows=909 width=0)
                ->  Seq Scan on forum_thread v0  (cost=0.00..1799.79 rows=1174 width=8)
                      Filter: (content_id = content_content.id)
                ->  Index Only Scan using forum_threadentry_thread_id_user_id_cd21c4a5_uniq on forum_threadentry u1  (cost=0.43..55.66 rows=33 width=8)
                      Index Cond: ((thread_id = v0.id) AND (user_id = 59))

And the corresponding SQL:

SELECT "content_content"."id", "content_content"."created", "content_content"."description", "content_content"."disabled", "content_content"."modified", "content_content"."objectives", "content_content"."published", "content_content"."title", "content_content"."course_id", "content_content"."version_id", "content_content"."exercise_weight", "content_content"."midterm_weight", "content_content"."final_weight", "content_content"."project_weight", "content_content"."forum_weight", "content_content"."welcome_video", "content_content"."navigate_video", "content_content"."forum_video", "content_content"."lambda_fct_id", "content_content"."dl_key_version", "content_content"."dl_key_token", "content_content"."exam_key_token", "content_content"."lectures_url", GREATEST(EXISTS(SELECT (1) AS "a" FROM "forum_thread" V0 WHERE (V0."content_id" = ("content_content"."id") AND NOT (EXISTS(SELECT (1) AS "a" FROM "forum_threadentry" U1 WHERE (U1."user_id" = 59 AND U1."thread_id" = (V0."id")) LIMIT 1))) LIMIT 1), False) AS "unread" FROM "content_content" INNER JOIN "content_teacher" ON ("content_content"."id" = "content_teacher"."content_id") LEFT OUTER JOIN "content_course" ON ("content_content"."course_id" = "content_course"."id") LEFT OUTER JOIN "content_version" ON ("content_content"."version_id" = "content_version"."id") WHERE ("content_teacher"."user_id" = 59 AND NOT ("content_content"."id" = 58) AND GREATEST(EXISTS(SELECT (1) AS "a" FROM "forum_thread" V0 WHERE (V0."content_id" = ("content_content"."id") AND NOT (EXISTS(SELECT (1) AS "a" FROM "forum_threadentry" U1 WHERE (U1."user_id" = 59 AND U1."thread_id" = (V0."id")) LIMIT 1))) LIMIT 1), False)) ORDER BY "content_course"."uid" ASC, "content_version"."start" DESC

As for Django 4.1, I can't do it right now, because I am getting Expression contains mixed types: IntegerField, SmallIntegerField errors which I haven't been able to remove so far.

But my issues definitely started with Django 3.2, not with Django 4 which seems to behave similarly for my specific query.

comment:5 by Mariusz Felisiak, 2 years ago

Cc: Simon Charette added
Component: UncategorizedDatabase layer (models, ORM)
Resolution: needsinfo
Status: newclosed

Using EXISTS to exclude multi-valued relationship was introduced in 8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1. However this should be is easier to optimize for query planers, it's not clear to me why you're facing a performance regression. Perhaps the query plan should be regenerated or this is an issue in PostgreSQL 14.2. I don't think there is anything that we could improve in Django.

in reply to:  5 comment:6 by Marc Parizeau, 2 years ago

Replying to Mariusz Felisiak:

Using EXISTS to exclude multi-valued relationship was introduced in 8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1. However this should be is easier to optimize for query planers, it's not clear to me why you're facing a performance regression. Perhaps the query plan should be regenerated or this is an issue in PostgreSQL 14.2. I don't think there is anything that we could improve in Django.

Not sure I understand what you are saying. You are saying that the regression I am observing is definitely not caused by the Django generated SQL? That the issue must stem from the query planers of postgres? Are you sure about that?

Any suggestion on what I should do moving forward? What postgres version do you suggest I use to get the proper query planers for Django 3.2/4?

comment:7 by Simon Charette, 2 years ago

Bonjour Marc,

There are some sources listed on the PR that introduced these changes that documents the performance benefits of using EXISTS over IN and the case where IN will be faster which I believe is what you're experiencing here.

Given there are performance benefits in most cases and IN is only faster when the subquery returns a small number of rows as a hashmap can be created I believe we should keep the change around.

The good news is that you can restore the previous behaviour by being more explicit in your usage of exclude by using an explicit IN lookup instead.

user_thread_entries = ThreadEntry.objects.filter(user=user, thread__isnull=False).values("thread")
Content.objects.annotate(
    unread=Exists(
        Thread.objects.filter(
            content=OuterRef('pk'),
        ).exclude(pk__in=user_thread_entries)
    )
).filter(
    unread=True,
)

Keep in mind though that, as pointed out in the article, this query will only be faster for users with a relatively small number of thread entries as Postgres will turn the set of rows into a hashmap which is means that this query will become more expensive as the number of threads user creates increase.

in reply to:  7 comment:8 by Marc Parizeau, 2 years ago

Replying to Simon Charette:

Thanks for the suggestion. I've tried it and it does not seem to reduce the execution time for my test case that has about 26K threads for it's user. So, the limitation that you mention probably applies. I am still trying to understand what might be going on (my mastery of SQL is somewhat limited), and why Django 3.1 was doing better than 3.2 for my specific query. Anyway, for the time being, I've decided to move on to other problems, since this one is mostly severe for myself, and not for my regular users who have access to only a subset of the available contents. I worry, however, that the problem will only grow as more content is added...

I've also tried indexing, but so far to no improvement.

If I eventually find a solution, I will report it here.

Thanks for your help!

Note: See TracTickets for help on using tickets.
Back to Top