Opened 4 years ago

Closed 4 years ago

#31904 closed Bug (invalid)

Django queryset cannot call distinct() and order_by() on separate fields.

Reported by: Logan Knecht Owned by: nobody
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: PostgreSQL, Django, order_by, distinct, queryset
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Introduction

Hello there!

I want to start by saying thank you so much for this library! It is absolutely wonderful!

Data Model

There are Learners
There are Schools
A School has many Courses
A Course has many Lessons`
A School has many Resources
A Lesson has one Resource

Problem

I want all the Schools ordered by the most Learners that are enrolled, and I want the returned Schools to be distinct.

I have an issue where I've got to do a really complex query across multiple models in order to generate the full data of the school and return it. I have to use a heavily prefetched and select_related query to get what I need so that it's performant. When I query the School model that's returned it is duplicated multiple times.

I can de-duplicate the School by calling distinct("id").

However, now they're not in the correct order, so I want to call order_by("-learner_enrolled_count")

When I do this the query looks like this:

user_query_set = (UserModel.objects.all())
# ---
school_board_headmaster_query_set = (SchoolBoardHeadmasterModel.objects.all()
                                 .select_related("learner",
                                                 "school_board"))
school_board_member_query_set = (SchoolBoardMemberModel.objects.all()
                             .select_related("learner",
                                             "school_board"))
# ---
flash_card_set_query_set = (FlashCardSetModel.objects.all()
                        .select_related("school"))
flash_card_query_set = (FlashCardModel.objects.all()
                    .select_related("flash_card_set",
                                    "playsheetmusicflashcardmodel",
                                    "school",
                                    "trueorfalseflashcardmodel"))
play_sheet_music_flash_card_query_set = (PlaySheetMusicFlashCardModel.objects.all()
                                     .select_related("flash_card",
                                                     "school",
                                                     "sheet_music"))
true_or_false_flash_card_query_set = (TrueOrFalseFlashCardModel.objects.all()
                                  .select_related("flash_card",
                                                  "school"))
sheet_music_query_set = (SheetMusicModel.objects.all()
                     .select_related("school"))
# ---
school_course_query_set = (SchoolCourseModel.objects.all()
                       .select_related("school"))
# ---
school_lesson_query_set = (SchoolLessonModel.objects.all()
                       .select_related("course",
                                       "flash_card_set",
                                       "school"))

# ----------------------------------------------------------------------
# Prefetch
# ----------------------------------------------------------------------
school_models = (super()
             .get_queryset()
             .select_related("school_board")
             .prefetch_related(
# ------------------------------------------------------------------
# School Board
# ------------------------------------------------------------------
Prefetch("school_board__school_board_headmaster",
         school_board_headmaster_query_set),
Prefetch("school_board__school_board_headmaster__learner__user",
         user_query_set),
Prefetch("school_board__school_board_member",
         school_board_member_query_set),
Prefetch("school_board__school_board_member__learner__user",
         user_query_set),
# ------------------------------------------------------------------
# School Course
# ------------------------------------------------------------------
Prefetch("school_course",
         school_course_query_set),
# ------------------------------------------------------------------
# School Course -> School Lesson
# ------------------------------------------------------------------
Prefetch("school_course__school_lesson",
         school_lesson_query_set),
# ------------------------------------------------------------------
# School Course -> School Lesson -> Flash Card Set
# ------------------------------------------------------------------
Prefetch("school_course__school_lesson__flash_card_set",
         flash_card_set_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card",
         flash_card_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card__playsheetmusicflashcardmodel",
         play_sheet_music_flash_card_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card__playsheetmusicflashcardmodel__sheet_music",
         sheet_music_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card__trueorfalseflashcardmodel",
         true_or_false_flash_card_query_set),
# ------------------------------------------------------------------
# School Resources
# ------------------------------------------------------------------
Prefetch("flashcardsetmodel_set",
         flash_card_set_query_set),
# ---
Prefetch("flashcardsetmodel_set__flash_card",
         flash_card_query_set),
# ---
Prefetch("flashcardsetmodel_set__flash_card__playsheetmusicflashcardmodel",
         play_sheet_music_flash_card_query_set),
Prefetch("flashcardsetmodel_set__flash_card__playsheetmusicflashcardmodel__sheet_music",
         sheet_music_query_set),
# ---
Prefetch("flashcardsetmodel_set__flash_card__trueorfalseflashcardmodel",
         true_or_false_flash_card_query_set),
# ---
Prefetch("sheetmusicmodel_set",
         sheet_music_query_set)
))

all_school_models = school_models.annotate(
learner_enrolled_count=Case(
    When(learner_enrolled_school__learner_enrolled_course__learner_enrolled_lesson__is_enrolled=True,
         then=1),
    default=0,
    output_field=IntegerField()
))

So I then call

all_school_models.distinct("id").order_by("-learner_enrolled_count")

Which then gives me this output:

Internal Server Error: /api/v1/schools/
Traceback (most recent call last):
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.InvalidColumnReference: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("piano_gym_api_schoolmodel"."id") "piano...
                            ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    return view_func(*args, **kwargs)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/viewsets.py", line 114, in view
    return self.dispatch(request, *args, **kwargs)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 505, in dispatch
    response = self.handle_exception(exc)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 465, in handle_exception
    self.raise_uncaught_exception(exc)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
    raise exc
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 502, in dispatch
    response = handler(request, *args, **kwargs)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/piano_gym_api/versions/v1/views/school_view.py", line 102, in list
    request)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/piano_gym_api/versions/v1/pagination/header_link_pagination.py", line 91, in paginate_queryset
    total_result_count = len(queryset)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1144, in execute_sql
    cursor.execute(sql, params)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/debug_toolbar/panels/sql/tracking.py", line 198, in execute
    return self._record(self.cursor.execute, sql, params)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/debug_toolbar/panels/sql/tracking.py", line 133, in _record
    return method(sql, params)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 100, in execute
    return super().execute(sql, params)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("piano_gym_api_schoolmodel"."id") "piano...
                            ^

[15/Aug/2020 06:42:27] "GET /api/v1/schools/?order_by=-learner_enrolled_count HTTP/1.1" 500 27740

Understanding

My understanding of this is that this is caused by PostgresSQL AND Django

If you read the documentation for Django it says

When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.

So in this case the solution, if I'm lazy is just to do

all_school_models.distinct("learner_enrolled_count", "id").order_by("-learner_enrolled_count", "id")

But that doesn't return the correct information because now the results are distinct for the School models that have a distinct learner_enrolled_count AND id when I just want the distinct Schools and then now they are first ordered by learner_enrolled_count and immediately by id which is also incorrect.

Additional References

I'm struggling with this in the django-filter library and created an to issue to track it:

Conclusion

I have no idea how to achieve the ability to select distinct schools and order_by a separate column, and it doesn't seem like an unreasonable request? I have looked all over and see this issue repeated, but no work arounds provided.

Do you have guidance on how to achieve this?

This seems like an oversight of Django to not support the ability to distinctly select fields and order arbitrarily.

Change History (2)

comment:1 by Logan Knecht, 4 years ago

Type: UncategorizedBug

comment:2 by Mariusz Felisiak, 4 years ago

Resolution: invalid
Status: newclosed
Summary: Django `queryset` Cannot Call `distinct()` and `order_by()` On Separate Fields When Using Postgres SQLDjango queryset cannot call distinct() and order_by() on separate fields.

This seems like an oversight of Django to not support the ability to distinctly select fields and order arbitrarily.

This is not a Django limitation that's how DISTINCT works in databases, it's also widely explained in docs.

You should try to rewrite a queryset to omit duplicate rows, e.g. by using Exists() instead of Case() for learner_enrolled_count.

Closing per TicketClosingReasons/UseSupportChannels.

Note: See TracTickets for help on using tickets.
Back to Top