Opened 4 years ago
Closed 4 years ago
#31904 closed Bug (invalid)
Django queryset cannot call distinct() and order_by() on separate fields.
Reported by: | Logan Knecht | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.1 |
Severity: | Normal | Keywords: | PostgreSQL, Django, order_by, distinct, queryset |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Introduction
Hello there!
I want to start by saying thank you so much for this library! It is absolutely wonderful!
Data Model
There are Learners
There are Schools
A School
has many Courses
A Course has many
Lessons`
A School
has many Resources
A Lesson
has one Resource
Problem
I want all the Schools
ordered by the most Learners
that are enrolled, and I want the returned Schools
to be distinct.
I have an issue where I've got to do a really complex query across multiple models in order to generate the full data of the school and return it. I have to use a heavily prefetched
and select_related
query to get what I need so that it's performant. When I query the School
model that's returned it is duplicated multiple times.
I can de-duplicate the School
by calling distinct("id")
.
However, now they're not in the correct order, so I want to call order_by("-learner_enrolled_count")
When I do this the query looks like this:
user_query_set = (UserModel.objects.all()) # --- school_board_headmaster_query_set = (SchoolBoardHeadmasterModel.objects.all() .select_related("learner", "school_board")) school_board_member_query_set = (SchoolBoardMemberModel.objects.all() .select_related("learner", "school_board")) # --- flash_card_set_query_set = (FlashCardSetModel.objects.all() .select_related("school")) flash_card_query_set = (FlashCardModel.objects.all() .select_related("flash_card_set", "playsheetmusicflashcardmodel", "school", "trueorfalseflashcardmodel")) play_sheet_music_flash_card_query_set = (PlaySheetMusicFlashCardModel.objects.all() .select_related("flash_card", "school", "sheet_music")) true_or_false_flash_card_query_set = (TrueOrFalseFlashCardModel.objects.all() .select_related("flash_card", "school")) sheet_music_query_set = (SheetMusicModel.objects.all() .select_related("school")) # --- school_course_query_set = (SchoolCourseModel.objects.all() .select_related("school")) # --- school_lesson_query_set = (SchoolLessonModel.objects.all() .select_related("course", "flash_card_set", "school")) # ---------------------------------------------------------------------- # Prefetch # ---------------------------------------------------------------------- school_models = (super() .get_queryset() .select_related("school_board") .prefetch_related( # ------------------------------------------------------------------ # School Board # ------------------------------------------------------------------ Prefetch("school_board__school_board_headmaster", school_board_headmaster_query_set), Prefetch("school_board__school_board_headmaster__learner__user", user_query_set), Prefetch("school_board__school_board_member", school_board_member_query_set), Prefetch("school_board__school_board_member__learner__user", user_query_set), # ------------------------------------------------------------------ # School Course # ------------------------------------------------------------------ Prefetch("school_course", school_course_query_set), # ------------------------------------------------------------------ # School Course -> School Lesson # ------------------------------------------------------------------ Prefetch("school_course__school_lesson", school_lesson_query_set), # ------------------------------------------------------------------ # School Course -> School Lesson -> Flash Card Set # ------------------------------------------------------------------ Prefetch("school_course__school_lesson__flash_card_set", flash_card_set_query_set), Prefetch("school_course__school_lesson__flash_card_set__flash_card", flash_card_query_set), Prefetch("school_course__school_lesson__flash_card_set__flash_card__playsheetmusicflashcardmodel", play_sheet_music_flash_card_query_set), Prefetch("school_course__school_lesson__flash_card_set__flash_card__playsheetmusicflashcardmodel__sheet_music", sheet_music_query_set), Prefetch("school_course__school_lesson__flash_card_set__flash_card__trueorfalseflashcardmodel", true_or_false_flash_card_query_set), # ------------------------------------------------------------------ # School Resources # ------------------------------------------------------------------ Prefetch("flashcardsetmodel_set", flash_card_set_query_set), # --- Prefetch("flashcardsetmodel_set__flash_card", flash_card_query_set), # --- Prefetch("flashcardsetmodel_set__flash_card__playsheetmusicflashcardmodel", play_sheet_music_flash_card_query_set), Prefetch("flashcardsetmodel_set__flash_card__playsheetmusicflashcardmodel__sheet_music", sheet_music_query_set), # --- Prefetch("flashcardsetmodel_set__flash_card__trueorfalseflashcardmodel", true_or_false_flash_card_query_set), # --- Prefetch("sheetmusicmodel_set", sheet_music_query_set) )) all_school_models = school_models.annotate( learner_enrolled_count=Case( When(learner_enrolled_school__learner_enrolled_course__learner_enrolled_lesson__is_enrolled=True, then=1), default=0, output_field=IntegerField() ))
So I then call
all_school_models.distinct("id").order_by("-learner_enrolled_count")
Which then gives me this output:
Internal Server Error: /api/v1/schools/ Traceback (most recent call last): File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute return self.cursor.execute(sql, params) psycopg2.errors.InvalidColumnReference: SELECT DISTINCT ON expressions must match initial ORDER BY expressions LINE 1: SELECT DISTINCT ON ("piano_gym_api_schoolmodel"."id") "piano... ^ The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner response = get_response(request) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/base.py", line 115, in _get_response response = self.process_exception_by_middleware(e, request) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/base.py", line 113, in _get_response response = wrapped_callback(request, *callback_args, **callback_kwargs) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view return view_func(*args, **kwargs) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/viewsets.py", line 114, in view return self.dispatch(request, *args, **kwargs) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 505, in dispatch response = self.handle_exception(exc) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 465, in handle_exception self.raise_uncaught_exception(exc) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception raise exc File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 502, in dispatch response = handler(request, *args, **kwargs) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/piano_gym_api/versions/v1/views/school_view.py", line 102, in list request) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/piano_gym_api/versions/v1/pagination/header_link_pagination.py", line 91, in paginate_queryset total_result_count = len(queryset) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 258, in __len__ self._fetch_all() File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 1261, in _fetch_all self._result_cache = list(self._iterable_class(self)) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 57, in __iter__ results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1144, in execute_sql cursor.execute(sql, params) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/debug_toolbar/panels/sql/tracking.py", line 198, in execute return self._record(self.cursor.execute, sql, params) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/debug_toolbar/panels/sql/tracking.py", line 133, in _record return method(sql, params) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 100, in execute return super().execute(sql, params) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute return self._execute_with_wrappers(sql, params, many=False, executor=self._execute) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers return executor(sql, params, many, context) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute return self.cursor.execute(sql, params) File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__ raise dj_exc_value.with_traceback(traceback) from exc_value File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute return self.cursor.execute(sql, params) django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions LINE 1: SELECT DISTINCT ON ("piano_gym_api_schoolmodel"."id") "piano... ^ [15/Aug/2020 06:42:27] "GET /api/v1/schools/?order_by=-learner_enrolled_count HTTP/1.1" 500 27740
Understanding
My understanding of this is that this is caused by PostgresSQL AND Django
If you read the documentation for Django it says
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
- https://docs.djangoproject.com/en/3.0/ref/models/querysets/#django.db.models.query.QuerySet.order_by
- https://docs.djangoproject.com/en/3.0/ref/models/querysets/#distinct
So in this case the solution, if I'm lazy is just to do
all_school_models.distinct("learner_enrolled_count", "id").order_by("-learner_enrolled_count", "id")
But that doesn't return the correct information because now the results are distinct for the School
models that have a distinct learner_enrolled_count
AND id
when I just want the distinct Schools
and then now they are first ordered by learner_enrolled_count
and immediately by id
which is also incorrect.
Additional References
I'm struggling with this in the django-filter
library and created an to issue to track it:
Conclusion
I have no idea how to achieve the ability to select distinct
schools and order_by
a separate column, and it doesn't seem like an unreasonable request? I have looked all over and see this issue repeated, but no work arounds provided.
Do you have guidance on how to achieve this?
This seems like an oversight of Django to not support the ability to distinctly select fields and order arbitrarily.
Change History (2)
comment:1 by , 4 years ago
Type: | Uncategorized → Bug |
---|
comment:2 by , 4 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Summary: | Django `queryset` Cannot Call `distinct()` and `order_by()` On Separate Fields When Using Postgres SQL → Django queryset cannot call distinct() and order_by() on separate fields. |
This is not a Django limitation that's how
DISTINCT
works in databases, it's also widely explained in docs.You should try to rewrite a queryset to omit duplicate rows, e.g. by using
Exists()
instead ofCase()
forlearner_enrolled_count
.Closing per TicketClosingReasons/UseSupportChannels.