#35830 closed Bug (invalid)
Queryset Union doesn't persist column names
Reported by: | Paul Landon Tuckett | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | orm, database, queryset, union, sql |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In Django v3.2.25 doing a queryset.union() would preserve the original column names. Upgrading to Django 4.2.16, this is no longer the case. It converts all the columns (minus annotations), to col1, col2, etc. The column names are the same in each queryset and the data types are the same. Even if you do a union() on the exact same queryset, the columns are renamed, ie: queryset.union(queryset). If this is expected behavior, please disregard. I'm not seeing anything in the change log that references this change.
Attachments (1)
Change History (5)
comment:1 by , 2 months ago
by , 2 months ago
Attachment: | django_bug.py added |
---|
comment:2 by , 2 months ago
Hello Simon,
Sure, I attached a file that shows a union between 2 querysets of the same model and the sql output. Let me know if that works or you need more information. I tested it both with the upgrade and without it.
comment:3 by , 2 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
By bisecting the changes since 3.2 I identified 70499b25c708557fb9ee2264686cd172f4b2354e as the change that introduced the aliasing in order to resolve #34123.
The problem it addresses arise when using select_related
in queryset that use union as when it's the case and an ambiguous alias is referenced for ordering purposes. For example if you do
base = Book.objects.select_related("author") base.filter(rating__gt=3).union(base.filter(title__contains="Foo")).order_by("id")
The generated outer ORDER BY
clause cannot simply reference "id"
as it's ambiguous whether it's book.id
or author.id
so book.id
must be aliased.
It it understood that only ambiguous column names that are referenced by ORDER BY
absolutely need to be aliased but it was much easier to systematically alias all columns instead.
So to answer your question it is expected that the SQL changed and since it doesn't change the semantic of the query it is not considered a bug. If you care about the exact name returned from the query you should resort to values
instead.
comment:4 by , 2 months ago
I totally understand that column references can be ambiguous, but we convert these Django queries into SQL and inject them into raw sql, so it's a major overhaul for us to have to convert all the Django queries to raw sql because of this change. We do use values
and/or values_list
, but it still doesn't preserve the column names. Even with an order_by
and values
combination, same result. If there is another way we can force the Django ORM to preserve these column names, please advise.
Hello Paul.
If you could provide a set of models and a queryset that reproduces it would be easier to bisect the changeset that caused it and determine if it was intended or not. I suspect this was a necessary change to address an issue when combining querysets mixing different field names but it's hard to tell for sure without a way to reproduce what you are experiencing.