Opened 7 years ago

Closed 22 months ago

Last modified 7 months ago

#28553 closed Bug (fixed)

Querysets: annotate() columns are forced into a certain position which may disrupt union()

Reported by: David Sanders Owned by: David Wobrock
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords:
Cc: Ole Laursen, David Wobrock, Simon Charette, Sylvain Fankhauser Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

(Reporting possible issue found by a user on #django)

Using values() to force selection of certain columns in a certain order proved useful unioning querysets with union() for the aforementioned user. The positioning of columns added with annotate() is not controllable with values() and has the potential to disrupt union() unless this fact is known and the ordering done in a certain way to accommodate it.

I'm reporting this mainly for posterity but also as a highlight that perhaps this should be mentioned in the documentation. I'm sure there are reasons why the annotations are appended to the select but if someone feels that this is changeable then it would be a bonus outcome.

Change History (17)

comment:1 by Tim Graham, 7 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

(The ticket component should change to "Documentation" if there aren't any code changes to make here. I'm not sure.)

comment:2 by Sergey Fedoseev, 7 years ago

Probably duplicate of #28900.

Last edited 22 months ago by Mariusz Felisiak (previous) (diff)

comment:3 by Flávio Juvenal, 7 years ago

I've stumbled upon a case in production where this limitation prevents me for making a useful query. I've been able to create a test to reproduce this problem.
It works with sqlite but fails with postgres.

Add this test to tests/queries/test_qs_combinators.py:

from django.db.models import F, IntegerField, TextField, Value

def test_union_with_two_annotated_values_on_different_models(self):
    qs1 = Number.objects.annotate(
        text_annotation=Value('Foo', TextField())
    ).values('text_annotation', 'num')
    qs2 = ReservedName.objects.annotate(
        int_annotation=Value(1, IntegerField()),
    ).values('name', 'int_annotation')
    self.assertEqual(qs1.union(qs2).count(), 10)

In current master (78f8b80f9b215e50618375adce4c97795dabbb84), running ./runtests.py --parallel=1 --settings=tests.test_postgres queries.test_qs_combinators.QuerySetSetOperationTests.test_union_with_two_annotated_values_on_different_models fails:

Testing against Django installed in 'django/django'
Creating test database for alias 'default'...
Creating test database for alias 'other'...
System check identified no issues (1 silenced).
E
======================================================================
ERROR: test_union_with_two_annotated_values_on_different_models (queries.test_qs_combinators.QuerySetSetOperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "django/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.ProgrammingError: UNION types integer and character varying cannot be matched
LINE 1: ..._annotation" FROM "queries_number") UNION (SELECT "queries_r...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "django/tests/queries/test_qs_combinators.py", line 140, in test_union_with_two_annotated_values_on_different_models
    self.assertEqual(qs1.union(qs2).count(), 10)
  File "django/django/db/models/query.py", line 382, in count
    return self.query.get_count(using=self.db)
  File "django/django/db/models/sql/query.py", line 494, in get_count
    number = obj.get_aggregation(using, ['__count'])['__count']
  File "django/django/db/models/sql/query.py", line 479, in get_aggregation
    result = compiler.execute_sql(SINGLE)
  File "django/django/db/models/sql/compiler.py", line 1054, in execute_sql
    cursor.execute(sql, params)
  File "django/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "django/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "django/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
  File "django/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: UNION types integer and character varying cannot be matched
LINE 1: ..._annotation" FROM "queries_number") UNION (SELECT "queries_r...
                                                             ^


----------------------------------------------------------------------
Ran 1 test in 0.007s

FAILED (errors=1)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

My tests/test_postgres.py is:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'django-test',
        'HOST': '127.0.0.1',
    },
    'other': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'django-test-other',
    }
}

SECRET_KEY = "django_tests_secret_key"

# Use a fast hasher to speed up tests.
PASSWORD_HASHERS = [
    'django.contrib.auth.hashers.MD5PasswordHasher',
]

comment:4 by Ole Laursen, 6 years ago

Cc: Ole Laursen added
Type: Cleanup/optimizationBug

There's a ticket with a test case in #30211.

That ticket was reported as a bug, and I think this ticket should be a bug too, so I'm changing the classification for now (apologies if that's inappropriate). I think it's a bug because the documentation in my reading implies that as long as the columns match, union will work. So it really comes as a bit of a surprise that Django overrides the order in values_list().

In my particular case, I'm using union() to combine two different tables that I need to get out in sorted order, and I was trying to use annotate() + values_list() to add a NULL filler to one table as it lacks a column from the other.

Also, I suppose the ORM could possibly also be a bit more efficient if it could return values_list() tuples directly from the select instead of having to rearrange them?

in reply to:  4 comment:5 by Kal Sze, 6 years ago

Actually there could be a different problem as well. We were lucky in that the ORM generated SQL where the data types of the columns do not match. But what happens when the data types of the columns match? I'm afraid you would get a query that doesn't throw an exception but is in fact subtly broken, especially if the values of the different columns happen to be similar, in which case it might take a long time for app developers to realize that the query is broken.

Last edited 5 years ago by Kal Sze (previous) (diff)

comment:6 by Matthias Kestenholz, 2 years ago

I just hit this as well. The solution (workaround) seems to be using F() and Value() freely and consistently for all querysets even if it doesn't look necessary on the surface.

See https://github.com/matthiask/feincms3-forms/commit/c112a7d613e991780f383393fd05f1c84c81a279

(It's a bit surprising that values_list doesn't produce a SQL query with the exact same ordering of values, but after thinking about it some more I'm not sure if that's really a bug or just a sharp edge of the current implementation.)

comment:7 by Matthias Kestenholz, 2 years ago

I submitted a failing unit test for this issue which demonstrates the problem: https://github.com/django/django/pull/16577

This seems hard to fix. I don't expect to do any work on this in the next few months since I have found a workaround for my use case. Hopefully the test is useful to someone.

comment:8 by David Wobrock, 23 months ago

Cc: David Wobrock added
Has patch: set
Owner: changed from nobody to David Wobrock
Status: newassigned

PR

The same issue can probably occur with extra() fields, but the PR doesn't fix that as we no longer fix bugs for this method.

comment:9 by Mariusz Felisiak, 23 months ago

Needs tests: set
Patch needs improvement: set

comment:10 by Mariusz Felisiak, 22 months ago

Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 22 months ago

Resolution: fixed
Status: assignedclosed

In d6b6e5d0:

Fixed #28553 -- Fixed annotation mismatch with QuerySet.values()/values_list() on compound queries.

Co-authored-by: Matthias Kestenholz <mk@…>

comment:12 by Simon Charette, 22 months ago

Cc: Simon Charette added

Small note that the cases reported in comment:3 is not fixed, the merged patch only focuses on the case where only annotations are referenced in values/values_list which is a subset of the problem reported here as mixing field references, extra, and annotations demonstrate.

The crux of the issue is that SQLCompiler.get_select ignores Query.values_select and always generate selected columns as follow

  1. Start with Query.extra_select
  2. Then Query.select
  3. End with Query.annotation_select

The patch here only makes sure that the order of annotation_select is preserved. What should be done instead is adjust get_select to be Query.values_select aware as pointed out on the MR.

in reply to:  12 comment:13 by Mariusz Felisiak, 22 months ago

Replying to Simon Charette:

Small note that the cases reported in comment:3 is not fixed, ...

Yes, but this case is described in #28900. Am I right?

comment:14 by Simon Charette, 22 months ago

#28900 seems to be an even more complex case where none of the combined queries use explicit values but the result of the query combination does.

In the comment:3 example both queries use values but happen to mix field references and annotations which is not covered by the test included in d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67.

I'm bringing this up because most if not all of the changes made to sql.Query for change the type of annotation_mask are unnecessary to solve #28553 entirely.

in reply to:  14 comment:15 by Mariusz Felisiak, 22 months ago

Replying to Simon Charette:

I'm bringing this up because most if not all of the changes made to sql.Query for change the type of annotation_mask are unnecessary to solve #28553 entirely.

I will be happy to review any adjustments in this area, +1.

comment:16 by Sylvain Fankhauser, 10 months ago

Cc: Sylvain Fankhauser added

comment:17 by Sarah Boyce <42296566+sarahboyce@…>, 7 months ago

In 65ad4ade:

Refs #28900 -- Made SELECT respect the order specified by values(*selected).

Previously the order was always extra_fields + model_fields + annotations with
respective local ordering inferred from the insertion order of *selected.

This commits introduces a new Query.selected propery that keeps tracks of the
global select order as specified by on values assignment. This is crucial
feature to allow the combination of queries mixing annotations and table
references.

It also allows the removal of the re-ordering shenanigans perform by
ValuesListIterable in order to re-map the tuples returned from the database
backend to the order specified by values_list() as they'll be in the right
order at query compilation time.

Refs #28553 as the initially reported issue that was only partially fixed
for annotations by d6b6e5d0fd4e6b6d0183b4cf6e4bd4f9afc7bf67.

Thanks Mariusz Felisiak and Sarah Boyce for review.

Note: See TracTickets for help on using tickets.
Back to Top