#35194 closed Bug (needsinfo)
Postgres 16.2 with _iexact leads to IndeterminateCollation
Reported by: | Aldalen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, David Sanders | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
There is something up with how "_iexact" GeneratedField expressions are turned into SQL for postgres 16.2. This does not occur with 16.1. This leads to a "psycopg2.errors.IndeterminateCollation: could not determine which collation to use for upper() function" error. This was caught in our CI/CD env when I revved it to 16.2. Existing / old migrations failed. This is a reduced examples from our code, with exception paths redacted where necessary.
Model Definition:
class TestModel(models.Model): test = models.CharField(max_length=254, blank=True) test_gen = models.GeneratedField(expression=models.Q(test__iexact='yes'), output_field=models.BooleanField(), db_persist=True)
Migration content:
migrations.AddField( model_name='testmodel', name='test_gen', field=models.GeneratedField(db_persist=True, expression=models.Q(("test__iexact",'yes')), output_field=models.BooleanField()), )
Error:
Applying module.migration_name...Traceback (most recent call last): File "/opt/venv/lib/python3.11/site-packages/django/db/backends/utils.py", line 103, in _execute return self.cursor.execute(sql) ^^^^^^^^^^^^^^^^^^^^^^^^ psycopg2.errors.IndeterminateCollation: could not determine which collation to use for upper() function HINT: Use the COLLATE clause to set the collation explicitly. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/builds/[redacted]/manage.py", line 15, in <module> execute_from_command_line(sys.argv) File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line utility.execute() File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 413, in run_from_argv self.execute(*args, **cmd_options) File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 459, in execute output = self.handle(*args, **options) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/base.py", line 107, in wrapper res = handle_func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/core/management/commands/migrate.py", line 356, in handle post_migrate_state = executor.migrate( ^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 135, in migrate state = self._migrate_all_forwards( ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 167, in _migrate_all_forwards state = self.apply_migration( ^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/executor.py", line 252, in apply_migration state = migration.apply(state, schema_editor) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/migration.py", line 132, in apply operation.database_forwards( File "/opt/[redacted]/lib/python3.11/site-packages/django/db/migrations/operations/fields.py", line 108, in database_forwards schema_editor.add_field( File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 750, in add_field self.execute(sql, params) File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/postgresql/schema.py", line 48, in execute return super().execute(sql, None) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/base/schema.py", line 201, in execute cursor.execute(sql, params) File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 122, in execute return super().execute(sql, params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 79, in execute return self._execute_with_wrappers( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 92, in _execute_with_wrappers return executor(sql, params, many, context) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 100, in _execute with self.db.wrap_database_errors: File "/opt/[redacted]/lib/python3.11/site-packages/django/db/utils.py", line 91, in __exit__ raise dj_exc_value.with_traceback(traceback) from exc_value File "/opt/[redacted]/lib/python3.11/site-packages/django/db/backends/utils.py", line 103, in _execute return self.cursor.execute(sql) ^^^^^^^^^^^^^^^^^^^^^^^^ django.db.utils.ProgrammingError: could not determine which collation to use for upper() function HINT: Use the COLLATE clause to set the collation explicitly.
Change History (14)
comment:2 by , 10 months ago
Description: | modified (diff) |
---|
comment:3 by , 10 months ago
Component: | contrib.postgres → Database layer (models, ORM) |
---|---|
Owner: | set to |
Resolution: | → needsinfo |
Status: | new → closed |
Summary: | Postgres 16.2 with _iexact leads to psycopg2.errors.IndeterminateCollation → Postgres 16.2 with _iexact leads to IndeterminateCollation |
Thanks for the report, this looks like a bug in PostgreSQL. I couldn't find anything related with this change in PostgreSQL 16.2 release notes. I've checked a few ways to define that kind of expression in the GeneratedField
and it seems to be related to calling UPPER()
on strings:
models.GeneratedField(expression=models.Q(test__iexact='yes'), output_field=models.BooleanField(), db_persist=True)
crashes 💥- SQL:
"test_gen" boolean GENERATED ALWAYS AS (UPPER("test"::text) = UPPER('yes')) STORED
- SQL:
models.GeneratedField(expression=lookups.Exact(Upper("test"), "YES"), output_field=models.BooleanField(), db_persist=True)
works ✅- SQL:
"test_gen" boolean GENERATED ALWAYS AS (UPPER("test") = ('YES')) STORED
- SQL:
Removing ::text
doesn't change anything, it also crashes 💥:
"test_gen" boolean GENERATED ALWAYS AS (UPPER("test") = UPPER('yes')) STORED
Please reopen the ticket if you provide details that this is an intentional behavior change in PostgreSQL.
comment:4 by , 8 months ago
Cc: | added |
---|---|
Severity: | Normal → Release blocker |
Triage Stage: | Unreviewed → Accepted |
Re-opening as a release blocker for 5.0 at it's a bug in a new feature (GeneratedField
) after some sleuthing on #35368.
The culprit is this change Postgres >= 12.18, 13.14, 14.11, 15.6, 16.2 released on 2024-02-08.
Fix function volatility checking for GENERATED and DEFAULT expressions (Tom Lane)
These places could fail to detect insertion of a volatile function default-argument expression, or decide that a polymorphic function is volatile although it is actually immutable on the datatype of interest. This could lead to improperly rejecting or accepting a GENERATED clause, or to mistakenly applying the constant-default-value optimization in ALTER TABLE ADD COLUMN.
In essence the problem seems similar to #34955 as UPPER
can return different value depending on the collation and thus is not immutable per-se?
I've tried to come up with a workaround but I'm not sure what should be done. The following doesn't work either
UPPER("text" COLLATE "C") COLLATE "C" = UPPER('test' COLLATE "C") COLLATE "C"
so it's possible there might be a bug on the Postgres side as well? In all cases keeping this ticket open should bring visibility to the issue.
A work around in the mean time is likely to use explicit collations
comment:5 by , 8 months ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
comment:8 by , 8 months ago
One option I have found (could be a bad idea) is to revert some of #3575. This was an optimisation where ILIKE
was removed in preference of using UPPER(field) LIKE UPPER('blah')
.
If we use ILIKE
I no longer get an error here. I guess the question is, the change of #3575 was implemented many years ago and the performance of Postgres in this case may have moved on.
Looks a bit like:
-
django/db/backends/postgresql/base.py
diff --git a/django/db/backends/postgresql/base.py b/django/db/backends/postgresql/base.py index e97ab6aa89..4e3f7b3658 100644
a b class DatabaseWrapper(BaseDatabaseWrapper): 154 154 "exact": "= %s", 155 155 "iexact": "= UPPER(%s)", 156 156 "contains": "LIKE %s", 157 "icontains": " LIKE UPPER(%s)",157 "icontains": "ILIKE %s", 158 158 "regex": "~ %s", 159 159 "iregex": "~* %s", 160 160 "gt": "> %s", … … class DatabaseWrapper(BaseDatabaseWrapper): 163 163 "lte": "<= %s", 164 164 "startswith": "LIKE %s", 165 165 "endswith": "LIKE %s", 166 "istartswith": " LIKE UPPER(%s)",167 "iendswith": " LIKE UPPER(%s)",166 "istartswith": "ILIKE %s", 167 "iendswith": "ILIKE %s", 168 168 } 169 169 170 170 # The patterns below are used to generate SQL pattern lookup clauses when -
django/db/backends/postgresql/operations.py
diff --git a/django/db/backends/postgresql/operations.py b/django/db/backends/postgresql/operations.py index 4b179ca83f..af2463b1d6 100644
a b class DatabaseOperations(BaseDatabaseOperations): 172 172 else: 173 173 lookup = "%s::text" 174 174 175 # Use UPPER(x) for case-insensitive lookups; it's faster.176 if lookup_type in ("iexact", "icontains", "istartswith", "iendswith"):177 lookup = "UPPER(%s)" % lookup178 179 175 return lookup 180 176 181 177 def no_limit_value(self): -
tests/schema/tests.py
diff --git a/tests/schema/tests.py b/tests/schema/tests.py index 3a2947cf43..182e3486e0 100644
a b class SchemaTests(TransactionTestCase): 913 913 editor.create_model(GeneratedFieldContainsModel) 914 914 915 915 field = GeneratedField( 916 expression=Q(text__ contains="foo"),916 expression=Q(text__icontains="FOO"), 917 917 db_persist=True, 918 918 output_field=BooleanField(), 919 919 )
comment:10 by , 8 months ago
It is effectively a solution but I'm not convinced this will do more good than harm.
#3575 has been merged 16 years ago this means in between now and then thousands of projects were created and added a functional index on UPPER("col")
to make i(exact|contains|startwith)
use an index and the moment they upgrade to a minor version of 5.0 their database will start running slow queries as their indices will be unsuables.
On the other hand we have a bug in a newly introduced feature for a very particular use case that might be affecting only a few users (must use generated field, must be on a latest version of Postgres, must use i(exact|contains|startwith)
.
I appreciate the intent to solve this issue but I think we need to dig deeper to truly understand why this is happening before jumping to conclusions here as there are no true urgency to get things right here; the release blocker assignment is self-imposed and nothing prevents us from deferring a solution to this problem to a future 5.0 release if we can't understand why this is happening before the May release as for all we know if might be a bug in Postgres itself.
I tried reaching out on libera.chat#postgres IRC to get an answer but no one could answer me their (first time this happens) so I was planing to reach out to their mailing list this week but I might run out of time so if someone feels comfortable doing so please do.
To summarize I think we should understand why this is happening before taking any potential harmful action here. For all we know many other functions and lookups could be affected and this is just the tip of the iceberg.
comment:11 by , 8 months ago
Cc: | added |
---|
David, I think you have successfully engaged with the PostgreSQL team/devs in the past, leading to productive conversations. Would you have some availability to reach out to them again to seek their assistance in debugging this specific issue we're encountering with PostgreSQL >= 12.18, 13.14, 14.11, 15.6, 16.2?
comment:12 by , 8 months ago
Has patch: | unset |
---|
comment:13 by , 8 months ago
Resolution: | → needsinfo |
---|---|
Severity: | Release blocker → Normal |
Status: | new → closed |
Triage Stage: | Accepted → Unreviewed |
It is currently not clear whether the bug is on Postgres side or not. Until this is clear, we can keep the ticket in the "needsinfo" state as it best represents it's current status.
Reopening the ticket did bring it to the attention of others and has helped us look into this a little more, but even in a closed state, the ticket is still "visible"/searchable.
We can reopen once we better understand the root of the problem and it's clear that Django needs to implement changes.
comment:14 by , 3 months ago
We are also running into this issue after upgrading from Postgres 14.10 to 14.11, iexact
and icontains
in generated fields were replaced with similar iregex
queries as a workaround.
This was tested with django 5.0.2 and latest psycopg2 2.9.9. Changing to
_icontains
solved the issue, but we have quite a few with_iexact
....