Opened 16 months ago

Last modified 9 days ago

#34898 assigned Bug

Adding non-deterministic collations to unique CharFields crashes on PostgreSQL.

Reported by: Mariusz Felisiak Owned by: Tom Carrick
Component: Migrations Version: 4.2
Severity: Normal Keywords: PostgreSQL collation
Cc: Jacob Walls, Peter Thomassen, Ülgen Sarıkavak, Yiwei Gao, Sudarshan R Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Adding non-deterministic collations to unique CharFields crashes on PostgreSQL. A regression test:

  • tests/schema/tests.py

    diff --git a/tests/schema/tests.py b/tests/schema/tests.py
    index 68b6442794..ba8d9e99ec 100644
    a b class SchemaTests(TransactionTestCase):  
    14021402        )
    14031403        self.assertIn("field", self.get_uniques(CiCharModel._meta.db_table))
    14041404
     1405    @unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
     1406    @skipUnlessDBFeature(
     1407        "supports_collation_on_charfield",
     1408        "supports_non_deterministic_collations",
     1409    )
     1410    def test_add_collation_to_unique_charfield(self):
     1411        ci_collation = "case_insensitive"
     1412
     1413        def drop_collation():
     1414            with connection.cursor() as cursor:
     1415                cursor.execute(f"DROP COLLATION IF EXISTS {ci_collation}")
     1416
     1417        with connection.cursor() as cursor:
     1418            cursor.execute(
     1419                f"CREATE COLLATION IF NOT EXISTS {ci_collation} (provider = icu, "
     1420                f"locale = 'und-u-ks-level2', deterministic = false)"
     1421            )
     1422        self.addCleanup(drop_collation)
     1423
     1424        # Create the table.
     1425        with connection.schema_editor() as editor:
     1426            editor.create_model(AuthorWithUniqueName)
     1427        # Add db_collation.
     1428        old_field = AuthorWithUniqueName._meta.get_field("name")
     1429        new_field = CharField(max_length=255, unique=True, db_collation=ci_collation)
     1430        new_field.model = AuthorWithUniqueName
     1431        new_field.set_attributes_from_name("name")
     1432        with connection.schema_editor() as editor:
     1433            editor.alter_field(
     1434                AuthorWithUniqueName, old_field, new_field, strict=True
     1435            )
     1436
     1437        self.assertEqual(
     1438            self.get_column_collation(AuthorWithUniqueName._meta.db_table, "field"),
     1439            ci_collation,
     1440        )
     1441        self.assertIn("field", self.get_uniques(AuthorWithUniqueName._meta.db_table))
     1442
    14051443    @isolate_apps("schema")
    14061444    @unittest.skipUnless(connection.vendor == "postgresql", "PostgreSQL specific")
    14071445    @skipUnlessDBFeature(

Reported by Adam.

Change History (13)

comment:1 by Tom Carrick, 15 months ago

Interesting. I looked into it and it is indeed only this case that breaks.

If you add them both at once, it's fine. If you add the collation, then the unique, it's fine. But if it's unique, then you try to add the collation, boom.

As you might expect, this also happens for db_index.

However, it doesn't happen when using Meta indexes / constraints unless you have opclasses=["varchar_pattern_ops"], and if you remove this in the same migration as adding the collation, this also seems to work fine (I'm not sure if by luck of because Django first removes the constrain, then does other stuff, then adds the constraint purposefully).

Given that db_index is discouraged in the docs... and I'd make the same argument for doing the same for unique=True, maybe a note in the docs about this is enough?

I didn't look very hard to find a solution, but I think it should be possible by adding stuff to django.db.backends.postgresql.schema.DatabaseSchemaEditor._alter_field. I'll take a look.

comment:2 by Tom Carrick, 15 months ago

Owner: changed from nobody to Tom Carrick
Status: newassigned

comment:3 by Tom Carrick, 15 months ago

I looked into it a bit.

I think handling this in the schema module is a really bad idea, as we would be dropping and creating an index or constraint without the user's knowledge. I don't see another way to handle it.

The next option is to detect this when running makemigrations. I know very little about how makemigrations works so I may be totally wrong here but it seems like at the moment it doesn't really know anything about the database state. If we were to fix this bug there, we'd have to add a query to determine if the collation is deterministic. It would also need to have some knowledge of Postgres that seems not to be known anywhere else in this part of the code.

So I again come back to this: I think we should document this and not try to fix it in code, and perhaps also further discourage use of unique=True and db_index=True.

in reply to:  3 comment:4 by Mariusz Felisiak, 14 months ago

Replying to Tom Carrick:

I think handling this in the schema module is a really bad idea, as we would be dropping and creating an index or constraint without the user's knowledge. I don't see another way to handle it.

I don't think it's that bad. We've already don't create varchar_pattern_ops/text_pattern_ops indexes when non-deterministic db_collation is set, so removing them before adding a collation would be somehow consistent with the current behavior.

comment:5 by Tom Carrick, 14 months ago

I agree that it's consistent. What I don't think is a good idea though, is dropping and recreating the entire index. This could take a fairly long time on some databases, and it can be quite surprising that a change to the collation would cause another operation that takes some time.

I can have a look though. Maybe we can at least do it concurrently.

in reply to:  5 comment:6 by Mariusz Felisiak, 14 months ago

Replying to Tom Carrick:

I agree that it's consistent. What I don't think is a good idea though, is dropping and recreating the entire index. This could take a fairly long time on some databases, and it can be quite surprising that a change to the collation would cause another operation that takes some time.

I can have a look though. Maybe we can at least do it concurrently.

Can we recreate them? For now, we completely skip varchar_pattern_ops/text_pattern_ops indexes when non-deterministic db_collation is set. Is it not enough to remove them before adding a collation?

comment:7 by Tom Carrick, 14 months ago

The opposite case works fine because there's no index yet. The user is adding an index, so it's expected that the index gets created, and of course we need it without varchar_pattern_ops.

In this case though, the user already has a varchar_pattern_ops index. Postgres offers no way to alter an existing index, it's drop and recreate only, so we need to drop the index, change the collation, then add the index again but without varchar_patterm_ops. My concern here is not that it's difficult, it's that we're performing potentially expensive operations without informing the user about them (as they don't show up in the migration file, or anywhere else). Ideally the user should know they can't use varchar_pattern_ops with a non-deterministic collation, but db_index and unique are quite opaque about what is actually happening, relative to the more explicit Meta.indexes and Meta.constraints, so it's hard for them to know (unless it's very explicitly documented somewhere) what the consequences are of changing the collation in this scenario.

comment:8 by Mariusz Felisiak, 14 months ago

Ah, right. TBH, the current NotSupportedError: nondeterministic collations are not supported for operator class "varchar_pattern_ops" error is not so bad. Maybe it's enough to add a warning in docs (as you suggested).

comment:9 by Jacob Walls, 9 months ago

Cc: Jacob Walls added

comment:10 by Peter Thomassen, 5 months ago

Cc: Peter Thomassen added

This is a regression caused by the commit which removes django.contrib.postgres.fields.CIText/CICharField/CIEmailField/CITextField (04eb1b4567c96ccb167c16a95ca12c336b0c791b).

Using the code immediately before that commit (i.e., using 6e4e5523a8f40b63f3e74889266a7d638f6364dc), the ALTER statement setting the collation is preceded by DROP INDEX IF EXISTS "desecapi_user_email_fa6ced42_like".

When applying the next commit (04eb1b4567c96ccb167c16a95ca12c336b0c791b), this DROP INDEX statement is gone, and the ALTER statement then causes nondeterministic collations are not supported for operator class "varchar_pattern_ops".

We used to have a unique CIEmailField. When that was deprecated, we created a migration that turns this into a standard EmailField, but a case-insensitive collation; the uniqueness properties were not changed. See: https://github.com/desec-io/desec-stack/blob/41ee90eecb4f54d141956b33066ae9ef69442120/api/desecapi/migrations/0031_alter_user_email.py#L24

As a result, it's not possible to initialize the database when running on Django 5.1.

So, I don't think this is merely a "don't do this, it's not supported" case; it is in fact a regression from 5.0 to 5.1.

How do you advise to proceed, both with this bug generally, and with our Django upgrade in particular? Thanks!

comment:11 by Ülgen Sarıkavak, 4 months ago

Cc: Ülgen Sarıkavak added

comment:12 by Yiwei Gao, 3 months ago

Cc: Yiwei Gao added

comment:13 by Sudarshan R, 9 days ago

Cc: Sudarshan R added
Note: See TracTickets for help on using tickets.
Back to Top