Opened 7 months ago

Closed 7 months ago

Last modified 7 months ago

#34955 closed Cleanup/optimization (fixed)

Make Concat() use the database operator `||` on PostgreSQL.

Reported by: Paolo Melchiorre Owned by: Simon Charette
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: field, database, generated, output_field
Cc: Simon Charette, David Wobrock Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The django.db.models.functions.Concat database ORM function in the PostgreSQL database backend use the CONCAT SQL function.

Unfortunately, the CONCAT function in PostgreSQL is not IMMUTABLE and cannot be used in functional indexes or generated fields.

For example, if you use Concat("first_name", Value(" "), "last_name") in a GeneratedField you get the following error:

psycopg.errors.InvalidObjectDefinition: generation expression is not immutable

I propose to make the string concatenation operator || available in the PostgreSQL database backend in addition to the already available string concatenation function CONCAT.

Change History (18)

comment:1 by Mariusz Felisiak, 7 months ago

I'm not convinced, there are other built-in functions that could be implemented differently and currently cannot be used in GeneratedFields, e.g. SHA512, SHA384, SHA224, SHA1, or MD5 on Oracle. I'm pessimistic about adding multiple new functions for such cases. This may open a can of worms. The current thread is to keep Django a core framework, not providing every utility which might be useful.

in reply to:  1 comment:2 by Paolo Melchiorre, 7 months ago

Type: New featureBug

Replying to Mariusz Felisiak:

I'm not convinced, there are other built-in functions that could be implemented differently and currently cannot be used in GeneratedFields

The string concatenation operator || is not an exotic function specific to PostgreSQL, but an operator already used in other database backends, so your concern about the possible increase in requests to add new database functions to databases does not apply.

I would add that I would have liked to open this issue since the introduction of functional indexes in Django 3.2, but I never did so due to lack of time.

I decided to open this issue now after seeing that the choice to use the non-immutable concatenation function CONCAT instead of the immutable concatenation operator || in the PostgreSQL backend also affects the case of GeneratedFields.

I'm pessimistic about adding multiple new functions for such cases. This may open a can of worms. The current thread is to keep Django a core framework, not providing every utility which might be useful.

Among other things, the string concatenation example is the most used to explain Generated Fields (e.g fly.io, djangochat.com, paulox.net, ...), but, due to this bug in the PostgreSQL backend database, it only works on a subset of Django backend databases, unlike the current thread of keeping Django a core framework, as you reiterated.

comment:3 by Mariusz Felisiak, 7 months ago

Type: BugNew feature

... databases does not apply.

I don't understand why TBH. Do you want to say that it doesn't apply because it's PostgreSQL and we should treat it specially? I would argue with that.

This is definitively not a bug. Concat() is implemented this way from the very beginning and database limitations for GeneratedField are documented.

comment:4 by Mariusz Felisiak, 7 months ago

Moreover, there is no easy way to implement Concat() on PostgreSQL such that it uses || , is immutable, and work the same way as now, because it uses COALESCE() which is also considered mutable by PostgreSQL. I understand that you want to use new goodies on PostgreSQL but it's a database limitation, not a Django fault.

in reply to:  4 comment:5 by Paolo Melchiorre, 7 months ago

Replying to Mariusz Felisiak:

... I understand that you want to use new goodies on PostgreSQL ...

This isn't about me wanting to play with the latest Django features with PostgreSQL, I solved the problem in my code some time ago by writing a function that uses || instead of CONCAT.

The problem, however, remains non-expert users to whom we reiterate that the generated fields are compatible with all supported databases, and then the first example we do, uses strings concatenations, that we already know will not work with at least one of these databases out-of-the-box.

Moreover, there is no easy way to implement Concat() on PostgreSQL such that it uses || , is immutable, and work the same way as now, because it uses COALESCE() which is also considered mutable by PostgreSQL ... it's a database limitation, not a Django fault.

On the contrary, in PostgreSQL, you can concatenate strings into a generated column without any problem.

See for example:

CREATE TABLE "samples_fullnames" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "first_name" varchar(150) NOT NULL,
    "last_name" varchar(150) NOT NULL,
    "full_name" text GENERATED ALWAYS AS (
        "first_name"::text || ' '::text || "last_name"::text
    ) STORED
);

But also: (mimicking the SQL code generated by the ORM for SQLite)

CREATE TABLE "samples_names" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "first_name" varchar(150) NOT NULL,
    "last_name" varchar(150) NOT NULL,
    "full_name" text GENERATED ALWAYS AS (
        COALESCE(("first_name")::text, '') || COALESCE(COALESCE((' ')::text, '') || COALESCE(("last_name")::text, ''), '')
    ) STORED
);

If there is a limitation I would say it lies in how the Django ORM generates the SQL code for PostgreSQL.

CREATE TABLE "samples_user" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "first_name" varchar(150) NOT NULL,
    "last_name" varchar(150) NOT NULL,
    "full_name" text GENERATED ALWAYS AS (
        CONCAT(("first_name")::text, (CONCAT((' ')::text, ("last_name")::text))::text)
    ) STORED
);

in reply to:  3 comment:6 by Paolo Melchiorre, 7 months ago

Replying to Mariusz Felisiak:

... databases does not apply.

I don't understand why TBH.

You quoted my sentence so much that I had to go and reread it :-D

Do you want to say that it doesn't apply because it's PostgreSQL and we should treat it specially? I would argue with that.

My intention was not to ask for special treatment for PostgreSQL, but on the contrary to ask that it not be the only one treated differently in this particular case.

I tried to explain that the || operator is not a function, but an operator, and that among other things it has already been used in other supported backend databases, so requesting that it ALSO be used in PostgreSQL doesn't seem right to me. a special request.


This is definitively not a bug. Concat() is implemented this way from the very beginning and database limitations for GeneratedField are documented.

I don't know the reasons for using CONCAT instead of ||, but I reiterate that this choice was already a problem for functional indexes, and now generated fields have been added. It therefore seems to me that that choice can at least be reevaluated.

Note: I realize that explaining myself in writing in English could be a limitation of mine, and in case what I write is misunderstood or arrogant, I reiterate here that this is not my intention. :-)

comment:7 by David Sanders, 7 months ago

Throwing some fuel on the fire with a few fyi points

  • Firstly just want to make it clear that pg marking concat() not immutable is a feature (by design) not a bug :D (it's "stable" but not "immutable" because settings can still affect the output)
  • Apparently || is ANSI (TIL)
  • Even though it is ANSI you still have to set mode PIPES_AS_CONCAT [1] on MySQL
  • Concat() doesn't actually use Coalesce() for pg because it doesn't need it, NULLs are ignored with CONCAT() on pg [2]
  • Unfortunately we can't change Concat() to use || because on pg it _does_ evaluate NULLs. If it were to use || then we _would_ need to make use of Coalesce() to make it consistent.


I'm kinda hoping though that we do get a (separate) immutable concat for pg because concatenating strings seems to be a common desire, at least for me :D But at the same time I recognise how easy it is to create one and agree with the whole Django-isnt-a-kitchen-sink-framework.

Lastly can I just say that (somewhat jokingly but semi-serious)... we may be missing the perfect opportunity to make use of the __floordiv__() operator method [3] ;D

GeneratedField(expression=F('first_name') // F('last_name'))

[1] https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html#sqlmode_pipes_as_concat
[2] https://www.postgresql.org/docs/current/functions-string.html
[3] https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types

Edit: I mean NOT immutable.

Last edited 7 months ago by David Sanders (previous) (diff)

comment:8 by Mariusz Felisiak, 7 months ago

If casting to text data-type fixes the issue then I'd consider documenting this workaround instead of adding a PostgreSQL-specific function (where text casting would still be required). For example, adding the following sentence:

"In order to make an IMMUTABLE expression MUTABLE on PostgreSQL, you can wrap the expression with Cast(), e.g. Cast(Concat(F("first_name"), Value(" "), F("last_name")), TextField())."

in the warning about IMMUTABLE functions on PostgreSQL.

in reply to:  8 comment:9 by Paolo Melchiorre, 7 months ago

Replying to Mariusz Felisiak:

If casting to text data-type fixes the issue then I'd consider documenting this workaround instead of adding a PostgreSQL-specific function (where text casting would still be required). For example, adding the following sentence:

"In order to make an IMMUTABLE expression MUTABLE on PostgreSQL, you can wrap the expression with Cast(), e.g. Cast(Concat(F("first_name"), Value(" "), F("last_name")), TextField())."

in the warning about IMMUTABLE functions on PostgreSQL.

Unfortunately the SQL code generate by the Django ORM is still broken in PostgreSQL.

I wrote the class as susggested by Mariusz:

class Person(models.Model):
    first_name = models.CharField()
    last_name = models.CharField()
    full_name = models.GeneratedField(
        expression=Cast(
            Concat("first_name", V(" "), "last_name"),
            models.TextField(),
        ),
        db_persist=True,
        output_field=models.TextField(),
    )

But the migrations generate this SQL code:

CREATE TABLE "samples_person" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "first_name" varchar NOT NULL,
    "last_name" varchar NOT NULL,
    "full_name" text GENERATED ALWAYS AS (
        (
            CONCAT(
                ("first_name")::text,
                (CONCAT((' ')::text, ("last_name")::text))::text
            )
        )::text
    ) STORED
);

that generated this error message psycopg.errors.InvalidObjectDefinition: generation expression is not immutable

comment:10 by David Sanders, 7 months ago

I need to clarify an incorrect assumption I was making in my earlier comment.

COALESCE() is acceptable for use in generated fields & function indexes. This is because it's not a "real" function, it's an expression.

This means that Paolo's proposal to make Concat() on Postgres more like it does on SQLite an acceptable change keeping behaviour consistent.

comment:11 by Simon Charette, 7 months ago

Cc: Simon Charette added

Unfortunately we can't change Concat() to use || because on pg it _does_ evaluate NULLs. If it were to use || then we _would_ need to make use of Coalesce() to make it consistent.

That's the crux of the issue and why we had to go back and forth on how it was implemented on Postgres when adding support for functional index and constraints. There are few tickets related to this such one such as #29582 and #30385.

the migrations generate this SQL code (code snipped with many unnecessary ::text)

This is a side effect of 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca (#33308) to accommodate support for psycopg>3 server-side bindings. See 779cd28acb1f7eb06f629c0ea4ded99b5ebb670a (#34840) which removed many of them but some still remain.

Based on #30385 I think that the best way forward here is to stop using CONCAT entirely and use a strategy where it relies on || with Coalesce and Cast appropriately when dealing with expressions that are nullable and/or non-text. The challenge here is that we can't trust .null as the our output field resolving strategy doesn't carry null affinity. If we want to make sure Concat maintains is previous behaviour on Postgres we must wrap every source expression in Coalesce.

Normally when we change the SQL generated by an expression it leaves all index generated with the previous implementation unusable and forces users to re-create them but in this case it wasn't possible to even create such index as Concat is not IMMUTABLE so I don't think that's an issue.

in reply to:  11 comment:12 by Paolo Melchiorre, 7 months ago

Replying to Simon Charette:

Unfortunately we can't change Concat() to use || because on pg it _does_ evaluate NULLs. If it were to use || then we _would_ need to make use of Coalesce() to make it consistent.

That's the crux of the issue and why we had to go back and forth on how it was implemented on Postgres when adding support for functional index and constraints. There are few tickets related to this such one such as #29582 and #30385.

Thanks, for pointing us to these old issues.

the migrations generate this SQL code (code snipped with many unnecessary ::text)

This is a side effect of 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca (#33308) to accommodate support for psycopg>3 server-side bindings. See 779cd28acb1f7eb06f629c0ea4ded99b5ebb670a (#34840) which removed many of them but some still remain.

Are you suggesting opening an issue to remove all the remaining unnecessary CAST?

Based on #30385 I think that the best way forward here is to stop using CONCAT entirely and use a strategy where it relies on || with Coalesce and Cast appropriately when dealing with expressions that are nullable and/or non-text. The challenge here is that we can't trust .null as the our output field resolving strategy doesn't carry null affinity. If we want to make sure Concat maintains is previous behaviour on Postgres we must wrap every source expression in Coalesce.

I understand that GeneratedField is only the last one affected by the fact that CONCAT is not IMMUTABLE

I agree with the plan to replace CONCAT everywhere with || given how many problems it would solve.

Normally when we change the SQL generated by an expression it leaves all index generated with the previous implementation unusable and forces users to re-create them but in this case it wasn't possible to even create such index as Concat is not IMMUTABLE so I don't think that's an issue.

Better this way I would say, there will be no indexes that the user needs to re-create.

comment:13 by Mariusz Felisiak, 7 months ago

Summary: Make available the string concatenation operator `||` for PostgreSQLMake Concat use the database operator `||` on PostgreSQL.
Triage Stage: UnreviewedAccepted
Type: New featureCleanup/optimization

comment:14 by Mariusz Felisiak, 7 months ago

Summary: Make Concat use the database operator `||` on PostgreSQL.Make Concat() use the database operator `||` on PostgreSQL.

comment:15 by Simon Charette, 7 months ago

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:16 by Simon Charette, 7 months ago

Has patch: set

comment:17 by David Wobrock, 7 months ago

Cc: David Wobrock added

comment:18 by Mariusz Felisiak <felisiak.mariusz@…>, 7 months ago

Resolution: fixed
Status: assignedclosed

In 6364b6ee:

Fixed #34955 -- Made Concat() use
operator on PostgreSQL.

This also avoids casting string based expressions in Concat() on
PostgreSQL.

Last edited 7 months ago by Mariusz Felisiak (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top