Opened 3 months ago

Closed 6 weeks ago

#35778 closed Cleanup/optimization (fixed)

Use native JSONObject on Postgres 16+ with server side bindings

Reported by: john-parton Owned by: john-parton
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Simon Charette, Sarah Boyce, Mariusz Felisiak, Sage Abdullah Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by john-parton)

JSONObject on Postgres 16 with server side bindings recently resulted in a crash. The most recent fix is to fallback to the use of jsonb_build_object on postgres 16 when using server side bindings.

See https://code.djangoproject.com/ticket/35734
And https://github.com/django/django/pull/18549

It is possible to use the native JSONObject with server side bindings, but it requires a little bit of use of cast.

See https://github.com/django/django/commit/0f53d48115ba0295cefea33512dc146caad39443

There are two minor issues:

  1. Should Postgres 16 *without* server-side bindings use "cast" even though it's not strictly necessary? It it desirable or preferable to keep the generated SQL the same when toggling the server-side binding feature? I mentioned digging through logs as one example where it might matter.
  2. Use of both cast and native json will require at least a minor change to escaping. This is because we use the double-colon operator to cast and the native json syntax uses a single colon to separate key-value pairs. This creates a parsing ambiguity which results in a syntax error (on at least one version of postgres). For solutions, they're all pretty similar

Options for minor issue 2:

  1. Update the as_native function to wrap the keys in parenthesis, effectively resolving the ambiguity. (This does raise yet another question, a question within a question: should we go ahead and wrap the keys in parenthesis on ALL backends? I think Oracle doesn't necessary require that for example.)
  2. Update the Cast function to always wrap values in parenthesis in all contexts. This seems like overkill.
  3. Change postgres from using the double-colon operator to the CAST(x AS type) syntax. This also seems like overkill, and results in sql being generated that is less postgres-y, if that makes sense.

Change History (20)

comment:1 by john-parton, 3 months ago

Description: modified (diff)

comment:2 by john-parton, 3 months ago

My general opinion/gut feeling is:

  1. Always cast, even when server-side bindings is off. It's one fewer conditional which means the function is lower complexity.
  2. Wrap keys in parenthesis for all backends that use the as_native method. Again, fewer conditionals, which means lower complexity.

But I'm happy to be swayed.

comment:3 by john-parton, 3 months ago

In order to help facilitate discussion, I went ahead and opened a pull request with my preferred changes: https://github.com/django/django/pull/18622

When there's consensus on the the two issues:

  1. Whether to cast keys on postgres 16+ without server side bindings; and
  2. The best way to ensure that keys are escaped to avoid the colon/double-colon parsing error

I'll update the PR and we can proceed.

comment:4 by john-parton, 3 months ago

Has patch: set

comment:5 by john-parton, 3 months ago

Owner: set to john-parton
Status: newassigned

comment:6 by Natalia Bidart, 3 months ago

Cc: Simon Charette Sarah Boyce Mariusz Felisiak added
Triage Stage: UnreviewedAccepted

Accepting following the referenced PR conversation. I'm unclear about the best solution given the presented alternatives. Sarah, Simon, Mariusz: would you have a recommendation?

comment:7 by Simon Charette, 3 months ago

I'm of opinion that we should try to generate the same SQL with and without server-side bindings enabled. That should put us in a better place for eventual support for prepared statements (#20516).

As for the parenthesis wrapping I think that another option that wasn't discussed is to use a Func(expression, template="CAST(%(expressions)s AS text)") instead of Cast in this particular case to avoid the parsing ambiguity in as_postgres while leaving Cast.as_postgres alone.

comment:8 by Sage Abdullah, 3 months ago

Cc: Sage Abdullah added

comment:9 by john-parton, 3 months ago

Good info with prepared statements. That meshes with my intuition.

Regarding Func(expression, template="CAST(%(expressions)s AS text)"), that has a bit of code smell to me, abstraction inversion.

I sort of question the wisdom of using the :: operation on postgres at all. The comment https://github.com/django/django/blob/6765b6adf924c1bc8792a4a454d5a788c1abc98e/django/db/models/functions/comparison.py#L52-L61 suggests it makes the SQL "more readable", which is true in the context that you're more familiar with postgres than any RDMS. I'd argue that the Cast(...) function is possibly "more readable" than the double colon in the case that the user is more familiar with other databases or just SQL in general, and *relevant to this issue* it is more readable in the case that there are colons used elsewhere in the expression, i.e. JSON_OBJECT. (It's so unreadable with the double colons that even the parser can't "read" it, haha)

I think an argument can be made to just delete those 10 lines and have the output SQL closer to standards. It seems like part of the underlying philosophy of moving postgres from JSONB_BUILD_OBJECT to JSON_OBJECT is to settle on standards, so in that sense, reworking Cast() is in line with the overall spirit of the project.

Additional Note: If you can delete ten lines of code from the codebase and all the tests still pass, that indicates to me the possibility of some kind of defect or design problem, although not necessarily an error.

Last edited 3 months ago by john-parton (previous) (diff)

comment:10 by john-parton, 3 months ago

Pros in favor or removing :: in favor of CAST

  • Less code
  • Generated SQL on Postgres will match other RDMS output in many cases
  • It's more explicit
  • "Standardization"

Cons

  • It's not _strictly_ required to solve this specific issue
  • Users more familiar with Postgres likely would find the double-colon notation more readable in most circumstances
  • Perhaps there's a goal to make the generated SQL look as if it were written by an experienced programmer writing in that specific dialect. "Idiomatic" postgres?

comment:11 by Simon Charette, 3 months ago

Another con of changing :: at this point is that will change the SQL generated in any usage Cast in expression indices which might cause them not to be usable anymore when users upgrade their version of Django.

For example, say that you have a model of the form

class People(models.Model):
    phone_number = models.IntegerField()

    class Meta:
        indexes = [
            Index(name="phone_number_str", Cast("phone_number", models.TextField()))
        ]

Then the resulting index will be on (phone_number)::text and queries of the form filter(phone_number__startswith="123") will generate WHERE (phone_number)::text LIKE '123%' will be able to use the index.

If we change Cast to use CAST instead of :: I'm not entirely confident that the Postgres planer will be smart enough to use the pre-existing (phone_number)::text index for queries of the form CAST(phone_number AS text) LIKE '123%'.

From past experience the expression has to exactly match for the planner to consider the index as potential candidate so unless (expr)::type and CAST(expr AS type) result in the same AST I doubt we can safely change it without forcing users to rebuild all of their indices making use of Cast which seems way beyond the scope of this ticket.

Last edited 3 months ago by Simon Charette (previous) (diff)

comment:12 by john-parton, 3 months ago

Darn. That's a pretty massive con, I would agree. Let's say that's a non-option then. Thank you for your insight.

So with respect to my open PR, it sounds like we should use the double-colon as well, in the (probably unlikely case) that the field in question has an index on it. Based on what you said, if we use "CAST(%(expression)s)" it might result in an index a user defined with the expectation it be used, going unused due to the text not being exactly the same in the query. Make sense?

Quick edit: As I typed this out, I realized I'm not sure when Postgres would even _use_ an index in the context of JSONObject. That seems extremely unlikely actually.

Last edited 3 months ago by john-parton (previous) (diff)

comment:13 by john-parton, 3 months ago

In the (probably unlikely) case that a user has created an index using JSONObject, isn't it also possible that migrating from JSONB_BUILD_OBJECT to JSON_OBJECT might break those indexes in much the same way as your Cast example?

class People(models.Model):
    phone_number = models.IntegerField()

    class Meta:
        indexes = [
            Index(name="phone_number_obj", JSONObject(phone_number="phone_number"))
        ]

If the migration was run a while ago, the index would use the JSONB_BUILD_OBJECT function, but now queries will use the JSON_OBJECT function.

Is it worth noting SOMEWHERE in a release note or something? It seems really unlikely to me that someone would index their data this way.

comment:14 by Simon Charette, 3 months ago

Changing to JSONB_BUILD_OBJECT would also be affected but as you've pointed out it's unlikely and was less unlikely than usage of Cast. I think it might be worth a release notes yes, IIRC we did it when we changed how Concat behaved.

comment:15 by john-parton, 3 months ago

I was going to take a crack at adding a release note, modeled after the release note you referenced for Concat, but I'm not sure I've found the correct release note. Which one did you have in mind?

comment:16 by john-parton, 2 months ago

I believe this is pretty much ready to go. If we need a release note, I would appreciate a little help.

comment:17 by Natalia Bidart, 2 months ago

Thank you John, this is indeed in the review queue and we'll get to this as soon as we can.

comment:18 by Sarah Boyce, 6 weeks ago

Type: New featureCleanup/optimization

comment:19 by Sarah Boyce, 6 weeks ago

Triage Stage: AcceptedReady for checkin

comment:20 by Sarah Boyce <42296566+sarahboyce@…>, 6 weeks ago

Resolution: fixed
Status: assignedclosed

In 78c9a270:

Fixed #35778 -- Used JSON_OBJECT database function on PostgreSQL 16+ with server-side bindings.

Note: See TracTickets for help on using tickets.
Back to Top