Opened 4 years ago

Closed 4 years ago

Last modified 17 months ago

#31956 closed Bug (fixed)

QuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL.

Reported by: Marc DEBUREAUX Owned by: Mariusz Felisiak
Component: Database layer (models, ORM) Version: 3.1
Severity: Release blocker Keywords:
Cc: sage, Simon Charette, Thiago Bellini Ribeiro Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Marc DEBUREAUX)

When using ORDER BY clause after an aggregation on a JSONField as described below:

MyModel.objects.values('jsonfield__subfield').annotate(count=Count('id')).order_by('jsonfield__subfield')

I got the following error:

column "myapp_mymodel.jsonfield" must appear in the GROUP BY clause or be used in an aggregate function

The SQL query seems OK at first glance:

SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC

But it fails on PostgreSQL 12+ because ORDER BY clause doesn't include ::text casting. Instead the query must be:

SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY (("mymodel"."jsonfield" -> 'subfield'))::text ASC

Or without casting at all (prone to error?):

SELECT ("mymodel"."jsonfield" -> 'subfield'), COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY ("mymodel"."jsonfield" -> 'subfield') ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC

Change History (18)

comment:1 by Marc DEBUREAUX, 4 years ago

Description: modified (diff)

comment:2 by Mariusz Felisiak, 4 years ago

Cc: sage Simon Charette added
Owner: changed from nobody to Mariusz Felisiak
Severity: NormalRelease blocker
Status: newassigned
Summary: JSONField doesn't support ORDER BY clauses after aggregationQuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL.
Triage Stage: UnreviewedAccepted

Thanks for this ticket. I was able to reproduce the crash on PostgreSQL with a JSONField with a custom decoder. It's caused by the different format used in this case. Can you confirm that the following patch fix this issue for you?

diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
index a9768919a2..af0419356a 100644
--- a/django/db/models/expressions.py
+++ b/django/db/models/expressions.py
@@ -1211,6 +1211,7 @@ class OrderBy(BaseExpression):
                 template = '%%(expression)s IS NOT NULL, %s' % template
         connection.ops.check_expression_support(self)
         expression_sql, params = compiler.compile(self.expression)
+        expression_sql, params = self.expression.select_format(compiler, expression_sql, params)
         placeholders = {
             'expression': expression_sql,
             'ordering': 'DESC' if self.descending else 'ASC',

comment:3 by Mariusz Felisiak, 4 years ago

We cannot use proposed solution because it breaks ordering for non-text values.

comment:4 by Mariusz Felisiak, 4 years ago

Has patch: set

comment:5 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 2210539:

Refs #31956 -- Added test for ordering by JSONField with a custom decoder.

comment:6 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 0be51d2:

Fixed #31956 -- Fixed crash of ordering by JSONField with a custom decoder on PostgreSQL.

Thanks Marc Debureaux for the report.
Thanks Simon Charette, Nick Pope, and Adam Johnson for reviews.

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 655e1ce6:

[3.1.x] Fixed #31956 -- Fixed crash of ordering by JSONField with a custom decoder on PostgreSQL.

Thanks Marc Debureaux for the report.
Thanks Simon Charette, Nick Pope, and Adam Johnson for reviews.

Backport of 0be51d2226fce030ac9ca840535a524f41e9832c from master

comment:8 by Agris Ameriks, 4 years ago

Hello,
I would ask to reopen this ticket.

The fix breaks default functionality of JSONField.

Let me explain the issue.

I have such an model:

class Instance(models.Model):
    title = models.CharField('Title', max_length=255, blank=True)
    responses = models.JSONField(default=dict)

Code:

        query = "SELECT id, responses FROM advancedform_instance"
        with connection.cursor() as cursor:
            cursor.execute(query)
            data = cursor.fetchall()

In django 3.1 responses is returned as a dict.
In django 3.1.1 it is returned as a string.

If I remove the code that was made to fix the issue, it works correctly.

comment:9 by Agris Ameriks, 4 years ago

Resolution: fixed
Status: closednew

comment:10 by Mariusz Felisiak, 4 years ago

Resolution: fixed
Status: newclosed

Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.

in reply to:  10 comment:11 by Agris Ameriks, 4 years ago

Sorry, I will add comment to already created bug for the regression (#31973).
Yes, I simplified the query that I have. It is much complex and it's much harder to implement it in ORM.
I think that JSONField should return the dict either way and it was working fine in 3.1.
Also the bug is related to "with a custom decoder", but in current scenario I'm not specifying different decoder.

Yes, I can use json.loads() and I was using it till Django started supporting JSONField with built-in decoder (not sure which version, but long time ago). I don't think that it is ok to return to "unsupporting" default behavior.

Replying to felixxm:

Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.

comment:12 by Mariusz Felisiak, 4 years ago

Agris, see #31991.

comment:13 by Thiago Bellini Ribeiro, 4 years ago

Cc: Thiago Bellini Ribeiro added

comment:14 by GitHub <noreply@…>, 4 years ago

In 438b85df:

Refs #31956 -- Doc'd consequences of disabling psycopg2's JSONB typecaster.

Follow up to 0be51d2226fce030ac9ca840535a524f41e9832c.

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 6b16623b:

[3.1.x] Refs #31956 -- Doc'd consequences of disabling psycopg2's JSONB typecaster.

Follow up to 0be51d2226fce030ac9ca840535a524f41e9832c.
Backport of 438b85dfab4f16a2e709e2bcdbfefecd7bfee89e from master

in reply to:  10 comment:16 by Andrew, 4 years ago

How did this *manage* to break basic querying?

Replying to Mariusz Felisiak:

Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads() on fetched data.

This is just silly. Yes, there are *obviously* reasons to use raw sql over an ORM in some cases, this is not new. Putting a tiny note in a release that no one will see isn't enough. This needs to be documented front and center in the raw sql main docs and the postgres areas, because I did read those and no hint.

"just use x" is a bad solution to something that broke basic functionality, and isn't necessarily even easy to do. It turns what is a simple automapping into manual mapping, at best. I guess we could implement something that parses python typehints on result classes detects if the cursor has been effed with and is returning the wrong data type, though. How many more data types do you expect to be broken on purpose in the future?

Also, the workaround is just to add ::json to any jsonb columns in raw queries. So this is only broken for HALF types?

It feels like an easy fix for one problem which ignores some other underlying issue, breaking use cases that a few people don't personally use. If I want a query to work in the future *as written* am I going to have to start creating raw psycopg2 cursors? Because those still work!

Just to add, this is broken on a brand new django project with a single table. Not sure how custom json decoders factor into this.

Last edited 4 years ago by Andrew (previous) (diff)

comment:17 by nitishxp, 3 years ago

I agree with @Andrew my env also broked with no clue what happend.

comment:18 by Michael McCartney, 17 months ago

For anyone that falls into the "why isn't it a pythonic object already?" pitfall - We've been able to achieve our existing patterns, post this change, using:

  1. Take complete control over the psycopg2.extras.register_default_jsonb function, providing our own globally registered load call. Note: You can supply an alternate JSON loads function to the _original_jsonb_function as required and it will stick (hooray!)
import psycopg2

_original_jsonb_function = None
def _handle_jsonb():
    """
    Workaround for pipeline-altering solution in:

        https://code.djangoproject.com/ticket/31956
    """
    global _original_jsonb_function

    def _ignore_command(*args, **kwargs):
        pass

    if _original_jsonb_function is None:
        _original_jsonb_function = psycopg2.extras.register_default_jsonb
        # _original_jsonb_function(globally=True, loads=...) if required
        psycopg2.extras.register_default_jsonb = _ignore_command

class CoreConfig(AppConfig):
    name = 'core'
    verbose_name = 'Core'

    def ready(self):
        _handle_jsonb()
  1. Overloading the JSONField model and returning the data unharmed. This is key to allow both raw and ORM queries to operate as expected. You'll need to replace all instances of models.JSONField in your application.
class DirectJSONField(models.JSONField):
    """ ... """
    def from_db_value(self, value, expression, connection):
        if isinstance(value, (dict, list)):
            return value
        return super().from_db_value(value, expression, connection)
  1. (alternate) If you're not inclined to replace all of your JSONFields, you can monkey patch the class.
def from_db_value(self, value, expression, connection):
    if isinstance(value, (dict, list)):
        return value
    return super().from_db_value(value, expression, connection)

models.JSONField.from_db_value = from_db_value
Last edited 17 months ago by Michael McCartney (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top