#31956 closed Bug (fixed)
QuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL.
Reported by: | Marc DEBUREAUX | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.1 |
Severity: | Release blocker | Keywords: | |
Cc: | sage, Simon Charette, Thiago Bellini Ribeiro | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When using ORDER BY clause after an aggregation on a JSONField as described below:
MyModel.objects.values('jsonfield__subfield').annotate(count=Count('id')).order_by('jsonfield__subfield')
I got the following error:
column "myapp_mymodel.jsonfield" must appear in the GROUP BY clause or be used in an aggregate function
The SQL query seems OK at first glance:
SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC
But it fails on PostgreSQL 12+ because ORDER BY clause doesn't include ::text casting. Instead the query must be:
SELECT (("mymodel"."jsonfield" -> 'subfield'))::text, COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY (("mymodel"."jsonfield" -> 'subfield'))::text ORDER BY (("mymodel"."jsonfield" -> 'subfield'))::text ASC
Or without casting at all (prone to error?):
SELECT ("mymodel"."jsonfield" -> 'subfield'), COUNT("mymodel"."id") AS "id_count" FROM "mymodel" GROUP BY ("mymodel"."jsonfield" -> 'subfield') ORDER BY ("mymodel"."jsonfield" -> 'subfield') ASC
Change History (18)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
comment:2 by , 4 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Severity: | Normal → Release blocker |
Status: | new → assigned |
Summary: | JSONField doesn't support ORDER BY clauses after aggregation → QuerySet.order_by() chained with values() crashes on JSONField with a custom decoder on PostgreSQL. |
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 4 years ago
We cannot use proposed solution because it breaks ordering for non-text values.
comment:8 by , 4 years ago
Hello,
I would ask to reopen this ticket.
The fix breaks default functionality of JSONField.
Let me explain the issue.
I have such an model:
class Instance(models.Model): title = models.CharField('Title', max_length=255, blank=True) responses = models.JSONField(default=dict)
Code:
query = "SELECT id, responses FROM advancedform_instance" with connection.cursor() as cursor: cursor.execute(query) data = cursor.fetchall()
In django 3.1 responses is returned as a dict.
In django 3.1.1 it is returned as a string.
If I remove the code that was made to fix the issue, it works correctly.
comment:9 by , 4 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
follow-ups: 11 16 comment:10 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use json.loads()
on fetched data.
comment:11 by , 4 years ago
Sorry, I will add comment to already created bug for the regression (#31973).
Yes, I simplified the query that I have. It is much complex and it's much harder to implement it in ORM.
I think that JSONField should return the dict either way and it was working fine in 3.1.
Also the bug is related to "with a custom decoder", but in current scenario I'm not specifying different decoder.
Yes, I can use json.loads()
and I was using it till Django started supporting JSONField with built-in decoder (not sure which version, but long time ago). I don't think that it is ok to return to "unsupporting" default behavior.
Replying to felixxm:
Please don't reopen closed tickets. You should create a new ticket if you want to report a regression. Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use
json.loads()
on fetched data.
comment:13 by , 4 years ago
Cc: | added |
---|
comment:16 by , 4 years ago
How did this *manage* to break basic querying?
Replying to Mariusz Felisiak:
Moreover we're aware of this behavior change. Is there any reason to use a row cursor and SQL instead of the ORM? You can always use
json.loads()
on fetched data.
This is just silly. Yes, there are *obviously* reasons to use raw sql over an ORM in some cases, this is not new. Putting a tiny note in a release that no one will see isn't enough. This needs to be documented front and center in the raw sql main docs and the postgres areas, because I did read those and no hint.
"just use x" is a bad solution to something that broke basic functionality, and isn't necessarily even easy to do. It turns what is a simple automapping into manual mapping, at best. I guess we could implement something that parses python typehints on result classes detects if the cursor has been effed with and is returning the wrong data type, though. How many more data types do you expect to be broken on purpose in the future?
Also, the workaround is just to add ::json
to any jsonb
columns in raw queries. So this is only broken for HALF types?
It feels like an easy fix for one problem which ignores some other underlying issue, breaking use cases that a few people don't personally use. If I want a query to work in the future *as written* am I going to have to start creating raw psycopg2 cursors? Because those still work!
Just to add, this is broken on a brand new django project with a single table. Not sure how custom json decoders factor into this.
comment:18 by , 18 months ago
For anyone that falls into the "why isn't it a pythonic object already?" pitfall - We've been able to achieve our existing patterns, post this change, using:
- Take complete control over the
psycopg2.extras.register_default_jsonb
function, providing our own globally registered load call. Note: You can supply an alternate JSONloads
function to the_original_jsonb_function
as required and it will stick (hooray!)
import psycopg2 _original_jsonb_function = None def _handle_jsonb(): """ Workaround for pipeline-altering solution in: https://code.djangoproject.com/ticket/31956 """ global _original_jsonb_function def _ignore_command(*args, **kwargs): pass if _original_jsonb_function is None: _original_jsonb_function = psycopg2.extras.register_default_jsonb # _original_jsonb_function(globally=True, loads=...) if required psycopg2.extras.register_default_jsonb = _ignore_command class CoreConfig(AppConfig): name = 'core' verbose_name = 'Core' def ready(self): _handle_jsonb()
- Overloading the JSONField model and returning the data unharmed. This is key to allow both raw and ORM queries to operate as expected. You'll need to replace all instances of
models.JSONField
in your application.
class DirectJSONField(models.JSONField): """ ... """ def from_db_value(self, value, expression, connection): if isinstance(value, (dict, list)): return value return super().from_db_value(value, expression, connection)
- (alternate) If you're not inclined to replace all of your JSONFields, you can monkey patch the class.
def from_db_value(self, value, expression, connection): if isinstance(value, (dict, list)): return value return super().from_db_value(value, expression, connection) models.JSONField.from_db_value = from_db_value
Thanks for this ticket. I was able to reproduce the crash on PostgreSQL with a
JSONField
with a custom decoder. It's caused by the different format used in this case. Can you confirm that the following patch fix this issue for you?