#32411 closed Cleanup/optimization (fixed)
Case-insensitive lookups on JSONField doesn't work on MySQL.
Reported by: | elonzh | Owned by: | Hasan Ramezani |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.1 |
Severity: | Normal | Keywords: | JSONField mysql |
Cc: | Mariusz Felisiak, Sage Abdullah | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
given model
class Paper(models.Model): ... authors = models.JSONField(default=list) ...
In [53]: p = Paper.objects.get(id=1) In [54]: p.__dict__ Out[54]: {'_state': <django.db.models.base.ModelState at 0x7f2aa74b3790>, 'id': 1, ... 'authors': [{'name': 'Rehmann Kim-Thomas', 'workplace': 'SAP SE, Walldorf, Germany'}, {'name': 'Folkerts Enno', 'workplace': 'SAP SE, Walldorf, Germany'}], ... } In [55]: print(Paper.objects.filter(authors__icontains="rehmann").count()) 0 In [56]: print(Paper.objects.filter(authors__icontains="Rehmann").count()) 1 In [57]: print(Paper.objects.filter(authors__icontains="Rehmann").query) SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE JSON_UNQUOTE(`scholardata_paper`.`authors`) LIKE %Rehmann% ORDER BY `scholardata_paper`.`id` ASC In [58]: print(Paper.objects.filter(authors__iregex="rehmann").count()) 1 In [59]: print(Paper.objects.filter(authors__iregex="Rehmann").count()) 1 In [60]: print(Paper.objects.filter(authors__iregex="Rehmann").query) SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE REGEXP_LIKE(JSON_UNQUOTE(`scholardata_paper`.`authors`), Rehmann, 'i') ORDER BY `scholardata_paper`.`id` ASC In [63]: print(Paper.objects.filter(authors__0__name__icontains="rehmann").count()) 1 In [64]: print(Paper.objects.filter(authors__0__name__icontains="rehmann").query) SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE LOWER(JSON_UNQUOTE(JSON_EXTRACT(`scholardata_paper`.`authors`, $[0]."name"))) LIKE LOWER(%rehmann%) ORDER BY `scholardata_paper`.`id` ASC
Change History (9)
comment:1 by , 4 years ago
Description: | modified (diff) |
---|
follow-up: 7 comment:2 by , 4 years ago
Cc: | added |
---|---|
Resolution: | → invalid |
Status: | new → closed |
comment:3 by , 4 years ago
Cc: | added |
---|---|
Keywords: | mysql added |
Resolution: | invalid |
Status: | closed → new |
Summary: | JSONField filter icontains is case sensitive in mysql → Case-insensitive lookups on JSONField doesn't work on MySQL. |
Triage Stage: | Unreviewed → Accepted |
Type: | Bug → Cleanup/optimization |
This is not a documented usage, however it doesn't work only on MySQL, so IMO we can fix this for consistency. MySQL handles strings used in JSON
context using the utf8mb4_bin
binary collation, that's why comparison of JSON values is case-sensitive and we need to use CaseInsensitiveMixin
. We can fix this be registering a custom case-insensitive lookup for JSONField
, e.g.
class JSONFieldIContains(CaseInsensitiveMixin, lookups.IContains): pass JSONField.register_lookup(JSONFieldIContains)
I don't think it makes sense to do the same for __istartswith
, iendswith
, and __iexact
.
comment:5 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:7 by , 4 years ago
Replying to Carlton Gibson:
Hi. Thanks for the report.
Your usage is wrong for what you're trying to do. See the Querying JSONField topic docs.
For JSONField
contains
takes key pairs, rather than a string:
>>> Dog.objects.filter(data__contains={'owner': 'Bob'})
icontains
is not explicitly implemented on the field itself, so it's falling back to the base implementation (wrongly — the JSON_UNQUOTE gives us a string but we didn't generate theLOWER
) but it's not clear that makes much sense. (Mariusz: I'm closing this as invalid, but do we want to accept in order to raise an error here? 🤔)
As you've discovered the correct usage is to query by field (as per here):
>>> Dog.objects.filter(data__owner__name__icontains='bob')For your example
Paper.objects.filter(authors__name__icontains="rehmann")
should give you what you want.
I hope that helps.
This seems related to the documentation issue #26511.
Thanks for your information, I know how to query field in a json object, but the problem is the authors
fields is a list and Django does not support filter it.
Paper.objects.filter(authors__name__icontains="rehmann")
won't give me what I want.
>>> print(Paper.objects.filter(authors__name__icontains="rehmann").query) SELECT `scholardata_paper`.`id`, `scholardata_paper`.`created`, `scholardata_paper`.`modified`, `scholardata_paper`.`doi`, `scholardata_paper`.`s2_id`, `scholardata_paper`.`wanfang_id`, `scholardata_paper`.`corpus_id`, `scholardata_paper`.`pm_id`, `scholardata_paper`.`arxiv_id`, `scholardata_paper`.`acl_id`, `scholardata_paper`.`mag_id`, `scholardata_paper`.`language`, `scholardata_paper`.`url`, `scholardata_paper`.`abstract`, `scholardata_paper`.`title`, `scholardata_paper`.`authors`, `scholardata_paper`.`keywords`, `scholardata_paper`.`issued`, `scholardata_paper`.`venue`, `scholardata_paper`.`journal_id`, `scholardata_paper`.`volume`, `scholardata_paper`.`issue`, `scholardata_paper`.`page`, `scholardata_paper`.`references_count`, `scholardata_paper`.`citations_count`, `scholardata_paper`.`refreshed`, `scholardata_paper`.`references_refreshed`, `scholardata_paper`.`citations_refreshed` FROM `scholardata_paper` WHERE LOWER(JSON_UNQUOTE(JSON_EXTRACT(`scholardata_paper`.`authors`, $."name"))) LIKE LOWER(%rehmann%) ORDER BY `scholardata_paper`.`id` ASC
follow-up: 9 comment:8 by , 4 years ago
the problem is the authors fields is a list and Django does not support filter it
That has little to do with this ticket though. The ORM has no way of knowing authors
is a JSON Array because JSONField
are schema less. If you want to filter out papers that have at least one author with a matching name you'll have to use the MySQL equivalent of PostgreSQL json_array_elements
and a subquery through a Func
expression but that's something you should ask about on support channels.
comment:9 by , 4 years ago
Replying to Simon Charette:
the problem is the authors fields is a list and Django does not support filter it
That little to do with this ticket though. The ORM has no way of knowing
authors
is a JSON Array becauseJSONField
are schema less. If you want to filter out papers that have at least one author with a matching name you'll have to use the MySQL equivalent of PostgreSQLjson_array_elements
and a subquery through aFunc
expression but that's something you should ask about on support channels.
Yes, that little to do with this ticket. I just want to tell Carlton Gibson that he has given the wrong solution.
Hi. Thanks for the report.
Your usage is wrong for what you're trying to do. See the Querying JSONField topic docs.
For JSONField
contains
takes key pairs, rather than a string:icontains
is not explicitly implemented on the field itself, so it's falling back to the base implementation (wrongly — the JSON_UNQUOTE gives us a string but we didn't generate theLOWER
) but it's not clear that makes much sense. (Mariusz: I'm closing this as invalid, but do we want to accept in order to raise an error here? 🤔)As you've discovered the correct usage is to query by field (as per here):
For your example
Paper.objects.filter(authors__name__icontains="rehmann")
should give you what you want.I hope that helps.
This seems related to the documentation issue #26511.