Opened 6 weeks ago
Closed 5 weeks ago
#35944 closed Cleanup/optimization (fixed)
PostgreSQL: ArrayField with Unicode characters gets serialized as a string of "\uXXXX" escape sequences
Reported by: | Oleg Sverdlov | Owned by: | Oleg Sverdlov |
---|---|---|---|
Component: | Core (Serialization) | Version: | 5.1 |
Severity: | Normal | Keywords: | ArrayField, postgresql, JSON |
Cc: | Oleg Sverdlov | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In ArrayField.value_to_string(self, obj), the value is encoded using json.dumps(values), which escapes non-ASCII characters as \uXXXX by default.
For example, an ArrayField with three elements ["один", "два", "три"] ("one", "two", "three" in Russian) will produce
["\u043e\u0434\u0438\u043d", "\u0434\u0432\u0430", "\u0442\u0440\u0438"]
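The escaping comes from the ensure_ascii parameter of the standard library's json.dumps, which defaults to True. A minimal stdlib-only sketch (no Django involved) reproducing the output above and contrasting it with ensure_ascii=False:

```python
import json

# The three Russian words from the example ("one", "two", "three").
values = ["один", "два", "три"]

# Default ensure_ascii=True: every non-ASCII character becomes \uXXXX.
escaped = json.dumps(values)
print(escaped)   # ["\u043e\u0434\u0438\u043d", "\u0434\u0432\u0430", "\u0442\u0440\u0438"]

# ensure_ascii=False: the characters are written as-is.
readable = json.dumps(values, ensure_ascii=False)
print(readable)  # ["один", "два", "три"]

# Both forms decode to the same list, so no data is lost;
# only the readability of the serialized text differs.
assert json.loads(escaped) == json.loads(readable) == values
```

This is why the behaviour is a nuisance rather than a data-loss bug: the round trip is lossless either way.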
While this is not a bug per se, it becomes a nuisance when viewing the output of the "dumpdata" management command:
ArrayField values are encoded differently from other text fields.
Perhaps there should be an option to turn the ensure_ascii parameter on or off, as in json.dumps(values, ensure_ascii=option)?
The option could be enabled by default, as it is for the 'hstore' field, or perhaps enabled conditionally:
- in the field settings ArrayField(name='numbers', ascii_only=False)
- in settings.py (ARRAY_FIELD_ENSURE_ASCII)
I will be glad to submit a patch.
Change History (6)
comment:1 by , 6 weeks ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
comment:2 by , 6 weeks ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:3 by , 6 weeks ago
Has patch: | set |
---|
comment:5 by , 5 weeks ago
Triage Stage: | Accepted → Ready for checkin |
---|
Given we made the decision to have JSON serialization default to ensure_ascii=False when dealing with Unicode in #29249 (68fc21b3784aa34c7ba5515ab02ef0c7b6ee856d), I think we should use the same approach here and use ensure_ascii=False for any usage of json.dumps in Field.value_to_string for fields that might include text, which includes ArrayField and HStoreField.
I don't think an additional field option to control this behavior, and certainly not a setting, is warranted here, as it should be possible to subclass either field class to override value_to_string, and ensure_ascii=False does constitute a more coherent default.
Feel free to assign the issue to yourself and submit a PR with tests for ArrayField and HStoreField.
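A rough, stdlib-only sketch of the behaviour proposed in the comment above. The two functions below are hypothetical simplifications written for illustration, not Django's code: the real ArrayField.value_to_string converts each element through its base field's own value_to_string, whereas here elements are simply passed through str() (None is kept as None). In a project, the same change would go in a subclass overriding value_to_string, as the comment suggests.

```python
import json


def array_value_to_string(values):
    """Hypothetical stand-in for ArrayField.value_to_string.

    Simplification: each element is converted with str() instead of the
    base field's value_to_string(); None stays None (serialized as null).
    """
    items = [None if v is None else str(v) for v in values]
    # ensure_ascii=False keeps non-ASCII text readable in the output.
    return json.dumps(items, ensure_ascii=False)


def hstore_value_to_string(mapping):
    """Hypothetical stand-in for HStoreField.value_to_string."""
    return json.dumps(mapping, ensure_ascii=False)


print(array_value_to_string(["один", None, "три"]))  # ["один", null, "три"]
print(hstore_value_to_string({"цвет": "красный"}))  # {"цвет": "красный"}
```

Assertions of this shape on the serialized strings are roughly what tests for both fields would need to check.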