Opened 4 hours ago

Last modified 3 hours ago

#35944 new Cleanup/optimization

PostgreSQL: ArrayField with Unicode characters gets serialized as a string of "\uXXXX" characters

Reported by: Oleg Sverdlov
Owned by:
Component: Core (Serialization)
Version: 5.1
Severity: Normal
Keywords: ArrayField, postgresql, JSON
Cc: Oleg Sverdlov
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

In ArrayField.value_to_string(self, obj) the value is encoded using json.dumps(values), which by default escapes non-ASCII characters as \uXXXX sequences.

For example, an ArrayField with 3 elements ["один", "два", "три"] ("one", "two", "three" in Russian) will produce

["\u043e\u0434\u0438\u043d", "\u0434\u0432\u0430", "\u0442\u0440\u0438"]

While this is not a bug per se, it becomes a nuisance when viewing the output of the "dumpdata" management command:

The ArrayField fields will be encoded differently from other text fields.
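The difference can be reproduced with the standard library alone. The following is a minimal sketch that calls json.dumps directly on a plain list rather than going through ArrayField; the variable names are chosen for illustration only.

```python
import json

values = ["один", "два", "три"]

# Default behaviour: ensure_ascii=True escapes every non-ASCII character.
escaped = json.dumps(values)

# With ensure_ascii=False the characters are written out as-is.
readable = json.dumps(values, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode to the same list,
# so the change only affects how readable the dump is.
assert json.loads(escaped) == json.loads(readable) == values
```

Since both forms round-trip to identical data, switching the default should not affect loading existing fixtures.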

Perhaps there should be an option to turn the ensure_ascii parameter on or off, as in json.dumps(values, ensure_ascii=option)?

The option can be enabled by default, as we do for the 'hstore' field, or perhaps enabled conditionally:

  • in the field options, e.g. ArrayField(name='numbers', ascii_only=False)
  • in settings.py (ARRAY_FIELD_ENSURE_ASCII)

I will be glad to submit a patch.

Change History (1)

comment:1 by Simon Charette, 3 hours ago

Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

Given that we decided in #29249 (68fc21b3784aa34c7ba5515ab02ef0c7b6ee856d) to have JSON serialization default to ensure_ascii=False when dealing with Unicode, I think we should take the same approach here and use ensure_ascii=False for any usage of json.dumps in Field.value_to_string for fields that might include text, which includes ArrayField and HStoreField.

I don't think an additional field option to control this behavior is warranted here, and certainly not a setting: it should be possible to subclass either field class to override value_to_string, and ensure_ascii=False does constitute a more coherent default.
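As an illustration of the subclassing route, here is a rough sketch of a mixin that re-encodes whatever JSON string the parent value_to_string returns. The names UnescapedJSONMixin, StandInArrayField, and ReadableArrayField are hypothetical, and StandInArrayField is only a stand-in so the snippet runs without Django; it is not the real ArrayField implementation. In a project the mixin would presumably be combined with django.contrib.postgres.fields.ArrayField or HStoreField instead.

```python
import json


class UnescapedJSONMixin:
    """Hypothetical mixin: re-encode the parent's JSON output without \\uXXXX escapes."""

    def value_to_string(self, obj):
        # Assumes the parent returns a JSON document as a string.
        escaped = super().value_to_string(obj)
        return json.dumps(json.loads(escaped), ensure_ascii=False)


class StandInArrayField:
    """Stand-in for a field whose value_to_string uses json.dumps defaults.

    For demonstration only: here `obj` is treated as the list of values
    itself, whereas a real field reads the value from a model instance.
    """

    def value_to_string(self, obj):
        return json.dumps(obj)


class ReadableArrayField(UnescapedJSONMixin, StandInArrayField):
    pass


field = ReadableArrayField()
print(field.value_to_string(["один", "два", "три"]))
```

Decoding and re-encoding avoids copying the parent's serialization logic, at the cost of one extra JSON round trip per value.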

Version 0, edited 3 hours ago by Simon Charette