Opened 6 weeks ago

Closed 6 weeks ago

Last modified 4 weeks ago

#35885 closed Uncategorized (invalid)

JSONField does accept strings that look like dicts and incorrectly saves them as strings, breaking JSON filtering

Reported by: DataGreed Owned by:
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: DataGreed Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Steps to reproduce:

  1. Create a model with JSONField, create a migration and migrate
  2. Create an instance of this model, set the value of JSON field to a string generated by json.dumps() from any dictionary, e.g. "{"foo":"bar"}
  3. Save the model instance
  4. Instance save successfully

Expected result:

Instance is saved with the JSONField set to a dictionary {"foo": "bar"}, and JSONField deep filtering works properly

Actual result:

Instance is saved with a JSONField value saved as a string. Filtering does not work, and when you query this instance from a database, the JSONField value will be a string, not a dict.

I tried it on PostgreSQL and sqlite - both working incorrectly.
Note that if I set JSONField value to an actual dict it works correctly, but it is too easy to make a mistake and then waste hours of debugging the reasons the JSON filtering does not work, since developers usually sanitize the data at least by dumping the dict to json to make sure it is even possible to dump.

It seems like the fix would be to parse the string value in JSONField before saving.
My current workaround is to do the following on my models with JSONFields:

class SomeModel(models.Model):

    json_field = models.JSONField(blank=True, null=True)

    def clean(self):
        try:
            if isinstance(self.json_field, str):
                # make sure we are not trying to save string to JSONField, django allows this for some reason
                # and it breaks filtering on json
                self.arguments = json.loads(self.json_field)
        except ValueError as e:
            # reraise as validation error. We do this to get more actionable description for the error
            raise ValidationError(str(e))

    def save(self, *args, **kwargs):

        # ensure clean runs. Well, kind of - we can still directly update the fields,
        # and it will somewhat break data integrity, so just don't do that maybe? thanks :)
        self.full_clean()
        super().save(*args, **kwargs)

Change History (3)

comment:1 by Simon Charette, 6 weeks ago

Component: UncategorizedDatabase layer (models, ORM)
Resolution: invalid
Status: newclosed

JSON accepts top level string as valid input; that is including JSON serialized as a string.

[...] since developers usually sanitize the data at least by dumping the dict to json to make sure it is even possible to dump.

I don't think this is a normal behavior. Usually data transits through through serialization layers that are in charge of parsing JSON strings as Python objects just like for datetimes, decimal, float, and other types.

Having JSONField auto-magically perform a json.loads when provided a string input is backward incompatible and prevents storing JSON de-serializable strings as top level values (e.g. "1", "false", "null").

in reply to:  1 comment:2 by DataGreed2, 4 weeks ago

Replying to Simon Charette:

JSON accepts top level string as valid input; that is including JSON serialized as a string.

[...] since developers usually sanitize the data at least by dumping the dict to json to make sure it is even possible to dump.

I don't think this is a normal behavior. Usually data transits through through serialization layers that are in charge of parsing JSON strings as Python objects just like for datetimes, decimal, float, and other types.

So what do you suggest? I encounter this bug throughout the years and I spend considerable amount of time trying to figure out the issue when the json deep filtering suddenly does not work.

Why would you think this is not normal behaviour? Why IntegerField breaks if you try to save "FOO" in it and JSONField doesn't break when you save string in it? It should throw a validation error at least. Same goes with other fields that work properly, unlike JSONField that allows you to write data that cannot be parsed properly.

Having JSONField auto-magically perform a json.loads when provided a string input is backward incompatible and prevents storing JSON de-serializable strings as top level values (e.g. "1", "false", "null").

Well, you can jsut check it for these values before calling json.loads. Although, I have no ide why would you even try to write "false" or "1" in a JSONField.

comment:3 by Simon Charette, 4 weeks ago

Why IntegerField breaks if you try to save "FOO" in it and JSONField doesn't break when you save string in it? It should throw a validation error at least.

Because a string is not a valid value for an integer field while a string is a valid value for a JSON field? It's completely valid to store a JSON de-serializable string such as "true" in a JSON document. Would you expect json.dumps(json.dumps({"foo": "bar"}} to raise an exception because you attempt to serialize a JSON de-serializable string?

Well, you can jsut check it for these values before calling json.loads. Although, I have no ide why would you even try to write "false" or "1" in a JSONField.

JSON is a standard and JSONField adheres to it. Django doesn't offer a JSON except top level values cannot be JSON de-serializable strings kind of field. To me what you are asking for is analogous to the implicit bytes to text conversion that Python 2 use to make when asked to shove bytes into text expecting interfaces. It is much better from a data hygiene perspective to deserialize data at input boundaries (forms, serializers) and pass it around in its deserialized form through interfaces.

Version 0, edited 4 weeks ago by Simon Charette (next)
Note: See TracTickets for help on using tickets.
Back to Top