Opened 21 months ago
Closed 20 months ago
#34539 closed Bug (fixed)
`get_prep_value` no longer called for JSONField
Reported by: | Julie Rymer | Owned by: | Julie Rymer |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Release blocker | Keywords: | |
Cc: | Simon Charette, Florian Apolloner, Tim Graham | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hello, I just upgraded from django 4.1 to 4.2 and I have a custom JSONField
with a get_prep_value()
override that stopped working. After searching a bit, I saw that was because JSONField.get_prep_value()
is no longer called in 4.2 (5c23d9f0c32f166c81ecb6f3f01d5077a6084318).
I think this issue need a resolution either:
JSONField
should callget_prep_value()
like all other fields type, because this is the method that the documentation tell us to override in custom fields. Otherwise we need to overrideget_db_prep_value()
which is heavier and does not have the same purpose. I think simply replacingconnection.ops.adapt_json_value(value, self.encoder)
withconnection.ops.adapt_json_value(self.get_prep_value(value), self.encoder)
inJSONField.get_db_prep_value()
would fix this
- If there is a good reason to no longer call
get_prep_value()
, this exception forJSONField
should be clearly documented in custom get_prep_value() doc. It should also be added to Backwards incompatible changes in 4.2 release note because I got stuck with this issue with no warning when migrating.
PS: #34397 seems to be related but in fact is about Django 3.2 so it is not the current issue
Attachments (1)
Change History (30)
comment:1 by , 21 months ago
Description: | modified (diff) |
---|
comment:2 by , 21 months ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
follow-up: 6 comment:3 by , 21 months ago
Thanks for report. The way how built-in fields are implemented is an implementation detail and can change without the deprecation path.
Replying to Julie Rymer:
Hello, I just upgraded from django 4.1 to 4.2 and I have a custom
JSONField
with aget_prep_value()
override that stopped working. After searching a bit, I saw that was becauseJSONField.get_prep_value()
is no longer called in 4.2 (5c23d9f0c32f166c81ecb6f3f01d5077a6084318).
I think this issue need a resolution either:
JSONField
should callget_prep_value()
like all other fields type, because this is the method that the documentation tell us to override in custom fields.
Both methods get_prep_value()
and get_db_prep_value()
are documented there. I don't see any preferences for get_prep_value()
🤔.
Otherwise we need to override
get_db_prep_value()
which is heavier and does not have the same purpose. I think simply replacingconnection.ops.adapt_json_value(value, self.encoder)
withconnection.ops.adapt_json_value(self.get_prep_value(value), self.encoder)
inJSONField.get_db_prep_value()
would fix this
- If there is a good reason to no longer call
get_prep_value()
, this exception forJSONField
should be clearly documented in custom get_prep_value() doc. It should also be added to Backwards incompatible changes in 4.2 release note because I got stuck with this issue with no warning when migrating.
This is not really an exception, there are many built-in fields that don't call get_prep_value()
in get_db_prep_value()
, e.g. ArrayField
or DurationField
.
You can check how, for example, Wagtail solved the same issue on their side.
comment:4 by , 21 months ago
Cc: | added |
---|
comment:5 by , 21 months ago
Also, the docs that you mentioned is Field API reference
not How to create a custom field
or How to override build-in fields
, so it only describes methods available in the Field
API.
comment:6 by , 21 months ago
Replying to Mariusz Felisiak:
Thanks for report. The way how built-in fields are implemented is an implementation detail and can change without the deprecation path.
...
Both methods
get_prep_value()
andget_db_prep_value()
are documented there. I don't see any preferences forget_prep_value()
🤔.
It doesn't seems like an implementation detail when those function are clearly documented for customisation.
As for the preference, the first paragraph of the Field API doc is
Field is an abstract class that represents a database table column. Django uses fields to create the database table (db_type()), to map Python types to database (get_prep_value()) and vice-versa (from_db_value()).
get_prep_value
is the documented method for mapping Python types to database, which is what I was trying to do, get_db_prep_value
is only for backend specific need so not the most common case (hence why only get_prep_value
is put foremost in the intro I suppose).
Replying to Mariusz Felisiak:
Also, the docs that you mentioned is Field API reference not How to create a custom field or How to override build-in fields, so it only describes methods available in the Field API.
You are right, however the customisation doc still does not mention that get_prep_value()
is not always available for override.
Replying to Mariusz Felisiak:
This is not really an exception, there are many built-in fields that don't call get_prep_value() in get_db_prep_value(), e.g. ArrayField or DurationField.
Then it should be changed, or documented, for those fields also :)
Like wagtail, I also ended up overriding get_db_prep_value()
instead of get_prep_value()
, but this seems like too much work when I don't care about the backend and simply want to serialise my custom object.
comment:7 by , 21 months ago
Julie, could you tell us a little bit more about how you are using get_prep_value
? That could help us understand a bit better the use case to make further decisions. Thanks!
comment:8 by , 21 months ago
It doesn't seems like an implementation detail when those function are clearly documented for customisation.
Field.get_prep_value()
is documented for the customization, not JSONField.get_prep_value()
.
follow-up: 10 comment:9 by , 21 months ago
My usecase is the following:
I have a db json field that I use to keep data in the following format [{id: str, label: str, length: Decimal, height: Decimal, width: Decimal, weight: Decimal}]
For easier manipulation and validation, I have a dataclass containing those properties. I created a custom JSONField to convert to/from a list of that dataclass to a simple python dict to be inserted in db.
From the doc
If your custom Field class deals with data structures that are more complex than strings, dates, integers, or floats, then you may need to override from_db_value() and to_python().
And so I overrode from_db_value()
and to_python()
Further along the doc
Since using a database requires conversion in both ways, if you override from_db_value() you also have to override get_prep_value() to convert Python objects back to query values.
And so I overrode get_prep_value()
. But get_prep_value()
is never called, alas. This behaviour is not consistent from what is expected.
I know that this doc is about Field.get_prep_value()
and not JSONField.get_prep_value()
, but unless I missed something, there is no specific documentation for JSONField.get_prep_value()
, nor forewarning in Field.get_prep_value()
, so the obvious guess is that this doc apply to JSONField.get_prep_value()
as well as any other Field class.
I think the question to solve is
- do we want to add a warning that the documented rules for customisation does not apply to some Field classes and to document the missing rules, or
- should we harmonize instead all Field classes that don't follow the rules yet and make their behaviour consistent with current documentation.
I am more in favour of the second option since it does seems to be less of a change effort and more practical for the end-user.
You can find my custom Field implementation here:
from __future__ import annotations import decimal import json from dataclasses import dataclass, fields from decimal import Decimal from django.core import exceptions from django.db import models from django.utils.translation import gettext_lazy as _ @dataclass(repr=True, eq=True, unsafe_hash=True) class ProductPackage: id: str length: Decimal # in cm height: Decimal # in cm width: Decimal # in cm weight: Decimal # in kg label: str = "" @staticmethod def from_dict(package_json: dict) -> ProductPackage: return ProductPackage( id=package_json.get("id"), length=Decimal(package_json["length"]), height=Decimal(package_json["height"]), width=Decimal(package_json["width"]), weight=Decimal(package_json["weight"]), label=package_json.get("label", ""), ) def to_dict(self) -> dict: return dict((field.name, getattr(self, field.name)) for field in fields(self)) @classmethod def from_list(cls, packages_json: list[dict]) -> list: return list( cls.from_dict(package_json) for package_json in (packages_json or []) ) def __copy__(self): return self.copy() def copy(self): return self.__class__(**self.to_dict()) class PackageJSONField(models.JSONField): """ ProductPackage class will make package dimension more practical to use and this class also handle Decimal so they can be serialized/deserialized from JSON """ description = "A JSON field to store a list of ProductPackage" default_error_messages = { **models.JSONField.default_error_messages, "invalid_packages": _("Value must be valid list of packages."), "invalid_decimal": _("Package dimensions must be valid decimal strings."), } def from_db_value(self, value, expression, connection) -> list[ProductPackage]: packages_json = super().from_db_value(value, expression, connection) return ProductPackage.from_list(packages_json) def to_python(self, value): if value is None: return value if isinstance(value, ProductPackage): return value try: if isinstance(value, list): return ProductPackage.from_list(value) return json.loads(value, cls=self.decoder) except json.JSONDecodeError: raise exceptions.ValidationError( self.error_messages["invalid"], code="invalid", params={"value": value}, ) except TypeError: raise exceptions.ValidationError( self.error_messages["invalid_packages"], code="invalid_packages", params={"value": value}, ) except decimal.InvalidOperation: raise exceptions.ValidationError( self.error_messages["invalid_decimal"], code="invalid_decimal", params={"value": value}, ) def get_prep_value(self, value: list[ProductPackage | dict]): if value is None: return value if any(isinstance(package, dict) for package in value): value = ProductPackage.from_list(value) value = list( { "id": package.id, "label": package.label, "length": str(package.length), "height": str(package.height), "width": str(package.width), "weight": str(package.weight), } for package in value ) return super().get_prep_value(value) def get_db_prep_value(self, value, connection, prepared=False): # the override I ended up doing for the current issue if isinstance(value, models.expressions.Value) and isinstance( value.output_field, models.JSONField ): value = value.value elif hasattr(value, "as_sql"): return value return connection.ops.adapt_json_value(self.get_prep_value(value), self.encoder)
follow-up: 11 comment:10 by , 21 months ago
Replying to Julie Rymer:
I think the question to solve is
- do we want to add a warning that the documented rules for customisation does not apply to some Field classes and to document the missing rules, or
- should we harmonize instead all Field classes that don't follow the rules yet and make their behaviour consistent with current documentation.
I am more in favour of the second option since it does seems to be less of a change effort and more practical for the end-user.
This will required adding LOC (probably many) that are not used by Django itself and aren't tested (unless we will add many tests with custom subclasses of built-in fields). This can also cause a lot of backward incompatible changes in the behavior. I'm strongly against it as it'll set a dangerous precedent and increase the maintenance burden.
I'll wait for other opinions.
follow-up: 12 comment:11 by , 21 months ago
Replying to Mariusz Felisiak:
This will required adding LOC (probably many) that are not used by Django itself and aren't tested (unless we will add many tests with custom subclasses of built-in fields). This can also cause a lot of backward incompatible changes in the behavior. I'm strongly against it as it'll set a dangerous precedent and increase the maintenance burden.
I'll try to address Mariusz' concerns.
From what I understand, the change would only be a matter of adding value=self.get_prep_value(value)
is the concerned Field classes' get_db_prep_value
(JSONField
, ArrayField
, DurationField
, other ?). The default Field.get_prep_value
definition is the following:
def get_prep_value(self, value): """Perform preliminary non-db specific value checks and conversions.""" if isinstance(value, Promise): value = value._proxy____cast() return value
So it doesn't seems to do much so I don't see what backward incompatible change that would cause.
But if there is concern about it, get_prep_value
could be overridden in the concerned Field classes to directly return the value instead of doing any operation, hence guaranteed no behaviour change.
This addition to the code will be tested, because get_db_prep_value
is already tested (right?) and we are not adding any behaviour that should involve any explicit test. We are just adding an easier entrypoint for customisation, the customisation itself is not for django to test so I don't see what additional maintenance burden it would add.
The only potential burden I see is a new rule for future changes done on get_db_prep_value
of any standard field class to take care to call get_prep_value
even if it does nothing.
Perhaps I missed something, do tell me if that is the case.
On the contrary, updating the documentation to keep track of any Field class not following the rule and explaining how to do customisation for each of them seems like a lot more involved effort.
follow-up: 13 comment:12 by , 21 months ago
Replying to Julie Rymer:
def get_prep_value(self, value): """Perform preliminary non-db specific value checks and conversions.""" if isinstance(value, Promise): value = value._proxy____cast() return valueSo it doesn't seems to do much so I don't see what backward incompatible change that would cause.
The number of LOC is not important. We are talking not only about JSONField
. If we will agree that all built-in fields must implement the entire Field
API and fulfill everything that is in the Field
API reference, we will be force to make many other changes.
This addition to the code will be tested, because
get_db_prep_value
is already tested (right?)
This is not true, removing get_prep_value()
will not cause any tests to fail, so it won't be tested.
On the contrary, updating the documentation to keep track of any Field class not following the rule and explaining how to do customisation for each of them seems like a lot more involved effort.
Personally, I'd close this ticket as invalid. As far as I'm aware, neither docs nor code changes are required.
comment:13 by , 21 months ago
Replying to Mariusz Felisiak:
This is not true, removing
get_prep_value()
will not cause any tests to fail, so it won't be tested.
You're right, but nothing a spy+assert_called_once
can't solve :p
Also, I though I'd mention that I'd be willing to make the necessary changes in case this scenario is selected (I'd not have that self-confidence for a doc update however )
Finally as a last input, I'll mention that the custom field doc actually explicitly elaborate on some standard Field type gotcha when they deviate from the Field API contract: AutoField
and FileField
.
comment:14 by , 21 months ago
You're right, but nothing a spy+
assert_called_once
can't solve :p
This is something we use as a last resort, not something that is a good API testing.
It's pretty obvious that we're not going to reach a consensus, so let's wait for other opinions.
comment:15 by , 21 months ago
Cc: | added |
---|
comment:16 by , 21 months ago
As a Django user, I agree that the current documentation does imply that get_prep_value()
would be called for any Field
instance. I always assumed that everything documented for Field
as a toplevel class applies to all children (except explicitly stated otherwise), so I can personally relate to the ticket description.
Having said the above, if modifying the Django code could generate extra maintenance burden and/or introduce patterns that are not desired (such as including code that is not used by Django itself), I would at least suggest that the documentation is extended to explicitly say that get_prep_value()
may not be called in every Django-provided field.
comment:17 by , 21 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:18 by , 21 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
Triage Stage: | Accepted → Unreviewed |
Hi Jatin-tec, sorry for the confusion but this ticket is not yet ready to be worked on, we still haven't decided on the best course of action. It could be that we change code, or docs, or nothing.
Thanks for understanding! I'll be removing you from the assignee field for now.
follow-up: 20 comment:19 by , 21 months ago
I don't hold a strong opinion against restoring the call to get_prep_value
given how common using JSONField
to persist Python datastructure is.
I never assumed that we guaranteed that get_prep_lookup
would be called by get_db_prep_lookup
but I can see how it can be interpreted this way from the docs.
Given this was an unintended side-effect of 5c23d9f0c32f166c81ecb6f3f01d5077a6084318 I'd be +1 to having get_db_prep_value
start by calling get_prep_value
when prepared=False
.
As for the other fields that don't respect this contract that Mariusz identified I think that if we wanted to be coherent with the docs we should likely adapt them as well in a follow up ticket.
For as much as this API is awkward and open to interpretation it has been around for years so trying to stick to it as much as possible seems like a worthy approach?
follow-up: 21 comment:20 by , 21 months ago
Triage Stage: | Unreviewed → Accepted |
---|
Replying to Simon Charette:
I never assumed that we guaranteed that
get_prep_lookup
would be called byget_db_prep_lookup
but I can see how it can be interpreted this way from the docs.
Given this was an unintended side-effect of 5c23d9f0c32f166c81ecb6f3f01d5077a6084318 I'd be +1 to having
get_db_prep_value
start by callingget_prep_value
whenprepared=False
.
OK, persuaded, let's restore it for JSONField
.
Julie, please prepare a patch (via GitHub PR) with a regression test and release notes (4.2.2.txt
.)
comment:21 by , 21 months ago
Replying to Mariusz Felisiak:
Replying to Simon Charette:
I never assumed that we guaranteed that
get_prep_lookup
would be called byget_db_prep_lookup
but I can see how it can be interpreted this way from the docs.
Given this was an unintended side-effect of 5c23d9f0c32f166c81ecb6f3f01d5077a6084318 I'd be +1 to having
get_db_prep_value
start by callingget_prep_value
whenprepared=False
.
OK, persuaded, let's restore it for
JSONField
.
Julie, please prepare a patch (via GitHub PR) with a regression test and release notes (
4.2.2.txt
.)
Cool, I'll try to do this this w-e :)
comment:22 by , 21 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:24 by , 20 months ago
Patch needs improvement: | set |
---|
comment:25 by , 20 months ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:28 by , 20 months ago
Resolution: | fixed |
---|---|
Status: | closed → new |
This has caused a behaviour change for me which I think may be a regression. I'm not especially knowledgeable about the issues debated above, but here is what I see. First, I am using an add-on app called social_django which sets a JSONField and then saves the value here:
Second, on the stack, I have the frames show in the attachment "stack frame", and the important point is that when I enter the two new lines of code added, value is a dict, but the value is flattened into a string by the new code.
The result is that the database JSONField has the saved string. I believe that to be incorrect. Previously, the dict was saved as expected.
(Perhaps the intent here was to only have this change apply to subclasses of JSONField?)
Advice appreciated.
comment:29 by , 20 months ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Please don't reopen closed tickets. If you believe there is an issue in Django, then please create a new ticket in Trac and follow our bug reporting guidelines.
Accepting this since, as the reporter says, the docs suggest this method will be called for most fields.