Opened 3 years ago
Closed 3 years ago
#33441 closed Bug (fixed)
Model Field.__hash__() should be immutable.
Reported by: | Adam Johnson | Owned by: | Adam Johnson |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Ryan Hiebert | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Field.__hash__
changes value when a field is assigned to a model class.
This code crashes with an AssertionError
:
from django.db import models f = models.CharField(max_length=200) d = {f: 1} class Book(models.Model): title = f assert f in d
The bug was introduced in #31750.
It's unlikely to have been encountered because there are few use cases to put a field in a dict *before* it's assigned to a model class. But I found a reason to do so whilst implementing #26472 and the behaviour had me stumped for a little.
IMO we can revert the __hash__
change from #31750. Objects with the same hash are still checked for equality, which was fixed in that ticket. But it's bad if an object's hash changes, since it breaks its use in dicts.
Change History (8)
comment:1 by , 3 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
comment:2 by , 3 years ago
Cc: | added |
---|---|
Resolution: | → invalid |
Status: | assigned → closed |
follow-up: 5 comment:3 by , 3 years ago
As far as I'm aware, fields are quite specific, in a real life they don't live without models.
Yes the use case is contrived. But there are plenty of uses for fields in ORM expressions that *might* lead one to create references to a field before/after they are assigned to a model class.
We shouldn't consider the same field instance assigned to a different models as equal.
__hash__
is never used for equality. The only requirement is that *IF* two objects are equal, they have the same hash. There's no problem with creating a __hash__
that always does return 1
- it will only make its use in dictionaries slow.
Changing __hash__
in the PR was unnecessary for the previous fix. I believe it came from a misunderstanding around what __hash__
means.
IMO the current behavior is expected, we don't want to reintroduce an issue fixed in 502e75f9ed5476ffe8229109acf0c23999d4b533.
I don't think there's any risk. The PR passes all tests, including the one introduced, minus the unnecessary hash()
assertions.
comment:4 by , 3 years ago
But there are plenty of uses for fields in ORM expressions that *might* lead one to create references to a field before/after they are assigned to a model class.
Could you go into more detail about this statement? Fields meant to be assigned/bound to a model class should all be set after the app setup phase and no fields should be bound to a model class after this point so I'm struggling to come up with the plenty of uses in ORM expressions you are referring to. Just saying that providing concrete examples might help your case here.
Ideally, we'd have a notion of BoundField(model: Model, field: Field)
to avoid having to differentiate between fields attached to models and ones that aren't based of .model
(that's what we do in django.forms for example) but that's a large change for sure.
As for reverting the patch for the __hash__
part I'm not against it, from what I understand and what Adam pointed out it should not break anything. If believe strongly that inherited fields should have they own __hash__
then maybe we could simplify this whole thing by having the interitance logic bump creation_counter
instead?
comment:5 by , 3 years ago
Resolution: | invalid |
---|---|
Status: | closed → new |
Summary: | model Field.__hash__ should be immutable → Model Field.__hash__() should be immutable. |
Triage Stage: | Unreviewed → Accepted |
Replying to Adam Johnson:
__hash__
is never used for equality. The only requirement is that *IF* two objects are equal, they have the same hash. There's no problem with creating a__hash__
that always doesreturn 1
- it will only make its use in dictionaries slow.
Changing
__hash__
in the PR was unnecessary for the previous fix.
Right, I missed it. I always find it suspicious when __hash__()
does not match __eq__()
, but it's probably just me 🤷, and we have a similar solution for Model
.
I'm also struggling with finding "plenty of uses", nevertheless it should not block this change .
comment:6 by , 3 years ago
Status: | new → assigned |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:7 by , 3 years ago
Could you go into more detail about this statement? Fields meant to be assigned/bound to a model class should all be set after the app setup phase and no fields should be bound to a model class after this point so I'm struggling to come up with the plenty of uses in ORM expressions you are referring to. Just saying that providing concrete examples might help your case here.
I'm also struggling with finding "plenty of uses", nevertheless it should not block this change .
Sorry, I wasn't clear. I meant: There are plenty of uses for fields in ORM expressions. These *might* lead a user to create references to a field before/after they are assigned to a model class. I don't have a concrete example, and clearly it must be rare, since no one has reported a bug since Django 3.2. But, I don't think it's inconcievable that someone will try this at some point.
As far as I'm aware, fields are quite specific, in a real life they don't live without models. Field's hash should change when it's assigned to a model, comparing
creation_counter
is not enough. We shouldn't consider the same field instance assigned to a different models as equal. IMO the current behavior is expected, we don't want to reintroduce an issue fixed in 502e75f9ed5476ffe8229109acf0c23999d4b533.