Opened 12 years ago
Closed 11 years ago
#18864 closed Bug (fixed)
Django models __eq__ and __hash__ treat all unsaved model instances as identical
Reported by: | fengb | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4 |
Severity: | Normal | Keywords: | |
Cc: | jdunck@…, Ben Davis | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When a model instance is still new and unsaved, it has no primary key.
Both eq and hash rely solely on _get_pk_val(), which means all unsaved model instances are treated as though they are identical.
b1 = models.Base()
b2 = models.Base()
b1 == b2
True
hash(b1) == hash(b2)
True
For integrity, we should add a check for Python object identity since if both instances are saved, they will end up with different PKs.
For hashing, we can use the default Python hashing to prevent large collisions based on hash(None). We would also need to cache the hash so that hash(instance) will be the same both before save and after save such that:
b = models.Base()
s = set()
s.add(b)
b.save()
b in set
True
This last case actually has an edge case that I'm not sure is solvable.
b = models.Base.object.get(id=b.id)
b in set
False
Change History (12)
comment:1 by , 12 years ago
comment:3 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|
I think that by caching the value of __hash__
we are just moving the problem around, not solving it. Having a hack where the model's __hash__
is based on the model's primary key, unless __hash__
was called on the same object before the object got a primary key value doesn't sound tempting.
Strictly, we should do this for immutable hash:
- Models without PK are not hashable.
- Once set, the Model pk should never be changed.
I don't think we want to go there, as this would break a lot of applications.
We should just document the existing limitation. In addition basing the eq and hashing on id(obj) when obj.pk is None is worth investigation. Alternatively, we might want to make models with pk = None unhashable. Using pk=None models in dicts or sets doesn't work anyways currently. This would leave still the problem with changed primary key value, but that doesn't seem as severe to me.
If we use the id(obj) when pk is None way, then we should document that one shall not assume the hash of an object will stay the same if the object's primary key changes. In particular this means using unsaved models in dicts or sets and then saving them is a bad idea.
Accepting this ticket as there clearly is a problem with our current implementation of __hash__
and __eq__
for unsaved models.
comment:4 by , 12 years ago
Rejecting hashing on unsaved instances:
https://github.com/fengb/django/tree/model-eq-hash-rejection
There is one error with the change: forms is attempting to use hashing. I'm afraid of the other problems this backwards incompatible change will introduce.
Traceback (most recent call last): File "/Users/bfeng/Documents/projects/django/tests/modeltests/model_formsets/tests.py", line 523, in test_inline_formsets_save_as_new self.assertTrue(formset.is_valid()) File "/Users/bfeng/Documents/projects/django/django/forms/formsets.py", line 269, in is_valid err = self.errors File "/Users/bfeng/Documents/projects/django/django/forms/formsets.py", line 247, in _get_errors self.full_clean() File "/Users/bfeng/Documents/projects/django/django/forms/formsets.py", line 293, in full_clean self.clean() File "/Users/bfeng/Documents/projects/django/django/forms/models.py", line 500, in clean self.validate_unique() File "/Users/bfeng/Documents/projects/django/django/forms/models.py", line 527, in validate_unique if row_data in seen_data: File "/Users/bfeng/Documents/projects/django/django/db/models/base.py", line 395, in __hash__ raise TypeError('unhashable without pk') TypeError: unhashable without pk
comment:5 by , 12 years ago
I haven't had time to review in full, but my inclination is that the behavior should be:
For saved instances:
__hash__
: hash(self.pk)
__eq__
: self.pk == other.pk and both objects have the same type
(see the numerous other tickets discussing this issue)
For unsaved instances:
__hash__
: hash(self.pk)
__eq__
: self is other
comment:6 by , 12 years ago
I don't agree that unsaved instances should use hash(self.pk)
because self.pk
will always be None
and we will get hash collisions all over the place.
comment:7 by , 12 years ago
Hmm, I'm actually coming around to the idea that unsaved models shouldn't be hashable, because the pk is mutable in that situation.
comment:8 by , 12 years ago
I on the other hand am beginning to think that we shouldn't ever officially support mutable primary keys. If you need mutable keys, then use a hidden AutoField for primary key, and have the logical primary key as an unique field. If you want (for legacy DBs for example), you can dodge this limitation by .update(). This will allow us to solve some issues in the composite fields feature (and in __hash__
, too).
If we go with no __hash__
for pk is None route, then we do have a backwards incompatibility at our hands. Call it a bugfix?
As for the __eq__
tickets, I think we should go for "same concrete model, same pk" => equal, otherwise not, as that will be simple to implement. Implementing isinstance() based __eq__
will lead to non-transitive __eq__
and some other complications (like, the subclass could have different field as pk field, so just comparing the value will not do). See #16458 comments 13 and 17 for some weird corner cases.
comment:9 by , 12 years ago
Cc: | added |
---|
comment:10 by , 12 years ago
Cc: | added |
---|
comment:11 by , 11 years ago
Patch implementing __eq__
based id() and forbidding __hash__
in case of no pk value is available at https://github.com/akaariai/django/tree/model_eq_nopk
comment:12 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Whoops, failed formatting.
Bad behavior example:
Hash fixed call:
Hash impossible example: