Opened 3 years ago

Closed 3 years ago

#33441 closed Bug (fixed)

Model Field.__hash__() should be immutable.

Reported by: Adam Johnson Owned by: Adam Johnson
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Simon Charette, Ryan Hiebert Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Field.__hash__ changes value when a field is assigned to a model class.

This code crashes with an AssertionError:

from django.db import models


f = models.CharField(max_length=200)
d = {f: 1}


class Book(models.Model):
    title = f


assert f in d

The bug was introduced in #31750.

It's unlikely to have been encountered because there are few use cases to put a field in a dict *before* it's assigned to a model class. But I found a reason to do so whilst implementing #26472 and the behaviour had me stumped for a little.

IMO we can revert the __hash__ change from #31750. Objects with the same hash are still checked for equality, which was fixed in that ticket. But it's bad if an object's hash changes, since it breaks its use in dicts.

Change History (8)

comment:1 by Adam Johnson, 3 years ago

Has patch: set
Owner: changed from nobody to Adam Johnson

comment:2 by Mariusz Felisiak, 3 years ago

Cc: Simon Charette Ryan Hiebert added
Resolution: invalid
Status: assignedclosed

As far as I'm aware, fields are quite specific, in a real life they don't live without models. Field's hash should change when it's assigned to a model, comparing creation_counter is not enough. We shouldn't consider the same field instance assigned to a different models as equal. IMO the current behavior is expected, we don't want to reintroduce an issue fixed in 502e75f9ed5476ffe8229109acf0c23999d4b533.

comment:3 by Adam Johnson, 3 years ago

As far as I'm aware, fields are quite specific, in a real life they don't live without models.

Yes the use case is contrived. But there are plenty of uses for fields in ORM expressions that *might* lead one to create references to a field before/after they are assigned to a model class.

We shouldn't consider the same field instance assigned to a different models as equal.

__hash__ is never used for equality. The only requirement is that *IF* two objects are equal, they have the same hash. There's no problem with creating a __hash__ that always does return 1 - it will only make its use in dictionaries slow.

Changing __hash__ in the PR was unnecessary for the previous fix. I believe it came from a misunderstanding around what __hash__ means.

IMO the current behavior is expected, we don't want to reintroduce an issue fixed in 502e75f9ed5476ffe8229109acf0c23999d4b533.

I don't think there's any risk. The PR passes all tests, including the one introduced, minus the unnecessary hash() assertions.

comment:4 by Simon Charette, 3 years ago

But there are plenty of uses for fields in ORM expressions that *might* lead one to create references to a field before/after they are assigned to a model class.

Could you go into more detail about this statement? Fields meant to be assigned/bound to a model class should all be set after the app setup phase and no fields should be bound to a model class after this point so I'm struggling to come up with the plenty of uses in ORM expressions you are referring to. Just saying that providing concrete examples might help your case here.

Ideally, we'd have a notion of BoundField(model: Model, field: Field) to avoid having to differentiate between fields attached to models and ones that aren't based of .model (that's what we do in django.forms for example) but that's a large change for sure.

As for reverting the patch for the __hash__ part I'm not against it, from what I understand and what Adam pointed out it should not break anything. If believe strongly that inherited fields should have they own __hash__ then maybe we could simplify this whole thing by having the interitance logic bump creation_counter instead?

in reply to:  3 comment:5 by Mariusz Felisiak, 3 years ago

Resolution: invalid
Status: closednew
Summary: model Field.__hash__ should be immutableModel Field.__hash__() should be immutable.
Triage Stage: UnreviewedAccepted

Replying to Adam Johnson:

__hash__ is never used for equality. The only requirement is that *IF* two objects are equal, they have the same hash. There's no problem with creating a __hash__ that always does return 1 - it will only make its use in dictionaries slow.

Changing __hash__ in the PR was unnecessary for the previous fix.

Right, I missed it. I always find it suspicious when __hash__() does not match __eq__(), but it's probably just me 🤷, and we have a similar solution for Model.

I'm also struggling with finding "plenty of uses", nevertheless it should not block this change .

comment:6 by Mariusz Felisiak, 3 years ago

Status: newassigned
Triage Stage: AcceptedReady for checkin

comment:7 by Adam Johnson, 3 years ago

Could you go into more detail about this statement? Fields meant to be assigned/bound to a model class should all be set after the app setup phase and no fields should be bound to a model class after this point so I'm struggling to come up with the plenty of uses in ORM expressions you are referring to. Just saying that providing concrete examples might help your case here.

I'm also struggling with finding "plenty of uses", nevertheless it should not block this change .

Sorry, I wasn't clear. I meant: There are plenty of uses for fields in ORM expressions. These *might* lead a user to create references to a field before/after they are assigned to a model class. I don't have a concrete example, and clearly it must be rare, since no one has reported a bug since Django 3.2. But, I don't think it's inconcievable that someone will try this at some point.

Last edited 3 years ago by Adam Johnson (previous) (diff)

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In fdfa97f:

Fixed #33441 -- Restored immutability of models.Field.hash().

Regression in 502e75f9ed5476ffe8229109acf0c23999d4b533.

Note: See TracTickets for help on using tickets.
Back to Top