#27985 closed Bug (fixed)
Converting `Foo.objects.filter(bar=None)` to an `IsNull` too early.
Reported by: | Jarek Glowacki | Owned by: | Sergey Fedoseev |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | sql None NULL transform |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This is a little similar to #25946, but different angle.
Case Study:
We define a custom Field which converts emptystring to None
before sending to DB and then None
back to emptystring when reading from DB:
WHY (optional reading):
For want of a CharField that enforces uniqueness on non-empty elements only.
'' == ''
in the DB, butNULL != NULL
. We don't want to just allow passing None into it in the code though. Hence NulledCharField.
Example Implementation:
class NulledCharField(models.CharField): description = _("CharField that stores NULL for emptystrings but returns ''.") def from_db_value(self, value, expression, connection, context): if value is None: return '' else: return value def to_python(self, value): if isinstance(value, models.CharField): return value if value is None: return '' return value def get_prep_value(self, value): if value is '': return None else: return value
This works for the most part, but not with filtering!
I took a look at the SQL, and Foo.objects.filter(bar='')
resolves to ... WHERE "Foo"."bar" = NULL; args=(None,)
Compare this to Foo.objects.filter(bar=None)
, which resolves to ... WHERE "Foo"."bar" IS NULL', ()
The reason for this is that the =None
-> __isnull=True
conversion happens before any transformation of the value.
Specifically, in django/db/models/sql/query.py,
Query.build_filter()
calls on self.prepare_lookup_value()
(line1139) before it calls self.build_lookup()
(line 1187).
The former does the isnull replacement, while the latter transforms the value.
This also applies (more severely) to the converse: If my get_prep_value()
were to for some reason convert None
to something else, then this would get ignored as Django would just decide to use the IsNull
lookup off the bat.
Solution seems like we just need to rearrange the order here a little. Happy to give it a go, but if anyone has context as to whether it's maybe this way by design, do tell. The above cases would at the very least serve as good failing tests if we think this kind of custom field support is appropriate.
Change History (14)
comment:1 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 8 years ago
Here's a failing testcase for now; will have a crack at writing a patch soon..
Also realised it gets even messier for the .exclude()
case, where it adds an extra query.
This causes a mutually exclusive pair of conditions, that'll always evaluate to false.
It does this:
Foo.objects.exclude(bar='')
-> ... WHERE NOT ("foo"."bar" = NULL AND "foo"."bar" IS NOT NULL)
Instead of just
Foo.objects.exclude(bar=None)
-> ... WHERE NOT ("foo"."bar" IS NULL)
comment:3 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 8 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
Yeahh, nope.
The current workflow is too convoluted to just slot in a patch for this.
I couldn't find a way to untangle the ordering such that it'd cover this use case without breaking multiple other use cases.
I'd suggest taking this ticket under consideration when someone comes round to refactor this file.
Specifically, the functions build_filter
and prepare_lookup_value
need to be broken down into more managable bits. It's evident they've had extra conditional steps shoehorned in over time, which has made them very difficult to work with.
For anyone who comes across this in the future, all we need to address this case is ensure that this snippet of code from prepare_lookup_value
:
# Interpret '__exact=None' as the sql 'is NULL'; otherwise, reject all # uses of None as a query value. if value is None: if lookups[-1] not in ('exact', 'iexact'): raise ValueError("Cannot use None as a query value") return True, ['isnull'], used_joins
is evaluated AFTER the field's get_prep_value
has processed the value.
Here are the failing tests again: Failing tests
And here's a bandaid workaround that hacks two lookups and the form field to get around the problem.
class NulledCharField(models.CharField): """ This is required when we want uniqueness on a CharField whilst also allowing it to be empty. In the DB, '' == '', but NULL != NULL. So we store emptystring as NULL in the db, but treat it as emptystring in the code. """ description = _("CharField that stores emptystring as NULL but returns ''.") def from_db_value(self, value, expression, connection, context): if value is None: return '' else: return value def to_python(self, value): if isinstance(value, models.CharField): return value if value is None: return '' return value def get_prep_value(self, value): if value is '': return None else: return value class NulledCharField(forms.CharField): """ A form field for the encompassing model field. """ def clean(self, value): if value == '': return None return value def formfield(self, form_class=None, **kwargs): return super().formfield(form_class=self.__class__.NulledCharField, **kwargs) @NulledCharField.register_lookup class NulledCharExactLookup(Exact): """ Generate ISNULL in the `exact` lookup, because Django won't use the `isnull` lookup correctly. """ lookup_name = 'exact' def as_sql(self, compiler, connection): sql, params = compiler.compile(self.lhs) if self.rhs in ('', None): return "%s IS NULL" % sql, params else: return super().as_sql(compiler, connection) @NulledCharField.register_lookup class NulledCharIsNullLookup(IsNull): """ Doing something like Foo.objects.exclude(bar='') has django trying to shove an extra isnull check where it doesn't need one. We solve this by just crippling this lookup altogether. We don't need it as all ISNULL checks that we actually do want, we generate in the `exact` lookup. :( """ lookup_name = 'isnull' prepare_rhs = False def as_sql(self, compiler, connection): sql, params = compiler.compile(self.lhs) return "TRUE", params
comment:5 by , 7 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:10 by , 7 years ago
This change caused a regression for my project. Here is a simplified example:
class Group(models.Model): pass class User(models.Model): groups = models.ManyToManyField('Group') class MyTest(TestCase): def test_unsaved(self): Group.objects.create() u = User() self.assertFalse(Group.objects.filter(user=u).exists())
This test passes on 1.11 but fails on 2.0.
On 1.11, passing an unsaved model to a M2M filter would result in an INNER JOIN
. A bisect shows that since commit a5c60404476124c682c996bfb1eb077d59f1ec53, it now results in a LEFT OUTER JOIN
.
At this stage, only looking for clarification on what is correct to see if this change in behavior is intentional. Is this now correct as a LEFT JOIN
or should this be considered a regression for this specific use?
comment:11 by , 7 years ago
Cc: | added |
---|
John, I feel like using a LOUTER
join is the appropriate behavior here.
I assume the previous query was
SELECT * FROM group INNER JOIN user_groups ON (user_groups.group_id = group.id) WHERE user_groups.user_id = NULL
And the new one is
SELECT * FROM group LEFT OUTER JOIN user_groups ON (user_groups.group_id = group.id) WHERE user_groups.user_id IS NULL
Using m2m__isnull=True
(or reverse_fk__isnull=True
) always triggered this LOUTER
AFAIK. I guess what broke here was to make m2m=None
-> m2m__isnull=True
which seems more appropriate given how SQL deals with = NULL
.
I guess we could get related and reverse related field __exact
lookups to warn/error out when an unsaved object is passed as a value though, it looks like this was undefined behavior if it wasn't caught by the test suite. An other option would be to make the related __exact=unsaved_object
lookup raise EmptyResultSet
on compilation to prevent any query from taking place and restore the 1.11 behavior.
comment:12 by , 7 years ago
Here's a PR implementing the EmptyResultSet
approach that could be used in a 2.0 backport.
https://github.com/django/django/compare/master...charettes:ticket-27985
I really think we should deprecating this path in 2.1+ though given the ambiguity of the lookup when instance._state.adding
, I feel like this would be coherent with the checks preventing instance.m2m.add(unsaved_instance)
.
comment:13 by , 7 years ago
Thanks Simon.
I assume the previous query was ...
Yes, that is right. And looking at this again, now, I think the =
in = NULL
was certainly wrong or unintended. So it would seem I was relying on undefined behavior and got lucky.
Here's a PR implementing the EmptyResultSet approach that could be used in a 2.0 backport. ... I really think we should deprecating this path in 2.1+ though
Please don't feel the need to add a workaround or hack for my project. Now that it is identified, I'm happy to fix my buggy code to rely only on defined behavior.
Or are you suggesting that in a future version of Django, RelatedExact
raise an exception when used with an unsaved object?
comment:14 by , 7 years ago
Please don't feel the need to add a workaround or hack for my project. Now that it is identified, I'm happy to fix my buggy code to rely only on defined behavior.
Or are you suggesting that in a future version of Django,
RelatedExact
raise an exception when used with an unsaved object?
While it was undefined I'm kind of shocked we didn't have any tests for this. I'm not sure if you added this test on purpose in your suite but I could see it happening often in cases where users mix model factories with the ORM in tests.
I feel like preventing unsaved objects from being used in all related filters in future versions of Django would be the best course of action here. Surprisingly erroring on if value._state.adding
in get_normalized_values yields quite a few test failures in the suite. Nothing that looks unsolvable though.
I haven't investigated, but please provide a patch and I'll take a closer look at it.