Opened 5 years ago
Last modified 5 years ago
#31295 new Cleanup/optimization
Avoid Select widget triggering additional query when using ModelChoiceIterator.
Reported by: | Aurélien Pardon | Owned by: | nobody |
---|---|---|---|
Component: | Forms | Version: | 2.2 |
Severity: | Normal | Keywords: | Model |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
ModelChoiceField use ModelChoiceIterator for its queryset
/self.choices
, which use .iterator()
and doesn't cache the query under some conditions.
If the field is required, the method use_required_attribute (https://github.com/django/django/blob/master/django/forms/widgets.py#L689) fetch the first choice, making a duplicate query to the database (worse than a useless query, the data may have changed):
class Select(ChoiceWidget): [...] def use_required_attribute(self, initial): [...] first_choice = next(iter(self.choices), None)
Disabling the use of .iterator()
(by adding an arbitrary .prefetch_related
for example) leads to no duplicate queries.
https://github.com/django/django/blob/da79ee472d803963dc3ea81ee67767dc06068aac/django/forms/models.py#L1152 :
class ModelChoiceIterator: [...] def __iter__(self): [...] # Can't use iterator() when queryset uses prefetch_related() if not queryset._prefetch_related_lookups: queryset = queryset.iterator()
One solution would be to add another test to the previous piece of code :
if not queryset._prefetch_related_lookups and not self.field.required: queryset = queryset.iterator()
Change History (13)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|
comment:2 by , 5 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:3 by , 5 years ago
Resolution: | wontfix |
---|---|
Status: | closed → new |
Type: | Cleanup/optimization → Bug |
Hi Carlton and thanks for your answer but, all due respect, I think you overlooked this bug.
First, this simple example on an empty project/database that I just tested (Python 3.8.1, Django 2.2.10):
class MyModel(models.Model): my_field = models.CharField(max_length=1) class MyForm(forms.Form): my_form_field = forms.ModelChoiceField(MyModel.objects.all(), empty_label=None) my_form = MyForm() type(my_form.widget.choices) # >>> return <class 'django.forms.models.ModelChoiceIterator'> and not <list> my_form.as_ul() len(connecton.queries) # >>> return 2 (and it's two times "SELECT * from my_app.my_model")
Here is what I understand:
- The code you sent, where
ChoiceWidget.choices
is evaluated and saved into alist
, is executed at the initialization of the widget (which is executed at the initialization of the field at the declaration of the form). - But! a
ModelChoiceField
overrideself.choices
here (it makes sense: the choices have to be queried from the database everytime the form is instanciated). - When rendering the form, there is a test about the
required
attribute (for valid HTML rendering from the comment) that try to fetch the first value of the queryset:class Select(ChoiceWidget): """[...]""" def use_required_attribute(self, initial): """[...]""" first_choice = next(iter(self.choices), None)
ModelChoiceIterator.__iter__
is called. Ifempty_label
is notNone
, everything is fine, as thenext()
grabs the empty label, without additional request to the database. But ifempty_label
isNone
(and if there is no prefetch to the queryset), an additional request is made (code is changed lightly to illustrate the bug) because the queryset was not cached (due to.iterator()
):def __iter__(self): if self.field.empty_label is not None: yield ("", self.field.empty_label) for obj in self.queryset.iterator(): yield self.choice(obj)
Now that I have made some tests on an empty project/database, I'm confident enough to state that it's a bug because there is a useless and additional request to the DB.
Thanks for all the work you put into the projects in and around Django. I hope I tested correctly, I do not want to waste your time.
follow-up: 5 comment:4 by , 5 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Type: | Bug → Cleanup/optimization |
Hi Aurélien,
Thanks for the follow-up. Really not a problem. Better to measure twice, cut once and all that. :)
The iterator behaviour — refetching each time — was deliberate, and is correct. I can't see an adjustment there that doesn't entail a fix to #22841.
(Duplicate then...)
Maybe you don't want that behaviour and a ModelChoiceIterator subclass would work for you? But as soon as I think that, better would be a Select widget subclass.
#27370 fixed the Select.use_required_attribute()
behaviour. It necessarily needs the first item for the _choice_has_empty_value
check. So, give the design of ModelChoiceIterator, fetching it isn't a Bug. (Yes there's an extra query but correctness trumps performance.)
Maybe there's a way of performing that Select.use_required_attribute()
check without fetching the first item...? Would that be better than using a subclass for this kind of case? (If I know my dataset I can just return False
from use_required_attribute()
.)
I guess we could look at a patch there. I'm going to call this needsinfo pending such though. I hope that makes sense.
comment:5 by , 5 years ago
Replying to Carlton Gibson:
(Yes there's an extra query but correctness trumps performance.)
How is that correct if the queries yield different results ?
It only works if the two requests are inside a transaction, for example when ATOMIC_REQUESTS
is set to True
(which is not the default).
For example, imagine this model (the primary key is important, it's the value that the choice will use):
class MyModel(models.Model): my_field = models.CharField(max_length=1, primary_key=True)
If the first query (for the actual choices) returns ['a', 'b']
, but the second one (for the required attribute) returns ['', 'b', 'c']
, then the required attribute will be used leading to invalid HTML.
follow-up: 7 comment:6 by , 5 years ago
How is that correct if the queries yield different results ?
I don't know what else I can say. It's part of the design of ModelChoiceField that it re-evaluates it's queryset each time it's iterated.
See #3534 for example, 13 years ago:
ModelChoiceField and ModelMultipleChoiceField cache the output of their queryset the first time self.choices is accessed. This is bad for long-running processes, such as mod_python, because the cache gets stale. Plus, it's bad saving all of those choices in memory. The attached unit tests illustrate the problem.
The exact same memory usage considerations were the motivation for the move to iterator() in #23623. I can't see that being removed.
If the kind of race conditions you're talking about are a factor for your app then, at least until #22841 is addressed, maybe you do need a ModelChoiceIterator subclass for your field. (Always use the QuerySet rather than the iterator...)
comment:7 by , 5 years ago
Replying to Carlton Gibson:
I don't know what else I can say. It's part of the design of ModelChoiceField that it re-evaluates it's queryset each time it's iterated.
ModelChoiceField
iterates over his ModelChoiceIterator
when rendering its widget (and its options).
ModelChoiceIterator
does not necessarily re-evaluates it's queryset each time it's iterated: sometimes the query is cached (for example when the query involves a prefetch_related
), sometimes it is not cached.
The reason ModelChoiceIterator
use, sometimes, .iterator()
is for performance (less memory usage, see #3534) and because, at the times, the queryset was evaluated/fetched only once when the the field was rendered.
After Django's Select widget adds a required="required" attribute, even if created with empty_label=True, the queryset is evaluated/fetched twice because of the new .use_required_attribute
method.
See #3534 for example, 13 years ago:
ModelChoiceField and ModelMultipleChoiceField cache the output of their queryset the first time self.choices is accessed. This is bad for long-running processes, such as mod_python, because the cache gets stale. Plus, it's bad saving all of those choices in memory. The attached unit tests illustrate the problem.
The exact same memory usage considerations were the motivation for the move to iterator() in #23623. I can't see that being removed.
You said that "correctness trumps performance" but since https://code.djangoproject.com/ticket/27370, the usage of .iterator()
in ModelChoiceIterator with the new .use_required_attribute
when rendering the form leads to invalid HTML. How is this not a bug?
Moreover, how duplicate requests to the database (the second one is useless and leads to bug) do not fall under the same performance considerations of #3534?
I think the solution is to decide to use the required attribute after having rendered the field options: while iterating the choices/options, we can check if the first option allows us to use the required attribute.
comment:8 by , 5 years ago
If you can provide a test case demonstrating invalid HTML then that we be a big we could look at.
comment:9 by , 5 years ago
The fact that the following simple code makes two times the same request to the database :
class MyModel(models.Model): my_field = models.CharField(max_length=1) class MyForm(forms.Form): my_form_field = forms.ModelChoiceField(MyModel.objects.all(), empty_label=None) my_form.as_ul()
is absolutely not working as intended. Show this to any other core dev and ask them, please.
This whole "invalid HTML code" is not an important issue. It happens yes, but under very rare circumstances (a blank primary key and an INSERT
between the two identical requests to the database), and it also won't bother any browser. I can work hard and show you the invalid HTML code (using database trigger to emulate race conditions?) but honestly it's not the problem.
comment:10 by , 5 years ago
Summary: | required ModelChoiceField makes duplicate (cursor) queries to the database → Avoid Select widget triggering additional query in ModelChoiceIterator. |
---|---|
Triage Stage: | Unreviewed → Accepted |
Hi Aurélien,
Maybe there's a way of performing that Select.use_required_attribute() check without fetching the first item...?
It strikes me that we might be able to do this. This patch passes the existing tests:
diff --git a/django/forms/widgets.py b/django/forms/widgets.py index 40ac1d3162..5bc040065a 100644 --- a/django/forms/widgets.py +++ b/django/forms/widgets.py @@ -696,7 +696,16 @@ class Select(ChoiceWidget): if self.allow_multiple_selected: return use_required_attribute - first_choice = next(iter(self.choices), None) + iterator = iter(self.choices) + # Avoid an extra query if we have a ModelChoiceIterator. + try: + empty_label = iterator.field.empty_label + except AttributeError: + pass + else: + if empty_label is not None: + return use_required_attribute + first_choice = next(iterator, None) return use_required_attribute and first_choice is not None and self._choice_has_empty_value(first_choice)
So with a further test asserting number of queries, a think about any edge cases, and a tidy up, I think we can evaluate a patch on that basis.
Thanks for your input!
comment:11 by , 5 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
comment:12 by , 5 years ago
Summary: | Avoid Select widget triggering additional query in ModelChoiceIterator. → Avoid Select widget triggering additional query when using ModelChoiceIterator. |
---|
comment:13 by , 5 years ago
Hi Carlton,
Thanks for your answer. I'm afraid the code you proposed won't work, I just tested it. I will try to think about a solution.
The difficulty is that when rendering the widget, we need before rendering the options (and thus iterate over the ModelChoiceFieldIterator/fetch the data from the DB) if the HTML attribute required
should be used.
And we also don't want to cache the query, for memory usage performance.
I think of two different solutions :
- defer the presence of the
required
attribute after the iteration for the <options>. During this iteration, we store the fact that the first option allows or not therequired
attribute. After the rendering of the options, we put afterwards (or not) therequired
attribute. Maybe with a placeholder replaced by a regexp? I find this quite ugly. - Maybe the ModelChoiceIterator may cache it's first value? It should not take a lot of memory. I don't think it will work because I assume that, for efficiency, multiple rows are fetched from the database, if you want to cache the first value for an ulterior iteration, you also have to cache the rest of the rows until <options> are iterated.
- ... or choose the lesser evil between "invalid HTML", "double query to the database" and "high memory usage".
Hi Aurélien,
Using the iterator was a deliberate design choice. (Placing your suggestion in causes tests failure, including
test_choices_freshness
in line with this.)The use to point to in
Select.use_required_attribute()
shouldn't be problematic, since `self.choices` is fully evaluated in `ChoiceWidget.__init__()`, as per the comment there, precisely to allow repeated iteration.I think the underlying issue here is probably #22841.