Opened 5 years ago

Last modified 5 years ago

#31295 new Cleanup/optimization

Avoid Select widget triggering additional query when using ModelChoiceIterator.

Reported by: Aurélien Pardon Owned by: nobody
Component: Forms Version: 2.2
Severity: Normal Keywords: Model
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Aurélien Pardon)

ModelChoiceField use ModelChoiceIterator for its queryset/self.choices, which use .iterator() and doesn't cache the query under some conditions.

If the field is required, the method use_required_attribute (https://github.com/django/django/blob/master/django/forms/widgets.py#L689) fetch the first choice, making a duplicate query to the database (worse than a useless query, the data may have changed):

class Select(ChoiceWidget):
    [...]

    def use_required_attribute(self, initial):
        [...]
        first_choice = next(iter(self.choices), None)

Disabling the use of .iterator() (by adding an arbitrary .prefetch_related for example) leads to no duplicate queries.
https://github.com/django/django/blob/da79ee472d803963dc3ea81ee67767dc06068aac/django/forms/models.py#L1152 :

class ModelChoiceIterator:
   [...]

    def __iter__(self):
        [...]
        # Can't use iterator() when queryset uses prefetch_related()
        if not queryset._prefetch_related_lookups:
            queryset = queryset.iterator()

One solution would be to add another test to the previous piece of code :

        if not queryset._prefetch_related_lookups and not self.field.required:
            queryset = queryset.iterator()

Change History (13)

comment:1 by Aurélien Pardon, 5 years ago

Description: modified (diff)

comment:2 by Carlton Gibson, 5 years ago

Resolution: wontfix
Status: newclosed

Hi Aurélien,

Using the iterator was a deliberate design choice. (Placing your suggestion in causes tests failure, including test_choices_freshness in line with this.)

The use to point to in Select.use_required_attribute() shouldn't be problematic, since `self.choices` is fully evaluated in `ChoiceWidget.__init__()`, as per the comment there, precisely to allow repeated iteration.

I think the underlying issue here is probably #22841.

comment:3 by Aurélien Pardon, 5 years ago

Resolution: wontfix
Status: closednew
Type: Cleanup/optimizationBug

Hi Carlton and thanks for your answer but, all due respect, I think you overlooked this bug.

First, this simple example on an empty project/database that I just tested (Python 3.8.1, Django 2.2.10):

class MyModel(models.Model):
    my_field = models.CharField(max_length=1)

class MyForm(forms.Form):
    my_form_field = forms.ModelChoiceField(MyModel.objects.all(), , empty_label=None)

my_form = MyForm()
type(my_form.widget.choices) 
# >>> return <class 'django.forms.models.ModelChoiceIterator'>

my_form.as_ul()
len(connecton.queries)
# >>> return 2 (and it's two times "SELECT * from my_app.my_model")

Here is what I understand:
1) The code you sent, where ChoiceWidget.choices is evaluated and saved into a list is executed at the initialization of the widget (which is executed at the initialization of the field at the declaration of the form).
2) But! a ModelChoiceField override self.choices here (it makes perfect sense: the choices have to be evaluated everytime the form is instanciated).
3) As the choices are bundled in a ModelChoiceIterator/.iterator(), when rendering the form, there is a test about the required attribute that try to fetch the first value of the queryset:

class Select(ChoiceWidget):
    """[...]"""
    def use_required_attribute(self, initial):
        """[...]"""
        first_choice = next(iter(self.choices), None)

4) ModelChoiceIterator.__iter__ is called. If empty_label is not None, everything is fine, as the next() grabs the empty label, without additional request to the database. But if empty_label is None (and if there is no prefetch to the queryset), an additional request is made (code is changed lightly to illustrate the bug):

    def __iter__(self):
        if self.field.empty_label is not None:
            yield ("", self.field.empty_label)
        for obj in self.queryset.iterator():
            yield self.choice(obj)

Now that I have made some tests on an empty project/database, I'm confident enough to state that it's a bug because there is a useless and additional request to the DB.

Thanks for all the work you put into the projects in and around Django. I hope I tested correctly, I do not want to waste your time.

Version 0, edited 5 years ago by Aurélien Pardon (next)

comment:4 by Carlton Gibson, 5 years ago

Resolution: needsinfo
Status: newclosed
Type: BugCleanup/optimization

Hi Aurélien,

Thanks for the follow-up. Really not a problem. Better to measure twice, cut once and all that. :)

The iterator behaviour — refetching each time — was deliberate, and is correct. I can't see an adjustment there that doesn't entail a fix to #22841.
(Duplicate then...)

Maybe you don't want that behaviour and a ModelChoiceIterator subclass would work for you? But as soon as I think that, better would be a Select widget subclass.

#27370 fixed the Select.use_required_attribute() behaviour. It necessarily needs the first item for the _choice_has_empty_value check. So, give the design of ModelChoiceIterator, fetching it isn't a Bug. (Yes there's an extra query but correctness trumps performance.)

Maybe there's a way of performing that Select.use_required_attribute() check without fetching the first item...? Would that be better than using a subclass for this kind of case? (If I know my dataset I can just return False from use_required_attribute().)

I guess we could look at a patch there. I'm going to call this needsinfo pending such though. I hope that makes sense.

in reply to:  4 comment:5 by Aurélien Pardon, 5 years ago

Replying to Carlton Gibson:

(Yes there's an extra query but correctness trumps performance.)

How is that correct if the queries yield different results ?
It only works if the two requests are inside a transaction, for example when ATOMIC_REQUESTS is set to True (which is not the default).

For example, imagine this model (the primary key is important, it's the value that the choice will use):

class MyModel(models.Model):
    my_field = models.CharField(max_length=1, primary_key=True)

If the first query (for the actual choices) returns ['a', 'b'], but the second one (for the required attribute) returns ['', 'b', 'c'], then the required attribute will be used leading to invalid HTML.

comment:6 by Carlton Gibson, 5 years ago

How is that correct if the queries yield different results ?

I don't know what else I can say. It's part of the design of ModelChoiceField that it re-evaluates it's queryset each time it's iterated.
See #3534 for example, 13 years ago:

ModelChoiceField and ModelMultipleChoiceField cache the output of their queryset the first time self.choices is accessed. This is bad for long-running processes, such as mod_python, because the cache gets stale. Plus, it's bad saving all of those choices in memory. The attached unit tests illustrate the problem.

The exact same memory usage considerations were the motivation for the move to iterator() in #23623. I can't see that being removed.

If the kind of race conditions you're talking about are a factor for your app then, at least until #22841 is addressed, maybe you do need a ModelChoiceIterator subclass for your field. (Always use the QuerySet rather than the iterator...)

in reply to:  6 comment:7 by Aurélien Pardon, 5 years ago

Replying to Carlton Gibson:

I don't know what else I can say. It's part of the design of ModelChoiceField that it re-evaluates it's queryset each time it's iterated.

ModelChoiceField iterates over his ModelChoiceIterator when rendering its widget (and its options).
ModelChoiceIterator does not necessarily re-evaluates it's queryset each time it's iterated: sometimes the query is cached (for example when the query involves a prefetch_related), sometimes it is not cached.

The reason ModelChoiceIterator use, sometimes, .iterator() is for performance (less memory usage, see #3534) and because, at the times, the queryset was evaluated/fetched only once when the the field was rendered.
After Django's Select widget adds a required="required" attribute, even if created with empty_label=True, the queryset is evaluated/fetched twice because of the new .use_required_attribute method.

See #3534 for example, 13 years ago:

ModelChoiceField and ModelMultipleChoiceField cache the output of their queryset the first time self.choices is accessed. This is bad for long-running processes, such as mod_python, because the cache gets stale. Plus, it's bad saving all of those choices in memory. The attached unit tests illustrate the problem.

The exact same memory usage considerations were the motivation for the move to iterator() in #23623. I can't see that being removed.

You said that "correctness trumps performance" but since https://code.djangoproject.com/ticket/27370, the usage of .iterator() in ModelChoiceIterator with the new .use_required_attribute when rendering the form leads to invalid HTML. How is this not a bug?
Moreover, how duplicate requests to the database (the second one is useless and leads to bug) do not fall under the same performance considerations of #3534?

I think the solution is to decide to use the required attribute after having rendered the field options: while iterating the choices/options, we can check if the first option allows us to use the required attribute.

comment:8 by Carlton Gibson, 5 years ago

If you can provide a test case demonstrating invalid HTML then that we be a big we could look at.

comment:9 by Aurélien Pardon, 5 years ago

The fact that the following simple code makes two times the same request to the database :

class MyModel(models.Model):
    my_field = models.CharField(max_length=1)

class MyForm(forms.Form):
    my_form_field = forms.ModelChoiceField(MyModel.objects.all(), empty_label=None)

my_form.as_ul()

is absolutely not working as intended. Show this to any other core dev and ask them, please.

This whole "invalid HTML code" is not an important issue. It happens yes, but under very rare circumstances (a blank primary key and an INSERT between the two identical requests to the database), and it also won't bother any browser. I can work hard and show you the invalid HTML code (using database trigger to emulate race conditions?) but honestly it's not the problem.

Last edited 5 years ago by Aurélien Pardon (previous) (diff)

comment:10 by Carlton Gibson, 5 years ago

Summary: required ModelChoiceField makes duplicate (cursor) queries to the databaseAvoid Select widget triggering additional query in ModelChoiceIterator.
Triage Stage: UnreviewedAccepted

Hi Aurélien,

Maybe there's a way of performing that Select.use_required_attribute() check without fetching the first item...?

It strikes me that we might be able to do this. This patch passes the existing tests:

diff --git a/django/forms/widgets.py b/django/forms/widgets.py
index 40ac1d3162..5bc040065a 100644
--- a/django/forms/widgets.py
+++ b/django/forms/widgets.py
@@ -696,7 +696,16 @@ class Select(ChoiceWidget):
         if self.allow_multiple_selected:
             return use_required_attribute
 
-        first_choice = next(iter(self.choices), None)
+        iterator = iter(self.choices)
+        # Avoid an extra query if we have a ModelChoiceIterator.
+        try:
+            empty_label = iterator.field.empty_label
+        except AttributeError:
+            pass
+        else:
+            if empty_label is not None:
+                return use_required_attribute
+        first_choice = next(iterator, None)
         return use_required_attribute and first_choice is not None and self._choice_has_empty_value(first_choice)

So with a further test asserting number of queries, a think about any edge cases--in particular where empty_label is None and when the queryset is empty--and a tidy up, I think we can evaluate a patch on that basis.

Thanks for your input!

Last edited 5 years ago by Carlton Gibson (previous) (diff)

comment:11 by Carlton Gibson, 5 years ago

Resolution: needsinfo
Status: closednew

comment:12 by Carlton Gibson, 5 years ago

Summary: Avoid Select widget triggering additional query in ModelChoiceIterator.Avoid Select widget triggering additional query when using ModelChoiceIterator.

comment:13 by Aurélien Pardon, 5 years ago

Hi Carlton,

Thanks for your answer. I'm afraid the code you proposed won't work, I just tested it. I will try to think about a solution.

The difficulty is that when rendering the widget, we need before rendering the options (and thus iterate over the ModelChoiceFieldIterator/fetch the data from the DB) if the HTML attribute required should be used.
And we also don't want to cache the query, for memory usage performance.

I think of two different solutions :

  • defer the presence of the required attribute after the iteration for the <options>. During this iteration, we store the fact that the first option allows or not the required attribute. After the rendering of the options, we put afterwards (or not) the required attribute. Maybe with a placeholder replaced by a regexp? I find this quite ugly.
  • Maybe the ModelChoiceIterator may cache it's first value? It should not take a lot of memory. I don't think it will work because I assume that, for efficiency, multiple rows are fetched from the database, if you want to cache the first value for an ulterior iteration, you also have to cache the rest of the rows until <options> are iterated.
  • ... or choose the lesser evil between "invalid HTML", "double query to the database" and "high memory usage".
Last edited 5 years ago by Aurélien Pardon (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top