Opened 11 years ago

Closed 10 years ago

Last modified 9 years ago

#22046 closed Bug (wontfix)

unhelpful queryset handling for model formsets with data

Reported by: Jim Bailey Owned by:
Component: Forms Version: 1.6
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When a model formset is constructed with data there are two related issues with the queryset handling.

The first is a performance issue - if a queryset is not specified then the _object_dict cache gets filled with every model instance in the database even if the data only deals with a subset of the instances.

The second problem can happen when a queryset is specified, and can lead to unexpected duplication of records in the database. If the specified queryset does not cover all the instances specified in the data, then the missmatched instances are assumed to be new, their pk is ignored, and they are written as new instances on save.

For example, a user has their pagination set to 2 items per page. There are the following Authors in the system:
(1, 'a')
(2, 'c')
(3, 'e')
(4, 'g')
They load the second page of authors sorted by name so the formset has details for (3, 'e') and (4, 'g').
Another user then adds another author (5, 'b').
The first user then saves the page without changes. The model formset is contructed with the data (3, 'e') and (4, 'g'), but also a queryset for the second page of authors sorted by name, which now contains (2, 'c') and (3, 'e').
The records with pk 3 is found and updated as expected. The record with pk 4 is not found in the queryset, so a new
Author instance is created: (6, 'g').
There are now two authors with the name 'g'.

I think the solution to both issues is the same. When a model formset is constructed with data the only records that should be loaded from the database and operated on are those specified by the pks in the data. The queryset used to pull in that data can and should be constructed automatically from the data. Any queryset specified in the constructor should be ignored.

Change History (10)

comment:1 by Jim Bailey, 11 years ago

Has patch: set

I have added unit tests and a patch on this branch:
https://github.com/dgym/django/tree/ticket_22046

I have also created a pull request here:
https://github.com/django/django/pull/2277

comment:2 by hershbergien, 11 years ago

Owner: changed from nobody to hershbergien
Status: newassigned

comment:3 by hershbergien, 11 years ago

Easy pickings: unset
Needs tests: set
Patch needs improvement: set

The pull request does not address the second problem described in this ticket. The pull request needs more work and testing. See the attached test, which reproduces the second problem as described.

    def test_model_formset_with_paginated_form_data(self):
        # saving a formset that contains a subset of the view queryset should
        # not overwrite existing data with different pks.
        # see #22046

        Author.objects.create(pk=1, name='a')
        Author.objects.create(pk=2, name='c')
        Author.objects.create(pk=3, name='e')
        Author.objects.create(pk=4, name='g')

        data = {
            'form-TOTAL_FORMS': 2,
            'form-INITIAL_FORMS': 2,
            'form-MAX_NUM_FORMS': 2,
            'form-0-id': '3',
            'form-0-name': 'c',
            'form-1-id': '4',
            'form-1-name': 'g',
        }

        FormSet = modelformset_factory(Author, fields=('name', ))
        formset = FormSet(data=data, queryset=Author.objects.order_by('name'))

        # adding another author before saving the form exposes the issue
        Author.objects.create(pk=5, name='b')

        formset.save()

        # expect only one author named `c'
        self.assertEqual(1, Author.objects.filter(name='c').count())

comment:4 by Jim Bailey, 11 years ago

Hello, thank you very much for taking a look.

I do not agree with the expectation stated in test_model_formset_with_paginated_form_data(). What I would expect is:

Before the form is saved:

pk | name
---+-----
 1 | a
 2 | c
 3 | e
 4 | g
 5 | b

The form data specifies that the record with id/pk 3 should be updated to have name 'c' and the record with id/pk 4 should be updated to have name 'g', so after the form is saved:

pk | name
---+-----
 1 | a
 2 | c
 3 | c
 4 | g
 5 | b

So there should now be two authors named 'c'.

I don't think this new test really touches on any of the issues raised by this ticket as the specified queryset does contain all the records present in the form. The order is irrelevant. This is why this test fails both with and without my patch, and if it is corrected (there actually should be two 'c' authors) then it passes both with and without my patch.

I feel that test_model_formset_with_mismatched_queryset() adequately tests problem 2 - that the correct record is updated (instead of a new one being created) despite the specified queryset not covering the form's record.

comment:5 by Tim Graham, 11 years ago

Triage Stage: UnreviewedAccepted

comment:6 by hershbergien, 11 years ago

Jim,

After taking a second look I concur that test_model_formset_with_mismatched_queryset() addresses problem 2. While manually trying to reproduce the problem I thought I had an example that showed it still was not fixed. In light of my flawed test case I'm left believing I made the same logical error. My apologies.

Last edited 11 years ago by hershbergien (previous) (diff)

comment:7 by hershbergien, 11 years ago

Owner: hershbergien removed
Status: assignednew

comment:8 by Tim Graham, 10 years ago

Resolution: wontfix
Status: newclosed

While the security model for formsets is currently somewhat ill-defined, I don't think a security model of "if you expose a formset for a model class, every instance of that model class is available for editing by anyone with access to POST to that formset" is what we want.

comment:9 by Luke Plant, 9 years ago

@timgraham

I accept your last comment, but I don't see how that results in a WONTFIX for this bug. As far as I can see, it's a real bug - when you save a model formset, and the underlying queryset returns different rows from what it did previously (which can happen on various insertions/deletions/changes), then you are going to get very unexpected behaviour.

AFAICS, the logic of BaseModelFormSet._existing_object is only correct if the queryset passed in to the formset contains all applicable records both times.

There is also the security concern that it is currently very easy to get this wrong. If you forget to pass in the queryset argument to the formset when you are POSTing, then you will have a major security issue, because the user can specify any ID they like and update the record.

comment:10 by Tim Graham, 9 years ago

The fragility of the current approach of matching data to instances in the database seems to be tracked in #9076, #15574, and possibly others. I guess we could reclassify this as a duplicate if you feel that's better (wontfix seems to be mainly about the approach proposed here) or clarify how this ticket is different from those others.

To address the security concern you brought up, I imagine would could start a deprecation path toward requiring queryset to be specified, similar to how we now require ModelForm's Meta.fields or exclude to be specified. Shall we open a ticket for that?

Note: See TracTickets for help on using tickets.
Back to Top