Opened 13 years ago

Closed 11 years ago

Last modified 11 years ago

#16855 closed Bug (fixed)

select_related doesn't chain as expected

Reported by: Leo Shklovskii Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: jdunck@…, erik.wognsen@…, seldon, Trey Hunner, hrehfeld@…, tomek@…, marc.tamlyn@…, mike@…, i.virabyan@…, real.human@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If I'm doing something like:

Foo.objects.select_related('fieldone').select_related('fieldtwo')

I would expect the select related to function on both fieldone and fieldtwo. What actually happens is that the select_related only works on fieldtwo. This is a problem for compositing queries at runtime (for example in search filters) and is a departure from how I would expect queryset methods to work.

If the design decision is to keep it this way, there should be a note in the docs pointing this out.

Change History (39)

comment:1 by Carl Meyer, 13 years ago

Triage Stage: UnreviewedAccepted

Agreed.

comment:2 by Carl Meyer, 13 years ago

Jeremy Dunck observes in this django-developers thread that in the current implementation the "depth" setting is apparently sticky, while fields are not, and gives a more detailed outline of the desired behavior of depth in combination with multiple calls to select_related.

comment:3 by Leo Shklovskii, 13 years ago

Thanks Carl, that approach makes a lot of sense to me and would solve my use case. At the moment I have to go dig at the 'private' data structure that's holding the depth and list of fields to act on in the queryset object to get what I need. It sort of works but is a really fragile hack. If this behavior could get fixed in an upcoming 1.3.x release that would be fantastic.

comment:4 by Carl Meyer, 13 years ago

Hi Leo - thanks for confirming that the proposed fix would meet your use case! Unfortunately it's quite unlikely that this fix will make it into a 1.3.X release. In our current documented release process, the last stable branch receives only critical bugfixes (security, data-loss, or crashing bugs). This doesn't fit into any of those categories.

This could certainly still make it into 1.4, but that would require someone to put together a patch in the near future.

comment:5 by Jeremy Dunck, 13 years ago

Cc: jdunck@… added

comment:6 by Erik Wognsen, 13 years ago

Cc: erik.wognsen@… added

I have a problem where using select_related with field arguments in the queryset method of my ModelAdmin doesn't work.

Could this bug also be the cause of that problem because select_related is applied again later by the admin, clearing my field choices?

comment:7 by seldon, 13 years ago

Cc: seldon added

comment:8 by Trey Hunner, 13 years ago

Cc: Trey Hunner added

comment:9 by hrehfeld@…, 13 years ago

By no means I am an experienced Django user, but this https://gist.github.com/2549147 seems to fix this issue on my single usecase. No further testing done. Can someone verify this?

comment:10 by hrehfeld@…, 13 years ago

Cc: hrehfeld@… added

comment:11 by Tomek Paczkowski, 12 years ago

Cc: tomek@… added

comment:12 by Tomek Paczkowski, 12 years ago

Has patch: set

comment:13 by Tomek Paczkowski, 12 years ago

Anssi Kääriäinen left some good comments on my pull request. Now I think there's need to extend this bug to consider following scenario:

# should get all immediate relations AND the Order
Species.objects.select_related(depth=1).select_related('genus__family__order') 

comment:14 by Tomek Paczkowski, 12 years ago

Patch needs improvement: set

comment:15 by Anssi Kääriäinen, 12 years ago

I am afraid that dealing with the case of select_related(depth=1).select_related('genus__family__order') will be a lot more complicated than the patch you have currently.

I would look into trying to turn the select related descents (in db/models/query.py:get_klass_info, db/models/sql/compiler.py:fill_related_selections) into two phase implementations. In the first phase, turn the depth into lookups, in the second phase just add the lookups, and remove the depth arguments from the current descent implementations. Of course, there are numerous ways forward here. If you spot some easier way, use it instead.

Last edited 12 years ago by Anssi Kääriäinen (previous) (diff)

comment:16 by Marc Tamlyn, 12 years ago

Cc: marc.tamlyn@… added

comment:17 by Mike Fogel, 12 years ago

Cc: mike@… added

comment:18 by Ivan Virabyan, 12 years ago

Cc: i.virabyan@… added

comment:19 by Luke Plant, 12 years ago

One solution might be to deprecate the 'depth' argument to select_related(). I never thought it was a good idea, and I don't think I've ever used it in practice, because it could so easily start doing the wrong thing if you change your schema. It also just seems wrong - if you are using select_related(), you are making specific fine-tunings to the SQL produced. To do that using a broad "just add joins to a depth = n" instruction seems bizarre.

If depth was deprecated, I don't think it would be too bad to have bugs related to it (since we've already got such bugs) until it is finally removed, and then we can avoid fixing the combination.

Also, while I'm here, we should make the API for clearing match prefetch_related() and defer() i.e. pass None as single argument to clear.

Last edited 12 years ago by Luke Plant (previous) (diff)

comment:20 by Marc Tamlyn, 12 years ago

+1 to deprecating depth.

comment:21 by Ivan Virabyan, 12 years ago

+1, never seen anyone using depth argument

comment:22 by Anssi Kääriäinen, 12 years ago

+1 from me, too. It is dangerous to use. In addition, the select_related code will be a little bit cleaner if depth was removed (not much, but even a little is something...)

comment:23 by Marc Tamlyn, 12 years ago

Proposed deprecation of depth to go into 1.5 release on django-dev list at https://groups.google.com/forum/?fromgroups=#!topic/django-developers/K3Fzqw9RRgg

comment:24 by Aymeric Augustin, 12 years ago

+1 for deprecating depth too.

comment:25 by Marc Tamlyn, 12 years ago

Given the overwhelmingly positive response from core devs to deprecating depth, I went ahead and made the PR: https://github.com/django/django/pull/488

comment:26 by Preston Holmes <preston@…>, 12 years ago

In 965cc0b1ffa27574e5684a18f9400577e5b4598d:

Deprecated depth kwarg on select_related.

This is the start of a deprecation path for the depth kwarg on
select_related. Removing this will allow us to update select_related so
it chains properly and have an API similar to prefetch_related.

Thanks to Marc Tamlyn for spearheading and initial patch.

refs #16855

comment:27 by Preston Holmes <preston@…>, 12 years ago

In 0131da06220c1bc7e25df9ea50219e480e5004b3:

[1.5.x] Deprecated depth kwarg on select_related.

This is the start of a deprecation path for the depth kwarg on
select_related. Removing this will allow us to update select_related so
it chains properly and have an API similar to prefetch_related.

Thanks to Marc Tamlyn for spearheading and initial patch.

refs #16855

comment:28 by Luke Plant, 12 years ago

If we are deprecating the depth argument, we need to remove our own usage of it, or deprecate APIs that use it. The admin uses it:

https://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.list_select_related

I think we should deprecate this entirely, and encourage people instead to implement ModelAdmin.get_queryset in order to add select_related or prefetch_related calls.

We also should fix the admin so that it does the right thing when reacting to list_display - currently it is pretty stupid:

https://github.com/django/django/blob/79dd751b0b9e428bf386d7304f442fa9815fdbba/django/contrib/admin/views/main.py#L343

We can either make it more intelligent, or remove this behaviour entirely, but the latter is more likely to give performance regressions.

comment:29 by Marc Tamlyn, 12 years ago

I agree it would be nice to tidy up the use of select_related in the admin - though we do need to make sure it remains intelligent, I know a lot of my sites depend on that performance gain.

To clarify though, as far as I can tell none of Django is broken by the removal of the depth kwarg - it's still acceptable at present (if inadvisable) to pass no arguments to select_related() and have Django auto-follow all of the FKs (i.e. depth=infinity). To remove that as well is more likely to be a major backwards incompatibility.

This does beg the question as to what the correct API should be for chaining and clearing - is it ok to have select_related() and select_related(None) do different things? (former would add all automatically, latter would clear list.)

comment:30 by Luke Plant, 12 years ago

Ah - I was actually thinking we should only have select_related(*fields), and remove the behaviour where no arguments means 'select everything'. I think that has the same problems as selecting to an arbitrary depth. I was actually thinking that the current implementation worked by having 'depth=5' as the default, so select_related() was the same as select_related(depth=5), and the former would be automatically deprecated along with the latter.

In fact, the depth = 5 idea is implemented at a lower level - in django/db/models/sql/query.py. So it still doesn't do infinite traversal AFAICS.

comment:31 by Anssi Kääriäinen, 12 years ago

How about changing list_select_related to accept a list of selections. That way you could pretty easily define the needed select related stuff.

...
    list_display = 'my_fk_field'
    list_select_related = ('my_fk_field', 'my_fk_field__more')

Raise deprecation warning for True in there.

get_queryset() override works, too, though the admin seems to prefer the declarative attribute style...

Intelligent select_related is somewhat hard to do correctly, although that depends on the definition of what intelligent select_related means...

I am in favour of getting rid of unlimited or depth=5 select_related() calls altogether.

comment:32 by Marc Tamlyn, 12 years ago

"Intelligent" select_related in the admin should just be a case of ensuring all the columns on the list view which are foreign keys are passed into a select_related call. I haven't looked at the practicalities of this, but I know that the statements needed are the same as those needed for ordering by that column, so we should be able to reuse some code (and the admin_order_field property for methods which create columns).

I like the new version of list_select_related.

There are a few uses of a blank select_related in the auth code, and a stack of them in the tests but these should just be a case of working out which fields need to change and updating them.

I've posted on the dev group again to clarify this change in deprecation - to me it's a bigger deal that just removing depth now and I'm probably only +0 on it.

comment:33 by Simon Charette, 12 years ago

Implementing such an intelligent detection would be hard; think of the list_display callbacks that may access foreign objects.

# models.py
class City(models.Model):
   name = models.CharField(max_length=30)

   def __unicode__(self):
       return self.name

class Street(models.Model):
    name = models.CharField(max_length=30)
    city = models.ForeignKey(City)

   def __unicode__(self):
       return "%s, %s" % (self.name, self.city)

# admin.py
class StreetAdmin(admin.ModelAdmin):
    list_display = ('__unicode__', 'display_city',)
    list_select_related = True

    def display_city(self, obj):
        city = obj.city
        return '<a href="%s">%s</a>' % (city.get_absolute_url(), city)
    display_city.allow_tags = True

If we replace the list_select_related = True behavior to an automatic detection we should introduce another property (inspired by admin_order_field) on those callbacks, say admin_list_select_related, to expose which field the intelligent mechanism should select. Otherwise people will have to override get_queryset to achieve what was automatically done by specifying list_select_related = True prior to this change.

FWIW I prefer the list_select_related change proposed by @akaariai (list of selection) since it's way more declarative and less error prone.

comment:34 by Tomek Paczkowski, 11 years ago

Described list_select_related change was committed as fix for #19080 (9ed971f4), though it does not deprecate the Boolean variant.

comment:35 by Marc Tamlyn, 11 years ago

I was thinking about this when I committed the fix for #19080 - I think it should be reasonable (and a good compromise solution to this issue) to have effectively two reset-mechanisms on this function. With depth gone after 1.6, there will be three ways of calling select_related:

select_related('foo', 'bar') is the preferred usage, where you specify the fields. Subsequent calls of this form will append to each other.
select_related() removes list and replaces with "all" (actually roughly the same as original depth=5).
select_related(None) removes list completely.

I think this will be pretty consistent with the existing behaviour (given multiple calls with names are pretty weird now), and also more consistent with other similar functions (like prefetch_related).

comment:36 by Tim Graham, 11 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

PR from Marc.

comment:37 by Marc Tamlyn <marc.tamlyn@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 349c12d3f5c015c2f8b36917da21230c2ac1acb4:

Fixed #16855 -- select_related() chains as expected.

select_related('foo').select_related('bar') is now equivalent to
select_related('foo', 'bar').

Also reworded docs to recommend select_related(*fields) over select_related()

comment:38 by Tai Lee, 11 years ago

Please do not further reduce the functionality of select_related() by removing the no-arguments variant (select all related, with a high hard coded depth limit to prevent recursive joins).

I use select_related() with no arguments all the time, as I have found that in almost all my use cases, following too many joins and returning more data than may be necessary is still much more performant than returning too little data, and subsequently having template authors generate additional queries by accessing related fields that were not selected in the view.

It also saves the view programmer from tediously listing larger numbers of related fields when they legitimately want to select most or all related fields, and avoids the need for view programmers to know in advance which related fields a template author will want to access.

I much rather fallback to naming explicit fields in select_related() only when there is an actual performance problem because of too many joins, instead of premature optimisation by trying to explicitly list all relations, and having to re-visit that code when a template or model changes.

comment:39 by Tai Lee, 11 years ago

Cc: real.human@… added
Note: See TracTickets for help on using tickets.
Back to Top