Opened 10 years ago
Closed 10 years ago
#24614 closed Uncategorized (needsinfo)
Symmetry for select_related/prefetch_related for handling non-related fields
Reported by: | no | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've been looking at the generated queries from some of our code and noticed a few things about select_related and prefetch_related. To start with, I want to provide some code examples for reference:
from django.db import models class City(models.Model): name = models.CharField(max_length = 50) population = models.PositiveIntegerField() mayor = models.ForeignKey('Person', null=True, default=None) class Person(models.Model): name = models.CharField(max_length = 50) hometown = models.ForeignKey(City) def initial_data(): c1 = City(name='City1', population=2) c1.save() p1 = Person(name='Citizen One', hometown=c1) p1.save() p2 = Person(name='Mayor', hometown=c1) p2.save() c1.mayor = p2 c1.save(update_fields=['mayor'])
# ==== Current behaviour (with abbreviated sql) ==== # --- Select related at depth=1 --- >>> p = Person.objects.select_related('hometown').get(id=1) # SELECT p.*, c.* FROM app_person p INNER JOIN app_city c ON p.hometown_id = c.id WHERE p.id=1; >>> print(p.hometown.name) # No query City1 >>> print(p.hometown.population) # No query 2 >>> print(p.hometown.mayor) # SELECT p.* FROM app_person p WHERE p.id = 2 Person(Mayor) # --- Select related on a non-rel field --- p = Person.objects.select_related('hometown__name').get(id=1) # SELECT p.*, c.* FROM app_person p INNER JOIN app_city c ON p.hometown_id = c.id WHERE p.id=1; >>> print(p.hometown.name) # No query City1 >>> print(p.hometown.population) # No query 2 >>> print(p.hometown.mayor) # SELECT p.* FROM app_person p WHERE p.id = 2 Person(Mayor) # --- Select related on a rel-rel field --- p = Person.objects.select_related('hometown__mayor').get(id=1) # SELECT p1.*, c.*, p2.* FROM app_person p1 INNER JOIN app_city c ON p1.hometown_id = c.id LEFT OUTER JOIN app_person p2 ON c.mayor_id = p2.id WHERE p1.id = 1 >>> print(p.hometown.name) # No query City1 >>> print(p.hometown.population) # No query 2 >>> print(p.hometown.mayor) # No query Person(Mayor) # --- Prefetch related on a related field --- >>> p = Person.objects.prefetch_related('hometown').get(id=1) # SELECT p.* FROM app_person p WHERE p.id = 1 # SELECT c.* FROM app_city c WHERE c.id IN (...) >>> print(p.hometown.name) # No query City1 >>> print(p.hometown.population) # No query 2 >>> print(p.hometown.mayor) # SELECT p.* FROM app_person p WHERE p.id = 2 Person(Mayor) # --- Prefetch related on a non-rel field --- >>> p = Person.objects.prefetch_related('hometown__name').get(id=1) # SELECT p.* FROM app_person p WHERE p.id = 1 # SELECT c.* FROM app_city c WHERE c.id IN (...) ValueError: 'hometown__name' does not resolve to an item that supports prefetching - this is an invalid parameter to prefetch_related(). # --- Prefetch related on a rel-rel field --- >>> p = Person.objects.prefetch_related('hometown__name').get(id=1) # SELECT p.* FROM app_person p WHERE p.id = 1 # SELECT c.* FROM app_city c WHERE c.id IN (...) # SELECT p.* FROM app_person p WHERE p.id = 2 >>> print(p.hometown.name) # No query City1 >>> print(p.hometown.population) # No query 2 >>> print(p.hometown.mayor) # No query Person(Mayor)
There are two things in the example I want to bring up:
The first is that select_related
and prefetch_related
differ in how they handle their arguments if they're invalid (ie, when they are pointing to non-related fields), select_related
will happily accept the argument and just fetch the table parts, whereas prefetch_related
will throw a ValueError
. I think both methods should either throw the ValueError
, or both sanitize the input :- by the same reason given in
django/db/models/query.py line ~1508
Last one, this *must* resolve to something that supports prefetching, otherwise there is no point adding it and the developer asking for it has made a mistake."
The second is that I believe there is some semantic overloading going on for the argument format for prefetch_related
/select_related
/only
/defer
, I think it's best explained in a table:
"model__attr" | prefetch_related | select_related | only/defer | only/defer + select_related
|
---|---|---|---|---|
attr is Related field | Extra query (expected), selects all fields | Joins field (expected), selects all fields | affects what is in the select clause (expected), but ignores "attr" and only selects "model" | affects what is in the select clause (expected), and only selects "attr" from "model" (expected) |
attr is non-relational field | ValueError | Ignores the field, selects all fields on table | affects what is in the select clause (expected), but ignores "attr" and only selects "model" | affects what is in the select clause (expected), and only selects "attr" from "model" (expected) |
There seems to be a two semantic overlap issues. First is with the name of select_related
in that it doesn't really affect which fields are "selected", only really which tables are joined, second is the semantic overlap between the argument formats which selected_related
accepts and which formats only/defer
accept. The main problem I perceive is that both accept arguments in the format "model__attr"
yet treat them differently. On one hand, prefetch_related
and select_related
are supposed to only deal with arguments where "__attr"
is a relational field (prefetch does correctly), on the other hand, only/defer
accept arguments where "__attr"
is both a relational field, or just a normal attribute field. To me, at least, these are two distinct data types, yet when I'm looking through code they are represented in exactly the same manner. Obviously, there isn't any other practical way to differentiate the two data types, so there should probably be changes made to how they're handled.
Originally, I was going to propose that if a non-relational field was detected in prefetch_related
or select_related
, then it is passed to an only()
, since when I see the name "select_related", I associate it with the fields "SELECT"ed, thus A.objects.select_related('b__attr')
would behave the same as A.objects.select_related('b').only('b__attr', 'aAttr1', 'aAttr2').defer('b__otherattr')
, however, after trying to reason about it, I'm not sure it's a good idea since it further exacerbates the symmetry issues.
Pros:
- More succinct code for optimizing which fields are selected
select_related
accepts the same field types asonly/defer
- Allows more fine grained control to which fields are
SELECT
ed (fields in theselect_related
will be the ones actually selected for those related tables)
Cons/Concerns:
- Behaviour is less like
prefetch_related
- Possibly breaking change for
select_related
on relational fields. - Doesn't really make sense for
prefetch_related
, thus the symmetry issue betweenprefetch_related
andselect_related
. - What is selected for
A.objects.select_related('b__c')
wherec
is also a related field? Is it all fields onc
or just its id?
Perhaps a better approach, although probably most controversial, would be to rename select_related
to join_related
and have it throw a ValueError
on non-relational fields. This would resolve the semantic overloading of the method name and implications of its implementations, however, it doesn't resolve the second - although admittedly more minor - the previously mentioned argument/data types/semantics issue.
Anyway, I'm hoping a discussion can be started on this, but at very minimum I think select_related
should match the behaviour of prefetch_related
when it encounters an argument that does not resolve to a relational field. Let me know if a separate issue should be reported.
Change History (5)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
So... there was a little discussion on IRC about this, and it was pointed out that this was already done. Are you sure you see the behavior you described on master?
Some participants in the IRC discussion favored turning it from error to a deprecation warning -- so it might be changed.
comment:3 by , 10 years ago
Thanks for your responses @shaib, I really do appreciate you taking the time to read it and being open to discussing it, I'll move the discussion and a reply into the mailing list, as you suggest, over the weekend (does the mailing list handle the table and code formatting?).
As for your second response: I've just double checked in dj 1.8, and the validation only occurs if the field is on the immediate model, or if it's invalid:
>>> Person.objects.select_related('foo') django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'foo'. Choices are: hometown >>> Person.objects.select_related('hometown__foo') django.core.exceptions.FieldError: Invalid field name(s) given in select_related: 'foo'. Choices are: mayor >>> Person.objects.select_related('hometown__name') # query = select ... blah [ Person object ...] >>> Person.objects.select_related('name') django.core.exceptions.FieldError: Non-relational field given in select_related: 'name'. Choices are: hometown
Notice how it will happily accept the third one. Would you like me to file a separate bug report for this issue?
comment:5 by , 10 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
I created a ticket for the separate issue: #24687.
Closing this issue pending a mailing discussion as discussed above.
Hello, and thanks for the long and detailed explanation and discussion. I appreciate that creating this ticket must have taken significant effort. However, I tend to disagree with your conclusions.
For the symmetry in validation of non-related fields, I tend to think that the benefit does not justify breaking backwards-compatibility. If someone
select_relatred
s an extra field -- either by mistake, or because a field was once a relation and changed -- and their code works to their satisfaction, I see no reason to break it for them on Django upgrade. Ifselect_related
were a new feature, you'd be right on. Unfortunately, it isn't.On the "semantic overload" issue, I am not convinced that is a real problem in common use, and even if it is, I'm not sure if it should be solved by (again, backwards-incompatible) behavior change or by improving documentation.
I am leaving this ticket open, as I consider the work you put into it deserves another eye looking at it, but if you really want to start a discussion, the place for it is not the issue tracker, but the DevelopersMailingList.