Opened 16 years ago
Closed 11 years ago
#10733 closed Bug (fixed)
Invalid results when deferring fields in more than one related model with only()
Reported by: | mrts | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | efficient-admin |
Cc: | Carl Meyer, ruosteinen, walter+django@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Given the same models as in #10710, the following works as expected:
>>> results = C.objects.all().only('name', 'a', 'b').select_related() >>> results[0].a.name u'a2' >>> results[0].b.name u'b1'
, but the following does not pull in the second related model field (b.name
):
>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related() >>> results[0].a.name u'a2' >>> results[0].b.name ''
Attachments (2)
Change History (32)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:3 by , 16 years ago
milestone: | → 1.1 |
---|
comment:4 by , 16 years ago
I looked at this for a while and didn't make any huge progress, but I think I found 2 issues(maybe :/):
- The SQL for
Leaf.objects.only('child__name', 'second_child__name').select_related()
is only doing 1 join, and I think it needs to be doing 2.
- It seems that neither of those child models is actually a deferred model there.
comment:5 by , 16 years ago
Keywords: | efficient-admin added |
---|
comment:6 by , 16 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:7 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:8 by , 16 years ago
I've looked into this a little; the problem is not in the sql generation, as the expected columns and joins are all present.
I've tracked the problem down to somewhere in QuerySet.iterator(). In the test case, the model instantiation when selecting the single result is:
DeferredRelatedC_Deferred_is_published_lots_of_text(a_id=1, b_id=2, id=3, name=u'c3') DeferredRelatedA(1, u'a1', 2, u'b2') DeferredRelatedB()
That is, the kwargs are not being set properly when instantiating the two related models.
comment:9 by , 16 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
The fault is in django/models/query.py:get_cached_row()
: when it recurses it was not passing the only_load
dict down the stack. The attached patch fixes this.
However I noticed that the recursion is not passing anything for the offset
parameter, which is to do with aggregation queries. I can't say for sure at the moment, but it looks to me like an additional bug relating to aggregated queries with related models could be hiding in here as well. Or is the offset
parameter only relevant to the top-level model?
comment:10 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:11 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:12 by , 16 years ago
comment:14 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Alas, given the above models, the following happens:
>>> from only_breakage.models import C >>> from django.db import connection >>> import copy
Expected
All three tables joined:
>>> C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related().query.as_sql() ('SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_a"."id", "only_breakage_a"."name", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" INNER JOIN "only_breakage_a" ON ("only_breakage_c"."a_id" = "only_breakage_a"."id") INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")', ())
A single query fetches all what's needed:
>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related() >>> connection.queries [] >>> results[0].a.name u'a2' >>> connection.queries [{'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_a"."id", "only_breakage_a"."name", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" INNER JOIN "only_breakage_a" ON ("only_breakage_c"."a_id" = "only_breakage_a"."id") INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1', 'time': '0.002'}, >>> queries = copy.deepcopy(connection.queries) >>> results[0].b.name u'b1' >>> assert connection.queries == queries
Actually got
Only two tables joined (the A
table is discarded):
>>> C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related().query.as_sql() ('SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")', ())
Three queries happen:
>>> results = C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related() >>> connection.queries [] >>> results[0].a.name u'a2' >>> connection.queries [{'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1', 'time': '0.002'}, {'sql': u'SELECT "only_breakage_a"."id", "only_breakage_a"."name", "only_breakage_a"."lots_of_text", "only_breakage_a"."a_field" FROM "only_breakage_a" WHERE "only_breakage_a"."id" = 2 ', 'time': '0.000'}] >>> results[0].b.name u'b1' >>> connection.queries [{'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1', 'time': '0.002'}, {'sql': u'SELECT "only_breakage_a"."id", "only_breakage_a"."name", "only_breakage_a"."lots_of_text", "only_breakage_a"."a_field" FROM "only_breakage_a" WHERE "only_breakage_a"."id" = 2 ', 'time': '0.000'}, {'sql': u'SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id") LIMIT 1', 'time': '0.000'}]
comment:15 by , 16 years ago
milestone: | 1.1 → 1.2 |
---|
Ok - there are actually three problems here:
- C.objects.all().select_related() does one join, not two.
- The aname clause in only() doesn't roll out to the right field inclusion list in the queryset
- The internal only_load data structure can't differentiate between aname and bname.
Point 1 should probably be it's own ticket (it strikes me that it probably already is, but a quick search didn't reveal an obvious candidate). The workaround is to explicitly specify select_related('a','b').
Points 2 and 3 are closely related, but they aren't simple fixes.
I'm going to punt this for v1.1. None of these problems cause data loss or incorrect answers, and I can't see any reason that these problems can't be fixed while preserving backwards compatibility in the only() interface. I will grant that these problems are inconvenient and lead to inefficiencies, but if we're sticking to a time-based release schedule, we're going to have to live with the fact that releases are going to have some bugs.
That said, if someone is particularly affected by this problem and is able to work up a patch before v1.1 I'm happy to look at getting it into trunk.
comment:16 by , 16 years ago
Right you are, select_related('a', 'b').query.as_sql()
looks correct:
>>> C.objects.all().only('name', 'a', 'b', 'a__name', 'b__name').select_related('a', 'b').query.as_sql() ('SELECT "only_breakage_c"."id", "only_breakage_c"."name", "only_breakage_c"."a_id", "only_breakage_c"."b_id", "only_breakage_a"."id", "only_breakage_a"."name", "only_breakage_b"."id", "only_breakage_b"."name" FROM "only_breakage_c" LEFT OUTER JOIN "only_breakage_a" ON ("only_breakage_c"."a_id" = "only_breakage_a"."id") INNER JOIN "only_breakage_b" ON ("only_breakage_c"."b_id" = "only_breakage_b"."id")', ())
Doesn't affect the behaviour though, problems 1-3 remain.
follow-up: 18 comment:17 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
follow-up: 19 comment:18 by , 15 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Replying to ccahoon:
As Russell said, this is not fixed. Reopening.
comment:19 by , 15 years ago
comment:20 by , 15 years ago
Shouldn't at least a simple patch, correctly passing only_load
in db/models/query.py:get_cached_row()
be included into trunk?
This won't solve all problems, but will allow simple cases to work properly which is certainly better than nothing.
comment:21 by , 15 years ago
Cc: | added |
---|
comment:22 by , 15 years ago
Cc: | added |
---|
So, what is the status here? Are we going to see this in 1.2?
comment:24 by , 14 years ago
Cc: | added |
---|
comment:25 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:13 by , 12 years ago
Status: | reopened → new |
---|
comment:14 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Looks like
b.id
is actuallyNone
in that case:Another odd side-effect -- when accessing the deferred fields, bogus values are returned (as in #10710, notice how
lots_of_text
returns theid
value):Running on