#30079 closed Cleanup/optimization (wontfix)
Prefetch cache should be aware of database source and .using() should not always trigger a new query
Reported by: | Mike Lissner | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | prefetch, multidatabase |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've been poking around the darker edges of multidatabase prefetching and I discovered a strange pattern. When you have cached prefetch values, they're not aware of which database they came from:
# This runs zero queries In [62]: pizzas = Pizza.objects.filter(pk=17).prefetch_related('toppings').using('replica') # This runs two, as expected In [63]: pizza0 = pizzas[0] # This runs zero, even though it normally would come from "default" and the cache came from "replica" In [64]: pizza0.toppings.all() Out[64]: [<Topping 1: Cheese>, <Topping 2: Bacon>] # This runs one query, even though this data should already be populated. In [65]: pizza0.toppings.all().using('replica') Out[65]: [<Topping 1: Cheese>, <Topping 2: Bacon>]
I was pretty surprised and confused that adding the using() method on the last line above busted the cache even when the database selected was the same as the one used to pre-populate the cache.
I was also surprised (though less so) at the opposite, that *not* including the non-default DB (on line 64) *was* able to hit the cache (which was populated by "replica") instead of hitting the default database (as the query would normally do).
On IRC chatting about this, the question was: "Well, should filter hit the DB or use the cache? What about exclude and other methods? Where do you draw the line?" I think the simple line to draw is whether something changes the SQL query. using() methods don't change SQL, so they should work properly on a cached result. filter() and exclude() do change the SQL, so they *shouldn't* use the cache.
Attachments (1)
Change History (5)
follow-up: 2 comment:1 by , 6 years ago
Type: | Bug → Cleanup/optimization |
---|---|
Version: | 2.1 → master |
comment:2 by , 6 years ago
Replying to Carlton Gibson:
On the basis of the difference between
ln64
andln65
I'm inclined to accept this.pizza0
came fromreplica
and theln64
call just goes with that. (That seems right/expected to me...) I see your point that adding theusing()
shouldn't change that.
Good! I made myself clear. Always a good thing.
At first glance the "if you didn't add a
filter()
/exclude()
" idea seems OK too.
That was my analysis too: "This seems to be a reasonable line in the sand."
Two questions though:
- What's the use-case for re-adding the
using()
call inln65
? I can't quite see why you'd do that...? (Another way of looking atusing()
might be "go back to this DB"...)
I can't make a really clear argument for my use case, but basically it was, "I'm doing weird things with prefetching and I want to be sure Django pulls this data properly from the right DB/cache." In other words, it was me trying to be explicit about where the caches were *supposed* come from, and then getting bit by having the caches busted. From my reading, ln64 is totally unclear which DB it would use since it would normally use the "default" DB, but had a cached value, so which wins? My solution was to be more explicit to make sure it used the cache, but that backfired (luckily, I was watching the query logs).
- Have you looked at all at what a patch might look like? (If it's simple enough, that helps...)
No, not at all, unfortunately. I've looked into the Django code from time to time, but mostly I stay on my side of the API, lest I get in trouble. Especially when it comes to the ORM bits.
comment:3 by , 6 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Has patch: | set |
Resolution: | → wontfix |
Status: | new → closed |
OK, at best, this is a cleanup on the documentation. The admonition note in the `prefech_related` docs already covers this:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
using()
is a subsequent chained method. It returns a new QuerySet, and the results and prefetch caches are not transferred.
Arguably, using()
, whilst returning a new QuerySet, doesn't actually imply a different query.
As such I drafted the attached patch.
... as always with
QuerySets
, any subsequent chained methods which return a new
QuerySet
, normally implying a different database query ...
In my opinion, the patch unnecessarily complicates the meaning. We know from our use of QuerySets that all the Methods that return new QuerySets behave this way, even if we might forget it occasionally. Thus wontfix
.
Thanks for the report Mike!
comment:4 by , 6 years ago
Whew, that's pretty subtle in both the old and the new versions of the documentation. I have to confess that I don't think of chaining as the trigger for a new queryset. I think about the query as a new trigger for a queryset. I guess I never read this part of the documentation or didn't take it to heart.
I don't know, I don't think we can document our way out of this behavior, but I have no idea how difficult this is to fix technically.
Hi Mike. Thanks for the report.
On the basis of the difference between
ln64
andln65
I'm inclined to accept this.pizza0
came fromreplica
and theln64
call just goes with that. (That seems right/expected to me...) I see your point that adding theusing()
shouldn't change that.At first glance the "if you didn't add a
filter()
/exclude()
" idea seems OK too.Two questions though:
using()
call inln65
? I can't quite see why you'd do that...? (Another way of looking atusing()
might be "go back to this DB"...)Thanks.