Opened 7 years ago
Last modified 8 months ago
#28939 new Cleanup/optimization
QuerySet used by prefetch_related() does not use expected connection. — at Version 2
Reported by: | Nick Pope | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | prefetch, prefetch_related, using, connection |
Cc: | Ülgen Sarıkavak | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Update: I should have mentioned that I was using a custom database router. I now know this issue occurred due to a bug in that router.
I've run into an case with prefetching where the connection used for the prefetch queries is not the one I would expect:
# When using the default database connection: In [1]: from django.db import connections In [2]: connections['default'].queries Out[2]: [] In [3]: connections['readonly'].queries Out[3]: [] In [4]: User.objects.prefetch_related('operators').first() Out[4]: <User: example> In [5]: connections['default'].queries Out[5]: [{'sql': 'SELECT ... FROM "auth_user" ORDER BY "auth_user"."id" ASC LIMIT 1', 'time': '0.205'}, {'sql': 'SELECT ... AS "_prefetch_related_val_user_id", ... FROM "operator" INNER JOIN "operator_users" ON ("operator"."id" = "operator_users"."operator_id") WHERE "operator_users"."user_id" IN (1) ORDER BY "operator"."code" ASC', 'time': '0.011'}] In [6]: connections['readonly'].queries Out[6]: [] # When specifying the database connection to use: In [1]: from django.db import connections In [2]: connections['default'].queries Out[2]: [] In [3]: connections['readonly'].queries Out[3]: [] In [4]: User.objects.using('readonly').prefetch_related('operators').first() Out[4]: <User: example> In [5]: connections['default'].queries Out[5]: [{'sql': 'SELECT ... AS "_prefetch_related_val_user_id", ... FROM "operator" INNER JOIN "operator_users" ON ("operator"."id" = "operator_users"."operator_id") WHERE "operator_users"."user_id" IN (1) ORDER BY "operator"."code" ASC', 'time': '0.010'}] In [6]: connections['readonly'].queries Out[6]: [{'sql': 'SELECT ... FROM "auth_user" ORDER BY "auth_user"."id" ASC LIMIT 1', 'time': '0.002'}]
In the second case I would have expected all queries to be executed on the readonly
connection by default.
This can be achieved by using Prefetch('operators', queryset=Operator.objects.using('readonly'))
, but it means that plain strings cannot be used and often a lot of nested Prefetch()
objects can be required.
A solution to this could be to do the following:
- Use the connection from the original
QuerySet
that called.prefetch_related()
by default. Possibly add(Doesn't fit nicely with the API.)QuerySet.prefetch_related(using=...)
to allow overriding for all prefetch queries.- Possibly add
Prefetch(using=...)
to allow overriding for a specific branch of prefetching. (This would propagate down.)
Change History (2)
comment:1 by , 7 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 7 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Description: | modified (diff) |
Type: | Bug → Cleanup/optimization |
Hi Simon,
Sorry - false alarm.
I did some digging and found tests.prefetch_related.tests.MultiDbTests
which seems to show my scenario working correctly.
I've subsequently realised that I wasn't handling instance._state.db
correctly in db_for_read()
in my router.
It may be worth improving the documentation on database routers and/or prefetch_related()
warning of this pitfall.
I have left this ticket open, but feel free to close it if you don't think the documentation needs improving.
Here is (an example of) my fixed database router:
class Router(object): def __init__(self): self.mapping = {'special': 'other'} def db_for_read(self, model, **hints): db = self.mapping.get(model._meta.app_label, 'default') instance = hints.get('instance') return instance._state.db or db if instance else db def db_for_write(self, model, **hints): return self.mapping.get(model._meta.app_label, 'default') def allow_relation(self, obj1, obj2, **hints): db1 = self.mapping.get(obj1._meta.app_label, 'default') db2 = self.mapping.get(obj2._meta.app_label, 'default') return db1 == db2 def allow_migrate(self, db, app_label, model_name=None, **hints): return db == self.mapping.get(app_label, 'default')
Beforehand, db_for_read()
was the same as db_for_write()
.
Essentially there are two databases (default
and other
) which also both have read-only connections to a replica (default-ro
and other-ro
).
This is an application-routing router - everything goes to default
unless it is from the special
application which goes to other
.
Typically we opt-in to using the read-only connections via QuerySet.using()
and this is where things then broke due to not handling instance._state.db
.
I think 1. would make the most sense here unfortunately I'm afraid this might break backward compatibility with regards to database router.
For example, given the following scenario
It would change the behaviour of
Foo.objects.using('default').prefetch_related('bars')
becauseusing()
has usually precedence over the routers suggestion. We could special case prefetches to only default to the base queryset'susing
if no router provide adb_for_read
suggestion but that would be counter intuitive IMO and add even more complexity to the read database alias selection logic.So, I think that 1. is favourable but would require a deprecation cycle for the case where both a base query alias is forced though
using
and thatdb_for_read
suggests a database. In the mean time I think that 3. is easier to implement and could be useful even in a future where 1. is fixed.