#20625 closed New feature (fixed)
Custom Chainable QuerySets
Reported by: | Daniel Sokolowski | Owned by: | loic84 |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | QuerySet, models.Manager, chainable |
Cc: | loic@…, mike@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Provide a mechanism for defining chainable custom QuerySet directly through a subclassed models.Manager
definition instead of the current approach that requires a custom models.Manager
+ QuerySet
pair. This ticket potentially replaces/complements https://code.djangoproject.com/ticket/16748.
I suggest that we allow defining chainable querysets by allowing the django programmer to add a chainable == True
attribute to the custom queryset method. The proposed syntax would look as shown below and a working github pull request will be submitted shortly; thoughts and feedback welcomed.
class OfferManager(models.Manager): """ Example of a chainable custom query set """ ... QUERYSET_PUBLIC_KWARGS = {'status__gte': STATUS_ENABLED} QUERYSET_ACTIVE_KWARGS = {'status': STATUS_ENABLED} ... def public(self): """ Returns all entries accessible through front end site""" return self.all().filter(...) public.chainable = True # instructs to dynamically tranplat this method onto # returned QuerySet as <queryset>.public(...) # effectively providing chainable custom QuerySets def active(self): """ Returns offers that are open to negotiation """ return self.public().filter(**OfferManager.QUERYSET_ACTIVE_KWARGS) # an example of how to reffer to OfferManager # constants as 'self' context changes active.chainable = True ...
Change History (41)
comment:1 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 12 years ago
comment:3 by , 12 years ago
Has patch: | set |
---|
Properly formatted usage code example:
class OfferManager(models.Manager): """ Example of a chainable custom query set """ ... QUERYSET_PUBLIC_KWARGS = {'status__gte': STATUS_ENABLED} QUERYSET_ACTIVE_KWARGS = {'status': STATUS_ENABLED} ... def public(self): """ Returns all entries accessible through front end site""" return self.all().filter(...) public.chainable = True # instructs to dynamically tranplat this method onto # returned QuerySet as <queryset>.public(...) # effectively providing chainable custom QuerySets def active(self): """ Returns offers that are open to negotiation """ return self.public().filter(**OfferManager.QUERYSET_ACTIVE_KWARGS) # an example of how to reffer to OfferManager # constants as 'self' context changes active.chainable = True ...
comment:4 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|
I'm accepting the ticket in principle - we need a solution to this problem, evidenced by the number of solutions/complaints out there:
https://github.com/zacharyvoase/django-qmethod
http://stackoverflow.com/questions/4576622/in-django-can-you-add-a-method-to-querysets
http://djangosnippets.org/snippets/562/
http://djangosnippets.org/snippets/734/
http://djangosnippets.org/snippets/2114/
http://djangosnippets.org/snippets/2117/
And I don't think having multiple solutions to this problem helps people - it just adds more-than-one-way-to-do-it.
However, I'm not at the moment accepting the approach given in the ticket (neither am I rejecting it). We need to consider the pros/cons of the different methods in the existing solutions above.
For the solution in the PR above, I'd say the major disadvantage is that you have methods on the Manager which are not being used as Manager methods, but callables that could be attached to either Managers or QuerySets. This means surprises w.r.t to self
(as documented in the example), and it would be easy to think it is working (because it works without chaining), or to have nasty surprises if you call methods that exist on QuerySet but have been overridden on the Manager subclass.
comment:5 by , 12 years ago
Cc: | added |
---|
Overlapping ticket with related discussion: #16748
I see two main patterns in these attempts:
- Those that make it easier to provide a custom
QuerySet
:
https://github.com/jtillman/django/compare/CustomQuerySet
https://github.com/django/django/pull/317
http://djangosnippets.org/snippets/734/
- Those that pass custom
Manager
methods onto theQuerySet
:
https://github.com/django/django/pull/1284/
https://github.com/zacharyvoase/django-qmethod
http://djangosnippets.org/snippets/562/
I personnaly favor the second option because it results in an API where both Model.objects.custom_method()
and Model.objects.all().custom_method()
work, which is what you've come to expect with the methods from the stock QuerySet
.
Before we worry about the implementation we'll have to settle on the general idea.
comment:6 by , 12 years ago
I believe one problem with the proposed way is that chainable manager methods are lost when the QuerySet switches its class. The switching happens as side effect of certain calls (for example .values(), .values_list()). Also, creating dynamic classes as done in the patch is likely somewhat expensive, and will also likely break pickling (as the dynamic class can't be found from module level). Both of the latter problems are solvable, first one by caching the dynamic class, second one by altering QuerySet.reduce().
Other solutions to this problem are to make Manager itself a QuerySet (or subclass of it). I tried this some time ago and it was evident this path has a lot of problems ahead. Yet another solution is to call the chainable manager methods from the manager, but explicitly pass the queryset into the method, something like
class MyManager: @chainable def published(self, chain_to, other_params...): return chain_to.filter(...)
with some meta-magic (or just plain old-style wrapper function) the chain_to could be automatically created as get_queryset() if called from manager, but if called from qs, then chained would be the qs instance. However it will be a bit confusing to call this method with objects.published(datetime.now())... Alternative is to not magically provide the chain_to qs when called from manager - instead the user should create the chain_to if it is None.
A big +1 to the idea, but there needs to be a bit more discussion about what way to solve this.
My opinion is that if it is possible to get rid of the Manager class completely, or make it QuerySet subclass, that would be optimal path. Unfortunately this seems hard to do.
comment:7 by , 12 years ago
akaariai's idea of having Manager
inherit from QuerySet
made me think of another option.
We can copy QuerySet
methods onto the Manager
.
That's similar to what we manually do to proxy all the current QuerySet
methods, we'd just have to do it dynamically.
The end user's code would look like this:
class MyQueryset(QuerySet): def is_published(self): pass class MyModel(Model): objects = Manager(queryset_class=MyQueryset) >>> hasattr(MyModel.objects, 'is_published') True
comment:8 by , 12 years ago
Proof of concept: https://github.com/loic/django/compare/ticket20625
I wanted to leverage the Django test suite, so I tested my concept on the actual Manager
class, the complete suite is passing.
We might not be able to do such refactor in the wild (removing all the proxy methods) since calls to super
wouldn't work anymore, but the idea is there.
comment:9 by , 12 years ago
FWIW here's what I've been using for the past years.
It doesn't account for QuerySet class changes thought (values_list
) and should be adapted to make sure Model.objects.delete()
can't be call directly.
comment:10 by , 12 years ago
From the semantic/syntax angle I don't believe we should pursue the idea of combining QuerySet and models.Manager into one as to me even though they do overlap in functionality they make sense to remain distinct, and the more I think about creating custom QuerySets on the models.Manager level the more I realize it is not the the right approach; it is just what we are familiar with and thought through the Django documentation.
I think the `objects=Manager(queryset_class=...) is the correct way , I would say all or almost all of the custom managers I have written should actually have been custom QuerySets. So I endorse loic84 approach (but would like to see if we could omit the need for the '.manager' attribute).
So Djangonauts - speak up and let's get a consensus going :)
comment:11 by , 12 years ago
@charettes' proposal is pretty much identical to mine (as opposed to the 2 patterns described in comment:5), except for the factory which if I understand correctly wouldn't be needed anymore if this made it into core.
I picked an opt-in strategy with QuerySet.method.manager=True
but we could just as well pick the following opt-out strategy:
- Accept by default if
QuerySet.method.manager
is not present and the method name doesn't start with an underscore (thus protecting private and magic methods) QuerySet.method.manager=False
for public methods that don't make sense at theManager
level (i.e.QuerySet.delete()
)QuerySet.method.manager=True
to opt in for private methods.
Edit: Implementation of the opt-out strategy https://github.com/loic/django/compare/ticket20625_optout.
comment:12 by , 12 years ago
@loic84 patch is looking good!
The only issue left is the QuerySet
class changes, e.g. values()
which returns a ValuesQuerySet
instance that wouldn't inherit for the custom QuerySet
subclass.
comment:13 by , 12 years ago
@charettes: It's not a trivial issue! I pushed an extra commit to the branch where I play with dynamic types, everything but pickling works. Any suggestion?
comment:14 by , 12 years ago
Added another commit which fixes pickling, I believe this POC is as good as it can be, we now have to decide whether it's the approach we want to pursue.
If we decide to go for it, we need to settle on the scope (do we try to deprecate all legacy proxy methods or not) and I'll turn it into a proper patch with tests and docs.
For the record, I did try having Manager
inherit from QuerySet
, it's quite hard because of all the magic methods. I also believe it's undesirable in term of API as there are plenty of QuerySet
methods - the private ones in particular - that we don't want on the manager.
comment:15 by , 12 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
I've tried to come up with a strategy to deprecate all the legacy proxy methods.
The issue at hand is custom managers in the wild that call super
in overridden methods.
@charettes mentioned on IRC that __getattribute__
would enable calls to super
but that we wouldn’t be able to differentiate a normal access (which shouldn't go through the legacy method) and calls to super
which should trigger the deprecation warning.
We could have a metaclass, similar to RenameMethodsBase
which adds back legacy methods for every method with the same name in the MRO: if our method is ever called (by super
) we trigger the deprecation warning.
I ran into a problem when trying to deprecate Manager.create()
and Manager.get_or_create()
with RelatedManager
(and similar):
https://gist.github.com/loic/5861660
https://github.com/django/django/blob/master/tests/multiple_database/tests.py#L907-L921
If RelatedManager
doesn't call super
, we'll never reach custom managers' create()
and get_or_create()
. I don't see a solution to that other than making a backward incompatible change. Or we could elect a few proxy methods that shouldn’t be deprecated (like Manager.all()
, see comment in the patch).
Any suggestions?
If we can't reach consensus on the deprecation plan I suggest we move on with this ticket and open a new one regarding the deprecation itself.
comment:16 by , 12 years ago
@loic84 I agree that deprecation/removal of Manager
proxy methods shouldn't block this ticket and discussion should be moved to another ticket once we manage to come up with a solution for this one.
We should play safe and avoid messing with Manager
's MRO too much in the 1.6 release.
Lets see if #15363 behaves correctly first to avoid flooding users with deprecation warnings all over the place.
comment:17 by , 12 years ago
Owner: | changed from | to
---|
To summarize a bit:
- The latest patch should cover all concerns raised so far: it handles specialized querysets such as
ValuesQuerySet
, doesn't break pickling, cruft-free methods definitions, sane defaults (no private or magic methods), a mechanism to opt-out or force-in certain methods and last but not least, it's robust enough to sustain the complete Django test suite while replacing almost 90% ofManager
's public API.
- We settled that deprecating legacy proxy methods is out of scope for this ticket while acknowledging that it's desirable in the long run. We'll open a dedicated ticket once we land this feature.
- Since we are not deprecating the legacy methods, we need a dedicated test suite.
- 1.6b1 has landed, so we can resume work targeting master.
Where are we in term of consensus? I'll turn this into an RFC patch as soon as I get a green flag on the implementation.
Latest implementation can be found here: https://github.com/loic/django/compare/ticket20625_optout.
comment:18 by , 12 years ago
I have tried various versions of making Manager a subclass of QuerySet. The most promising approach was to use a temporary class for hiding QuerySet only functionality from the manager, and drop the temp class on first clone. So:
- On creation of manager, create a dynamic class and add HideQuerySetMethods to the
__bases__
of the dynamic class. Effect is that for example .delete() calls HideQuerySetMethods.delete(), not QuerySet.delete(). - On first clone, switch the class to the original class, that is drop the HideQuerySetMethods class from
__bases__
.
I got the solution to almost working state, but it required too many kluges. I ended up deleting the branch as there was just too many little problems to make it work cleanly.
I have seen the idea of this ticket discussed multiple times. Every time some solutions are offered, but the solutions always have some problematic cases that end up stalling the development effort. I think the solution in this ticket is clean, and makes it easy enough to add queryset methods. So, my opinion is that we should go forward with it.
I wouldn't actually remove the proxy methods at all. Removing them doesn't seem worth the effort, the maintenance burden of the methods is very small. If they are going to be removed, then all must be removed in one go. It is just weird if some of the proxy methods are still there because Django needs the super() method to be present, but other are removed.
It seems #17270 is a duplicate, I guess that one could be closed as this ticket seems to have more momentum at the moment.
comment:19 by , 12 years ago
@akaariai: I agree with the all or nothing stance regarding the method deprecation, and I also agree that the maintenance burden is rather small.
Just a clarification, Django itself doesn't need the calls to super()
, the problem is backward compatibility.
I'll tackle docs and tests tomorrow, hopefully we'll have an RFC patch pretty soon.
comment:20 by , 12 years ago
Sorry for the Django super() requirement confusion.
While writing my previous post I got an idea. I did some quick testing and it seems the idea needs to be presented. There are two key things in the idea:
- Managers are created by CustomQuerySet.manager_cls() method - this is a factory constructing a new manager class.
- The new manager class has dynamically created proxy methods for those methods in the CustomQuerySet that should be present in the manager.
So, you could do something like this:
class MyQuerySet(models.QuerySet): def bar(self): return ... bar.manager = True class MyModel(Model): objects = MyQuerySet.as_manager()
Now, MyModel.objects is a custom Manager subclass which has a proxy method for bar(). And it really *has the method*, so super() works correctly.
The API might not be optimal, but the idea that proxy methods are automatically created on dynamically created Manager classes should be considered carefully. In my opinion it offers better semantics than the __getattr__
based approach.
I wrote a very quick proof-of-concept patch. It is available from: https://github.com/akaariai/django/tree/manager_factory. The only interesting part of the patch is the method at https://github.com/akaariai/django/commit/283af213d1873cefd642b6d9aeb97f285ef227c4#L2R44
comment:21 by , 12 years ago
@akaariai: I'm glad you presented this idea; I find as_manager()
very elegant as it removes all the boilerplate from Manager
while remaining backward compatible, it's probably more efficient as well.
Its semantic is already used throughout Django (View.as_view()
, Form.as_p()
, etc.) so people should feel right at home.
Regarding the copy logic, I see you've used the "opt in" strategy from my initial patch, as per @danols' suggestion I like better the "opt out with smart defaults" one.
I had a stab at it: https://github.com/loic/django/compare/loic:ticket20625_copy
class MyQuerySet(models.QuerySet): # Private methods are ignored by default. def _foo(self): return ... # Public methods are copied by default. def foo(self): return ... # Opt in. def _bar(self): return ... _bar.manager = True # Opt out. def bar(self): return ... bar.manager = False class MyModel(Model): objects = MyQuerySet.as_manager()
comment:22 by , 12 years ago
Yes, I like the smart defaults idea more than opt-in.
The only ugly case in the as_manager() approach seems to be that if you want manager-only methods you will need to create a Manager class, and then use MyQuerySet.as_manager(base_cls=MyManager) (or alternatively subclass MyQuerySet.manager_cls()). This isn't as-pretty-as-possible. But, in general having methods on the qs even if they only make sense as manager methods isn't too bad. Django has multiple examples of this, qs.create() doesn't actually make sense, but the create on qs doesn't hurt anybody either.
I am sure that if one defines the same method on both the qs and the manager in complex enough ways you can break super(). This seems to be a case of "don't do that".
I think it is now the time to complete the patch.
comment:23 by , 12 years ago
@akaariai: wouldn't that address the issue?
class MyManager(Manager): def manager_only(): pass class MyQuerySet(QuerySet): manager_class = MyManager def manager_and_queryset_method(): pass class MyModel(Model): objects = MyQuerySet.as_manager()
So basically rename manager_cls()
to get_manager_class()
and have it use a cls.manager_class
(or cls.base_manager_class
) attribute.
If we go for that we should probably remove Manager.queryset_class
(edit: by remove I mean, rename to _queryset_class) since it would become an antipattern.
Are you planning to complete the patch or should I do it?
Since we are replacing all the legacy methods with this technique there is not much left to do in term of testing: basically just the copying logic.
comment:24 by , 12 years ago
Implementation of the idea above with accompanying (quick and dirty) test at https://github.com/loic/django/commit/2e129ae8aba862054373a3c5789493bc456d1ed7.
comment:25 by , 12 years ago
Seems good to me.
If you want a custom manager, but not queryset you can use subclass Manager approach. If you want custom queryset with methods added to the manager, you can use custom queryset + as_manager(). If you want both, you will need to have both Manager and QuerySet subclass, and use QuerySet.manager_class to tie them together. This feels clean and correct.
I think one area in need of testing is what happens when you define a custom method on both Manager and QuerySet class. I am sure that complex scenarios will break, but that is OK. This is mostly knowing what will fail, and if possible having informative error messages for the failing cases.
I am currently on holiday so I am not willing to spend too much time to work on Django, feel free to go for the final patch.
comment:26 by , 12 years ago
The only issue I can think of is that we would override the Manager
method when it is defined on both.
I've just addressed that in https://github.com/loic/django/commit/b1d6d2a0cbdb5ad7decc2cf6fc8b8dde960061cf.
I previously shielded Manager.all()
from this issue with manager=False
so it's no longer necessary. I believe Manager.all()
and RelatedManager.create()|get_or_create()
are in the situation you've described and it seems to work well.
I'll start working on an RFC patch, hopefully I'll get pretty close by tonight.
comment:27 by , 12 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
PR https://github.com/django/django/pull/1328
Docs probably need more love but that gives us something to work with.
I had to change the implementation of get_manager_class()
because it relied on __dict__
which doesn't work with inheritance.
I tried using dir()
and the previous callable()
test and while it worked for most purposes it also produced a few false positive; for example base_manager_class
was picked up as a callable and made it into new_methods
.
In the end I use inspect.getmembers()
with a ismethod
predicate and the result has been very robust.
comment:28 by , 12 years ago
For the record, we lost support for the specialized querysets with the last implementation; I've added a new commit to the PR to address that.
There is also a couple of minor/cosmetic pending questions in the comments on the PR.
comment:29 by , 12 years ago
Cc: | added |
---|
comment:30 by , 12 years ago
A quick skim of the patch didn't reveal anything surprising. I don't see anything that needs to be addressed before commit. The minor issues mentioned in the PR reviews seem minor enough to be decided at commit time.
If somebody can review the docs part that would help a lot. I am not too good with documentation changes myself.
comment:32 by , 12 years ago
Squashed down to a single commit that applies on top of the latest master. Hopefully this is RFC.
comment:34 by , 12 years ago
Anyone would volunteer for a merge? I rebase the branch to master daily but it doesn't seem merging is on anyone's todo list.
comment:35 by , 12 years ago
I'll be back to work next week. I will try to merge this one then unless somebody beats me to it.
comment:36 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:39 by , 9 years ago
+* The custom
QuerySet
and its custom methods were lost after the first
+ call tovalues()
or
values_list()
.
This isn't fully fixed. Now the custom QuerySet
and its custom methods are lost after the second call to values()
for values_list()
.
This affects 1.8, but not master, as #24211 removed the ValuesQuerySets anyway.
Initial pull request created: https://github.com/django/django/pull/1284