Opened 8 years ago
Last modified 7 years ago
#27303 new Bug
Selecting multiple admin list_filters across relations return results that don't match both filters
Reported by: | Yeago | Owned by: | nobody |
---|---|---|---|
Component: | contrib.admin | Version: | 1.10 |
Severity: | Normal | Keywords: | filterspec |
Cc: | rpkilby@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Problems arise when filtering across relations in the admin. The current method relies on FilterSpecs chaining filter() clauses. While these create explicit ANDs when filtering on fields on a single table (expected), they imply OR when a filter crosses a relation because each filter() clause creates a new join.
Consider a case where someone would want to filter across all People who have a Contact between dates X and Y that is on_phone (as opposed to in_person). In the current state of the admin, you can't arrive at this list because of the above subtlety with how chained filters() change behavior when spanning relations (which I would argue, this hypothetical user doesn't care about).
https://github.com/yeago/django-1/commit/bccabbfcf443f0fac511599ccfdf166bb1697b6b
I believe allowing FilterSpec objects to return Q-likes is a happy middle-ground that doesn't break current assumptions.
Change History (35)
comment:4 by , 8 years ago
Its the same as opening a shell and doing
AnyModel.objects.filter(any__join_field2='X').filter(any__join_field2='Y')
vs
AnyModel.objects.filter(field2='X').filter(field2='Y')
If you look at the sample project and attempt to filter Info objects of person X within a date Y, you can't.
comment:5 by , 8 years ago
Sorry, I'm still not understanding. After selecting "This month" and "person 2" in the admin filters for the info page, I see the query:
SELECT "app_info"."id", "app_info"."name" FROM "app_info" INNER JOIN "app_contact" ON ("app_info"."id" = "app_contact"."info_id") INNER JOIN "app_contact" T3 ON ("app_info"."id" = T3."info_id") WHERE ("app_contact"."date" < 2016-11-01 00:00:00 AND "app_contact"."date" >= 2016-10-01 00:00:00 AND T3."person_id" = 2)
It seems to give the expected results of infos that have contacts with person=2 AND date=this month. Please let me know my mistake.
comment:6 by , 8 years ago
Well the case I am attempting to satisfy here is when you want a list of infos by that person and this month (the person 2 contact in my project was last year).
comment:7 by , 8 years ago
It sounds like you want list filter's to work using OR then? I guess that could be a duplicate of #18736. It seems problematic to implement your proposal without a corresponding UI change as a user wouldn't have any way to know whether AND or OR are in use. What do you think?
comment:8 by , 8 years ago
I think that its likely we still are not understanding each other. In my case above, its not OR, its AND. The only difference is that the filters are applied to the same join versus creating additional joins for each filtered field. I think its probably much more likely that ordinary users would expect things to behave as I am suggesting rather than how they are. I don't think anyone would want to filter on a list of hypothetical Info objects, and then filter them by user, and then filter them by date, and still be confronted by objects that do not reflect querying of user AND date.
as you saw in your example, you ended up with one object. This is due to the difference between
.filter(contact__date=X).filter(contact__user=Y)
and
.filter(contact__date=X, contact__user=Y)
comment:9 by , 8 years ago
I'm trying to use AND and OR as they work in SQL:
The AND operator displays a record if both the first condition AND the second condition are true.
The OR operator displays a record if either the first condition OR the second condition is true.
It seems like your usage of AND is the SQL OR. Rather than the second list filter criteria narrowing down the results of the first list filter criteria, you want to see objects that match all of the list filters that are selected. Is that correct?
comment:10 by , 8 years ago
I'm not really sure how you arrive at my usage being OR. In the case I am trying to describe, I still want all conditions to be true. But I want the application of those conditions to happen on one single join instead of separate joins. That is still SQL AND.
The proof is in the pudding, here is a link to the line number in my sample patch where I am applying a SQL AND https://github.com/yeago/django-1/commit/bccabbfcf443f0fac511599ccfdf166bb1697b6b#diff-ea7d8c684e252f3dad6aa458df7d3070R329
Hopefully with this we can set aside discussion of OR vs AND.
comment:11 by , 8 years ago
(Although while we are on that subject I could see many ways forward for features related to OR filtering if FilterSpecs used Q objects instead of chained filters)
comment:12 by , 8 years ago
I didn't see the fixtures in your sample project until a few comments ago. Using that data, could you give sample selected filters and the expected vs. actual results?
follow-up: 26 comment:13 by , 8 years ago
I think I understand the point you're trying to make Steve. Do the docs here represent the different behaviours you're trying to put across? https://docs.djangoproject.com/en/1.10/topics/db/queries/#spanning-multi-valued-relationships
Those docs primarily refer to multi-value relationships which I'm not certain is the use case you're trying to resolve. But to summarise:
.filter(multi__value__lookup=1, multi__value__other=2)
is equivalent to (MVL=1 AND MVO=2)
because the WHERE clause targets a single join.
SELECT * FROM model JOIN multi ON model.multi_id = multi.id JOIN value on multi.value_id = value.id WHERE value.MVL=1 AND value.MVO = 2
Where .filter(multi__value__lookup=1).filter(multi__value__other=2)
is logically equivalent to (MVL=1 OR MVO=2)
because MVL and MVO have separate joins, so they can both match independently since it's a multivalue relationship.
SELECT * FROM model JOIN multi ON model.multi_id = multi.id JOIN value on multi.value_id = value.id JOIN multi m2 ON model.multi_id = m2.id JOIN value v2 on multi.value_id = v2.id WHERE value.MVL=1 AND v2.MVO = 2
Does this correctly summarise the behaviour?
comment:14 by , 8 years ago
Josh: Yes.
Tim: there's only 3 records and I would argue that filtering for date of last year and staff B would more intuitively return no results, however, you can't ever get there.
comment:15 by , 8 years ago
In that case, I tentatively agree that targeting a single join is *usually* what you want. But having some way to make this decision is even better. I admit I don't really follow the patch that well, but some test cases might make your change more clear. Are you able to provide test cases that allow users to target a single join or chain to multiple? By users, I really mean admin developers.
Or does your change implement targeting a single join, only, allowing no customisation? If so, even though it's arguably a better design, it's a backwards incompatibility that someone is certainly relying on.
comment:16 by , 8 years ago
I kept my patch intentionally simple and didn't try to solve any of those other cases, but if there is some current wishlist or related tickets I can consider them in a patch and move forward. But I think a greater design decision should be made about this issue, though, because regardless of the past I don't think that leaving the logical OR is consistent or even intentionally used or realized by most implementors. The filter().filter() issue is something that trips up even more experienced Djangonauts.
Also, where would these tests go? I do see a tests module in the admin but I don't see anything related to these sorts of things in it.
comment:17 by , 8 years ago
Sorry, I had some trouble understanding the issue -- I guess the sample I was looking at didn't demonstrate the issue at first.
I'm not sure about the best way to resolve it. The proposed ModelAdmin.filter_q_behavior
attribute seems to expose internals of Django's ORM too much. Maybe trying to frame the solution as allowing a way for the admin user or the ModelAdmin
author to choose between AND and OR in the filter criteria as proposed in #18736 would be simpler for everyone to understand.
Tests might be in tests/admin_filters/
or some of the other admin_*
test packages.
comment:18 by , 8 years ago
The problem with an approach like that is that this is only superficially like an OR, but it's still an AND.
Can you explain what you mean a little more by exposing internals?
comment:19 by , 8 years ago
I think all the documented ModelAdmin
attributes are relatively straight forward. Q
is a Django ORM thing and if I didn't follow this ticket, I don't think it would be at all obvious or intuitive what filter_q_behavior
might mean or what effect it might have (maybe the idea is fine and only a better name is needed). Maybe you'd like to try the DevelopersMailingList to get some other input.
comment:20 by , 8 years ago
Oh well yes you had mentioned you wanted this to be backwards compatible or choosable. If people agreed that the proposed behavior is preferable then that option could go away after a deprecation schedule. The issue gets a few elaborate notices in the docs so it stands to reason an API option on this subject would at least have some reference material to back it up.
As for that other ticket, this patch would provide a few hooks for people to apply their Qs in complex ways. I think that simply attaching an AND or OR to each would quickly become too simplistic for most authors.
comment:21 by , 8 years ago
I'm not sure if I agree with Josh's feeling that someone is relying on this. I'd lean toward calling the current behavior a bug and just fixing it, if possible. I guess using list filter's across multiple relations to the same model might not be very common which is why this didn't surface until now.
comment:22 by , 8 years ago
Cc: | added |
---|
I wonder if we could introduce a new filterspec API that relies on the ''sticky filter'' behavior of the ORM for this purpose instead:
list_filters = [ ('multi__value', StickyFilter([ 'lookup', ('other', SomeFilterSpect()), ])) ]
The StickyFilter
(or whatever name we find for this object) would delegate filtering and rendering to its underlying filters but make sure to mark the queryset
object passed to it as having sticky filters (aliases are reused when a queryset is sticky) before calling each of its filters queryset(request, sticky_queryset)
methods.
comment:23 by , 8 years ago
That's a pretty terrifying API. It strikes me as a lot of work on the author's part just to arrive at a behavior that people wanted anyway. At that point I would say we should consider creating a nojoin_filter().nojoin_filter() and chain away. I also think that if we were going to make filters nested we should get something more out of that purchase, such as the ability to express things such as
Q(a=1)|Q((b=2|b=3|b=4))
which I am not sure this approach buys us (whereas an endeavoring developer could try their luck with the apply_filters() hook provided in the patch and I think that they would get there).
comment:24 by , 8 years ago
Cc: | removed |
---|
comment:25 by , 8 years ago
Has patch: | unset |
---|---|
Summary: | Allow FilterSpecs to return Q-likes → Selecting multiple admin list_filters across relations return results that don't match both filters |
Triage Stage: | Unreviewed → Accepted |
Type: | New feature → Bug |
comment:26 by , 8 years ago
I filed a ticket #27388 for this same issue because Dango Rest Framework depended on the api of chaining filters as an AND. As Josh Smeaton pointed out chained filters are logically OR'd statements. I think this is so counter to intuition that it is reasonable to break backwards compatibility in an upcoming release. I think the current api is likely to cause more pain in the future than the pain that would be caused by breaking existing code. If a function gets passed a queryset it cannot assume that it can call filter and gather the results, because of the nature of OR, it will include more than desired if another filter had been previously applied. Perhaps most importantly, applying multiple joins for the purpose of ORing can lead to exponentially slow queries, because successive joins can have multiplicative factors on the number of rows. This was the original reason I filed the ticket, the successive joins resulted in a search of 1million rows for a table with originally several thousand.
comment:27 by , 8 years ago
Cc: | added |
---|
comment:28 by , 7 years ago
Its pretty disappointing that all discussion on a ticket with a patch stops simply because a long-time contributor with some vague alternative assertions about how to approach the problem can't handle feedback.
comment:29 by , 7 years ago
I'm not sure what you're hoping to accomplish with that comment. Its meaning is vague and doesn't seem constructive. I haven't seen any responses to my suggestion in comment:21 to call the current behavior a bug and fix it. Perhaps it's a way to move forward.
comment:31 by , 7 years ago
Actually it looks like Connor seconded treating the current behavior as a bug, so I think the next step would be to provide a patch for evaluating that approach.
comment:33 by , 7 years ago
I see PR 7345, did I miss something else? My understanding is that patch adds a new feature rather than corrects the existing behavior.
comment:34 by , 7 years ago
Yeah, with more robust approaches you get both bugs solved and new features. Perhaps you could elaborate more about what kind of patch you are hoping to see? One where filterspec objects filter upon objects and return them, but that somehow doesn't chain ORs? That is fundamentally not how chained querying works in django. If you're arguing that some kind of hackery on the filtered queryset to merge the joins is more preferable, then please say so. I don't really think you understand the parameters of this problem. If you could help escalate this issue to someone with some knowledge of the plumbing at play here I would very much appreciate it. Maybe they could chime in with something useful and we could push this very crucial and old problem (which I have been contending with for 9 years through maintained forks) forward.
comment:35 by , 7 years ago
Maybe the bugfix is to change the search behavior to use the filter_q_behavior
approach. Does it break any existing tests? Can you add a new test to show that it corrects the query for spanning relations?
You can try bumping the thread on the mailing list if you need more guidance. I'm employed to review patches, but besides that, Django is a volunteer-driven project, so there's no escalation process. Some of the people who worked on the ORM are no longer active contributors.
One note is that this method won't play well with the current "naked" approach to adding relation items to list_display. Perhaps the implicit filters (such as django.contrib.admin.filters.RelatedFieldListFilter) could return a Q instead of a chained filter().