Opened 13 years ago
Closed 12 years ago
#17600 closed Bug (fixed)
Error in encapsulates filters (Q)
Reported by: | Pablo Martín | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | anssi.kaariainen@…, Gabe Jackson | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If I have the next code:
class A(models.Model): ..... class B(models.Model): a = models.ManyToManyField(A)
The next queries get differents results:
B.objects.exclude(a__in=[7]) from django.db.models import Q B.objects.exclude(Q(a__in=[7]))
Results:
The first query get all objects excluding the "b objects" with a=7. It's Ok
But the second query get all objects excluding the "b objects" with a=7 or a=None.
Is it an error?, Is it known?
I add a verbose example, execute the next code
from django.contrib.auth.models import User, Group u1 = User.objects.create(username='u1') u2 = User.objects.create(username='u2') u3 = User.objects.create(username='u3') g1 = Group.objects.create(name='g1') g2 = Group.objects.create(name='g2') u1.groups.add(g1) u2.groups.add(g2) print User.objects.exclude(groups__in=[g1.pk]) print User.objects.exclude(Q(groups__in=[g1.pk]))
I have seen the documentation and the tests and I didn't see any note or reference
Attachments (2)
Change History (31)
comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|---|
Type: | Uncategorized → Bug |
comment:2 by , 13 years ago
comment:3 by , 13 years ago
I research the problem. And these queries are different:
User.objects.exclude(groups__in=[g1.pk]) User.objects.exclude(Q(groups__in=[g1.pk]))
But, however these are equals:
User.objects.exclude(groups__in=[g1.pk]) User.objects.filter(~Q(groups__in=[g1.pk])) User.objects.exclude(~~Q(groups__in=[g1.pk]))
comment:4 by , 13 years ago
The error is very difficult to explain. But the problem is when the [browser:django/trunk/django/db/models/query.py#L637 Q node is negated automatically], because of an exclude, and this receives a Q node. Here is an example:
User.objects.exclude(Q(groups__in=[g1.pk]))
The problem is in the [browser:django/trunk/django/db/models/sql/query.py#L1252 code]. In this line {{ q_object.negated }} is False
but it should be True
. The parent object of {{ q_object }} is the one that is negated.
If you have something like this:
User.objects.exclude(~~Q(groups__in=[g1.pk]))
This works because the last Q node is negated.
comment:5 by , 13 years ago
Has patch: | set |
---|
follow-up: 8 comment:7 by , 13 years ago
Cc: | added |
---|
It seems wrong that you need to do the negate_last_node thing (a better name: negate_leaf_nodes?). I think there are some deeper bugs to fix here. Mainly, it seems the query code works only if the leaf nodes of the tree are negated. It should work with any kind of boolean tree, or it should be documented and asserted in the code that only leaf-negated trees are allowed. In my opinion fixing the immediate issue is not the right thing to do here.
comment:8 by , 13 years ago
Replying to akaariai:
It seems wrong that you need to do the negate_last_node thing (a better name: negate_leaf_nodes?).
Of course, I didn't explain. My patch is a temporal solution to me, and something for you to understand better the problem, only this.
At reference, your "better name" is wrong. These nodes are not leaf. They have a children, but these are not instance of tree.Node. But it is true, I forgot a "s".
I think there are some deeper bugs to fix here. Mainly, it seems the query code works only if the leaf nodes of the tree are negated. It should work with any kind of boolean tree, or it should be documented and asserted in the code that only leaf-negated trees are allowed. In my opinion fixing the immediate issue is not the right thing to do here.
I think this too. I think that to fix this ticket we need someone core developer.
comment:9 by , 13 years ago
Replying to framos:
Have you brought this up in django-developers?
The monday I send a e-mail to django-developers.
comment:10 by , 13 years ago
Has patch: | unset |
---|
If your patch is not intended as a fix to the problem, I think it is better to remove the "has patch" mark -- and so I did. Feel free to correct me, of course.
I think the best step forward, short of a core developer getting involved, is to add a patch that just has tests to demonstrate the problem.
follow-up: 12 comment:11 by , 13 years ago
I think this one should rest until 1.4 is out. Fixing this before that seems unlikely. Or, if this is fixed, the fix must be something like what is done by pmartin.
I think the real fix here would include some refactoring of the WhereNode class (mainly, there is something weird going on with the .negate() and start_subtree() business). Then refactor sql/query.py add_q (it creates pretty stupid boolean trees on certain inputs), and finally check what broke and how to _then_ fix the issue in this ticket.
There are some related tickets already: #17025 is about where node refactoring. It maybe goes a bit too far, but I think it shows the potential of code cleanup and speed improvement available. Then there is #17000 which is about add_q refactoring.
comment:12 by , 13 years ago
Replying to akaariai:
I think the same. Thanks akaariai, your comment is perfect. Good work!
I sent this ticket to django-developers, to try that a core developer are interested. Subject: "A important bug in queryset"
comment:13 by , 12 years ago
I added some tests cases to demonstrate these issues. I ran in to the same issue today as I was looking to make a query to return Orders having OrderItems with a certain status only.
Perhaps somebody could check the test cases.
by , 12 years ago
Attachment: | exclude-tests-ticket17600.diff added |
---|
Test Cases for Q, ~Q and exclude bugs
comment:14 by , 12 years ago
Cc: | added |
---|
comment:15 by , 12 years ago
Has patch: | set |
---|---|
Triage Stage: | Design decision needed → Accepted |
I believe the code in this branch could fix this ticket: https://github.com/akaariai/django/tree/refactor_utils_tree
While the branch is named refactor_utils_tree it actually refactors query.qs.add_q() too. The branch contains some test cases, but it would be interesting to see how the branch works with your tests. If you are able to do the testing that would be valuable. If not, I will try to find time to do it myself.
Im marking this as accepted - to me it is obvious there is a bug here, and one which is fixable with reasonable amount of work.
comment:16 by , 12 years ago
I'm trying to check if the refactor_utils_tree solves this issue.
To me it seems that test_only_orders_with_all_items_having_status_1() is bogus - it should return both Order: 1 and Order: 2. I can't see where the restriction is done that _all_ items must have status 1?
The two tests doing assertEqual(qs1, qs2) do not work correctly - while the querysets do not implement __eq__
in a way that would be usable here, so the correct test is assertEqual(list(qs1), list(qs2)). If this is done, these tests pass.
Apart of that all tests pass, and I think the patch actually solves this issue. I am wondering if these tests should be added to the patch after the above corrections, and if modeltests/excludes is the right place. I might put them into regressiontests/queries, as that is what I usually run, and contains a lot of similar stuff already.
comment:17 by , 12 years ago
perfect. I agree with your comments on the test cases. they also don't need to be merged if they don't test anything not already in the queries test cases. One question still remains though, I was in fact trying to use the the exclude and ~Q together to enforce that all items must have status 1. This doesn't seem possible without raw sql? If it is, then we should definitely have 1) A test-case for it 2) Documentation. I'll write the Documentation if you are aware how this should work with the ORM.
comment:19 by , 12 years ago
Hmmh, seems like my interpretation of the .exclude(~Q(items__status=1))
was incorrect. It seems this should behave similarly to qs.exclude(items__in=Item.objects.filter(~Q(status=1)))
. That is, remove every Order having at least one item with different status than 1. And this again is equivalent to having all items status as 1 (including the case of no items at all).
The query the refactor_utils_tree generates for the exclude(~Q()) is this: WHERE NOT (NOT (id IN (SELECT order_id .. WHERE status = 1)), while the exclude(items__in)
generates WHERE NOT (id IN (SELECT order_id .. WHERE NOT status = 1 AND order_id IS NOT NULL)). Thus, the second NOT should be pushed down to the subquery.
I don't see this as a must fix issue for refactor_utils_tree, but I will check if this happens to be easy to fix. Another opinion if the interpretation of the nested exclude is correct would be welcome. These things tend to get somewhat hard to parse...
comment:20 by , 12 years ago
It seems this isn't easy to fix.
First, my previous comment is bogus - the items__in
query actually generates two subqueries, the generated queries are:
the items__in one: SELECT DISTINCT "excludes_order"."id" FROM "excludes_order" WHERE NOT ("excludes_order"."id" IN (SELECT U1."order_id" FROM "excludes_orderitem" U1 WHERE (U1."id" IN (SELECT U0."id" FROM "excludes_orderitem" U0 WHERE NOT (U0."status" = 1 )) AND U1."order_id" IS NOT NULL))) ORDER BY "excludes_order"."id" ASC and the erroneous current one: SELECT DISTINCT "excludes_order"."id" FROM "excludes_order" WHERE NOT (NOT ("excludes_order"."id" IN (SELECT U1."order_id" FROM "excludes_orderitem" U1 WHERE (U1."status" = 1 AND U1."order_id" IS NOT NULL)))) ORDER BY "excludes_order"."id" ASC
It is somewhat hard to generate the correct query, as when the need for the subquery is seen, the NOT is already added to the tree. This will be easier to handle if the add_q logic is arranged in a way where the lookups are checked for multijoins before beginning to add them into the query, so that one can push the NOT into the subquery. Still, I don't believe this to be easy to fix even then, as the subquery generation needs added logic to support nested subqueries.
So, I think this test should be added to regressiontests/queries as an @expectedFailure.
follow-up: 22 comment:21 by , 12 years ago
I found a workaround to make the ORM generate the correct query, by negating all the Q
objects. This is my original (buggy) code
nearby = Q(route__isnull=True, stations__name=station.name) if station.place: nearby |= Q(route__isnull=False, route__dwithin=(station.place, MAX_WALKING_DISTANCE)) self.exclude(nearby).distinct()
and this is the code after the change
nearby = ~~Q(route__isnull=True, stations__name=station.name) if station.place: nearby |= ~~Q(route__isnull=False, route__dwithin=(station.place, MAX_WALKING_DISTANCE)) self.exclude(nearby).distinct()
comment:22 by , 12 years ago
comment:23 by , 12 years ago
The https://github.com/akaariai/django/tree/refactor_utils_tree branch contains now the tests for this ticket. Everything works, except the double-subquery tests, which is added as an expected failure into the suite.
comment:25 by , 12 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I have rebased the https://github.com/akaariai/django/tree/refactor_utils_tree and done some additions, too. The biggest one is that we now have build_filter() instead of add_filter() which will create a WhereNode object ready to be added to the tree. By doing this the wherenode/havingnode additions suddenly happen naturally in the _add_q without the need to pass the target clause down to add_filter. I like this way.
There is also a new add_filter() which is just self.where.add(self.build_filter(filterexpr), AND) - that is, a little helper which is used in various places of the ORM.
comment:26 by , 12 years ago
#13198 is related (if not duplicate). Tests from that ticket should be added at least. I reopened that ticket so that adding the tests will not be forgotten.
comment:27 by , 12 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
The patch isn't good enough yet. Mainly, the in_negated flag for build_filter() doesn't work properly. It is used for two things, first for answering if the filter needs subquery, and in this case the answer is yes if the three contains negated=True at any point, and also for IS NULL checking, and in this case the answer comes from the current state of the negation, that is the IS NULL check is needed only if the path contains odd amount of NOTs, otherwise they will cancel each other.
In addition I have been playing back-and-forth with add_filter() and build_filter(). The build_filter() naming isn't good, it alters the query's joins but the name suggests it just builds the filter condition. So, for that reason too this patch needs still improvement.
comment:28 by , 12 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
I have added another version of the patch, available from https://github.com/akaariai/django/tree/refactor_utils_tree_2.
The add_q -> _add_q + all the code related to doing HAVING splits is still ugly... But it is also working more correctly than current code, and fixing at least 5 tickets.
The underlying problem is that the logic *is* somewhat ugly. The having splits must be done for any subtree starting with OR and containing an aggregate somewhere in the subtree. Also, the negated status of the tree must be appended to the subtree. Still, the references to the aggregates can be in F() expressions, and these nest, too...
The patches approach is to split the having parts before adding the filters. The current approach is to build the having and the where trees in parallel, but that doesn't work nicely. There is still an approach where the having parts are split out in compiler stage, and that might work nicely. Other ideas are welcome...
So, I think I will just commit the code and then see if there is some way to make the changes still cleaner.
comment:29 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Seems like Trac didn't pick up the commit. So, this was fixed in d3f00bd5706b35961390d3814dd7e322ead3a9a3.
The code is still somewhat ugly (the Q() split to HAVING and WHERE parts in particular). I am playing around with the idea of doing the WHERE/HAVING split afterwards, at compile time. This might be cleaner.
Obviously it is a nosense use Q when you only have one lookup. It's possible that the example would be more clarifier if you change the last query for this:
This should return :
But return: