Opened 17 months ago

Closed 17 months ago

Last modified 17 months ago

#34728 closed Bug (duplicate)

OR operator on queryset does not work as expected

Reported by: Kbleser Owned by: nobody
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords: queryset OR
Cc: rene@…, Kbleser, Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I try to combine two querysets using the | operator. Even though one of the querysets isn't empty, the resulting queryset is empty.

Example:
similar_recipes = at_least_four_shared_ingredients | same_tag_and_at_least_two_shared_ingredients

same_tag_and_at_least_two_shared_ingredients contains entries but similar_recipes doesn't.

You can reproduce the problem here: https://github.com/Inglorious-Inge/Kochbuch2/tree/queryset_problem
The test in recipes/tests/test_models.py is failing.
The code under test is in recipes/models.py line 30-45.

Change History (15)

comment:1 by René Fleschenberg, 17 months ago

As you can see on the main branch of the repository linked above by the reporter, the query works as expected if it is rewritten to use Q objects:

        return Recipe.objects.annotate(
            shared_ingredients=Count("ingredients")
        ).filter(has_shared_ingredients).filter(
            Q(same_tag, shared_ingredients__gte=2)
            | Q(shared_ingredients__gte=4)
        ).exclude(id=self.id)

Thanks to CodenameTim on the Django Discord for helping to debug this!

comment:2 by Kbleser, 17 months ago

Cc: Kbleser added

comment:3 by Natalia Bidart, 17 months ago

Resolution: invalid
Status: newclosed

Following the latest comment and after confirming the test from the repo is no longer failing, I'll mark this bug as invalid since it was really a support request.

comment:4 by René Fleschenberg, 17 months ago

Resolution: invalid
Status: closednew

Natalia, I disagree. It is not a support request (the reporters problem was solved by using a different API), but a possible bug in the ORM. Combining two querysets with | where at least one of them is non-empty should not result in an empty queryset, IMO. If it can (when?), we should probably document this.

It also seems to me that the behaviour does not match the note in the docs about equivalency with Q objects?

If we can't agree here on whether this is a bug or not, should we start an ML discussion?

BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs. This is probably outside of the scope of this ticket, but I thought it's worth mentioning here (maybe we can fix both issues in one go).

in reply to:  4 ; comment:5 by Mariusz Felisiak, 17 months ago

Replying to René Fleschenberg:

BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs. This is probably outside of the scope of this ticket, but I thought it's worth mentioning here (maybe we can fix both issues in one go).

Queries may not be the same according to docs, check out #33319:

"| is not a commutative operation, as different (though equivalent) queries may be generated."

in reply to:  5 comment:6 by René Fleschenberg, 17 months ago

Replying to Mariusz Felisiak:

Replying to René Fleschenberg:

BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs. This is probably outside of the scope of this ticket, but I thought it's worth mentioning here (maybe we can fix both issues in one go).

Queries may not be the same according to docs, check out #33319:

"| is not a commutative operation, as different (though equivalent) queries may be generated."

I understand "equivalent" as "they will return the same set of results". Is that not what the docs mean here?

in reply to:  4 ; comment:7 by Mariusz Felisiak, 17 months ago

Replying to René Fleschenberg:

Natalia, I disagree. It is not a support request (the reporters problem was solved by using a different API), but a possible bug in the ORM. Combining two querysets with | where at least one of them is non-empty should not result in an empty queryset, IMO. If it can (when?), we should probably document this.

It also seems to me that the behaviour does not match the note in the docs about equivalency with Q objects?

Queryset that you proposed is not the same query that user used in a sample project, I'm not sure how the ORM could figure it out from:

Recipe.objects.filter(
    Q(ingredients__in=self.ingredients.all()
).annotate(
    shared_ingredients=Count("ingredients")
).filter(shared_ingredients__gte=4)

|

Recipe.objects.filter(
    Q(tags__in=self.tags.all()) & Q(ingredients__in=self.ingredients.all()
).annotate(
    shared_ingredients=Count("ingredients")
).filter(shared_ingredients__gte=2)

Docs only describe how the OR operator works by rule. The ORM cannot analyze user queries and rewrite them.

in reply to:  7 ; comment:8 by René Fleschenberg, 17 months ago

Replying to Mariusz Felisiak:

Docs only describe how the OR operator works by rule. The ORM cannot analyze user queries and rewrite them.

I understand that this is probably not easy (maybe not possible) for the ORM to solve. Maybe I am wrong here, but I think you would expect qs1 | qs2 to return the same set of results as set(qs1) | set(qs2)? Not sure what to do about it, though.

in reply to:  8 comment:9 by Mariusz Felisiak, 17 months ago

Replying to René Fleschenberg:

I understand that this is probably not easy (maybe not possible) for the ORM to solve. Maybe I am wrong here, but I think you would expect qs1 | qs2 to return the same set of results as set(qs1) | set(qs2)? Not sure what to do about it, though.

For this we'd have to use UNION instead of combining filters. Queryset from the ticket description is really complicated (from the structure point of view) with overlap filters, aggregations, filters on aggregated values, etc. My point is that you cannot describe it as Model.filter(x) | Model.filter(y) as it contains aggregations and filters on them. Docs don't precise what would happen when you use the OR operator on Model.filter(x1).annotate(aggregation).filter(x2) | Model.filter(y1).annotate(aggregation).filter(y2). If you want to merge results of two independent querysets you should use .union().

comment:10 by Natalia Bidart, 17 months ago

Resolution: needsinfo
Status: newclosed

I'm taking a look at this again. For the sake of clarity, René or the reporter, could you please paste here the test method (or any other needed diff) that produce a failure? I've cloned the repo and played a bit with the existing find_similar_recipes and I'm not being able to reproduce the issue. Thanks!

Regarding:

If we can't agree here on whether this is a bug or not, should we start an ML discussion?

I think this may be the best path forward!

BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs.

Do you have an example or link to the discussion at hand? Perhaps we could clarify in the docs what "equivalent" mean, in that, given that | is not commutative, it may be the case that the generated SQL return different results but still equivalent, no? :thinking:

in reply to:  10 ; comment:11 by René Fleschenberg, 17 months ago

Hi,

Replying to Natalia Bidart:

I'm taking a look at this again. For the sake of clarity, René or the reporter, could you please paste here the test method (or any other needed diff) that produce a failure? I've cloned the repo and played a bit with the existing find_similar_recipes and I'm not being able to reproduce the issue. Thanks!

I suspect you looked at the main branch, which doesn't show the problem :) Sorry, I should have explicitly mentioned the branch here. If you checkout the branch queryset_problem and run the tests, you should see that the test in recipes/tests/test_models.py is failing: https://github.com/Inglorious-Inge/Kochbuch2/blob/queryset_problem/recipes/tests/test_models.py

(Not pasting the test setup here for the sake of brevity).

    def test_find_similar_recipes(self):
        similar_recipes = self.recipe.find_similar_recipes()
        self.assertEquals(len(similar_recipes), 1)

The reason is that in recipes/models.py on line 39 (https://github.com/Inglorious-Inge/Kochbuch2/blob/queryset_problem/recipes/models.py#L39C4-L39C106), at_least_four_shared_ingredients | same_tag_and_at_least_two_shared_ingredients results in a queryset that turns out empty when evaluated. If you use a debugger to take a look at the two querysets being ORed together, you should see that same_tag_and_at_least_two_shared_ingredients by itself evaluates to a non-empty queryset.

BTW, on Discord, there has been another report about qs1 | qs2 and qs2 | qs1 not being equivalent, which they should be according to the docs.

Do you have an example or link to the discussion at hand? Perhaps we could clarify in the docs what "equivalent" mean, in that, given that | is not commutative, it may be the case that the generated SQL return different results but still equivalent, no? :thinking:

I really don't know what "equivalent" would mean in regards to SQL queries if they do not return the same set of rows :) I suspect we thought so far that equivalency would be guaranteed but it is actually not (if the report on Discord is correct). I'll try to find the discussion on Discord and open a separate ticket for this, since I think it is a separate different issue.

Thanks!

comment:12 by René Fleschenberg, 17 months ago

Just for the record, the Discord discussion is here: https://discord.com/channels/856567261900832808/857642132423704577/1126286062642266223

I'll try to contact isik-kaplan and ask them if they already filed a ticket, or if they can provide a reproducible example.

in reply to:  11 comment:13 by Natalia Bidart, 17 months ago

Cc: Simon Charette added

Replying to René Fleschenberg:

I suspect you looked at the main branch, which doesn't show the problem :) Sorry, I should have explicitly mentioned the branch here. If you checkout the branch queryset_problem and run the tests, you should see that the test in recipes/tests/test_models.py is failing: https://github.com/Inglorious-Inge/Kochbuch2/blob/queryset_problem/recipes/tests/test_models.py

Excellent! I was able to reproduce the failure as reported. What I also noticed is that without the annotation, the OR behaves as expected (though I understand the goal of the annotation in this case).

I've been trying to debug further by re-writting the queries in find_similar_recipes, and then checking results and the generated SQL. I can't detect anything obviously wrong so my next step would be to craft the SQL I would use if I'd be building this by hand, but I ran out of time today.

I'll CC Simon in this ticket, I believe they may have a clearer view of what's expected/supported and what's not. Personally, I do find the outcome a bit unexpected, so I share the feeling, but the experts will know more.

comment:14 by Simon Charette, 17 months ago

TL;DR not an issue with queryset combination but another ticket that relates to how the ORM deals with aggregation when more than one multi-valued relationship is involved. Either invalid or duplicate to me.

---

As you can see on the main branch of the repository linked above by the reporter, the query works as expected if it is rewritten to use Q objects

This happens to work because you perform the aggregation before filtering the multi-valued relationship which prevents JOIN reuse as documented and results in you joining twice against against the same multi-valued relationship as well as another one (tags)

SELECT "recipes_recipe"."id",
       "recipes_recipe"."title",
       "recipes_recipe"."date_posted",
       "recipes_recipe"."created_by_id",
       "recipes_recipe"."image",
       "recipes_recipe"."preparation_time_in_minutes",
       "recipes_recipe"."instructions",
       "recipes_recipe"."level",
       "recipes_recipe"."serving_size",
       COUNT("recipes_ingredienttorecipe"."ingredient_id_id") AS "shared_ingredients"
FROM "recipes_recipe"
LEFT OUTER JOIN "recipes_ingredienttorecipe" ON ("recipes_recipe"."id" = "recipes_ingredienttorecipe"."recipe_id_id")
INNER JOIN "recipes_ingredienttorecipe" T4 ON ("recipes_recipe"."id" = T4."recipe_id_id")
LEFT OUTER JOIN "recipes_tagtorecipe" ON ("recipes_recipe"."id" = "recipes_tagtorecipe"."recipe_id_id")
WHERE (T4."ingredient_id_id" IN
         (SELECT U0."id"
          FROM "recipes_ingredient" U0
          INNER JOIN "recipes_ingredienttorecipe" U1 ON (U0."id" = U1."ingredient_id_id")
          WHERE U1."recipe_id_id" = 1)
       AND NOT ("recipes_recipe"."id" = 1))
GROUP BY "recipes_recipe"."id",
         "recipes_recipe"."title",
         "recipes_recipe"."date_posted",
         "recipes_recipe"."created_by_id",
         "recipes_recipe"."image",
         "recipes_recipe"."preparation_time_in_minutes",
         "recipes_recipe"."instructions",
         "recipes_recipe"."level",
         "recipes_recipe"."serving_size",
         "recipes_tagtorecipe"."tag_id_id"
HAVING (("recipes_tagtorecipe"."tag_id_id" IN
           (SELECT U0."id"
            FROM "recipes_tag" U0
            INNER JOIN "recipes_tagtorecipe" U1 ON (U0."id" = U1."tag_id_id")
            WHERE U1."recipe_id_id" = 1)
         AND COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 2)
        OR COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 4);

Notice how recipes_ingredienttorecipe is joined twice which means COUNT("recipes_ingredienttorecipe"."ingredient_id_id") AS "shared_ingredients" will actually be the COUNT of the product of rows 2X2=4 (see #10060). In other words, aggregate annotations when many multi-valued relationships are involved are broken and have been for while and in your case it's the COUNT("recipes_ingredienttorecipe"."ingredient_id_id") >= 4) that is allowing your test to pass on main. The fact this problems manifests itself in queryset combination when filtering against broken aggregate annotation is just a symptom of a much larger problem.

If you define one of your aggregate annotations as a subquery to avoid joining more than one multi-valued relationship at once (#28296) things properly work

def find_similar_recipes(self):
    return (
        Recipe.objects.annotate(
            shared_ingredients_cnt=Count(
                "ingredients",
                filter=Q(ingredients__in=self.ingredients.all()),
            ),
            # Using a subquery to target the _other_ multi-valued relationship avoids X product of rows
            # with ingredients
            shares_tag=Exists(
                TagToRecipe.objects.filter(
                    tag_id__in=self.tags.all(),
                    recipe_id=OuterRef("pk"),
                )
            ),
        )
        .filter(
            Q(shared_ingredients_cnt__gte=4)
            | Q(shares_tag=True, shared_ingredients_cnt__gte=2)
        )
        .exclude(id=self.id)
    )

Which can also use queryset combination without issues

def find_similar_recipes(self):
    base = Recipe.objects.annotate(
        shared_ingredients_cnt=Count(
            "ingredients",
            filter=Q(ingredients__in=self.ingredients.all()),
        )
    ).exclude(id=self.id)
    same_tag_and_at_least_two_shared_ingredients = base.annotate(
        shares_tag=Exists(
            TagToRecipe.objects.filter(
                tag_id__in=self.tags.all(),
                recipe_id=OuterRef("pk"),
            )
        )
    ).filter(shares_tag=True, shared_ingredients_cnt__gte=2)
    at_least_four_shared_ingredients = base.filter(shared_ingredients_cnt__gte=4)
    return (
        same_tag_and_at_least_two_shared_ingredients | at_least_four_shared_ingredients
    )
Last edited 17 months ago by Simon Charette (previous) (diff)

comment:15 by Natalia Bidart, 17 months ago

Resolution: needsinfoduplicate

Thank you Simon for the detailed and clear explanation. I think this is more accurately closed as duplicate of #10060.

Note: See TracTickets for help on using tickets.
Back to Top