Opened 11 months ago

Closed 11 months ago

Last modified 11 months ago

#35256 closed Uncategorized (duplicate)

Strange/duplicate join clauses on queries with filter and annotations

Reported by: Bálint Balina Owned by: nobody
Component: Database layer (models, ORM) Version: 4.1
Severity: Normal Keywords: annotate filter join
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The filtering/annotation on the same data behaves differently based on the order of operations.

See the below example models, code, and generate sql:

class PriceList(models.Model):
    id = models.IntegerField(...)
    customer = models.ForeignKey(...)

class PriceListItem(models.Model):
    id = models.IntegerField(...)
    price_list = models.ForeignKey("PriceList", models.CASCADE)
    product = models.ForeignKey("Product", models.PROTECT, related_name="price_list_items")
    price = models.DecimalField(...)

class Product(models.Model):
    id = models.IntegerField(...)
    sku = models.CharField(...)

# retrieve all products, with prices related to a certain customer. 
# The price should be null, if it is not specified for the customer, but the product should still be retrieved.

q1 = Product.objects.annotate(price=F("price_list_items__price")).filter(Q(price_list_items__price_list__isnull=True) | Q(price_list_items__price_list__customer_id=1)).values("id", "price")

# sql (WRONG): 
"""
SELECT "product"."id", t4."price" 
FROM "product" 
LEFT OUTER JOIN "price_list_item" ON ("product"."id" = "price_list_item"."product_id") 
LEFT OUTER JOIN "price_list" ON ("price_list_item"."price_list_id" = "price_list"."id") 
LEFT OUTER JOIN "price_list_item" t4 ON ("product"."id" = t4."product_id") 
LEFT OUTER JOIN "price_list" t5 ON (t4."price_list_id" = t5."id") 
WHERE (t4."price_list_id" IS NULL OR  t5."customer_id" = 1))
"""

q2 = Product.objects.filter(Q(price_list_items__price_list__isnull=True) | Q(price_list_items__price_list__customer_id=1)).annotate(price=F("price_list_items__price")).values("id", "price")

# sql: 
"""
SELECT "product"."id", price_list_item."price" 
FROM "product" 
LEFT OUTER JOIN "price_list_item" ON ("product"."id" = "price_list_item"."product_id") 
LEFT OUTER JOIN "price_list" ON ("price_list_item"."price_list_id" = "price_list"."id") 
WHERE (price_list_item."price_list_id" IS NULL OR  price_list."customer_id" = 1))
"""

I would expect the 2nd output from both queries. This caused some quite hard-to-detect bugs for us.

It works as expected for ForignKeys, e.g. the other direction:

q3 = PriceListItem.objects.annotate(sku=F("product__sku")).filter(product__sku__icontains="x").values("id", "sku")

# sql
"""
SELECT "price_list_item"."id", "product"."sku" AS "sku" FROM "price_list_item" INNER JOIN "product" ON ("price_list_item"."product_id" = "product"."id") WHERE (UPPER("product"."sku"::text) LIKE UPPER('%x%'))
"""

q4 = PriceListItem.objects.filter(product__sku__icontains="x").annotate(sku=F("product__sku")).values("id", "sku")

# sql
"""
SELECT "price_list_item"."id", "product"."sku" AS "sku" FROM "price_list_item" INNER JOIN "product" ON ("price_list_item"."product_id" = "product"."id") WHERE (UPPER("product"."sku"::text) LIKE UPPER('%x%'))
"""


To Summarize:
Filtering on a reverse relation does not reuse joins, if it was already declared in annotations, but adds more joins causing invalid queires because the extra joins do not have proper filtering.

The ORM generates this:

SELECT "product"."id", t4."price" 
FROM "product" 
LEFT OUTER JOIN "price_list_item" ON ("product"."id" = "price_list_item"."product_id") 
LEFT OUTER JOIN "price_list" ON ("price_list_item"."price_list_id" = "price_list"."id") 
LEFT OUTER JOIN "price_list_item" t4 ON ("product"."id" = t4."product_id") 
LEFT OUTER JOIN "price_list" t5 ON (t4."price_list_id" = t5."id") 
WHERE (t4."price_list_id" IS NULL OR  t5."customer_id" = 1))

Instead of this:

SELECT "product"."id", price_list_item."price" 
FROM "product" 
LEFT OUTER JOIN "price_list_item" ON ("product"."id" = "price_list_item"."product_id") 
LEFT OUTER JOIN "price_list" ON ("price_list_item"."price_list_id" = "price_list"."id") 
WHERE (price_list_item."price_list_id" IS NULL OR  price_list."customer_id" = 1))

Tested on django 4.1.7

Change History (1)

comment:1 by Simon Charette, 11 months ago

Resolution: duplicate
Status: newclosed

Pretty sure this is a duplicate of #15049 without the aggregation part.

The TL;DR is that filter won't reuse existing JOINs but annotate will. Changing it now is non trivial as there are legitimate reasons for not doing so which makes be believe we need an optin mechanism when doing either about JOIN reuse.

The reason why this happens is the following. When the annotation is made before it refers to the fullset of objects as filter has not been called yet and thus filter lack of JOIN reuse avoids changing what price refers to as it might have been referenced by other expressions since then. In the scenario the relationship is already filtered so by the point the annotation is made it can only refer to the filtered relationship. Whether or not this is intuitive is debatable but changing the ORM to do one over the other without option would be backward incompatible.

Last edited 11 months ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top