Opened 3 years ago
Closed 3 years ago
#33706 closed Cleanup/optimization (needsinfo)
Implement subquery annotations as (lateral) joins
Reported by: | Bálint Balina | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | annotate, performance, subquery, postgresql |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
The ORM annotations are a very good and flexible tool for dynamic data. However there is a huge performance loss because of the way they are implemented. See the following example:
Product.objects.annotate( on_stock=Subquery( Stock.objects.filter( product_id=OuterRef('id'), ).values('product_id').annotate( on_stock=Sum('quantity'), ).values('on_stock').order_by(), output_field=DecimalField(), ), outgoing=Subquery( OrderItem.objects.filter( product_id=OuterRef('id'), ).values('product_id').annotate( outgoing=Sum('quantity'), ).values('outgoing').order_by(), output_field=DecimalField(), ), available=F('on_stock') - F('outgoing'), ).filter( available__gt=0, ).values('id', 'on_stock', 'outgoing', 'available')
When aggregates are needed from multiple tables annotation can be used as subqueries. Annotations can also be filtered against and reused. However django implements this query with the following SQL:
SELECT "product"."id", (SELECT SUM(u0."quantity") AS "on_stock" FROM "stock" u0 WHERE ( u0."product_id" = ("product"."id")) GROUP BY u0."product_id") AS "on_stock", (SELECT SUM(u0."quantity") AS "outgoing" FROM "order_item" u0 WHERE u0."product_id" = ("product"."id") GROUP BY u0."product_id") AS "outgoing", ((SELECT SUM(u0."quantity") AS "on_stock" FROM "stock" u0 WHERE (u0."product_id" = ("product"."id")) GROUP BY u0."product_id") - (SELECT SUM(u0."quantity") AS "outgoing" FROM "order_item" u0 WHERE u0."product_id" = ("product"."id") GROUP BY u0."product_id")) AS "available" FROM "product" WHERE (((SELECT SUM(u0."quantity") AS "on_stock" FROM "stock" u0 WHERE (u0."product_id" = ("product"."id")) GROUP BY u0."product_id") - (SELECT SUM(u0."quantity") AS "outgoing" FROM "order_item" u0 WHERE u0."product_id" = ("product"."id") GROUP BY u0."product_id")) > 0)
Both the on_stock and outgoing annotations are inlined 3 times and calculated 3 times by the database, which is 6 subquery instead of 2.
See the following PostgreSQL query adjusted by hand:
SELECT "product"."id", "on_stock", "outgoing", "on_stock" - "outgoing" as available FROM "product" , LATERAL (SELECT SUM(u0."quantity") AS "on_stock" FROM "stock" u0 WHERE (u0."product_id" = ("product"."id")) GROUP BY u0."product_id") AS "on_stock" , LATERAL (SELECT SUM(u0."quantity") AS "outgoing" FROM "order_item" u0 WHERE u0."product_id" = ("product"."id") GROUP BY u0."product_id") AS "outgoing" WHERE (("on_stock" - "outgoing") > 0)
The exact same subquery can be joined with the lateral keyword, producing the exact same results, but reusing calculations. The performance benefit is huge, it can be hundreds of milliseconds even for this simple query, even more so the more complicated and reused these calculations are.
I like the way annotate works, there are many small pieces to put together to get a complex result. I'm planning to look into implementing this behaviour, but wanted to check first about your opinions:
- do you think it is possible to patch the ORM to generate sql like above?
- do you see any pitfalls or caveats to this method of performing the query?
- if there are no pitfalls, or they can be resolved do you think that this could be the default way of annotating calculations in django? The examples are in postgresql, not sure about the other supported database engines compatibility.
Some links:
Change History (2)
comment:1 by , 3 years ago
Description: | modified (diff) |
---|
comment:2 by , 3 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Summary: | Implement annotations as (lateral) joins → Implement subquery annotations as (lateral) joins |
To answer your questions I believe it would be doable to adjust the SQL compilers to have subquery annotations to rely on lateral joins but I'm not sure doing it by default on all engines that support it will be beneficial given the amount of work necessary to achieve it.
Some database engine will optimize away computed expressions with the same identity if they are specified multiple times. The article you linked provides no proof that this is actually beneficial in anyway beyond SQL DRY-ness (no
EXPLAIN
output, no benchmarks) which isn't the ORM's main focus. The SO article is also barely related to this issue, sure it hacks into the compiler internals to use aLATERAL
join but it has nothing to do with subquery annotation handling.This ticket tracker is used to manage accepted and confirmed issues, I suggest you use the developer mailing list to request feedback on this feature request. I suggest providing proof about claimed performance benefits and details about implementation database engines that Django currently supports to gather positive traction for your proposal. Without them it's unlikely that work that specialize the way subquery annotation works and complexify the SQL compiler will be accepted. Searching this ticket tracker, mailing lists, and Github for keywords such as
subquery
andlateral
might also help you gather context on the subject.I'll close this issue in the mean time as further discussion and investigation is required before we can accept this work.