#29577 closed Cleanup/optimization (duplicate)
Generated SQL in PostGres for exists filter does Exists(...)=true. Having just Exists(...) without "true" is faster by nearly 50% in many of my queries
Reported by: | Michael Chiciak | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 2.0 |
Severity: | Normal | Keywords: | slow query |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I am trying to optimize my exists filters, since on large record sets I noticed some slowness. I have narrowed down part of it to the "= true" portion of the "Exists(...)" subquery. Removing the "= true" gives a noticeable performance boost
Also, it would be really nice to be able to just filter and not have a field inside the select portion of the query. When I use the Paginator class for pagination, it turns the count query into one big subquery and adds all my Exists filters as select fields and thus performs a Group BY on each of them, drastically slowing down my count query. Only having the Exists in the where clause would be very nice. Is there a way to do this already?
Sample django code
fits_vehicle = ProductFitment.objects.filter_vehicle(2008, 2013, 'Chevrolet', 'Corvette').filter(product=OuterRef('pk')).only("pk") queryset = queryset.annotate(fits_vehicle=Exists(fits_vehicle)).filter(fits_vehicle=True) # With exists = true 500ms SELECT COUNT(*) FROM "aces_pies_data_product" WHERE EXISTS(SELECT 1 FROM "aces_pies_data_productfitment" U0 INNER JOIN "django_vehiclefitment_vehicle" U1 ON (U0."vehicle_id" = U1."id") INNER JOIN "django_vehiclefitment_vehiclemake" U2 ON (U1."make_id" = U2."id") INNER JOIN "django_vehiclefitment_vehiclemodel" U3 ON (U1."model_id" = U3."id") WHERE (U2."name" = 'Chevrolet' AND U3."name" = 'Corvette' AND (2008 BETWEEN U0."start_year" AND U0."end_year" OR 2009 BETWEEN U0."start_year" AND U0."end_year" OR 2010 BETWEEN U0."start_year" AND U0."end_year" OR 2011 BETWEEN U0."start_year" AND U0."end_year" OR 2012 BETWEEN U0."start_year" AND U0."end_year" OR 2013 BETWEEN U0."start_year" AND U0."end_year") AND U0."product_id" = ("aces_pies_data_product"."id"))) = true #With no "true" 350MS SELECT COUNT(*) FROM "aces_pies_data_product" WHERE EXISTS(SELECT 1 FROM "aces_pies_data_productfitment" U0 INNER JOIN "django_vehiclefitment_vehicle" U1 ON (U0."vehicle_id" = U1."id") INNER JOIN "django_vehiclefitment_vehiclemake" U2 ON (U1."make_id" = U2."id") INNER JOIN "django_vehiclefitment_vehiclemodel" U3 ON (U1."model_id" = U3."id") WHERE (U2."name" = 'Chevrolet' AND U3."name" = 'Corvette' AND (2008 BETWEEN U0."start_year" AND U0."end_year" OR 2009 BETWEEN U0."start_year" AND U0."end_year" OR 2010 BETWEEN U0."start_year" AND U0."end_year" OR 2011 BETWEEN U0."start_year" AND U0."end_year" OR 2012 BETWEEN U0."start_year" AND U0."end_year" OR 2013 BETWEEN U0."start_year" AND U0."end_year") AND U0."product_id" = ("aces_pies_data_product"."id")))
Edit
Poked around some of the annotation source code and got a work around
To remove the annotations from my selects and thus drastically increase the count(*) query speed, I just did products.query.annotation_select.pop('fits_vehicle'). This correctly filters without actually selecting, big improvement in speed!
Change History (5)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Summary: | Generated SQL for exists filter does Exists(...)=true. Having just Exists(...) without "true" is faster by nearly 50% in many of my queries → Generated SQL in PostGres for exists filter does Exists(...)=true. Having just Exists(...) without "true" is faster by nearly 50% in many of my queries |
---|
comment:3 by , 6 years ago
Description: | modified (diff) |
---|
comment:4 by , 6 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Duplicate of #25367 which would allow the Exist()
expression to be directly passed to filter
instead of having to annotate it and filter against.
comment:5 by , 6 years ago
Here's a discussion about not including the =true
part on backends that support it that might interest you.
https://github.com/django/django/pull/8119#issuecomment-373844543
Did some playing around with the "only having columns as filters". If you use "values", it does take away the annotation which is what I want. But it would be nice if defer behaved the same way so I can still retain all the model fields naturally.