Opened 20 months ago

Closed 20 months ago

Last modified 20 months ago

#34552 closed New feature (needsinfo)

Delaying get_from_clause call as much as possible.

Reported by: Matthew Schinckel Owned by: Matthew Schinckel
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Simon Charette, Florian Apolloner Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I've got a couple of projects that supplement ORM functionality. One in particular is [django-shared-property](https://github.com/schinckel/django-shared-property), which allows a field to be defined on the model, that will automatically be added as an annotation, and as such can be filtered/aggregated.

However, it's also "live" in the sense that changes made after loading from the database to dependencies of the property will be reflected.

For example:

class Ordering(models.Model):
    a = models.IntegerField()
    b = models.IntegerField()

class MyModel(models.Model):
    ord = models.ForeignKey(Ordering)
    
    @shared_property(models.F('ord__a') / models.F('ord__b'))
    def ordering(self):
        return self.ord.a / self.ord.b

In this case, you can filter/order by ordering, but also if you changed ord.a, then ordering would update accordingly, without having to save the object and refetch.

Anyway, long story short, this code works amazingly well without any changes to the Django code, except for one edge case: when a shared_property refers to another model (which on it's own works), but you then filter on this shared_property (which again, works) and then do a .count() or .exists().

MyModel.objects.filter(ordering__gte=10).count()

It turns out that because of the way the ORM compiler works, get_from_clause is called before any filters are applied. Which normally works okay, but in this case results in the table not being available.

A solution I came up with was to delay the evaluation of get_from_clause until after the where/having clauses have been processed, allowing them to flag tables that need to be referenced.

As such, I'd like to propose the change in https://github.com/django/django/pull/14683, which passes django tests (and also fixes the problem for me).

Change History (7)

comment:2 by Matthew Schinckel, 20 months ago

Owner: changed from nobody to Matthew Schinckel
Status: newassigned

comment:3 by Matthew Schinckel, 20 months ago

From Luke's links in the django-developers mailing list post, I wonder if we _should_ be building the references in the reverse order to the order-of-operations:

  • DISTINCT
  • SELECT
  • HAVING
  • GROUP BY
  • WHERE
  • FROM

(I'm not sure if that is a reasonable thing to do, or even necessary. Personally, I'd be super happy with just my single change being applied, but would also be happy to have a further discussion).

comment:4 by Mariusz Felisiak, 20 months ago

Cc: Simon Charette Florian Apolloner added

comment:5 by Simon Charette, 20 months ago

If you need these changes it's likely because you are performing expression resolving during the compilation phase which is known to cause subtle issues with regards to join reuse and compilation in general e.g. what if your newly referenced expressions trigger aggregation or filter against window functions, now you have an improper order_by

It feels like this feature should be implementable by using QuerySet.alias if the latter was implemented in a way that deferred any form of joining until the alias is first referenced. Am I right in the assessment that you implemented your expression resolving in as_sql because you wanted to avoid unnecessary JOINs and that if QuerySet.alias allowed you to stash ordering=models.F('ord__a') / models.F('ord__b') without joining into ord__ until ordering is referenced then you could implement the same feature by simply having shared_property augment the managers querysets with the alias you defined?

comment:6 by Mariusz Felisiak, 20 months ago

Resolution: needsinfo
Status: assignedclosed

Marking as "needsinfo" per Simon's comment.

comment:7 by Matthew Schinckel, 20 months ago

Hi Simon, thanks for your input.

It's been ages since I actually implemented it, and the resolving of the expression during the compilation phase was (IIRC) more a pragmatic thing: Field instances are not resolved (because they already belong to a model/query), instead they provide a "Col" instance to the query directly.

Normally, during a .annotate(x=expression), the expression is resolved at this point. However, because the query object does not exist at the point where we build the Col instance, I'm not able to force a join (if one does not exist) at that point.

In some sense part of this idea is to provide syntactic sugar for "always apply an annotation", but it's a bit more than that (in the sense that it also recalculates the value in python upon reference). As such I'm not exactly sure if being able to apply the alias to the queryset(s) would be sufficient.

Anyway, it's time to put my kids to bed; I'll try to have a bit more of a think about it later.

Note: See TracTickets for help on using tickets.
Back to Top