Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#32449 closed New feature (wontfix)

Allow specifying tables in RawSQL().

Reported by: João Carneiro Haas Owned by: nobody
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: annotate, count, RawSQL, select_related, JOIN
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We have in our application a RawSQL annotation which depends on a internal join to work:

# accounts/models.py

class Org(models.Model):
    name = models.TextField()

class Account(models.Model):
    org = models.ForeignKey(Org)

query = Account.objects.select_related("org").annotate(org_name=RawSQL("SELECT custom_name(accounts_org.name)"))

Well, sometimes Django tries to optimize the queries and removes unneded JOINs, such as when calling count:

query.count()
# SELECT COUNT(*) FROM (
#     SELECT "accounts_account"."id" AS Col1, (SELECT "accounts_org"."name") AS "org_name"
#     FROM "accounts_org"
#     GROUP BY "devices_device"."id", (SELECT "accounts_org"."name")
# )

Which results in an 'missing FROM-clause' error.

Since (I believe) it's impossible for Django to analyze the passed RawSQL query, I would like to request some way to specify which joins/select_related the annotated RawSQL depends on.

My suggestion would be to pass a list of table dependencies to the RawSQL:

raw = RawSQL("SELECT custom_name(accounts_org.name)", tables=["org"])

If the ticket gets 'confirmed' and an API gets specified I can try tackling this issue.

For now I'm using the following workaround, which forces a join:

query = Accounts.objects.annotate(org_id=F('org__id'), custom_name=...)

Obs.: Fixing this before https://code.djangoproject.com/ticket/28477 would result in queries with unused annotations (such as the ones above) to always perform the join. I believe it won't impact the development of this ticket, but who knows ¯\_(ツ)_/¯

Change History (5)

comment:1 by Mariusz Felisiak, 4 years ago

Resolution: wontfix
Status: newclosed
Type: UncategorizedNew feature

Thanks for this ticket, however RawSQL() is the last resort and I don't think we would like to add any extra API to it. You can do the same by creating a custom function (preferable):

class CustomName(Func):
    function = 'custom_name'
    output_field = CharField()

Account.objects.select_related('org').annotate(org_name=CustomName('org__name')).count()

or by using QuerySet.extra() (which also supports tables):

Account.objects.select_related('org').extra(select={'org_name': 'SELECT custom_name(accounts_org.name)'}).count()

or by specifying the full subquery in RawSQL():

Account.objects.select_related("org").annotate(org_name=RawSQL(
    'SELECT custom_name(accounts_org.name) '
    'FROM accounts_org '
    'WHERE accounts_account.org_id = ccounts_org.id
)).count()

I treated this as a request for a new API, because resolving #28477 will strip unused annotation, so crash is a duplicate of #28477.

comment:2 by Mariusz Felisiak, 4 years ago

Summary: Issues with RawSQL annotations dependencies and count()Allow specifying tables in RawSQL().

in reply to:  1 ; comment:3 by João Carneiro Haas, 4 years ago

Replying to Mariusz Felisiak:

You can do the same by creating a custom function

Ok, that actually solves the issue I presented, but I'm at fault here, the custom_name example was just the easiest way I found to explain the issue, we're actually doing a Postgres recursive query, which as far as I know there's no way to represent in a Django structure.

Here's a more truthful example on how we're using the RawSQL. It's not the exact query and models, but I believe if there's a way to solve for this our case would also be solved:

# graph/models.py

class Node(models.Model):
    pass

class Edge(models.Model):
    parent = models.ForeignKey(Node)
    child = models.ForeignKey(Node)

class A(models.Model):
    node = models.ForeignKey(Node)

class B(models.Model):
    a = models.ForeignKey(A)

# b_qs is a really complex 'B' queryset
b_qs = b_qs.objects.select_related('a').annotate(graph_child_count=RawSQL("""
    WITH RECURSIVE children(id) AS (
        SELECT "e"."child_id"
          FROM "graph_edge" AS "e"
         WHERE "e"."parent_id" = "graph_a"."node_id"
         UNION
        SELECT "e"."child_id"
          FROM "graph_edge" AS "e", "children"
         WHERE "e"."parent_id" = "children"."id"
    ) SELECT COUNT(*) FROM "graph_children"
""", [])).filter(graph_child_count__lt=10).count()

Now, regarding this:

RawSQL() is the last resort
using QuerySet.extra()

extra is the real last resort though. Every place in the documentation says it should be avoided, and it will be deprecated as soon as it's not needed anymore, and if needed, you should use a RawSQL with an annotate. Yes, I believe I could use extra to solve my issue, but I want to use a reccomended way to solve this issue.

So, to summarize, if there's a way I'm unaware of about encapsulating my query in a Django structure (such as in the Func custom_name example), yeah, I think this ticket is invalid. Otherwise, I really think there should be a way to run more specific queries while having the guarantee that stuff inside the final query will be there.

Edit: So, maybe the ticket should be changed to implement some kind of CustomQuery class which would allow me to define a query and dependent fields, or receive arguments with __ operators as in the Func example.

Last edited 4 years ago by João Carneiro Haas (previous) (diff)

in reply to:  3 ; comment:4 by Mariusz Felisiak, 4 years ago

Replying to João Carneiro Haas:

Ok, that actually solves the issue I presented, but I'm at fault here, the custom_name example was just the easiest way I found to explain the issue, we're actually doing a Postgres recursive query, which as far as I know there's no way to represent in a Django structure.

We have an accepted ticket for CTE, see #28919.

So, to summarize, if there's a way I'm unaware of about encapsulating my query in a Django structure (such as in the Func custom_name example), yeah, I think this ticket is invalid. Otherwise, I really think there should be a way to run more specific queries while having the guarantee that stuff inside the final query will be there.

There is no need for a new API, because resolving #28477 will strip unused annotation, so crash is a duplicate of #28477. You can also push everything to a subquery.

in reply to:  4 comment:5 by João Carneiro Haas, 4 years ago

Replying to Mariusz Felisiak:

We have an accepted ticket for CTE, see #28919.

Yeah, I believe this will end up solving my issue.

There is no need for a new API, because resolving #28477 will strip unused annotation, so crash is a duplicate of #28477

The crash will still occur even after #28477 is fixed, since you can (and probably will) use the annotation on other statements which will influence count:

b_qs = b_qs.objects.annotate(graph_child_count=RawSQL(...)).filter(graph_child_count__lt=10).count()

So yeah, the CTE ticket solves our specific issue, although I still believe this feature would be useful for other non-supported Django stuff which require a RawSQL.

Note: See TracTickets for help on using tickets.
Back to Top