Opened 3 years ago
Last modified 3 years ago
#32801 closed New feature
Support for non-aggregation subqueries in FROM clause — at Initial Version
Reported by: | Jameel A. | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This relates to https://code.djangoproject.com/ticket/24462 and is perhaps a duplicate, but the underlying issue is a bit different.
When using JSONB columns in Postgres, it is sometimes useful to query JSON array data as if they're rows. This can be accomplished with the jsob_array_elements
function. The problem with this function is that the resulting row cannot be filtered directly and must first be wrapped in a subquery.
In particular, I would like to be able to generate a query such as:
select foo_w_elems.id from ( select id, jsonb_array_elements(json_data->'some_array') as elem from foo as foo1 union select id, jsonb_array_elements(json_data->'other_array') as elem from foo as foo2 ) as foo_w_elems where (elem->>'bar_id')::int in ( select id from bar where expires_at >= CURRENT_TIMESTAMP )
Even without the union, I could not find a way to generate such a query with Django today.
It appears that Django doesn't support querying FROM a subquery. While AggregateQuery
gets us part of the way there, it doesn't support filtering or any other operation applied after a FROM
clause.
I have managed to hack together support with something similar to:
class SubqueryTable: join_type = None parent_alias = None def __init__(self, query, alias): self.subquery = query self.subquery_alias = alias def as_sql(self, compiler, connection): alias_str = f' {self.subquery_alias}' base_sql, params = self.subquery.as_sql(compiler, connection) return base_sql + alias_str, params def relabeled_clone(self, change_map): return self.__class__( self.subquery, change_map.get(self.subquery_alias, self.subquery_alias), ) class JsonbArrayElements(Func): function = 'jsonb_array_elements' class FooQuerySet(models.QuerySet): def has_expired_bar(self): obj = self._chain() # Expand each leg into a separate row foo_with_some_array = self.annotate( elem=JsonbArrayElements(KeyTransform('some_array', 'json_data')) ) foo_with_other_array = self.annotate( elem=JsonbArrayElements(KeyTransform('other_array', 'json_data')) ) foo_with_array_elems = foo_with_some_array.union(foo_with_other_array) # Convert the query into a subquery subquery = foo_with_array_elems.query subquery.subquery = True # Exclude foo with active bars # The use of "Ref" is a bit of a hack and an abuse of the API. # Unfortunately, I couldn't find a better way to reference # fields defined in the subquery. obj = obj.annotate( elem_bar_id=Cast( KeyTextTransform('bar_id', Ref('elem', subquery)), models.IntegerField() ), ) bars = Bar.objects.active() obj = obj.exclude(elem_bat_id__in=bars.values('pk')) # This is a hack to force Django to relabel a field's table name # to match the FROM clause table alias relabels = {t: 'subquery' for t in subquery.alias_map} relabels[None] = 'subquery' obj.query.change_aliases(relabels) # Override the FROM clause by aliasing to the subquery obj.query.alias_map['subquery'] = SubqueryTable(subquery, 'subquery') # Remove the annotation from the select clause to avoid # introducing a hidden extra column annotation_mask = obj.query.annotation_select_mask annotation_mask = annotation_mask.remove('elem_bar_id') if annotation_mask else [] obj.query.set_annotation_mask(annotation_mask) return obj.distinct()
While implementing this workaround, I found a few issues:
- I was unable to find a good way to have the query layer automatically resolve a reference to an annotation defined in a subquery without redefining it. While aggregated queries technically seem to have support for this, it appears they "lift" the annotation into the outer query instead of referring to the annotation within the subquery. This distinction is important for
jsonb_array_elements
. - Because of #1, I had to annotate the outer query and then find a way to trick the query later to use a field name that would resolve when executed against the DB.
- While the query layer seems to handle renaming aliases in subqueries for aggregate queries, I had to manually change the outer query table alias to properly refer to the subquery fields.
I realize that full support for subqueries is likely far more challenging than my workaround, but support would be appreciated as filtering subqueries in an outer query is sometimes necessary to make use of more advanced DB features.