Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#34016 closed Bug (fixed)

QuerySet.values_list() crash on simple ArrayAgg.

Reported by: Alex Kerkum Owned by: Alex Kerkum
Component: contrib.postgres Version: 4.1
Severity: Release blocker Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Alex Kerkum)

Using ArrayAgg in combination with values_list results in 'TypeError: Complex expressions require an alias'.

For example:

from django.contrib.postgres.aggregates import ArrayAgg
SampleUser.objects.values_list(ArrayAgg("post_id"))

File django/db/models/aggregates.py:98, in Aggregate.default_alias(self)
     96 if len(expressions) == 1 and hasattr(expressions[0], "name"):
     97     return "%s__%s" % (expressions[0].name, self.name.lower())
---> 98 raise TypeError("Complex expressions require an alias")

TypeError: Complex expressions require an alias

The expressions variable here seems to contain [F(post_id), OrderByList()] causing the len(expressions) check to fail. That's as far as I got.

To me this seems related to #33898 caused by a regression in https://github.com/django/django/commit/e06dc4571ea9fd5723c8029959b95808be9f8812

This still worked in 4.0.7.

Change History (13)

comment:1 by Alex Kerkum, 2 years ago

Description: modified (diff)

comment:2 by Alex Kerkum, 2 years ago

Description: modified (diff)

comment:3 by Mariusz Felisiak, 2 years ago

Cc: Simon Charette added
Severity: NormalRelease blocker
Summary: QuerySet.values_list crash when using ArrayAggQuerySet.values_list() crash on simple ArrayAgg.
Triage Stage: UnreviewedAccepted

Thanks for the report.

Regression in e06dc4571ea9fd5723c8029959b95808be9f8812.

comment:4 by Mariusz Felisiak, 2 years ago

We could skip an empty OrderByList 🤔:

  • django/contrib/postgres/aggregates/mixins.py

    diff --git a/django/contrib/postgres/aggregates/mixins.py b/django/contrib/postgres/aggregates/mixins.py
    index b2f4097b8f..2bf236474a 100644
    a b class OrderableAggMixin:  
    1414        return super().resolve_expression(*args, **kwargs)
    1515
    1616    def get_source_expressions(self):
     17        if not self.order_by.source_expressions:
     18            return super().get_source_expressions()
    1719        return super().get_source_expressions() + [self.order_by]
    1820
    1921    def set_source_expressions(self, exprs):
    20         *exprs, self.order_by = exprs
     22        if isinstance(exprs[-1], OrderByList):
     23            *exprs, self.order_by = exprs
    2124        return super().set_source_expressions(exprs)
    2225
    2326    def as_sql(self, compiler, connection):

Alex, would you like to prepare a patch? (regression test is required)

comment:5 by Alex Kerkum, 2 years ago

Thanks for the quick reply! Your proposed fix does indeed seem to fix the issue.

I would love to create a patch, but I have no experience with the django code.
I'll have a look at it this afternoon to see if I can come up with something.

comment:6 by Simon Charette, 2 years ago

This approach will work as long no OrderableAggMixin subclass use this get_source_expression entry removal trick as well which I believe is the case for our internal usages.

Somewhat related but the Window expression is an example of a composite that has some components that could be missing (partition_by, frame, order_by) and cannot use the len of the expressions to determine which ones were provided back (e.g. does 3 source expressions mean (source_expression, partition_by, frame) or (source_expression, frame, order_by)?).

It solves this problem by somewhat violating the get_source_expressions API by returning None placeholders which makes use some special casing in a few locations

But not in all of them!

All that so say that we should probably settle on what the signature of get_source_expressions is; list[Optional[Expression]] or list[Expression]?

If we go with list[Optional[Expression]] then we don't have to do any len based special casing when dealing with optional sub-expressions but it means that checks of the form len(source_expressions) are not adequate; they must exclude None values. I guess we could provide a property/method that excludes them if this is a common pattern.

If we go with list[Expression] then we should adjust Window other expressions violating this guarantee and drop special casing for None value. I guess Window.get_source_expressions() could do something along the lines of

def get_source_expressions(self):
   expressions = self.source_expression
   if self.partition_by:
      expressions.append(self.partition_by)
   ...
   return expressions

def set_source_expressions(self, expressions):
   self.source_expression = expressions.pop(0)
   if self.partition_by:
       self.partition_by = expressions.pop(0)
   ...

Given we've had these special cases in place for a while I think a better long term solution, and one to happen to maintain backward compatibility, would be to fully embrace the list[Optional[Expression]] approach by using None placeholders to address this problem. It doesn't have to be in this patch but at least agreeing on which signature we should support in main and going forward would be beneficial.

From having more experience playing with Python typing in the past years, this is the kind of issues mypy and friends could help us properly enforce. Maybe it's time for us to re-visit adding typing at least to some parts of Django?

comment:7 by Alex Kerkum, 2 years ago

I've made a PR with Mariusz' suggestions here: https://github.com/django/django/pull/16064

Is that sufficient, or does it need some more work?

comment:8 by Simon Charette, 2 years ago

Thanks for taking the time to submit a PR Alex, greatly appreciated. It should be enough assuming all tests are passing.

The long blob of text above was mostly about this regression being a sign of a larger problem and possible solutions to address it in follow up tickets.

comment:9 by Alex Kerkum, 2 years ago

Thanks Simon! As all tests are passing I've marked it as 'Ready for review'.

Fixing the larger problem in follow up tickets sounds good, although the implications of it are over my head :)

comment:10 by Mariusz Felisiak, 2 years ago

If we go with list[Optional[Expression]] then we don't have to do any len based special casing when dealing with optional sub-expressions but it means that checks of the form len(source_expressions) are not adequate; they must exclude None values. I guess we could provide a property/method that excludes them if this is a common pattern.

Thanks for checking. I'd choose list[Optional[Expression]] and None for empty optional expressions. However, Alex's PR is more suitable for backporting. We can standardized the behavior for optional sub-expressions in a separate cleanup targeted only to the main branch.

comment:11 by Mariusz Felisiak, 2 years ago

Has patch: set
Owner: set to Alex Kerkum
Status: newassigned
Triage Stage: AcceptedReady for checkin

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In f88fc72d:

Fixed #34016 -- Fixed QuerySet.values()/values_list() crash on ArrayAgg() and JSONBAgg().

Regression in e06dc4571ea9fd5723c8029959b95808be9f8812.

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 2d20386b:

[4.1.x] Fixed #34016 -- Fixed QuerySet.values()/values_list() crash on ArrayAgg() and JSONBAgg().

Regression in e06dc4571ea9fd5723c8029959b95808be9f8812.

Backport of f88fc72da4eb76f2d464edb4874ef6046f8a8658 from main

Note: See TracTickets for help on using tickets.
Back to Top