Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#33992 closed Bug (fixed)

QuerySet.annotate() with subquery and aggregation crashes.

Reported by: Fernando Flores Villaça Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.1
Severity: Release blocker Keywords: database, orm, aggregation
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I updated one app from 4.0 to 4.1.1 and found a issue with one annotation using Count. I tested with SQLite and PostgreSQL, and both raised exception. The same app works with 4.0.7.

Exception with SQLite:

sub-select returns 13 columns - expected 1
Traceback (most recent call last):
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/sqlite3/base.py", line 357, in execute
    return Database.Cursor.execute(self, query, params)
sqlite3.OperationalError: sub-select returns 13 columns - expected 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../.venv/lib/python3.10/site-packages/django/db/models/query.py", line 1225, in exists
    return self.query.has_results(using=self.db)
  File ".../.venv/lib/python3.10/site-packages/django/db/models/sql/query.py", line 592, in has_results
    return compiler.has_results()
  File ".../.venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1363, in has_results
    return bool(self.execute_sql(SINGLE))
  File ".../.venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1395, in execute_sql
    cursor.execute(sql, params)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 103, in execute
    return super().execute(sql, params)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File ".../.venv/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/sqlite3/base.py", line 357, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: sub-select returns 13 columns - expected 1

Exception with Postgres:

subquery must return only one column
LINE 1: ...iked_by"."post_id") GROUP BY "network_post"."id", (SELECT U0...
                                                             ^
Traceback (most recent call last):
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.SyntaxError: subquery must return only one column
LINE 1: ...iked_by"."post_id") GROUP BY "network_post"."id", (SELECT U0...
                                                             ^

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../.venv/lib/python3.10/site-packages/django/db/models/query.py", line 1225, in exists
    return self.query.has_results(using=self.db)
  File ".../.venv/lib/python3.10/site-packages/django/db/models/sql/query.py", line 592, in has_results
    return compiler.has_results()
  File ".../.venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1363, in has_results
    return bool(self.execute_sql(SINGLE))
  File ".../.venv/lib/python3.10/site-packages/django/db/models/sql/compiler.py", line 1395, in execute_sql
    cursor.execute(sql, params)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 103, in execute
    return super().execute(sql, params)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File ".../.venv/lib/python3.10/site-packages/django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File ".../.venv/lib/python3.10/site-packages/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: subquery must return only one column
LINE 1: ...iked_by"."post_id") GROUP BY "network_post"."id", (SELECT U0...
                                                             ^

The exception is raised by annotate(likes=Count("liked_by")) in method fetch_all_posts.

class PostManager(models.Manager):
    def request_data(self, request_user):
        liked_by_user = Value(False)
        is_following = Value(False)
        is_owner = Case(When(user__id=request_user.id, then=True), default=False)

        if request_user.is_authenticated:
            # Check if the user has liked the post in each row of the query
            liked_by_user = Exists(request_user.liked_posts.filter(id=OuterRef("id")))
            is_following = Exists(
                request_user.following.filter(id=OuterRef("user__id"))
            )

        return is_owner, liked_by_user, is_following

    def fetch_all_posts(self, request_user) -> QuerySet[Post]:
        is_owner, liked_by_user, is_following = self.request_data(request_user)

        return (
            self.select_related()
            .prefetch_related(
                Prefetch(
                    "comments",
                    queryset=Comment.objects.select_related().filter(reply=False),
                ),  # filter related "comments" inside the post QuerySet
            )
            .order_by("-publication_date")
            .annotate(is_following=is_following)
            .annotate(is_owner=is_owner)
            .annotate(likes=Count("liked_by"))  # Doesn't work on 4.1
            .annotate(liked_by_user=liked_by_user)
        )

    def fetch_following_posts(self, request_user: User) -> QuerySet[Post]:
        return self.fetch_all_posts(request_user).filter(
            user__in=request_user.following.all()
        )

Models

class User(AbstractUser):
    id: int
    posts: RelatedManager[Post]
    liked_posts: RelatedManager[Post]
    comments: RelatedManager[Comment]

    about = models.CharField(blank=True, max_length=255)
    photo = models.ImageField(
        blank=True,
        null=True,
        upload_to=upload_path,
        validators=[file_validator],
    )

    following = models.ManyToManyField(
        "self", related_name="followers", symmetrical=False
    )

    objects: CustomUserManager = CustomUserManager()

    # Related fields
    # posts = ManyToOne("Post", related_name="user")
    # liked_posts = ManyToMany("Post", related_name="liked_by")
    # comments = ManyToOne("Comment", related_name="user")

    def save(self, *args, **kwargs):
        """
        full_clean is not called automatically on save by Django
        """
        self.full_clean()
        super().save(*args, **kwargs)

    def __str__(self):
        return f"{self.username}"  # type: ignore


class Post(models.Model):
    id: int
    comments: RelatedManager[Comment]

    user_id: int
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="posts"
    )
    text = models.CharField(max_length=200)
    publication_date = models.DateTimeField(auto_now_add=True)
    edited = models.BooleanField(default=False)
    last_modified = models.DateTimeField(auto_now_add=True)
    liked_by = models.ManyToManyField(
        settings.AUTH_USER_MODEL, related_name="liked_posts", blank=True
    )

    # Related Fields
    # comments = ManyToOne("Comment", related_name="post")

    objects: PostManager = PostManager()

    class Meta:
        ordering = ["-publication_date"]

    def __str__(self):
        return f"{self.text}"


class Comment(models.Model):
    id: int
    replies: RelatedManager[Comment]

    post_id: int
    post = models.ForeignKey(Post, on_delete=models.CASCADE, related_name="comments")
    user_id: int
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name="comments"
    )
    text = models.CharField(max_length=200)
    publication_date = models.DateTimeField(auto_now_add=True)
    reply = models.BooleanField(default=False)
    parent_comment_id: int
    parent_comment = models.ForeignKey(
        "self", on_delete=models.CASCADE, null=True, blank=True, related_name="replies"
    )

    class Meta:
        ordering = ["-publication_date"]

    def save(self, *args, **kwargs):
        self.full_clean()
        if self.parent_comment is not None:
            if self.parent_comment.post.id != self.post.id:
                raise ValidationError("Parent comment must be from the same post.")
            self.reply = True
        super().save(*args, **kwargs)

    def __str__(self):
        return f"{self.text} - reply: {self.reply}"  # type: ignore

Change History (7)

comment:1 by Mariusz Felisiak, 2 years ago

Cc: Simon Charette added
Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Thanks for the report. Regression in e5a92d400acb4ca6a8e1375d1ab8121f2c7220be.

I will try to reproduce this issue with a simpler models definition.

comment:2 by Mariusz Felisiak, 2 years ago

Summary: Count subquery issueQuerySet.annotate() with subquery and aggregation crashes.

I was able to reproduce this issue with:

Post.objects.annotate(
    is_following=Exists(
        request_user.following.filter(id=OuterRef("user__id"))
    ),  
).annotate(likes=Count("liked_by"))

comment:3 by Simon Charette, 2 years ago

Has patch: set

comment:4 by Mariusz Felisiak, 2 years ago

Owner: changed from nobody to Simon Charette
Status: newassigned
Triage Stage: AcceptedReady for checkin

comment:5 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In e0f14d8:

[4.1.x] Fixed #33992 -- Fixed queryset crash when aggregating over a group containing Exists.

A more in-depth solution is likely to make sure that we always GROUP BY
selected annotations or revisit how we use Query.exists() in the Exists
expression but that requires extra work that isn't suitable for a
backport.

Regression in e5a92d400acb4ca6a8e1375d1ab8121f2c7220be.

Thanks Fernando Flores Villaça for the report.

Backport of 32536b1324e98768dd892980408a8c6b26c23fd9 from main

comment:6 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 32536b1:

Fixed #33992 -- Fixed queryset crash when aggregating over a group containing Exists.

A more in-depth solution is likely to make sure that we always GROUP BY
selected annotations or revisit how we use Query.exists() in the Exists
expression but that requires extra work that isn't suitable for a
backport.

Regression in e5a92d400acb4ca6a8e1375d1ab8121f2c7220be.

Thanks Fernando Flores Villaça for the report.

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 3d734c0:

Refs #33992 -- Refactored subquery grouping logic.

This required moving the combined queries slicing logic to the compiler
in order to allow Query.exists() to be called at expression resolving
time.

It allowed for Query.exists() to be called at Exists() initialization
time and thus ensured that get_group_by_cols() was operating on the
terminal representation of the query that only has a single column
selected.

Note: See TracTickets for help on using tickets.
Back to Top