Opened 5 weeks ago

Closed 4 weeks ago

Last modified 4 weeks ago

#35833 closed Bug (invalid)

Annotation yielding an empty set causes entire QuerySet to become empty

Reported by: Jacob Walls Owned by:
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords:
Cc: Devin Cox, Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With a nullable JSONField in my model:

class TileModel(models.Model):
   provisionaledits = JSONField(null=True)

If I then annotate with a set-returning function (e.g. JSONB_ARRAY_ELEMENTS on postgres) that happens to return an empty set, the entire QuerySet is zeroed out.

In [1]: from django.db.models import Func, F

In [2]: from arches.app.models.models import TileModel

In [3]: class JsonbArrayElements(Func):
   ...:     arity = 1
   ...:     contains_subquery = True  # Django 5.2: set_returning = True
   ...:     function = "JSONB_ARRAY_ELEMENTS"
   ...: 

In [4]: TileModel.objects.count()
Out[4]: 9

In [5]: TileModel.objects.annotate(empty_set=JsonbArrayElements(F("provisionaledits")))
Out[5]: <QuerySet []>

I distilled this from a more complicated failure. If it helps, here is a query that fails more helpfully when you actually have JSON in the queried field but still can't extract array elements because it's not an array:

In [14]: TileModel.objects.create(provisionaledits={})
Out[14]: <TileModel: TileModel object (6ee450f7-b516-4c61-b6e7-52e876b878fe)>

In [15]: TileModel.objects.annotate(no_longer_empty=JsonbArrayElements(F("provisionaledits")))
Out[15]: DataError: cannot extract elements from an object

I guess I would expect an error in both cases instead of an empty queryset in the former. At the very least, I would only expect the annotation to yield some sort of empty value, not for the whole queryset to silently become empty.

Change History (6)

comment:1 by Sarah Boyce, 5 weeks ago

Cc: Devin Cox added
Triage Stage: UnreviewedAccepted

Agree this looks like a bug, here's a rough testy thing in case folks find it useful

  • tests/annotations/models.py

    a b class Ticket(models.Model):  
    6161
    6262
    6363class JsonModel(models.Model):
    64     data = models.JSONField(default=dict, blank=True)
     64    data = models.JSONField(null=True)
    6565    id = models.IntegerField(primary_key=True)
    6666
    6767    class Meta:
  • tests/annotations/tests.py

    diff --git a/tests/annotations/tests.py b/tests/annotations/tests.py
    index 29660a827e..31dfed6ca4 100644
    a b from decimal import Decimal  
    33from unittest import skipUnless
    44
    55from django.core.exceptions import FieldDoesNotExist, FieldError
    6 from django.db import connection
     6from django.db import connection, DataError
    77from django.db.models import (
    88    BooleanField,
    99    Case,
    class NonAggregateAnnotationTestCase(TestCase):  
    11881188
    11891189        self.assertEqual(qs.count(), len(qs))
    11901190
     1191    @skipUnless(connection.vendor == "postgresql", "PostgreSQL tests")
     1192    @skipUnlessDBFeature("supports_json_field")
     1193    def test_set_returning_functions(self):
     1194        class JsonbArrayElements(Func):
     1195            function = "JSONB_ARRAY_ELEMENTS"
     1196            arity = 1
     1197            set_returning = True
     1198
     1199        JsonModel.objects.create(id=1)
     1200        qs = JsonModel.objects.annotate(empty_set=JsonbArrayElements(F("data")))
     1201        self.assertEqual(qs.count(), 0)  # <-- expect to be a DataError
     1202
     1203        JsonModel.objects.create(id=2, data={})
     1204        with self.assertRaises(DataError):
     1205            JsonModel.objects.annotate(empty_set=JsonbArrayElements(F("data"))).count()
     1206
    11911207
    11921208class AliasTests(TestCase):

comment:2 by Jacob Walls, 4 weeks ago

Thanks for the draft test. Looking again, it seems providing NULL to this function is fine, and doesn't raise an error:

% python manage.py dbshell
psql (16.1, server 15.4)
Type "help" for help.

my_project=# SELECT JSONB_ARRAY_ELEMENTS(NULL);
 jsonb_array_elements 
----------------------
(0 rows)

So I guess my expectation is that I can use the annotation as written without interfering with my queryset as a whole. My use case was providing a path lookup that happened to not exist (returning NULL) rather than just querying the root of the field.

comment:3 by Simon Charette, 4 weeks ago

Cc: Simon Charette added

I could not reproduce the difference of count and len locally with the provided test case when set_returning = True is appropriately set; both result in returning no rows when dealing with empty sequence references. This is definitely a bug if we can reproduce the mismatch.

So I guess my expectation is that I can use the annotation as written without interfering with my queryset as a whole.

I'm not sure how much of a this is a bug over a misunderstanding of how JSONB_ARRAY_ELEMENTS and set-returning functions actually works. AFAIK when using a set-returning function in a SELECT clause (which is what annotate does) Postgres performs a Cartesian product of the existing rows and the each values returned for the row. The Cartesian product of any set with the empty set is the empty set.

If you want to avoid the empty set product then you must avoid providing a value to JSONB_ARRAY_ELEMENTS that results in one. In the case of NULL this can be avoided using Coalesce but you also have to deal with the empty array case. Something like the following is possibly what you're after?

class JsonbArrayLength(Func):
    arity = 1
    ouput_field = models.IntegerField()
    function = "jsonb_array_length"


empty_set_condition = Q(provisionaledits=None) | LesserThan(JsonbArrayLength("provisionaledits"), 1)
empty_set_value = Value(["<empty>"], JSONField())

TileModel.objects.annotate(
    annotation=JsonbArrayElements(
        Case(
            When(empty_set_condition, then=empty_set_value),
            default="provisionaledits",
        )
    )
)

So to summarize here I don't think that returning no rows when JSONB_ARRAY_ELEMENTS results in an empty set is a bug (it's just how set-returning annotations work) and I cannot reproduce the mismatch between count() and len against main when setting set_returning on the function used for annotation (it correctly prevents it from being elided when performing the aggregation).

Version 2, edited 4 weeks ago by Simon Charette (previous) (next) (diff)

comment:4 by Jacob Walls, 4 weeks ago

Resolution: invalid
Status: newclosed

Thanks very much, Simon. This is user error, then. I guess rather than a Cartesian product I was expecting an implicit subquery so that anything that happened during annotate would not "filter" the QuerySet. I didn't intend to suggest there was a count/len mismatch; I was just trying to show that I had some number of model instances persisted in general, and I expected to see all of them no matter what happened in annotate(). But your explanation helps me see why that's wrong.

Thanks for the draft query, too. While this ticket was being triaged, I ended up using an explicit subquery, along these lines:

# Start from an FK-related model
ResourceInstance.objects.annotate(
    annotation=ArraySubquery(
        TileModel.objects.filter(resourceinstance_id=OuterRef("resourceinstanceid"))
        .annotate(as_array=JsonbArrayElements(F("more_complex_path_lookup"))
        .values("as_array")
    )
)

Then, in some queries where my json path lookup needed to be resilient against missing keys in I ended up wrapping the expression given to the JsonbArrayElements in a Case as you suggested.

I would have come back and closed the ticket if I could have put all these pieces together more quickly (sorry!), but I guess I was still bewitched by the assumption that whatever I did inside annotate() wouldn't possibly filter down my queryset.

comment:5 by Natalia Bidart, 4 weeks ago

Triage Stage: AcceptedUnreviewed

comment:6 by Simon Charette, 4 weeks ago

The way Postgres set-returning annotations work is definitely unintuitive if you are used to subqueries which forces you to have a single row and implicitly use OUTER instead of INNER semantics so I wouldn't be too harsh on yourself here.

What I mean is that the fact subqueries which are somewhat set-returning constructs behave in an OUTER JOIN LATERAL manner

SELECT 
    author.id, (
    SELECT post.id
    FROM post
    WHERE post.author_id = author.id
    ORDER BY id LIMIT 1
) first_post_id
FROM author

Is equivalent to

SELECT
  author.id,
  first_post.id
FROM author
LEFT OUTER JOIN LATERAL (
  SELECT post.*
  FROM post
  WHERE post.author_id = author.id
  ORDER BY id
  LIMIT 1
) first_post ON true
id first_post_id
1 1
2 null

While other set-returning functions behave in a INNER JOIN LATERAL manner

SELECT
  author.id,
  jsonb_array_elements(jsonb_path_query_array(post_ids, '$[0 to 0]')) first_post_id
FROM author

Is equivalent to

SELECT
  author.id,
  first_post.id AS first_post_id
FROM author
INNER JOIN LATERAL (
  SELECT * FROM jsonb_array_elements(jsonb_path_query_array(post_ids, '$[0 to 0]'))
) first_post(id) ON true
id first_post_id
1 1

Definitely breaks the principle of least astonishment, at least it did for me!

Last edited 4 weeks ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top