#35833 closed Bug (invalid)
Annotation yielding an empty set causes entire QuerySet to become empty
Reported by: | Jacob Walls | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Devin Cox, Simon Charette | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
With a nullable JSONField in my model:
class TileModel(models.Model): provisionaledits = JSONField(null=True)
If I then annotate with a set-returning function (e.g. JSONB_ARRAY_ELEMENTS on postgres) that happens to return an empty set, the entire QuerySet is zeroed out.
In [1]: from django.db.models import Func, F In [2]: from arches.app.models.models import TileModel In [3]: class JsonbArrayElements(Func): ...: arity = 1 ...: contains_subquery = True # Django 5.2: set_returning = True ...: function = "JSONB_ARRAY_ELEMENTS" ...: In [4]: TileModel.objects.count() Out[4]: 9 In [5]: TileModel.objects.annotate(empty_set=JsonbArrayElements(F("provisionaledits"))) Out[5]: <QuerySet []>
I distilled this from a more complicated failure. If it helps, here is a query that fails more helpfully when you actually have JSON in the queried field but still can't extract array elements because it's not an array:
In [14]: TileModel.objects.create(provisionaledits={}) Out[14]: <TileModel: TileModel object (6ee450f7-b516-4c61-b6e7-52e876b878fe)> In [15]: TileModel.objects.annotate(no_longer_empty=JsonbArrayElements(F("provisionaledits"))) Out[15]: DataError: cannot extract elements from an object
I guess I would expect an error in both cases instead of an empty queryset in the former. At the very least, I would only expect the annotation to yield some sort of empty value, not for the whole queryset to silently become empty.
Change History (6)
comment:1 by , 5 weeks ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 5 weeks ago
Thanks for the draft test. Looking again, it seems providing NULL
to this function is fine, and doesn't raise an error:
% python manage.py dbshell psql (16.1, server 15.4) Type "help" for help. my_project=# SELECT JSONB_ARRAY_ELEMENTS(NULL); jsonb_array_elements ---------------------- (0 rows)
So I guess my expectation is that I can use the annotation as written without interfering with my queryset as a whole. My use case was providing a path lookup that happened to not exist (returning NULL) rather than just querying the root of the field.
comment:3 by , 4 weeks ago
Cc: | added |
---|
I could not reproduce the difference of count
and len
locally with the provided test case when set_returning = True
is appropriately set; both result in returning no rows when dealing with empty sequence references. This is definitely a bug if we can reproduce the mismatch.
So I guess my expectation is that I can use the annotation as written without interfering with my queryset as a whole.
I'm not sure how much of a this is a bug over a misunderstanding of how JSONB_ARRAY_ELEMENTS
and set-returning functions actually works. AFAIK when using a set-returning function in a SELECT
clause (which is what annotate
does) Postgres performs a Cartesian product of the existing rows and the each values returned for the row. The Cartesian product of any set with the empty set is the empty set.
If you want to avoid the empty set product then you must avoid providing a value to JSONB_ARRAY_ELEMENTS
that results in one. In the case of NULL
this can be avoided using Coalesce
but you also have to deal with the empty array case. Something like the following is possibly what you're after?
class JsonbArrayLength(Func): arity = 1 ouput_field = models.IntegerField() function = "jsonb_array_length" empty_set_condition = Q(provisionaledits=None) | LesserThan(JsonbArrayLength("provisionaledits", 1)) empty_set_value = Value(["<empty>"], JSONField()) TileModel.objects.annotate( annotation=JsonbArrayElements( Case( When(empty_set_condition, then=empty_set_value), default="provisionaledits", ) ) )
comment:4 by , 4 weeks ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Thanks very much, Simon. This is user error, then. I guess rather than a Cartesian product I was expecting an implicit subquery so that anything that happened during annotate would not "filter" the QuerySet. I didn't intend to suggest there was a count/len mismatch; I was just trying to show that I had some number of model instances persisted in general, and I expected to see all of them no matter what happened in annotate()
. But your explanation helps me see why that's wrong.
Thanks for the draft query, too. While this ticket was being triaged, I ended up using an explicit subquery, along these lines:
# Start from an FK-related model ResourceInstance.objects.annotate( annotation=ArraySubquery( TileModel.objects.filter(resourceinstance_id=OuterRef("resourceinstanceid")) .annotate(as_array=JsonbArrayElements(F("more_complex_path_lookup")) .values("as_array") ) )
Then, in some queries where my json path lookup needed to be resilient against missing keys in I ended up wrapping the expression given to the JsonbArrayElements
in a Case
as you suggested.
I would have come back and closed the ticket if I could have put all these pieces together more quickly (sorry!), but I guess I was still bewitched by the assumption that whatever I did inside annotate()
wouldn't possibly filter down my queryset.
comment:5 by , 4 weeks ago
Triage Stage: | Accepted → Unreviewed |
---|
comment:6 by , 4 weeks ago
The way Postgres set-returning annotations work is definitely unintuitive if you are used to subqueries which forces you to have a single row and implicitly use OUTER
instead of INNER
semantics so I wouldn't be too harsh on yourself here.
What I mean is that the fact subqueries which are somewhat set-returning constructs behave in an OUTER JOIN LATERAL
manner
SELECT author.id, ( SELECT post.id FROM post WHERE post.author_id = author.id ORDER BY id LIMIT 1 ) first_post_id FROM author
SELECT author.id, first_post.id FROM author LEFT OUTER JOIN LATERAL ( SELECT post.* FROM post WHERE post.author_id = author.id ORDER BY id LIMIT 1 ) first_post ON true
id | first_post_id |
---|---|
1 | 1 |
2 | null |
While other set-returning functions behave in a INNER JOIN LATERAL
manner
SELECT author.id, jsonb_array_elements(jsonb_path_query_array(post_ids, '$[0 to 0]')) first_post_id FROM author
SELECT author.id, first_post.id AS first_post_id FROM author INNER JOIN LATERAL ( SELECT * FROM jsonb_array_elements(jsonb_path_query_array(post_ids, '$[0 to 0]')) ) first_post(id) ON true
id | first_post_id |
---|---|
1 | 1 |
Definitely breaks the principle of least astonishment, at least it did for me!
Agree this looks like a bug, here's a rough testy thing in case folks find it useful
tests/annotations/models.py
default=dict, blank=True)tests/annotations/tests.py