Opened 7 months ago

Closed 7 months ago

Last modified 7 months ago

#35568 closed Uncategorized (invalid)

Searching nested JSON values

Reported by: Vasu Nagendra Owned by: nobody
Component: Uncategorized Version: 5.0
Severity: Normal Keywords: QuerySet.extra
Cc: Vasu Nagendra Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

As requested in the documentation I am filing the use case for extra. The data in my database looks similar to the data here https://www.postgresql.org/docs/16/functions-json.html#FUNCTIONS-SQLJSON-PATH. It is in a column called data in my database in a table called alerts (model called Alert). If I have a key of track.segments.*.HR (as shown in the example), I can't quite use a Q operator. If the data was in the upper level, Q operators work just fine -- but nested arrays it doesn't work.

The way to do this is really just using a QuerySet extra keyword so using the example they've provided here that would translate to

Alert.objects.extra(where=[r"""data @? '$.track.segments[*].HR ? (@ > 130)'])

It is quite scary in this case if 130 OR path track.segments[*].HR was obtained from the user because there is no way to send those in a params tuple. Whenever params is used with the where clause, it adds a quote which JSONPath doesn't like.

I looked at KeyTransform as documented in django.db.models.fields.json. This can _somewhat_ be managed if there was only one nesting by messing with the key and using __contains. But when there are multiple levels of nesting like this one for example https://learn.microsoft.com/en-us/graph/api/security-alert-get?view=graph-rest-1.0&tabs=http#response-1 and I am trying to find '$.evidence[*].roles[*] ? (@ == "compromised")' there is no choice but to use extra.

I am happy to take this on if anyone has any ideas on how best to approach this. I have plenty of data to test and am familiar with the general organization of the Django project (have not contributed before).

Change History (2)

comment:1 by Simon Charette, 7 months ago

Resolution: invalid
Status: newclosed

Hello Vasu,

As referenced in the docs you can simply use RawSQL assuming you specifying an output_field=BooleanField()

Alert.objects.filter(
    RawSQL("data @? %s", ('$.track.segments[*].HR ? (@ > 130)',), output_field=BooleanField()) 
)

In the case of lookups though you can create your own to avoid having to use RawSQL in the first place

from django.db.models import JSONField, Lookup


class AnyJSONPath(Lookup):
    lookup_name = "anypath"

    def as_postgresql(self, compiler, connection):
        lhs_sql, lhs_params = self.process_lhs(compiler, connection)
        rhs_sql, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return f"{lhs_sql} @? {rhs_sql}", params

JSONField.register_lookup(AnyJSONPath)

And then

Alert.objects.filter(data__anypath='$.track.segments[*].HR ? (@ > 130)')

Are there any admonition that could have been made to the existing documentation that could have pointed you in the right direction without requiring the creation of ticket? Would you be interested in submitting them?

Last edited 7 months ago by Simon Charette (previous) (diff)

comment:2 by Vasu Nagendra, 7 months ago

Hi Simon -
Thanks for the pointer. My apologies, I didn't realize RawSQL could be written like that by specifying an output_field. I thought the only way to do this was extra and the documentation suggested I file a ticket with my specific use case for consideration during deprecation. I will play with this a bit and then submit changes to docs as you suggested. I think generally your example there is perfect to add to the docs -- I can do that though, you don't need to. I think the right place where it belongs is here https://docs.djangoproject.com/en/5.0/topics/db/queries/#containment-and-key-lookups. This is the first place I looked to understand what is possible with KT() expressions. If you have a better/different place, that's OK too - please suggest.

I'll test this out a bit with different use cases and add an example.

In my specific use case I have a few keys/values like this that I am trying to mess around with for the filter clause. For everything that doesn't have a * I was easily able to get Q objects working with multiple AND and OR conditions and it beautifully does the right thing. I want to try out your example above where I'd prefer to write something like this (simplified, since I have other conditions that the PostgreSQL JSONPath operator doesn't support directly like endswith)...

conditions = []
for k, v in search.items():
   if "*" in k:
      conditions.append(Q(**{f"data__{k}__anypath": v}))
   else:
      conditions.append(Q(**{f"data__{k)__exact": v}))
for condition in conditions:
   combined &= Q(condition)
Alert.objects.filter(combined)

I appreciate your patience and friendliness! Thank you!

Note: See TracTickets for help on using tickets.
Back to Top