Opened 7 years ago

Closed 7 years ago

Last modified 3 years ago

#29527 closed New feature (needsinfo)

Multi-column comparisons

Reported by: Ryan Hiebert Owned by: nobody
Component: Database layer (models, ORM) Version: 2.0
Severity: Normal Keywords: QuerySet.extra
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Multi-column comparisons

In order to implement keyset pagination, AKA the seek method, across multiple fields including ordering on non-unique fields, I'm wanting to write a PostgreSQL that compares multiple fields in an ordered fashion. To my knowledge this isn't possible with the current ORM syntax. My particular use-case further complicates this by also wanting a subquery in order to avoid serializing more than the unique key in the response. Here's a generic example where clause that mirrors my use-case:

(a,b,c,id) > (SELECT a,b,c,id FROM mytable WHERE id = ?)

What this does special is account for all of the columns when making the comparison. So it will only compare the id column if all of the a, b, and c columns are the same, to allow for exact indentification in an arbitrary ordering.

In the PostgreSQL docs this feature is called Row Constructor Comparison, and the documentation is here: https://www.postgresql.org/docs/9.6/static/functions-comparisons.html#ROW-WISE-COMPARISON.

Change History (5)

comment:1 by Ryan Hiebert, 7 years ago

With the help of a patient colleague (thanks Tim!) I've been able to accomplish what I needed to natively, without using .extra(). With one caveat, I needed to add the LHS as a calculated column using annotate, which is a bit of a kludge, which ends up returning annotated data that I didn't really need. For my case, that's not too big of a deal, but perhaps less than ideal.

It might be deserving of a way to remove annotations that don't need to be selected, or to allow filter expressions that allow a dynamic expression on both sides. I'm not sure if that deserves its own ticket, if it should be on this one, or if perhaps there's already a ticket open for that.

Here's what we came up with:

from django.db.models import Func, F, Subquery

def after(queryset, value):
    return (
        queryset
        .annotate(rank=Func(F('a'), F('b'), F('c'), F('id'), function='ROW'))
        .filter(rank__gt=Subquery(queryset.filter(id=value).values('a', 'b', 'c', 'id')))
        .order_by('a', 'b', 'c', 'id')
    )

One other thing to note is that this opts to use the explicit ROW function, rather than relying on that being the result of just using parenthesis, because parenthesis alone with a single value doesn't necessarily make a ROW.

Last edited 7 years ago by Ryan Hiebert (previous) (diff)

comment:2 by Tim Graham, 7 years ago

Component: UncategorizedDatabase layer (models, ORM)

I'm not sure what's to be done to close this ticket. Can you offer a patch?

comment:3 by Tim Graham, 7 years ago

Resolution: needsinfo
Status: newclosed

comment:4 by Mariusz Felisiak, 3 years ago

Type: UncategorizedNew feature

comment:5 by Simon Charette, 3 years ago

With one caveat, I needed to add the LHS as a calculated column using annotate, which is a bit of a kludge, which ends up returning annotated data that I didn't really need.

FWIW this can now be avoided in two ways

  1. Use QuerySet.alias instead of .annotate
  2. Pass lookups directly to filter
from django.db.models.lookups import GreaterThan
from django.db.models import Func, F, Subquery

class Row(Func):
    function = 'ROW'

def after(queryset, value):
    return (
        queryset
        .filter(GreaterThan(
            Row('a', 'b', 'c', 'id'),
            Subquery(queryset.filter(id=value).values('a', 'b', 'c', 'id'))
        ))
        .order_by('a', 'b', 'c', 'id')
    )
Note: See TracTickets for help on using tickets.
Back to Top