Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#29692 closed Bug (fixed)

Incorrect removal of order_by clause created as multiline RawSQL

Reported by: Marcin Nowak Owned by: Can Sarıgöl
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Can Sarıgöl Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi.

The SQLCompiler is ripping off one of my "order by" clause, because he "thinks" the clause was already "seen" (in SQLCompiler.get_order_by()). I'm using expressions written as multiline RawSQLs, which are similar but not the same.

The bug is located in SQLCompiler.get_order_by(), somewhere around line computing part of SQL query without ordering:

without_ordering = self.ordering_parts.search(sql).group(1)

The sql variable contains multiline sql. As a result, the self.ordering_parts regular expression is returning just a line containing ASC or DESC words. This line is added to seen set, and because my raw queries have identical last lines, only the first clasue is returing from SQLCompiler.get_order_by().

As a quick/temporal fix I can suggest making sql variable clean of newline characters, like this:

sql_oneline = ' '.join(sql.split('\n'))
without_ordering = self.ordering_parts.search(sql_oneline).group(1)

Note: beware of unicode (Py2.x u'') and EOL dragons (\r).

Example of my query:

    return MyModel.objects.all().order_by(
        RawSQL('''
            case when status in ('accepted', 'verification')
                 then 2 else 1 end''', []).desc(),
        RawSQL('''
            case when status in ('accepted', 'verification')
                 then (accepted_datetime, preferred_datetime)
                 else null end''', []).asc(),
        RawSQL('''
            case when status not in ('accepted', 'verification')
                 then (accepted_datetime, preferred_datetime, created_at)
                 else null end''', []).desc())

The ordering_parts.search is returing accordingly:

  • ' then 2 else 1 end)'
  • ' else null end'
  • ' else null end'

Second RawSQL with a else null end part is removed from query.

The fun thing is that the issue can be solved by workaround by adding a space or any other char to the last line.

So in case of RawSQL I can just say, that current implementation of avoiding duplicates in order by clause works only for special/rare cases (or does not work in all cases).

The bug filed here is about wrong identification of duplicates (because it compares only last line of SQL passed to order by clause).

Hope my notes will help you fixing the issue. Sorry for my english.

Change History (14)

comment:1 by Tim Graham, 6 years ago

Type: UncategorizedBug

Is there a reason you can't use conditional expressions, e.g. something like:

MyModel.objects.annotate(
    custom_order=Case(
        When(...),
    )
).order_by('custom_order')

I'm thinking that would avoid fiddly ordering_parts regular expression.

If there's some shortcoming to that approach, it might be easier to address that. Allowing the ordering optimization stuff to handle arbitrary RawSQL may be difficult.

comment:2 by Marcin Nowak, 6 years ago

Is there a reason you can't use ​conditional expressions

No, but I didn't knew about the issue, and writing raw sqls is sometimes faster (not in this case ;)
I'm really happy having possibility to mix raw sqls with object queries. Next time I'll use expressions, for sure.

Allowing the ordering optimization stuff to handle arbitrary RawSQL may be difficult.

Personally I'd like to skip RawSQL clauses in the block which is responsible for finding duplicates. If someone is using raw sqls, he knows the best what he is doing, IMO. And it is quite strange if Django removes silently part of your SQL. This is very confusing. And please note that printing a Query instance was generating incomplete sql, but while checking Query.order_by manually, the return value was containing all clauses. I thought that just printing is affected, but our QA dept told me the truth ;)

I know there is no effective way to compare similarity of two raw clauses. This may be hard for expression objects, too, but you have a possibility to implement some __eq__ magic (instead of comparation of generated sqls). Unfortunately I don't know why duplicates detection was implemented, so it's hard to tell how to improve this part.

Version 0, edited 6 years ago by Marcin Nowak (next)

comment:3 by Tim Graham, 6 years ago

Triage Stage: UnreviewedAccepted

Patches welcome, I suppose.

comment:4 by Can Sarıgöl, 6 years ago

Cc: Can Sarıgöl added
Has patch: set

comment:5 by Tim Graham, 6 years ago

Needs tests: set

Is there a reason why you didn't add tests?

comment:6 by Can Sarıgöl, 6 years ago

I was waiting for confirmation, I've added a test. Is it enough?

comment:7 by Tim Graham, 6 years ago

Needs tests: unset
Patch needs improvement: set

comment:8 by Can Sarıgöl, 6 years ago

Patch needs improvement: unset

comment:9 by Carlton Gibson, 6 years ago

Patch needs improvement: set

Some additional test coverage needed.

comment:10 by Can Sarıgöl, 6 years ago

Patch needs improvement: unset

comment:11 by Can Sarıgöl, 6 years ago

Owner: changed from nobody to Can Sarıgöl
Status: newassigned

comment:12 by Mariusz Felisiak, 6 years ago

Patch needs improvement: set
Version: 1.11master

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 6 years ago

Resolution: fixed
Status: assignedclosed

In 567b9928:

Fixed #29692 -- Fixed removing ordering parts for multiline RawSQL expressions.

comment:14 by Mariusz Felisiak, 6 years ago

Patch needs improvement: unset
Note: See TracTickets for help on using tickets.
Back to Top