Opened 17 years ago

Closed 6 years ago

#6785 closed Cleanup/optimization (fixed)

QuerySet.get() should only attempt to fetch a limited number of rows

Reported by: Brantley Owned by: Amir Hadi
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: timograham@…, Shai Berger Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

.get() selects and lists every record in the db matching the given filter, when it only needs to select two at most.

Attachments (1)

query-get-patch.diff (1.8 KB ) - added by Brantley 17 years ago.
QuerySet.get() nit-pick / optimize

Download all attachments as: .zip

Change History (20)

by Brantley, 17 years ago

Attachment: query-get-patch.diff added

QuerySet.get() nit-pick / optimize

comment:1 by Brantley, 17 years ago

Found a small bug that slipped through the tests, so I fixed it and also updated the tests to test .get() raising MultipleObjectsReturned.

comment:2 by Malcolm Tredinnick, 17 years ago

Resolution: wontfix
Status: newclosed

Not worth the extra overhead in this performance critical piece of code. In the case when you're calling get correctly, only a single result will be returned and we need that result. In the error case, it's not harmful that we're selecting a few extra results. Any code that is relying on the error case for speed is broken by design.

comment:3 by wim@…, 11 years ago

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

Strange though it may seem: would it be possible to explicitly state the number of objects returned and restrain that number to 1, 2, 3, 4, 5, more than 5? In my experience, the number of objects is helpful when debugging.

comment:4 by Tim Graham, 11 years ago

Cc: timograham@… added
Component: UncategorizedDatabase layer (models, ORM)
Keywords: qs-rf removed
Patch needs improvement: set
Summary: QuerySet.get(), nit-pick / optimizeQuerySet.get() should only attempt to fetch a limited number of rows
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: queryset-refactormaster

Seems like there's consensus to re-open this: https://groups.google.com/d/topic/django-developers/PkzS9Wv6hIU/discussion

Pull request (which needs improvement): https://github.com/django/django/pull/1139

comment:5 by Tim Graham, 11 years ago

Patch needs improvement: unset
Resolution: wontfix
Status: closednew

Updated pull request based on django-developers discussion: https://github.com/django/django/pull/1320

I'm not sure it needs docs or a mention in the release notes, but happy to add them if necessary.

comment:6 by Tim Graham <timograham@…>, 11 years ago

Resolution: fixed
Status: newclosed

In da79ccca1d34f427952cce4555e598a700adb8de:

Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows.

Thanks Patryk Zawadzki.

comment:7 by Shai Berger, 10 years ago

Cc: Shai Berger added
Resolution: fixed
Status: closednew

As noted in #23061 the current implementation is problematic on Oracle. It also does more work than necessary elsewhere: Slicing a query clones it, which is an expensive operation, and potentially makes the database work harder.

It is better to implement this as hinted by the ticket's summary -- by limiting the fetches, not the select.

comment:8 by Anssi Kääriäinen, 10 years ago

Limiting the fetches doesn't work that well on most core databases. If I recall correctly sqlite, postgresql and mysql all do transfer all of the rows of the query when accessing the results (for PostgreSQL I know this is the case). So, while limiting the fetches would result in less model instances being generated, the overhead of generating all the rows in the database and transferring the results would still be there.

To get rid of clone() overhead we could just call query.set_limits() manually.

With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.

in reply to:  8 comment:9 by Shai Berger, 10 years ago

Replying to akaariai:

Limiting the fetches doesn't work that well on most core databases. [...] [T]he overhead of generating all the rows in the database and transferring the results would still be there.

To get rid of clone() overhead we could just call query.set_limits() manually.

With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.

The intent of re-opening is exactly to optimize the successful case; the issue is that the current implementation imposes an overhead on all get() calls for better handling of the error case. Doing it with fetches alone will remove this overhead -- for the successful case, asking for 21 results or for all results would be the same. I am not sure about the added performance costs either, except that (as noted) the nested selects on Oracle cost us in functionality.

On a side note, if more than one row is returned from the database, no model instance at all needs to be created.

comment:10 by Anssi Kääriäinen, 10 years ago

Has patch: unset
Patch needs improvement: set

Yes, optimizing the successful case is important. The .get() method can be used in try-except workflows, but I don't know of any realistic use case where the try-except is interested in multiple objects returned case.

The question is how much overhead the LIMIT 21 clause adds for get() on databases that support it. If it doesn't add overhead, then keeping the current behavior on databases that support LIMIT seems OK to me. If it adds overhead, then lets just get rid of the LIMIT clause and close this ticket as wontfix. As said before we can get rid of the clone overhead easily, so that isn't a factor in deciding what to do here.

comment:11 by Jafula, 10 years ago

Any chance of revisiting this for 1.8? In Oracle, the get query SQL is wrapped in an additional select and is just noise when debugging (no known performance complaints in Production so far).

We have our own Oracle backend that inherits from the one that ships with Django that we have added Oracle DRCP (connection pooling) support to it.

So some sort of hook, parameter or over-ridable function that would let us turn this feature off in our own backend would be wonderful.

Michael

comment:12 by Tim Graham, 10 years ago

Patch needs improvement: unset

Of course, if someone writes a patch we'll look at it.

comment:13 by Anssi Kääriäinen, 10 years ago

I'm beginning to think committing this was a mistake. Correctly working code shouldn't face this issue, so why optimize it. Removing this from 1.8 is ok for me.

One possibility is to use the iterator() method, so we don't turn all the rows to model instances.

comment:14 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 293fd5da5b8c7b79bd34ef793ab45c1bb8ac69ea:

Reverted "Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows."

This reverts commit da79ccca1d34f427952cce4555e598a700adb8de.

This optimized the unsuccessful case at the expense of the successful one.

comment:15 by Tim Graham <timograham@…>, 10 years ago

In 7060ef71581c740bcc28ed405225537a411c36b5:

[1.8.x] Reverted "Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows."

This reverts commit da79ccca1d34f427952cce4555e598a700adb8de.

This optimized the unsuccessful case at the expense of the successful one.

Backport of 293fd5da5b8c7b79bd34ef793ab45c1bb8ac69ea from master

comment:16 by Tim Graham, 10 years ago

Resolution: fixed
Status: closednew

I've reverted the original patch.

comment:17 by Shai Berger, 6 years ago

So, a solution was added then reverted, because it did two things: Added significant overhead in the general case, and broke things on Oracle.

I believe we can try again now -- I think Oracle has added OFFSET/LIMIT support in later versions and the versions which lack it are no longer supported; and Anssi has suggested in comment:8 a way to avoid the overhead.

comment:18 by Mariusz Felisiak, 6 years ago

Has patch: set
Owner: changed from nobody to Amir Hadi
Status: newassigned

comment:19 by Mariusz Felisiak <felisiak.mariusz@…>, 6 years ago

Resolution: fixed
Status: assignedclosed

In 330638b8:

Fixed #6785 -- Made QuerySet.get() fetch a limited number of rows.

Co-authored-by: Tim Graham <timograham@…>
Co-authored-by: Patryk Zawadzki <patrys@…>
Co-Authored-By: Mariusz Felisiak <felisiak.mariusz@…>

Note: See TracTickets for help on using tickets.
Back to Top