Opened 17 years ago
Closed 6 years ago
#6785 closed Cleanup/optimization (fixed)
QuerySet.get() should only attempt to fetch a limited number of rows
Reported by: | Brantley | Owned by: | Amir Hadi |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | timograham@…, Shai Berger | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
.get() selects and lists every record in the db matching the given filter, when it only needs to select two at most.
Attachments (1)
Change History (20)
by , 17 years ago
Attachment: | query-get-patch.diff added |
---|
comment:1 by , 17 years ago
Found a small bug that slipped through the tests, so I fixed it and also updated the tests to test .get() raising MultipleObjectsReturned.
comment:2 by , 17 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Not worth the extra overhead in this performance critical piece of code. In the case when you're calling get correctly, only a single result will be returned and we need that result. In the error case, it's not harmful that we're selecting a few extra results. Any code that is relying on the error case for speed is broken by design.
comment:3 by , 12 years ago
Easy pickings: | unset |
---|---|
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
Strange though it may seem: would it be possible to explicitly state the number of objects returned and restrain that number to 1, 2, 3, 4, 5, more than 5? In my experience, the number of objects is helpful when debugging.
comment:4 by , 12 years ago
Cc: | added |
---|---|
Component: | Uncategorized → Database layer (models, ORM) |
Keywords: | qs-rf removed |
Patch needs improvement: | set |
Summary: | QuerySet.get(), nit-pick / optimize → QuerySet.get() should only attempt to fetch a limited number of rows |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Version: | queryset-refactor → master |
Seems like there's consensus to re-open this: https://groups.google.com/d/topic/django-developers/PkzS9Wv6hIU/discussion
Pull request (which needs improvement): https://github.com/django/django/pull/1139
comment:5 by , 12 years ago
Patch needs improvement: | unset |
---|---|
Resolution: | wontfix |
Status: | closed → new |
Updated pull request based on django-developers discussion: https://github.com/django/django/pull/1320
I'm not sure it needs docs or a mention in the release notes, but happy to add them if necessary.
comment:6 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:7 by , 10 years ago
Cc: | added |
---|---|
Resolution: | fixed |
Status: | closed → new |
As noted in #23061 the current implementation is problematic on Oracle. It also does more work than necessary elsewhere: Slicing a query clones it, which is an expensive operation, and potentially makes the database work harder.
It is better to implement this as hinted by the ticket's summary -- by limiting the fetches, not the select.
follow-up: 9 comment:8 by , 10 years ago
Limiting the fetches doesn't work that well on most core databases. If I recall correctly sqlite, postgresql and mysql all do transfer all of the rows of the query when accessing the results (for PostgreSQL I know this is the case). So, while limiting the fetches would result in less model instances being generated, the overhead of generating all the rows in the database and transferring the results would still be there.
To get rid of clone() overhead we could just call query.set_limits() manually.
With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.
comment:9 by , 10 years ago
Replying to akaariai:
Limiting the fetches doesn't work that well on most core databases. [...] [T]he overhead of generating all the rows in the database and transferring the results would still be there.
To get rid of clone() overhead we could just call query.set_limits() manually.
With this all being said... I guess we are optimizing the wrong case here. Successful .get() (just one row returned) should be optimized, not the error case. I am not sure how much overhead LIMIT (or the nested selects on Oracle) cost.
The intent of re-opening is exactly to optimize the successful case; the issue is that the current implementation imposes an overhead on all get()
calls for better handling of the error case. Doing it with fetches alone will remove this overhead -- for the successful case, asking for 21 results or for all results would be the same. I am not sure about the added performance costs either, except that (as noted) the nested selects on Oracle cost us in functionality.
On a side note, if more than one row is returned from the database, no model instance at all needs to be created.
comment:10 by , 10 years ago
Has patch: | unset |
---|---|
Patch needs improvement: | set |
Yes, optimizing the successful case is important. The .get() method can be used in try-except workflows, but I don't know of any realistic use case where the try-except is interested in multiple objects returned case.
The question is how much overhead the LIMIT 21 clause adds for get() on databases that support it. If it doesn't add overhead, then keeping the current behavior on databases that support LIMIT seems OK to me. If it adds overhead, then lets just get rid of the LIMIT clause and close this ticket as wontfix. As said before we can get rid of the clone overhead easily, so that isn't a factor in deciding what to do here.
comment:11 by , 10 years ago
Any chance of revisiting this for 1.8? In Oracle, the get query SQL is wrapped in an additional select and is just noise when debugging (no known performance complaints in Production so far).
We have our own Oracle backend that inherits from the one that ships with Django that we have added Oracle DRCP (connection pooling) support to it.
So some sort of hook, parameter or over-ridable function that would let us turn this feature off in our own backend would be wonderful.
Michael
comment:12 by , 10 years ago
Patch needs improvement: | unset |
---|
Of course, if someone writes a patch we'll look at it.
comment:13 by , 10 years ago
I'm beginning to think committing this was a mistake. Correctly working code shouldn't face this issue, so why optimize it. Removing this from 1.8 is ok for me.
One possibility is to use the iterator() method, so we don't turn all the rows to model instances.
comment:14 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:16 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
I've reverted the original patch.
comment:17 by , 6 years ago
So, a solution was added then reverted, because it did two things: Added significant overhead in the general case, and broke things on Oracle.
I believe we can try again now -- I think Oracle has added OFFSET/LIMIT
support in later versions and the versions which lack it are no longer supported; and Anssi has suggested in comment:8 a way to avoid the overhead.
QuerySet.get() nit-pick / optimize