#15361 closed Cleanup/optimization (fixed)
Document performance considerations when using Queryset.get()
Reported by: | Markus Bertheau | Owned by: | mmcnickle |
---|---|---|---|
Component: | Documentation | Version: | 1.2 |
Severity: | Normal | Keywords: | |
Cc: | timograham@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Currently QuerySet.get() counts the number of rows in the result set. For get()'s semantics though, what's important is only whether there were 0, 1 or more rows returned. Counting the whole number of rows is a potentially expensive operation compared to stopping after the first 2 rows. Therefore QuerySet.get() should use LIMIT 2 (and no ORDER BY)
Attachments (4)
Change History (23)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 14 years ago
Has patch: | set |
---|
comment:3 by , 14 years ago
I've done some benchmarks, using djangobench and the query_get benchmark (use my fork https://github.com/spookylukey/djangobench), and this patch actually slows it down for both SQLite and Postgres (I haven't tested MySQL). If you get different results, please let us know so that we can reproduce the performance increase - otherwise we obviously won't commit this.
comment:4 by , 14 years ago
I question the validity of that benchmark. With 10 rows what you measure is probably the overhead of the slicing in the Django ORM. The original argument for using LIMIT makes still perfect sense.
comment:5 by , 14 years ago
I'm not sure it does, the majority of get()
calls (in my experience) are on fields such as pk
or slug
, which have database level uniqueness constraints, and thus the database has no need to be told that there is a limit on the number of results. Indeed the only case where this should make a large difference is if you're doing a get()
from which the database will return a large number of results and then become a MultipleObjectsReturned
exception, a case which I contend is unlikely.
comment:6 by , 14 years ago
I concur. It was indeed in the context of a database design flaw that I noticed the unneeded row number counting of get()
.
However, I don't believe that the negative effects of this patch are so significant that it's worth sacrificing its positive effects, regardless of their low likelihood of occurring in well designed databases.
lukeplant, do the results of the other benchmark tests differ significantly?
comment:7 by , 14 years ago
Owner: | changed from | to
---|
The concern that this change may make the get() slower may be valid, but one thing is for sure, that benchmark is inadequate for anything but huge performance regressions. I'm able to produce "Significant" slower results by running 2 identical versions of django:
Running all benchmarks Control: Django 1.3 beta 1 (in django-control) Experiment: Django 1.3 beta 1 (in django-experiment) Running 'query_get' benchmark ... Min: 0.020000 -> 0.020000: no change Avg: 0.022400 -> 0.024800: 1.1071x slower Significant (t=-2.556039) Stddev: 0.00431 -> 0.00505: 1.1698x larger (N = 50)
The telling fact is that these "significant" results are transient, differing from run to run.
I'll write a benchmark that will properly stress the query.get() and see if:
a) There is a performance regression for pk (or other fields with unique constraints) lookups
b) There is a performance increase for non-indexed, large match lookups.
If (b) and not (a), it's a +1.
If (a) and (b), it's probably a -1, given the more usual case in (a)
If (a) and not (b), -1
comment:8 by , 14 years ago
Patch with the documentation change as decided in the django-dev post.
I've retained the tests from the old patch, as they increase the test coverage of query.get() and aren't really specific to this issue.
by , 14 years ago
Attachment: | 15361-2.diff added |
---|
Documentation as per discussion. Added tests for MultipleObjectsReturned.
comment:9 by , 14 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Keywords: | QuerySet get limit MultipleObjectsReturned removed |
milestone: | → 1.3 |
Summary: | QuerySet.get() should use LIMIT 2 → Document performance considerations when using Queryset.get() |
comment:10 by , 14 years ago
Changing title, component so it's more likely to be picked up for 1.3 docs.
comment:11 by , 14 years ago
Patch needs improvement: | set |
---|
The patch looks mostly good.
However, the last chunk in the optimization addition isn't quite right. While I understand how that query can potentially be slow, in the sample code that the slowness is trivialized by the fact that it's going to raise MultipleObjectsReturned
and (in the context of the example) result in an uncaught exception.
So, while the advice on unique, indexed columns is solid, I'd like to see the contrasting slowness illustrated in a way that would work in a real-world use case. Additionally, a little more explanation of *why* that query could be slow would add a lot for new users.
by , 14 years ago
Attachment: | 15361-2.2.diff added |
---|
comment:12 by , 14 years ago
Added a clearer explanation of why returning multiple objects from get() is a performance issue.
comment:13 by , 14 years ago
Patch needs improvement: | unset |
---|
comment:14 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Cleanup/optimization |
comment:15 by , 13 years ago
Easy pickings: | unset |
---|---|
Patch needs improvement: | set |
UI/UX: | unset |
I noticed that the tests compare an exception message to "get\(\) returned more than one Article -- it returned 2! Lookup parameters were {'pub_date__month': 7, 'pub_date__year': 2005}"
. Unless I missed something, the order of the elements in the dictionary is not guaranteed to be stable across platforms. I suggest relaxing this test, for instance testing only "get\(\) returned more than one Article -- it returned 2!"
.
Also, I struggled with the structure of the paragraph added to the docs: "Firstly", "also", "First of all", "Secondly"... I'm not a native speaker, but many users of Django aren't either :)
by , 12 years ago
Attachment: | 15361-3.diff added |
---|
comment:17 by , 12 years ago
Cc: | added |
---|---|
Patch needs improvement: | unset |
Cleaned up doc patch, relaxed tests per Aymeric's suggestion.
comment:18 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Sounds reasonable to me.