Opened 14 years ago

Closed 12 years ago

Last modified 12 years ago

#15361 closed Cleanup/optimization (fixed)

Document performance considerations when using Queryset.get()

Reported by: Markus Bertheau Owned by: mmcnickle
Component: Documentation Version: 1.2
Severity: Normal Keywords:
Cc: timograham@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently QuerySet.get() counts the number of rows in the result set. For get()'s semantics though, what's important is only whether there were 0, 1 or more rows returned. Counting the whole number of rows is a potentially expensive operation compared to stopping after the first 2 rows. Therefore QuerySet.get() should use LIMIT 2 (and no ORDER BY)

Attachments (4)

15361.diff (3.0 KB ) - added by mmcnickle 14 years ago.
Patch with test
15361-2.diff (4.2 KB ) - added by mmcnickle 14 years ago.
Documentation as per discussion. Added tests for MultipleObjectsReturned.
15361-2.2.diff (4.6 KB ) - added by mmcnickle 14 years ago.
15361-3.diff (3.8 KB ) - added by Tim Graham 12 years ago.

Download all attachments as: .zip

Change History (23)

comment:1 by Russell Keith-Magee, 14 years ago

Triage Stage: UnreviewedAccepted

Sounds reasonable to me.

by mmcnickle, 14 years ago

Attachment: 15361.diff added

Patch with test

comment:2 by mmcnickle, 14 years ago

Has patch: set

comment:3 by Luke Plant, 14 years ago

I've done some benchmarks, using djangobench and the query_get benchmark (use my fork https://github.com/spookylukey/djangobench), and this patch actually slows it down for both SQLite and Postgres (I haven't tested MySQL). If you get different results, please let us know so that we can reproduce the performance increase - otherwise we obviously won't commit this.

comment:4 by Markus Bertheau, 14 years ago

I question the validity of that benchmark. With 10 rows what you measure is probably the overhead of the slicing in the Django ORM. The original argument for using LIMIT makes still perfect sense.

comment:5 by Alex Gaynor, 14 years ago

I'm not sure it does, the majority of get() calls (in my experience) are on fields such as pk or slug, which have database level uniqueness constraints, and thus the database has no need to be told that there is a limit on the number of results. Indeed the only case where this should make a large difference is if you're doing a get() from which the database will return a large number of results and then become a MultipleObjectsReturned exception, a case which I contend is unlikely.

comment:6 by Markus Bertheau, 14 years ago

I concur. It was indeed in the context of a database design flaw that I noticed the unneeded row number counting of get().

However, I don't believe that the negative effects of this patch are so significant that it's worth sacrificing its positive effects, regardless of their low likelihood of occurring in well designed databases.

lukeplant, do the results of the other benchmark tests differ significantly?

comment:7 by mmcnickle, 14 years ago

Owner: changed from nobody to mmcnickle

The concern that this change may make the get() slower may be valid, but one thing is for sure, that benchmark is inadequate for anything but huge performance regressions. I'm able to produce "Significant" slower results by running 2 identical versions of django:

Running all benchmarks
Control: Django 1.3 beta 1 (in django-control)
Experiment: Django 1.3 beta 1 (in django-experiment)

Running 'query_get' benchmark ...
Min: 0.020000 -> 0.020000: no change
Avg: 0.022400 -> 0.024800: 1.1071x slower
Significant (t=-2.556039)
Stddev: 0.00431 -> 0.00505: 1.1698x larger (N = 50)

The telling fact is that these "significant" results are transient, differing from run to run.

I'll write a benchmark that will properly stress the query.get() and see if:

a) There is a performance regression for pk (or other fields with unique constraints) lookups
b) There is a performance increase for non-indexed, large match lookups.

If (b) and not (a), it's a +1.
If (a) and (b), it's probably a -1, given the more usual case in (a)
If (a) and not (b), -1

comment:8 by mmcnickle, 14 years ago

Patch with the documentation change as decided in the django-dev post.

I've retained the tests from the old patch, as they increase the test coverage of query.get() and aren't really specific to this issue.

by mmcnickle, 14 years ago

Attachment: 15361-2.diff added

Documentation as per discussion. Added tests for MultipleObjectsReturned.

comment:9 by mmcnickle, 14 years ago

Component: Database layer (models, ORM)Documentation
Keywords: QuerySet get limit MultipleObjectsReturned removed
milestone: 1.3
Summary: QuerySet.get() should use LIMIT 2Document performance considerations when using Queryset.get()

comment:10 by mmcnickle, 14 years ago

Changing title, component so it's more likely to be picked up for 1.3 docs.

comment:11 by Gabriel Hurley, 14 years ago

Patch needs improvement: set

The patch looks mostly good.

However, the last chunk in the optimization addition isn't quite right. While I understand how that query can potentially be slow, in the sample code that the slowness is trivialized by the fact that it's going to raise MultipleObjectsReturned and (in the context of the example) result in an uncaught exception.

So, while the advice on unique, indexed columns is solid, I'd like to see the contrasting slowness illustrated in a way that would work in a real-world use case. Additionally, a little more explanation of *why* that query could be slow would add a lot for new users.

by mmcnickle, 14 years ago

Attachment: 15361-2.2.diff added

comment:12 by mmcnickle, 14 years ago

Added a clearer explanation of why returning multiple objects from get() is a performance issue.

comment:13 by mmcnickle, 14 years ago

Patch needs improvement: unset

comment:14 by Łukasz Rekucki, 14 years ago

Severity: Normal
Type: Cleanup/optimization

comment:15 by Aymeric Augustin, 13 years ago

Easy pickings: unset
Patch needs improvement: set
UI/UX: unset

I noticed that the tests compare an exception message to "get\(\) returned more than one Article -- it returned 2! Lookup parameters were {'pub_date__month': 7, 'pub_date__year': 2005}". Unless I missed something, the order of the elements in the dictionary is not guaranteed to be stable across platforms. I suggest relaxing this test, for instance testing only "get\(\) returned more than one Article -- it returned 2!".

Also, I struggled with the structure of the paragraph added to the docs: "Firstly", "also", "First of all", "Secondly"... I'm not a native speaker, but many users of Django aren't either :)

comment:16 by Jacob, 13 years ago

milestone: 1.3

Milestone 1.3 deleted

by Tim Graham, 12 years ago

Attachment: 15361-3.diff added

comment:17 by Tim Graham, 12 years ago

Cc: timograham@… added
Patch needs improvement: unset

Cleaned up doc patch, relaxed tests per Aymeric's suggestion.

comment:18 by Tim Graham <timograham@…>, 12 years ago

Resolution: fixed
Status: newclosed

In feaf9f279a73d87549c17fc7fb36463f1c7367a1:

Fixed #15361 - Documented performance considerations for QuerySet.get()

Thanks mmcnickle for the patch.

comment:19 by Tim Graham <timograham@…>, 12 years ago

In ffc649df888ea35d0e6848bf53ee26917fccc27e:

[1.5.X] Fixed #15361 - Documented performance considerations for QuerySet.get()

Thanks mmcnickle for the patch.

Backport of feaf9f279a from master

Note: See TracTickets for help on using tickets.
Back to Top