#11287 closed Uncategorized (invalid)
Better performance of the Paginator list
Reported by: | QingFeng | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.0 |
Severity: | Normal | Keywords: | Paginator |
Cc: | qingfeng@… | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
old:
objects = Model1.objects.all()#-->Read all the data, poor performance p = Paginator(objects, 2)
NBPageList:
objs = NBPageList(Model1.objects) p = Paginator(objs, 5)
Attachments (3)
Change History (9)
by , 16 years ago
Attachment: | 10953.patch added |
---|
comment:1 by , 16 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I have no idea what this ticket is describing, but a) this won't be faster in any way, and b) Model.objects.all() does not pull in all records.
follow-up: 3 comment:2 by , 16 years ago
(Alex was faster than me. But I'll post this anyway in case it helps guide future submissions.)
The comment here:
objects = Model1.objects.all()#-->Read all the data, poor performance
is incorrect. Query sets are lazy, see:
http://docs.djangoproject.com/en/dev/topics/db/queries/#id3
and:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
Thus whatever is being proposed seems to be founded on an incorrect assumption.
For future reference, when proposing something it would be helpful to have more words describing the problem and solution. NBPageList
means nothing to me. What is the NB supposed to signify? Also, patches should be complete in a single file, not posted as multiple individual diffs. And if you are proposing a new public interface/class (which seems to be where this was going), it needs to have some documentation.
comment:3 by , 16 years ago
Replying to kmtracey:
is incorrect. Query sets are lazy, see:
http://docs.djangoproject.com/en/dev/topics/db/queries/#id3
and:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated
Thus whatever is being proposed seems to be founded on an incorrect assumption.
Thank you, I made a mistake:)
comment:4 by , 13 years ago
Component: | Core (Other) → Database layer (models, ORM) |
---|---|
Easy pickings: | unset |
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
But there is a real performance problem with the standard paginator.
For MVCC databases like postgres, performance is very poor for tables above a couple million lines when doing a count(*), because postgres doesn't stores enough information on indexes and will perform a seqscan. The default Paginator seems to call count(*), this can bring a standard server install to its knees.
My workaround os overriding _get_count to use something like:
SELECT reltuples FROM pg_class WHERE relname = 'myapp_mymodel';
May be the ORM should implement a "estimated_count" method for when you don't need very accurate numbers. I would volunteer to send a patch but I don't know how to perform a fast "estimated count" for all database engines supported.
comment:5 by , 13 years ago
The downside of estimated tuples is that it can return wildly wrong results (the PostgreSQL one mentioned above at least).
The preferred way for me would be to have a couple of different new options for the paginator (pony requests if you wish):
max_pages:
- This would be useful in limiting the maximum page to say 20, and if you have 25 objs per page, you would do "SELECT count(*) FROM (select 1 from real_query limit 500) tmp" instead of "SELECT count(*) FROM real_query". This should be more than fast enough assuming you have proper indexes in place.
Links-only:
- This would give you only "next page" and "previous page" links. The trick is you can use a "SELECT objs FROM somepage WHERE id > last_page_id ORDER BY id LIMIT 25" to fetch the next page. This is lightning-fast if you have the proper indexes. This requires the ordering is done on an unique condition.
Using the estimated_count() method would still leave you open to people just going to page 10000, hit F5 a couple of times, and watch the DB server burn.
This ticket is closed and should remain so. The above ideas might be worth investigation. But not in this ticket.
comment:6 by , 13 years ago
Just too add a little note. This estimation of total query count by using the tables statistics (reltuples in postgresql) won't work even if it would give back an a reasonable count all the time.
Since this will return the estimated amount of rows in a table not in query. It's very usual for a query backing the pager to have several WHERE clauses. eg: Blog.objects.filter(pub_date__lte=now, status="approved", site=site)
I really like the max pages idea, if page 20 still yields 25 results (or retrieve 26 so we know for sure there more then 25) we can just add another page number and go up one by one from there.
Paginator patch