Opened 16 years ago

Last modified 2 months ago

#8408 new New feature

Add a new meta option: don't do count(*) in admin

Reported by: LI Daobing Owned by: Thomas C
Component: contrib.admin Version: dev
Severity: Normal Keywords:
Cc: Thomas Kerpe, Ludovico Magnocavallo, marcoberi@…, djfische@…, boxm@…, kmike84@…, slav0nic0@…, dan.fairs@…, markus.magnuson@…, t.chaumeny@…, josh.smeaton@…, Ionel Cristian Mărieș, Ondřej Chmelař, elonzh, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Hello,

everytime when I use Admin to view a table, the Admin count the rows in the table. if the database is mysql+innodb, this step is very slow, how about add an option in Meta to prevent this, such as

class A(models.Model):
    class Meta:
        do_not_count = True

thanks

Attachments (1)

paginator-patch.diff (13.2 KB ) - added by Malcolm Box 14 years ago.
Patch admin classes to add UncountedPaginator

Download all attachments as: .zip

Change History (62)

comment:1 by Malcolm Tredinnick, 16 years ago

milestone: post-1.0
Triage Stage: UnreviewedDesign decision needed

I can't see that this is likely to happen. Firstly, it would be an admin option, not a meta option. Secondly, even with tens of thousands of rows in a test table here, the count takes far less than a second, so compared to the time for collecting the display data, rendering it and returning the page, it isn't that large a time slice.

How many rows are in this table you are seeing the problem with?

Right now I'm -1 on this as being such an edge case as to be not worth an option. If counting the rows in a table is too slow to be used, then there are going to be plenty of other options querying that table as well.

comment:2 by LI Daobing, 16 years ago

we have tens of millions rows in this table. and we want to use django Admin as a customizable viewer.

comment:3 by Adrian Holovaty, 16 years ago

Triage Stage: Design decision neededAccepted

I like this optional optimization for people who don't want the admin to do a "select count(*)" on every changelist view, but I'm not sure whether we could implement it without throwing out some valuable parts of the changelist (mostly, the pagination). So, I'm marking this as "accepted," with the caveat that we need to look at what parts of the changelist will be thrown out.

comment:4 by soroosh, 16 years ago

This can be done without losing any functionality by setting a limit for count like this:
SELECT count(*) from (select id from HUGE_TABLE LIMIT 10000) as b;
Then if the raws count is more than for example 10000, SQL server stops sequential scanning through the table and paginatior can list pages around current page but not the last page. On my system postgreql takes 60 seconds to return count of a table with 11,000,000 raws and this is suboptimal to wait a minute for each admin page to load.

comment:5 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:6 by mrts, 16 years ago

milestone: 1.2

Having problems with large datasets in admin myself (not sure if count() is the only reason though), proposing this for 1.2.

comment:7 by Gonzalo Saavedra, 16 years ago

Cc: Gonzalo Saavedra added

comment:8 by Thomas Kerpe, 16 years ago

Cc: Thomas Kerpe added

comment:9 by Ludovico Magnocavallo, 15 years ago

Cc: Ludovico Magnocavallo added

comment:10 by stephanhoyer, 15 years ago

The count(*) is essential for displaying the paginator. Without knowing the amount of entities of a table, there can be any well working paginator.
One solution could be to display only "prev/next"-buttons and input field for page.

But i think the effort of fixing this is much higher then the benefit

So -1 for this request.

comment:11 by anonymous, 15 years ago

aside from the count problem, LIMIT item_count OFFSET 25000 (maybe slower numbers too already) is damn slow in MySQL ;)

comment:12 by James Bennett, 15 years ago

milestone: 1.2

1.2 is feature-frozen, moving this feature request off the milestone.

comment:13 by marcob, 15 years ago

See #13643 for an ugly-monkey-patching workaround.

comment:14 by marcob, 15 years ago

Cc: marcoberi@… added

comment:15 by David Fischer, 14 years ago

Cc: djfische@… added

comment:16 by Mikhail Korobov, 14 years ago

I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator

in reply to:  16 ; comment:17 by Łukasz Rekucki, 14 years ago

Resolution: wontfix
Status: newclosed

Replying to kmike:

I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator

Agreed :) Here's a link to a sample Paginator that doesn't use count

in reply to:  17 comment:18 by Malcolm Box, 14 years ago

Cc: boxm@… added
Easy pickings: unset
Resolution: wontfix
Severity: Normal
Status: closedreopened
Type: Uncategorized
Version: 0.96SVN

Replying to lrekucki:

Replying to kmike:

I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator

Agreed :) Here's a link to a sample Paginator that doesn't use count

Reopening, as the resolution simply doesn't work. The example paginator given by kmike simply doesn't work as a ModelAdmin paginator, because the {% pagination %} tag and the ChangeList class (among others) assume you can count the number of items being paginated. See my comment on #4065 for the gory details.

There are scenarios where you don't know how many results are being paginated, but can still do useful pagination links:

  • With InnoDB tables in MySQL, count(*) is frighteningly expensive, so while in principle you can count, in practice you can't
  • With NoSQL databases it can be impossible to do a count, but you can still paginate by range queries

A proposed design for such a paginator would be to display (on page 12):

1 2 3 ..... 10 11 12 ... Next

which is better than just next/prev.

I'll see if I can create acceptable patches.

by Malcolm Box, 14 years ago

Attachment: paginator-patch.diff added

Patch admin classes to add UncountedPaginator

comment:19 by Malcolm Box, 14 years ago

Has patch: set
Needs documentation: set
Needs tests: set
Triage Stage: AcceptedDesign decision needed

OK, here's an attempt at a patch that fixes this.

I've added a new UncountedPaginator to pagination.py, and updated the admin code to check whether the paginator has the appropriate attributes before using them, falling back on alternative implementations if not.

Before I create test & doc updates, would like some feedback on whether this patch is going about things the right way and is something that might be accepted.

comment:20 by Julien Phalip, 14 years ago

Type: UncategorizedNew feature
UI/UX: unset

comment:21 by charles@…, 13 years ago

This is not just MySQL. PostgreSQL has a separate query for cheap, inexact counts (with statistics collected during the most recent ANALYZE) -- but exact counts are expensive; I have a table where a SELECT COUNT(*) is typically running upward of 50 seconds.

Perhaps on PostgreSQL we could run a query like select reltuples from pg_class where relname=?;, and use the traditional pagination system if the result is under 20,000 or so, or a newer one otherwise?

See also http://wiki.postgresql.org/wiki/Slow_Counting

comment:22 by Mikhail Korobov, 13 years ago

Cc: kmike84@… added

comment:23 by Adam Nelson, 13 years ago

Triage Stage: Design decision neededAccepted

Switching back to 'accepted' - I think the switch to DDN was in error and simply sent this back into the icebox even though it was previously accepted by a core dev and has a draft patch.

comment:24 by anonymous, 13 years ago

The issue is not restricted to pagination. Seems like the admin interface calls count(*) at several other places.

comment:25 by Gonzalo Saavedra, 13 years ago

Cc: Gonzalo Saavedra removed

comment:26 by nova77, 13 years ago

Apparently count(*) is called on the whole table even if you filter your view. For instance if you add "?idin=555,666", you'll have two select count(*): one with the "where" clause and a plain one which runs on the whole table.

comment:27 by Anssi Kääriäinen, 13 years ago

Can you track down where the unrestricted qs.count() is issued, and if there is a reason for it?

comment:28 by Aymeric Augustin, 13 years ago

The admin displays the number of matching objects and the total number of objects.

comment:29 by Sergey Maranchuk, 13 years ago

Cc: slav0nic0@… added

comment:30 by Dan Fairs, 12 years ago

Cc: dan.fairs@… added

comment:31 by bb@…, 12 years ago

+1 because I ran into this problem today.

Using PostgreSQL and django 1.4 with a table that contains about 15 million rows. executing "select count(*) from mytable" in the psql console takes about 40 seconds, even with an index on the primary key. This leads to timeouts and huge lag.

I'm going to try MySQL to see if this is any better.

comment:32 by Aymeric Augustin, 12 years ago

Status: reopenednew

comment:33 by craig.labenz@…, 12 years ago

Using a table of 130M+ rows and thus obviously running into the same problem. The COUNT(*) query takes literally 10+ minutes to run in InnoDB-land.

comment:34 by jonathan_livni, 11 years ago

When django is hosted on Heroku, there's a time limit of 30 seconds for any web request. So instead of admin pages taking a very long time to load, you get admin pages which never load.

http://s2.postimg.org/onfy59xuh/Capture.jpg

comment:35 by Tomek Paczkowski, 11 years ago

Owner: changed from nobody to Tomek Paczkowski
Status: newassigned
Summary: add a new meta option: don't do count(*) in adminAdd a new meta option: don't do count(*) in admin

comment:36 by Tomek Paczkowski, 11 years ago

I'm working on this feature in my branch here: https://github.com/oinopion/django/compare/nocountpagination

comment:37 by Markus Amalthea Magnuson, 11 years ago

Cc: markus.magnuson@… added

comment:38 by Chris Spencer, 11 years ago

+1 for this feature. I have a few databases with tables containing tens of millions of records, and browsing them in Django is painfully slow, even with caching and extensive indexing. These sections of admin takes *minutes* to load, and DDT is telling me that around 90% of the load time is just from a single "SELECT COUNT(*) ..." to display the total result count, which I don't care about.

Yes, both PostgreSQL and MySQL have tricks for approximate counts, but only for total tables sizes, which doesn't help once you begin filtering.

And can confirm that using InfinitePaginator from django-pagination and specifying a custom paginator=InfinitePaginator in your modeladmin does not work because admin still expects to receive a valid count, which causes InfinitePaginator to throw NotImplementedError exceptions.

comment:39 by Chris Spencer, 11 years ago

See here for some interesting workarounds of Django's inefficient result count mechanism: http://craiglabenz.me/2013/06/12/how-i-made-django-admin-scale/

comment:40 by Thomas C, 10 years ago

Cc: t.chaumeny@… added

comment:41 by Thomas C, 10 years ago

Needs documentation: unset
Needs tests: unset

comment:42 by Thomas C, 10 years ago

Owner: changed from Tomek Paczkowski to Thomas C

comment:44 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: assignedclosed

In 17557d068c43bd61cdc6c18caf250ffa469414a1:

Fixed #8408 -- Added ModelAdmin.show_full_result_count to avoid COUNT() query.

Thanks lidaobing for the suggestion.

comment:45 by Josh Smeaton, 8 years ago

I don't think this ticket has been properly or fully fixed. The patch that landed only affects the display of filtered results. A count(*) is still executed every time you hit a page, which leads to timeouts when tables grow large.

I think the idea of using a custom paginator is probably a good one, but that could probably be controlled by an option in model admin or something. I'm not reopening this ticket until I've got some feedback first.

comment:46 by Matthew Betts, 7 years ago

Resolution: fixed
Status: closednew

As per Josh's comment a count(*) is still executed by the paginator when using the admin, even with show_full_result_count=False. This takes multiple seconds when querying tens of millions of rows. You can hack around the count(*) query by implementing a custom Paginator with either an appromixiate or a fixed count.

from django.core.paginator import Paginator
class FixedCountPaginator(Paginator):

    @property
    def count(self):
        return 1000

class MyModelAdmin(admin.ModelAdmin):
    show_full_result_count=False
    paginator = FixedCountPaginator

I'd like to fix this properly by either paginating without requesting the number of rows / pages or disabling pagination.

class MyModelAdmin(admin.ModelAdmin):
    show_full_result_count=False
    no_count = False

I've put up a pull request to demonstrate the changes required to achieve this. It changes the pagination widget in the admin to display [Prev, PAGE, Next]. Changing this to a show_pagination field that disables pagination would also solve the count(*) issue and may be more consistent / straight forward.

class MyModelAdmin(admin.ModelAdmin):
    show_full_result_count=False
    show_pagination = True

comment:47 by Josh Smeaton, 7 years ago

Cc: josh.smeaton@… added
Needs tests: set
Patch needs improvement: set

I spoke to Matthew at the pycon sprints and agreed that we should open this one back up if we had a PR to demonstrate. The PR needs some work still. I'll take this to the Mailing List.

comment:48 by Ionel Cristian Mărieș, 7 years ago

Cc: Ionel Cristian Mărieș added

comment:49 by Jerome Leclanche, 7 years ago

I've been using the following for a while:

class EstimatedCountPaginator(Paginator):
        def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)
                self.object_list.count = self.count

        @cached_property
        def count(self):
                if self.object_list.query.where:
                        return self.object_list.count()

                db_table = self.object_list.model._meta.db_table
                cursor = connections[self.object_list.db].cursor()
                cursor.execute("SELECT reltuples FROM pg_class WHERE relname = %s", (db_table, ))
                result = cursor.fetchone()
                if not result:
                        return 0
                return int(result[0])

At the very least I believe such a paginator should be available by default in django.contrib.postgres.

But going beyond that, the problem is that users are not always able to modify the affected admin objects. So there should be some way of setting the *default* paginator for all admin objects, I think.

comment:50 by michi88, 6 years ago

What's wrong with just always calling len(self.object_list)?
Using this saves me from the additional count of the filtered objects. This seems to work as the queryset is evaluated/fetched and cached on the len(...) call, after that another fetch is not needed to render the items in the admin list.

class LenCountPaginator(Paginator):

    @cached_property
    def count(self):
        return len(self.object_list)

This is how de default Paginator does it:

@cached_property
def count(self):
    """
    Returns the total number of objects, across all pages.
    """
    try:
        return self.object_list.count()
    except (AttributeError, TypeError):
        # AttributeError if object_list has no count() method.
        # TypeError if object_list.count() requires arguments
        # (i.e. is of type list).
        return len(self.object_list)

comment:51 by Tim Graham, 6 years ago

.count() is called on the unfiltered queryset so calling len() on it would fetch all objects from the database, not just those on the current page.

comment:52 by michi88, 6 years ago

I indeed spoke too early. This did work for me as I combine it with django-postgres-fuzzycount as well which takes care of the unfiltered total count:

https://github.com/stephenmcd/django-postgres-fuzzycount/blob/master/fuzzycount.py

class FuzzyCountAdmin(ReplicaModelAdmin):
    """ 
     Models must have:

     fuzzy_count_manager = FuzzyCountManager()
    """

    def get_queryset(self, request):
        if hasattr(self.model, 'fuzzy_count_manager') and request.method == 'GET':
            qs = self.model.fuzzy_count_manager.get_queryset()
        else:
            qs = self.model._default_manager.get_queryset()
        # taken from Django source
        # TODO: this should be handled by some parameter to the ChangeList.
        ordering = self.get_ordering(request)
        if ordering:
            qs = qs.order_by(*ordering)
        return qs

This also works for the filtered requests:

class LenCountPaginator(Paginator):

    @cached_property
    def count(self):
        if self.object_list.query.where or getattr(self.object_list.query, 'having', None):
            return len(self.object_list)
        return self.object_list.count()

comment:53 by michi88, 6 years ago

Again spoke to early, sorry for the spam, it somehow only works for one particular admin view I have. Trying to figure out why that is the case.

comment:54 by michi88, 6 years ago

Ok, it only works when the number of total rows in the database is > list_per_page. When they are less I see multiple calls to len(self.object_list). Which I don't mind as for those views performance is a non-issue.

Nonetheless, this is all highly unreliable as it relies on the queryset cache also being used for rendering the actual list of objects.

comment:55 by TapanGujjar, 5 years ago

Is it possible to have a separate table that keeps the count of all the rows in other tables? I am new here and don't know how much work.

comment:56 by Brett Higgins, 4 years ago

In addition to slow COUNT queries, I've just noticed that the listing page calls len() on a potentially slow queryset:

https://github.com/django/django/blob/9a3454f6046b9b7591fd03e21cf6da0b23c57689/django/contrib/admin/options.py#L1816

On my listing page, I have a couple nested lookups, so this becomes a complex query with four joins, and it can take tens of seconds to run for around 600k records. With too many more records than that, I'm running afoul of AWS Lambda timeouts (30s, just like the Heroku user mentioned above).

All this to say - any solution to this slow COUNT issue should also address (avoid) this slow query as well.

comment:57 by Ondřej Chmelař, 4 years ago

Cc: Ondřej Chmelař added

comment:58 by elonzh, 3 years ago

Cc: elonzh added

in reply to:  description comment:59 by Misha, 3 years ago

I couldn't find any good solution so I build myself a custom paginator.

My database is mysql using innodb, I made use of innodb to get a fast and precise enough count of rows. Please be aware, that this might not be precise enough for everyone's usecase.

My paginator shouldn't be understood as a patch, but it's an adequate workaround to speed up your django admin. I had no problem with that, and I can now easily browse my 5 million+ datasets.

from django.core.paginator import Paginator

from django.utils.functional import cached_property
from django.db import connection


class MysqlInnoDbPaginator(Paginator):
    @cached_property
    def count(self):
        with connection.cursor() as cursor:
            cursor.execute(
                "SELECT TABLE_ROWS FROM information_schema.tables WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s;",
                [connection.settings_dict['NAME'], self.object_list.model._meta.db_table]
            )

            row = cursor.fetchone()

            if row is not None:
                return row[0]
            else:
                return 0

Add this to your admin.ModelAdmin implementation. The show_full_result_count is very important, otherwise another count would happen.

class FoobarAdmin(admin.ModelAdmin):
    paginator = MysqlInnoDbPaginator
    show_full_result_count = False

Also be aware, if you have multiple database configurations, you should modify it so it would find the right database name

comment:60 by Ülgen Sarıkavak, 2 months ago

Cc: Ülgen Sarıkavak added

https://github.com/django/django/commit/17557d068c43bd61cdc6c18caf250ffa469414a1 says it's fixed this issue. Is there anything missing to close it?

comment:61 by Simon Charette, 2 months ago

From my understanding a COUNT still happens for pagination purposes even if the show_full_result_count option is disabled.

Note: See TracTickets for help on using tickets.
Back to Top