Opened 9 years ago
Closed 9 years ago
#25470 closed Cleanup/optimization (fixed)
Serious performance impact (500x slower!) when querying distinct model dates objects
Reported by: | Qian Xu | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.8 |
Severity: | Normal | Keywords: | |
Cc: | felisiak.mariusz@…, Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I'm using django 1.8.4 writing a webapp. The backend uses MySQL 5.6 (MyISAM). Recently the number of table records reaches 1 million, it will take 1-1.5 seconds to query all distinct record dates. But using MySQL client, it takes less than 0.001 second.
Django Code
class Model1(models.Model): date = models.DateField(db_index=True) # benchmark code db_dates = Model1.objects.dates("date", kind="day")
I dumped the django queries.
{u'time': u'1.989', u'sql': u"SELECT DISTINCT CAST(DATE_FORMAT(`app1_model1`.`date`, '%Y-%m-%d 00:00:00') AS DATETIME) AS `datefield` FROM `app1_model1` WHERE `app1_model1`.`date` IS NOT NULL ORDER BY `datefield` ASC"}
As you can see, the query did possibly an unnecessary type cast. The performance impact is scaled to the amount of records. The table field is exactly DATE
type. I don't know, if there is some reason for the type casting.
Currently, I use another query to get these dates, which takes 0.04 second.
db_dates = Model1.objects.values_list('date', flat=True).distinct()
I posted a question at StackOverflow.
Change History (19)
comment:1 by , 9 years ago
Description: | modified (diff) |
---|
comment:2 by , 9 years ago
Description: | modified (diff) |
---|
comment:3 by , 9 years ago
Description: | modified (diff) |
---|
comment:4 by , 9 years ago
Description: | modified (diff) |
---|
comment:5 by , 9 years ago
Description: | modified (diff) |
---|
comment:6 by , 9 years ago
Description: | modified (diff) |
---|
comment:7 by , 9 years ago
Description: | modified (diff) |
---|---|
Summary: | Django cast DATE field to DATETIME unexpectedly → Serious performance impact (500x slower!) when querying distinct model dates objects |
comment:8 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:9 by , 9 years ago
Thanks for agreeing this is a serious performance impact. It is still unclear, why DATE is first converted to DATETIME. The Django team must know, whether the code is written by design?
follow-up: 12 comment:10 by , 9 years ago
Did you try modifying the code and seeing if any tests fail? That's often helpful.
comment:11 by , 9 years ago
Cc: | added |
---|
comment:12 by , 9 years ago
Replying to timgraham:
Did you try modifying the code and seeing if any tests fail? That's often helpful.
I dont have the time to do any further investigation. I don't realize that when I report issue to django community, I'd provide patch and do necessary tests. It is already 2 weeks. If I dont make a patch, no body cares? This is about a 500x slower issue! Are you guys working in this way? Really sorry about that. Bye.
comment:13 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-up: 15 comment:14 by , 9 years ago
Has patch: | set |
---|
I made PR to omit unnecessary type cast.
In my opinion, the second step, is to change default order
in dates
, datetimes
methods
-
django/db/models/query.py
a b class QuerySet(object): 726 726 clone._iterable_class = FlatValuesListIterable if flat else ValuesListIterable 727 727 return clone 728 728 729 def dates(self, field_name, kind, order= 'ASC'):729 def dates(self, field_name, kind, order=None): 730 730 """
or give a way to disable it
-
django/db/models/query.py
a b class QuerySet(object): 733 733 """ 734 734 assert kind in ("year", "month", "day"), \ 735 735 "'kind' must be one of 'year', 'month' or 'day'." 736 assert order i n ('ASC', 'DESC'), \737 "'order' must be either 'ASC' or 'DESC'."738 returnself.annotate(736 assert order is None or order in ('ASC', 'DESC'), \ 737 "'order' must be either 'ASC', 'DESC' or None." 738 qs = self.annotate( 739 739 datefield=Date(field_name, kind), 740 740 plain_field=F(field_name) 741 741 ).values_list( 742 742 'datefield', flat=True 743 ).distinct().filter(plain_field__isnull=False).order_by(('-' if order == 'DESC' else '') + 'datefield') 743 ).distinct().filter(plain_field__isnull=False) 744 if order is not None: 745 return qs.order_by(('-' if order == 'DESC' else '') + 'datefield') 746 else: 747 return qs
comment:16 by , 9 years ago
Cc: | added |
---|
The default sort order of ASC
seems to be documented and simply making results unordered would be backward incompatible.
I guess you should open a new ticket for a feature request allowing the use of order=None
to get unordered results.
comment:17 by , 9 years ago
Needs tests: | set |
---|
comment:18 by , 9 years ago
Needs tests: | unset |
---|
If you'd like to investigate and propose a fix, that would be great!