#11088 closed (invalid)
Aggregation problem, table JOINed twice
Reported by: | Owned by: | ||
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.1-beta |
Severity: | Keywords: | aggregation, join | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I don't know if I'm doing something wrong but I've tried to compile all the information needed to track this one.
Models
video/models.py
class Video(models.Model): name = models.TextField() dir = models.TextField() videoclass = models.ForeignKey('Videoclass') status = models.CharField(max_length=24) date = models.DateTimeField() duration = models.FloatField() width = models.IntegerField() height = models.IntegerField() displayname = models.TextField() published = models.IntegerField() views = models.IntegerField() ratesum = models.IntegerField() ratenum = models.IntegerField()
statistics/models.py
class VisitorAction(models.Model): name = models.CharField(max_length = 200) def __unicode__(self): return self.name class VisitorLog(models.Model): visitor = models.ForeignKey("Visitor") video = models.ForeignKey(Video) action = models.ForeignKey(VisitorAction) seek_video = models.FloatField() event_time = models.DateTimeField(default = datetime.datetime.now) domain = models.CharField(max_length = 128)
Datas
statistics_visitorlog contains 100k+ entries
video_video contains 400 entries
statistics_visitoraction contains 6 entries ('play', 'seek' ... etc)
The QS
>>> from video.models import * >>> from django.db.models import Count >>> Video.objects.annotate(play_log = Count('visitorlog')).filter(visitorlog__action__name = 'play').order_by('-play_log')[0:10]
[Kill the mysqld because it takes forever and have a nice backtrace]
>>> from django.db import connection >>> connection.queries[-1] {'time': '7.180', 'sql': u'SELECT `video_video`.`id`, `video_video`.`name`, `video_video`.`dir`, `video_video`.`videoclass_id`, `video_video`.`status`, `video_video`.`date`, `video_video`.`duration`, `video_video`.`width`, `video_video`.`height`, `video_video`.`displayname`, `video_video`.`published`, `video_video`.`views`, `video_video`.`ratesum`, `video_video`.`ratenum`, COUNT(`statistics_visitorlog`.`id`) AS `play_log` FROM `video_video` LEFT OUTER JOIN `statistics_visitorlog` ON (`video_video`.`id` = `statistics_visitorlog`.`video_id`) INNER JOIN `statistics_visitorlog` T4 ON (`video_video`.`id` = T4.`video_id`) INNER JOIN `statistics_visitoraction` ON (T4.`action_id` = `statistics_visitoraction`.`id`) WHERE `statistics_visitoraction`.`name` = play GROUP BY `video_video`.`id` ORDER BY play_log DESC LIMIT 10'}
The sql request
SELECT `video_video`.`id` , `video_video`.`name` , `video_video`.`dir` , `video_video`.`videoclass_id`, `video_video`.`status` , `video_video`.`date` , `video_video`.`duration` , `video_video`.`width` , `video_video`.`height` , `video_video`.`displayname` , `video_video`.`published` , `video_video`.`views` , `video_video`.`ratesum` , `video_video`.`ratenum` , COUNT(`statistics_visitorlog`.`id`) AS `play_log` FROM `video_video` LEFT OUTER JOIN `statistics_visitorlog` ON ( `video_video`.`id` = `statistics_visitorlog`.`video_id` ) INNER JOIN `statistics_visitorlog` T4 ON ( `video_video`.`id` = T4.`video_id` ) INNER JOIN `statistics_visitoraction` ON ( T4.`action_id` = `statistics_visitoraction`.`id` ) WHERE `statistics_visitoraction`.`name` = 'play' GROUP BY `video_video`.`id`
It takes forever and burns my cpu. However, if I remove the LEFT OUTER JOIN on statistics_visitorlog (which seems to be joined twice), and fix the select to call the proper JOINed table:
SELECT `video_video`.`id` , `video_video`.`name` , `video_video`.`dir` , `video_video`.`videoclass_id`, `video_video`.`status` , `video_video`.`date` , `video_video`.`duration` , `video_video`.`width` , `video_video`.`height` , `video_video`.`displayname` , `video_video`.`published` , `video_video`.`views` , `video_video`.`ratesum` , `video_video`.`ratenum` , COUNT(T4.`id`) AS `play_log` FROM `video_video` INNER JOIN `statistics_visitorlog` T4 ON ( `video_video`.`id` = T4.`video_id` ) INNER JOIN `statistics_visitoraction` ON ( T4.`action_id` = `statistics_visitoraction`.`id` ) WHERE `statistics_visitoraction`.`name` = 'play' GROUP BY `video_video`.`id`
this returns in 0.6 sec with all the datas needed to fill the objects.
Change History (4)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Mir's analysis appears to be correct; this issue is discussed in the documentation.
comment:4 by , 12 years ago
Component: | ORM aggregation → Database layer (models, ORM) |
---|
Your query means to first annotate, counting the visitorlog entries for all actions, and then afterwards apply the filter to the annotated query. I think you have to swap the annotate and filter clauses in your query to get what you intend:
I'm not particularily experienced with annotations, so I don't want to close your ticket without your confirmation that you see it the same way.