Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#2479 closed enhancement (duplicate)

[patch] add sum() method to query set ( SELECT SUM( fieldname .... )

Reported by: Simon Greenhill, dev@… Owned by: Adrian Holovaty
Component: Database layer (models, ORM) Version: dev
Severity: minor Keywords:
Cc: dev@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Added a sum() method to enable SELECT SUM( ) queries e.g.:

a = Table.objects.all().sum( 'fieldname' )

implemented in the same way as the count() method.

Note: tested on MySQL and SQLite backends.

Attachments (4)

query_sum.diff (1017 bytes ) - added by dev@… 18 years ago.
models.py (1.1 KB ) - added by dev@… 18 years ago.
unit tests
models.2.py (1.1 KB ) - added by dev@… 18 years ago.
fixed modeltests
aggregate.diff (984 bytes ) - added by dev@… 18 years ago.
quick'n'dirty aggregate function

Download all attachments as: .zip

Change History (11)

by dev@…, 18 years ago

Attachment: query_sum.diff added

comment:1 by dev@…, 18 years ago

Also:

1) I left the output as-is. MySQLdb appears to return a float and SQLite an int. Not sure whether it would be better to type cast to int.

2) This makes no attempt to ensure that SUM is run on a sensible fieldtype - both MySQL/SQLite will allow a SUM on pretty much anything, but just return 0 if it doesn't make sense ( char/varchar/blob ).

3) first time writing unittests - please review brutally :)

by dev@…, 18 years ago

Attachment: models.py added

unit tests

comment:2 by dev@…, 18 years ago

Just thinking about this - rather than proliferate methods for each database function ( .count, .sum, .avg, .abs, .log, .sqrt etc etc ), it'd be better to have a method that can call each function, e.g.:

	f = Fudge.objects.all().function( function_name, fieldname )

	# e.g.:
	
	f = Fudge.objects.all().function( 'sqrt', 'some_field' )

Might have to store a map of database types to function names to hide any (can't think of any offhand - but I'm sure there are some) naming inconsistencies between the databases and make sure that the correct function gets called e.g.

map = {
	'sqlite3' = {
		'sum':'SUM(%s)', 'average':'AVG(%s)', 
	}
	'mysql' = {
		'sum':'SUM(%s)', 'average':'AVG(%s)', 
	}
	'postgres' = {
		'sum':'SUM(%s)', 'average':'AVG(%s)', 
	}
}

by dev@…, 18 years ago

Attachment: models.2.py added

fixed modeltests

comment:3 by dev@…, 18 years ago

Bit more playing - the original modeltests will fail if the database backend returns floats not ints. The second modeltest file typecasts to int, but this is probably bad practice, and depends on whether sum() should to the type managing or not.

comment:4 by Malcolm Tredinnick, 18 years ago

Before you go too far here: firstly, having various aggregate functions is indeed something we want to incorporate. The API may need a bit of thought, though (for example, should it only work like count(), or should some kind of sub-filtering be possible?).

Secondly, query.py is on the operating table at the moment being refactored, so this will need to wait a few more days at least and then be rebased against the current code (although, admittedly, the change isn't too intrusive, so that shouldn't be too hard). Thinking about what it will look like if we also add avg(), max(), ..., etc, though, it may be that one method of this length per aggregate starts to get crazy. We need to think about that.

Thirdly, stop with the casting to int! If a field contains floats, your version of sum() is going to get the answer wrong.

Finally, I would prefer we do some kind of sanity checking on the target field. The users of models are not required to have deep understanding of SQL and databases (in theory), so we should provide sensible error messages when we can. And sum(name) is rarely going to be the right thing.

comment:5 by dev@…, 18 years ago

Ok - I'll let you guys give query.py a thorough workover, and have a think about the best way to implement more of these functions.

Re: filtering - .filter() still works, so you can still get a SELECT SUM( something ) FROM table WHERE x=a, y=b etc. If you mean something like SUM( DISTINCT fieldname ), that doesn't work - but that'll only work on MySQL >5 anyway.

Certainly things like COUNT() are so common that they should have their own method, but I agree that having one method per database function is bad DRY. I've quickly had a go at something like this (diff attached) which adds .aggregate to the methods e.g. Something.objects.all().aggregate( 'function_name', 'field_name' ). You could also match the filter syntax, e.g. fieldname_ _function -> function(fieldname) if preferred.

As for sanity checks - you could definitely check for int/double ( & their Django Model equivalents ), and raise something like a FieldIsWrongType exception if it's one of the char/varchar or datetime fields. However, do you want to stop these being executed on non-numeric fields right across the board? or is this going to be something to specify like SUM works on Integer/Float/Decimal/etc Fields, whilst OTHERFUNCTION only works on, say, IntegerFields.

( Aside - are there any database aggregate functions that are useful on non-numeric fields? )

re: type casting - point taken! :)

by dev@…, 18 years ago

Attachment: aggregate.diff added

quick'n'dirty aggregate function

comment:6 by dev@…, 18 years ago

Resolution: duplicate
Status: newclosed

Bah, duplicate of #1435 which has a better patch.

comment:7 by Henrik Vendelbo <info@…>, 18 years ago

Someone suggested in the doc comments calling the method aggregate rather than function.

Also you might consider having a parameter specifying what to return if the queryset is empty. I believe SQL Server returns NULL rather than 0. And I suppose it would be nice to control that case anyway.

Note: See TracTickets for help on using tickets.
Back to Top