Opened 17 years ago
Closed 13 years ago
#6422 closed New feature (fixed)
Support for 'DISTINCT ON' queries with QuerySet.distinct()
Reported by: | Owned by: | Jeffrey Gelens | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | dceu2011 |
Cc: | cornbread, ramusus@…, tim.babych@…, hr.bjarni+django@…, jeffrey@…, James Pic, taylor.mitchell@…, diegobz, jonas-django@…, anssi.kaariainen@…, German M. Bravo | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The patch included modifies django/db/models/query.py so that the distinct method of the QuerySet object optionally takes variable number of field names and, if given, modifies the SQL statements generated so that 'DISTINCT ON (field,...)' is used. The incentive is to allow things like described in the following example:
class Example(models.Model): name = models.TextField() date = models. DateTimeField() other = models.XXX() ... qs = Example.objects.all().distinct('name').order_by('name','-date')
Now qs will return the latest entry for each distinct name. This cannot otherwise be achieved unless resorting to plain SQL.
It should be noted that - at least in standard SQL and Postgres - if name allows NULL values, the query will return all entries for which name is NULL not only the latest one as one might expect.
There will be a couple of '# CHECKME:' comments scattered over the patched file. I tried to mark all places where the change might have a side effect but it wasn't apparent to me if and how to fix it. I.e. there should be no side effect from the code outside the commented regions.
Attachments (19)
Change History (87)
by , 17 years ago
Attachment: | query.py.patch added |
---|
comment:1 by , 17 years ago
It should further be noted that - at least with Postgres - fields specified with distinct() have to appear first with the order_by() statement.
comment:2 by , 17 years ago
Has patch: | set |
---|
by , 17 years ago
Attachment: | query.py.2.patch added |
---|
New patch fixes count method and resolves quoting issues
comment:3 by , 17 years ago
I have uploaded a new patch which fixes the handling of multiple fields in a DISTINCT ON query as well as the count method for DISTINCT ON queries and resolves the qouting issues.
For DISTINCT ON queries the distinct method has now to be invoked as distinct(on_fields=<FIELD_LIST>), e.g. distinct(on_fields=('field1','field2')). For convenience a single field name can be passed as a plain string like distinct(on_fields='fieldname').
comment:4 by , 17 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Accepted |
Patches against trunk's query.py aren't particularly useful at the moment, since that code has been more or less entirely rewritten in the queryset-refactor branch. So, if you feel enthusiastic, you could try to update this to work with that branch, otherwise it will need to be rewritten after the branch is merged into trunk.
As to the feature itself, the "on_fields" parameter doesn't really fit with the rest of the API. Call the parameter "fields" for consistency. In fact, given that distinct() takes no parameters currently, it probably makes the most sense just to pass it field names: so it will take *args
. That's consistent with values(), valuelist() and select_related().
I notice the patch doesn't contain any tests, which will be necessary, since we need to be able to verify it's correct. Make sure you test any interaction with the .count()
method on querysets, since that required some special handling with normal .distinct()
calls. Documentation needed as well.
comment:5 by , 16 years ago
Cc: | added |
---|
comment:6 by , 15 years ago
Cc: | added; removed |
---|
I just ran into this issue. I am trying to return unique Session keys from a table. I tried PageRequest.objects.order_by('session').distinct().count() and I get 217 records and not 1 that I should be getting. Would be nice to just be able to do PageRequest.objects.all().distinct('session').count()
Looking forward to your new patch Manfred Wassmann. Thanks for the great work!
comment:7 by , 15 years ago
Cc: | added |
---|
comment:8 by , 15 years ago
Cc: | added |
---|
comment:9 by , 15 years ago
Cc: | added |
---|
follow-up: 11 comment:10 by , 15 years ago
Any updates on this? Why isn't it being merged with trunk? It's been 2 years!
comment:11 by , 15 years ago
Replying to mnbayazit:
Any updates on this? Why isn't it being merged with trunk? It's been 2 years!
This can not be merged to trunk as it does not contain tests and works only on certain backends (e.g. it certainly does not work with SQLite), so it needs a design decision.
However, for people who need this functionality right now, a working patch against 1.1.X is included.
comment:12 by , 15 years ago
The current patch also doesn't work with Oracle, which doesn't support the "DISTINCT ON" syntax and requires an analytic query for this purpose. I tried coding that up some time ago, but I found that the query architecture didn't really lend itself well to using analytic functions. That was before the sql compiler refactor, so I'll have to give it another look. It may be easier now.
comment:13 by , 14 years ago
Cc: | added |
---|
comment:14 by , 14 years ago
Cc: | added |
---|
comment:15 by , 14 years ago
Type: | → New feature |
---|
comment:16 by , 14 years ago
Severity: | → Normal |
---|
comment:17 by , 13 years ago
Easy pickings: | unset |
---|---|
UI/UX: | unset |
Working on this during the DjangoCon EU sprints. Improving the patch, writing tests and documentation.
comment:18 by , 13 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
Uploaded patch including docs and tests.
comment:19 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Technically, this introduces a backwards incompatibility on the true_or_false argument, but the argument is undocumented, untested, and based on a search of Google's code search, unused.
comment:20 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Glad to see this getting some attention, but I think we should hold off on RFC for now. The patch could still use further improvement in a few areas:
In particular, there is no mapping of the provided field names to actual fields and then to column names, which means that the values passed in must actually be the column names, not the field names. It also means there is no check that the specified fields are actually defined, prior to executing the query.
Also, at some point along the way we seem to have lost the special handling for queryset.distinct('somefield').count()
. I don't think this is strictly necessary as long as we take care to raise an exception in that case, but right now I see no reason to exclude it.
The patch hard-wires in self.query.model._meta.db_table for the columns in the distinct list, which prevents fields on related tables from being specified (e.g. queryset.distinct('othertable__somefield')
). Again, probably not necessary to commit, but this should be easily fixable once the field mapping is done.
comment:22 by , 13 years ago
We're almost there.. fixed the count case.. Now for the join case...
What would be the usecase for the queryset.distinct('othertable__somefield')
comment:23 by , 13 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:24 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:27 by , 13 years ago
Keywords: | dceu2011 added |
---|
comment:29 by , 13 years ago
Needs tests: | unset |
---|
comment:30 by , 13 years ago
Cc: | added |
---|
by , 13 years ago
Attachment: | distinct_on.2.diff added |
---|
Add (failing) test for M2M intermediate model
comment:31 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Tried this against our app and found some issues using it with a M2M model. Updated tests. I also seem to have lost the AUTHORS hunk in the process and Trac doesn't want to replace the attachment, apologies.
I would also add a +1 to adding join syntax if it can be done, I will try to think of a use case that's not as domain-specific as ours if needed.
comment:34 by , 13 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:35 by , 13 years ago
The patch was improved and this feature is working OK with M2M relations now. Added a test for this special case.
comment:36 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
There's an interplay between DISTINCT ON and ORDER BY that isn't clear. I was trying to nail it down before I replied but I got stuck, sorry about that.
To demonstrate, if you use the models from tests.regressiontests.queries and try to do: Tag.objects.distinct('parent')
you get an error because the default ordering for Tag
is by 'name'. I am not sure how to rectify this. Doing Tag.objects.order_by('parent').distinct('parent')
doesn't work either. If this isn't a bug, then the docs should be expanded.
comment:37 by , 13 years ago
Patch needs improvement: | unset |
---|
The 'parent' is an foreign key to self. When you print the query you'll see it doesn't add the parent_id to the ORDER BY clause:
>>> print Tag.objects.all().order_by('parent').query SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id" FROM "queries_tag" LEFT OUTER JOIN "queries_tag" T2 ON ("queries_tag"."parent_id" = T2."id") ORDER BY T2."name" ASC
The DISTINCT ON expression requires the leftmost ORDER BY to match the fields given in the DISTINCT ON clause. To make sure the parent_id will be in ORDER BY, use parent_ _pk
or
parent_ _id
:
>>> print Tag.objects.all().order_by('parent__pk').query SELECT "queries_tag"."id", "queries_tag"."name", "queries_tag"."parent_id", "queries_tag"."category_id" FROM "queries_tag" LEFT OUTER JOIN "queries_tag" T2 ON ("queries_tag"."parent_id" = T2."id") ORDER BY "queries_tag"."parent_id" ASC
Now if you update your ORM call a lil, you'll get the expected results:
>>> print Tag.objects.distinct('parent').order_by('parent__pk') [<Tag: t3>, <Tag: t5>, <Tag: t1>]
I added a note to the documentation about the 'order_by' requirement.
It's not a bug, so I hope the documentation is more clear now.
by , 13 years ago
Attachment: | distinct_on.diff added |
---|
comment:38 by , 13 years ago
Patch needs improvement: | set |
---|
Looks like something went awry with the last patch posted, the tests and some other things disappeared. Here's the combined patch, with a test for the self-referential Fk described above and an edit to the docs for readability.
by , 13 years ago
Attachment: | distinct_on.3.diff added |
---|
comment:39 by , 13 years ago
Patch needs improvement: | unset |
---|
comment:40 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I guess it's ready for checkin now.
comment:41 by , 13 years ago
Cc: | added |
---|
comment:42 by , 13 years ago
Cc: | added |
---|
comment:43 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Sadly the patch doesn't apply any more; bumping back to accepted.
I'm really sorry we dallied so long that the patch grew stale. I'll try to get it updated myself but if someone wants to save me a bit of time and update the patch that'd make me very happy.
comment:44 by , 13 years ago
I'll update it to match the current trunk in a couple of hours. It will be great if you could apply the patch soon after that, as it seems this part of the code is changing rapidly.
by , 13 years ago
Attachment: | distinct_on.4.diff added |
---|
comment:46 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
follow-up: 48 comment:47 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
I've come up with another case where it fails; added it to the tests in the patch.
Basically, the error lies in the assumption that all fields in the DISTINCT ON set belong to the base model of the QuerySet.
comment:48 by , 13 years ago
Replying to koniiiik:
I've come up with another case where it fails; added it to the tests in the patch.
Basically, the error lies in the assumption that all fields in the DISTINCT ON set belong to the base model of the QuerySet.
That patch doesn't apply:
(django)ligfiets$ patch -p1 <distinct_on.5.diff patching file AUTHORS patching file django/db/backends/__init__.py patching file django/db/backends/postgresql_psycopg2/base.py patching file django/db/backends/postgresql_psycopg2/operations.py patching file django/db/models/query.py patching file django/db/models/sql/compiler.py patching file django/db/models/sql/query.py patching file docs/ref/models/querysets.txt patching file tests/regressiontests/queries/models.py patching file tests/regressiontests/queries/tests.py patch: **** malformed patch at line 497: def setUp(self):
comment:49 by , 13 years ago
Figured out how to fix that kind of cases. Expect a new patch in a couple of hours.
by , 13 years ago
Attachment: | distinct_on.5.diff added |
---|
Fixed patch with failing testcase (sorry, shouldn't have modified the patch by hand hastily...)
by , 13 years ago
Attachment: | distinct_on.5.2.diff added |
---|
Added a failing test case for relationship traversals. (this one does apply)
comment:50 by , 13 years ago
Patch needs improvement: | unset |
---|
comment:51 by , 13 years ago
One more nit: in the backend prototype you forgot to update BaseDatabaseOperations.distinct to the new prototype without the db_table parameter. Other than that it looks quite promising.
by , 13 years ago
Attachment: | distinct_on.7.diff added |
---|
Updated distinct method in base class to match the new params
comment:52 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:53 by , 13 years ago
bombom:/storage/dists/Django-1.3.1# patch -p1 </storage/dists/distinct_on.7.diff patching file AUTHORS Hunk #1 succeeded at 524 (offset -20 lines). patching file django/db/backends/__init__.py Hunk #1 succeeded at 335 (offset -33 lines). Hunk #2 succeeded at 481 (offset -43 lines). patching file django/db/backends/postgresql_psycopg2/base.py Hunk #1 FAILED at 70. 1 out of 1 hunk FAILED -- saving rejects to file django/db/backends/postgresql_psycopg2/base.py.rej can't find file to patch at input line 97 Perhaps you used the wrong -p or --strip option? The text leading up to this was: -------------------------- |diff --git a/django/db/backends/postgresql_psycopg2/operations.py b/django/db/backends/postgresql_psycopg2/operations.py |--- a/django/db/backends/postgresql_psycopg2/operations.py |+++ b/django/db/backends/postgresql_psycopg2/operations.py -------------------------- File to patch: Skip this patch? [y] y Skipping patch. 1 out of 1 hunk ignored patching file django/db/models/query.py Hunk #1 succeeded at 651 (offset -42 lines). Hunk #2 succeeded at 1078 (offset -42 lines). patching file django/db/models/sql/compiler.py Hunk #1 succeeded at 67 (offset -4 lines). patching file django/db/models/sql/query.py Hunk #1 FAILED at 121. Hunk #2 FAILED at 260. Hunk #3 FAILED at 389. Hunk #4 FAILED at 1592. 4 out of 4 hunks FAILED -- saving rejects to file django/db/models/sql/query.py.rej patching file docs/ref/models/querysets.txt Hunk #1 FAILED at 340. Hunk #2 FAILED at 369. 2 out of 2 hunks FAILED -- saving rejects to file docs/ref/models/querysets.txt.rej patching file tests/regressiontests/queries/models.py Hunk #2 FAILED at 340. 1 out of 2 hunks FAILED -- saving rejects to file tests/regressiontests/queries/models.py.rej patching file tests/regressiontests/queries/tests.py Hunk #1 FAILED at 10. Hunk #2 FAILED at 1732. Hunk #3 FAILED at 1818. 3 out of 3 hunks FAILED -- saving rejects to file tests/regressiontests/queries/tests.py.rej
Why this happning? :)
comment:54 by , 13 years ago
Are you sure you're applying it on a recent SVN trunk? There it applies almost fine, except for a small whitespace error in the patch (probably caused by git) that makes one hunk in tests/regressiontests/queries/models.py fail.
Just out of pure curiosity I just tried it on the 1.3.X branch and there it looks suspiciously similar to your output.
comment:55 by , 13 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
I've update the patch with:
- The new tests failed for me because on my environment the
Tag.objects.order_by('parent__pk').distinct('parent')
call choose['<Tag: t2>', '<Tag: t4>', '<Tag: t1>']
instead of['<Tag: t3>', '<Tag: t45', '<Tag: t1>']
. Had to useorder_by('parent__pk', 'pk')
instead. - Fleshed out a bit documentation changes:
- I'd like the note about the need to use an
order_by()
call to be more clear. -- I'm clearing RFC because of this. - Added some examples of usage based on my interpretation of the new functionality.
- I'd like the note about the need to use an
- Moved the addition to the AUTHORS file to the right spot (lexicographical ordering by last name)
comment:56 by , 13 years ago
Patch needs improvement: | unset |
---|
I updated the documentation explaining the use of order_by. Hope it's more clear like this.
comment:57 by , 13 years ago
Some comments (I haven't actually tried the patch, just read it in diffview):
- IMHO add_distinct_fields needs a docstring. Other similar methods seem to have one, also.
- What happens if you specify .distinct(field_list) multiple times with different fields? If the first .distinct call generated joins, won't those joins be left loitering around without any users, but still having refcount > 0?
- Haven't tested this, but I suspect you can break this by adding fields to DISTINCT ON which aren't in the normal select list and then doing a group by. If you want correct results in this case, you will probably need to subquerify the DISTINCT ON query, then do a group by. NotImplementedError seems to be the sane choice, making this work correctly might be a bit hard and I haven't ever used distinct on + group by combination when writing SQL myself.
The true_or_false argument was removed. Is there any way to get rid of the distinct after this. Does there need to be one?
comment:58 by , 13 years ago
@akaariai
- Added a docstring to add_distinct_fields.
- If you specify distinct(field_list) multiple times it will overwrite the previous one. This is the intended behavior.
- The fields in DISTINCT ON are always in the select list. I guess your explanation can't happen. If it does could you explain further and give some examples?
- I discussed the true_or_false argument removal with Russell and Alex during the DjangoCon in Amsterdam this year. As it's a undocumented feature and we couldn't find any usage of it on the web, we decided to remove the argument. It's a weird argument anyway ;) .
comment:59 by , 13 years ago
Assume you have models:
class A1(models.Model): foo = models.TextField() class B1(models.Model): a = models.ForeignKey(A1)
And you do:
qs1 = B1.objects.distinct('a__id', 'a__foo') print qs1.query qs2 = qs1.distinct('pk') print qs2.query
Now you will have a join to A even though you don't have a reference to it. The queries generated:
SELECT DISTINCT ON ("obj_creation_speed_a1"."id", "obj_creation_speed_a1"."foo") "obj_creation_speed_b1"."id", "obj_creation_speed_b1"."a_id" FROM "obj_creation_speed_b1" INNER JOIN "obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" = "obj_creation_speed_a1"."id") SELECT DISTINCT ON ("obj_creation_speed_b1"."id") "obj_creation_speed_b1"."id", "obj_creation_speed_b1"."a_id" FROM "obj_creation_speed_b1" INNER JOIN "obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" = "obj_creation_speed_a1"."id")
Note that in the second query, there is a join into A even if no field of A is used in the query. The join should be removed. But I don't think this is a blocker issue. If you want to fix this, there are at least these two choices:
- Apply the distinct on in the compiler when the query is generated, order by is handle this way IIRC.
- Keep record of which aliases are referenced by the DISTINCT ON added fields, and subtract the refcount by one for those aliases if the distinct on is changed.
As said before, in my opinion this is not a must-fix in this ticket.
Now, if you continue from the above created qs1 and issue .annotate(Max('aid')), you will get this query:
qs3 = qs1.annotate(Max('a__id')) print qs3.query list(qs3)
The query generated is this:
SELECT DISTINCT ON ("obj_creation_speed_a1"."id", "obj_creation_speed_a1"."foo") "obj_creation_speed_b1"."id", "obj_creation_speed_b1"."a_id", MAX("obj_creation_speed_b1"."a_id") AS "a__id__max" FROM "obj_creation_speed_b1" INNER JOIN "obj_creation_speed_a1" ON ("obj_creation_speed_b1"."a_id" = "obj_creation_speed_a1"."id") GROUP BY "obj_creation_speed_b1"."id", "obj_creation_speed_b1"."a_id"
And the list(qs3) will generate this:
django.db.utils.DatabaseError: column "obj_creation_speed_a1.id" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT DISTINCT ON ("obj_creation_speed_a1"."id", "obj_creat... ^
You could "fix" this by appending the DISTINCT ON columns into the group by clause. Unfortunately (without going into details) the results aren't what the user would expect. If you want the results the user expects, you would need to do a subquery. If the distinct on is in the inner query or group by is in the inner query depends on the order in which the .annotate and .distinct are applied. A good fix for this would be to raise NotImplementedError("Can't handle queries with aggregates and distinct on clause"). I have written some SQL by hand, and can't remember ever using distinct on and group by in the same query. Maybe there is some use case, but I just don't see the need to block on this issue, either.
comment:60 by , 13 years ago
@ akaariai Thanks for your thorough comments :)
I fixed your first point about the join removal when calling distinct() multiple times.
It now applies the distinct in the compiler. This didn't fix the problem however, so I had to unref the fields manually, which is working great.
As you've said, the combination of a annotation and an distinct on is probably a use case which is never used. This is certainly not a blocker for this feature.
Anyways, this patch took a lot of time. I hope all the bugs were fixed :-D
by , 13 years ago
Attachment: | distinct_on.10.diff added |
---|
fixed when calling distinct() multiple times on the same query set
comment:61 by , 13 years ago
Cc: | added |
---|
Sorry, I still found one more problem, and some smaller issues:
- The distinct() in postgresql_psycopg2/operations.py doesn't take aliases into account. It assumes that the distinct is always for the main table alias. If you first do a filter on m2m relation, then a distinct on for the m2m relation you get:
# I added coworkers = models.ManyToManyField('self') to Staff model used in tests >>> print Staff.objects.distinct('coworkers__name').order_by('coworkers__name').query SELECT DISTINCT ON ("queries_staff"."name") "queries_staff"."id", "queries_staff"."name", "queries_staff"."organisation", T3."name" FROM "queries_staff" LEFT OUTER JOIN "queries_staff_coworkers" ON ("queries_staff"."id" = "queries_staff_coworkers"."from_staff_id") LEFT OUTER JOIN "queries_staff" T3 ON ("queries_staff_coworkers"."to_staff_id" = T3."id") ORDER BY T3."name" ASC
If you execute that query, it will error, because distinct on is for "queries_staff"."name", but order by is for T3."name"
- In the attached patch there is a different approach of join cleanup. It also fixes one other long-standing expected_failure test in queries. The failure is related to similar "lingering joins" done by ordering.
- The
obj.query.distinct_fields = []
in query.distinct() is not needed. The variable will be set anyways in sql.query.add_distinct_fields(). obj.query.distinct = True can also be set in add_distinct_fields() - Should the joins be generated as OUTER or INNER joins? Not sure about this. I guess either way will be fine. The attached patch generates OUTER JOINS (because it uses similar code than the order by join creation).
- Prevent different distinct_fields lists when combining querysets.
I attached a patch which fixes these issues. It isn't the cleanest one (non-DRY, code copy from order_by). But it can be used as a basis for a cleaner patch if needed.
I did also check out how to support aggregates with distinct_fields. However I had a hard time figuring out a realistic test case, so I removed that part. So, jgelens is absolutely correct here: better to first find an use case :)
All in all, the only major issue is the correct alias usage in compiler.py. The rest is just nitpicking. In my opinion the alias usage needs to be fixed before commit, but I may be over-complicating things again... Other than that I think this feature is now very solid. I might be fine to just let the distinct() generated joins linger in there after all. That is what order_by has done for ages.
The attached patch passes all tests, with one long-standing expected failure now gone. As said the patch is just to show one possible approach to the alias issue. I hope jgelens will check it out and see what to do to the above mentioned issues (if anything).
by , 13 years ago
Attachment: | distinct_on.11.diff added |
---|
Possible approach for using correct aliases
comment:62 by , 13 years ago
I think your additions to the patch are great. Everything works OK. The code copy from order_by is minor and imo unnecessary to split out and generalize for both functions.
Let's mark it as RFC as 1.4alpha is nearing completion. Thanks!
comment:63 by , 13 years ago
I hope to do a final cleanup tomorrow. I still think it is a good idea to throw NotImplementedError
for aggregate & annotate (this is actually quite simple: in each method check if self.distinct_fields: raise NotImplementedError
). The reason is that annotate + DISTINCT ON actually somewhat works, sometimes. But it doesn't work in any meaningful way. So to avoid backwards compatibility issues if annotate + distinct_on combination is going to ever be supported, it is good that there is no behaviour Django must support. The exceptions are simple enough to implement, so IMHO this is a good idea still. Sorry for coming back to this again and again. :)
In addition I will check if there is some easy way to avoid the code duplication between get_ordering and get_distinct join generation (in compiler.py). It is a good idea, as actually both of them _must_ generate same joins, otherwise you will get errors on ordering not matching prefix of DISTINCT ON.
comment:64 by , 13 years ago
Cc: | added |
---|
comment:65 by , 13 years ago
Final cleanup done & attached. All tests should pass on sqlite, and the changed tests do pass on PostgreSQL.
Changes from the previous version:
- DRY in complier.py join setup for ordering and distinct
NotImplementedError
for .distinct(fields) + annotate/aggregate.- Moved the new tests to modeltests/distinct_on_fields
There are a couple of test-fixes combined in the patch. Create 2500 objects -> bulk create 2500 objects, one expected failure is no more, and one test was testing how executing a query alters the query instance (it should not alter it). It now tests the created SQL instead.
As far as I understand this is now ready for checkin. The only minor thing missing is a check for order_by prefix matching distinct_fields prefix. The DB will complain if that is broken, so that really isn't a big deal.
Please review & mark ready for checkin if no problems found. Please check extra-carefully added comments. My English isn't that good...
by , 13 years ago
Attachment: | distinct_on.12.diff added |
---|
comment:66 by , 13 years ago
Thanks! Looks great, I ran the tests, you forgot to add a init.py to the modeltests/distinct_on_fields module. Also some Tag object creation was accidentally removed. I added this again.
I added a new patch with these changes.
Now it looks we've finally come to a version which is ready for checkin! :-D
by , 13 years ago
Attachment: | distinct_on.13.diff added |
---|
comment:67 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
by , 13 years ago
Attachment: | distinct_on.14.diff added |
---|
Added Anssi to AUTHORS, release notes blurb, removed unrelated tests change, renamed DatabaseOperations.dictinct() to distinct_sql()
Patch for django/db/models/query.py