Opened 15 years ago
Closed 13 years ago
#11916 closed Uncategorized (fixed)
Extra params + aggregation creates incorrect SQL.
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.1 |
Severity: | Normal | Keywords: | |
Cc: | paluho@…, Gabriel Hurley, roberto.maurizzi@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
A query that has an annotation/aggregation (and thus a GROUP BY), extra (where) parameters and extra select parameters generates incorrect SQL because it puts the extra parameters at the end of the params list, although get_grouping() will generate parameters due to the extra select parameters. I have attached a diff for django.db.models.sql.query.py that should I believe addresses this problem.
Attachments (6)
Change History (36)
by , 15 years ago
Attachment: | django.db.models.sql.query.py added |
---|
comment:1 by , 15 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
by , 15 years ago
Attachment: | django.db.models.sql.query.patch added |
---|
changing grouping generation from values to keys of extra selects
comment:2 by , 15 years ago
Resolution: | duplicate |
---|---|
Status: | closed → reopened |
I think it's that this ticket is not duplicate of #11104, and it's related to bug in grouping columns list generation: statements of extra selects are taken to column list instead of keys. my patch (which I've attached):
--- django/db/models/sql/query.py (revision 11729) +++ django/db/models/sql/query.py (working copy) @@ -885,9 +885,9 @@ group_by = self.group_by or [] extra_selects = [] - for extra_select, extra_params in self.extra_select.itervalues(): - extra_selects.append(extra_select) - params.extend(extra_params) + for extra_select_key in self.extra_select.iterkeys(): + extra_selects.append(extra_select_key) + for col in group_by + self.related_select_cols + extra_selects: if isinstance(col, (list, tuple)): result.append('%s.%s' % (qn(col[0]), qn(col[1])))
comment:3 by , 15 years ago
Has patch: | set |
---|
Agreed, these bugs are unrelated. #11104 is about the order of parameters while this one is about using correct aliases instead of incorrectly repeating subqueries in the GROUP BY part.
comment:4 by , 15 years ago
milestone: | → 1.2 |
---|---|
Needs tests: | set |
Triage Stage: | Unreviewed → Accepted |
comment:5 by , 15 years ago
Cc: | added |
---|
comment:6 by , 15 years ago
I ran into this bug too, the whole sub-select statement of my extra() clause got copied into the group by portion of the query, when just the key name would be sufficient. Don't know if there's some case where you'd want that, but this is a +1 on paluh's patch - works for me!
qs.annotate(stock=Sum('variants__stock')).filter(Q(stock__gt=0)).extra( select={ 'op_option_group_id': ''' SELECT op.option_group_id FROM store_optionedproduct op, store_product WHERE op.product_ptr_id=store_product.id LIMIT 1 ''' } )
SELECT (SELECT op.option_group_id FROM store_optionedproduct op, store_product WHERE op.product_ptr_id=store_product.id) AS `op_option_group_id` FROM `store_product` LEFT OUTER JOIN `store_productvariant` ON (`store_product`.`id` = `store_productvariant`.`product_id`) WHERE (`store_product`.`status` IN (1)) GROUP BY `store_product`.`id`, --> SELECT op.option_group_id FROM store_optionedproduct op, store_product WHERE op.product_ptr_id=store_product.id HAVING SUM(`store_productvariant`.`stock`) > 0 ORDER BY NULL
comment:7 by , 15 years ago
FYI problem still occurs in 1.2-beta-1.
I have made a new ticket and attached the equivalent patch for 1.2 branch here: #13123
comment:8 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
going to take a stab at creating a test for this
comment:9 by , 15 years ago
Status: | new → assigned |
---|
please submit patches in a format like that given by svn diff, i.e., it should contain the full repository path to the patched file. For more info see:
http://docs.djangoproject.com/en/dev/internals/contributing/#submitting-patches
comment:10 by , 15 years ago
The code change to compiler.py in ticket_11916.diff introduces these failures into the doctests for aggregation_regress:
====================================================================== FAIL: Doctest: regressiontests.aggregation_regress.models.__test__.API_TESTS ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 2180, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for regressiontests.aggregation_regress.models.__test__.API_TESTS File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line unknown line number, in API_TESTS ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: sorted((k,v) for k,v in Book.objects.all().annotate(mean_auth_age=Avg('authors__age')).extra(select={'manufacture_co st' : 'price * .5'}).get(pk=2).__dict__.items() if k != '_state') Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[9]>", line 1, in <module> sorted((k,v) for k,v in Book.objects.all().annotate(mean_auth_age=Avg('authors__age')).extra(select={'manufacture_cost' : 'price * .5'}).get(pk=2).__dict__.items() if k != '_state') File "C:\u\kmt\django\trunk\django\db\models\query.py", line 334, in get num = len(clone) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 79, in __len__ self._result_cache = list(self.iterator()) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 267, in iterator for row in compiler.results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "MANUFACTURE_COST": invalid identifier ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: sorted((k,v) for k,v in Book.objects.all().extra(select={'manufacture_cost' : 'price * .5'}).annotate(mean_auth_age= Avg('authors__age')).get(pk=2).__dict__.items()if k != '_state') Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[10]>", line 1, in <module> sorted((k,v) for k,v in Book.objects.all().extra(select={'manufacture_cost' : 'price * .5'}).annotate(mean_auth_age=Avg('authors__age')).get(pk=2).__dict__.items()if k != '_state') File "C:\u\kmt\django\trunk\django\db\models\query.py", line 334, in get num = len(clone) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 79, in __len__ self._result_cache = list(self.iterator()) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 267, in iterator for row in compiler.results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "MANUFACTURE_COST": invalid identifier ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: sorted((k,v) for k,v in Book.objects.all().annotate(mean_auth_age=Avg('authors__age')).extra(select={'manufacture_cost' : 'price * .5'}).values().get(pk=2).items()if k != '_state') Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[11]>", line 1, in <module> sorted((k,v) for k,v in Book.objects.all().annotate(mean_auth_age=Avg('authors__age')).extra(select={'manufacture_cost' : 'price * .5'}).values().get(pk=2).items()if k != '_state') File "C:\u\kmt\django\trunk\django\db\models\query.py", line 334, in get num = len(clone) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 79, in __len__ self._result_cache = list(self.iterator()) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 824, in iterator for row in self.query.get_compiler(self.db).results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "MANUFACTURE_COST": invalid identifier ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: sorted((k,v) for k,v in Book.objects.all().values().annotate(mean_auth_age=Avg('authors__age')).extra(select={'manufacture_cost' : 'price * .5'}).get(pk=2).items()if k != '_state') Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[12]>", line 1, in <module> sorted((k,v) for k,v in Book.objects.all().values().annotate(mean_auth_age=Avg('authors__age')).extra(select={'m anufacture_cost' : 'price * .5'}).get(pk=2).items()if k != '_state') File "C:\u\kmt\django\trunk\django\db\models\query.py", line 334, in get num = len(clone) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 79, in __len__ self._result_cache = list(self.iterator()) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 824, in iterator for row in self.query.get_compiler(self.db).results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "MANUFACTURE_COST": invalid identifier ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: Book.objects.extra(select={'pub':'publisher_id'}).values('pub').annotate(Count('id')).order_by('pub') Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[45]>", line 1, in <module> Book.objects.extra(select={'pub':'publisher_id'}).values('pub').annotate(Count('id')).order_by('pub') File "C:\u\kmt\django\trunk\django\db\models\query.py", line 66, in __repr__ data = list(self[:REPR_OUTPUT_SIZE + 1]) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 81, in __len__ self._result_cache.extend(list(self._iter)) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 824, in iterator for row in self.query.get_compiler(self.db).results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "PUB": invalid identifier ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: Book.objects.extra(select={'pub':'publisher_id','foo':'pages'}).values('pub').annotate(Count('id')).order_by('pub') Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[46]>", line 1, in <module> Book.objects.extra(select={'pub':'publisher_id','foo':'pages'}).values('pub').annotate(Count('id')).order_by('pu b') File "C:\u\kmt\django\trunk\django\db\models\query.py", line 66, in __repr__ data = list(self[:REPR_OUTPUT_SIZE + 1]) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 81, in __len__ self._result_cache.extend(list(self._iter)) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 824, in iterator for row in self.query.get_compiler(self.db).results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "PUB": invalid identifier ---------------------------------------------------------------------- File "C:\u\kmt\django\trunk\tests\regressiontests\aggregation_regress\models.py", line ?, in regressiontests.aggregation _regress.models.__test__.API_TESTS Failed example: [int(x['sheets']) for x in qs] Exception raised: Traceback (most recent call last): File "C:\u\kmt\django\trunk\django\test\_doctest.py", line 1267, in __run compileflags, 1) in test.globs File "<doctest regressiontests.aggregation_regress.models.__test__.API_TESTS[61]>", line 1, in <module> [int(x['sheets']) for x in qs] File "C:\u\kmt\django\trunk\django\db\models\query.py", line 104, in _result_iter self._fill_cache() File "C:\u\kmt\django\trunk\django\db\models\query.py", line 755, in _fill_cache self._result_cache.append(self._iter.next()) File "C:\u\kmt\django\trunk\django\db\models\query.py", line 824, in iterator for row in self.query.get_compiler(self.db).results_iter(): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 673, in results_iter for rows in self.execute_sql(MULTI): File "C:\u\kmt\django\trunk\django\db\models\sql\compiler.py", line 728, in execute_sql cursor.execute(sql, params) File "C:\u\kmt\django\trunk\django\db\backends\oracle\base.py", line 485, in execute return self.cursor.execute(query, self._param_generator(params)) DatabaseError: ORA-00904: "SHEETS": invalid identifier ---------------------------------------------------------------------- Ran 2 tests in 3.953s FAILED (failures=1, errors=1)
comment:11 by , 15 years ago
Forgot to mention those failures seem to be Oracle-only. Something in the code change seems to have broken something basic with extra select parms on Oracle....
comment:12 by , 15 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:13 by , 15 years ago
Oracle doesn't allow column aliases from the select list in the "group by" clause. This is the reason that we repeat the subquery rather than use the alias. The alternative would be to define the column aliases in a subquery in the "from" clause, which adds complexity.
But I don't think that's necessary. It appears to me that the only thing wrong here is that in this case the "group by" subquery isn't getting parenthesized as it should.
comment:14 by , 15 years ago
Needs tests: | unset |
---|
@ikelly - according to your suggestion I've created a patch which adds only parenthesizes. It works under postgres and sqlite. Can anybody check if it fixes this issue under oracle? I've also moved @tobias test to doctest.
comment:15 by , 15 years ago
Why would you go through the effort of making a doctest from a unittest? Doctests are hard to maintain and even more difficult to debug. The goal is to move away from doc tests as much as possible...
My patch will have to be updated, however, as that tests.py file in aggregation_regress was recently added by a separate ticket.
comment:16 by , 15 years ago
The test case still fails in Oracle for a variety of reasons:
- The subquery uses LIMIT, which Oracle doesn't directly support.
- Oracle also doesn't like the ORDER BY in the subquery; since it's supposed to be a scalar subquery, and LIMIT is not allowed, the clause is viewed as extraneous.
- Most problematically, Oracle doesn't actually support subqueries in the GROUP BY clause at all. The reason for the duplication in the GROUP BY clause is to support other types of expressions. For more information on this, see #10290 and the django-developers discussion about it.
However, the important thing here is that this patch isn't causing any other test failures in Oracle, and because of 3. we were already expecting this case to fail, so I would be happy with committing it. The only thing I ask is that the test case be smoothed over for Oracle -- either by making the extra select something other than a subquery (which may not be possible for this bug) or simply by skipping it when the backend is oracle.
comment:17 by , 15 years ago
@tobias - I changed your tests to doctest because I missed aggregation_regress/tests.py and wanted to be consistent with existing testing style from models.py. In patch which I'm providing now your test code is in tests.py. Sorry for this mess.
@ikelly - test should skip if oracle is used. I've not provided test for scalar expressions in extra select because they seems to work fine without this fix.
comment:19 by , 15 years ago
Cc: | added |
---|
I just ran into this bug myself, and I can say that paluh's ticket_11916.3.diff
fixed it for me perfectly. I really hope this makes it into 1.2 since I'd hate to have to patch the Django core to make my current project run long-term!
comment:20 by , 15 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:21 by , 15 years ago
comment:23 by , 13 years ago
Easy pickings: | unset |
---|---|
Resolution: | fixed |
Severity: | → Normal |
Status: | closed → reopened |
Type: | → Uncategorized |
UI/UX: | unset |
I just tested this in 1.3, 1.3.1, 1.4 with PostgreSQL and on a 1.4 demo project with SQLITE
Every time I got a whole subquery expanded in my GROUP BY
Assuming this Book model (taken fromthe aggregation docs at en/dev/topics/db/aggregation/, the link is caught as spam :-) )
class Author(models.Model): name = models.CharField(max_length=100) age = models.IntegerField() friends = models.ManyToManyField('self', blank=True) class Publisher(models.Model): name = models.CharField(max_length=300) num_awards = models.IntegerField() class Book(models.Model): isbn = models.CharField(max_length=9) name = models.CharField(max_length=300) pages = models.IntegerField() price = models.DecimalField(max_digits=10, decimal_places=2) rating = models.FloatField() authors = models.ManyToManyField(Author) publisher = models.ForeignKey(Publisher) pubdate = models.DateField() class Store(models.Model): name = models.CharField(max_length=300) books = models.ManyToManyField(Book)
if I define
qs = Book.objects.extra(select={'example': 'SELECT COUNT(*) FROM BOOK WHERE pages < 100'}).filter(price__gte=20).annotate(totpages=Sum('pages'))
then I get
>>> print qs.query SELECT (SELECT COUNT(*) FROM BOOK WHERE pages < 100) AS "example", "bugtest_book"."id", "bugtest_book"."isbn", "bugtest_book"."name", "bugtest_book"."pages", "bugtest_book"."price", "bugtest_book"."rating", "bugtest_book"."publisher_id", "bugtest_book"."pubdate", SUM("bugtest_book"."pages") AS "totpages" FROM "bugtest_book" WHERE "bugtest_book"."price" >= 20 GROUP BY "bugtest_book"."id", "bugtest_book"."isbn", "bugtest_book"."name", "bugtest_book"."pages", "bugtest_book"."price", "bugtest_book"."rating", "bugtest_book"."publisher_id", "bugtest_book"."pubdate", (SELECT COUNT(*) FROM BOOK WHERE pages < 100) <----
The code originally in django/db/models/sql/query.py is now in django/db/models/sql/compiler.py @ line 555 and is the same that was used in query.py before the patch.
comment:24 by , 13 years ago
Is there some actual error here, to me it seems the fix was just adding parentheses in there, and backends other than Oracle are supposed to work with the subquery in the GROUP BY clause.
comment:25 by , 13 years ago
Cc: | added |
---|
comment:26 by , 13 years ago
We could add "supports_labels_in_group_by" feature flag or somesuch, and add the label instead of the SQL of the extra clauses to the group by for those backends supporting this feature.
Before taking that approach this ticket needs some evidence that the current behavior is broken, or that the current code results in severely degraded query plans. So, I am closing this needsinfo. Of course, anybody having the info required can reopen this ticket.
comment:27 by , 13 years ago
Resolution: | → needsinfo |
---|---|
Status: | reopened → closed |
comment:28 by , 13 years ago
You're right, I checked the attached patch and not the committed one (I'm sorry, first time here :D )
After some more tests, I discovered that my problem was different: I had an expression with a window function ( rowid() ) that got repeated in the GROUP BY clause, giving me an error, but when I tried to use its alias (typing the query) the result was the same, window functions aren't allowed in the GROUP BY by PostgreSQL. Using the alias, the parser seems to do the exact same thing as using the query (you get the same error). I'd really need to group by a field defined in a subquery but I don't think there's a way... I guess it's .raw time :)
A couple of quick tests trying to EXPLAIN ANALYZE a (different) query once using a subquery in GROUP BY and the othe using its alias gave the same result (in PostgreSQL).
comment:29 by , 13 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → reopened |
comment:30 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Restoring original close status: this problem is fixed.
This is a duplicate of #11104.