Opened 9 years ago

Last modified 3 weeks ago

#25705 assigned Cleanup/optimization

Parameters are not adapted or quoted in Query.__str__

Reported by: Dmitry Dygalo Owned by: Alex
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Petr Přikryl, Ülgen Sarıkavak, Simon Charette, Mariusz Felisiak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Now it is just string interpolation of the SQL template with parameters and in most cases produces invalid queries for the following reasons:

  • No quoting
  • No adaptation. So, some python objects will be used as is, not like their SQL equivalents

Yes, there are situations, when output of Query.__str__ is equal to actual query. But for debugging reasons, it will be better to see real query here. Also it is logical and expected behavior of this method - to show actual query.

Attachments (1)

ticket25705.patch (4.8 KB ) - added by Dmitry Dygalo 9 years ago.
Initial draft

Download all attachments as: .zip

Change History (24)

comment:1 by Dmitry Dygalo, 9 years ago

Has patch: set
Needs tests: set
Owner: changed from nobody to Dmitry Dygalo
Patch needs improvement: set
Status: newassigned

by Dmitry Dygalo, 9 years ago

Attachment: ticket25705.patch added

Initial draft

comment:3 by Tim Graham, 9 years ago

This looks like a duplicate of #17741 and #25092 which are "wontfix", however, I'm not sure I see any harm in your proposal besides the question of whether or not it can be implemented on other database backends.

comment:4 by Tim Graham, 9 years ago

Triage Stage: UnreviewedAccepted

in reply to:  description comment:5 by Bernd Wechner, 6 years ago

Replying to Dmitry Dygalo:

Now it is just string interpolation of the SQL template with parameters and in most cases produces invalid queries for the following reasons:

  • No quoting
  • No adaptation. So, some python objects will be used as is, not like their SQL equivalents

Yes, there are situations, when output of Query.__str__ is equal to actual query. But for debugging reasons, it will be better to see real query here. Also it is logical and expected behavior of this method - to show actual query.

Fascinating. I inadvertantly filed a duplicate of this it seems:

https://code.djangoproject.com/ticket/30132#ticket

and I would dearly love this fixed. In essence Query.__str__ should return exactly what is logged as per here:

https://docs.djangoproject.com/en/2.1/faq/models/#how-can-i-see-the-raw-sql-queries-django-is-running

so we can reliably see and extract teh SQL of a query in a production environment in which DEBUG is not enabled!

I would gladly fix this and PR it, if I had the skills and I am close to that, as I am coding and debugging python, but single stepping into Query.__str__ didn't find me a quick easy answer and I bailed for now. So if you have any tips as to where the code is that Query.__str__ uses to generate SQL and where the code is that logs SQL to connection.queries I canbegin to look at it consider how this might be fixed.

It seems to me an experience Django coder could fix this in minutes and that there should be a regression test for this kind of code:

qs=model.objects.somequeryset
sql=str(qs.query)
raw_qs=model.objects.raw(sql)

I have a test bed in which I confirmed this failure here:

https://github.com/bernd-wechner/DjangoTutorial/blob/238787c83ef8515aeeb405577980e71ff35664e8/Library/views.py

Let me know how I might help get this fixed sooner, given ti was reported 3 years ago and is still not fixed!

comment:6 by GitHub <noreply@…>, 5 years ago

In 845042b3:

Refs #25705 -- Fixed invalid SQL generated by SQLFuncMixin.as_sql() in custom_lookups tests.

Generated SQL was invalid because parameters are quoted by a driver.

comment:7 by Petr Přikryl, 4 years ago

Cc: Petr Přikryl added

comment:8 by Mariusz Felisiak, 3 years ago

IMO we should close all related tickets as duplicates:

  • #24803 was marked as a duplicate (empty strings in parameters),
  • #24991 was marked as a duplicate (range types in parameters).

comment:9 by Mariusz Felisiak, 3 years ago

Owner: Dmitry Dygalo removed
Status: assignednew

comment:10 by Ülgen Sarıkavak, 8 months ago

Cc: Ülgen Sarıkavak added

comment:11 by Alex, 5 months ago

Owner: set to Alex
Status: newassigned

comment:12 by Alex, 4 months ago

I've done some investigation.

The main issue comes to the Python DB API doesn't have a way to do this. The only way to see the query with the parameters binded correctly is after executing it.

As Mariusz commented here, only Mysql/MariaDB and Postgres have a way to do it via a mogrify function which is their own extension to the API.
Django is already using the postgres mogrify in its own compose_sql function in the search backend, schema queries and ensuring the role of the connection

In Oracle and SQLite, none of the extensions to the API they add allows do this.

I see two approaches:

  • We fix this issue for the first 3 backends and leave it as it is in Oracle and SQLite.
  • Use the mogrify function in the first three backends, and manually quote the parameters in the other two. Something similar was already attempted in this PR and it was rejected. The amount of effort needed to implement and maintain this, example on how cPython does it for SQLite, would probably be too much since this seems to be the only use case.

comment:13 by Simon Charette, 4 months ago

My concerns with fixing this issue related to comments such as comment:5

It seems to me an experience Django coder could fix this in minutes and that there should be a regression test for this kind of code:

qs=model.objects.somequeryset
sql=str(qs.query)
raw_qs=model.objects.raw(sql)

We absolutely don't want to support this pattern in a context where we can't guarantee that the proper quoting is performed on all supported backends as that might result in SQL injection problems. In this sense I think that it's a good thing that sql.Query.__str__ doesn't attempt to perform the proper quoting to make it clear it should not be used for this purpose.

I'd much rather see us document sql.Query.sql_with_params(using: str = DEFAULT_DB_ALIAS) which could be used as

sql, params = qs.query.sql_with_params()
model.objects.raw(sql, params)

over spending time trying to dangerously emulate parameters quoting.

Last edited 4 months ago by Simon Charette (previous) (diff)

comment:14 by Carlton Gibson, 4 months ago

@Simon do you think we should close this ticket as wontfix.

I can recall 2 or 3 (or 4?) occasions since say DjangoCon US 2018 where a (new) contributor has tried to pick this up, only to hit the same wall. The trail of breadcrumbs gets slightly clearer each time, but, is there a realistic fix for this issue, and if not can we not say so?

in reply to:  13 comment:15 by Alex, 4 months ago

Replying to Simon Charette:

My concerns with fixing this issue related to comments such as comment:5

It seems to me an experience Django coder could fix this in minutes and that there should be a regression test for this kind of code:

qs=model.objects.somequeryset
sql=str(qs.query)
raw_qs=model.objects.raw(sql)

We absolutely don't want to support this pattern in a context where we can't guarantee that the proper quoting is performed on all supported backends as that might result in SQL injection problems. In this sense I think that it's a good thing that sql.Query.__str__ doesn't attempt to perform the proper quoting to make it clear it should not be used for this purpose.

I'm not sure I understand your point. I think the original commenter was suggesting that just as a simple (and probably not ideal) way to do the test. I wouldn't consider someone doing str(qs.query) to then pass in to raw() (which the docs already warn of SQL injections) a real use case that anyone would do.
Personally, the use case I've had with this issue is printing the query to then try to format it and maybe execute it in a sql editor connected that my database while testing stuff in local. For that case your suggestion of sql_with_params wouldn't cover it. I'm not against having that method either, but I don't see it as a replacement of a correctstr(qs.query).

Also my recommended choice was not trying to emulate parameter quoting, but to use the database library mogrify method in 3 of the 5 supported backends.

Last edited 4 months ago by Alex (previous) (diff)

comment:16 by Simon Charette, 4 months ago

Cc: Simon Charette added

@Carlton I think the request for this feature is a symptom of a different problem so maybe we could re-purpose this ticket instead?


@Alex

I wouldn't consider someone doing str(qs.query) to then pass in to raw() (which the docs already warn of SQL injections) a real use case that anyone would do.

I don't agree on that point. From helping users through various forums in most cases the reason why they were bringing up the issue of improper quoting was when they tried executing the query as is. The duplicates of this ticket seem to support this position (see #17741 and #25092 SO) and I don't think would be hard to find other examples.

It might not be your personal use case as you are well versed into why this a bad idea but I think it would be irresponsible to our user base from an API design (newcomers footgun) and maintenance burden perspective (think of the security and bug reports about mogrify not producing the exact same SQL as backend escaping normally does).

Also my recommended choice was not trying to emulate parameter quoting, but to use the database library mogrify method in 3 of the 5 supported backends.

You second alternative mentioned

Use the mogrify function in the first three backends, and manually quote the parameters in the other two.

I think that would qualify as a form of emulation or best-effort wouldn't it? What guarantees do we have that the quoting is appropriate for all datatypes that these backends support as parameters? Are we feeling confident, knowing that users will attempt to pipe str(qs.query) into raw and that Query.__str__ always use DEFAULT_DB_ALIAS for compilation #25947 (even if the proper backend can often only be determined as execution time) that we are exposing the right tools to users and we can commit to them being safe for the years to come?

IMO the usage of raw(str(qs.query)), and the main motive for this ticket, is a symptom of a lack of documented way for safely building and executing the SQL and parameters from a QuerySet object which makes me believe the focus should be on documenting queryset.qs.sql_with_params() first instead.

As for the printing and copy-pasting into a shell use case we've reached an agreement that the only way to see the query with the parameters binded correctly is after executing it. Knowing that, and the fact Query.__str__ is still pretty legible even without the exact mogrifying that only leaves the copy-pasting into a shell use case. Should we make it easier to retrieve the exact SQL associated with a particular queryset instead? Something like QuerySet.executed_query: str | None that is only present on evaluated querysets.

These appears like solutions that prevent misuse of Query.__str__ and avoid maintaining correct and safe x-backend mogrifying solutions?

Last edited 4 months ago by Simon Charette (previous) (diff)

in reply to:  16 comment:17 by Alex, 4 months ago

Replying to Simon Charette:

I wouldn't consider someone doing str(qs.query) to then pass in to raw() (which the docs already warn of SQL injections) a real use case that anyone would do.

I don't agree on that point. From helping users through various forums in most cases the reason why they were bringing up the issue of improper quoting was when they tried executing the query as is. The duplicates of this ticket seem to support this position (see #17741 and #25092 SO) and I don't think would be hard to find other examples.

If the worry is raw(str(qs.query)), then not having the quoting is a security issue.
In fact I can actually do an SQL injection abusing this lack of quoting.

qs = User.objects.filter(first_name='"test")--', is_staff=True).only('id')
>>> <QuerySet []>
list(User.objects.raw(str(qs.query)))
>>> [<User: test>]
list(User.objects.raw(str(qs.query)))
>>> 'SELECT "auth_user"."id" FROM "auth_user" WHERE ("auth_user"."first_name" = "test")-- AND "auth_user"."is_staff")'

With proper quoting, as the one done when executing normally, the query is SELECT "auth_user"."id" FROM "auth_user" WHERE ("auth_user"."first_name" = \'"test")--\' AND "auth_user"."is_staff")
So if raw(str(qs.query)) is a risk, then quoting the parameters would fix it.

Also my recommended choice was not trying to emulate parameter quoting, but to use the database library mogrify method in 3 of the 5 supported backends.

You second alternative mentioned

Use the mogrify function in the first three backends, and manually quote the parameters in the other two.

I think that would qualify as a form of emulation or best-effort wouldn't it? What guarantees do we have that the quoting is appropriate for all datatypes that these backends support as parameters? Are we feeling confident, knowing that users will attempt to pipe str(qs.query) into raw and that Query.__str__ always use DEFAULT_DB_ALIAS for compilation #25947 (even if the proper backend can often only be determined as execution time) that we are exposing the right tools to users and we can commit to them being safe for the years to come?

And as I said just after that, that alternative was rejected in the past, would be too much effort and I was only mentioning it as the only way to fix it in every backend. I should have been clearer that I wasn't pushing for this option.

IMO the usage of raw(str(qs.query)), and the main motive for this ticket, is a symptom of a lack of documented way for safely building and executing the SQL and parameters from a QuerySet object which makes me believe the focus should be on documenting queryset.qs.sql_with_params() first instead.

That idea was rejected 14 months ago, but it could be reconsidered in light of this discussion.
Also, I didn't actually find qs.query being documented, mostly this mention about pickling querysets.

As for the printing and copy-pasting into a shell use case we've reached an agreement that the only way to see the query with the parameters binded correctly is after executing it. Knowing that, and the fact Query.__str__ is still pretty legible even without the exact mogrifying that only leaves the copy-pasting into a shell use case. Should we make it easier to retrieve the exact SQL associated with a particular queryset instead? Something like QuerySet.executed_query: str | None that is only present on evaluated querysets.

I would be happy with this option. An alternative would be an error if the queryset hasn't been evaluated instead of None. But I'm not that big of a fan of the error idea.

These appears like solutions that prevent misuse of Query.__str__ and avoid maintaining correct and safe x-backend mogrifying solutions?

However, my initial investigation was not deep enough and I found more reasons to discard the mogrify option:

  • In mysqlclient, mogrify was added in version 2.2.0 and django supports 1.4.3 onwards, so there would have to be an additional check and a difference in behavior.
  • In the MySQL Connector/Python driver I couldn't find a mogrify method. Not the recommended driver, but still supported.
  • In psicopg3, mogrify is only on the ClientCursor class. While that's the default in Django, server side is also supported. Again, an additional check. Also, I'm not sure if this means that these usages of mogrify I mentioned in the postgres backend don't work with server side parameter binding.

Django is already using the postgres mogrify in its own compose_sql function in the search backend, schema queries and ensuring the role of the connection

My proposal is to close this as a wontfix and work on the QuerySet.executed_query idea and maybe reconsider documenting sql_with_params
I'm happy to do both as long as I don't get yelled for reopenniing the wonfix sql_with_params ticket #34636

Last edited 4 months ago by Alex (previous) (diff)

comment:18 by Simon Charette, 4 months ago

Cc: Mariusz Felisiak added

So if raw(str(qs.query)) is a risk, then quoting the parameters would fix it.

I was not arguing that it is not possible to cause SQL injection today by using raw(str(qs.query)) but that the moment we do fix it by quoting parameters we must ensure to safely support this anti-pattern going forward unless we document sql_with_params as the blessed raw of doing that.

And as I said just after that, that alternative was rejected in the past, would be too much effort and I was only mentioning it as the only way to fix it in every backend. I should have been clearer that I wasn't pushing for this option.

Sorry I missed that, I thought you were pushing for this option.

That idea was rejected 14 months ago, but it could be reconsidered in light of this discussion.

I wasn't aware that documenting sql_with_params was rejected in #34636. I do think we should reconsider.

I would be happy with this option. An alternative would be an error if the queryset hasn't been evaluated instead of None. But I'm not that big of a fan of the error idea.

I don't have a strong opinion on the subject, either works for me!

My proposal is to close this as a wontfix and work on the QuerySet.executed_query idea and maybe reconsider documenting sql_with_params. I'm happy to do both as long as I don't get yelled for reopenniing the wonfix sql_with_params ticket #34636.

I'm happy to support you towards re-opening #34636 but we should likely follow the normal process of gathering a bit more consensus. It'd be great to hear from Mariusz and others what their thoughts are on the matter.

Last edited 4 months ago by Simon Charette (previous) (diff)

comment:19 by Carlton Gibson, 4 months ago

FWIW I’ve wanted sql_with_params() to be public for other reasons, so I’d be keen there is there are no blockers

Last edited 4 months ago by Alex (previous) (diff)

in reply to:  19 comment:20 by Mariusz Felisiak, 4 months ago

Replying to Carlton Gibson:

FWIW I’ve wanted sql_with_params() to be public for other reasons, so I’d be keen there is there are no blockers

I don't mind documenting sql_with_params(), but would like to avoid encouraging users to use raw SQL queries. Also, sql_with_params() is not a panacea, it has it's limitation e.g. it always uses the default database connection. For most users it may be tricky to include parameters into an SQL string which can lead to SQL injection vectors. IMO, sql_with_params() is only for advanced users, who know the risks.

comment:21 by Simon Charette, 4 months ago

Thank you for chiming in Mariusz.

I don't mind documenting sql_with_params(), but would like to avoid encouraging users to use raw SQL queries.. [snip]
For most users it may be tricky to include parameters into an SQL string which can lead to SQL injection vectors. IMO, sql_with_params() is only for advanced users, who know the risks.

I share this sentiment but I believe that there will always be a need for an escape hatch and if we don't provide a safe one users will follow the path of least resistance that sql.Query.__str__ provides. My hope is that with the documentation of sql_with_params we could even consider disallowing passing str(qs.query) to raw and cursor.execute completely to prevent its misuse. Something like

  • django/db/backends/utils.py

    diff --git a/django/db/backends/utils.py b/django/db/backends/utils.py
    index ab0ea8258b..9874b362b3 100644
    a b  
    99
    1010from django.apps import apps
    1111from django.db import NotSupportedError
     12from django.db.utils import QueryStr
    1213from django.utils.dateparse import parse_time
    1314
    1415logger = logging.getLogger("django.db.backends")
    def _execute_with_wrappers(self, sql, params, many, executor):  
    9495    def _execute(self, sql, params, *ignored_wrapper_args):
    9596        # Raise a warning during app initialization (stored_app_configs is only
    9697        # ever set during testing).
     98        if isinstance(sql, QueryStr):
     99            raise TypeError("str(qs.query) cannot be passed directly to execute(sql).")
    97100        if not apps.ready and not apps.stored_app_configs:
    98101            warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)
    99102        self.db.validate_no_broken_transaction()
    def _execute(self, sql, params, *ignored_wrapper_args):  
    107110    def _executemany(self, sql, param_list, *ignored_wrapper_args):
    108111        # Raise a warning during app initialization (stored_app_configs is only
    109112        # ever set during testing).
     113        if isinstance(sql, QueryStr):
     114            raise TypeError("str(qs.query) cannot be passed directly to execute(sql).")
    110115        if not apps.ready and not apps.stored_app_configs:
    111116            warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning)
    112117        self.db.validate_no_broken_transaction()
  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index f00eb1e5a5..9e7cc03527 100644
    a b  
    1818
    1919from django.core.exceptions import FieldDoesNotExist, FieldError
    2020from django.db import DEFAULT_DB_ALIAS, NotSupportedError, connections
     21from django.db.utils import QueryStr
    2122from django.db.models.aggregates import Count
    2223from django.db.models.constants import LOOKUP_SEP
    2324from django.db.models.expressions import (
    class RawQuery:  
    145146    """A single raw SQL query."""
    146147
    147148    def __init__(self, sql, using, params=()):
     149        if isinstance(sql, QueryStr):
     150            raise TypeError("Query strings cannot be safely used by RawQuery. Use sql_with_params().")
    148151        self.params = params
    149152        self.sql = sql
    150153        self.using = using
    def params_type(self):  
    191194
    192195    def __str__(self):
    193196        if self.params_type is None:
    194             return self.sql
    195         return self.sql % self.params_type(self.params)
     197            return QueryStr(self.sql)
     198        return QueryStr(self.sql % self.params_type(self.params))
    196199
    197200    def _execute_query(self):
    198201        connection = connections[self.using]
    def __str__(self):  
    340343        done by the database interface at execution time.
    341344        """
    342345        sql, params = self.sql_with_params()
    343         return sql % params
     346        return QueryStr(sql % params)
    344347
    345348    def sql_with_params(self):
    346349        """
  • django/db/utils.py

    diff --git a/django/db/utils.py b/django/db/utils.py
    index e45f1db249..a5f821cee7 100644
    a b def get_migratable_models(self, app_config, db, include_auto_created=False):  
    276276        """Return app models allowed to be migrated on provided db."""
    277277        models = app_config.get_models(include_auto_created=include_auto_created)
    278278        return [model for model in models if self.allow_migrate_model(db, model)]
     279
     280
     281class QueryStr(str):
     282    """
     283    String representation of a SQL query with interpolated params that is
     284    unsafe to pass to the database in raw form.
     285    """
     286    def __str__(self):
     287        return self

Also, sql_with_params() is not a panacea, it has it's limitation e.g. it always uses the default database connection.

I don't think this is a big issue, the above discussions suggest allowing alias to be passed

  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index 9e7cc03527..f952c1bb86 100644
    a b def __str__(self):  
    345345        sql, params = self.sql_with_params()
    346346        return QueryStr(sql % params)
    347347
    348     def sql_with_params(self):
     348    def sql_with_params(self, alias=DEFAULT_DB_ALIAS):
    349349        """
    350350        Return the query as an SQL string and the parameters that will be
    351351        substituted into the query.
    352352        """
    353         return self.get_compiler(DEFAULT_DB_ALIAS).as_sql()
     353        return self.get_compiler(alias).as_sql()
    354354
    355355    def __deepcopy__(self, memo):
    356356        """Limit the amount of work when a Query is deepcopied."""

With your approval I'll re-open #34636, Alex feel free to pick it up.

comment:22 by Alex, 3 months ago

Added a draft PR https://github.com/django/django/pull/18468 copying the description for posterity and easy access.

No tests done yet. I'll like to know first if the approach I did is valid.

Since the DB connections are per-thread as far as I know, this approach should not have any race condition issues. And since it's done before the prefetch, it should pick the Queryset query. Also, since queries are only logged if DEBUG is enabled, this approach requires that too.

My discarded approach was to to get it somehow in the return of self._result_cache = list(self._iterable_class(self)), or stored in the QuerySet.
However, it's much more complicated, because as far as I see the flow is something like
QuerySet when evaluating calls the IterableClass which then executes the query which then gets the connection cursor, which is wrapped in the previous mentioned one that can log the query.

So the a way would be to make all 4 SqlCompiler.execute_sql return the result and the SQL query. Then you have to adapt all 6 Iterable.__iter__ to also return the query. Then you can store it in the Queryset. Since these methods are part of the public API, this would be a breaking change in both methods. Also I'm not 100% sure how to get the query in the wrapper since the Cursor.execute is wrapped in a context manager which is the one logging the query.

Last edited 3 months ago by Alex (previous) (diff)

comment:23 by Alex, 3 weeks ago

Needs tests: unset
Patch needs improvement: unset
Note: See TracTickets for help on using tickets.
Back to Top