#34459 closed Bug (fixed)
SearchVector() can return query strings that are unsafe to combine.
Reported by: | Patryk Zawadzki | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | contrib.postgres | Version: | 4.2 |
Severity: | Release blocker | Keywords: | |
Cc: | Florian Apolloner, Daniele Varrazzo, Simon Charette, Natalia Bidart | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
As the function specifically calls connection.ops.compose_sql
, the returned sql
will have all parameters inlined.
An unintended consequence is that if you pass it a value that contains a percent sign, like Value("10% OFF")
, the resulting sql
will have the %
character inlined. Such values will result in a ProgrammingError
as soon as you attempt to combine the SearchVector with any expression that relies on params
.
Depending on whether you use psycopg2 or psycopg 3, the resulting error will tell you that there are not enough params to format the query template or that there is an unescaped %
in the query.
Change History (18)
comment:1 by , 22 months ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 22 months ago
Cc: | added |
---|---|
Severity: | Normal → Release blocker |
Summary: | contrib.postgres.search.SearchVector.as_sql can return query strings that are unsafe to combine → SearchVector() can return query strings that are unsafe to combine. |
Regression in 09ffc5c1212d4ced58b708cbbf3dfbfb77b782ca.
comment:3 by , 22 months ago
David, I’m not sure if it’s “supported” or not, but my project uses SearchVector with expressions such as Value instead of column names so I see a lot of percent signs passed as the corpus of text, not the config or weight. We calculate some of the values we index in Python code and combine strings with different weights.
comment:4 by , 22 months ago
Cc: | added |
---|
comment:5 by , 22 months ago
We use compose_sql()
in different places. Unfortunately, it's sometimes used together with uncomposed SQL statements causing issues when binding parameters. IMO, we should escape %
when we know that the parameters are bound later, e.g.
-
django/contrib/postgres/search.py
diff --git a/django/contrib/postgres/search.py b/django/contrib/postgres/search.py index 4e370aa167..279a39e80e 100644
a b class SearchVector(SearchVectorCombinable, Func): 146 146 147 147 # These parameters must be bound on the client side because we may 148 148 # want to create an index on this expression. 149 sql = connection.ops.compose_sql(sql, config_params + params + extra_params) 149 sql = connection.ops.compose_sql( 150 sql, config_params + params + extra_params, quote_params=True 151 ) 150 152 return sql, [] 151 153 152 154 … … class SearchHeadline(Func): 321 323 if self.options: 322 324 options_params.append( 323 325 ", ".join( 324 connection.ops.compose_sql(f"{option}=%s", [value]) 326 connection.ops.compose_sql( 327 f"{option}=%s", [value], quote_params=True 328 ) 325 329 for option, value in self.options.items() 326 330 ) 327 331 ) -
django/db/backends/postgresql/operations.py
diff --git a/django/db/backends/postgresql/operations.py b/django/db/backends/postgresql/operations.py index 18cfcb29cb..8aa8bb5173 100644
a b class DatabaseOperations(BaseDatabaseOperations): 201 201 return name # Quoting once is enough. 202 202 return '"%s"' % name 203 203 204 def compose_sql(self, sql, params): 204 def quote_value(self, value): 205 if isinstance(value, str): 206 value = value.replace("%", "%%") 207 return value 208 209 def compose_sql(self, sql, params, quote_params=False): 210 if quote_params and params: 211 params = [self.quote_value(param) for param in params] 205 212 return mogrify(sql, params, self.connection) 206 213 207 214 def set_time_zone_sql(self):
Patryk, does it work for you?
follow-up: 7 comment:6 by , 22 months ago
I've confirmed Simon's patch works with this test:
-
tests/postgres_tests/test_search.py
diff --git a/tests/postgres_tests/test_search.py b/tests/postgres_tests/test_search.py index 6ec20c0654..9b5b841539 100644
a b class SearchVectorFieldTest(GrailTestData, PostgreSQLTestCase): 161 161 ) 162 162 self.assertNotIn("COALESCE(COALESCE", str(searched.query)) 163 163 164 def test_values_with_percent(self): 165 searched = Line.objects.annotate( 166 search=SearchVector(Value("This week everything is 10% off")) 167 ).filter(search="10 % off") 168 self.assertEqual(len(searched), 9) 169 164 170 165 171 class SearchConfigTests(PostgreSQLSimpleTestCase): 166 172 def test_from_parameter(self):
comment:7 by , 22 months ago
Replying to David Sanders:
I've confirmed Simon's patch works with this test
Thanks :+1: I'm not Simon ;)
comment:8 by , 22 months ago
This is probably the same issue, but wanted to check it wasn't only tangentially related and so missed.
If I have a model with a field like:
search_document = SearchVectorField(null=True)
Then I try to update that for object 123 by doing:
MyModel.objects.filter(pk=123).update(search_document=SearchVector(Value('10%')))
then with psycopg2 I get "IndexError: tuple index out of range".
and with psycopg3 I get "ProgrammingError: only '%s', '%b', '%t' are allowed as placeholders, got '%"
If I omit the % then there's no error.
comment:9 by , 22 months ago
I am a bit confused, shouldn't compose_sql
(and then mogrify
) do that automatically? ie is this a psycopg bug?
follow-up: 12 comment:10 by , 22 months ago
Unfortunately, it's sometimes used together with uncomposed SQL statements causing issues when binding parameters. IMO, we should escape % when we know that the parameters are bound later, e.g.
Ugh, we should probably not use compose_sql
then
comment:11 by , 22 months ago
Cc: | added |
---|
comment:12 by , 22 months ago
Replying to Florian Apolloner:
Unfortunately, it's sometimes used together with uncomposed SQL statements causing issues when binding parameters. IMO, we should escape % when we know that the parameters are bound later, e.g.
Ugh, we should probably not use
compose_sql
then
Escaping %
sounds like a reasonable compromise to me, we already do this in schema editors, e.g. in MySQL or PostgreSQL. Also, we avoid side-effects by providing the quote_params
argument.
follow-up: 14 comment:13 by , 22 months ago
Yes, but the schema editor has generally been a thing where we know we do suboptimal things and we have the generated SQL under control. Here we have (in the worst case) user input and it simply feels like opening a can of worms if we are not able to distinguish between parameters and the sql iteself clearly.
comment:14 by , 22 months ago
Replying to Florian Apolloner:
Yes, but the schema editor has generally been a thing where we know we do suboptimal things and we have the generated SQL under control. Here we have (in the worst case) user input and it simply feels like opening a can of worms if we are not able to distinguish between parameters and the sql iteself clearly.
Alternatively, we can remove compose_sql()
from SearchVector
(it was added in f83ee2a23ee28a271a50a005b62259d735fbea3f) but looks unnecessary now, see draft PR. We can add test for an index with SearchVector
to confirm this.
comment:15 by , 22 months ago
Lovely, if that passes the testsuite that would be certainly preferred. Might be that follow up changes made this unnecessary
comment:16 by , 22 months ago
Has patch: | set |
---|---|
Owner: | set to |
Status: | new → assigned |
Tentatively accepting, I'm seeing errors but only when the args to SearchVector are outside that which is documented to accept.
Felix & Simon will likely be able to clear things up as the arguments to SearchVector aren't "meant" to contain %: https://docs.djangoproject.com/en/4.2/ref/contrib/postgres/search/#changing-the-search-configuration & https://docs.djangoproject.com/en/4.2/ref/contrib/postgres/search/#weighting-queries
So, to clarify, is this the behaviour you're seeing?
There's no issue if you filter by the search vector like so:
It's when one of the arguments to SearchVector contain a percent that it becomes an issue: