Opened 14 hours ago

Last modified 72 minutes ago

#36248 assigned Bug

Bulk deletion of model referred to by a SET_NULL key can exceed parameter limit

Reported by: bobince Owned by: bobince
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords:
Cc: bobince, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

To avoid hitting DBMS-specific limits on the number of parameters that can be used in a parameterised query, Django performs bulk operations in limited-size batches. One such place is db.models.deletion.Collector.collect, which calls get_del_batches to split a list of objects into manageable chunks for deletion. This was done in #16426.
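For reference, the chunking can be sketched roughly like this (a simplified stand-in, not the actual Collector.get_del_batches code — the real method gets the limit from the backend via connection.ops.bulk_batch_size() rather than taking it as an argument):

```python
# Hypothetical, simplified sketch of how objects get split into
# deletion batches; batch_size stands in for the backend's
# bulk_batch_size() result.
def get_del_batches(objs, batch_size):
    """Split objs into lists of at most batch_size items."""
    if len(objs) > batch_size:
        return [objs[i:i + batch_size] for i in range(0, len(objs), batch_size)]
    return [objs]

batches = get_del_batches(list(range(2500)), 999)
# with a 999-parameter limit: three batches of 999, 999 and 502 objects
```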

If there are keys on other models referring to the model being deleted, and those keys have an on_delete setting that would cause an update to that field (such as SET_NULL), the Collector's field_updates list will gather per-chunk updates to each such field. However, Collector.delete then combines the QuerySets generated for each update into a single combined filter. If there are more rows in total than the parameter limit then this will fail.
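A toy model of the failure mode (plain Python, not Django code — each per-chunk UPDATE's bind parameters are modelled as a set of primary keys, and set union stands in for OR-ing the QuerySets):

```python
from functools import reduce

PARAM_LIMIT = 999  # assumed limit, e.g. SQLite's traditional default

# Each chunk's filter fits within the limit on its own...
chunks = [set(range(i, i + PARAM_LIMIT))
          for i in range(0, 3 * PARAM_LIMIT, PARAM_LIMIT)]

# ...but combining them, as reduce(or_, updates) does, produces one
# query carrying every pk: 2997 parameters, well over the limit.
combined = reduce(set.union, chunks)
```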

It seems straightforward to stop doing that:

@@ -481,9 +481,8 @@ class Collector:
                         updates.append(instances)
                     else:
                         objs.extend(instances)
-                if updates:
-                    combined_updates = reduce(or_, updates)
-                    combined_updates.update(**{field.name: value})
+                for update in updates:
+                    update.update(**{field.name: value})
                 if objs:
                     model = objs[0].__class__
                     query = sql.UpdateQuery(model)

However:

  • I'm not certain what the original intent was behind combining the updates here. Can there be multiple updates for some other reason than batch chunking that we'd want to minimise queries for? The combining was introduced in #33928.
  • Testing it is a bit tricky since the SQLite parameter limit is now often higher than the traditional 999, and Python's sqlite3 doesn't let us read or change it. On my Ubuntu box here it's 250000, which is a bit more than is comfortable to be doing in a test. We can count queries like DeletionTests.test_large_delete does but that's not _exactly_ checking the thing we actually care about, that the number of parameters is within bounds. (It's not possible at present to read the number of parameters used from the queries_log.)
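If the patched code issues one UPDATE per batch instead of one combined UPDATE, the query count a test should expect is just a ceiling division (a back-of-the-envelope helper for illustration, not part of the actual test suite):

```python
import math

def expected_update_queries(num_objects, batch_size):
    """One UPDATE per deletion batch once updates are no longer combined."""
    return math.ceil(num_objects / batch_size)

expected_update_queries(2500, 999)
```

As the ticket notes, though, this only checks batching indirectly: the count comes out right without ever verifying that each individual query stays within the parameter limit.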

(Diversion: actually in practice I'm personally affected more by this issue in mssql-django, but there are a bunch of confounding factors around that backend's prior wonky attempts to work around the parameter limit on its own. https://github.com/microsoft/mssql-django/issues/156 has some background.)

Change History (5)

comment:1 by bobince, 14 hours ago

We can count queries like DeletionTests.test_large_delete does but [...]

Done this here FWIW: https://github.com/django/django/compare/main...bobince:django:batch_delete_with_set_null?expand=1

comment:2 by Simon Charette, 8 hours ago

Owner: set to bobince
Status: new → assigned
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug

The analysis and the patch here are on point (thanks bobince!) so I'll accept and mark with has_patch.

The only thing I'd change is the test, so that bulk_batch_size is mocked to return a small number (e.g. 10), and so that it not only tests the number of executed queries but also counts the number of parameters for each query (you can use self.assertNumQueries as a context manager for that).

That would avoid having to create 200s of objects unnecessarily, but more importantly it would test the chunking logic in a backend-agnostic way.
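The mocking pattern being suggested can be illustrated in isolation like this (FakeOps is a hypothetical stand-in for a backend's DatabaseOperations, not Django's actual class; 250000 is the limit bobince observed on his Ubuntu box):

```python
from unittest import mock

class FakeOps:
    """Hypothetical stand-in for a backend's DatabaseOperations."""
    def bulk_batch_size(self, fields, objs):
        return 250000  # e.g. a modern SQLite parameter limit

ops = FakeOps()

# Inside the patch, the backend reports a tiny batch size, so the
# chunking logic is exercised with only a handful of objects.
with mock.patch.object(FakeOps, "bulk_batch_size", return_value=10):
    assert ops.bulk_batch_size([], range(100)) == 10

# The original method is restored once the context manager exits.
assert ops.bulk_batch_size([], range(100)) == 250000
```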


comment:3 by bobince, 105 minutes ago

The only thing I'd change is the test so bulk_batch_size is mocked to return a small number

will do!

count the number of parameters for each queries

Yeah that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because DebugCursorWrapper uses last_executed_query to replace them with parameter values.

I guess I could scrape the SQL string for the number of integers, but that feels pretty fragile. Or I could add a "params_count" entry to the dicts in queries_log, but that's more intrusive than I'd like just for the benefit of one test.
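A third, more local option (sketched here as an assumption, not an existing Django facility) would be to wrap the DB-API cursor under test and record the bind-parameter count per execute() call, shown below against plain sqlite3:

```python
import sqlite3

class ParamCountingCursor:
    """Wrap a DB-API cursor and record the number of bind
    parameters passed to each execute() call."""
    def __init__(self, cursor):
        self._cursor = cursor
        self.param_counts = []

    def execute(self, sql, params=()):
        self.param_counts.append(len(params))
        return self._cursor.execute(sql, params)

conn = sqlite3.connect(":memory:")
cur = ParamCountingCursor(conn.cursor())
cur.execute("CREATE TABLE t (id INTEGER)")
cur.execute("INSERT INTO t VALUES (?)", (1,))
assert cur.param_counts == [0, 1]
```

A test could then assert that every recorded count stays at or below the (mocked) batch size, without touching queries_log at all.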

comment:4 by Simon Charette, 72 minutes ago

Yeah that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because DebugCursorWrapper uses last_executed_query to replace them with parameter values.

Crap, I forgot that I've run into the exact same thing so many times and wished it would capture uninterpolated SQL and params instead. Never mind in this case.

comment:5 by Simon Charette, 72 minutes ago

Cc: Simon Charette added