Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#34091 closed Bug (wontfix)

Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from multiple threads

Reported by: Marti Raudsepp Owned by: nobody
Component: Database layer (models, ORM) Version: 4.1
Severity: Normal Keywords: race-condition, orm, threading, SQLCompiler
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I am experiencing very rare cases, where Django generates invalid SQL for select_related() tables. This occurs in a place where multiple threads are trying to execute the same QuerySet at the same time. (The access from multiple threads was unintentional, but I believe this deserves to be fixed in Django anyway)

When this occurs, tables/joins in the FROM clause can be missing entirely, but they are still referenced in the SELECT clause, and the query fails, for example sqlite3.OperationalError: no such column: app_fk18.id or ORA-00904: "xxx"."xxx": invalid identifier

I could not reproduce this with vanilla Django, but after patching Django to add time.sleep(0) in SQLCompiler.compile(), I can reliably reproduce this.

The reproducer and longer explanation is here: https://github.com/intgr/bug-reports/tree/main/django-query-race-condition

I suspect some mutation of the Query object is going on during SQL compilation, but by efforts to narrow down the mutation have been unsuccessful so far.

Change History (4)

comment:1 by Simon Charette, 2 years ago

The access from multiple threads was unintentional, but I believe this deserves to be fixed in Django anyway

Thank you for your report and investigation but I disagree with your conclusion.

Django doesn't provide any thread safety guarantees for ORM objects (model instances, querysets) and it's your responsibility to ensure that copies of these objects are used in threads instead of sharing them. That's the reason why thread safe built-in abstractions that interact with the ORM and share references to objects (e.g. generic model views) make sure they create copies of the references before manipulating them.

Even if we were to have the compiler make a copy of its query before altering its alias counts and then setting it back it's likely only one of the locations that make query compilation non-thread safe.

We can't accept this ticket without committing to making the ORM entirely thread-safe with the performance and complexity penalties it incurs. If you disagree with my conclusion feel free to open a thread on the developers mailing list to discuss the subject further.

comment:2 by Simon Charette, 2 years ago

Resolution: wontfix
Status: newclosed

comment:3 by Marti Raudsepp, 2 years ago

Django doesn't provide any thread safety guarantees for ORM objects (model instances, querysets) and it's your responsibility

Do document this! Most Django APIs seem to be thread-safe, how is a user supposed to know, where the limits of thread (un)safety are?

make sure ​they create copies of the references before manipulating them.

So QuerySet.all() is thread-safe, but iterating over QuerySet is not? How is one supposed to know?

Before investigating this, I explicitly searched to figure out whether QuerySets are thread safe, the only result that was sort-of authoritative was https://code.djangoproject.com/ticket/11906 and since it's marked as "fixed", it looked like this API aims to support thread safety.

comment:4 by Simon Charette, 2 years ago

Do document this! Most Django APIs seem to be thread-safe, how is a user supposed to know, where the limits of thread (un)safety are?

I was under the impression that this was formally documented but just like multi-processing gotchas when dealing with connections it doesn't seem to be the case.

So QuerySet.all() is thread-safe, but iterating over QuerySet is not? How is one supposed to know?

Accessing shared_qs: QuerySet in a thread and calling thread_qs = shared_qs.all() creates a copy of the queryset as document hence threads don't share the same instance.

he only result that was sort-of authoritative was #11906 and since it's marked as "fixed", it looked like this API aims to support thread safety.

I again don't agree with your conclusion here. If you read the ticket's evolution it was marked as fixed because QuerySet._fill_cache was removed during a refactor, not because thread safety guarantees for ORM objects changed. There's even a comment in there that points to a page that is now long gone and stated

However, QuerySets are known not to be thread-safe, see #11906. Usually that does not pose problems as they are (or should be) not shared between threads in Django. The exception to that rule is the use of exotic global/class-level/shared instance variable querysets in your own code (e.g. when using the ORM outside of the Django dispatch system), where you are assumed to know what you are doing and protect them appropriately anyway.

Note: See TracTickets for help on using tickets.
Back to Top