Opened 11 years ago

Last modified 4 years ago

#20516 new New feature

Allow use of prepared statements

Reported by: est Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: db, prepared statement
Cc: mpessas@…, aron45, Jeongsoo, Park Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello,

Since we have persistant connection in Django 1.6 now, can we consider adding prepared statements?

So we can create statements upon the first query, and use the prepared statement for all the rest and so on.

http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-prepared-statements.html

This could be a huge speed up for heavy db IO applications. Reduce traffice size as well as RTT. Also improves security by parameterize queries.

Change History (15)

comment:1 by Anssi Kääriäinen, 11 years ago

Keywords: mysql removed
Summary: Prepared statement for MySQLAllow use of prepared statements
Triage Stage: UnreviewedAccepted
Version: 1.6-alpha-1master

Yes, I think addition of prepared statements is something to consider. No idea of what would be a nice API, or if this could be somehow delegated to third party apps. The latter option means some hooks are needed somewhere in the query execution path. One such point is execute_sql() where it would be fairly straightforward to change the used SQL to prepared statements if the query is set to use prepared statements.

FWIW there are use cases where prepared statements can give easily over 50% speedup. Such cases are simple to execute queries which retrieve just a few rows of data, but which have a massive amount of joins (so that planning overhead is big). Using .select_related() and complicated .filter() clauses are easy ways to produce such queries.

I don't see any reason to restrict this to MySQL only.

(This feature seems to need at least these parts:

  • API for setting given queryset to use prepared statements - using them automatically is generally a bad idea
  • Convert the query's execution to use prepared statement instead of SQL. This consists of:
    • A check if the given query is already prepared for the used connection
    • If so, use the correct prepared statement
    • If not, create a new prepared statement and use that)

comment:2 by Apostolis Bessas, 11 years ago

Cc: mpessas@… added

comment:3 by anonymous, 11 years ago

with statements seem to be the correct way to handle this from an API standpoint.

comment:4 by FunkyBob, 11 years ago

Surely the security only improves if the DB driver is faulty? Django already passes parameterised queries to the DB driver to handle escaping [or pass through parameterised, as some do].

But in general, I'm happy to throw my brain behind this effort :)

comment:5 by FunkyBob, 11 years ago

One idea that struck me the other day-

A prepared statement could yield a function with keyword arguments matching its bound parameters.

Something like:

s = MyModel.objects.filter(foobar=Param('able'), baz=Param('baker')).exclude(foo_quux=7)

Though perhaps an explicit "prepare()" would be better, the Param arguments mark where to bind parameters, and what to call them.

The, you can execute the query with:

for my in s(able=1, baker="wibble")

Version 0, edited 11 years ago by FunkyBob (next)

comment:6 by david@…, 11 years ago

This isn't just for MySQL. If possible, it would be really nice to have some facility for using server-side prepare statements not created by Django. https://github.com/dimitri/preprepare is an example of such a use case

comment:7 by Omer Katz, 10 years ago

PostgresSQL also supports prepare statements.

@FunkyBob: That syntax looks great but the PreparedQuerySet object should support a context manager.

comment:8 by FunkyBob, 10 years ago

Could you elaborate on how you see context manager working with this feature?

comment:9 by Omer Katz, 10 years ago

The way I see it a PreparedQuerySet should support the following:

  • Preparing on execution of a query for the entire session:
qs = list(MyModel.objects.filter(foo__bar=1).prepare())
qs2 = list(MyModel.objects.filter(foo__bar=2)) # Already prepared for this session (that is the connection)

The query parameters are inferred from the query itself.
This is very convenient and doesn't require anyone to learn the new syntax.

  • Preparing for a certain scope and deallocating:
with MyModel.objects.filter(foo__bar=Param('foobar')).prepare() as qs:
     results = qs(1)
     # Some logic
     results = qs(2)
     # Some logic

When the PreparedQuerySet enters the context the query is prepared. When it exists the prepared query is deallocated.
It's not always a good idea to prepare queries for a very long time for rapidly changing tables (small tables don't need to use indexes for example).

  • Preparing a query for usage in a Manager or QuerySet:
class MyQuerySet(QuerySet):
  foobars = MyModel.objects.filter(foo__bar=Param('foobar')).prepare()

This will turn foobars into a callable with an argument named foobar.
The query will be prepared once the first query using it will be executed.
You can prepare ahead of time:

class MyQuerySet(QuerySet):
  foobars = MyModel.objects.filter(foo__bar=Param('foobar')).prepare(ahead=True) # I need a better name for the argument here

You can also prepare a query that will be deallocated after a TTL:

class MyQuerySet(QuerySet):
  foobars = MyModel.objects.filter(foo__bar=Param('foobar')).prepare(ttl=60000)

Explicitly deallocating a prepared query should also be possible:

qs = MyModel.objects.filter(foo__bar=1).prepare()
results = list(qs)
qs2 = list(MyModel.objects.filter(foo__bar=2)) # Already prepared for this session (that is the connection)
qs.deallocate()
results = list(qs) # the query is prepared again

The query will be prepared again once the first query is executed.

What do you guys think?

comment:10 by Omer Katz, 10 years ago

I investigated about implementing it today and it turns out that the Query class has a method called sql_with_params() which returns the SQL query without concatenating the parameters.

MySQL uses the following syntax:

PREPARE usrrptplan FROM "SELECT * FROM users u, logs l WHERE u.usrid=? AND u.usrid=l.usrid
    AND l.date = ?;"
EXECUTE usrrptplan USING 1, NOW()

PostgreSQL uses the following syntax:

PREPARE usrrptplan (int) AS
    SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
    AND l.date = $2;
EXECUTE usrrptplan(1, NOW());

It seems that when you call prepare() you need to switch the query entirely with the EXECUTE syntax.
Moreover, at least in PostgreSQL you have to check if the statement is already prepared and there's no SQL syntax to do that.
You need to query the pg_prepared_statements table in order to verify that it doesn't exist.
Since a prepared statement requires a name we can either randomize one or have one provided for us.
I think that providing both options is a good idea.

comment:11 by Omer Katz, 10 years ago

As apollo13 mentioned on IRC, prepared statements are at the session level on PostgresSQL so we can just store them on the connection object.

comment:12 by Omer Katz, 10 years ago

akaariai says we can just subclass the Query object and replace the parameters with the prepared statements placeholders.

comment:13 by Shai Berger, 10 years ago

On Oracle, the concept of prepared statements is linked to a connection-level statement cache. Prepared statements are not handled with different SQL, but with different API calls -- specifically, cx_Oracle provides a prepare method on its cursors. This method is not currently used by Django's Oracle backend.

comment:14 by aron45, 9 years ago

Cc: aron45 added

comment:15 by Jeongsoo, Park, 4 years ago

Cc: Jeongsoo, Park added
Note: See TracTickets for help on using tickets.
Back to Top