#3214 closed Bug (fixed)
[patch] raw sql file doesn't recognize quotes correctly
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Release blocker | Keywords: | rawsql migrations |
Cc: | shaun@…, sam@…, andrehcampos@…, aseering@…, walter+django@…, vlastimil.zima@…, Christophe Pettus, sbarre, vlastimil@…, 4glitch@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Suppose I have an app foo with a model Bar.
in foo/sql/bar.postgresql_psycopg2.sql I have:
CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$ '''this will have problems if it contains an embedded ; ''' $$;
This generates the following error on syncdb:
sql /* cranedata.db.search.sql.searchmodel.postgresql_psycopg2 * * Setup for 'SearchModel': * want to have a trigger on SearchModels which sets up a trigger * on models searched, such that when rows of these models are * modified, text fields are automatically indexed. */ sql CREATE OR REPLACE FUNCTION test() RETURNS trigger LANGUAGE plpythonu AS $$ '''this will have problems if it contains an embedded ; params () Failed to install initial SQL data for search.SearchModel model: unterminated dollar-quoted string a t or near "$$ '''this will have problems if it contains an embedded ;" at character 355
Attachments (2)
Change History (54)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
The problem is the following, in django.core.management:341:
# Some backends can't execute more than one SQL statement at a time, # so split into separate statements. statements = re.compile(r";[ \t]*$", re.M)
This may be a fine workaround if the backend really doesn't deal with multiple statements, but it breaks things that have ';' in quotes. Easiest fix would be for the db to support a property: "SUPPORTS_MULTIPLE_STATEMENTS_IN_EXECUTE" or whatever, and only split if necessary.
by , 18 years ago
Attachment: | sql-statement-iterator.diff added |
---|
patch, introduces 'sql-statement-iterator' which correctly deals with quotes
comment:3 by , 18 years ago
Has patch: | set |
---|
I've attached a patch for this bug: with patch management.py will use utility "sql_statement_iterator" which splits sql into statements more carefully, respecting quotes and comments.
Also, it will change freestanding '%' to double -- otherwise these cause an error in management.py, as python DB2.0 spec treats all non-doubled percent signs as substitution targets, which isn't wanted in this situation.
With this fix I'm able to use pg_dump (with appropriate flags to use 'insert' rather than 'copy') to dump old tables, then cut and paste resulting inserts (and set sequence) into initial data.
utils/sqltools.py, where "sql_statement_iterator" lives, has doctests. Ideally I would have written unit tests for both this and new management, but, ... well maybe I'll get to it after I deploy next edition of our website :).
comment:4 by , 18 years ago
Keywords: | rawsql added |
---|---|
Needs tests: | set |
Summary: | raw sql file doesn't recognize quotes correctly → [patch] raw sql file doesn't recognize quotes correctly |
Triage Stage: | Unreviewed → Design decision needed |
comment:5 by , 18 years ago
Cc: | added |
---|
I understand this is been relegated to Tumbolia for the moment. If it ends up "accepted", please do ping me, and if I have any time at all, I'll write whatever unit tests are required. (Note: already includes doctest of sql_statement_iterator itself.)
comment:6 by , 17 years ago
Cc: | added |
---|
comment:7 by , 17 years ago
As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:
CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$ BEGIN NEW.date_created := current_timestamp; -- RETURN NEW; -- END; $$ LANGUAGE plpgsql;
comment:8 by , 17 years ago
Version: | → SVN |
---|
comment:9 by , 17 years ago
Note that the patch requires a slight reorg. to work now that management.py has been changed....
Again, if the powers that be accept this, I'll do my best to get it into shape, as I'm still using it myself.
comment:10 by , 16 years ago
Cc: | added |
---|
by , 16 years ago
Attachment: | sql-statement-iterator-updated.diff added |
---|
Updated version of shaunc's patch, that works with Django SVN as of r8255
comment:11 by , 16 years ago
Cc: | added |
---|
This bug currently also causes multiline strings to fail to parse, at least for me with PostgreSQL 8.3. This means that Sam Morris's workaround (above) doesn't work for me; as far as I know, there's no raw-SQL way of inserting stored procedures into a database? I could use a Signals hook; but that doesn't feel particularly DRY to me...
I've submitted an updated patch that works for me with Django SVN. I'm not yet very familiar with the new management-module layout; let me know if I did something silly.
I'm quite interested in seeing this bug resolved; I'm also glad to do some coding if it would help.
Django devs: It seems that there's the will and expertise to fix this bug. Is there any chance someone could spare a few moments to make a decision on it? (or, does anyone know of an alternate workaround?) Thanks; I'd certainly appreciate it!
comment:12 by , 16 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Due to variations between all the different SQL variants accepted by the backends we support (and others provide externally), as well as the complexity of SQL (and we aren't going to include a full SQL parser), it's really chasing rainbows to try and make complex stuff work here.
I think this is a situation we just aren't going to handle directly, since the hooks now exist to do it on a case-by-case basis in client code. If you want to pass in complex SQL, catch the post-syncdb signal and work directly with a django.db.backend.cursor
. You will know precisely what database you're talking to and can use the appropriate syntax and calling mechanisms.
comment:13 by , 16 years ago
Grumble grumble...
IMO, it is perfectly reasonable to say "its not our business to understand SQL". But then you shouldn't be trying to break the SQL down into statements. Rather, pass it through as a chunk. If this causes problems, then -- if you want to -- patch (hack) it in the backends that have problems. Or users of those back ends should get their db engines fixed.
-- Shaun Cutts
follow-up: 16 comment:14 by , 16 years ago
Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.
follow-up: 17 comment:15 by , 16 years ago
Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.
Also, mtredinnick: Shaun's solution sounds pretty simple to me. Has it been tried? If not, would you consider accepting a patch that implements it?
comment:16 by , 16 years ago
Replying to mtredinnick:
Yes, if the solution was the simple we would be doing it. Not all database wrappers permit multiple statements in one call. So the current stuff works for them and your stuff is possible. Win for everybody.
What I meant to suggest is: do just what you're doing now, but only do it if the db wrapper doesn't support multiple statements. (seconding aseering's question) Would you consider a patch that did this?
follow-up: 18 comment:17 by , 15 years ago
Replying to aseering@mit.edu:
Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.
I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.
Would it make sense to add a
django.backends.*.DatabaseFeatures.accepts_multiple_statements
attribute and do the statements.split()
dance only when needed?
comment:18 by , 15 years ago
Resolution: | wontfix |
---|---|
Status: | closed → reopened |
Triage Stage: | Design decision needed → Accepted |
Replying to akaihola:
Replying to aseering@mit.edu:
Curious: Does anyone watching this ticket, know offhand which DB backends/engines actually don't support multiple statements? I tend to agree with Shaun that the best long-term fix is to add support into the lower-level code.
I tested PostgreSQL, SQLite and MySQL on Ubuntu 9.04, and SQLite was the only one not supporting multiple statements.
Would it make sense to add a
django.backends.*.DatabaseFeatures.accepts_multiple_statementsattribute and do the
statements.split()
dance only when needed?
Reopening (and accepting) on the basis of this suggestion. This sounds workable to me, although it's the python binding version that is important, not the database version itself.
follow-up: 20 comment:19 by , 15 years ago
Caveat - I'm accepting the "run bulk SQL as a single block if possible" idea. I'd rather see a 'execute_bulk_sql' call on the DatabaseOperations class than a boolean flag. Making it an overridable method on the backend means that the regular expressions/splitting technique can be modified on a per-backend basis.
comment:21 by , 15 years ago
Cc: | added |
---|
As a workaround, multi-line SQL statements have to have someting other than whitespace between their semicolons and newline characters. For example:
> CREATE OR REPLACE FUNCTION create_post () RETURNS trigger AS $$ > BEGIN > NEW.date_created := current_timestamp; -- > RETURN NEW; -- > END; > $$ LANGUAGE plpgsql;
Is that a bug or a feature?
I was about to file this patch because I was getting quite puzzling "Command out of sync error"s because I had a bit of comment after a statement:
--- django/core/management/sql.py.orig 2010-02-25 12:32:04.660976707 +0100 +++ django/core/management/sql.py 2010-02-25 12:38:06.240954572 +0100 @@ -174,7 +174,7 @@ # Some backends can't execute more than one SQL statement at a time, # so split into separate statements. - statements = re.compile(r";[ \t]*$", re.M) + statements = re.compile(r";[ \t]*(?:--.*)?$", re.M) # Find custom SQL, if it's available. sql_files = [os.path.join(app_dir, "%s.%s.sql" % (opts.object_name.lower(), settings.DATABASE_ENGINE)),
But that fix will break your feature.
comment:22 by , 14 years ago
Cc: | added |
---|
comment:23 by , 14 years ago
Severity: | normal → Normal |
---|---|
Type: | defect → Bug |
comment:26 by , 13 years ago
Cc: | added |
---|
comment:27 by , 12 years ago
Status: | reopened → new |
---|
comment:28 by , 11 years ago
Cc: | added |
---|
comment:29 by , 11 years ago
Can we get this fixed at some point please? It seems quite straightforward now and it causes daily pain to so many people.
comment:30 by , 11 years ago
Sure - it will be fixed as soon as someone presents a patch. There's a call to action in Comment 18 and Comment 19, but nobody has presented a patch that implements that approach AFAICT from the rest of the ticket discussion.
If someone wants this, you're going to have to write the patch.
comment:31 by , 11 years ago
Needs tests: | unset |
---|
https://github.com/django/django/pull/1533
Open for review!
follow-up: 33 comment:32 by , 11 years ago
Question raised by Mark on the PR: With the migrations support, are .sql files even needed any more?
comment:33 by , 11 years ago
Replying to claudep:
Question raised by Mark on the PR:
With the migrations support, are .sql files even needed any more?
I have not check the entire migration and south documentation, but what about custom restraints, triggers and functions?
comment:34 by , 11 years ago
Cc: | added |
---|
comment:35 by , 11 years ago
Cc: | added |
---|
comment:36 by , 11 years ago
Patch needs improvement: | set |
---|
Repeating Andrew's answer on the pull request:
They should go away, there's a RunSQL operation you can use in migrations instead (that accepts either single statements or mutiple ones with its own internal splitter regex). I didn't get around to removing the support for them on syncdb-type apps, but they'll be ignored if you have migrations.
We should therefore test (sql like this) and document the new process replacing initial sql before closing this ticket (or close this ticket and open new one).
comment:37 by , 11 years ago
How will the migrations framework handle Database-backend-specific SQL data?
comment:38 by , 11 years ago
Yep, this same bug will still apply to the splitter inside RunSQL (though I believe it's marginally better than the old Django one).
timo: Database-specific SQL data will only work with apps without migrations; those with migrations are expected to achieve the same result using a migration to load in the data, which is much more portable and forwards-compatible.
comment:39 by , 11 years ago
At least parts of this must have been fixed in #22401, see 071c9337750b296d198cced56f3ffad0e176afb6.
comment:41 by , 11 years ago
Has patch: | unset |
---|---|
Keywords: | migrations added |
Patch needs improvement: | unset |
Severity: | Normal → Release blocker |
This ticket never tells which backend support multiple statements in cursor.execute()
and which don't. It would be interesting to check that on the four officially supported databases before going any further...
If indeed some backends need splitting, in order to close this ticket, we must implement a similar approach as #22401 in RunSQL._split_sql
.
Splitting occurs only when explicitly setting multiple=True
, which Django itself never does, even in the test suite. We should add a test for that (and skip it when sqlparse isn't available.
The patch will be easier if we include it in 1.7 because we won't need a deprecation path. Therefore, marking as a release blocker.
comment:42 by , 11 years ago
SQLite: needs splitting
PostgreSQL: does not need splitting
MySQL: does not need splitting, but has issues when some multiline SQL results are not read (cursor.nextset()
has to be used to "consume" results).
Oracle: ?
comment:43 by , 11 years ago
It would be interesting to avoid the dependency on sqlparse for backends that do not need it. Can we introduce a feature flag "execute_supports_multiple_statements"?
comment:44 by , 11 years ago
In my tentative pull request linked in comment:31, I used the approach suggested by Russell in comment:19 (a custom backend method).
comment:46 by , 11 years ago
Oracle seems to choke on \n
:
DatabaseError: ORA-00911: invalid character
comment:47 by , 11 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I think this is good to go: https://github.com/django/django/pull/2608
I ran the "test_run_sql" test on all four core database backends with and without sqlparse. It's skipped in three cases and passes in five cases.
comment:48 by , 11 years ago
Has patch: | set |
---|
comment:49 by , 11 years ago
Thanks Aymeric, nice patch!
There are however test failures in initial_sql_regress, I had to apply this:
--- a/tests/initial_sql_regress/tests.py +++ b/tests/initial_sql_regress/tests.py @@ -27,7 +27,10 @@ class InitialSQLTests(TestCase): """ connection = connections[DEFAULT_DB_ALIAS] custom_sql = custom_sql_for_model(Simple, no_style(), connection) - self.assertEqual(len(custom_sql), 9) + if connection.features.requires_sqlparse_for_splitting: + self.assertEqual(len(custom_sql), 9) + else: + self.assertEqual(len(custom_sql), 1) with connection.cursor() as cursor: for sql in custom_sql: cursor.execute(sql)
Apart from that, I think it's RFC.
comment:50 by , 11 years ago
Thanks a lot for the review!
I don't think the assert on the number on statements is useful. I'll just remove it.
What matters is that each statement creates an object and that 9 objects are created.
comment:51 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
IMHO the sql file should be just fed all at once to a cursor, rather than django attempting to parse it and feeding in pieces.
The python DB API 2.0 doc is a bit hazy on this point, but:
seems to show that multiple statements count as one "operation", at least using postgresql_psycopg2