Opened 16 years ago
Closed 7 years ago
#11487 closed Bug (needsinfo)
Oracle encoding bug when saving more than 4000 characters
Reported by: | Marcos Daniel Petry | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | oracle database nclob clob |
Cc: | Erin Kelly, Goldan, shai@…, felixx | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I am working on a project where I have to store a large amount of
content, (html) in a record from a table, I am using Oracle as the
database
What is very strange ... is put a text reasonably small, (around 3000
characters) and it works correctly, save without problems, but
doubling this size, this content is saved completely changed.
This patch works well on: Django 1.1 (svn), debian, and Oracle 10g
with cx_oracle 4.4.1
Attachments (10)
Change History (51)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
milestone: | 1.1 |
---|
Please upload a proper patch; finding the change by eye can be tricky. If you need help, check out the contributor's guide or ask someone in IRC. Thanks!
by , 16 years ago
by , 16 years ago
comment:3 by , 16 years ago
milestone: | → 1.1 |
---|---|
Needs tests: | set |
comment:4 by , 16 years ago
Summary: | Oracle does not register in the batabase items with more than 4000 characters → Oracle encoding bug when saving more than 4000 characters |
---|---|
Triage Stage: | Unreviewed → Accepted |
Version: | 1.0 → SVN |
Glad you were able to get this working. Would you mind testing out for me the same change but using Database.CLOB rather than Database.LONG_STRING? The reason is I'm concerned that LONG_STRING might cause problems in other configurations. Also, the patch in #10566 already changes it from NCLOB to CLOB, so I'd prefer a solution that both tickets can agree on.
Also, since I can't reproduce this myself, would you please write up a test case for this? It doesn't need to be complicated. Just inserting a large string, reading it back, and checking that it's the same would be fine. The main thing is we need to be sure the string used will actually trigger the bug.
Thanks!
comment:5 by , 16 years ago
Using CLOB also works perfectly! I've added the file with the diff (base.2.diff) and source code (base.2.py)
follow-up: 7 comment:6 by , 16 years ago
Please upload the diffs from the root of the source tree, not just for the single file. Also there's no need to keep uploading the source file, just the diff is enough.
comment:7 by , 16 years ago
Replying to Alex:
Please upload the diffs from the root of the source tree, not just for the single file. Also there's no need to keep uploading the source file, just the diff is enough.
Sorry Alex, I'm still adjusting to the process of submission of patches
I believe that now the file is ok: base.3.diff
follow-up: 9 comment:8 by , 16 years ago
I belive it should say "as _a_ CLOB" (just a grammar nitpick ;))
comment:9 by , 16 years ago
Replying to dg:
I belive it should say "as _a_ CLOB" (just a grammar nitpick ;))
I just attached a patch with the correct grammar.
comment:11 by , 16 years ago
This ticket is still open because no code has been committed to the Django repository.
comment:12 by , 16 years ago
And it hasn't been committed yet because it still needs a regression test. Tests aren't optional.
comment:13 by , 16 years ago
Cc: | added |
---|
comment:14 by , 16 years ago
Cc: | added |
---|
comment:15 by , 16 years ago
Can you give an example of string that triggers the bug?
I tried to replicate it but with no success, and without string triggering it I cannot verify if the test I tried to write is proper.
Using Oracle 11, Python 2.6, Django SVN (and 1.0.2).
comment:16 by , 16 years ago
Sorry for the anonymous comment from yesterday.
I managed to replicate the bug using Oracle 10, Python 2.6.2, Django SVN, cx_Oracle 4.4.1, which might indicate the improper behavior is connected with the version of Oracle/ cx_Oracle.
I wrote a simple test case as suggested by ikelly, if there is something that should be corrected in it please let me know.
follow-ups: 18 19 comment:17 by , 16 years ago
Needs tests: | unset |
---|
Thanks, rafax. I've actually tested with the same set of versions and not reproduced the bug, so there must be something more to it. I suspect it has something to do with the database encodings.
About the test case, it appears to be saving a string of exactly 4000 characters, not more than 4000 characters. Is that intentional? Also, does it still replicate if you use an NCLOB column instead of a CLOB column? Since that is the default for Django, that is what we should be testing.
comment:18 by , 16 years ago
Replying to ikelly:
Thanks, rafax. I've actually tested with the same set of versions and not reproduced the bug, so there must be something more to it. I suspect it has something to do with the database encodings.
I was able to reproduce the bug using Oracle 11, Python 2.5.4, and cx_Oracle 4.4.1. However, when using cx_Oracle 5.0.1 it's not a problem. I'm doing this on Windows, so this may be why I can reproduce.
About the test case, it appears to be saving a string of exactly 4000 characters, not more than 4000 characters. Is that intentional? Also, does it still replicate if you use an NCLOB column instead of a CLOB column? Since that is the default for Django, that is what we should be testing.
I thought that at first, but if you evaluate len(''.join([unicode(x) for x in xrange(4000)]))
you see it is really 14,890 characters long. I also updated the test to use a NCLOB
column, and the test case still fails without the patch.
Ian, look over my patch and make sure it doesn't break anything else -- I've run the full test suite but I want a second run on a different system to make sure there are no unintended side-effects.
comment:19 by , 16 years ago
Replying to ikelly:
Thanks, rafax. I've actually tested with the same set of versions and not reproduced the bug, so there must be something more to it. I suspect it has something to do with the database encodings.
Or with the OS you run - I was able to reproduce on my work machine running Windows XP (with default CP-1250 encoding), while on my personal notebook running Ubuntu (with UTF based locale) it works OK.
About the test case, it appears to be saving a string of exactly 4000 characters, not more than 4000 characters. Is that intentional?
I thought it will certainly be longer than 4000 this way, in fact something like
>>> len(''.join(unicode(x) for x in xrange(2000))) 6890
should be enough.
comment:20 by , 16 years ago
11487_fix.diff also fixes this test failure:
====================================================================== FAIL: Doctest: regressiontests.model_regress.models.__test__.API_TESTS ---------------------------------------------------------------------- Traceback (most recent call last): File "d:\u\kmt\django\trunk\django\test\_doctest.py", line 2180, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for regressiontests.model_regress.models.__test__.API_TESTS File "D:\u\kmt\django\trunk\tests\regressiontests\model_regress\models.py", line unknown line number, in API_TESTS ---------------------------------------------------------------------- File "D:\u\kmt\django\trunk\tests\regressiontests\model_regress\models.py", line ?, in regressiontests.model_regress.models.__test__.API_TESTS Failed example: len(a4.article_text) Expected: 5000 Got: 2500
for my test install of Oracle (also on a Windows box). I've seen this failure ever since I set up a test Oracle install, but never found time to dig into what might be causing it.
comment:21 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:22 by , 16 years ago
comment:23 by , 14 years ago
Cc: | added |
---|---|
Keywords: | nclob clob added |
Resolution: | fixed |
Status: | closed → reopened |
This issue seems to be a bit more complicated.
I'm using Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production and Django development version.
I'm experiencing the following error when inserting more than 4000 characters (I'm trying to add a new flatpage):
DatabaseError at /admin/flatpages/flatpage/1/ ORA-12704: character set mismatch
I've tried to change the string in oracle/base.py (self.input_size = Database.CLOB, see attachment long_string.diff) to LONG_STRING and NCLOB. All three variants produce errors. I'm struggling with the issue for several days, so I've tested on various cx_Oracle versions.
I've made tests on three cx_Oracle versions: 4.4.1, 5.0.4 and 5.0.4 with WITH_UNICODE compilation flag.
I've made tests on two strings: cyrillic string of length 10350 symbols (the string can be found on http://dumpz.org/37744/text/) and a simple latin string of 16004 symbols "a".
During the tests, I was getting three types of errors:
- (hieroglyphs) The text is being saved without errors, but it is displayed as hieroglyphs (e.g. 慡)
- (mismatch) The text is not saved, and the following error is returned:
DatabaseError at /admin/flatpages/flatpage/1/ ORA-12704: character set mismatch
- (UnicodeDecodeError) The text is saved, but when trying to display it, an exception is raised:
UnicodeDecodeError at /admin/flatpages/flatpage/1/ 'utf16' codec can't decode bytes in position 44-45: illegal encoding
The distribution of types of errors is the following:
- Cyrillic string:
1.1. cx_Oracle 4.4.1.
CLOB: mismatch,
LONG_STRING: hieroglyphs,
NCLOB: hieroglyphs.
1.2. cx_Oracle 5.0.4.
CLOB: mismatch,
LONG_STRING: UnicodeDecodeError,
NCLOB: UnicodeDecodeError.
1.3. cx_Oracle 5.0.4 + WITH_UNICODE.
CLOB: mismatch,
LONG_STRING: UnicodeDecodeError,
NCLOB: mismatch.
- Latin string:
2.1. cx_Oracle 4.4.1.
CLOB: mismatch,
LONG_STRING: hieroglyphs,
NCLOB: hieroglyphs.
2.2. cx_Oracle 5.0.4.
CLOB: mismatch,
LONG_STRING: hieroglyphs,
NCLOB: hieroglyphs.
2.3. cx_Oracle 5.0.4 + WITH_UNICODE.
CLOB: mismatch,
LONG_STRING: hieroglyphs,
NCLOB: mismatch.
No problems occur if I save a short cyrillic string (10 symbols) on any versions.
A possible workaround is to change all NCLOB fields in the database to CLOBs. It seems to resolve the issue (I have not performed full testing, just a simple one), but I'm not aware of side effects.
by , 14 years ago
Attachment: | long_string.diff added |
---|
A string that is being patched (variants: LONG_STRING and NCLOB)
comment:24 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:25 by , 14 years ago
Easy pickings: | unset |
---|
Same problem here and changing column to CLOB resolved my issue with ORA-12704: character set mismatch
comment:28 by , 12 years ago
Status: | reopened → new |
---|
comment:30 by , 12 years ago
Cc: | removed |
---|
comment:31 by , 12 years ago
Cc: | added |
---|
comment:32 by , 12 years ago
Cc: | removed |
---|
comment:33 by , 12 years ago
We're seeing this when putting more than 4 message.success(..) on a page. It seems the success messages go onto the session, and the session_data gets too big.
The len() of the string going into the SESSION_DATA field is 2120, but that's a unicode string, so I _strongly suspect_ the byte length of the data going into the django_session.session_data column is twice that. Looking at the HEX values of django_session.session_data already in the database, we're seeing 0x6C00 for "1", for example.
We are using TextFields in our models with no problem, even with large chunks of text, and even when creating a new row. We see the sql being run to save the session is
u'INSERT INTO "DJANGO_SESSION" ("SESSION_KEY", "SESSION_DATA", "EXPIRE_DATE") SELECT %s, %s, %s FROM DUAL'
from a call to cursor.execute(sql, params)
at line 937 of django\db\models\sql\compiler.py in execute_sql()
We note that there's a bulk_insert_sql() in django.db.backends.oracle.base.py (line 411) that's doing inserts from selects off dual.
We're using Oracle 11g, Django 1.5, Python 2.7.3, the python running under Windows 7 Professional (64 bit), Service Pack 1.
comment:34 by , 12 years ago
and the NLS database parameters of interest are:
NLS_CHARACTERSET AL32UTF8
NLS_NCHAR_CHARACTERSET AL16UTF16
and django_session.session_data is nclob
comment:35 by , 11 years ago
Needs tests: | set |
---|
I'm not an Oracle expert to comment on the fix, but I don't see a test that's integrated as part of Django's test suite so marking as "needs tests".
comment:36 by , 11 years ago
The test is at backends.OracleChecks.test_long_string and was added in [11285]. I note that it claims to test strings longer than 4000 characters but actually tests a string of exactly 4000 characters; this is probably not important as the Oracle limitation is actually 4000 bytes, and the string in question is more than 4000 bytes in any encoding.
The test is probably not reliable though, as the details of reproducing this issue seem to be dependent on the configuration of the database.
comment:37 by , 11 years ago
Sorry, I was referring to a test for the patch that was added as an attachment since the ticket was reopened (long_string.diff
).
comment:38 by , 9 years ago
Is there any update on this issue? we are still experiencing this issue with 2000 characters. It happens usually when it has unicode characters in it. I read about the work around to solve this issue but its not an ideal solution, Does anyone have a fix for this.
comment:39 by , 9 years ago
We are currently experiencing the same issue while inserting large strings (between 2000-4000 characters) into the Oracle database and it looks like a few tickets were opened which have been closed as duplicate of this.The NLS_NCHAR_CHARACTERSET on our database is AL16UTF16 which assigns 2 bytes for a char .From digging in a little deeper, it looks like a string is mapped to cx_Oracle.STRING which is then mapped to either a VARCHAR, NVARCHAR or LONG in Oracle and the conversion to long in case of long values is causing the error. It looks like the issue with 4000 characters was fixed by setting the input size to cx_Oracle.CLOB when it reached the character limit. Using 2000( for utf-16) seems to work fine and solve the problem. Would setting the comparison value to 1000 (taking into consideration other encoding formats) before setting it to CLOB be the fix for this issue?
Environment: Oracle 10g, Django 1.7.7, cx_Oracle 5.2
comment:41 by , 7 years ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
Apparently this hasn't been a problem lately, and the original description and suggested solution have been made largely irrelevant.
If you have a similar problem you can reproduce, please re-open, or, more usefully, just open a new ticket.
I simply changed the line 364 by placing the following contents:
self.input_size = Database.LONG_STRING