Opened 18 years ago

Closed 17 years ago

Last modified 13 years ago

#3878 closed (fixed)

(JSON)-serializing utf8 data fails

Reported by: alex@…
Owned by: Malcolm Tredinnick
Component: Core (Serialization)
Version: 0.96
Severity:
Keywords: utf8 unicode-branch
Cc: django@…, reza@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: yes
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

If I try to serialize UTF-8-encoded data from the database (for example, using fixtures), the JSON output contains
Unicode escapes (\uXXXX) that are not loaded back correctly.

Example:

>>> obj = Blah()
>>> obj.test = "ö"
>>> obj.save()
$ ./manage.py dumpdata > blah.json
$ ./manage.py loaddata blah
>>> obj = Blah.objects.all()[0]
>>> print obj.test
blök

Attachments (1)

xml_serializer_error.txt (2.2 KB) - added by Saik 18 years ago.
utf8 problem with xml serializer

Change History (13)

comment:1 by Gábor Farkas <gabor@…>, 18 years ago

I haven't checked the Django fixture code, but this looks very similar to a problem with the simplejson serializer,
so that is probably the cause. For example:

>>> from django.utils.simplejson import dumps,loads
>>>
>>> byte_text = '\xe7\x8c\xab' # the utf-8 representation of the japanese 'cat' character
>>> uni_text = byte_text.decode('utf-8')
>>> uni_text
u'\u732b'
>>>
>>> loads(dumps(byte_text))
u'\xe7\x8c\xab'

Of course, this is wrong: each byte came back as a separate character.
But:

>>> loads(dumps(uni_text))
u'\u732b'

is OK.

In short, when working with simplejson and non-ASCII characters,
all strings passed to dumps have to be unicode strings (not bytestrings).
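
The same point can be illustrated in present-day terms. This is only a sketch: it assumes Python 3 and the standard-library json module, not the Python 2 django.utils.simplejson used above, and it simulates the old bytestring behaviour by decoding the bytes as Latin-1 (one character per byte).

```python
import json

# UTF-8 bytes for the Japanese 'cat' character (same data as in the comment above).
byte_text = b"\xe7\x8c\xab"

# Correct path: decode to text first, then serialize.
uni_text = byte_text.decode("utf-8")
encoded = json.dumps(uni_text)        # ASCII-only JSON containing a \uXXXX escape
round_tripped = json.loads(encoded)   # the escape decodes back to the same character

# Broken path (simulated): every byte is treated as its own character,
# which is what effectively happened to the bytestring in the session above.
mis_decoded = byte_text.decode("latin-1")
mojibake = json.loads(json.dumps(mis_decoded))

print(round_tripped == uni_text)      # the text survived the round trip
print(len(round_tripped), len(mojibake))
```

The \uXXXX escapes themselves are harmless; the data is lost only when the bytes are interpreted with the wrong encoding (or none) before serialization.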

comment:2 by mrts, 18 years ago

Has patch: set
Needs tests: set
Version: changed from SVN to 0.96

I fixed this with the following simple patch:

--- Django-0.96/django/utils/simplejson/encoder.py      2007-01-31 00:34:15.000000000 +0200
+++ /usr/lib/python2.4/site-packages/django/utils/simplejson/encoder.py 2007-04-09 18:04:29.000000000 +0300
@@ -247,7 +247,7 @@ class JSONEncoder(object):
                 encoder = encode_basestring_ascii
             else:
                 encoder = encode_basestring
-            yield encoder(o)
+            yield encoder(o.decode('utf-8'))
         elif o is None:
             yield 'null'
         elif o is True:

comment:3 by Gábor Farkas, 18 years ago

I haven't tested the patch, but unfortunately there is a problem with it:

You're assuming that the user's bytestring data is encoded in UTF-8.

That is not always true.

(Another approach would be to use settings.DEFAULT_CHARSET,
but that is still not 100% correct.)

The good news is that a Django branch has been created to switch
everything over to unicode. Once that is done, this problem will go away.
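
To make the encoding assumption explicit rather than hard-coded, the charset could be passed in as a parameter. The following is a hypothetical sketch only, written for Python 3 and the standard-library json module; the class name CharsetAwareEncoder and its charset argument are made up for illustration and are not part of Django or simplejson.

```python
import json


class CharsetAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder that decodes bytes values with a caller-supplied charset."""

    def __init__(self, *args, charset="utf-8", **kwargs):
        super().__init__(*args, **kwargs)
        self.charset = charset

    def default(self, o):
        # json calls default() only for objects it cannot serialize itself;
        # in Python 3 that includes bytes.
        if isinstance(o, bytes):
            return o.decode(self.charset)
        return super().default(o)


# The same character, stored as bytes in two different encodings.
utf8_row = {"test": b"\xc3\xb6"}
latin1_row = {"test": b"\xf6"}

# Extra keyword arguments to json.dumps are forwarded to the encoder class.
from_utf8 = json.loads(json.dumps(utf8_row, cls=CharsetAwareEncoder))
from_latin1 = json.loads(
    json.dumps(latin1_row, cls=CharsetAwareEncoder, charset="latin-1")
)
print(from_utf8 == from_latin1)
```

As noted above, this only helps when the caller actually knows the encoding of the bytes; a wrong charset either raises UnicodeDecodeError or silently produces the wrong text.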

comment:4 by Malcolm Tredinnick, 18 years ago

Owner: changed from Jacob to Malcolm Tredinnick
Triage Stage: changed from Unreviewed to Accepted

This will be easiest to fix in the unicode branch. It's on the TODO list there. It's intended to be a short-lived sprinting branch, so I think it's best to leave this to be fixed there and then merged back.

The good news is that on that branch, your fix is absolutely the right idea, although we have some helper functions to make it easier.

Leaving the ticket open so that we remember to ensure it really is fixed.

by Saik, 18 years ago

Attachment: xml_serializer_error.txt added

utf8 problem with xml serializer

comment:5 by anonymous, 18 years ago

Saik: use the patch given above. Report back if you still have problems.

comment:6 by James Wheare, 18 years ago

Cc: django@… added

comment:7 by Malcolm Tredinnick, 18 years ago

Summary: changed from (JSON)-serializing utf8 data fails to [unicode] (JSON)-serializing utf8 data fails

comment:8 by Malcolm Tredinnick, 18 years ago

(In [5248]) unicode: Made the serializers unicode-aware. Refs #3878, #4227.

comment:9 by Malcolm Tredinnick, 18 years ago

Keywords: unicode-branch added
Summary: changed from [unicode] (JSON)-serializing utf8 data fails to (JSON)-serializing utf8 data fails

This was fixed in the unicode branch in [5248] (without changing simplejson.py at all, since that already works well with bytestrings and unicode). I'll close this ticket when the branch is merged back into trunk.

comment:10 by anonymous, 18 years ago

Cc: reza@… added

comment:11 by Malcolm Tredinnick, 17 years ago

Resolution: fixed
Status: changed from new to closed

(In [5609]) Merged Unicode branch into trunk (r4952:5608). This should be fully
backwards compatible for all practical purposes.

Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

comment:12 by Martin v. Löwis, 13 years ago

In [16948]:

Dummy merge.
Merged revisions 5609-5612,5614-5626,5629-5632,5636,5638-5646,5649-5654,5658-5660,5662-5700 via svnmerge from
https://code.djangoproject.com/svn/django/trunk

........

r5609 | mtredinnick | 2007-07-04 14:11:04 +0200 (Mi, 04 Jul 2007) | 5 lines


Merged Unicode branch into trunk (r4952:5608). This should be fully
backwards compatible for all practical purposes.


Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

........

r5610 | mtredinnick | 2007-07-04 14:25:43 +0200 (Mi, 04 Jul 2007) | 3 lines


Fixed Javascript syntax from [5608] that was causing a problem in Opera. Fixed
#4365.

........

r5611 | mtredinnick | 2007-07-04 14:31:19 +0200 (Mi, 04 Jul 2007) | 3 lines


Fixed #4766 -- Added Russian support to Javascript slug creation. Thanks,
boobsd@….

........

r5612 | mtredinnick | 2007-07-04 14:48:12 +0200 (Mi, 04 Jul 2007) | 2 lines


Fixed some ReST errors.

........

r5614 | mtredinnick | 2007-07-05 03:25:05 +0200 (Do, 05 Jul 2007) | 3 lines


Form encoding should be changed only via HttpRequest, not on GET and POST
directly.

........

r5615 | mtredinnick | 2007-07-05 05:25:11 +0200 (Do, 05 Jul 2007) | 2 lines


Fixed #4717 -- Updated Catalan translation. Thanks, marc.garcia@….

........

r5616 | mtredinnick | 2007-07-05 05:29:18 +0200 (Do, 05 Jul 2007) | 3 lines


Fixed #4753 -- Updated Spanish translation. Also move translators' names out of
PO file and into AUTHORS.

........

r5617 | mtredinnick | 2007-07-05 12:27:22 +0200 (Do, 05 Jul 2007) | 3 lines


Added a test that shows the problem in #4470. This fails only for the mysql_old
backend. Refs #4470.

........

r5618 | mtredinnick | 2007-07-05 13:08:40 +0200 (Do, 05 Jul 2007) | 3 lines


Added CACHE_MIDDLEWARE_SECONDS to global settings and documentation (it's
used by the cache middleware). Refs #1015.

........

r5619 | mtredinnick | 2007-07-05 13:10:27 +0200 (Do, 05 Jul 2007) | 5 lines


Fixed #1015 -- Fixed decorator_from_middleware to return a real decorator even
when arguments are given. This looks a bit ugly, but it's fully backwards
compatible and all the extra work is done at import time, so it shouldn't have
any real performance impact.

........

r5620 | russellm | 2007-07-05 14:54:42 +0200 (Do, 05 Jul 2007) | 2 lines


Fixed minor typo in assertion message.

........

r5621 | gwilson | 2007-07-06 06:04:42 +0200 (Fr, 06 Jul 2007) | 2 lines


Fixed #4779 -- Fixed a couple typos in the test_client_regress tests that surfaced when typo was corrected in [5620]. Thanks ferringb@….

........

r5622 | mtredinnick | 2007-07-06 08:53:27 +0200 (Fr, 06 Jul 2007) | 2 lines


Fixed #4781 -- Typo fix. Pointed out by Simon Litchfield.

........

r5623 | mtredinnick | 2007-07-06 10:04:04 +0200 (Fr, 06 Jul 2007) | 4 lines


Fixed #4770 -- Fixed some Unicode conversion problems in the mysql_old backend
with old MySQLdb versions. Tested against 1.2.0, 1.2.1 and 1.2.1p2 with only
expected failures.

........

r5624 | mtredinnick | 2007-07-06 10:35:25 +0200 (Fr, 06 Jul 2007) | 3 lines


Fixed #4782 -- Updated Slovenian translation. Thanks, Gasper Koren. Also moved
contributor names into AUTHORS file.

........

r5625 | mtredinnick | 2007-07-06 12:21:14 +0200 (Fr, 06 Jul 2007) | 4 lines


Fixed #4776 -- Fixed a problem with handling of upload_to attributes. The new
solution still works with non-ASCII filenames. Based on a patch from
mike.j.thompson@….

........

r5626 | russellm | 2007-07-07 04:16:23 +0200 (Sa, 07 Jul 2007) | 2 lines


Added some uncredited authors that worked on the Oracle branch.

........

r5629 | mtredinnick | 2007-07-07 19:15:54 +0200 (Sa, 07 Jul 2007) | 8 lines


Changed HttpRequest.path to be a Unicode object. It has already been
URL-decoded by the time we see it anyway, so keeping it as a UTF-8 bytestring
was causing unnecessary problems.


Also added handling for non-ASCII URL fragments in feed creation (the portion
that was outside the control of the Feed class was messed up).

........

r5630 | mtredinnick | 2007-07-07 20:24:27 +0200 (Sa, 07 Jul 2007) | 4 lines


Fixed #4772 -- Fixed reverse URL creation to work with non-ASCII arguments.
Also included a test for non-ASCII strings in URL patterns, although that
already worked correctly.

........

r5631 | mtredinnick | 2007-07-07 20:39:23 +0200 (Sa, 07 Jul 2007) | 3 lines


Corrected misleading comment from [5619]. Not sure what I was smoking at the
time.

........

r5632 | mtredinnick | 2007-07-08 02:39:32 +0200 (So, 08 Jul 2007) | 5 lines



Fixed reverse URL lookup using functions when the original URL pattern was a
string. This is now just as fragile as it was prior to [5609], but works in a
few cases that people were relying on, apparently.

........

r5636 | mtredinnick | 2007-07-08 13:22:53 +0200 (So, 08 Jul 2007) | 4 lines


Fixed #4798 -- Made sure that function keyword arguments are strings (for the
keywords themselves) when using Unicode URL patterns.

........

r5638 | gwilson | 2007-07-10 04:34:42 +0200 (Di, 10 Jul 2007) | 2 lines


Fixed #4817 -- Removed leading forward slashes from some urlconf examples in the documentation.

........

r5639 | gwilson | 2007-07-10 04:45:11 +0200 (Di, 10 Jul 2007) | 2 lines


Fixed #4814 -- Fixed some whitespace issues in tutorial01, thanks John Shaffer.

........

r5640 | gwilson | 2007-07-10 05:26:26 +0200 (Di, 10 Jul 2007) | 2 lines


Fixed #4812 -- Fixed an octal escape in regular expression that is used in the isValidEmail validator, thanks batchman@….

........

r5641 | mtredinnick | 2007-07-10 14:02:06 +0200 (Di, 10 Jul 2007) | 3 lines


Fixed #4823 -- Fixed a Python 2.3 incompatibility from [5636] (it was even
demonstrated by existing tests, so I really screwed this up).

........

r5642 | mtredinnick | 2007-07-10 14:03:36 +0200 (Di, 10 Jul 2007) | 3 lines


Fixed #4804 -- Fixed a problem when validating choice lists with non-ASCII
data. Thanks, django@….

........

r5643 | mtredinnick | 2007-07-10 14:33:55 +0200 (Di, 10 Jul 2007) | 4 lines


Fixed #3760 -- Added the ability to manually set feed- and item-level id
elements in Atom feeds. This is fully backwards compatible. Based on a patch
from spark343@….

........

r5644 | mtredinnick | 2007-07-11 08:55:12 +0200 (Mi, 11 Jul 2007) | 3 lines


Fixed #4815 -- Fixed decoding of request parameters when the input encoding is
not UTF-8. Thanks, Jordan Dimov.

........

r5645 | mtredinnick | 2007-07-11 09:00:27 +0200 (Mi, 11 Jul 2007) | 3 lines


Fixed #4802 -- Updated French translation. Combined contribution from
baptiste.goupil@… and rocherl@….

........

r5646 | mtredinnick | 2007-07-11 09:12:50 +0200 (Mi, 11 Jul 2007) | 2 lines


Fixed #4753 -- Small update to Spanish translation from Mario Gonzalez.

........

r5649 | jacob | 2007-07-12 02:33:44 +0200 (Do, 12 Jul 2007) | 1 line


Fixed #4615: corrected reverse URL resolution examples in tutorial 4. Thanks for the patch, simeonf.

........

r5650 | adrian | 2007-07-12 06:43:29 +0200 (Do, 12 Jul 2007) | 1 line


Added 'New in Django development version' note to docs/syndication_feeds.txt changes from [5643]

........

r5651 | adrian | 2007-07-12 06:44:45 +0200 (Do, 12 Jul 2007) | 1 line


Edited changes to docs/tutorial04.txt from [5649]

........

r5652 | adrian | 2007-07-12 07:23:47 +0200 (Do, 12 Jul 2007) | 1 line


Added helpful error message to SiteManager.get_current() if the user hasn't set SITE_ID

........

r5653 | adrian | 2007-07-12 07:28:04 +0200 (Do, 12 Jul 2007) | 1 line


Added RequestSite class to sites framework

........

r5654 | adrian | 2007-07-12 07:29:32 +0200 (Do, 12 Jul 2007) | 1 line


Improved syndication feed framework to use RequestSite if the sites framework is not installed -- i.e., the sites framework is no longer required to use the syndication feed framework. This is backwards incompatible if anybody has subclassed Feed and overridden __init__(), because the second parameter is now expected to be an HttpRequest object instead of request.path

........

r5658 | russellm | 2007-07-12 09:45:35 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4459 -- Added 'raw' argument to save method, to override any pre-save processing, and modified serializers to use a raw-save. This enables serialization of DateFields with auto_now/auto_now_add. Also modified serializers to invoke save() directly on the model baseclass, to avoid any (potentially order-dependent, data modifying) behavior in a custom save() method.

........

r5659 | russellm | 2007-07-12 13:24:16 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #3770 -- Remove null=True tag from OneToOne serialization test. OneToOne fields can't have a value of null.

........

r5660 | russellm | 2007-07-12 13:27:38 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #3768 -- Disabled NullBooleanField PK serialization test. We can't and don't test null PK values.

........

r5662 | russellm | 2007-07-12 14:33:24 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4837 -- Updated Debian packaging details. Thanks for the suggestion, Yasushi Masuda <whosaysni@…>.

........

r5663 | russellm | 2007-07-12 14:44:05 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4808 -- Added Chilean regions in localflavor. Thanks, Marijn Vriens <marijn@…>.

........

r5664 | russellm | 2007-07-12 14:48:27 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4745 -- Updated docs to point out that 0 is not a valid SITE_ID when running the tests. Thanks for the suggestion, Lars Stavholm <stava@…>.

........

r5665 | russellm | 2007-07-12 14:50:02 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4763 -- Minor typo in cache documentations. Thanks, dan@….

........

r5666 | russellm | 2007-07-12 14:55:28 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4627 -- Added details on MacPorts packaging of Django. Thanks, Paul Bissex.

........

r5667 | russellm | 2007-07-12 15:23:11 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4640 -- Fixed import to stringfilter in docs. Proposed solution to move stringfilter into django.template.__init__ introduces a circular import problem.

........

r5668 | russellm | 2007-07-12 15:32:00 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4722 -- Clarified discussion about PYTHONPATH in modpython docs. Thanks for the suggestion, Collin Grady <cgrady@…>.

........

r5669 | russellm | 2007-07-12 15:37:59 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4755 -- Modified newforms MultipleChoiceField to use list comprehension, rather than iteration.

........

r5670 | russellm | 2007-07-12 15:41:27 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4764 -- Added reference to Locale middleware in middleware docs. Thanks, dan@….

........

r5671 | russellm | 2007-07-12 15:55:19 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4768 -- Converted timesince and dateformat to use explicit floor division (pre-emptive avoidance of Python 3000 compatibility problem), and removed a redundant millisecond check. Thanks, John Shaffer <jshaffer2112@…>.

........

r5672 | russellm | 2007-07-12 16:00:13 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4775 -- Added some missing Hungarian accents to the urlify.js LATIN_MAP. Thanks, Pistahh <szekeres@…>.

........

r5673 | russellm | 2007-07-12 16:05:16 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4502 -- Clarified reference to view in tutorial. Thanks for the suggestion, Carl Karsten <carl@…>.

........

r5674 | russellm | 2007-07-12 16:11:41 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4522 -- Clarified the allowed filter arguments on the time and date filters. Thanks for the suggestion, admackin@….

........

r5675 | russellm | 2007-07-12 16:21:51 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4525 -- Fixed mistaken documentation on arguments to runfcgi. Thanks, Johan Bergstrom <bugs@…>.

........

r5676 | russellm | 2007-07-12 16:41:32 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4538 -- Split the installation instructions to differentiate between installing a distribution package and installing an official release. Thanks to Carl Karsten for the idea, and Paul Bissex for the patch.

........

r5677 | russellm | 2007-07-12 17:26:37 +0200 (Do, 12 Jul 2007) | 2 lines


Fixed #4526 -- Modified the test Client login method to fail when a user is inactive. Thanks, marcin@….

........

r5678 | russellm | 2007-07-13 07:03:33 +0200 (Fr, 13 Jul 2007) | 2 lines


Fixed #3505 -- Added handling for the error raised when the user forgets the comma in a single element tuple when defining AUTHENTICATION_BACKENDS. Thanks for the help identifying this problem, Mario Gonzalez <gonzalemario@…>.

........

r5679 | mtredinnick | 2007-07-13 10:52:07 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #2591 -- Fixed a problem with inspectdb with psycopg2 (only). Patch from
Gary Wilson.

........

r5680 | mtredinnick | 2007-07-13 11:09:59 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4807 -- Fixed a couple of corner cases in decimal form input validation.
Based on a suggestion from Chriss Moffit.

........

r5681 | mtredinnick | 2007-07-13 11:14:51 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4839 -- Added __repr__ methods to URL classes that show the pattern they
contain. Thanks, Thomas Güttler.

........

r5682 | mtredinnick | 2007-07-13 12:56:30 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4842 -- Added slightly more robust error reporting. Thanks, Thomas
Güttler.

........

r5683 | mtredinnick | 2007-07-13 13:05:01 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4846 -- Fixed some Python 2.3 encoding problems in the admin interface.
Based on a patch from daybreaker12@….

........

r5684 | mtredinnick | 2007-07-13 14:03:20 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4861 -- Removed some duplicated logic from the newforms RegexField by
making it a subclass of CharField. Thanks, Collin Grady.

........

r5685 | mtredinnick | 2007-07-13 15:15:35 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4865 -- Replaced a stray generator comprehension with a list
comprehension so that we don't break Python 2.3.

........

r5686 | mtredinnick | 2007-07-13 16:13:35 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4469 -- Added slightly more informative error messages to max- and
min-length newform validation. Based on a patch from A. Murat Eren.

........

r5687 | mtredinnick | 2007-07-13 16:14:47 +0200 (Fr, 13 Jul 2007) | 2 lines


Added author credit for [5686]. Refs #4469.

........

r5688 | mtredinnick | 2007-07-13 16:33:46 +0200 (Fr, 13 Jul 2007) | 3 lines


Fixed #4484 -- Fixed APPEND_SLASH handling to handle an empty path value.
Thanks, VesselinK.

........

r5689 | mtredinnick | 2007-07-13 16:40:39 +0200 (Fr, 13 Jul 2007) | 2 lines


Fixed #4556 -- Stylistic changes to [5500]. Thanks, glin@….

........

r5690 | gwilson | 2007-07-13 22:36:01 +0200 (Fr, 13 Jul 2007) | 2 lines


Refs #2591 -- Removed int conversion and try/except since the value in the single-item list is already an int. I overlooked this in my original patch, which was applied in [5679].

........

r5691 | adrian | 2007-07-13 23:20:07 +0200 (Fr, 13 Jul 2007) | 1 line


Documented the 'commit' argument to save() methods on forms created via form_for_model() or form_for_instance()

........

r5692 | mtredinnick | 2007-07-14 07:27:22 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4869 -- Added a note that syncdb does not alter existing tables. Thanks,
James Bennett.

........

r5693 | mtredinnick | 2007-07-14 14:48:24 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4863 -- Removed comment references to a no-longer present link. Pointed
out by Thomas Güttler.

........

r5694 | mtredinnick | 2007-07-14 15:14:28 +0200 (Sa, 14 Jul 2007) | 2 lines


Fixed #4862 -- Fixed invalid Javascript creation in popup windows in admin.

........

r5695 | mtredinnick | 2007-07-14 15:39:41 +0200 (Sa, 14 Jul 2007) | 2 lines


Fixed a problem with translatable strings from [5686].

........

r5696 | mtredinnick | 2007-07-14 16:47:14 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4731 -- Changed management.setup_environ() so that it no longer assumes
the settings module is called "settings". Patch from SmileyChris.

........

r5697 | mtredinnick | 2007-07-14 16:50:35 +0200 (Sa, 14 Jul 2007) | 3 lines


Fixed #4870 -- Removed unneeded import and fixed a docstring in an example.
Thanks, Collin Grady.

........

r5698 | adrian | 2007-07-14 18:58:54 +0200 (Sa, 14 Jul 2007) | 1 line


Edited docs/db-api.txt changes from [5658]

........

r5699 | adrian | 2007-07-14 19:04:30 +0200 (Sa, 14 Jul 2007) | 1 line


Negligible capitalization fix in test/client.py docstring

........

r5700 | russellm | 2007-07-15 06:41:59 +0200 (So, 15 Jul 2007) | 2 lines


Clarified the documentation on the steps that happen during a save, and how raw save affects those steps.

........
