#18023 closed Cleanup/optimization (fixed)
Stop bundling simplejson and deprecate the module
Reported by: | Alex Ogier | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Core (Serialization) | Version: | dev |
Severity: | Release blocker | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Consensus on the django-dev mailing list seemed to be to stop bundling simplejson, replace it with just a simplejson -> json fallback shim, and issue a PendingDeprecationWarning as of 1.5.
I had to leave the implementation of simplejson in django/utils/simplejson/__init__.py
because though it makes much more sense to move to django/utils/simplejson.py
like most other utils modules, doing so causes circular self-imports when we try to import simplejson
from the system.
Attachments (4)
Change History (17)
by , 13 years ago
Attachment: | remove_simplejson.diff added |
---|
comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 13 years ago
Sorry I wasn't clear. This patch uses simplejson when it is installed. The only thing it does is stop django from redistributing simplejson, because when we drop 2.5 support in django 1.5 we can depend on there being a json module in the standard library.
comment:3 by , 13 years ago
@ogier about the circular import thing, can't you just use from __future__ import absolute_import
?
comment:4 by , 13 years ago
Needs documentation: | set |
---|
A couple notes on documenting
Looks like docs will need updating in:
- topics/class-based-views.txt
- topics/serialization.txt
Should the warning text endorse the stdlib module, or just suggest you need to use a different json lib?
You will ultimately need to include the release note text in the patch, but I don't think that file has yet been created in SVN - so you can wait or take a stab at creating the initial file as part of this patch.
comment:5 by , 13 years ago
I ran some performance tests to see if the conclusions in http://bugs.python.org/issue6013 really matter. If we are deprecating simplejson we should recognize the performance impact it will have. All modules have their C extensions compiled.
CPython 2.6.7:
$ python -m timeit -s "from json import loads, dumps" "loads(dumps(range(32)))" 1000 loops, best of 3: 583 usec per loop $ python -m timeit -s "from json import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))" 1000 loops, best of 3: 375 usec per loop $ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(range(32)))" 10000 loops, best of 3: 28.8 usec per loop $ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))" 10000 loops, best of 3: 34.6 usec per loop
CPython 2.7.2:
$ python -m timeit -s "from json import loads, dumps" "loads(dumps(range(32)))" 10000 loops, best of 3: 25.1 usec per loop $ python -m timeit -s "from json import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))" 10000 loops, best of 3: 42.5 usec per loop $ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(range(32)))" 10000 loops, best of 3: 28.5 usec per loop $ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))" 10000 loops, best of 3: 35.2 usec per loop
Here's some data from a real application with a deep model:
CPython 2.6.7 without simplejson:
>>> import timeit >>> t = timeit.Timer('deserialize("json", serialize("json", team))', 'from django.core.serializers import serialize, deserialize; from djmac.registration.models import Team; team = Team.objects.all()[0:5]') >>> print t.timeit(1000) 1.6712679863
CPython 2.6.7 with simplejson:
>>> import timeit >>> t = timeit.Timer('deserialize("json", serialize("json", team))', 'from django.core.serializers import serialize, deserialize; from djmac.registration.models import Team; team = Team.objects.all()[0:5]') >>> print t.timeit(1000) 1.54401898384
So, it seems in python 2.6, simplejson is faster by at least an order of magnitude. But on the whole it doesn't matter all that much: actually serializing models is dominated by time spent elsewhere (probably the python serialization part of the process) meaning its only around 8% slower and besides, the difference goes away in 2.7.
So, while I was initially hesitant to deprecate the shim because of performance concerns I think I've come around. If someone depends critically on the speedup that simplejson provides, they can always elect not to use Django's serializers. I will go ahead and remove all references to simplejson in Django core and update the docs.
by , 13 years ago
Attachment: | remove_simplejson_absolute.diff added |
---|
Alternate patch, uses from __future__ import absolute_import
so that we can have a simplejson.py
by , 13 years ago
Attachment: | use_json_over_simplejson.diff added |
---|
by , 13 years ago
Attachment: | simplejson_docs.diff added |
---|
comment:6 by , 13 years ago
Owner: | changed from | to
---|
comment:8 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
follow-up: 10 comment:9 by , 13 years ago
Has patch: | unset |
---|---|
Needs documentation: | unset |
Resolution: | fixed |
Severity: | Normal → Release blocker |
Status: | closed → reopened |
This thread on the mailing list and this comment on the commit indicate that this change may be backwards incompatible.
Right now I don't know what the correct resolution is. In all cases it should be documented here.
comment:10 by , 13 years ago
Replying to aaugustin:
This thread on the mailing list and this comment on the commit indicate that this change may be backwards incompatible.
Right now I don't know what the correct resolution is. In all cases it should be documented here.
Here's what I believe is an accurate summary of the situation:
First issue: str vs. unicode
The simplejson API is documented as returning unicode strings for all JSON keys and string values. http://simplejson.readthedocs.org/en/latest/index.html#simplejson.JSONDecoder
In fact, the optional C library (installed by default when installing simplejson) makes an API-changing optimization: if the value is entirely ASCII characters, a str
instance is returned instead of a unicode
instance. https://github.com/simplejson/simplejson/blob/master/simplejson/_speedups.c#L527
The Python standard library json
package was forked from simplejson at version 1.9, and included this optimization. However, it was considered a bug and fixed in python 2.7. http://bugs.python.org/issue11982
The net result is that it is possible that some code that runs on manually installed versions of simplejson or python <= 2.6 depends on this behavior. I think that Python core's approach is correct: we should consider this a bug, because it is behavior that directly contradicts the documentation. It's worth pointing out that any code that depends on this behavior is already broken on python 2.7.
Second issue: namedtuple_as_object
This is a systemic problem with any shim that is supposed to be transparent. In this case, namedtuple_as_object
is a keyword argument to simplejson.JSONEncoder
that was added in simplejson 2.2. The latest version of simplejson that was merged into the python standard library is 2.0.9 and Django 1.4 shipped with simplejson 2.0.7 bundled. This means that nowhere in stock Django are there any references to this keyword. This wouldn't be a huge problem, except that Django subclasses django.utils.simplejson.JSONEncoder
as django.core.serializers.json.DjangoJSONEncoder
, and naturally doesn't know about this keyword argument.
This is a deeper problem than just passing along unrecognized keyword arguments to the base class in __init__()
to make sure that DjangoJSONEncoder
never breaks. If we want to deprecate the django.utils.simplejson
shim, then DjangoJSONEncoder
must become a subclass of json.JSONEncoder
. This has the unfortunate effect that calls to simplejson.dumps({'hi':'there'}, cls=DjangoJSONEncoder)
will all fail (this is the simplejson
on PyPI I'm talking about). Basically DjangoJSONEncoder
will be incompatible with simplejson, which is a backwards incompatibility.
The problem is that even leaving in the shim and subclassing as we do isn't a good solution. It means that DjangoJSONEncoder needs to not only be backwards compatible through all supported Python versions, but they also must be forwards compatible with the latest simplejson releases, which have already been shown to diverge in backwards incompatible ways. (Adding an argument to a method on a base class breaks contravariance.)
If that paragraph seemed confusing, think of it this way: if DjangoJSONEncoder
is a json.JSONEncoder
then passing it to simplejson.dumps()
is dangerous. If DjangoJSONEncoder
is a simplejson.JSONEncoder
then passing it to json.dumps()
is dangerous.
comment:11 by , 13 years ago
I agree - we should just consider simplejson as buggy on all counts, and document the incompatibilities in release notes.
comment:12 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:13 by , 12 years ago
I would like to revisit this, as I (django-annoying) have been bitten by the bug above (the simplejson.dumps({'hi':'there'}, cls=DjangoJSONEncoder)
one).
Right now, even just using django.utils.simplejson
with DjangoJSONEncoder
, as above, breaks. Changing the import statement in django.core.serializers
to from django.utils import simplejson as json
fixes this, as it imports from the same module as Django's provided JSON library, which is, at least, consistent.
I think this switch should be made, to stop modules using Django-provided serializers from breaking, at least.
If it doesn't eat too much time, could you also upload the alternate patch which uses simplejson when it is installed?