Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#18023 closed Cleanup/optimization (fixed)

Stop bundling simplejson and deprecate the module

Reported by: Alex Ogier Owned by: Aymeric Augustin
Component: Core (Serialization) Version: dev
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consensus on the django-dev mailing list seemed to be to stop bundling simplejson, replace it with just a simplejson -> json fallback shim, and issue a PendingDeprecationWarning as of 1.5.

I had to leave the implementation of simplejson in django/utils/simplejson/__init__.py because though it makes much more sense to move to django/utils/simplejson.py like most other utils modules, doing so causes circular self-imports when we try to import simplejson from the system.

Attachments (4)

remove_simplejson.diff (48.3 KB ) - added by Alex Ogier 13 years ago.
remove_simplejson_absolute.diff (48.7 KB ) - added by Alex Ogier 13 years ago.
Alternate patch, uses from __future__ import absolute_import so that we can have a simplejson.py
use_json_over_simplejson.diff (16.2 KB ) - added by Alex Ogier 13 years ago.
simplejson_docs.diff (6.1 KB ) - added by Alex Ogier 13 years ago.

Download all attachments as: .zip

Change History (17)

by Alex Ogier, 13 years ago

Attachment: remove_simplejson.diff added

comment:1 by Claude Paroz, 13 years ago

Triage Stage: UnreviewedAccepted

If it doesn't eat too much time, could you also upload the alternate patch which uses simplejson when it is installed?

comment:2 by Alex Ogier, 13 years ago

Sorry I wasn't clear. This patch uses simplejson when it is installed. The only thing it does is stop django from redistributing simplejson, because when we drop 2.5 support in django 1.5 we can depend on there being a json module in the standard library.

comment:3 by Simon Charette, 13 years ago

@ogier about the circular import thing, can't you just use from __future__ import absolute_import?

comment:4 by Preston Holmes, 13 years ago

Needs documentation: set

A couple notes on documenting

Looks like docs will need updating in:

  • topics/class-based-views.txt
  • topics/serialization.txt

Should the warning text endorse the stdlib module, or just suggest you need to use a different json lib?

You will ultimately need to include the release note text in the patch, but I don't think that file has yet been created in SVN - so you can wait or take a stab at creating the initial file as part of this patch.

comment:5 by Alex Ogier, 13 years ago

I ran some performance tests to see if the conclusions in http://bugs.python.org/issue6013 really matter. If we are deprecating simplejson we should recognize the performance impact it will have. All modules have their C extensions compiled.

CPython 2.6.7:

$ python -m timeit -s "from json import loads, dumps" "loads(dumps(range(32)))" 
1000 loops, best of 3: 583 usec per loop
$ python -m timeit -s "from json import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"
1000 loops, best of 3: 375 usec per loop

$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(range(32)))"
10000 loops, best of 3: 28.8 usec per loop
$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"
10000 loops, best of 3: 34.6 usec per loop

CPython 2.7.2:

$ python -m timeit -s "from json import loads, dumps" "loads(dumps(range(32)))"                         
10000 loops, best of 3: 25.1 usec per loop
$ python -m timeit -s "from json import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"      
10000 loops, best of 3: 42.5 usec per loop

$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(range(32)))"
10000 loops, best of 3: 28.5 usec per loop
$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"
10000 loops, best of 3: 35.2 usec per loop

Here's some data from a real application with a deep model:

CPython 2.6.7 without simplejson:

>>> import timeit
>>> t = timeit.Timer('deserialize("json", serialize("json", team))', 'from django.core.serializers import serialize, deserialize; from djmac.registration.models import Team; team = Team.objects.all()[0:5]')
>>> print t.timeit(1000)
1.6712679863

CPython 2.6.7 with simplejson:

>>> import timeit
>>> t = timeit.Timer('deserialize("json", serialize("json", team))', 'from django.core.serializers import serialize, deserialize; from djmac.registration.models import Team; team = Team.objects.all()[0:5]')
>>> print t.timeit(1000)
1.54401898384

So, it seems in python 2.6, simplejson is faster by at least an order of magnitude. But on the whole it doesn't matter all that much: actually serializing models is dominated by time spent elsewhere (probably the python serialization part of the process) meaning its only around 8% slower and besides, the difference goes away in 2.7.

So, while I was initially hesitant to deprecate the shim because of performance concerns I think I've come around. If someone depends critically on the speedup that simplejson provides, they can always elect not to use Django's serializers. I will go ahead and remove all references to simplejson in Django core and update the docs.

by Alex Ogier, 13 years ago

Alternate patch, uses from __future__ import absolute_import so that we can have a simplejson.py

by Alex Ogier, 13 years ago

by Alex Ogier, 13 years ago

Attachment: simplejson_docs.diff added

comment:6 by Aymeric Augustin, 13 years ago

Owner: changed from nobody to Aymeric Augustin

comment:8 by Aymeric Augustin, 13 years ago

Resolution: fixed
Status: newclosed

comment:9 by Aymeric Augustin, 12 years ago

Has patch: unset
Needs documentation: unset
Resolution: fixed
Severity: NormalRelease blocker
Status: closedreopened

This thread on the mailing list and this comment on the commit indicate that this change may be backwards incompatible.

Right now I don't know what the correct resolution is. In all cases it should be documented here.

Last edited 12 years ago by Aymeric Augustin (previous) (diff)

in reply to:  9 comment:10 by Alex Ogier, 12 years ago

Replying to aaugustin:

This thread on the mailing list and this comment on the commit indicate that this change may be backwards incompatible.

Right now I don't know what the correct resolution is. In all cases it should be documented here.

Here's what I believe is an accurate summary of the situation:

First issue: str vs. unicode

The simplejson API is documented as returning unicode strings for all JSON keys and string values. http://simplejson.readthedocs.org/en/latest/index.html#simplejson.JSONDecoder

In fact, the optional C library (installed by default when installing simplejson) makes an API-changing optimization: if the value is entirely ASCII characters, a str instance is returned instead of a unicode instance. https://github.com/simplejson/simplejson/blob/master/simplejson/_speedups.c#L527

The Python standard library json package was forked from simplejson at version 1.9, and included this optimization. However, it was considered a bug and fixed in python 2.7. http://bugs.python.org/issue11982

The net result is that it is possible that some code that runs on manually installed versions of simplejson or python <= 2.6 depends on this behavior. I think that Python core's approach is correct: we should consider this a bug, because it is behavior that directly contradicts the documentation. It's worth pointing out that any code that depends on this behavior is already broken on python 2.7.

Second issue: namedtuple_as_object

This is a systemic problem with any shim that is supposed to be transparent. In this case, namedtuple_as_object is a keyword argument to simplejson.JSONEncoder that was added in simplejson 2.2. The latest version of simplejson that was merged into the python standard library is 2.0.9 and Django 1.4 shipped with simplejson 2.0.7 bundled. This means that nowhere in stock Django are there any references to this keyword. This wouldn't be a huge problem, except that Django subclasses django.utils.simplejson.JSONEncoder as django.core.serializers.json.DjangoJSONEncoder, and naturally doesn't know about this keyword argument.

This is a deeper problem than just passing along unrecognized keyword arguments to the base class in __init__() to make sure that DjangoJSONEncoder never breaks. If we want to deprecate the django.utils.simplejson shim, then DjangoJSONEncoder must become a subclass of json.JSONEncoder. This has the unfortunate effect that calls to simplejson.dumps({'hi':'there'}, cls=DjangoJSONEncoder) will all fail (this is the simplejson on PyPI I'm talking about). Basically DjangoJSONEncoder will be incompatible with simplejson, which is a backwards incompatibility.

The problem is that even leaving in the shim and subclassing as we do isn't a good solution. It means that DjangoJSONEncoder needs to not only be backwards compatible through all supported Python versions, but they also must be forwards compatible with the latest simplejson releases, which have already been shown to diverge in backwards incompatible ways. (Adding an argument to a method on a base class breaks contravariance.)

If that paragraph seemed confusing, think of it this way: if DjangoJSONEncoder is a json.JSONEncoder then passing it to simplejson.dumps() is dangerous. If DjangoJSONEncoder is a simplejson.JSONEncoder then passing it to json.dumps() is dangerous.

comment:11 by Luke Plant, 12 years ago

I agree - we should just consider simplejson as buggy on all counts, and document the incompatibilities in release notes.

comment:12 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

Resolution: fixed
Status: reopenedclosed

In [610746f6cb89c13b364c27fc41ecc9592cbe1605]:

Fixed #18023 -- Documented simplejson issues.

Thanks Luke Plant for reporting the issue and Alex Ogier for thoroughly
investigating it.

comment:13 by Stavros Korokithakis, 12 years ago

I would like to revisit this, as I (django-annoying) have been bitten by the bug above (the simplejson.dumps({'hi':'there'}, cls=DjangoJSONEncoder) one).

Right now, even just using django.utils.simplejson with DjangoJSONEncoder, as above, breaks. Changing the import statement in django.core.serializers to from django.utils import simplejson as json fixes this, as it imports from the same module as Django's provided JSON library, which is, at least, consistent.

I think this switch should be made, to stop modules using Django-provided serializers from breaking, at least.

EDIT: Actually, strike that. If you're deprecating the Django simplejson module, you wouldn't want to refer to it. I'll comment below with a better suggestion.

Last edited 12 years ago by Stavros Korokithakis (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top