Opened 12 years ago
Closed 11 years ago
#19980 closed Bug (fixed)
Signer broken when SECRET_KEY contains non-ASCII bytes
Reported by: | Vincent Driessen | Owned by: | MattBlack |
---|---|---|---|
Component: | Core (Other) | Version: | 1.5 |
Severity: | Normal | Keywords: | signer |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This happens on Django 1.5 only.
When the SECRET_KEY
setting contains non-ASCII bytes (which is very likely for randomly generated keys), the following code breaks
from django.conf import settings from django.core.signing import Signer settings.SECRET_KEY = b'\xe7' # binary key with non-ASCII chars s = Signer() print(s.sign('foo'))
Relevant stack trace:
Traceback (most recent call last): File "minimal_example.py", line 7, in <module> s.sign('foo') File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/core/signing.py", line 175, in sign return str('%s%s%s') % (value, self.sep, self.signature(value)) File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/core/signing.py", line 169, in signature signature = base64_hmac(self.salt + 'signer', value, self.key) File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/core/signing.py", line 75, in base64_hmac return b64_encode(salted_hmac(salt, value, key).digest()) File "/Users/nvie/.virtualenvs/foo/lib/python2.7/site-packages/django/utils/crypto.py", line 48, in salted_hmac key = hashlib.sha1((key_salt + secret).encode('utf-8')).digest() UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 0: ordinal not in range(128)
This is due to the fact that unicode and bytes are concatenated and, as a result, the bytes are implicitly decoded, which is an invalid operation, hence the UnicodeDecodeError. The error, thus, is perfectly valid.
I think the source of the bug is that, internally, the Signer
works with text-based keys, salts and values, where it should work with byte streams instead.
Change History (9)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
The random string that Django produces in startproject
never contains non-ascii characters. Now of course anyone is free to set it to any random content. However, do we gain anything in allowing SECRET_KEY to be a bytestring? In the spirit of Python 3, I would be more in favour of documenting that SECRET_KEY is a string (not a bytestring), and if you include in it any non-ASCII chars on Python 2, you should prefix it by u''
.
comment:3 by , 12 years ago
I would argue that, given that this is used in cryptographic contexts, the key should be bytes, not a string. At the lowest level, all crypto happens on bytes anyway, and with strings you always rely on an encoding. Even though we can assume utf-8 fairly reasonably, this still is an assumption, and basically only used to convert the unicode string to a byte string. Why then, let us not pass in the bytes explicitly, independent of an encoding?
Also, this patch would make it compatible with the Django<1.5 behaviour.
Even more so, this would make it similar to other web frameworks. For example, in Flask, the SECRET_KEY
value is recommended to be generated completely random, using os.urandom()
. Thus, a byte string.
If consensus is around using a unicode string nonetheless, this should be explicitly documented at the very least, and maybe even type-checked.
comment:4 by , 12 years ago
I agree it makes more sense for SECRET_KEY to be a bytestring, especially with the argument about os.random
but we also need compatibility with unicode, especially as we cross the Python 2/3 barrier and as projects migrate.
So I think we should document that it can be either a string or bytestring, and it will be converted using UTF8 if it is a string.
In the patch, I would like to see a test that shows we are generating the same signature as before for unicode strings - this will require a hard coded signature generated with the previous signing function.
comment:5 by , 12 years ago
In Django < 1.5
In Django < 1.5, unicode strings aren't expected as keys, nor as signature input. Keys or inputs must be byte strings, or happen to contain only ascii chars, in which case the implicit str(key)
works accidentally. Since the default django-admin.py startproject
command will never contain any non-ascii chars, most people won't even notice when transitioning to Django 1.5.
Proof:
>>> import django >>> from django.core.signing import Signer >>> django.VERSION (1, 4, 3, 'final', 0) >>> s = Signer(u'H\xebll\xf8, w\xf3rld!') >>> s.signature('foo') ... UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 33: ordinal not in range(128)
An ascii-chars-only unicode key works, though:
>>> s1 = Signer(u'hello world') # unicode key >>> s1.signature('foo') 'DvqCK1YYys43rI7wbM-yOakczOc' >>> s2 = Signer(b'hello world') # bytestring key >>> s2.signature('foo') 'DvqCK1YYys43rI7wbM-yOakczOc' >>> s1.signature('foo') == s2.signature('foo') True
Signing unicode strings doesn't work, though:
>>> s1.signature(u'foo') # ascii-only unicode strings work 'DvqCK1YYys43rI7wbM-yOakczOc' >>> s1.signature(u'H\xebll\xf8, w\xf3rld!') ... UnicodeEncodeError: 'ascii' codec can't encode character u'\xeb' in position 1: ordinal not in range(128)
To Django 1.5.1
Actually, this "bug" plays in our advantage now. Since we want the behaviour to change such that Signer
can work with both unicode _and_ bytestrings, and we want to just interpret any unicode key or input as its utf8-encoded bytestring counterpart, we aren't changing the historical behaviour. For all ascii-only unicode strings u
, the following holds: u.encode('ascii') == u.encode('utf8')
. Since those were the only valid inputs prior to Django 1.5, nothing changes for existing users. The only thing we're adding is that non-ascii unicode strings will work additionally, for people that prefer working with bytestrings as secret keys.
The only remaining issue remains the following: when using python3
, the output of the .sign()
method should be in native string format, i.e. bytes in python2
and unicode in python3
. The output of that function is a string consisting of two parts: (1) the original input and (2) the signature, separated by a colon. Since the original input to .sign()
can be a bytestring, this might pose a problem in python3
when .sign()
tries to return a unicode string anyway.
Example showcasing the trouble:
>>> import sys >>> sys.version >>> tuple(sys.version_info) (3, 3, 0, 'final', 0) >>> import django >>> django.VERSION (1, 5, 0, 'final', 0) >>> from django.core.signing import Signer >>> s = Signer('foo') >>> s.sign(b'\xe7') ... django.utils.encoding.DjangoUnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 0: unexpected end of data. You passed in b'\xe7' (<class 'bytes'>)
It might be a theoretical case, maybe, but I think it's important that all bytestrings and unicode strings can be validly fed to .sign()
—no exception. Maybe it's an idea to add a new method .sign_bytes()
, which always returns the strict bytes signature, and have .sign()
use that and decode its value back to unicode in the case of python3
. We can document that .sign()
will always return a unicode string in python3
, and thus may fail for some inputs. If people want to be able to sign _any_ input, they should use .sign_bytes()
, which works for all inputs, but will return bytes.
Another solution would be to have .sign()
return whatever type is fed to it as input:
>>> type(s.sign(b'foo')) == bytes >>> type(s.sign(u'foo')) == str # unicode
What do you think? I could take a stab at this if we agree on the potential solution.
comment:6 by , 12 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Marking this as accepted, in principle both the secret key and the salt_key should be bytes, not strings, so there should be no need to encode them.
comment:7 by , 12 years ago
Has patch: | set |
---|
pull request from the original reporter:
https://github.com/django/django/pull/878
comment:8 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:9 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I took a stab at a possible solution, which works for me. Can others step in to see if this would break other cases? (The test suite ran fine, though.)
https://github.com/nvie/django/compare/django:1.5...nvie:bugfix/signer-with-binary-key