#26139 closed Bug (invalid)
UUIDField value must be declared "binary" to avoid MySQL warning
Reported by: | Guilhem Bichot | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | MySQL |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I'm a developer of the MySQL DBMS at Oracle.
A user of Django (Giovanni Go) reported this:
http://bugs.mysql.com/bug.php?id=79317
He writes
"We use UUID stored as binary(16) PK to identify users.
We're seeing an issue where MySQL is trying to validate binary as UTF8
starting with MySQL 5.6.27 when trying to insert (and presumingly
select) users. It will insert the row, but since Django treats MySQL
warnings as exception, Django will bail out.
Setup: Python 2.7, Django 1.6.6, MySQL-python 1.2.5, Ubuntu 14.04/OSX
Django will treat any MySQL warnings as exception if you have DEBUG
turned on."
Then he gives a simple Python, Django-less program to show the issue.
I don't have a Django installation, but the problem is apparently that:
- Django generates a UUID object per the user's request
- it inserts it into a MySQL table column, with a simple INSERT:
INSERT INTO table (column) VALUES('cryptic chars here');
those can be really cryptic as a UUID string is made of any bytes from
0x00 to 0xFF.
- MySQL expects the string to be valid utf8; it has expected this for
years, but only recently did it start warning when the string is not
valid utf8 (this is the relevant change:
https://github.com/mysql/mysql-server/commit/33a2e5abd ).
- In fact, the string is not valid utf8: certain sequences of bytes do
not match utf8 characters. The string is binary. Then it would make
sense for the client to declare it; which is done this way:
INSERT INTO table (column) VALUES(_binary 'cryptic chars here');
The _binary prefix says "what I'm giving you next is binary":
http://dev.mysql.com/doc/refman/5.7/en/charset-literal.html .
Another option would be to use the x syntax in the INSERT:
https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html .
INSERT INTO table (column) VALUES(x'hex codes of cryptic chars here');
Is there any chance to modify Django's code to address this issue?
In the report, please ignore any comments starting from "[25 Jan 2:14]
Change History (4)
comment:1 by , 9 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Keywords: | MySQL added |
Summary: | Django's UUID inserted into MySQL => Warning → UUIDField value must be declared "binary" to avoid MySQL warning |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
comment:2 by , 9 years ago
I thought that on non-PostgreSQL databases, the value of UUIDs were always saved as strings (.hex
representation). Is it not the case?
follow-up: 4 comment:3 by , 9 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Yes, Claude is correct. Django's UUIDField
uses char(32)
-- the binary(16)
representation is a third-party implementation. The warnings I'm seeing for Django's test suite are for BinaryField
(created #26140).
comment:4 by , 7 years ago
Replying to Tim Graham:
Yes, Claude is correct. Django's
UUIDField
useschar(32)
-- thebinary(16)
representation is a third-party implementation. The warnings I'm seeing for Django's test suite are forBinaryField
(created #26140).
I've been researching this; is there a reason we _don't_ use BINARY(16)
for MySQL? It should be massively more efficient in storage and index space. I think if I subclass the existing UUIDField
, this should be pretty close to working (I think I need to override __init__
to set the max_length
to 16:
class MySQLUUIDField(UUIDField): def get_internal_type(self): return "BinaryField" def get_db_prep_value(self, value, connection, prepared=False): if value is None: return None if not isinstance(value, uuid.UUID): value = self.to_python(value) if connection.features.has_native_uuid_field: return value return value.bytes def to_python(self, value): if value is not None and not isinstance(value, uuid.UUID): try: return uuid.UUID(bytes=value) except (AttributeError, ValueError): raise exceptions.ValidationError( self.error_messages['invalid'], code='invalid', params={'value': value}, ) return value
I'm taking some liberties and not accounting for every edge case, but I think that's enough to work with the existing MySQL backend as written, though ideally I'd like the backend to presume BINARY(16)
and the field definition could act accordingly. Looking at Oracle and sqlite, those DBs use a BLOB
type for binary data, so I'm not sure if there is any benefit or not using that type for a UUID.
While the report mentions Django 1.6, I've seen these warnings when running the tests for the Django master branch (1.10.dev at this time).