Opened 14 years ago
Last modified 8 years ago
#14904 new Bug
TextField with unique (or in unique_together) constraint breaks for large inputs in Postgres
Reported by: | jorn | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.2 |
Severity: | Normal | Keywords: | postgresql, index, textfield |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
i have a model like this one:
class foo(Model): a = CharField(max_length=8) b = TextField() class Meta: unique_together = (("a", "b"),)
Now this leads to a sql create table statement like the following in Postgresql:
CREATE TABLE "myapp_foo" ( "id" serial NOT NULL PRIMARY KEY, "a" varchar(8) NOT NULL, "b" text NOT NULL, UNIQUE ("a", "b") );
The problem here is that b is a textfield which can get arbitrarily big (that's what they are there for), while values in a btree index can not:
If the input for b is above a certain size postgresql will complain:
django.db.utils.DatabaseError: index row size 3032 exceeds maximum 2712 for index "myapp_a_key" HINT: Values larger than 1/3 of a buffer page cannot be indexed. Consider a function index of an MD5 hash of the value, or use full text indexing.
The problem here is that the used unique constraint as of Postgres 8.4.5 uses a btree internally, which as stated above doesn't work with values above a certain size.
The solution for this problem would be: If a TextField has a / is part of a unique constraint or has an explicit index_db=True: _always_ index an md5 or better sha256 sum of TextField columns only (note that sha256 is in pgcrypto module only).
In this case:
UNIQUE ("a", md5("b"))
As mentioned this also applies to the universal index_db=True field argument.
Change History (4)
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:3 by , 13 years ago
Easy pickings: | unset |
---|---|
UI/UX: | unset |
Have the same problem with mysql,
_mysql_exceptions.OperationalError: (1071, 'Specified key was too long; max key length is 767 bytes')
With
name = models.CharField(null=True, blank=True, max_length=300) user = models.ForeignKey(User,null=True, blank=True) class Meta : unique_together = (('name', 'user'),)
comment:4 by , 8 years ago
Have also observed this with Django 1.8 and postgres 9.3
I'm linking references for fixes / workarounds found in a quick search:
- http://stackoverflow.com/questions/20725671/how-to-truncate-column-in-order-to-create-indexes
- https://www.postgresql.org/message-id/AANLkTikG_nHARKr9qeQXpS7Q6QXgJvuFUi6Wxpd0o7H7%40mail.gmail.com
- https://www.postgresql.org/docs/current/static/indexes-opclass.html
At the moment, it looks like any workaround involves writing some raw SQL. Digging through Django's code, indices are created by django.db.backends.base.schema.BaseDatabaseSchemaEditor._sql_create_index() That's pretty far outside the public API.
It would be helpful to have a complete simple sample case description especially a hint at how much text content should be assigned to the field for this condition triggering, and thus can be reproduced and/or tested against other postgresql versions.