Opened 10 years ago
Last modified 4 years ago
#24082 new Bug
Unique=True on TextField or CharField should not create an index
Reported by: | djbug | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.7 |
Severity: | Normal | Keywords: | db-indexes |
Cc: | Shai Berger, Simon Charette, emorley@…, Mariusz Felisiak, Phil Krylov, Semyon Pupkov, Can Sarıgöl, Peter Thomassen | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I've experienced this with PostgreSQL but I suspect it could be true for other databases too.
PostgreSQL docs say:
there's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.
Further the docs say:
The index covers the columns that make up the [...] unique constraint [...] and is the mechanism that enforces the constraint.
However this model in Django with unique=True
on a TextField
creates an index on top of the unique constraint.
class Book(models.Model): name = models.TextField(unique=True)
creates following table & constraints in PostgreSQL:
CREATE TABLE book ( id integer NOT NULL, name text NOT NULL, ); ALTER TABLE ONLY book ADD CONSTRAINT book_name_key UNIQUE (name); CREATE INDEX book_name_like ON book USING btree (name text_pattern_ops);
Please correct me if I'm wrong. My conclusion is that databases enforce unique constraint by way of an index. Adding another index is a waste. There's some mention of this fact in an old bug report (comment 3 & comment 6 ) but it looks like the issue got dropped.
I've also verified this with the following create table statement in PostgreSQL (no explicit index). A SELECT
on name
uses an index scan instead of a sequential scan (which means there's an implicit index). So in this case, Django doesn't need to add a CREATE INDEX
statement.
CREATE TABLE book ( id serial primary key, name text UNIQUE );
However, if the justification to add a second index is to use text_pattern_ops
( Bug Report 12234 ) then it might be more efficient to interpret a unique=True
in the above table as
CREATE TABLE book ( id serial primary key, name text ); CREATE UNIQUE INDEX book_name_like ON book USING btree (name text_pattern_ops);
i.e. no UNIQUE
constraint in the table, only a UNIQUE INDEX
.
Change History (19)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|
comment:2 by , 10 years ago
Description: | modified (diff) |
---|
comment:3 by , 10 years ago
Description: | modified (diff) |
---|
follow-up: 5 comment:4 by , 10 years ago
follow-up: 8 comment:5 by , 10 years ago
the issue here really is whether
unique=True
impliesdb_index=True
. This holds for most field types, so I think most people would expect it to hold for text fields as well.
Could you explain why should unique=True
imply creation of an index explicitly ? AFAIK, this would duplicate the index in PostgreSQL (& a few other databases according to the old bug reports mentioned above). If we remove this explicit index, there's no drop in performance because internally the db maintains an index for a unique constraint.
You should get the behavior you expect, probably, with
unique=True, db_index=False
. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in theTextField
documentation.
I've tried your suggestion. For a TextFIeld
, the following two generate the same constraint & index:
TextField( unique=True, db_index=False)
and
TextField( unique=True)
follow-up: 7 comment:6 by , 10 years ago
In my opinion it would be better to not have unique=True
imply db_index=True
. How the database enforces the constraint seems more like an implementation detail than something Django should guarantee.
However, this is the currently documented behavior ("Note that when unique
is True
you don’t need to specify db_index
, because unique
implies the creation of an index"), so there would be a backwards-compatibility problem in changing it.
The fact that you can't prevent the creation of the extra index with an explicit db_index=False
does feel like a bug, though, and something that could be fixed without a deprecation cycle.
As for the suggestion to use a single text_pattern_ops
unique index to serve double duty, the PostgreSQL documentation makes it clear that that will not suffice to support all of Django's lookup types: "Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes."
follow-up: 9 comment:7 by , 10 years ago
Replying to marfire:
However, this is the currently documented behavior ("Note that when
unique
isTrue
you don’t need to specifydb_index
, becauseunique
implies the creation of an index"), so there would be a backwards-compatibility problem in changing it.
The fact that you can't prevent the creation of the extra index with an explicit
db_index=False
does feel like a bug, though, and something that could be fixed without a deprecation cycle.
Agreed about backward compatibility & having db_index=False
as a fix.
As for the suggestion to use a single
text_pattern_ops
unique index to serve double duty, the PostgreSQL documentation makes it clear that that will not suffice to support all of Django's lookup types: "Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes."
A single text_pattern_ops
index works for both LIKE
& =
queries. But it won't work for other operators like < <= etc.
. But, I feel that, it would be an overkill to create 2 indexes by default. It should be left to the user to decide if they want double index or a single one with whichever operator class they want.
comment:8 by , 10 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
First of all, +1 everything marfire said.
Replying to djbug:
Imposing uniqueness on text columns is almost always nonsense.
For PostgreSQL,
CharField
maps tovarchar(n)
andTextField
maps totext
. According to the official docs, there's no performance difference between the two. Most DBA recommendtext
overvarchar(n)
(Ref 1) (Ref 2).
Well, I rephrase: s/text column/TextField
/. TextField
s are for big blobs of text, not identifiers.
That said, the above discussion about unique & index is true for
CharField
too as far as Django is concerned.
the issue here really is whether
unique=True
impliesdb_index=True
. This holds for most field types, so I think most people would expect it to hold for text fields as well.
Could you explain why should
unique=True
imply creation of an index explicitly ?
It generally does not. An index is created explicitly only to support LIKE
operations (as you noted, #12234).
You should get the behavior you expect, probably, with
unique=True, db_index=False
. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in theTextField
documentation.
I've tried your suggestion. For a
TextFIeld
, the following two generate the same constraint & index:
TextField( unique=True, db_index=False)and
TextField( unique=True)
I'm accepting based on this (and documentation should be added as well).
comment:9 by , 10 years ago
Replying to djbug:
A single
text_pattern_ops
index works for bothLIKE
&=
queries. But it won't work for other operators like< <= etc.
. But, I feel that, it would be an overkill to create 2 indexes by default. It should be left to the user to decide if they want double index or a single one with whichever operator class they want.
I disagree; it is much easier to notice a superfluous index, or an explicitly-removed one (once db_index=False
works), than one that is partly-missing (will work for anything except LIKE). If you're at the point where a superfluous index is hurting your performance, you're likely to be paying much closer attention to your database anyway; a missing index will hurt you much earlier.
comment:10 by , 10 years ago
In reply to shaib:
TextFields are for big blobs of text, not identifiers.
I feel that's not the case with PostgreSQL. Whatever I've read so far pushes me to use TextField
for all types of text.
I'll add one more thing to the above discussion:
Based on my tests, the text_pattern_ops
does not use index scan with < <= > >=
operators. So the point about requiring 2 index is questionable. It's possible that I'm doing something wrong and would love to have someone verify this before coming to a conclusion.
According to PG docs, the following operators are supported by text_pattern_ops
:
"btree";"text_pattern_ops";"=(text,text)" "btree";"text_pattern_ops";"~<~(text,text)" "btree";"text_pattern_ops";"~<=~(text,text)" "btree";"text_pattern_ops";"~>=~(text,text)" "btree";"text_pattern_ops";"~>~(text,text)"
comment:11 by , 9 years ago
Cc: | added |
---|
comment:12 by , 9 years ago
Keywords: | db-indexes added |
---|
comment:13 by , 8 years ago
Cc: | added |
---|
comment:14 by , 8 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:15 by , 7 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:16 by , 7 years ago
Cc: | added |
---|
comment:17 by , 6 years ago
Cc: | added |
---|
comment:18 by , 6 years ago
Cc: | added |
---|
comment:19 by , 4 years ago
Cc: | added |
---|
Imposing uniqueness on text columns is almost always nonsense (and doesn't work at all on Oracle). If it weren't for backwards compatibility, I'd suggest we flag it as an error.
That said, the issue here really is whether
unique=True
impliesdb_index=True
. This holds for most field types, so I think most people would expect it to hold for text fields as well.You should get the behavior you expect, probably, with
unique=True, db_index=False
. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in theTextField
documentation.