Opened 10 years ago

Last modified 4 years ago

#24082 new Bug

Unique=True on TextField or CharField should not create an index

Reported by: djbug Owned by:
Component: Database layer (models, ORM) Version: 1.7
Severity: Normal Keywords: db-indexes
Cc: Shai Berger, Simon Charette, emorley@…, Mariusz Felisiak, Phil Krylov, Semyon Pupkov, Can Sarıgöl, Peter Thomassen Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by djbug)

I've experienced this with PostgreSQL but I suspect it could be true for other databases too.

PostgreSQL docs say:

there's no need to manually create indexes on unique columns; doing so would just duplicate the automatically-created index.

Further the docs say:

The index covers the columns that make up the [...] unique constraint [...] and is the mechanism that enforces the constraint.

However this model in Django with unique=True on a TextField creates an index on top of the unique constraint.

class Book(models.Model):
    name = models.TextField(unique=True)

creates following table & constraints in PostgreSQL:

CREATE TABLE book (
    id integer NOT NULL,
    name text NOT NULL,
);

ALTER TABLE ONLY book ADD CONSTRAINT book_name_key UNIQUE (name);
CREATE INDEX book_name_like ON book USING btree (name text_pattern_ops);

Please correct me if I'm wrong. My conclusion is that databases enforce unique constraint by way of an index. Adding another index is a waste. There's some mention of this fact in an old bug report (comment 3 & comment 6 ) but it looks like the issue got dropped.

I've also verified this with the following create table statement in PostgreSQL (no explicit index). A SELECT on name uses an index scan instead of a sequential scan (which means there's an implicit index). So in this case, Django doesn't need to add a CREATE INDEX statement.

CREATE TABLE book (
    id serial primary key,
    name text UNIQUE
);

However, if the justification to add a second index is to use text_pattern_ops ( Bug Report 12234 ) then it might be more efficient to interpret a unique=True in the above table as

CREATE TABLE book (
    id serial primary key,
    name text
);

CREATE UNIQUE INDEX book_name_like ON book USING btree (name text_pattern_ops);

i.e. no UNIQUE constraint in the table, only a UNIQUE INDEX.

Change History (19)

comment:1 by djbug, 10 years ago

Description: modified (diff)

comment:2 by djbug, 10 years ago

Description: modified (diff)

comment:3 by djbug, 10 years ago

Description: modified (diff)

comment:4 by Shai Berger, 10 years ago

Imposing uniqueness on text columns is almost always nonsense (and doesn't work at all on Oracle). If it weren't for backwards compatibility, I'd suggest we flag it as an error.

That said, the issue here really is whether unique=True implies db_index=True. This holds for most field types, so I think most people would expect it to hold for text fields as well.

You should get the behavior you expect, probably, with unique=True, db_index=False. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in the TextField documentation.

in reply to:  4 ; comment:5 by djbug, 10 years ago

Imposing uniqueness on text columns is almost always nonsense.

For PostgreSQL, CharField maps to varchar(n) and TextField maps to text. According to the official docs, there's no performance difference between the two. Most DBA recommend text over varchar(n) (Ref 1) (Ref 2). That said, the above discussion about unique & index is true for CharField too as far as Django is concerned.

the issue here really is whether unique=True implies db_index=True. This holds for most field types, so I think most people would expect it to hold for text fields as well.

Could you explain why should unique=True imply creation of an index explicitly ? AFAIK, this would duplicate the index in PostgreSQL (& a few other databases according to the old bug reports mentioned above). If we remove this explicit index, there's no drop in performance because internally the db maintains an index for a unique constraint.

You should get the behavior you expect, probably, with unique=True, db_index=False. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in the TextField documentation.

I've tried your suggestion. For a TextFIeld, the following two generate the same constraint & index:

TextField( unique=True, db_index=False)

and

TextField( unique=True)


Last edited 10 years ago by djbug (previous) (diff)

comment:6 by Kevin Christopher Henry, 10 years ago

In my opinion it would be better to not have unique=True imply db_index=True. How the database enforces the constraint seems more like an implementation detail than something Django should guarantee.

However, this is the currently documented behavior ("Note that when unique is True you don’t need to specify db_index, because unique implies the creation of an index"), so there would be a backwards-compatibility problem in changing it.

The fact that you can't prevent the creation of the extra index with an explicit db_index=False does feel like a bug, though, and something that could be fixed without a deprecation cycle.

As for the suggestion to use a single text_pattern_ops unique index to serve double duty, the PostgreSQL documentation makes it clear that that will not suffice to support all of Django's lookup types: "Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes."

in reply to:  6 ; comment:7 by djbug, 10 years ago

Replying to marfire:

However, this is the currently documented behavior ("Note that when unique is True you don’t need to specify db_index, because unique implies the creation of an index"), so there would be a backwards-compatibility problem in changing it.

The fact that you can't prevent the creation of the extra index with an explicit db_index=False does feel like a bug, though, and something that could be fixed without a deprecation cycle.

Agreed about backward compatibility & having db_index=False as a fix.

As for the suggestion to use a single text_pattern_ops unique index to serve double duty, the PostgreSQL documentation makes it clear that that will not suffice to support all of Django's lookup types: "Note that you should also create an index with the default operator class if you want queries involving ordinary <, <=, >, or >= comparisons to use an index. Such queries cannot use the xxx_pattern_ops operator classes."

A single text_pattern_ops index works for both LIKE & = queries. But it won't work for other operators like < <= etc.. But, I feel that, it would be an overkill to create 2 indexes by default. It should be left to the user to decide if they want double index or a single one with whichever operator class they want.

in reply to:  5 comment:8 by Shai Berger, 10 years ago

Cc: Shai Berger added
Triage Stage: UnreviewedAccepted

First of all, +1 everything marfire said.

Replying to djbug:

Imposing uniqueness on text columns is almost always nonsense.

For PostgreSQL, CharField maps to varchar(n) and TextField maps to text. According to the official docs, there's no performance difference between the two. Most DBA recommend text over varchar(n) (Ref 1) (Ref 2).

Well, I rephrase: s/text column/TextField/. TextFields are for big blobs of text, not identifiers.

That said, the above discussion about unique & index is true for CharField too as far as Django is concerned.

the issue here really is whether unique=True implies db_index=True. This holds for most field types, so I think most people would expect it to hold for text fields as well.

Could you explain why should unique=True imply creation of an index explicitly ?

It generally does not. An index is created explicitly only to support LIKE operations (as you noted, #12234).


You should get the behavior you expect, probably, with unique=True, db_index=False. If this indeed gives you what you want, I'd resolve this ticket by adding an admonition about this in the TextField documentation.

I've tried your suggestion. For a TextFIeld, the following two generate the same constraint & index:

TextField( unique=True, db_index=False)

and

TextField( unique=True)


I'm accepting based on this (and documentation should be added as well).

in reply to:  7 comment:9 by Shai Berger, 10 years ago

Replying to djbug:

A single text_pattern_ops index works for both LIKE & = queries. But it won't work for other operators like < <= etc.. But, I feel that, it would be an overkill to create 2 indexes by default. It should be left to the user to decide if they want double index or a single one with whichever operator class they want.

I disagree; it is much easier to notice a superfluous index, or an explicitly-removed one (once db_index=False works), than one that is partly-missing (will work for anything except LIKE). If you're at the point where a superfluous index is hurting your performance, you're likely to be paying much closer attention to your database anyway; a missing index will hurt you much earlier.

comment:10 by djbug, 10 years ago

In reply to shaib:

TextFields are for big blobs of text, not identifiers.

I feel that's not the case with PostgreSQL. Whatever I've read so far pushes me to use TextField for all types of text.

I'll add one more thing to the above discussion:

Based on my tests, the text_pattern_ops does not use index scan with < <= > >= operators. So the point about requiring 2 index is questionable. It's possible that I'm doing something wrong and would love to have someone verify this before coming to a conclusion.

According to PG docs, the following operators are supported by text_pattern_ops:

"btree";"text_pattern_ops";"=(text,text)"
"btree";"text_pattern_ops";"~<~(text,text)"
"btree";"text_pattern_ops";"~<=~(text,text)"
"btree";"text_pattern_ops";"~>=~(text,text)"
"btree";"text_pattern_ops";"~>~(text,text)"

Last edited 10 years ago by djbug (previous) (diff)

comment:11 by Simon Charette, 9 years ago

Cc: Simon Charette added

comment:12 by Tim Graham, 9 years ago

Keywords: db-indexes added

comment:13 by Ed Morley, 8 years ago

Cc: emorley@… added

comment:14 by Mariusz Felisiak, 8 years ago

Cc: Mariusz Felisiak added
Owner: changed from nobody to Mariusz Felisiak
Status: newassigned

comment:15 by Mariusz Felisiak, 7 years ago

Owner: Mariusz Felisiak removed
Status: assignednew

comment:16 by Phil Krylov, 7 years ago

Cc: Phil Krylov added

comment:17 by Semyon Pupkov, 6 years ago

Cc: Semyon Pupkov added

comment:18 by Can Sarıgöl, 6 years ago

Cc: Can Sarıgöl added

Hi, I'm working on a PR which is related to this issue. I pushed a commit that contains a solution effort.

Version 0, edited 6 years ago by Can Sarıgöl (next)

comment:19 by Peter Thomassen, 4 years ago

Cc: Peter Thomassen added
Note: See TracTickets for help on using tickets.
Back to Top