#14820 closed (fixed)
Use `TextField` instead of `PositiveIntegerField` in docs and examples for generic relations.
Reported by: | Tai Lee | Owned by: | Gabriel Hurley |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Keywords: | generic relation genericforeignkey object_id type textfield sprintdec2010 | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The docs state that the object_id
field on a generic model that is linked back to with GenericForeignKey
fields must be the same type, usually PositiveIntegerField
. This is not entirely true since [12353] which coerces the primary key value to the same type as the object_id
field on the generic model.
This means that TextField
should be the preferred type for object_id
fields on generic models, because it will allow models with integer and non-integer (anything that can be coerced to a string) to link back to the generic model in question.
Django actually uses TextField
(I presume, for this reason) for its own internal generic models (LogEntry
and BaseCommentAbstractModel
), so it makes sense that we should tell users how to avoid this old limitation and set a best practice by example.
Attachments (2)
Change History (22)
comment:1 by , 14 years ago
Owner: | changed from | to
---|
comment:2 by , 14 years ago
Needs documentation: | set |
---|
comment:3 by , 14 years ago
I'm not convinced that the docs should be amended as suggested. While I agree that for maximum flexibility a TextField
will serve best, in the vast majority of cases a PositiveIntegerField
is appropriate.
At best I could see a note in the docs indicating that generic reusable apps should use a TextField
given that object_id
's may be arbitrary strings.
The argument here comes down to the meaning of the word "usually". The "usual" case is that people have not overridden the default integer id behavior and a PositiveIntegerField
is appropriate. Thus I am -1 on a change to the docs that says *all* cases should use a TextField
for object_id
.
comment:4 by , 14 years ago
Has patch: | set |
---|---|
Needs documentation: | unset |
I've uploaded a patch which uses TextField
in the example instead of PositiveIntegerField
and explains the limitation properly, which is to say that the object_id
field doesn't have to be the same type as the primary key fields on the related models, but that it must be possible to coerce the primary key values to the same type as the object_id
field with the field's get_db_prep_value
method.
If you agree that TextField
serves best for maximum flexibility, and the existing "majority of cases" function equally as well with TextField
, but additionally the second most common case (CharField
primary keys) also works, why wouldn't we want users to go with TextField
from the get go?
comment:6 by , 14 years ago
Owner: | removed |
---|
comment:7 by , 14 years ago
How about adding a section on GFKs in reusable apps at the bottom, instead? Also, while the statement about values coercible values is certainly accurate, it's not very helpful to the reader. Docs on get_db_prep_value
won't help much either, so some example would be good.
comment:8 by , 14 years ago
milestone: | → 1.3 |
---|---|
Patch needs improvement: | set |
Triage Stage: | Unreviewed → Accepted |
Being realistic, whatever ends up in the example there is going to be copied and pasted and become the official recommendation despite whatever is said around it.
I suppose one of my problems with recommending the use of a TextField
is that you're implicitly trading wider compatibility for a performance penalty (varying by DB): text
columns often use more memory, can be less performant when returned by DB queries, and can have indexing woes depending on the database configuration... the best practice performance-wise is to use a field of the appropriate type (CharField
, IntegerField
, or PositiveIntegerField
) if you know for a fact that the values will always be of that type.
What it all boils down to is that there isn't one right answer here.
If someone feels they have enough to say to write a whole section on best practices for using GFK's in reusable apps I'd love to see it.
If not, then I'd rather see this patch in reverse: keep the primary recommendation as "use a field of the appropriate type" and add a big solid note that authors of reusable apps may want to use a TextField
in order to support both integer and non-integer PK's. The point is definitely important, though; I think it could have an admonition
directive to call attention to it.
And in the end, I firmly believe there are more people reading this section of the docs just trying to make their own Django app work than there are folks trying to write reusable apps that are having bugs because their GFK didn't support a non-integer PK.
by , 14 years ago
Attachment: | 14820-textfield-generic-relations-r14785.diff added |
---|
comment:9 by , 14 years ago
Keywords: | sprintdec2010 added |
---|---|
Patch needs improvement: | unset |
I've updated the patch as per your feedback. I don't have the setup to compile the docs and view the resulting HTML version, so I'm not 100% certain that I've formatted it correctly with regards to the admonition
and versionchanged
directives.
I changed the example to use IntegerField
instead of PositiveIntegerField
because the default for models is actually IntegerField
, and PositiveIntegerField
just adds some extra validation to that which isn't really relevant here.
I also use CharField
in the the example scenario, as I believe it should not have the same database performance issues as TextField
while still offering compatibility with integer and character primary keys, but also still mention that TextField
provides maximum flexibility when you don't know which models will be related to yours, but with TextField
comes the performance considerations you mentioned.
comment:10 by , 14 years ago
I suggest also to add a note about adding an index to object_id. I just realized that slow queries in my app were related to the fact that this column was not indexed. I'm sure that most applications will benefit from this column being indexed.
comment:11 by , 14 years ago
This really is a bit of a sticky issue.
In the vast majority of cases, PositiveIntegerField
is the correct choice [1]
. In particular, it's going to be faster, support indexing, and generally use less memory.
TextField
is really the wrong choice most of the time. Most backends implement it using "blob" storage or similar, which uses significantly more memory for smaller items, doesn't support efficient indexing, and is almost always less performant than the other options.
CharField
is an interesting compromise. It is much closer to (Positive)IntegerField
in terms of performance, but again, this varies on a backend basis. More problematically, it requires a specified length, which is going to vary significantly depending on the ID scheme in use. If we pick a "safe" recommended value of say, 128 characters, we waste significant database space on some backends, especially if the actual addressing scheme in use is a monotonically increasing positive integer (and we still don't support users who, for whatever reason, want to use 129 character values).
I agree that encouraging third-party developers to build reusable applications which will work across a broad variety of databases is a very good goal. In particular, as Django becomes easier to use with non-traditional databases, this sort of RDBMS-centric thinking will crop up more often. However, in this case I think that leaving the example with a PositiveIntegerField
is probably the best option. It encourages new developers toward the choice which will be most performant (and compatible) in the current majority of cases. I also strongly support the addition of the pullout highlighting the issue, but cannot endorse generically recommending a Text
or CharField
in all situations.
I'd like to hear Alex Gaynor's thoughts on this issue.
[1]
because on some backends, IntegerField
actually is signed (not just checked in Django), and negative values are not a standard usage here.
comment:12 by , 14 years ago
Thanks for your feedback, PaulM.
As far as I can tell, using PositiveIntegerField
is the same as IntegerField
, except for some additional form field validation when the model is used in a ModelForm
class. I'm happy to leave the examples as using PositiveIntegerField
if it really makes a practical difference anywhere, but I'm not sure that it does.
There won't be any performance benefit, because it's still the same integer field at the database backend. Only when people use the model in a ModelForm
class AND a database backend that doesn't support negative integer primary keys (could you elaborate on this point?), they'll get some additional validation on that form field to prevent them typing in negative values. On the flip side, anyone using the model in a ModelForm
and a database backend that DOES support negative integer primary keys will get some additional validation that they might not want.
At the end of the day, I think the form field validation for object_id
fields is a non-issue. I think that in most cases, people will not assign values directly through model forms, and if they do they'll probably add their own validation to ensure that the value references an actual object in the target model (not just that it is positive). Especially if integer fields on some database backends do and some don't support negative integers, we should leave this validation out of it.
All I really care about though is adding a note that character and text fields can be used for greater compatibility, for authors of generic apps that won't know what types of objects people want to link their models through generic relations. If changing the examples from IntegerField
back to PositiveIntegerField
will get this change committed, I'm all for it :)
comment:13 by , 14 years ago
Setting aside my hate for GenericForeignKeys
for a moment. I'd say leave it as a PositiveIntegerField
but throw a note below it.
comment:14 by , 14 years ago
My bigger concern goes back to the fact that what goes into this section of the docs becomes a sort of "official recommendation". TextField
is off the table, and I don't mind CharField
, but Paul's concern about having to choose a length for a CharField
is a valid one, and I don't have an answer there. What's the right compromise between "generic" and performant here?
comment:15 by , 14 years ago
The compromise is to say, in no uncertain terms, "there's no such thing as a right answer here". Generic foreign keys are, by their very nature, hideously inefficient. It is absolutely application dependent what constitutes an 'appropriate' datatype, or an appropriate size if a CharField is warranted. I don't think we can do any better than to say "These examples use a PositiveIntegerField, which will be sufficient for storing any normal Django model with an integer primary key. However, you need to check your own requirements..."
by , 14 years ago
Attachment: | 14820-textfield-generic-relations-r15449.diff added |
---|
comment:16 by , 14 years ago
I've updated the patch to use PositiveIntegerField
in the examples. Any objections, changes or additions to the admonition underneath?
comment:17 by , 14 years ago
Sorry to chime in again, but I would still propose to add db_index=True in the TaggedItem example (object_id = models.PositiveIntegerField(db_index=True)). It seems to me good practice to add index on fields used to join tables, isn't it?
I'll update with a patch shortly.