Opened 4 years ago

Last modified 3 months ago

#32577 assigned New feature

Add support for `UUIDAutoField` `DEFAULT_AUTO_FIELD`

Reported by: Tomasz Wójcik Owned by: Tomasz Wójcik
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: uuid
Cc: Diego Lima, Brian Helba, raydeal, johnthagen, Michael Kuron, Alireza Safari, Petr Přikryl, Wilson E. Husin, Luiz Braga, Dmytro Litvinov, Olivier Dalang, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tomasz Wójcik)

Default auto field introduced in #31007, as seen in the docs, is BigAutoField.

class BigAutoField(AutoFieldMixin, BigIntegerField):
    def get_internal_type(self):
        return 'BigAutoField'

    def rel_db_type(self, connection):
        return BigIntegerField().db_type(connection=connection)

Many projects prefer UUID field over big int for pk. So far we had to use a mixin for that but I am under the impression DEFAULT_AUTO_FIELD exists to lift this burden.

I'd expect something like this to work (additional adjustments might be needed)

from django.db import models
from django.db.models.fields import AutoFieldMixin


class UUIDAutoField(AutoFieldMixin, models.UUIDField):
    def get_internal_type(self):
        return 'UUIDAutoField'

    def rel_db_type(self, connection):
        return models.UUIDField().db_type(connection=connection)

But if I use this for DEFAULT_AUTO_FIELD I get the error

ValueError: Primary key 'common.fields.UUIDAutoField' referred by DEFAULT_AUTO_FIELD must subclass AutoField.

I noticed two things I want to share.

First, the default auto field BigAutoField is not subclassing AutoField so it's not consistent with the error message.

Second, AutoField is subclassing IntegerField. If I understand this correctly, users are limited to subclass IntegerField for DEFAULT_AUTO_FIELD. If so, there's no way to implement UUIDAutoField as DEFAULT_AUTO_FIELD.

I understand auto fields need to share a common interface but having to subclass IntegerField is a big constraint.

I'm happy to open a PR once I know what's the desired functionality. Possible solutions I see:

  • enforce subclassing of AutoFieldMixin instead AutoField
  • I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk. Maybe adding UUIDAutoField to django fields would make sense. That way it'd have to enforce subclass of AutoField or UUIDAutoField

Mentioned also in here, here and here.

Django==3.2rc1

Change History (35)

comment:1 by Tomasz Wójcik, 4 years ago

Summary: Don't enforce `DEFAULT_AUTO_FIELD` to subclass from `AutoField`Add support for `UUIDAutoField` `DEFAULT_AUTO_FIELD`

comment:2 by Tomasz Wójcik, 4 years ago

Description: modified (diff)

comment:3 by Carlton Gibson, 4 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedNew feature

OK, yes, part of the discussion was possible extension to other kinds if field, so we can accept this as a new feature.

(To be clear, I consider it out-of-scope for the original ticket, and so not a release blocker for 3.2)

comment:4 by Tomasz Wójcik, 4 years ago

I understand it's out of scope for 3.2 as you already have a RC though I think docs could mention it has to subclass AutoField.

The type of this implicit primary key can now be controlled via the DEFAULT_AUTO_FIELD setting and AppConfig.default_auto_field attribute. No more needing to override primary keys in all models.

This could be more specific as it doesn't apply to UUID pk projects.


I'm glad it's accepted as a feature request, thanks. I'm looking forward to hearing how you want this to be tackled. And as I said, I'm happy to open a PR as I want to use it with UUIDs in my next project.

comment:5 by Diego Lima, 4 years ago

Cc: Diego Lima added

comment:6 by Ian Foote, 4 years ago

I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk.

While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.

in reply to:  6 ; comment:7 by Tomasz Wójcik, 4 years ago

Replying to Ian Foote:

I don't think this field has to be very generic because DBs pk types are very limited. As far as I know, only ints and UUIDs make sense for pk.

While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.

Sure, my phrasing wasn't the best. What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField, such as string. So instead of refactoring the entire AutoField, another type - UUIDAutoField could be added.

in reply to:  7 ; comment:8 by Antonin Grêlé, 4 years ago

Replying to Tomasz Wójcik:

What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField, such as string. So instead of refactoring the entire AutoField, another type - UUIDAutoField could be added.

Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?

It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".

Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?

in reply to:  8 comment:9 by Tomasz Wójcik, 4 years ago

Replying to Antonin Grêlé:

Replying to Tomasz Wójcik:

What I meant is I think users only use numeric types and uuids for generic app-wide primary keys so I don't think it's necessary to allow other types for AutoField, such as string. So instead of refactoring the entire AutoField, another type - UUIDAutoField could be added.

Could I argue that IDs based on human-readable strings, like coolname, are a valid use case?

It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".

Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?

Popular use case for a "readable uuid" like that is a randomly generated username. It's usually not a pk, just has a unique constraint, because you often perform many operations against users table and you don't want this table to be slow on joins. For other cases, why does it need to be readable at all?

To my understanding there's no significant performance hit from using UUID as a primary key. It is 4x larger than int32 (128bit). So also index is bigger, cache will store less data etc.

To my knowledge, varchar is a completely different story. It's way bigger and more difficult to sort, because it's very difficult to compare.

comment:10 by Mariusz Felisiak, 4 years ago

Note: Fixing this ticket will also resolve #17337.

comment:11 by Nick Pope, 4 years ago

I've been involved in some of the changes and discussions related to AutoField improvements and DEFAULT_AUTO_FIELD and wanted to clarify a few things, as well as what needs to be done to achieve the creation of UUIDAutoField.


Some points about DEFAULT_AUTO_FIELD which was added in b5e12d490af3debca8c55ab3c1698189fdedbbdb:

  • The current implementation restricts to values that are a subclass of AutoField. (See here.)
  • It was named DEFAULT_AUTO_FIELD to indicate that it referred to the automatically generated id field.
    • This was instead of DEFAULT_AUTOFIELD which would imply that it could only ever be related to AutoField.
    • Another suggestion had been DEFAULT_MODEL_PK, but we weren't ready to allow other fields without AutoField-like semantics.
  • We could lift the restriction to allow other non-AutoField types, but the concept of an automatically generated id field needs to be decoupled from AutoField a bit.

Some history and thoughts about AutoField:

  • Much of the logic related to the automatically generated id field is hardwired to look for AutoField subclasses. This needs to be fixed.
  • AutoField has essentially always been like an IntegerField, but historically copy-pasted some of that rather than inherit from IntegerField directly.
  • BigAutoField inherited from AutoField, but overrode with parts of BigIntegerField, but it didn't originally inherit from BigIntegerField.
  • As a result, these *AutoFieldss didn't support the system checks or validators of their *IntegerField equivalents.
  • I addressed some of this in 21e559495b8255bba1e8a4429cd083246ab90457 for #29979:
    • Moved the logic about automatically generated fields into AutoFieldMixin.
    • Fixed the *AutoField classes to inherit from AutoFieldMixin and their appropriate *IntegerField type.
    • Added AutoFieldMeta to maintain backward compatibility for isinstance(..., AutoField) checks and provide a hook point for deprecation of it in the future.

One more thing to mention about an automatically generated id field is that the value is generated by the database and returned - it must not be generated in Python.


What needs to be done to get to UUIDAutoField? Well, here is what I think the path is...

  • Add a db_generated/is_auto (or another better name) flag to AutoFieldMixin. (See my comment. I'm not convinced #24042 should have been closed.)
  • Replace isinstance(..., AutoField) checks with the new flag. (I seem to recall that isinstance(..., AutoFieldMixin) wasn't really favoured.)
    • This will also open up DEFAULT_AUTO_FIELD to a wider range of non-integer automatically generated field types.
  • Deprecate isinstance (and issubclass?) checks on AutoField within AutoFieldMeta. (The metaclass can be scrapped when the deprecation period ends.)
  • Implement support for .db_default so that we can specify a function for generating a UUID on the database rather than in Python, e.g. RandomUUID.
    • Work is ongoing in this PR to address #470 which will implement this.
    • It is also possible that this PR to address #30032 is an acceptable alternative path.
  • Make AutoFieldMixin part of the public API and document it so that others can use it for their own non-integer, non-UUID implementations.
  • Finally, implement UUIDAutoField itself. This is a start, there will be more pieces to the puzzle:
class UUIDAutoField(AutoFieldMixin, UUIDField):

    def __init__(self, *args, **kwargs):
        kwargs['db_default'] = RandomUUID()
        super().__init__(*args, **kwargs)

    def deconstruct(self):
        name, path, args, kwargs = super().deconstruct()
        del kwargs['db_default']
        return name, path, args, kwargs

    def get_internal_type(self):
        return 'UUIDAutoField'

    def rel_db_type(self, connection):
        return UUIDField().db_type(connection=connection)

It is likely that AutoFieldMixin.get_db_prep_value() would need attention because of connection.ops.validate_autopk_value().

Note that this field would also be limited to databases that have a function to generate a UUID.

As Mariusz said, implementing a non-integer auto field such as UUIDAutoField would close #17337.


Here are some responses to some of the other things mentioned in the comments above.

First, the default auto field BigAutoField is not subclassing AutoField so it's not consistent with the error message.

This is a little confusing as it is done a bit magically for now - see the bit about AutoFieldMeta above.
There was, however, a bug for subclasses of SmallAutoField and BigAutoField which was fixed by 45a58c31e64dbfdecab1178b1d00a3803a90ea2d.

While I'm not sure it would make sense in an AutoField, string primary keys are possible. For example, you could use a language code (en, fr, de, etc) as a primary key.

Yes, you can create a textual primary key, but you have to override id for every model, e.g.

id = models.CharField(max_length=64, primary_key=True)

It is unlikely that it'd make sense as an "AutoField" unless you can generate the value on the database-side.

Could I argue that IDs based on human-readable strings, like ​coolname, are a valid use case?

It's easier to reference '"soft-cuddly-shrew-of-expertise" than "a1e1b1c5-5d5e-4eef-9ee9-e74e87ce71d6" or "2467584".

Would this be seen as bad practice and we should use a slugfield that's not the primary key for something like this instead?

Again, unless this were generated by the database, you'd probably be better off doing something like:

def generate_coolname():
    ...  # Implementation here.

class MyModel(models.Model):
    id = models.CharField(default=generate_coolname, max_length=64, primary_key=True)

Guaranteeing uniqueness is always a fun challenge...

To my understanding there's no significant performance hit from using UUID as a primary key. It is 4x larger than int32 (128bit). So also index is bigger, cache will store less data etc.

I dredged up some articles on this topic I recalled seeing a while back:

And as I said, I'm happy to open a PR as I want to use it with UUIDs in my next project.

You can use UUID primary keys today. You just have to define a custom primary key so that the auto field isn't generated:

class MyModel(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

Granted, this is less convenient, but it is possible and documented.

in reply to:  11 comment:12 by Samuel Bishop, 3 years ago

Replying to Mariusz Felisiak:

Note: Fixing this ticket will also resolve #17337.

This would be an interesting development, and it will be great to make some progress on the nonrel front.


Replying to Nick Pope:

  • It was named DEFAULT_AUTO_FIELD to indicate that it referred to the automatically generated id field.
    • This was instead of DEFAULT_AUTOFIELD which would imply that it could only ever be related to AutoField.
    • Another suggestion had been DEFAULT_MODEL_PK, but we weren't ready to allow other fields without AutoField-like semantics.
  • We could lift the restriction to allow other non-AutoField types, but the concept of an automatically generated id field needs to be decoupled from AutoField a bit.

While the situation currently doesn't support it, having a DEFAULT_MODEL_PK which accepts any field which sets primary_key=True and unique=True would be very very useful as there are lots of use cases, such as legacy and interoperability scenarios, where having the primary key generation logic in python is going to be necessary, and its good to hear that the situation is "were not ready" not "we don't want to do this".

  • Add a db_generated/is_auto (or another better name) flag to AutoFieldMixin. (See my comment. I'm not convinced #24042 should have been closed.)

Reading through that, I agree, it feels like closing the ticket was based on the smallest possible interpretation of the issue.

Note that this field would also be limited to databases that have a function to generate a UUID.

This is another one of reasons why I think DEFAULT_MODEL_PK (and thus lifting the need to have the database generate the primary key value), should be the ultimate goal, as someone maintaining a ULID Field library for use as a primary key, even if I go the extra mile and write/maintain multiple independent database functions/stored procedures (and all the extra management code to ensure they are loaded, updated, etc) for each supported database (currently Sqlite, MySQL, MariaDB, PostgreSQL, CockroachDB, with MSSQL on the horizon), I would "still" like to support generating the ULID value in python code for the broadest possible compatibility.

comment:13 by Brian Helba, 3 years ago

Cc: Brian Helba added

comment:14 by raydeal, 3 years ago

Cc: raydeal added

comment:15 by johnthagen, 3 years ago

Cc: johnthagen added

comment:16 by raydeal, 3 years ago

In my opinion allowing users to use uuid as DEFAULT_AUTO_FIELD isn't good unless top db providers (Oracle, MS SQL, Postgres, SQLite) will support it in db as a kind of sequence and they will introduce uuid as auto primary key type.

DEFAULT_AUTO_FIELD is dedicated to create id field in every model. I can imagine some situation where UUID is better than INT as Primary Key, but in most cases it isn't. If it was, the top db providers would implement it a long time ago.

UUID is random generated value by default and it doesn't guarantee unique values like a sequence based on INT does. Generation a value is slower than from sequence.

AutoField in Django that can be DEFAULT_AUTO_FIELD uses db sequence that gives 100% unique values.

If UUIDAutoField is created in Django based on @Nick Pope proposal (RandomUUID), it will be worth to add in documentation that it doesn't qurantee that value is unique.

But, definitely, it is worth to allow users to create a custom AutoField based on any other field type (as reported in #17337), maybe except Text/Boolean types, and use it as PK. Anyone can then create their own UUIDAutoField and take responsibility for uniques.

Version 0, edited 3 years ago by raydeal (next)

comment:17 by raydeal, 3 years ago

Owner: changed from nobody to raydeal
Status: newassigned

comment:18 by Michael Kuron, 2 years ago

Cc: Michael Kuron added

comment:19 by raydeal, 2 years ago

Owner: raydeal removed
Status: assignednew

comment:20 by Alireza Safari, 2 years ago

Cc: Alireza Safari added

comment:21 by Mathieu Poussin, 2 years ago

"So yes, having a sequence PK (that stays hidden) and a GUID UK (that customers see) can be a good option"

It depends, on centralized databases this may be fine because everything is local, however on distributed systems (for example cockroachdb), having to handle incremental integer is a much slower (and exponentially slower with more nodes), because you basically need to coordinate all the nodes that can write data to stop, get the last sequence value, insert your line, increment the sequence and then unlock it (so you lock insert on the table during this time, even on others nodes from the cluster).

Example: https://github.com/cockroachdb/cockroach/issues/41258

in reply to:  21 comment:22 by raydeal, 2 years ago

Replying to Mathieu Poussin:

It depends, on centralized databases this may be fine because everything is local, however on distributed systems (for example cockroachdb), having to handle incremental integer is a much slower (and exponentially slower with more nodes), because you basically need to coordinate all the nodes that can write data to stop, get the last sequence value, insert your line, increment the sequence and then unlock it (so you lock insert on the table during this time, even on others nodes from the cluster).

Example: https://github.com/cockroachdb/cockroach/issues/41258

Good point, in distributed system it could be slower.
Maybe it is only cocroachdb problem? Why algorithm of coordination of nodes for sequence is slow? is it much slower then uuid or only little?

There is more questions to ask:

  • What happen if in more than one node uuid will be the same for different data? do they have mechanism to sort it out?
  • Is there more "writes" or "reads" from db? is "writing" really too slow in my project?
  • why chose distributed database (cockroachdb) instead PostgresSQL (with sharding)?

and more questions, depending on project, to chose db, design correct db structure with correct field types, and avoid headache in future.

Last edited 2 years ago by raydeal (previous) (diff)

comment:23 by Petr Přikryl, 20 months ago

Cc: Petr Přikryl added

comment:24 by Wilson E. Husin, 20 months ago

Cc: Wilson E. Husin added

comment:25 by Luiz Braga, 18 months ago

Cc: Luiz Braga added

comment:26 by Dmytro Litvinov, 13 months ago

Cc: Dmytro Litvinov added

comment:27 by Simon Charette, 9 months ago

Small note that we might see a renewed interest for this feature with the upcoming support for UUIDv7 in Postgres.

Not that it changes a lot of the arguments made here against allowing this to be default but it seems that UUIDv7, due to fact it creates less sparse and fragmented btrees, performs reasonably well when compared to bigint.

comment:28 by Olivier Dalang, 8 months ago

Cc: Olivier Dalang added

comment:29 by Tomasz Wójcik, 4 months ago

After 3 years I've decided to give it a try. It's very WIP, but I'd appreciate if someone was able to advise if this approach is feasible.

My reasoning:

  • I still haven't found if all those if isinstance(obj, AutoField) can be migrated to AutoFieldMixin check, my tests seem to pass
  • if it has to be generated in the DB, then PostgreSQL is the only DB (I think), that supports generating UUIDs, without any DB extensions
  • if that's correct, then I'm not convinced if this implementation of the Postgres-only UUIDv4-db-generated-pk field should be part of Django or rather an external library. As a dev I wouldn't mind, but it might be confusing for others that are not using PostgreSQL and are looking for a similar solution

PR

Please advise.

comment:30 by Ülgen Sarıkavak, 4 months ago

Cc: Ülgen Sarıkavak added

comment:31 by Natalia Bidart, 4 months ago

Has patch: set
Version: 3.2dev

in reply to:  29 ; comment:32 by Natalia Bidart, 4 months ago

Patch needs improvement: set

Replying to Tomasz Wójcik:

After 3 years I've decided to give it a try. It's very WIP, but I'd appreciate if someone was able to advise if this approach is feasible.

Thank you for giving this a try!

Please note that the PR has test errors and lint issues, these should be addressed before progressing with the review. Once those are solved, please update the flags in this ticket following the contributing guidelines so this PR gets added back to the "branches needing review" section of the Django Developer Dahsboard.

comment:33 by Tomasz Wójcik, 4 months ago

Keywords: uuid added
Owner: set to Tomasz Wójcik
Patch needs improvement: unset
Status: newassigned

in reply to:  32 comment:34 by Tomasz Wójcik, 4 months ago

Hey Natalia Bidart, thanks for getting back to me. The CI is green, I can see it in the "Patches pending review" section of the dashboard. See you in the PR!

comment:35 by Mariusz Felisiak, 3 months ago

Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top