Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#33649 closed Bug (invalid)

bulk_create() with ignore_conflicts=True and ForeignKey fails

Reported by: Markus Friedrich Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: bulk_create ForeignKey ignore_conflicts
Cc: Hannes Ljungberg Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Mariusz Felisiak)

Creating both objects of a many-to-one relation using bulk_create together with ignore_conflicts=True fails since Django >=3.2!

To enable many-to-one relations in bulk_create the "to_field" of the ForeignKey is defined via a UUIDField with a uuid.UUID4 default value.

Without the ignore_conflicts=True option of bulk_create it works well in all Django versions.
But if ignore_conflicts=True is needed since some of the objects using in the bulk_create may already be saved to the DB then it works only for Django < 3.2.
With Django >= 3.2 (including 4.0.4) it fails with:
ValueError: bulk_create() prohibited to prevent data loss due to unsaved related object 'reporter'.

Find attached a test app which reproduces this issue with a minimal example, see
mysite/polls/models.py and mysite/polls/tests.py

Attachments (1)

mysite.zip (8.8 KB ) - added by Markus Friedrich 3 years ago.

Download all attachments as: .zip

Change History (6)

by Markus Friedrich, 3 years ago

Attachment: mysite.zip added

comment:1 by Mariusz Felisiak, 3 years ago

Cc: Hannes Ljungberg added
Component: UncategorizedDatabase layer (models, ORM)
Description: modified (diff)
Resolution: invalid
Status: newclosed
Summary: bulk_create with ignore_conflicts=True and ForeignKey failsbulk_create() with ignore_conflicts=True and ForeignKey fails

Thanks for this report. This behavior was intentionally changed in 10f8b82d195caa3745ba37d9424893763f89653e to prevent possible data loss. Using ignore_conflicts=True tells the database to ignore failure to insert any rows that fail constraints such as duplicate unique values, but also disables setting the primary key on each model instance (as documented). Therefore a reporter instance doesn't have a primary key set. You need to attached already saved reporter to the article before bulk_create().

comment:2 by Markus Friedrich, 3 years ago

Thanks for you very fast answer and the provided links.

I know that ignore_conflicts=True disables setting the primary key for all DB's and that some DB's are not able to set the primary key using bulk_create at all.
That's why the ForeignKey of Article does not use the primary key (id) of Reporter as to_field but used the uuid field of Reporter. This uuid field is unique=True and its value is a UUID4 value which is even set if the object is not already saved. This enables the use of bulk_create even for DB's which are not able to set a primary auto inc key.

I found now that may test case even fails with the same error with >=3.2 if ignore_conflicts is removed if a DB is used which cannot set the pk in bulk_create. (e.g. older sqlite3 (I have tested with sqlite3-3.27))
This means with this error message you have intentionally completely removed the functionality to use bulk_create together with a ForeignKey on all databases which are not able to set the pk in bulk_create!
If this is really intended I'm OK with it and need to workaround this. (even thought I do not understand why there should be a data-loss problem in this case (with a uuid as to_field of the ForeignKey))

(I also do not understand your last sentence: I do not have any already saved reporter in my test case (but in my real case). But as detected now it even fails without the ignore_conflicts, see above)

in reply to:  2 comment:3 by Mariusz Felisiak, 3 years ago

This means with this error message you have intentionally completely removed the functionality to use bulk_create together with a ForeignKey on all databases which are not able to set the pk in bulk_create!

Why? This error message is only raised when a related object is not saved. The current behavior of bulk_create() and bulk_update() reflects the save() behavior.

If this is really intended I'm OK with it and need to workaround this. (even thought I do not understand why there should be a data-loss problem in this case (with a uuid as to_field of the ForeignKey))

As far as I'm aware there is still a data loss possibility 🤔, because the assigned UUID may not exists in the database at the end. We cannot guarantee that the assigned UUID will be stored in the database as it comes from an unsaved object.

(I also do not understand your last sentence: I do not have any already saved reporter in my test case (but in my real case). But as detected now it even fails without the ignore_conflicts, see above)

My advise is to save reporter instances before attaching them to an article.

comment:4 by Markus Friedrich, 3 years ago

Why? This error message is only raised when a related object is not saved.

The related object is saved since bulk_create() is called on it. This save is just not recognized by Django either because

  • the used database does not support setting the pk by bulk_create (as documented) or
  • bulk_create disables the setting of the pk if ignore_conflicts=True is used (as documented)

So the error is wrong in this case. And in my view at least for lower level functions like bulk_create (which are for performance tuning) no error should be raised if its unclear for Django if there is really an error. Its up to the caller to ensure that things are right, at least for lower level functions.

Version 0, edited 3 years ago by Markus Friedrich (next)

in reply to:  4 comment:5 by Mariusz Felisiak, 3 years ago

So the error is wrong in this case.

Rows are saved in the database, but object instances don't have primary keys, so they are not associated with saved rows. Again, you can fix this by re-fetching new objects from the database before assigning them to a related model.

And in my view at least for lower level functions like bulk_create (which are for performance tuning) no error should be raised if its unclear for Django if there is really an error. Its up to the caller to ensure that things are right, at least for lower level functions.

I don't agree, bulk_create() and bulk_update() are supported and documented APIs. I see no reason to leave users on their own in this particular case.

You can start a discussion on DevelopersMailingList if you don't agree, where you'll reach a wider audience and see what other think. Thanks.

Note: See TracTickets for help on using tickets.
Back to Top