#28641 closed New feature (needsinfo)
Improvements to QuerySet.bulk_create()
Reported by: | Дилян Палаузов | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
https://docs.djangoproject.com/en/1.11/ref/models/querysets/#bulk-create does not state what bulk_create returns, so that the implementation can be optimizerd by making bulk_create() return None.
As QuerySet.get_or_create() ensures, that after the operation an object exists in the database, bulk_create() can do the same on Postgresql using "ON CONFLICT DO NOTHING", possibly taking a parameter. I guess currently bulk_create fails as whole, if any INSERT cannot be performed, so that a developer has to do "for x in Y: x.get_or_create()" instead of buld_create([x for x in Y]).
INSERT (for bulk_create) can also use RETURNING to get the ids of the newly created objects which can be used then to send signals. Is there any particular reason, why on Postgresql bulk_create() does not send pre_save / post_save signals?
Change History (5)
comment:1 by , 7 years ago
comment:2 by , 7 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Type: | Uncategorized → New feature |
comment:3 by , 7 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Summary: | QuerySet.bulk_create() → Improvements to QuerySet.bulk_create() |
Hi, please try to keep a ticket focused on one issue. I'm going to close this ticket for now because it's too unfocused but I'll leave some ideas here about your proposals:
- Most likely we're not going to break backwards compatibility and change the return type of
bulk_create()
(there would need to be a really compelling reason to do that). Perhaps the return type should be added to the documentation.
- I'm not sure if it's common to use
bulk_create()
in situations where a race condition might be present. Most likely a discussion on the DevelopersMailingList is needed to form a consensus about your proposal regarding "ON CONFLICT DO NOTHING."
- There's some discussion in the comments of #19527 about a version of
QuerySet.bulk_create()
that sends signals. Again, this isn't something to change lightly as it has an impact on performance.
If you want to pursue these issues, I'd say the next step is to try searching Trac using a Google search with site:code.djangoproject.com and related keywords to find past discussion. You can also search the archives of the DevelopersMailingList. Then you would make a post to the DevelopersMailingList that summarizes past discussion of the issue(s) and includes your proposal about how to move forward. If the discussion yields a consensus, then we can open a more focused ticket (or tickets) with the improvement(s).
Thanks!
comment:5 by , 5 years ago
Hi Tim.
Based on my long-term experience the inconsistency in sending signals is a source of many logical errors. They're hard to debug, because signals are implicit.
Please consider an example:
signals.post_save.connect(fn, sender=SomeModel) signals.post_delete.connect(fn, sender=SomeModel) [ ... ] to_create = [ ... ] # list of model instances to create to_update = SomeModel.objects.filter(...) # queryset of models to update SomeModel.objects.bulk_create(to_create) for obj in to_update: obj.field1 = ... obj.save()
It's not obvious whether nor when fn()
will be called. In fact it will be called for to_update
objects, but not for to_create
.
The code must be changed in a way, where fn()
must be called explicitely for each instance created by bulk_create
:
signals.post_save.connect(fn, sender=SomeModel) signals.post_delete.connect(fn, sender=SomeModel) [ ... ] to_create = [ ... ] # list of model instances to create to_update = SomeModel.objects.filter(...) # queryset of models to update SomeModel.objects.bulk_create(to_create) for obj in to_create: fn(obj) for obj in to_update: obj.field1 = ... obj.save()
The code is now more explicit. You see when and where fn()
is called. But there is a catch - fn()
is also called during obj.save() in update section.
In terms of code readability it's even worse than first example.
In such case a developer will decide to remove signals, but by that way he will introduce regression and another software failure, because he don't know what and where emits them. A developer will conclude that signals are bad by design.
We all know that signals are helpful, because they're extension-like interfaces for 3rd-party apps (like contrib.admin). But they must be consistent - every save (create, update) must emit (post_)save signal, and every delete must emit (post_)delete signal. Without that there is a use case, where you can't do what developer really need: do something after each save of the model for in-project and 3rd-party apps at the same time.
Part 1: Send post_save signal in bulk_create: