Opened 10 years ago
Closed 9 years ago
#24997 closed New feature (fixed)
Allow bulk_create with proxy inheritance
Reported by: | William Schwartz | Owned by: | William Schwartz |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | proxy, inheritance, bulk_create, queryset, insert |
Cc: | ravi.s@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Overview
The documentation for the Queryset method bulk_create
states:
It does not work with child models in a multi-table inheritance scenario.
bulk_create
also does not work with proxy inheritance, which does not involve multiple tables. Rather than changing the documentation to reflect that neither type of inheritance is supported, it would be better to allow proxy inheritance to work.
Origin of the problem
It makes sense not to allow bulk_create
with multiple tables. The comment in bulk_create
explains it best:
So this case is fun. When you bulk insert you don't get the primary keys back (if it's an autoincrement), so you can't insert into the child tables which references this. There are two workarounds, 1) this could be implemented if you didn't have an autoincrement pk, and 2) you could do it by doing O(n) normal inserts into the parent tables to get the primary keys back, and then doing a single bulk insert into the childmost table. Some databases might allow doing this by using RETURNING clause for the insert query. We're punting on these for now because they are relatively rare cases.
bulk_create
prevents inherited models from using it with the following naive test:
if self.model._meta.parents: raise ValueError("Can't bulk create an inherited model")
This test does not discriminate between multi-table inheritance and proxy inheritance. However, simply modifying the code to read
if self.model._meta.parents and not self.model._meta.proxy: raise ValueError("Can't bulk create an inherited model with multiple tables")
does not fix the problem because it results in more exceptions.
Reproducing this error
I did the following using Django 1.8.2 and Python 3.4.3 on Mac OS 10.9.5.
django-admin startproject bulk_create_proxy cd bulk_create_proxy ./manage.py startapp app
Add 'app'
to the INSTALLED_APPS
list in the resultant settings.py
. Save the following code in app/models.py
.
from django.db import models class Parent(models.Model): name = models.CharField(max_length=10) class Child(Parent): class Meta(object): proxy = True
Save the following code in app/tests.py
.
from django.test import TestCase from app.models import Parent, Child class TestBulkCreate(TestCase): def assert_can_bulk_create(self, model): model.objects.all().delete() model.objects.bulk_create([ model(name='a'), model(name='b'), model(name='c')]) self.assertEqual(model.objects.count(), 3) def test_bulk_create_parent(self): self.assert_can_bulk_create(Parent) def test_bulk_create_child(self): self.assert_can_bulk_create(Child)
Finish up with the following commands.
./manage.py makemigrations app ./manage.py migrate ./manage.py test
You should get the following test output. (I have elided file paths; ones starting with ...django
come from Django.)
bulk_create_proxy ➜ ./manage.py test Creating test database for alias 'default'... E. ====================================================================== ERROR: test_bulk_create_child (app.tests.TestBulkCreate) ---------------------------------------------------------------------- Traceback (most recent call last): File "...bulk_create_proxy/app/tests.py", line 18, in test_bulk_create_child self.assert_can_bulk_create(Child) File "...bulk_create_proxy/app/tests.py", line 11, in assert_can_bulk_create model(name='a'), model(name='b'), model(name='c')]) File "...django/db/models/manager.py", line 127, in manager_method return getattr(self.get_queryset(), name)(*args, **kwargs) File "...django/db/models/query.py", line 374, in bulk_create raise ValueError("Can't bulk create an inherited model") ValueError: Can't bulk create an inherited model ---------------------------------------------------------------------- Ran 2 tests in 0.003s FAILED (errors=1)
Change History (12)
comment:1 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Bug → New feature |
comment:2 by , 10 years ago
I'll try to work this up into a real patch, but what I found was that the following changes relative to the code at git tag 1.8.2 make the test cases described in this ticket pass.
--- django/db/models/query.py 2015-06-17 17:29:38.000000000 -0400 +++ django/db/models/query.py 2015-06-17 17:31:26.000000000 -0400 @@ -370,13 +370,13 @@ # this by using RETURNING clause for the insert query. We're punting # on these for now because they are relatively rare cases. assert batch_size is None or batch_size > 0 - if self.model._meta.parents: + if self.model._meta.parents and not self.model._meta.proxy: raise ValueError("Can't bulk create an inherited model") if not objs: return objs self._for_write = True connection = connections[self.db] - fields = self.model._meta.local_concrete_fields + fields = self.model._meta.concrete_fields objs = list(objs) self._populate_pk_values(objs) with transaction.atomic(using=self.db, savepoint=False):
comment:3 by , 10 years ago
One possible approach is that we change bulk_create()'s approach to be "save the objects to the database using fastest method available". We could add a couple of flags, too, for example signals=True/False and with_pk=True/False. The defaults would be False and False.
Then we would check if the given database can do a fast bulk-insert, or if we need to switch to a loop with obj.save(force_insert=True).
I guess the biggest problem is that it might be a bit surprising if bulk_create doesn't do the insert in bulk. But even then, the current alternative is to do the insert by a loop of save() calls, and that is exactly what you will get from the updated bulk_create.
For example in fixture loading it would be very convenient if one could always call bulk_create(), and get the fastest possible bulk insert for the given model class and database backend combination.
comment:4 by , 10 years ago
I think that as a user, I would be surprised if bulk_create
did not execute in single (or very small number of, depending on batch_size
) INSERTs. I can imagine stackoverflow threads asking "How to force bulk_create to use single insert," with answers to the effect of, "Back in the day, it always executed in a single INSERT but now it's a bit of a guessing game". I'm not saying that it would be a guessing game, I'm just saying that having bulk_create
decide what to do will feel opaque to users. The current way it works has the advantage of being fully explicit.
Further, I'm not sure I see such an expansion of bulk_create
's duties as being germane to allowing proxy models to use the method, except if your goal is also to allow multi-table models to use it. But the latter consideration seems like it should be a separate ticket.
comment:5 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:6 by , 10 years ago
Work ongoing at https://github.com/wkschwartz/django/tree/ticket_24997.
I've created a "practice" pull request at https://github.com/wkschwartz/django/pull/1. Before submitting a real pull request to Django, I'll squash my commits.
comment:7 by , 10 years ago
I have also created a forward port for Django 1.9 (based on the master branch) at https://github.com/wkschwartz/django/tree/ticket_24997_19. (It's simply a cherry-pick from the other branch.) The documentation in this version of the patch still shows the bug being fixed in Django 1.8.3. The purpose of this version of the patch is merely to make it easy to forward-port the fix.
I've created another "practice" pull request at https://github.com/wkschwartz/django/pull/2.
All commits are now squashed.
Tests pass on Sqlite and Postgresql.
Unless someone says otherwise beforehand, I'll submit both pull requests tomorrow morning.
comment:8 by , 10 years ago
Per our supported version policy, I don't think this would qualify for a backport to 1.8. I'd suggest to take a look at our patch review checklist before you submit the PR, but generally the patch looks okay.
comment:9 by , 10 years ago
Has patch: | set |
---|
Pull request submitted: https://github.com/django/django/pull/4886
Note I've updated it since @timgraham looked at it this morning. It now correctly handles the case where a proxy model inherits from a multi-table model.
comment:10 by , 10 years ago
Version: | 1.8 → master |
---|
comment:11 by , 9 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:12 by , 9 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
If it's feasible to implement, I don't see why we couldn't do it.