Opened 2 months ago
Closed 5 weeks ago
#35904 closed New feature (wontfix)
Speed up fixture loading by adding options bulk insert/create
Reported by: | JorisBenschop | Owned by: | |
---|---|---|---|
Component: | Testing framework | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
As per this forum discussion, I have created a patch to improve load times for the loaddata command under some circumstances.
Currently, the “loaddata” management command calls obj.save() for each deserialized object within a fixture. This method first tries an UPDATE statement and, if no row is updated, falls back to an INSERT statement. Using the --force_insert option skips the UPDATE, which halves the number of queries when the records do not yet exist in the database.
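The query pattern described above can be emulated with Python's built-in sqlite3 module to count the write statements each strategy issues. This is a minimal sketch, not Django's code. The "article" table and the helpers make_db, count_writes, save_like and force_insert_like are hypothetical. The sketch assumes every fixture row is new, which is the case when loading into an empty database.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical "article" table standing in for the model.
    # isolation_level=None means autocommit, so Python adds no implicit BEGIN.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    return conn


def count_writes(conn, load, rows):
    # Record every statement SQLite starts, then count only the UPDATE/INSERT ones.
    statements = []
    conn.set_trace_callback(statements.append)
    load(conn, rows)
    conn.set_trace_callback(None)
    return sum(1 for s in statements if s.lstrip().upper().startswith(("UPDATE", "INSERT")))


def save_like(conn, rows):
    # Emulates save() with an explicit pk: try UPDATE, INSERT only if no row matched.
    for pk, title in rows:
        cur = conn.execute("UPDATE article SET title = ? WHERE id = ?", (title, pk))
        if cur.rowcount == 0:
            conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))


def force_insert_like(conn, rows):
    # Emulates save(force_insert=True): one INSERT per row, no UPDATE attempt.
    for pk, title in rows:
        conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))


rows = [(i, f"Article {i}") for i in range(1, 101)]
print(count_writes(make_db(), save_like, rows))          # 200 (100 UPDATE + 100 INSERT)
print(count_writes(make_db(), force_insert_like, rows))  # 100
```

If a row already exists, force_insert_like would raise sqlite3.IntegrityError. That is why this mode is only safe when the fixture rows are known to be new.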
A second option, --bulk_create, uses bulk_create to insert multiple records at once. This cuts the number of INSERT queries from n to one, a reduction of (n-1)/n, or ~99% when inserting 100 records.
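The (n-1)/n figure counts queries rather than wall-clock time, and it assumes all n rows go out in a single INSERT. A small calculation makes that assumption explicit. The function name query_reduction and the batch size of 500 are hypothetical; real backends may cap how many rows fit in one query, and per-row work such as deserialization still scales with n.

```python
import math


def query_reduction(n, batch_size=None):
    """Fraction of INSERT queries saved by batching n rows instead of one INSERT per row.

    batch_size=None models the ideal single-query case from the description.
    A finite batch_size (hypothetical here) adds one query per extra batch.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    queries = 1 if batch_size is None else math.ceil(n / batch_size)
    return (n - queries) / n


print(query_reduction(100))                     # 0.99, i.e. (n-1)/n
print(query_reduction(10_000, batch_size=500))  # 0.998 (20 queries instead of 10,000)
```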
These options are not meant to cover every use case and are therefore opt-in.
Benchmark results
===============
Time to insert records from a single fixture (Article model, SQLite):
Records | current | --force_insert | --bulk_create |
---|---|---|---|
1000 | 0.116s | 0.066s | 0.010s |
10000 | 1.07s | 0.39s | 0.104s |
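A comparison in the spirit of the table can be sketched without Django. It uses the stdlib sqlite3 module and time.perf_counter, with a multi-row INSERT standing in for bulk_create. This is not the PR's test code: the table, helper names and batch size of 400 are hypothetical. Each load runs inside a single transaction, which is an assumption of this sketch. Since there is no ORM overhead, absolute timings will not match the numbers above; only the relative ordering is of interest.

```python
import sqlite3
import time


def fresh_db():
    # Hypothetical "article" table standing in for the Article model.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    return conn


def upsert_each(conn, rows):
    # Like the default save(): UPDATE first, INSERT when no row matched.
    for pk, title in rows:
        if conn.execute("UPDATE article SET title = ? WHERE id = ?", (title, pk)).rowcount == 0:
            conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))


def insert_each(conn, rows):
    # Like --force_insert: one INSERT per row.
    for pk, title in rows:
        conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))


def insert_batched(conn, rows, batch_size=400):
    # Like --bulk_create: one multi-row INSERT per batch.
    # 400 rows x 2 parameters stays under SQLite's historical 999-variable limit.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        params = [value for row in batch for value in row]
        conn.execute(f"INSERT INTO article (id, title) VALUES {placeholders}", params)


def benchmark(n):
    rows = [(i, f"Article {i}") for i in range(1, n + 1)]
    timings = {}
    strategies = [("current", upsert_each), ("force_insert", insert_each), ("bulk_create", insert_batched)]
    for name, load in strategies:
        conn = fresh_db()
        start = time.perf_counter()
        conn.execute("BEGIN")  # one transaction per load
        load(conn, rows)
        conn.execute("COMMIT")
        timings[name] = time.perf_counter() - start
        # Sanity check: every strategy must end up with exactly n rows.
        assert conn.execute("SELECT COUNT(*) FROM article").fetchone()[0] == n
        conn.close()
    return timings


for n in (1_000, 10_000):
    print(n, {name: f"{t:.3f}s" for name, t in benchmark(n).items()})
```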
I expect the improvement to be even larger for larger models.
Change History (13)
comment:1 by , 2 months ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:2 by , 6 weeks ago
Summary: | Speed up fixture loading by bulk insert → Speed up fixture loading by adding options bulk insert/create |
---|---|
Type: | Uncategorized → New feature |
#35975 was a duplicate
Forum discussion: https://forum.djangoproject.com/t/feature-proposal-faster-fixture-loading-via-loaddata-command/36972
PR: https://github.com/django/django/pull/18889
comment:3 by , 6 weeks ago
Description: | modified (diff) |
---|---|
Has patch: | set |
Resolution: | wontfix |
Status: | closed → new |
comment:4 by , 6 weeks ago
Description: | modified (diff) |
---|
comment:5 by , 6 weeks ago
Description: | modified (diff) |
---|
comment:6 by , 6 weeks ago
As requested by Simon, I have re-opened the ticket and specified the expected improvements more precisely. Steps to reproduce are covered by the tests in the PR. If there is interest, I am open to adding code to the serde test suite.
comment:7 by , 6 weeks ago
Description: | modified (diff) |
---|
comment:8 by , 6 weeks ago
Description: | modified (diff) |
---|
comment:9 by , 5 weeks ago
Is there any way I can move this ticket forward? To my knowledge, I have addressed all the issues raised in the PR.
comment:10 by , 5 weeks ago
Version: | 5.0 → dev |
---|
comment:11 by , 5 weeks ago
Hi Joris. The beige notice at the top of the ticket advises that the next step is for someone besides the author to accept the ticket. This is a new feature, so some engagement on the forum thread is desired and expected. It's been less than two weeks since the forum post was raised, which in most cases is too short a window to allow all voices to participate. I would advise allowing a little more time, especially around the holidays. Thanks for your dedication.
comment:12 by , 5 weeks ago
Hi Jacob, thank you so much for explaining this. I understand there are many tickets competing for your attention. For a submitter, there is always a fine line between allowing time and losing momentum. By no means am I trying to rush you, so I greatly appreciate you explaining the expected timeline for this process.
comment:13 by , 5 weeks ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Closing as per my latest comment in the forum post: there is a lack of community traction for this, and a third-party app seems like the best next step for this feature request.
Hello Joris,
This sounds interesting, particularly given that features like test case serialized rollbacks (which are quite slow) are built on top of model serialization. It would have to be a distinct option, as bulk_create doesn't fire signals, which some setups might require.
Just like any new feature request, though, it should be discussed on the forum to reach a consensus before being accepted. Given this is a performance-related new feature, I suggest your proposal come equipped with some details about what kind of improvements users should expect (profiles and benchmarks, instead of solely claiming it's fairly inefficient), backed by steps to reproduce. It should also include a PoC that properly deals with other features of the serde framework, such as natural keys, and a plan for dealing with backends that don't support ignore_conflicts. It might even be a good opportunity to augment our performance tracking system with serde benchmarks.
If that's the case, then sharing this code as a standalone package (e.g. django-fast-loaddata) might be a good way to get traction on the above. Assuming there is interest in moving forward, we can then re-open this issue.
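The ignore_conflicts point is about semantics as well as backend support. When a fixture row already exists, save() updates it, whereas a conflict-skipping insert leaves the stored row unchanged, and a plain bulk INSERT fails with an integrity error. The sketch below uses SQLite's INSERT OR IGNORE through the stdlib sqlite3 module as a stand-in for a conflict-skipping bulk insert. The table, the FIXTURE data and the helper names are hypothetical.

```python
import sqlite3

FIXTURE = [(1, "new title"), (2, "second article")]  # hypothetical fixture rows


def db_with_existing_row():
    # Row id=1 already exists before the fixture is loaded.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO article (id, title) VALUES (1, 'old title')")
    return conn


def upsert_each(conn, rows):
    # save()-style: existing rows are updated, missing rows are inserted.
    for pk, title in rows:
        if conn.execute("UPDATE article SET title = ? WHERE id = ?", (title, pk)).rowcount == 0:
            conn.execute("INSERT INTO article (id, title) VALUES (?, ?)", (pk, title))


def insert_ignoring_conflicts(conn, rows):
    # Conflict-skipping bulk insert: rows whose pk already exists are silently dropped.
    conn.executemany("INSERT OR IGNORE INTO article (id, title) VALUES (?, ?)", rows)


def load_and_read(loader):
    conn = db_with_existing_row()
    loader(conn, FIXTURE)
    return conn.execute("SELECT id, title FROM article ORDER BY id").fetchall()


print(load_and_read(upsert_each))                # [(1, 'new title'), (2, 'second article')]
print(load_and_read(insert_ignoring_conflicts))  # [(1, 'old title'), (2, 'second article')]
```

A bulk loading option therefore has to decide whether stale rows are acceptable, or restrict itself to loading into empty tables.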