#26400 closed Cleanup/optimization (wontfix)
QuerySet bulk_create method to handle generators to prevent loading all objects in memory at once
Reported by: | Alexander Sterchov | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | bulk_create |
Cc: | | Triage Stage: | Someday/Maybe |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In my case I need to create a huge number of objects using the bulk_create method with the batch_size parameter.
The problem is that I don't have enough memory to store all the objects in a list. Even if I pass a generator, it is converted to a list anyway (https://github.com/django/django/blob/1.9.4/django/db/models/query.py#L438).
I want to implement a feature that handles generators properly without loading all objects into memory, but bulk_create returns a list of the created objects as its result. That is unacceptable for large amounts of data.
How should I implement this: as a new method, or as a parameter added to the existing method?
Change History (7)
comment:1 by , 9 years ago
follow-up: 4 comment:2 by , 9 years ago
It's not the first time I need that opportunity. I mean sure I can write that split thing one more time as I did in other projects, but I don't see any reason why the feature couldn't be a part of Django.
comment:3 by , 9 years ago
My first inclination was the same as Simon's, but if you want to show what the changes to bulk_create() (or a new method) would look like, we can run it by the DevelopersMailingList to get some other opinions.
comment:4 by , 9 years ago
Replying to likeon:
It's not the first time I need that opportunity. I mean sure I can write that split thing one more time as I did in other projects, but I don't see any reason why the feature couldn't be a part of Django.
We'd have to alter both the signature and the return type of bulk_create in order to pass a flag enabling this feature and make sure not to return a list of the created objects. At this point I think this should be handled by another method/function.
I personally don't believe this use case is common enough to warrant an inclusion in Django but as Tim pointed out you could try leveraging support from the community on the developer mailing list.
comment:5 by , 9 years ago
Triage Stage: | Unreviewed → Someday/Maybe |
---|
comment:6 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Closing in absence of follow up or discussion.
comment:7 by , 8 years ago
#28231 is a follow up ticket requesting similar behavior. The current consensus seems to be to document the behavior rather than to change it.
I'm not sure this is worth including in Django. Is there a reason you can't split your bulk_create calls into batches that fit into memory? You could even make this a manager method if required.
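For reference, the batching workaround suggested above could look roughly like the sketch below. It is not part of Django: `iter_batches` and `bulk_create_in_batches` are hypothetical names, and the helper only assumes that `manager` exposes a `bulk_create()` method that accepts a list. It returns a count instead of the created objects, so at most one batch is held in memory at a time.

```python
from itertools import islice


def iter_batches(iterable, batch_size):
    """Yield lists of at most batch_size items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    iterator = iter(iterable)
    while True:
        # islice pulls only the next batch_size items from the iterator,
        # so a generator is never materialized as a whole.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def bulk_create_in_batches(manager, objs, batch_size=1000):
    """Call manager.bulk_create() once per batch and return the object count.

    Hypothetical helper: `manager` is assumed to be a model manager
    (for example `MyModel.objects`) or any object with a compatible
    bulk_create() method. The created objects are deliberately not returned.
    """
    total = 0
    for batch in iter_batches(objs, batch_size):
        manager.bulk_create(batch)
        total += len(batch)
    return total
```

Usage would be something like `bulk_create_in_batches(Entry.objects, (Entry(headline=h) for h in headlines), batch_size=500)`, where `Entry` and `headlines` are placeholder names. Each batch is sent in a separate `bulk_create()` call, so if all-or-nothing behavior matters, the call would presumably need to be wrapped in `transaction.atomic()`. The same function body could also live on a custom manager, as suggested above.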