#22608 closed Cleanup/optimization (duplicate)
Migrations optimizer slow (due to double working?)
Reported by: | David S | Owned by: | nobody |
---|---|---|---|
Component: | Migrations | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Andrew Godwin, paul.collins.iii@…, michaelvantellingen@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
makemigrations
is currently fairly conservative with splitting operations into separate migrations, e.g. when there are fk dependencies. after manually merging some of these i noticed a significant speedup, which surprised andrewgodwin
did some basic profiling and it looks like many smaller migration need to re-do some work. will try to look more into this, but wanted to raise a ticket to track
separate:
38 migrations over 9 apps (3, 14, 6, 3, 1, 1, 5, 2, 3)
profiling ./manage.py test app1
around executor.migrate(targets, plan, fake=options.get("fake", False))
17353606 function calls (17308702 primitive calls) in 39.818 seconds Ordered by: internal time List reduced from 505 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 152757 7.436 0.000 12.352 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:290(deconstruct) 155191 3.989 0.000 5.518 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:121(__init__) 22617 3.125 0.000 4.428 0.000 /Users/david/dev/django_source/django/db/models/fields/related.py:276(swappable_setting) 4561050/4537097 2.817 0.000 2.843 0.000 {getattr} 419 2.477 0.006 2.478 0.006 {method 'execute' of 'psycopg2._psycopg.cursor' objects} 16993 1.477 0.000 27.452 0.002 /Users/david/dev/django_source/django/db/migrations/state.py:229(clone) 12854 0.833 0.000 1.117 0.000 /usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/collections.py:37(__init__) 852604/852602 0.796 0.000 1.643 0.000 {isinstance} 1558116 0.706 0.000 0.882 0.000 /Users/david/dev/django_source/django/db/models/options.py:282(_swapped) 3124169 0.655 0.000 0.666 0.000 {method 'get' of 'dict' objects}
manually merged into 1 migration per app:
3650282 function calls (3640151 primitive calls) in 9.895 seconds Ordered by: internal time List reduced from 495 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 321 0.959 0.003 0.959 0.003 {method 'execute' of 'psycopg2._psycopg.cursor' objects} 19363 0.898 0.000 1.646 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:290(deconstruct) 20870 0.648 0.000 0.874 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:121(__init__) 3236 0.432 0.000 0.656 0.000 /Users/david/dev/django_source/django/db/models/fields/related.py:276(swappable_setting) 642018/638002 0.427 0.000 0.432 0.000 {getattr} 32654 0.331 0.000 0.331 0.000 {method 'items' of 'dict' objects} 5662 0.286 0.000 0.286 0.000 /Users/david/dev/django_source/django/db/models/base.py:34(subclass_exception) 6823 0.232 0.000 0.232 0.000 /Users/david/dev/django_source/django/utils/functional.py:15(curry) 180384 0.230 0.000 0.230 0.000 {hasattr} 3673 0.225 0.000 0.511 0.000 /Users/david/dev/django_source/django/db/models/fields/related.py:1584(__init__)
Change History (27)
comment:1 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 10 years ago
comment:3 by , 10 years ago
That's unfortunately slow. There's no way to turn off the optimizer - if there was, it would result in some really awful migrations with ridiculous dependency trees that would probably eventually move the slowness to migration running rather than having it in migrations.
I'm hopeful that there's a decent fix for this that just speeds things up - I suspect something along the lines of memoization or dynamic programming for can_optimize_through. I'm not going to look at this until 1.7's release blockers are done (this isn't that serious, unfortunately for you), but if you have some time I'd love to see the results of wrapping memoization around can_optimize_through and references_model (Django has stuff for this already you can use)
comment:4 by , 10 years ago
Cc: | added |
---|
comment:5 by , 10 years ago
So I put the lru_cache decorator on both functions, and got very little speed up until I started significantly upping the maxsize for each cache. Caching can_optimize_through had the largest effect, but execution still took several minutes
CacheInfo(hits=36255232, misses=261458, maxsize=10000000, currsize=261458) 4466699344 CacheInfo(hits=0, misses=174305, maxsize=100000, currsize=100000) 4466725712 CacheInfo(hits=0, misses=174305, maxsize=100000, currsize=100000) 4467086928 CacheInfo(hits=0, misses=174305, maxsize=100000, currsize=100000) 4467086992 CacheInfo(hits=0, misses=174305, maxsize=100000, currsize=100000) 4467087056 CacheInfo(hits=0, misses=174305, maxsize=100000, currsize=100000) ..... many many many more like this snipped
The first cache info are the stats from can_optimize_through, the rest are from the references_model calls on operations.CreateModel objects.
Ya this reduces the execution time a bit, from 600 seconds down to 445, but the amount of duplicate work going on in the function still overwhelms it.
comment:6 by , 10 years ago
One more bit of profiling. Seems there's also a performance effect in just running migrations themselves. Not makemigrations
, plain old migrate
Thankfully, future runs on migrate do complete fairly quickly (so detecting that a migration has been applied doesn't hang), but that first run with all the create models has this
In [16]: st.sort_stats('calls').print_stats(.01, 'django') 347866278 function calls (343597248 primitive calls) in 783.850 seconds Ordered by: call count List reduced from 6265 to 63 due to restriction <0.01> List reduced from 63 to 52 due to restriction <'django'> ncalls tottime percall cumtime percall filename:lineno(function) 43825144 9.022 0.000 9.231 0.000 /Users/paulcollins/personal/django/django/db/models/options.py:283(_swapped) 43109503 37.593 0.000 76.663 0.000 /Users/paulcollins/personal/django/django/apps/config.py:163(get_models) 18811709 8.146 0.000 8.146 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:395(__lt__) 15940598 7.469 0.000 7.477 0.000 /Users/paulcollins/personal/django/django/conf/__init__.py:44(__getattr__) 8481848/4240926 35.813 0.000 35.814 0.000 /Users/paulcollins/personal/django/django/utils/deconstruct.py:11(__new__) 7890548/7888984 7.136 0.000 16.486 0.000 /Users/paulcollins/personal/django/django/utils/encoding.py:65(force_text) 7868194 28.469 0.000 619.572 0.000 /Users/paulcollins/personal/django/django/db/migrations/state.py:253(construct_fields) 7789220 105.012 0.000 108.728 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:123(__init__) 7765225 12.885 0.000 27.673 0.000 /Users/paulcollins/personal/django/django/utils/module_loading.py:15(import_string) 7765224 224.455 0.000 267.079 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:292(deconstruct) 7748778 9.231 0.000 9.231 0.000 /Users/paulcollins/personal/django/django/utils/encoding.py:55(is_protected_type) 4358494/4349576 4.213 0.000 74.373 0.000 /Users/paulcollins/personal/django/django/db/models/base.py:298(add_to_class) 4345041/4345040 4.058 0.000 8.686 0.000 /Users/paulcollins/personal/django/django/utils/functional.py:52(__get__) 4240920 0.848 0.000 0.848 0.000 /Users/paulcollins/personal/django/django/core/validators.py:239(__init__) 4236519 27.152 0.000 126.958 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:1010(__init__) 4236519 1.491 0.000 1.491 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:451(validators) 4026397 0.512 0.000 0.512 0.000 /Users/paulcollins/personal/django/django/db/models/options.py:356(<genexpr>) 4026335 0.642 0.000 0.642 0.000 /Users/paulcollins/personal/django/django/db/models/base.py:337(<genexpr>) 4020334 0.638 0.000 0.638 0.000 /Users/paulcollins/personal/django/django/db/models/base.py:179(<genexpr>) 3948689 3.378 0.000 11.367 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:783(_get_choices) 3924040 6.756 0.000 53.242 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:572(contribute_to_class) 3924040 16.799 0.000 26.140 0.000 /Users/paulcollins/personal/django/django/db/models/options.py:205(add_field) 3924040 6.162 0.000 9.011 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:565(set_attributes_from_name) 3922755 1.138 0.000 1.138 0.000 /Users/paulcollins/personal/django/django/db/models/options.py:243(setup_pk) 3849324 0.585 0.000 0.585 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:583(get_attname) 3849324 2.124 0.000 2.709 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:586(get_attname_column) 2165556 11.071 0.000 40.730 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:1305(__init__) 2159422 4.478 0.000 70.027 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:1385(deconstruct) 1872034 0.455 0.000 0.455 0.000 /Users/paulcollins/personal/django/django/apps/config.py:142(check_models_ready) 763486 0.833 0.000 0.833 0.000 /Users/paulcollins/personal/django/django/dispatch/weakref_backports.py:47(__call__) 758037 2.113 0.000 2.113 0.000 /Users/paulcollins/personal/django/django/db/models/signals.py:20(_resolve_references) 266754 0.076 0.000 0.076 0.000 /Users/paulcollins/personal/django/django/utils/lru_cache.py:23(__hash__) 263798 0.347 0.000 0.347 0.000 /Users/paulcollins/personal/django/django/db/models/related.py:51(get_accessor_name) 217994 0.057 0.000 0.059 0.000 /Users/paulcollins/personal/django/django/db/models/options.py:28(normalize_together) 216582 5.039 0.000 5.039 0.000 /Users/paulcollins/personal/django/django/db/models/base.py:34(subclass_exception) 213582 1.449 0.000 4.730 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:858(__init__) 213104 0.155 0.000 0.155 0.000 /Users/paulcollins/personal/django/django/db/migrations/state.py:279(<genexpr>) 211422 0.825 0.000 8.932 0.000 /Users/paulcollins/personal/django/django/db/models/fields/__init__.py:880(deconstruct) 210621 0.519 0.000 0.519 0.000 /Users/paulcollins/personal/django/django/utils/functional.py:15(curry) 180288 0.032 0.000 0.032 0.000 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1197(is_hidden) 178488 0.042 0.000 0.103 0.000 /Users/paulcollins/personal/django/django/db/models/options.py:518(<genexpr>) 177900 0.628 0.000 89.448 0.001 /Users/paulcollins/personal/django/django/utils/lru_cache.py:94(wrapper) 177900 0.732 0.000 0.949 0.000 /Users/paulcollins/personal/django/django/utils/lru_cache.py:26(_make_key) 177899 0.217 0.000 0.217 0.000 /Users/paulcollins/personal/django/django/utils/lru_cache.py:19(__init__) 177531 3.622 0.000 12.193 0.000 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1586(__init__) 177531 1.120 0.000 4.835 0.000 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1293(__init__) 177531 2.989 0.000 2.989 0.000 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1181(__init__) 177531 0.747 0.000 3.736 0.000 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1224(__init__) 176832 3.824 0.000 94.994 0.001 /Users/paulcollins/personal/django/django/db/models/fields/related.py:276(swappable_setting) 174312 2.177 0.000 100.941 0.001 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1361(deconstruct) 174312 1.432 0.000 102.373 0.001 /Users/paulcollins/personal/django/django/db/models/fields/related.py:1641(deconstruct) 173472 0.186 0.000 0.186 0.000 /Users/paulcollins/personal/django/django/db/migrations/writer.py:31(__new__)
This profile run was collected against 394053ce605a30c742d6270a80986b255adb3a30
comment:7 by , 10 years ago
On my system, I realized that each migration added to the Django test suite adds up 3.5 sec. to the initial test database initialization (both with SQLite and PostgreSQL).
comment:8 by , 10 years ago
In my profiling, I found that the most costly is rendering of the whole project state, at least two times for each migration, once for MigrationExecutor.detect_soft_applied
and one or two times in Operation.database_forwards
(27.5 secs cum time for 17 calls in Django's test suite, 8 migrations).
Andrew, it would be interesting to know if you tried to optimize this, and particularly if ProjectState.apps
could be reused/mutated or if this is a lost battle.
comment:9 by , 10 years ago
I suspect there is definitely some possible reuse, but when I tried some optimisations in that area before it got very hard to debug (as the bugs are side-effects of one thing affecting something else called four migrations later, for example).
I suspect we could at least cache the project states for each migration, if not start doing incremental renders (i.e. starting with the previous cached state not from zero) to make things faster (basically, bring in some memoization/dynamic programming). I just didn't have the time or energy to do that during the main release cycle.
comment:10 by , 10 years ago
Summary: | migrations slow (due to double working?) → Migrations optimizer slow (due to double working?) |
---|
comment:11 by , 10 years ago
Slowness about the migrate
process itself is separately tracked in #23745.
comment:12 by , 10 years ago
Please take a look at this related PR: https://github.com/django/django/pull/3540
comment:13 by , 10 years ago
This PR https://github.com/django/django/pull/3545 speeds up the optimizer by about 60% simply by pulling out the creation of two utility lists from function to module scope and caching the result of calling .lower() on the .name, .old_name, .new_name and .module_name attributes in Operation subclasses.
comment:14 by , 10 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
comment:15 by , 10 years ago
Version: | 1.7-beta-2 → master |
---|
comment:16 by , 10 years ago
Patch needs improvement: | unset |
---|
comment:19 by , 10 years ago
Has patch: | unset |
---|
Committed the PR, but no doubt more work can be done on this so leaving the ticket open.
comment:20 by , 10 years ago
It would be nice to get an up-to-date profiling report with a big project to know the progress done with recent commits.
comment:21 by , 10 years ago
my project, which spawned this ticket has certainly improved
when i raised the ticket:
17353606 function calls (17308702 primitive calls) in 39.818 seconds
vs
3650282 function calls (3640151 primitive calls) in 9.895 seconds
same code today (didn't re-run makemigrations
; they are mostly the same 38, + 10 new ones added in django
4268279 function calls (4178349 primitive calls) in 9.825 seconds
vs
3907185 function calls (3829169 primitive calls) in 9.213 seconds
(same old slow laptop)
details:
Wed Jan 7 22:37:59 2015 separate.dat 4268279 function calls (4178349 primitive calls) in 9.825 seconds Ordered by: internal time List reduced from 553 to 15 due to restriction <15> ncalls tottime percall cumtime percall filename:lineno(function) 28160 1.399 0.000 2.268 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:336(deconstruct) 471 1.292 0.003 1.292 0.003 {method 'execute' of 'psycopg2._psycopg.cursor' objects} 5009 0.800 0.000 1.256 0.000 /Users/david/dev/django_source/django/db/models/fields/related.py:309(swappable_setting) 28672 0.738 0.000 1.036 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:137(__init__) 855100/849476 0.569 0.000 0.576 0.000 {getattr} 19325 0.305 0.000 0.448 0.000 /Users/david/dev/django_source/django/db/models/options.py:721(_expire_cache) 94 0.292 0.003 0.292 0.003 {method 'commit' of 'psycopg2._psycopg.connection' objects} 31534 0.259 0.000 5.680 0.000 /Users/david/dev/django_source/django/db/migrations/state.py:396(construct_fields) 407619 0.189 0.000 0.232 0.000 /Users/david/dev/django_source/django/db/models/options.py:348(swapped) 61368/188 0.177 0.000 0.785 0.004 /usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py:145(deepcopy) 690682 0.144 0.000 0.147 0.000 {method 'get' of 'dict' objects} 150051 0.133 0.000 0.133 0.000 {delattr} 144008/144006 0.128 0.000 0.299 0.000 {isinstance} 166461 0.127 0.000 0.127 0.000 {method 'update' of 'dict' objects} 50220 0.107 0.000 0.224 0.000 /Users/david/dev/django_source/django/apps/config.py:168(get_models)
squashed.dat% stats 15 Wed Jan 7 22:44:14 2015 squashed.dat 3907185 function calls (3829169 primitive calls) in 9.213 seconds Ordered by: internal time List reduced from 546 to 15 due to restriction <15> ncalls tottime percall cumtime percall filename:lineno(function) 392 1.457 0.004 1.458 0.004 {method 'execute' of 'psycopg2._psycopg.cursor' objects} 25440 1.202 0.000 1.980 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:336(deconstruct) 4832 0.773 0.000 1.226 0.000 /Users/david/dev/django_source/django/db/models/fields/related.py:309(swappable_setting) 26028 0.758 0.000 1.028 0.000 /Users/david/dev/django_source/django/db/models/fields/__init__.py:137(__init__) 773770/768262 0.514 0.000 0.520 0.000 {getattr} 21612 0.339 0.000 0.495 0.000 /Users/david/dev/django_source/django/db/models/options.py:721(_expire_cache) 28422 0.239 0.000 5.281 0.000 /Users/david/dev/django_source/django/db/migrations/state.py:396(construct_fields) 397942 0.185 0.000 0.227 0.000 /Users/david/dev/django_source/django/db/models/options.py:348(swapped) 52884/164 0.157 0.000 0.704 0.004 /usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py:145(deepcopy) 168181 0.146 0.000 0.146 0.000 {delattr} 619322 0.135 0.000 0.138 0.000 {method 'get' of 'dict' objects} 130099/130097 0.115 0.000 0.265 0.000 {isinstance} 151382 0.114 0.000 0.114 0.000 {method 'update' of 'dict' objects} 52222 0.110 0.000 0.236 0.000 /Users/david/dev/django_source/django/apps/config.py:168(get_models) 4371 0.089 0.000 0.459 0.000 /Users/david/dev/django_source/django/db/models/fields/related.py:1803(__init__)
comment:22 by , 10 years ago
Cc: | added |
---|
comment:23 by , 9 years ago
Here's a cProfile of manage.py makemigrations --empty myapp
on my i5 macbook with SSD.
- Django 1.9.1
- ~80 migrations
- ~300 tables
66325365 function calls (65778093 primitive calls) in 157.981 seconds
comment:24 by , 6 years ago
This might be of interest to the OP and people which contributed to this discussion particularly those which have posted reports of of their big Django projects with many migrations:
There is more work on this migrations efficiency front:
- Ticket: #29243
- Pull request at GitHub: https://github.com/django/django/pull/9804
- Post to the django-dev mailing list asking for test of the proposed changes (includes a profiling script): https://groups.google.com/d/msg/django-developers/PT86wX-oHKo/ZNRxt_E8BwAJ
comment:25 by , 6 years ago
Things have improved significantly since this ticket was opened 4 years ago.
makemigrations
past improvements include
- Multiple tickets to avoid or delay model rendering during operation's
state_forwards
. - #24554: Sped up migrations by rendering initial apps when they are first needed
- #29243: Improve efficiency of migration graph algorithm
the last remaining path that causes slowdown during the makemigrations
phase is the model rendering caused by the auto-detector. I just filed a ticket to cover that in #29899.
The main slowdown during the migrate
phase is the heavy model rendering required to pass fake models to the schema editor. Markus Holtermann have a long standing branch to completely stop rendering models to during the migration phase. That includes significant changes to the schema editor to be able to operate on model states instead of rendered model classes but it's a good step forward IMO. I just filed a ticket to track this optimization in #29898.
Not sure if there is anything actionable in this ticket at this point. I'd suggest we close it and open more focused ones instead.
comment:26 by , 6 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Just to add a bit of color to this report, I've got an app with about 200 models (it's for tracking some US tax data so please don't hate me).
Running makemigrations on this takes 10 minutes so I too pulled out the profiler just to get an idea of what was going on.
Just slightly different ways of slicing the same data.
This was for an app with no prior migrations and which only FK's out to the auth.user model.
If I do something horrible, like say checkout 59fec1ca9b3c426466f0c613a5ecf2badb992460 and go and modify django/db/migrations/optimizer.py so that can_optimize_through always returns false, well then building the migrations takes all of 8 seconds. If instead I go and poke around in django/db/migrations/operations/models.py and change the CreateModel.references_model method to always return False that drops the execution time back to reasonable as well. I haven't quite grokked what downstream effects that has yet, but I'm sure they're bad.
@andrew I hope this gives enough detail for you to see what's going on with the optimizer, otherwise is there I way I can turn it off for an app short of monkeypatching?