#24366 closed Cleanup/optimization (fixed)
Improve migration dependency graph speed
Reported by: | Markus Holtermann | Owned by: | Marten Kenbeek |
---|---|---|---|
Component: | Migrations | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Markus Holtermann, Marten Kenbeek | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
With a bout 750 migration files and a total of about 1250 dependency definitions I end up with the following cProfile stats when calling executor.migrate_plan(targets)
in the migrate command:
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 929.770 929.770 <string>:1(<module>) 38 0.001 0.000 229.709 6.045 __init__.py:41(__init__) 63854649 100.666 0.000 101.193 0.000 __init__.py:58(__setitem__) 16077 0.009 0.000 0.009 0.000 __init__.py:83(__iter__) 38 0.000 0.000 0.000 0.000 _collections_abc.py:437(keys) 38 0.000 0.000 0.000 0.000 _collections_abc.py:462(__init__) 16077 0.007 0.000 0.016 0.000 _collections_abc.py:481(__iter__) 38 103.342 2.720 229.708 6.045 _collections_abc.py:581(update) 76 0.000 0.000 0.000 0.000 _weakrefset.py:70(__contains__) 38 0.000 0.000 0.001 0.000 abc.py:178(__instancecheck__) 38 0.001 0.000 229.710 6.045 datastructures.py:13(__init__) 63854687 25.172 0.000 25.172 0.000 datastructures.py:14(<genexpr>) 38 0.000 0.000 0.001 0.000 datastructures.py:28(__iter__) 1 0.051 0.051 929.770 929.770 executor.py:23(migration_plan) 12 0.000 0.000 0.000 0.000 executor.py:49(<genexpr>) 38 0.350 0.009 0.504 0.013 graph.py:109(ensure_not_cyclic) 38 251.357 6.615 929.097 24.450 graph.py:129(dfs) 38 0.622 0.016 929.719 24.466 graph.py:58(forwards_plan) 63909407 102.207 0.000 141.411 0.000 graph.py:67(<lambda>) 1 0.000 0.000 929.770 929.770 {built-in method exec} 38 0.000 0.000 0.000 0.000 {built-in method hasattr} 38 0.000 0.000 0.001 0.000 {built-in method isinstance} 38 0.000 0.000 0.000 0.000 {built-in method iter} 114 0.000 0.000 0.000 0.000 {built-in method len} 16077 0.527 0.000 0.527 0.000 {built-in method proxy} 63854661 259.600 0.000 259.600 0.000 {built-in method sorted} 408 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects} 38 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects} 26476 0.006 0.000 0.006 0.000 {method 'append' of 'list' objects} 63854611 10.834 0.000 10.834 0.000 {method 'appendleft' of 'collections.deque' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 63854611 26.316 0.000 26.316 0.000 {method 'extendleft' of 'collections.deque' objects} 63909419 39.204 0.000 39.204 0.000 {method 'get' of 'dict' objects} 38 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects} 38 0.000 0.000 0.000 0.000 {method 'keys' of 'dict' objects} 28690 0.010 0.000 0.010 0.000 {method 'pop' of 'list' objects} 2622 0.001 0.000 0.001 0.000 {method 'pop' of 'set' objects} 63854611 9.471 0.000 9.471 0.000 {method 'popleft' of 'collections.deque' objects} 26068 0.014 0.000 0.014 0.000 {method 'remove' of 'set' objects}
I think this applies to 1.7+, though I only tested it on the current master
Change History (16)
comment:1 by , 10 years ago
Cc: | added |
---|
comment:2 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 10 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:5 by , 10 years ago
Sure. There are still a few kinks to work out (e.g. the CircularDependencyError
message throws another error), but I believe the generated graphs satisfies the requirements. There are still 5 test failures.
The biggest issue here is that migrations.test_graph.GraphTests.test_dfs
fails because it reaches the maximum recursion depth. This is expected and a result of switching (back) to a recursive design. I haven't looked into memory usage yet, I'm pretty sure memory usage is increased significantly, but I don't know how much.
comment:6 by , 10 years ago
The caching does wonders when performing searches in super/subtrees of previously performed searches, but the algorithm itself is a real downgrade from the previous one.
I'm also looking into partially ordered sets to see if that is any faster, I'll let you know if anything comes out of it.
comment:7 by , 10 years ago
Has patch: | set |
---|
My current results: 991 migrations, 2280 dependencies, and it takes about 15 seconds to generate a full migration plan with python 2.7, about 20 seconds with python 3.4. In both cases, about 75% of that time goes to the initialization of the internal OrderedDict
of the OrderedSet
in Node.ancestors
and Node.descendants
.
Rich ordering based on node.key
resulted in the exact same ordering as the previous algorithm.
comment:8 by , 10 years ago
Thanks for your effort. This looks really promising! I assume the test case marked as expected failure is due to the step back to an recursive approach? Do you think it would be possible to make your approach an iterative solution as well? If we don't loose caching we could get rid of a couple of more seconds due to expensive function calls.
comment:9 by , 10 years ago
Patch needs improvement: | set |
---|
comment:10 by , 10 years ago
Needs tests: | set |
---|
comment:11 by , 10 years ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
Triage Stage: | Accepted → Ready for checkin |
comment:12 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Preliminary results: from 25 minutes to 2 seconds with an implementation based on a
Node
class and a recursive, cachedget_ancestors()
method.The order isn't the exact same, but the following holds true: