#29899 closed Cleanup/optimization (fixed)
Adapt the auto-detector to detect changes from model states instead of model classes
Reported by: | Simon Charette | Owned by: | David Wobrock |
---|---|---|---|
Component: | Migrations | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, David Wobrock | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Rendering models for the sole purpose of detecting changes between them takes a significant amount of time on large project.
Instead of comparing the fake models rendered from the state generated by applying state_forwards
of all existing migrations against the current model state generating model states from the current model classes and comparing both would shave off a model rendering phase.
This was extracted from the meta ticket #22608 which tracks slow parts of the migration framework.
An initial and dusty stab at this can be found here https://github.com/django/django/compare/master...charettes:autodetector-model-states
Note that this has a bit of overlap with #29898 because both optimizations require a way to resolve relations efficiently from model states.
Change History (16)
comment:1 by , 5 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:2 by , 5 years ago
Hello David,
Am I understanding correctly that, by using the ModelState objects, instead of the rendered model objects/classes, we avoid calling the ProjectState.apps and ProjectState.concrete_apps (in the Autodetector.init) which are rendering the models? Therefore, we avoid work and speed up the makemigrations and migrate commands?
Exactly, accessing .apps
and concrete_apps
involves model rendering which is really slow.
But since the "from_state" is still generated from the migration graph, we will still apply all existing migrations iteratively to build it, right? So we cannot totally avoid exploring the migration graph?
Right but from_state
is generated by calling states_forwards
on all operations of on-disk migrations and none of these operations should access .apps
during states_forwards
which should make them relatively fast.
The one thing about the logic which I'm unsure of, is the "relations" property which is introduced in the ProjectState. The goal is, I guess, to substitute the "related_objects" that a model class' option contains.
That's it
However, I don't fully understand how the "proxies", "concrete" models and "swappable" ones work together.
You'll have to do a bit more of investigation on your side to figure this out I'm afraid but the gist is that proxies models are Meta.proxy = True
models which are always backed by a concrete table backed one. swappable refers to models that can be swapped by other ones. Currently only auth.User
uses this feature internally.
I see that we are trying to fill the relations, for each model (identified by app_label+model_name), have the related models (identified by the same key) and the associated fields that have a relation. It's not easy to get your head around the 4 consecutive loops.
Yeah this code is tricky for sure.
What are those relations exactly? All possible defined FK (also m2m, ...) on the model itself? Or also fields FK that point to this model?
IIRC they are all forwards relationships which means related fields from the model.
And would it possible, even quickly, to explain how "swappable", "proxy" and "concrete" work together, I'd be grateful. I mean, I guess that we need to resolve the real/concrete model from a proxy model, but "swappable" is my main source of confusion here and I don't know how it should be handled :/ Like a proxy model because we resolve the concrete model when the User model is swappable?
I tried to explain it above but I suggest you start by reading the (limited) swappable model documentation and then dive into how they are currently handled. A lot of this logic is tribal knowledge share by the limited amount of people who worked on the migration framework over the years and its hard to define beyond what's coded and currently passes the suite. The basic idea is that swappable models could have been swapped by another one (e.g. a custom user model via settings.AUTH_USER_MODEL
) so it needs to be treated differently.
As mentioned in the description there's a lot of overlap with this ticket and #29898 which I tried to describe on the mailing list to a developer interested in working on the latter for GSoC. Both tickets require access to a ProjectState
forwards relationship cache (and maybe a reverse one as well in this case) and both would benefit from having ProjectState
define methods that perform state alterations currently baked into Operation.state_forwards
overrides. That would allow the auto-detector to simply call these operations if necessary when performing renames instead of maintaining tons of different maps of changes. It would also pave the way for ProjectState
/ModelState
to have their own .difference(other, questionner=None)
methods to break the monolithic autodetector
file into testable and extendable components which users have been asking for a while.
comment:3 by , 5 years ago
Patch needs improvement: | unset |
---|
comment:4 by , 5 years ago
Patch needs improvement: | set |
---|
comment:5 by , 4 years ago
Patch needs improvement: | unset |
---|
comment:6 by , 4 years ago
Patch needs improvement: | set |
---|
comment:7 by , 4 years ago
Patch needs improvement: | unset |
---|
comment:8 by , 4 years ago
Patch needs improvement: | set |
---|
comment:9 by , 4 years ago
Patch needs improvement: | unset |
---|
comment:10 by , 4 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
follow-up: 15 comment:14 by , 3 years ago
On this issue, does anyone remember if there was a reason that self.relations
was defined to have this form--
{remote_model_key: {model_key: [(field_name, field)]}}
as opposed to this form?
{remote_model_key: {model_key: {field_name: field}}}
While reviewing PR #14587 for #29898, I noticed the code would be a bit simpler in parts if it were the latter as opposed to the former, but I wasn't sure if there was a reason for a using a list of pairs instead of a dict.
comment:15 by , 3 years ago
Replying to Chris Jerdonek:
On this issue, does anyone remember if there was a reason that
self.relations
was defined to have this form--
{remote_model_key: {model_key: [(field_name, field)]}}
as opposed to this form?
{remote_model_key: {model_key: {field_name: field}}}
While reviewing PR #14587 for #29898, I noticed the code would be a bit simpler in parts if it were the latter as opposed to the former, but I wasn't sure if there was a reason for a using a list of pairs instead of a dict.
Probably because there was no need to update the list of fields. I was also thinking about changing this structure to a dict.
Hi there,
I got more or less my head around what is asked in the ticket and how it would improve the autodetector. However, I have some questions and I'd like to ask for some help and expertise on this issue.
I'll just shoot my questions:
But since the "from_state" is still generated from the migration graph, we will still apply all existing migrations iteratively to build it, right? So we cannot totally avoid exploring the migration graph?
And let's get down to some more technical business :) I started re-working the patch, it's still a work in progress, but I think most of the logic of the autodetector is migrated to using model states. You should be able to look at the patch commit-by-commit for convenience. I'll still need to work on tests, fix the existing ones and adding some new ones to ensure I introduced no regression. Please find my current progress here https://github.com/django/django/compare/master...David-Wobrock:ticket-29899?expand=1
I see that we are trying to fill the relations, for each model (identified by app_label+model_name), have the related models (identified by the same key) and the associated fields that have a relation. It's not easy to get your head around the 4 consecutive loops.
Thanks a lot in advance! I'm assigning myself the issue.