I like the general direction of introspection + migration, so I thought I'd go into some detail about the API itself and suggest a few changes. Finally, given positive feedback, I would build this, if no one else is active on it.
Evolution File
The most basic change I'd make is to put, by default, all the migration code into one file right next to models.py, called something like "models.evolution.py". We could certainly also split the file up, but I think it would be a lot simpler and nicer if we didn't have a bunch of files being created. The API would consist of one function with keyword options: evolve(). It 'evolves' the database to match the current app models (so evolution is done on the app level, as opposed to the model level). evolve() would take one required keyword argument, version, indicating which version of the model this evolution is for. So if the model is at, say, version 5, and evolve(version=4) is hit, that evolution would be ignored. evolve() would also take the keyword arguments create, drop, and rename, which create, drop, and rename tables/models. create and drop take a list of strings naming the models to change, and rename takes a mapping of string to string, old model name to new model name. It also takes pre and post, which may be assigned functions that execute commands before and after the changes are made to the database.
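To pin down the argument rules above, here is a minimal sketch of evolve() that only validates its keywords and returns a record of the requested step. The Evolution dataclass and the specific error messages are hypothetical; making version keyword-only mirrors "one required keyword argument", and who actually runs the step is left to syncdb.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Evolution:
    """Hypothetical record of one evolve() call."""
    version: int
    create: List[str] = field(default_factory=list)
    drop: List[str] = field(default_factory=list)
    rename: Dict[str, str] = field(default_factory=dict)
    pre: Optional[Callable] = None
    post: Optional[Callable] = None


def evolve(*, version, create=(), drop=(), rename=None, pre=None, post=None):
    """Validate one evolution step; version must be passed as a keyword."""
    if not isinstance(version, int) or version < 1:
        raise ValueError("version must be a positive integer")
    for name, models in (("create", create), ("drop", drop)):
        # A bare string would iterate as characters, so reject it explicitly.
        if isinstance(models, str) or not all(isinstance(m, str) for m in models):
            raise TypeError(f"{name} takes a list of model names")
    rename = dict(rename or {})
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in rename.items()):
        raise TypeError("rename maps old model names to new model names")
    for hook in (pre, post):
        if hook is not None and not callable(hook):
            raise TypeError("pre/post must be callables taking (models, cursor)")
    return Evolution(version, list(create), list(drop), rename, pre, post)


step = evolve(version=2, rename={"Blog": "Entry"})
```

Calling evolve(2) fails with a TypeError because version is keyword-only, which keeps evolution files readable.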
Process
The user would perform the evolution with a two-step process. First they would run ./manage.py evolve [app], which would inspect each app, or just the specified one, for changes, and then append those changes to "models.evolution.py". It would then explain, in natural language (e.g. English), all the changes that are to be made. Perhaps at this stage it should also suggest backing up the database and/or offer to do it for the user. Finally the user would run ./manage.py syncdb as usual. This would now have the extra functionality of running all the "models.evolution.py" files in the apps. If there is no change, each evolve() would return silently, as its version keyword would be no higher than the current version. Otherwise, each evolve() with a higher version would run, making its changes to the database and bumping the current version to its own.
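The syncdb side of this process can be sketched as follows. Because "models.evolution.py" contains a dot, it can't be imported by name, so this sketch assumes syncdb reads the file's source and exec()s it with a collecting evolve() in scope. The names run_evolution_source and apply_step are hypothetical.

```python
def run_evolution_source(source, current_version, apply_step):
    """Run every evolve() step in `source` newer than current_version."""
    steps = []

    def evolve(*, version, create=(), drop=(), rename=None, pre=None, post=None):
        # Only collect here; ordering and skipping happen below.
        steps.append({"version": version, "create": list(create),
                      "drop": list(drop), "rename": dict(rename or {}),
                      "pre": pre, "post": post})

    exec(source, {"evolve": evolve})
    for step in sorted(steps, key=lambda s: s["version"]):
        if step["version"] <= current_version:
            continue  # already applied: skip silently
        apply_step(step)
        current_version = step["version"]
    return current_version


SOURCE = '''
evolve(version=2, rename={'Blog': 'Entry'})
evolve(version=3, create=['Tag'])
'''

applied = []
new_version = run_evolution_source(SOURCE, 2, lambda s: applied.append(s["version"]))
print(new_version, applied)  # 3 [3]
```

Sorting by version means the file could be appended to in any order without changing which steps run.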
Model Shadow
Each models file will also have a sort of shadow that is created each time syncdb updates the database. It stores the current models and version, so changes can be detected when the user edits models.py. This could be implemented either as a static file, say ".models.shadow.py" (easier), or in the database (trickier).
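A minimal sketch of the static-file option, assuming the shadow stores each model as a dict of field name to field definition string plus the version number. The layout and the function names are hypothetical; pprint.pformat writes a Python literal that ast.literal_eval can read back without executing code.

```python
import ast
import os
import pprint
import tempfile


def save_shadow(path, version, models):
    """Write the synced models and version as a Python literal."""
    with open(path, "w") as f:
        f.write(pprint.pformat({"version": version, "models": models}) + "\n")


def load_shadow(path):
    """Return the last synced snapshot, or None before the first sync."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return ast.literal_eval(f.read())


def detect_changes(shadow, current_models):
    """True if models.py no longer matches what was last synced."""
    return shadow is None or shadow["models"] != current_models


blog = {"Blog": {"title": "CharField(maxlength=60)", "body": "TextField()"}}
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, ".models.shadow.py")
    save_shadow(path, 1, blog)
    shadow = load_shadow(path)
```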
Pre and Post
Here is where custom work can be done to make any sort of change the user wishes. The functions given to evolve() as the keywords pre and post must take two arguments: models and cursor. cursor is simply the database cursor, so that custom SQL commands can be run. models is a module holding the models in the app. But here's the rub: for post the models come from the current models.py file, but for pre they come from the model shadow, as that matches the state of the database before the changes are made.
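The ordering described above (pre sees the shadow, then the schema changes, then post sees the current models) can be sketched with sqlite3. The run_step helper is hypothetical, and the SimpleNamespace objects with a table attribute stand in for the real models modules, which would hold model classes instead.

```python
import sqlite3
import types


def run_step(cursor, shadow_models, current_models, schema_sql, pre=None, post=None):
    """Hypothetical order of operations for one evolve() step."""
    if pre is not None:
        pre(shadow_models, cursor)    # database still matches the shadow
    for statement in schema_sql:
        cursor.execute(statement)     # the create/drop/rename changes
    if post is not None:
        post(current_models, cursor)  # database now matches models.py


def row_count(cursor, table):
    return cursor.execute(f"SELECT count(*) FROM {table}").fetchone()[0]


log = []


def pre(models, cursor):
    log.append(("pre", models.table, row_count(cursor, models.table)))


def post(models, cursor):
    log.append(("post", models.table, row_count(cursor, models.table)))


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("INSERT INTO blog (title) VALUES ('hello')")

run_step(cur,
         shadow_models=types.SimpleNamespace(table="blog"),
         current_models=types.SimpleNamespace(table="entry"),
         schema_sql=["ALTER TABLE blog RENAME TO entry"],
         pre=pre, post=post)
print(log)  # [('pre', 'blog', 1), ('post', 'entry', 1)]
```

If pre were handed the current models instead, it would look for a table that doesn't exist yet, which is why the shadow matters.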
Example
Imagine we are creating a very simple blog application. We naively decide to make one model called 'Blog', which is a blog entry. We want to give it title, body, and pub_date fields. So we create a model that looks like so:
    from django.db import models

    class Blog(models.Model):
        title = models.CharField(maxlength=60)
        body = models.TextField()
        pub_date = models.DateTimeField(auto_now_add=True)
Later on, we realize that what we are calling a "Blog" object is really a blog "Entry" object. We also decide that we want to give it a "tag" field, so that we can apply tags to each entry. So we make the changes in the model, and add "tag = models.CharField(maxlength=20)". ./manage.py evolve realizes that the fields of what it sees as a newly created "Entry" model almost match the fields of the newly deleted "Blog" model, so it puts two and two together and realizes that it should rename "Blog" to "Entry", and add the tag field to it. It creates this script:
    #### VERSION 2 #####
    evolve(
        version=2,
        rename={'Blog': 'Entry'},
    )
Since this is the first update of our model, it will update to version 2. ./manage.py syncdb now creates a shadow file (or updates entries in a database shadow), and sets the version to 2.
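The "two and two" guess that turns a dropped Blog and a created Entry into a rename could work along these lines. This sketch assumes models are compared as sets of (field name, definition) pairs using Jaccard similarity, with a hypothetical 0.5 threshold; both the measure and the threshold are illustrative choices, not part of the proposal.

```python
def similarity(a, b):
    """Jaccard similarity of two models' (field name, definition) pairs."""
    a_items, b_items = set(a.items()), set(b.items())
    union = a_items | b_items
    return len(a_items & b_items) / len(union) if union else 1.0


def guess_model_renames(old, new, threshold=0.5):
    """Pair each dropped model with the most similar newly created model."""
    dropped = [m for m in old if m not in new]
    created = [m for m in new if m not in old]
    renames = {}
    for d in dropped:
        candidates = [(similarity(old[d], new[c]), c)
                      for c in created if c not in renames.values()]
        if candidates:
            score, best = max(candidates)
            if score >= threshold:
                renames[d] = best
    return renames


OLD = {"Blog": {"title": "CharField(maxlength=60)", "body": "TextField()",
                "pub_date": "DateTimeField(auto_now_add=True)"}}
NEW = {"Entry": {"title": "CharField(maxlength=60)", "body": "TextField()",
                 "pub_date": "DateTimeField(auto_now_add=True)",
                 "tag": "CharField(maxlength=20)"}}
print(guess_model_renames(OLD, NEW))  # {'Blog': 'Entry'}
```

Here three of the four distinct fields match (0.75), so the rename is proposed and the leftover "tag" field becomes an ordinary field addition.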
Much later, after we've already created many blog entries, we realize that our tag field is not sufficient. It only allows us to add one tag! What we really need is a Tag model, and a ManyToManyField with our Entry model. So we update our models.py file:
    from django.db import models

    class Tag(models.Model):
        name = models.CharField(maxlength=32)

    class Entry(models.Model):
        tags = models.ManyToManyField(Tag)
        title = models.CharField(maxlength=60)
        body = models.TextField()
        pub_date = models.DateTimeField(auto_now_add=True)
Then we run ./manage.py evolve, and it explains what it wants to do:
    Applying this evolution will:
    Create the model Tag.
    Drop the Entry field "tag".
    Add the ManyToMany field "tags" to Entry referring to Tag.
It also appends to the evolution file, which now reads:
    #### VERSION 2 #####
    evolve(
        version=2,
        rename={'Blog': 'Entry'},
    )

    #### VERSION 3 #####
    evolve(
        version=3,
        create=['Tag'],
    )
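The plain-English summary that evolve prints could be derived from a straight diff of the shadow against the current models. This sketch assumes the same hypothetical dict-of-definition-strings representation; its wording for the many-to-many field is generic and differs slightly from the message in the example above.

```python
def explain(old, new):
    """Describe, in English, how to get from the `old` models to `new`."""
    lines = ["Applying this evolution will:"]
    lines += [f"Create the model {m}." for m in new if m not in old]
    lines += [f"Drop the model {m}." for m in old if m not in new]
    for m in new:
        if m not in old:
            continue  # brand-new models were already reported above
        before, after = old[m], new[m]
        lines += [f'Drop the {m} field "{f}".' for f in before if f not in after]
        for f in after:
            if f not in before:
                lines.append(f'Add the {m} field "{f}" as a {after[f]}.')
            elif before[f] != after[f]:
                lines.append(f'Change the {m} field "{f}" to a {after[f]}.')
    return "\n".join(lines)


COMMON = {"title": "CharField(maxlength=60)", "body": "TextField()",
          "pub_date": "DateTimeField(auto_now_add=True)"}
OLD = {"Entry": dict(COMMON, tag="CharField(maxlength=20)")}
NEW = {"Tag": {"name": "CharField(maxlength=32)"},
       "Entry": dict({"tags": "ManyToManyField(Tag)"}, **COMMON)}
print(explain(OLD, NEW))
```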
On the next sync, it will then create a table 'Tag', and it will automatically make the changes to Entry, as it is aware of what changes need to be made. Well, that sounds good on the surface, but it means that we will lose all of our already applied tags. We're going to have to move some data around to make a Tag object for each tag already on an entry. So before we run syncdb, we go in and update the models.evolution.py file to look like this:
    #### VERSION 2 #####
    evolve(
        version=2,
        rename={'Blog': 'Entry'},
    )

    #### VERSION 3 #####
    tag_set = set()
    entry_to_tag = {}

    def presync(models, cursor):
        global tag_set, entry_to_tag
        # models comes from the shadow here, so Entry still has its "tag" field.
        # Get every unique tag in our entries.
        tag_set = set(entry.tag for entry in models.Entry.objects.iterator())
        # Map each entry's id to its tag.
        entry_to_tag = dict((entry.id, entry.tag)
                            for entry in models.Entry.objects.iterator())

    def postsync(models, cursor):
        # models now comes from models.py, so Tag exists and Entry has "tags".
        # Create the Tag objects.
        for tag in tag_set:
            models.Tag(name=tag).save()
        # Add a tag to each entry from the earlier mapping.
        for entry in models.Entry.objects.all():
            entry.tags.add(models.Tag.objects.get(name=entry_to_tag[entry.id]))

    evolve(
        version=3,
        create=['Tag'],
        pre=presync,
        post=postsync,
    )
First we create some global variables to hold our data through the database update process. Then we define a presync() function that runs before the database is updated and uses the models from our shadow file. Finally we create a postsync() function that performs the updates after the changes have been made to the database. That's what we want to do, so we run ./manage.py syncdb, and all of our changes have been made, w00t!
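To see the data move without Django, the same presync/schema change/postsync sequence can be replayed against an in-memory SQLite database. This sketch swaps raw SQL through sqlite3 in for the Django ORM calls; the table names (entry, tag, entry_tags) and the table-rebuild approach for dropping the old column are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, tag TEXT)")
cur.executemany("INSERT INTO entry (title, tag) VALUES (?, ?)",
                [("first", "django"), ("second", "python"), ("third", "django")])

saved = {}  # entry id -> old tag, kept across the schema change


def presync(cursor):
    # Old schema: capture each entry's tag before the column disappears.
    saved.update(cursor.execute("SELECT id, tag FROM entry").fetchall())


def schema_change(cursor):
    cursor.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    cursor.execute("CREATE TABLE entry_tags (entry_id INTEGER, tag_id INTEGER)")
    # Drop the old column by rebuilding the table, keeping entry ids intact.
    cursor.execute("CREATE TABLE entry_new (id INTEGER PRIMARY KEY, title TEXT)")
    cursor.execute("INSERT INTO entry_new (id, title) SELECT id, title FROM entry")
    cursor.execute("DROP TABLE entry")
    cursor.execute("ALTER TABLE entry_new RENAME TO entry")


def postsync(cursor):
    # New schema: one Tag row per distinct old tag, then link the entries.
    for name in sorted(set(saved.values())):
        cursor.execute("INSERT INTO tag (name) VALUES (?)", (name,))
    for entry_id, name in saved.items():
        (tag_id,) = cursor.execute("SELECT id FROM tag WHERE name = ?",
                                   (name,)).fetchone()
        cursor.execute("INSERT INTO entry_tags VALUES (?, ?)", (entry_id, tag_id))


presync(cur)
schema_change(cur)
postsync(cur)
rows = cur.execute(
    "SELECT e.title, t.name FROM entry e "
    "JOIN entry_tags et ON et.entry_id = e.id "
    "JOIN tag t ON t.id = et.tag_id ORDER BY e.id").fetchall()
print(rows)  # [('first', 'django'), ('second', 'python'), ('third', 'django')]
```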
Then some guy comes in and starts messing with our app. He REALLY wants comments added, so he goes in and makes some changes, the bastard, without consulting you. He adds a model:
    class Comment(models.Model):
        body = models.CharField(maxlength=1028)
        maker = models.CharField(maxlength=50)
        address = models.CharField(maxlength=30)
He then deletes the evolution file (wtf!?), runs ./manage.py evolve, and then runs syncdb. The evolution file now looks like:
    #### VERSION 4 #####
    evolve(
        version=4,
        create=['Comment'],
    )
First of all, on the DBMS we're running (MySQL), a maxlength of 1028 is well beyond the 255-character VARCHAR limit, past which the column gets converted to a TEXT type anyway. So that should be changed. Second, 'maker' and 'address' are terrible names. You find out they should be 'author' and 'email'. And that email field needs to be enlarged (maxlength > 30) and allowed to be blank, because some people won't leave their email. Fortunately, this project isn't distributed, so deleting the evolution file turns out to do no harm. So you make the changes:
    class Comment(models.Model):
        body = models.TextField()
        author = models.CharField(maxlength=50)
        email = models.CharField(maxlength=128, blank=True)
./manage.py evolve then runs and says:
    Applying this evolution will:
    Rename the Comment field "maker" to "author".
    Change the Comment field "body" to a TextField().
    Add the Comment field "email" as a CharField(maxlength=128, blank=True).
    Drop the Comment field "address".
Well, that's almost what you want. It was able to guess that "maker" needed to be renamed to "author", because the two fields are identical (CharField(maxlength=50)), but "address" and "email" are not, so it tells you that it will drop "address" and add "email". Let's change things around a bit. You decide to delete the evolution file to erase the memory of his changes, and create a new one:
    #### VERSION 5 #####
    evolve(
        version=5,
        rename={'Comment.maker': 'Comment.author',
                'Comment.address': 'Comment.email'},
    )
This is all that is needed to tell syncdb to rename the fields instead of dropping them. Their field types are automatically changed to reflect the new model.
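Both behaviours in this part of the example, guessing a field rename only when the definitions are identical and letting an explicit 'Model.field' rename override the guess, can be sketched together. The representation (dicts of field definition strings) and both function names are hypothetical; explicit renames are applied to the old state first, so the diff then sees a changed field rather than a drop plus an add.

```python
def apply_explicit_renames(before, rename):
    """Apply {'Model.old': 'Model.new'} field renames to a copy of `before`."""
    before = {m: dict(fields) for m, fields in before.items()}
    for src, dst in rename.items():
        src_model, src_field = src.split(".", 1)
        dst_model, dst_field = dst.split(".", 1)
        if src_model != dst_model:
            raise ValueError("a field rename must stay within one model")
        before[src_model][dst_field] = before[src_model].pop(src_field)
    return before


def diff_fields(model, before, after):
    """Describe field changes, guessing renames only for identical definitions."""
    dropped = [f for f in before if f not in after]
    added = [f for f in after if f not in before]
    renames = {}
    for old in dropped:
        match = next((new for new in added if after[new] == before[old]), None)
        if match is not None:
            renames[old] = match
            added.remove(match)
    ops = [f'Rename the {model} field "{o}" to "{n}".' for o, n in renames.items()]
    ops += [f'Change the {model} field "{f}" to a {after[f]}.'
            for f in before if f in after and before[f] != after[f]]
    ops += [f'Add the {model} field "{f}" as a {after[f]}.' for f in added]
    ops += [f'Drop the {model} field "{f}".' for f in dropped if f not in renames]
    return ops


OLD = {"Comment": {"body": "CharField(maxlength=1028)",
                   "maker": "CharField(maxlength=50)",
                   "address": "CharField(maxlength=30)"}}
NEW = {"Comment": {"body": "TextField()",
                   "author": "CharField(maxlength=50)",
                   "email": "CharField(maxlength=128, blank=True)"}}

guessed = diff_fields("Comment", OLD["Comment"], NEW["Comment"])
explicit = apply_explicit_renames(OLD, {"Comment.maker": "Comment.author",
                                        "Comment.address": "Comment.email"})
resolved = diff_fields("Comment", explicit["Comment"], NEW["Comment"])
```

With the version 5 renames applied, the only remaining operations are type changes on "body" and "email", which matches the claim that field types are updated automatically.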
Comments
A few things:
- There is no reason that you couldn't also break migration code up into smaller files, but I think this way is cleaner, and easier to deal with as a default.
- To add functionality for rolling back the changes, one would have to add "pre_rollback" and "post_rollback" keywords to the evolve() function. We'd also have to keep a record of all past models, much like the shadow model. One could then simply assume that on rollback: create -> drop, drop -> create, and rename is reversed.
- If one could keep the shadow models in the database, that would be best, but I'm not sure of the best way to do that.
- Are there any operations other than "create", "drop", and "rename" that might need to be expressed? Obvious things, such as changing field options like "maxlength", aren't ambiguous and don't need directives.
- pbx brings up that we might want to evolve from, say, version 3 to version 4, even if version 5 is available. I'm not sure how to do this with this system, other than commenting out the version 5 code. Maybe there could be a __target_version__ variable at the top of the evolution file that could be altered.
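The rollback rule (create -> drop, drop -> create, rename reversed) and a __target_version__ cut-off could share one planning function. In this sketch the step dicts, plan(), and invert() are hypothetical; pre_rollback and post_rollback are the keywords proposed above. Actually recreating a dropped model would still need the recorded past models, which the sketch does not model.

```python
def invert(step):
    """Undo one step: swap create/drop, reverse renames, use rollback hooks."""
    return {
        "version": step["version"],  # the step being undone
        "create": list(step.get("drop", [])),
        "drop": list(step.get("create", [])),
        "rename": {new: old for old, new in step.get("rename", {}).items()},
        "pre": step.get("pre_rollback"),
        "post": step.get("post_rollback"),
    }


def plan(steps, current, target):
    """Steps needed to move the database from `current` to `target`."""
    ordered = sorted(steps, key=lambda s: s["version"])
    if target >= current:
        # Forward: stop at the target even if newer steps exist.
        return [s for s in ordered if current < s["version"] <= target]
    # Backward: undo the newest applied steps first.
    return [invert(s) for s in reversed(ordered) if target < s["version"] <= current]


STEPS = [
    {"version": 2, "rename": {"Blog": "Entry"}},
    {"version": 3, "create": ["Tag"]},
    {"version": 4, "create": ["Comment"]},
    {"version": 5, "rename": {"Comment.maker": "Comment.author"}},
]

__target_version__ = 4
forward = plan(STEPS, 3, __target_version__)
back = plan(STEPS, 4, 2)
```

Here forward contains only the version 4 step although version 5 exists, and back drops Comment and then Tag.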