This site is now 100% read-only, and retired.

XML logo

Managing database updates
Posted by Steve on Fri 2 Feb 2007 at 03:17

One thing I'm curious about is how people doing incremental development on "live" sites handle changes in their database schemas.

The code behind this site sees semi-regular updates to the database tables which are used, slowly working towards a better structure than the historically bad one.

This means that code is updated, tested locally, then pushed out to the live site.

There is no facility for showing differences in the tables, etc. Just a static dump which is updated and stored in CVS - which is useful for a fresh installation, but not for people upgrading from older copies of the code.

Generally my development looks like this:

  • Plan the development of a new feature.
  • Make the required database changes, using "alter table", et al.
  • Write a test case or two.
  • Write the code.
  • Commit the code + test case(s).
  • Test.
  • Update the live site from the CVS repository, manually running the upgrade SQL after running "cvs upgrade", flushing the memcache, and reloading Apache if there have been any new rewrite rules installed.

This works for me because I control both the test environment, the CVS repository, and the live site.

But for anybody else running the code they'll going to have to reverse engineer any upgrade statements from the static table dumps. Which may be non-trivial.

For example the upgrade I'm going to run tomorrow will look like this:

# give each user an ID.

# Add the key on username, just because it is still heavily used.
ALTER TABLE users ADD KEY (`username`);

# create a generic preferences table.  Inefficient, but extensible.
CREATE TABLE `preferences` 
`id`        int(11) NOT NULL auto_increment,
 user_id    int(11),
 pref_name  varchar( 25 )  default '',
 pref_value varchar( 125 ) default '',
 PRIMARY KEY  (`id`),
 KEY( `user_id` )  ,
 KEY( `pref_name` )  

# migrate users stylesheet preferences.
insert into preferences (user_id,pref_name,pref_value)
   (SELECT id,"stylesheet", stylesheet from users WHERE stylesheet != '');
alter table users drop stylesheet;

What does that do? It:

  • Adds a new "ID" column to the users table.
  • Creates a "preferences" table which will be keyed on the new user id column.
  • Inserts the current stylesheet setting for each user who is using a custom CSS file
  • Removes the "stylesheet" column from the users table - where it should never have lived.

Why do I need the preferences table? So that I can store the users chosen text formatting option. Something I've been working on for a day or two. It is also the best place to store peoples advertising preferences, display options, etc.

I could imagine storing a database schema-version number somewhere and a collection of upgrade scripts - like Ruby on Rails does, but I wonder if there are other approaches I should investigate?

Any tips or pointers welcome.


Comments on this Entry

Re: Managing database updates
Posted by dkg (216.254.xx.xx) on Fri 2 Feb 2007 at 05:28
[ View Weblogs ]
For one project i've worked on, we decided that the file containing the database schema could never be edited, just appended to.

this makes the schema difficult to read for folks who expect the current schema to be visible, but it actually makes the upgrades really easy: just feed the diff between revisions (without the diff markup) directly into the database engine on the live site.

If folks want to see a clean view of the current schema, that's something you can generate: just create an empty database, feed it the schema file, dump the schema, and drop the database. This strategy treats the schema history like any other source file under revision control: if you want a processed/compiled version you need to apply some post-processing to get it. You could even include a makefile to auto-generate it (assuming you have database creation privileges) before packing the code for publication.

[ Parent ]

Re: Managing database updates
Posted by Steve (80.68.xx.xx) on Fri 2 Feb 2007 at 10:43
[ View Weblogs ]

That is a pretty neat solution, which I guess does the job!

That would suggest I have two files, the "initial" file which creates the tables, and then the "upgrade" file showing the changes which had been applied since that starting point.

(The upgrades should be careful not to disturb any existing data).

The only downside I see to this approach is that you really have to be careful that you don't run the upgrade.sql script when it no longer applies - hence your suggestion to use the diff.


Initial State
CREATE TABLE `users` (
  `username` VARCHAR(25) NOT NULL DEFAULT  '',
Upgrade File
UPDATE USERS SET username = 'skx' + username WHERE id=1;

Running that more than once would be a bad thing.

I think having versioned schemas would solve that problem. Keep the pristine starting point, and then create a new upgrade file for each new revision - these could then be used to upgrade from X to schema X+N, and since it would be scripted it would be possible to ensure things weren't run more than once.


[ Parent ]

Re: Managing database updates
Posted by dkg (216.254.xx.xx) on Fri 2 Feb 2007 at 15:03
[ View Weblogs ]
That would suggest I have two files,

As long as you only apply diffs, you can treat the whole thing as a single file, i think. The updates just append to the existing original schema.

Seen another way, the "original schema" is just an "update" to a blank database.

[ Parent ]

Re: Managing database updates
Posted by Steve (80.68.xx.xx) on Fri 2 Feb 2007 at 15:22
[ View Weblogs ]

True .. I just dislike the idea of having to use diff - especially when considering upgrading multiple times.


[ Parent ]

Re: Managing database updates
Posted by redbeard (216.49.xx.xx) on Thu 19 Apr 2007 at 14:56
[ View Weblogs ]

Sorry I didn't check this out sooner... I'm way behind on my feeds :)

I don't now what your backend is written, but there is a PEAR package for PHP called MDB2_Schema that is designed to handle just this. I haven't used it in production yet (no time... we're doing something similar to what you're currently doing at the moment). I have played with it some, though.

Basically, you do the following.

  1. Take snapshots of the schema for every release
  2. Use MDB2_Schema to apply changes between releases

MDB2_Schema will actually grab the current schema from the database to update and figure out what needs to be done to bring it to the new schema.

Again, I don't know how well it works in practice, but the theory looks good.


[ Parent ]

Re: Managing database updates
Posted by Steve (80.68.xx.xx) on Thu 19 Apr 2007 at 14:59
[ View Weblogs ]

Thanks for the comment!

The Yawns backend is all perl-based, and I've been very happy with the way that the incremental updates have worked out.


[ Parent ]