Broken Migrations and Missing schema.rb

ActiveRecord Migrations are slick.  They allow easy access to a powerful database with minimal knowledge of SQL, and they promise an easy way to reconstruct a DB schema.  Unfortunately, they’re fragile in the face of large teams and code that has changed significantly.

Migrations and schema.rb

There are two ways migrations can reconstruct a schema into a fresh DB:

  1. Restoring from db/schema.rb (or db/development_structure.sql)
  2. Replaying the full migration chain against a properly prepared empty template database

In small projects, or projects without many concurrent branches of development, schema.rb is authoritative. Branching, however, leads to guaranteed conflicts in schema.rb (on the schema_version). Therefore, schema.rb is often omitted from source-control.

In the absence of schema.rb, the complete migration chain is the only authoritative representation of the schema.

Breaking the Chain

However, the complete chain can be broken under a number of circumstances:

  • Concurrent development: Developers in different branches can check in migrations that semantically conflict, even though they don’t textually conflict (e.g., both rename column A, one to B and the other to C).
  • Stale Migrations: Migrations from the past often reference old models or other code that’s been deleted from the application, especially if they migrate data.  The “right way” is to make migrations self-contained, but it’s not obvious that you need to do this until a non-self-contained migration has already been applied.  Regardless, it’s a little tricky to test data migrations, so failures go unnoticed. See my earlier post about data migrations for an example.   The work involved in fixing old migrations, while noble, is hard to justify in the face of crunched schedules — it is by definition hard to relate to business value.

Large, long-lived projects inevitably encounter one or both of these situations. So, they fall back to using a live database instance as an authoritative schema — hopefully a protected DB whose only purpose is to remember the schema, but often this ends up being a dev, production or staging server.

Solution:  Schema Capture

We’ve developed an approach that lets you keep an authoritative schema in source-control, ignore broken old migrations that aren’t self-contained, and avoid checking in a schema file as a rollup migration:

  1. Capture a schema file from your development DB
  2. Check it in, named for the branch (and a timestamp if you anticipate conflicts, or use an auto-merge tool).  For example:  db/schema/master-201107160123.rb 

You can do this capture once, and automate the load with some rake tasks — thereby isolating your developers from broken migrations without adding a rollup migration that you need to QA against your production DB.  Or you can install it as a git pre-commit hook to run it on every checkin.

We’ve implemented a “smart” schema capture tool as set of rake tasks and published them in this gist.  We’ll soon be rolling them into the tddium command.

The tddium:db:capture task can be run standalone, or from a pre-commit hook.  It accounts for ActiveRecord::Base.schema_format, and it only adds a schema file if the database has actually changed.  The gist also contains an example of how to integrate this schema capture into Tddium using tddium:db_hook. To start using it now with Tddium:

  1. install tddium.rake into lib/tasks/
  2. Run rake tddium:db:capture
  3. Commit the file created in db/schema/
  4. Start tests in Tddium using tddium spec

Post a Comment