lightweight database migrations using PostgreSQL - 2

2016-01-26 · Computing

In lightweight database migrations using PostgreSQL I outlined some ideas about running database migrations in a simple manner, with minimal dependencies, using the database itself in combination with common language-agnostic utilities. In this post, I detail an update to the ideas previously outlined, fixing various issues and streamlining the overall approach.

creating migrations

stage 0: pre

For the stage previously named initialisation (0_init.sql), I create a stage called pre (0_pre.sql). Although almost all of the code in this file remains the same, this is not simply a rename, but rather a slight alteration in how I’m thinking about the code which runs before and after the migrations. Previously, I considered this stage as just a setup phase, but now I consider this more as one end of a wrapper which surrounds the migrations-proper. However, the only code change is to move the transaction beginning and reporting of the current migration version into this file.

-- database/0_pre.sql

CREATE TABLE IF NOT EXISTS _meta (
    key     VARCHAR     PRIMARY KEY,
    value   VARCHAR
);

INSERT INTO _meta (key, value) VALUES ('version', 0);


CREATE FUNCTION pg_temp.version() RETURNS integer AS $$
    SELECT value::integer FROM _meta WHERE key = 'version';
$$ LANGUAGE sql;

CREATE FUNCTION pg_temp.version(integer) RETURNS boolean AS $$
    UPDATE _meta SET value = $1 WHERE key = 'version';
    SELECT TRUE;
$$ LANGUAGE sql;

CREATE FUNCTION pg_temp.migrate(integer) RETURNS boolean AS $$
    SELECT CASE WHEN pg_temp.version() = $1 - 1 THEN NULL ELSE FALSE END;
$$ LANGUAGE sql;


BEGIN;

SELECT pg_temp.version() "version_f";

stage 1: migrations

For the migrations-proper stage, I previously defined a unique, temporary function, leaving the execution until the last stage. The idea was that the function would only be called if needed, and in the correct order. But this failed in some scenarios, where defining the function itself depended on something which didn’t exist in the database, causing the function definition to be invalid, thus breaking the migration chain. In such a case, it did not matter about the later stage, as the overall process had already failed.

Instead, I change the skeleton of each migration to conditionally execute the migration within that file itself, requiring a strict order of cross-file concatenation (see running migrations). This also eliminates the chain of SELECT columns which was used to guarantee execution order and report on whether the migration was run. Instead, everything related to that migration is contained in the single migration file, making it easier to reason about, and reducing edits when creating a new migration using an existing one as a template (a common workflow pattern for me, I find, regardless of migration tooling or language). I also rename to use the same temporary function each time, which is created and dropped within the same file. Again, this reduces differences between multiple migration files.

I also trivially rename the migration file prefix to be shorter. If desired, a longer migration comment can be supplied as the column name, as before. Note that although this approach still requires the migration number to be specified in multiple places (because of not using variables, for simplicity), at least now those multiple places are in the same file (indeed, in the same group of lines).

-- database/1_m_1.sql

CREATE FUNCTION pg_temp.migration() RETURNS boolean AS $$


-- TODO: insert migration SQL


SELECT pg_temp.version(1);
$$ LANGUAGE sql;
SELECT COALESCE(pg_temp.migrate(1), pg_temp.migration()) "m_1";
DROP FUNCTION pg_temp.migration();

stage 2: post

For the stage previously named execution (2_exec.sql), I create a stage called post (2_post.sql). This is the other end of the wrapper which surrounds the migrations-proper. Rather than conditionally execute the migrations as before, this now simply commits the transaction and reports on the final migration version, as the migrations themselves have already been run. As before, I ensure that migrations end-to-end are wrapped within an explicit transaction, to facilitate rollback in the event of an error. As noted in the original post, with PostgreSQL, even DDL statements will get rolled back if the transaction fails, making the use of a transaction attractive.

-- database/2_post.sql

SELECT pg_temp.version() "version_t";

COMMIT;

running migrations

As noted above (see stage 1: migrations), simply using cat does not provide the ordering guarantees required, despite this split into the three separate stages. Instead, I change to perform a natural sort of the files using ls -v. Using this method, it shouldn’t be necessary to anticipate the total number of migrations in advance and zero-pad (e.g. 001 rather than 1), although that could be done if desired.

# database.sql.sh

#!/bin/sh

cd database/ && cat $(ls -v)

Code related to this post is available from my coding notebook.