In this guide, I show how to deploy Clojure applications using bakesale. I wrote bakesale because I wanted to return to the main ingredients of deployment, doing only so much as is finger-saving, and leaving the rest to the user to sort. I find this gives a lot of flexibility, whilst not cluttering my deployment config with lots of custom settings. bakesale is written as a collection of simple Shell scripts, and is pretty transparent. It’s built around the premise that deployment is much like baking a cake for a bakesale. Although I usually use it to deploy Ruby applications, deploying Clojure applications is just as straightforward. For this guide, I deploy MTRX9 (GitHub; Twitter), a simple websockets matrix monitoring tool which uses the notions of streams, chars, and time.
The server, running Ubuntu 12.04 LTS, has a sudoer user called mtrx9, with a corresponding home directory. RubyGems is installed as a package, because I’ll use Foreman to write upstart scripts on the server. This isn’t necessary, of course, but it’s pretty convenient. Note that this server isn’t running Ruby applications, so I’m not using RVM or another version manager; instead, I just let the package dependencies get satisfied. I have Java OpenJDK installed, again as a package.
I manage these prerequisites using Puppet (a masterless setup, also deployed using bakesale); however, this is about deploying Clojure, so I’ll assume you’ve set up the server in whichever way makes you happy. To manually install the necessary packages:
apt-get install rubygems openjdk-7-jdk
bakesale just needs to be somewhere on your local system so that you can source it in scripts. I assume a local user called mlnw, with a corresponding home directory at /Users/mlnw/. I also assume your code repository is at /Users/mlnw/mtrx9/. Clone bakesale somewhere:
git clone git://github.com/tiredpixel/bakesale.git /Users/mlnw/bakesale
That’s all that’s needed to use bakesale; by default, the master branch will be used, which is (hopefully) stable.
define settings and services
bakesale discourages having the deployment config in the code repository itself (let’s stop embedding config in repositories). Create a directory to hold your deployment config:
mkdir -p /Users/mlnw/Deployments/mtrx9/
Create a file containing environment variables for the application; this will be exported to upstart using Foreman. MTRX9 only uses environment-variable settings, but if you have other settings, you can write these in a similar manner. Copy the example and edit as appropriate:
cp /Users/mlnw/mtrx9/.env.example /Users/mlnw/Deployments/mtrx9/mtrx9.env
Procfile defines services for the application. We will compile into a standalone uberjar, so write a custom Procfile:
# /Users/mlnw/Deployments/mtrx9/mtrx9.Procfile
web: java -jar target/*-*-standalone.jar $PORT
write deployment recipe
The deployment recipe defines the details of the deployment. However, it’s just a Shell script, so you can put whatever custom logic in there you need. If you find yourself using the same small fragment of code a lot across scripts, then it might be a good candidate for a new bakesale ‘stage’. It would be excellent if you would be so kind as to fork bakesale and submit a pull request with your improvement. Include bakesale at the top of your script:
source /Users/mlnw/bakesale/bakesale.sh
Then, just write a Shell script to deploy the application, using the bakesale ‘stage’ helpers. Without further ado, here is the complete script for deploying MTRX9. It clones the MTRX9 repository (fresh each time, by design), writes the settings and services configs, compiles the application into an uberjar, rsyncs everything to the server, uses Foreman to export it to upstart (Foreman gets automatically installed, if it’s not already, as part of that command), and restarts the exported services. This only takes a few lines:
#!/bin/bash
# /Users/mlnw/Deployments/mtrx9/mtrx9.sh
source /Users/mlnw/bakesale/bakesale.sh
ssh1=firstname.lastname@example.org # server location (user@host)
sshs="($ssh1)"
# = Bake
bakesale bake git git://github.com/tiredpixel/mtrx9.git
bakesale bake copy /Users/mlnw/Deployments/mtrx9/mtrx9.env .env
bakesale bake copy /Users/mlnw/Deployments/mtrx9/mtrx9.Procfile Procfile
bash -c "cd '$bakesale_cakebox/'; lein uberjar" # or any other custom step
# = Carry
bakesale carry rsync_ssh "$sshs" mtrx9/
# = Wave
bakesale wave ssh_foreman "$sshs" "export --root mtrx9/ --app mtrx9 --user mtrx9 --concurrency web=1 upstart /etc/init; restart mtrx9"
bakesale supports multi-server deployments; just pass an array of locations to the SSH commands.
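As a rough sketch of what several locations might look like: the recipe stores its single location as the string "($ssh1)", which reads like a Bash array literal. Assuming a space-separated list inside the parentheses is the expected form (an assumption, not checked against bakesale's source), and using hypothetical hosts, such a string can be built and unpacked like this:

```shell
#!/bin/bash
# Hypothetical locations; replace with your own user@host values.
ssh1=mtrx9@app1.example.org
ssh2=mtrx9@app2.example.org

# Same shape as in the recipe: a string that looks like a Bash array literal.
sshs="($ssh1 $ssh2)"

# One way such a string could be unpacked into a real array. This is an
# assumption about the mechanism, shown for illustration only.
eval "locations=$sshs"

for location in "${locations[@]}"; do
  echo "would deploy to: $location"
done
```

The loop only prints each location; in a real recipe the string would be handed to the carry and wave stages unchanged.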
The deployment config is just a script, so to deploy, just execute it:
bash /Users/mlnw/Deployments/mtrx9/mtrx9.sh
At some point, you’ll be asked for the sudo password; this is necessary so Foreman can export it to upstart.
bakesale isn’t particularly fast at deployment; there are a few reasons for this. One is that a fresh clone of the code repository is made each time; this is to provide absolute assurance that nothing’s been included by accident. Another reason for slowness is that the ‘cakebox’ is rsynced up to the server, rather than letting the server perform the clone; this is so the server doesn’t need to have Git installed, and doesn’t require the server to have access to the code repository. bakesale is just as opinionated as any other deployer; it doesn’t claim to be the best solution, only a solution built on a certain set of ideas (such as not having to pull hair out about server access to repositories, and forwarded SSH settings). It’s still a young project, so if you’d like to help improve it, please do get in contact—or just fork it and see.
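To make the trade-off concrete, here is a minimal sketch of the two ideas described above, a fresh clone on the local machine followed by an rsync to the server, written with plain git and rsync. The function names fresh_clone and carry, and the location in the usage comment, are hypothetical; these are not bakesale's own functions.

```shell
#!/bin/bash
# Illustrative stand-ins for the "bake" and "carry" ideas; not bakesale code.

# Clone the repository into a brand-new temporary directory and print the
# path, so nothing left over from an earlier run can be included by accident.
fresh_clone() {
  local repo_url="$1" workdir
  workdir="$(mktemp -d)" || return 1
  git clone --quiet "$repo_url" "$workdir/app" || return 1
  echo "$workdir/app"
}

# Push the prepared directory to the server over SSH; the server needs
# neither Git nor access to the code repository.
carry() {
  local src="$1" location="$2" dest="$3"
  rsync -az --delete -e ssh "$src/" "$location:$dest"
}

# Usage (hypothetical location), left commented out because it needs
# network and SSH access:
# box="$(fresh_clone git://github.com/tiredpixel/mtrx9.git)" &&
#   carry "$box" mtrx9@server.example.org mtrx9/
```

Both steps cost time on every deployment, which is the slowness described above; in exchange, the server only ever receives a finished directory.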
I’m not the first person to say it, and I certainly won’t be the last. And yes, I’m sometimes guilty of it myself. I write a program—probably not going to be open-sourced, for internal use only, and not a library. So, I embed a port number (MySQL really isn’t going to be anywhere other than port 3306, right?). That file gets committed, and other developers on my team know what the connections and passwords are. That’s fine, right? Okay, as many of us know, it’s sort of ‘bad’. But why? Here are some reasons.
embedding config can make code harder to read
Whether the code is familiar or not, mixing core logic with specific implementation can make the code much harder to read. For example: Is this number something specific which must be kept the same for the program to work, or is it something which depends on an external factor? Where can the port for this service be changed? Oh, it’s somewhere buried within this file. By ‘config’, I don’t just mean passwords and ports; extracting other settings (e.g. polling intervals) can make the important parts of the code more obvious. At the very least, extracting such settings to a separate file within the repository is a step in the right direction. But I think it’s better not to commit those files at all.
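A small sketch of what extraction can look like in a Shell script, assuming settings arrive as environment variables. The names MYSQL_PORT and POLL_INTERVAL, and their defaults, are hypothetical and only for illustration:

```shell
#!/bin/bash
# Hypothetical settings; names and defaults are for illustration only.
MYSQL_PORT="${MYSQL_PORT:-3306}"      # depends on the installation
POLL_INTERVAL="${POLL_INTERVAL:-5}"   # seconds between polls

# Core logic refers to the settings by name, never to embedded literals.
describe_settings() {
  echo "mysql port: $MYSQL_PORT, polling every ${POLL_INTERVAL}s"
}

describe_settings
```

Everything which depends on an external factor sits in one place at the top, and can be overridden from outside without editing the file.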
embedding config makes it hard to deploy multiple installations
Perhaps I’ve written some internal program, but then find this could also be used by another team. If I’ve embedded config, maintaining separate installations is troublesome. Cloning the repository pulls down all sorts of settings which are not relevant, so I have to fork to change or remove irrelevant config (assuming, of course, the credentials can be shared across teams; if not, then embedded config prevents the repository from being shared at all). With the config extracted and not committed, however, deploying code is just as easy as for the existing installation. It’s not the configuration which is of interest; it’s the core logic itself. Configuration of deployers has a similar argument; if you’re writing Ruby, for instance, perhaps you want to use Capistrano or Mina, and use a config/deploy.rb. That’s well and good, but don’t commit it! If you do, you’ve immediately made it far harder for anyone else to use the repository. Both these deployers support using arbitrary locations for deployment configuration, anyway; there really is no reason to store such within the repository.
development is just another installation
When developing, I like to run as close to production as possible; instead of seeding, I prefer to use dumps of live databases. This gives me greater assurance that the code I’m writing is going to work on another machine. But what about optimisations like caching? True, you don’t usually want that enabled in development. Less obvious is that you might not want it in production, either. Thanks to popular frameworks like Ruby on Rails, we’ve got into this habit of thinking about development, production, and maybe testing environments. But what if I want two productions: one tracking a develop branch, and another tracking a master branch for something customer-facing (maybe set up for automatic deployment on push)? I probably want caching on the customer-facing version, but I might not on the internal develop branch. But setting whether or not to apply caching really has nothing to do with the core logic itself; you might even like to enable caching to test something in development, without enabling the rest of the features usually used in production. I’ve seen much code checking RAILS_ENV, but such embedded decisions don’t scale well to multiple environments, and also make it hard to use logical names such as develop, user-test, live-beta, live. Having separate configs makes this straightforward.
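As a sketch of separate configs per installation, assume one env file per logical name, kept in a directory outside the code repository. The function load_installation, the file layout, and the CACHE_ENABLED setting are all hypothetical:

```shell
#!/bin/bash
# Load settings for a named installation from a directory kept outside the
# code repository. Layout and names are hypothetical.
load_installation() {
  local config_dir="$1" name="$2"
  local env_file="$config_dir/$name.env"
  if [ ! -f "$env_file" ]; then
    echo "no config for installation '$name' in $config_dir" >&2
    return 1
  fi
  set -a            # export every variable the file assigns
  . "$env_file"
  set +a
}

# Demo with two throwaway installations in a temporary directory.
config_dir="$(mktemp -d)"
echo 'CACHE_ENABLED=0' > "$config_dir/develop.env"
echo 'CACHE_ENABLED=1' > "$config_dir/live.env"

load_installation "$config_dir" live
if [ "$CACHE_ENABLED" = 1 ]; then
  echo "caching on"
else
  echo "caching off"
fi
```

The code only ever asks whether caching is enabled; which installations enable it is decided by the files, so adding a user-test or live-beta installation means adding a file, not another branch in the code.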
but storing config in a repository is convenient
So long as you’ve got tight control over who can read and write to that repository, there’s no problem with storing config in a repository. But make it a different repository—one that you know has nothing to do with the code, and doesn’t pollute it with assumptions about config or deployers. Maybe you want to have a config repository per code repository. Maybe you want to keep all credentials for your programs in a shared repository, similar to tracking Puppet manifests. Both approaches work; the key is to keep core logic separate from specific implementation. That way, you get all the benefits, as well as the possibility of quickly changing the deployment method across multiple repositories and installations.
Currently, I tend to favour using environment variables where possible (such as can be easily read by Foreman), and JSON config files where not. I provide example config files (which I do commit), describe in the documentation which files need to be copied and modified, and ignore the target locations. Thus, setting up a new program is quick, and I can see all settings at a glance. When coding, I think: Could this repository be made public without jeopardising a live installation (business considerations ignored)? If not, then that’s probably a hint that config could be extracted. I keep the deployment configurations entirely separate to the code repositories; I disfavour the
How about you—what’s your opinion about embedding config in repositories?