Building Haskell 8.8.3 programs using Dockerfiles

2020-07-15 · computing

I recently completed some major updates to my continuous integration pipelines, building Haskell 8.8.3 programs using Dockerfiles. Having worked with devops automation for a number of projects and clients over the years, I thought it might be interesting to share a little of my latest thinking, along with specific code examples.

Firstly, a little by way of explanation (or, perhaps, justification): in the Haskell world, there is no strict need to use Docker at all, since Stack exists, and this is likely more convenient for many programmers. I, however, have found myself switching between programming languages rather frequently, and I’ve come to appreciate some kind of standardisation within my tooling, both as a programmer and a devops engineer. Thus, I purposely try to avoid language-specific tooling (at least, above the package manager level) unless there’s an overwhelming reason to use it, instead working with similar recipes cross-language.

Secondly, it’s useful to note that although I mention Docker and Dockerfiles, much of this applies to similar technologies such as Podman and Buildah with OCI container images. I finally rid myself of the Docker daemon for deployed programs earlier this year (and finally achieved greater stability), and am gradually working on removing it from my development tooling, too.

# FROMFREEZE docker.io/library/haskell:8.8.3
FROM docker.io/library/haskell@sha256:5df798b4864130e7608b515391168f54d62a858e905cee8362aad8ec2a53ba20

ARG USER=x
ARG HOME=/home/x

There is no FROMFREEZE instruction in Dockerfiles; the commented line is a marker for my own tooling, recording the tag that the pinned digest was resolved from. I strictly lock all dependencies, from cabal.config files generated by cabal freeze (or in Ruby, Gemfile.lock, or in Python, requirements.txt), through bit-copying dependencies installed using Cabal through the CI pipeline, to using specific versions of base images. This, however, can easily turn into a security patching nightmare, so I have a script which pulls the latest version of the base image and updates the FROM line with its SHA-256 digest. I consider the result part of the code, and commit it to Git version control along with cabal.config. This prevents issues with cache busting within the CI or at some other layer.
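As an illustration only (this is not the script itself), the refresh step could be sketched in POSIX shell as below. The function names freeze_from and resolve_digest are hypothetical, and the sketch assumes that each FROMFREEZE marker names a tagged image and is followed by a plain FROM line with no AS clause.

```shell
#!/bin/sh
# Hypothetical sketch: refresh the digest pinned beneath each "# FROMFREEZE" marker.

# Resolve an image reference to "sha256:..." (assumes the Docker CLI is available).
resolve_digest() {
    docker pull --quiet "$1" >/dev/null || return 1
    repo_digest=$(docker image inspect --format '{{index .RepoDigests 0}}' "$1") || return 1
    # Keep only the part after "@", so the fully-qualified name in the marker is preserved.
    printf '%s\n' "${repo_digest##*@}"
}

# Rewrite the first FROM line after each marker.
# $1 = Dockerfile path, $2 = name of a function mapping an image reference to a digest.
freeze_from() {
    file=$1 resolver=$2 pending=''
    tmp=$(mktemp) || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '# FROMFREEZE '*)
                ref=${line#'# FROMFREEZE '}
                sha=$("$resolver" "$ref") || { rm -f "$tmp"; return 1; }
                # Drop the tag and pin the digest instead.
                pending="${ref%:*}@${sha}"
                ;;
            'FROM '*)
                if [ -n "$pending" ]; then
                    line="FROM ${pending}"
                    pending=''
                fi
                ;;
        esac
        printf '%s\n' "$line"
    done <"$file" >"$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" >"$file" && rm -f "$tmp"
}
```

Calling freeze_from Dockerfile resolve_digest from the repository root would rewrite the file in place, leaving a diff to review and commit. Passing the resolver by name keeps the text rewriting separate from the container engine, so a different resolver could be substituted.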

In FROM, I use fully-qualified base images, not just because I increasingly use Podman, but because of the large security risk otherwise present when using private registries. It’s worth pointing out here that I use the builder pattern rather than multi-stage builds; there are various reasons for this, but most hinge on the way I design CI pipelines to reuse these intermediary images for guaranteed repeatability, rather than treating them as throwaway images used only to compile the resulting Haskell binaries. The ARG instructions are there because, where reasonable, I not only run with, but also compile with, dropped privileges. It appears that ARG instructions can’t be stacked, so each variable gets its own line.

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        daemontools \
        ghc-8.8.3-prof \
        happy \
        hlint \
        jq \
        libpcre3-dev && \
    rm -rf /var/lib/apt/lists/*

RUN useradd ${USER} -d ${HOME} && \
    mkdir -p ${HOME}/repo && \
    chown -R ${USER}:${USER} ${HOME}

Haskell 8.8.3 base images are built on Debian 10; this was updated from Debian 9 recently. Here we update the package cache, install some dependencies, and remove the package lists afterwards to minimise the size of the image layer. curl is used for healthchecks (since this example comes from a web app); happy and hlint are used for testing and linting using Happy, Hspec, and HLint; jq is used for debugging (since this is a JSON API); ghc-8.8.3-prof supplies GHC’s profiling libraries; and libpcre3-dev is a program-specific dependency. This is one thing that container images are very useful for; gone are the days of chasing a missing C library for some Cabal package (or Ruby gem). daemontools is used later within CI for dropping privileges during the build phase. The second RUN instruction sets up the unprivileged user and the directories for holding a skeleton repository.

USER ${USER}

WORKDIR ${HOME}/repo

ENV PATH=${HOME}/.cabal/bin:$PATH

COPY --chown=x:x [ \
    "cabal.config", \
    "*.cabal", \
    "./"]

RUN cabal v1-update && \
    cabal v1-install -j --only-dependencies --enable-tests

Here, I switch USER away from root, and use WORKDIR to set the working directory to the empty directory created above. ENV is used to extend the PATH to include Cabal binaries (which is also where compiled programs will be placed). I use COPY to copy over the Cabal file defining the program, along with the cabal.config file freezing package dependencies to specific package versions. In recent versions of Docker, and since I’m using Linux, it’s possible to use --chown within a COPY instruction; it used to be necessary to add a whole other layer to chown files after a COPY, since COPY always wrote files as root regardless of the USER context. Here, I chown to the unprivileged user I created above. Finally, I update the Cabal package cache, and install dependencies using -j to use as many CPUs as possible, --only-dependencies to avoid installing the program itself, and --enable-tests so that the test suite’s dependencies are installed too. Constructing the container image in this way is critical to ensuring efficient use of cache layers: because only cabal.config and the Cabal file are copied before this step, the dependency layer is rebuilt only when one of those two files changes; otherwise, I’d have to reinstall all the dependencies every time I recompiled the program (woefully time-consuming). I’m still using Cabal v1 commands, because I haven’t yet migrated to the v2 ones.

# COPY --chown=x:x . .

For the past couple of years, at this stage I’ve copied the actual code into the image, meaning that it can be used as-is for development, and is available within the CI pipeline. I’ve just stopped doing this, however, in favour of using local mounts for development (using Docker Compose), and the Git repository itself within CI. This eliminates what has for me mostly become a pointless layer, since whilst developing, I always need to mount the latest code over the top, and within CI, the repository is always available anyway, locked to a specific commit. Thus, the container image becomes purely for satisfying dependencies and installing specific tooling needed for CI, whilst not containing any actual code, and I skip this section entirely.
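For a quick one-off without Compose, the same local mount can be approximated with a plain run command. This is a sketch under assumptions: the image tag eg-build, the ENGINE variable, and the dev_run function are hypothetical names, while the mount target and port are the ones used in the Dockerfile in this post.

```shell
#!/bin/sh
# Hypothetical override so the same function can drive another Docker-compatible CLI.
ENGINE=${ENGINE:-docker}

# Run the build image with the current checkout mounted over the empty repo directory.
# $1 = image name; any further arguments replace the image's default CMD.
dev_run() {
    "$ENGINE" run --rm --interactive --tty \
        --volume "$PWD:/home/x/repo" \
        --publish 8000:8000 \
        "$@"
}
```

Running dev_run eg-build from the repository root would start the program through the image's CMD, while dev_run eg-build cabal v1-test would run a different command against the same dependencies. Bind-mounted files keep their ownership from the host, so if the host user's numeric ID differs from that of the user inside the image, writes into the mounted directory may fail.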

ENV ADDRESS=0.0.0.0 \
    PORT=8000

CMD cabal v1-run eg-prog -- -b ${ADDRESS} -p ${PORT}

EXPOSE ${PORT}

HEALTHCHECK CMD curl -fs http://${ADDRESS}:${PORT} || false

With ENV, I set any specific environment variables required for the program; placing it this late in the Dockerfile ensures that the variables can be updated without needing to reinstall all the dependencies. CMD defines which command to run for the program; I use cabal run not cabal install, because this is a build image, used for development when executed, and only used to supply dependencies with a different command when used within CI. Finally, since it’s an API image, the port is exposed using EXPOSE (not strictly necessary, but neater), and HEALTHCHECK is used to define the healthcheck (I almost always define healthchecks for containers).
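One way a script could consume this healthcheck, for instance before running smoke tests against the container, is to poll the status that Docker records for it. A hedged sketch: wait_healthy and ENGINE are hypothetical names, and the .State.Health.Status field is the one reported by the Docker CLI; other engines may expose it differently.

```shell
#!/bin/sh
# Hypothetical override so the container CLI can be swapped.
ENGINE=${ENGINE:-docker}

# Poll a container's health status once per second.
# $1 = container name or ID, $2 = maximum number of attempts (default 30).
# Returns 0 once the container is healthy, 1 if it is unhealthy or never settles.
wait_healthy() {
    container=$1 tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        status=$("$ENGINE" inspect --format '{{.State.Health.Status}}' "$container") || return 1
        case $status in
            healthy) return 0 ;;
            unhealthy) return 1 ;;
        esac
        # Any other status (such as "starting") means the check has not settled yet.
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A call such as wait_healthy eg-prog 60 would then gate whatever follows on the curl check defined in the image, rather than on a fixed sleep.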

With that, I have a useful container image, containing everything needed to compile the program except for the code itself, with all dependencies installed, both those within Debian and those within Cabal, and have layers optimised to utilise the layer cache efficiently whilst making changes. This image is then able to be used locally for development, within the build stage in CI, within the test stage in CI, and also to provide tooling for the package stage in CI, whilst not being used for the final packaged artefact itself.