security hardening with Fence Virsh, Libvirt, SELinux, and Polkit

2020-09-07 · computing

If you’re serious about high-availability infrastructure, you’ve probably at least heard of Pacemaker and Corosync. Although the learning curve for these can be steep, once you understand them well, they are powerful tools on which you can build many high-availability layers. For mission-critical clusters, proper installation and testing of fencing agents for STONITH is crucial. There are many fencing agents available, but if you’re running your own hypervisors providing virtualisation via KVM and QEMU through the Libvirt API, you might want to take a look at Fence Virsh, which fences via SSH. However, as noted in its documentation, by default Virsh needs a root account in order to work properly. For higher-security infrastructures, this is concerning! In fact, a root account isn’t necessary, even with SELinux and Polkit enabled, and this post provides an overview of hardening this setup. The example systems run CentOS 8.2.2004 and are configured via Ansible 2.9.11.

hypervisor fencing user

First, we create a dedicated user for connecting to the hypervisor from cluster virtual machines. Fence Virsh fences via SSH, meaning we don’t need to concern ourselves with anything at the networking level. This theoretically also opens up the possibility of cross-datacentre fencing without a shared private network, but this is a more complex approach. Note that, in contrast to the defaults, this user is not root, and also is not a sudoer or member of the wheel group. Let’s call the user hafence.

-
  name: USER hafence create
  user:
    name: hafence

Next, we use Ansible to reflect on the created user metadata via /etc/passwd, allowing us to fetch defaulted settings such as the home directory. If you prefer, you could hard-code these values instead.

-
  name: GETENT passwd
  getent:
    database: passwd

Finally, we export a shell variable to point Virsh via the Libvirt API to the QEMU system. Since this takes place on the hypervisor after SSH login, this points to the local hypervisor system.

-
  name: USER hafence bashrc
  blockinfile:
    path: "{{ getent_passwd.hafence.4 }}/.bashrc"
    marker: "# <!-- {mark} VIRT BASHRC -->"
    block: |
      export LIBVIRT_DEFAULT_URI=qemu:///system
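
To see why index 4 is the home directory, it helps to line the getent_passwd fact up against a raw /etc/passwd entry. The sketch below is only an illustration: the entry is a made-up sample, and its UID and GID values are assumptions rather than anything taken from a real system. Ansible keys the fact by user name and stores the remaining fields as a list, so the sixth passwd field ends up at list index 4.

```shell
# Sample /etc/passwd entry; the UID and GID values are made up for illustration.
entry='hafence:x:1001:1001::/home/hafence:/bin/bash'

# Drop the user name, which Ansible uses as the dictionary key.
fields=${entry#*:}

# Field 5 of the remainder corresponds to list index 4: the home directory.
home=$(printf '%s\n' "$fields" | cut -d: -f5)

printf '%s\n' "$home"   # prints /home/hafence
```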

Whilst this creates a user, the user doesn’t yet have permission to actually access or control the virtual machines.

SELinux

Setting up SELinux properly could easily grow into another discussion; it’s a brilliant system, but a complex one. Here we recap what is probably the minimum necessary to have everything working, using extra Ansible modules to control some SELinux directives directly. To use this functionality, you’ll need to install some extra packages.

First, we install the packages which allow Ansible to configure the SELinux system. These enable not only the selinux module, but also the seboolean and selinux_permissive modules.

-
  name: PACKAGE install
  package:
    name:
      - policycoreutils-python-utils
      - python3-libselinux
      - python3-libsemanage

Next, we set the policy and state. If you’re only just activating SELinux, don’t forget to reboot in order to complete the setup, and also note you might have to do a filesystem relabel.

-
  name: SELINUX config
  selinux:
    policy: targeted
    state: enforcing
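
Once the system has rebooted, it is worth confirming that SELinux really is enforcing before moving on. This is a minimal sketch, assuming the standard getenforce utility is present; the function name selinux_is_enforcing is our own invention for illustration.

```shell
# Succeeds only when SELinux reports enforcing mode.
selinux_is_enforcing() {
    [ "$(getenforce)" = "Enforcing" ]
}

# Example use (commented out so that sourcing this file has no side effects):
# selinux_is_enforcing || echo "SELinux is not enforcing yet" >&2
```

If a relabel turns out to be needed, creating an empty /.autorelabel file before rebooting is the usual way to request one.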

Polkit

We now extend Polkit with a custom rule for Libvirt for our created user. We loop over all templates, allowing us flexibility to install more as required.

-
  name: POLKIT rules
  template:
    src: polkit/{{ item }}.rules
    dest: /etc/polkit-1/rules.d/{{ item }}.rules
    backup: true
  loop: "{{ security.polkit.rules.keys() | list }}"

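Within the polkit/ template directory, we create the custom rule as 80-libvirt.rules, allowing the hafence user to manage Libvirt. Note that the rule matches on group membership rather than user name, so it relies on the user’s primary group also being called hafence, which is what the user task above produces by default on CentOS.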

polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" && subject.isInGroup("hafence")) {
        return polkit.Result.YES;
    }
});

Finally, we activate this Polkit rule for a specific server or group by setting an Ansible variable:

security:
  polkit:
    rules:
      80-libvirt: {}

Fence Virsh

On each virtual machine within the cluster, we install Fence Virsh. I’m presuming that you already have Pacemaker and Corosync installed and configured correctly, as this is another topic which can expand rapidly in complexity. You might need to enable the High Availability CentOS 8 repository.

-
  name: PACKAGE install
  package:
    name:
      - fence-agents-virsh

SSH

Generate an SSH key for each server in the cluster, and ensure that it’s authorised for login as the hafence user on the hypervisor. I’m not including specific instructions on how to generate and authorise the keys here, since this will vary depending on how you’re administering SSH keys within your infrastructure.

We set the location of the SSH keys using an Ansible variable.

pacemaker:
  ssh: ~/.ssh.ops-infra.hafence

Before continuing, ensure that the virtual machine can connect to the hafence user on the hypervisor without prompting for a password.
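
One way to check this is to run a harmless Virsh command over SSH with password prompts disabled, so that a missing key produces a failure rather than a prompt. The following is a sketch under assumptions: the function name fence_ssh_check and the example arguments are hypothetical, and the connection URI is passed explicitly so that the check does not depend on whether .bashrc is read for non-interactive commands.

```shell
# Usage: fence_ssh_check <private-key> <hypervisor>
# Lists the hypervisor's domains as hafence; fails instead of prompting
# when key authentication does not work.
fence_ssh_check() {
    ssh -i "$1" \
        -o BatchMode=yes \
        -o ConnectTimeout=5 \
        "hafence@$2" \
        virsh --connect qemu:///system list --all
}

# Example (hypothetical key path and host name):
# fence_ssh_check "$HOME/.ssh/id_rsa" hyper-0
```

If this lists the virtual machines, both the SSH key and the Polkit rule are doing their job.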

stoniths

All that remains is to create the STONITH resources with the correct syntax. For this, we template a script via Ansible, which will loop through the hypervisors defined in some group and create the command for each.

We start the script by replacing the ~ symbol with $HOME, since ~ isn’t expanded within the STONITH resource.

#!/bin/bash -u

ssh="{{ pacemaker.ssh | regex_replace('~', '$HOME') }}"
user=hafence
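
The template line above relies on a detail of shell quoting: inside double quotes, the shell expands $HOME but leaves a tilde untouched. A small sketch of the difference, using a hypothetical path that is not the one from this setup:

```shell
# Hypothetical key directory, used only to show the quoting behaviour.
quoted="~/.ssh/example"        # the tilde stays literal inside quotes
expanded="$HOME/.ssh/example"  # $HOME is expanded inside double quotes

printf 'quoted:   %s\n' "$quoted"
printf 'expanded: %s\n' "$expanded"
```

The first line prints the path with a literal tilde, whereas the second prints an absolute path that a program receiving it can open directly.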

Next, we define a simple function to clean up on exit.

function clean() {
    rm -rf "$tmp"
}

trap clean EXIT

Now, we create a temporary directory, and dump the CIB (Cluster Information Base). This method of configuring the cluster is more efficient, since we can diff to see which changes need to be applied, and batch multiple changes where required. Otherwise, we would need to wait for acknowledgement each time, which is significantly slower, and also execute unnecessary commands, which risks contention.

tmp=$(mktemp -d)
cib="$tmp/cib"

pcs cluster cib > "$cib.old"
cp "$cib.old" "$cib"

Next, we write out the commands to create each STONITH resource. We don’t load these live, but rather write them into the CIB dump ready for pushing. We point the resource to the SSH private key, and use DNS for the hypervisor name. If for some reason you can’t rely on DNS, or would prefer not to use it, this can be an actual IP address instead.

{% for s in groups.my_little_hypervisors | sort %}
pcs -f "$cib" stonith create fence_{{ hostvars[s].inventory_hostname_short }} \
    fence_virsh \
    identity_file="$ssh/id_rsa" \
    ip="{{ s }}" \
    username="$user"
{% endfor %}

Finally, we push any changes into the live CIB on the cluster.

pcs cluster cib-push "$cib" diff-against="$cib.old"
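
If you would rather make the no-change case explicit, the two dumps can be compared before pushing. This is an optional guard and not part of the script above; the helper name cib_changed is hypothetical, and cmp is assumed to be available.

```shell
# Succeeds when the edited CIB differs from the original dump.
cib_changed() {
    ! cmp -s "$1" "$2"
}

# Example, using the variables from the script:
# if cib_changed "$cib.old" "$cib"; then
#     pcs cluster cib-push "$cib" diff-against="$cib.old"
# else
#     echo "CIB unchanged, nothing to push"
# fi
```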

The script won’t get executed automatically, so once you’ve deployed it via Ansible and have checked it looks sane, execute it on one of the cluster nodes.

If all has worked as it ought, pcs stonith status should show something like the following. If it doesn’t, or the resources exit with a failure, you’ll need to debug.

[root@vm-0 ~]# pcs stonith status
  * fence_hyper-0     (stonith:fence_virsh):  Started vm-0
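
When debugging, it can help to take Pacemaker out of the picture and run the fence agent by hand from a cluster node. The sketch below uses the non-destructive status action; the function name and example arguments are hypothetical, and the long options are the ones we understand fence_virsh to accept, so check fence_virsh --help on your version before relying on them.

```shell
# Usage: fence_virsh_status <hypervisor> <private-key> <vm-name>
# Asks the hypervisor for the power state of one virtual machine.
fence_virsh_status() {
    fence_virsh \
        --ip="$1" \
        --username=hafence \
        --identity-file="$2" \
        --plug="$3" \
        --action=status
}

# Example (hypothetical key path; names taken from the output above):
# fence_virsh_status hyper-0 "$HOME/.ssh/id_rsa" vm-0
```

If this fails whilst a plain SSH login as hafence succeeds, the problem most likely lies with the Polkit rule or SELinux on the hypervisor rather than with the cluster configuration.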