log shipping using Rsync over SSH

2015-01-18 · Computing

If you’re looking to ship logs from multiple servers and perform all sorts of magic, you might like to consider Logstash or Fluentd. But what if you want something really lightweight, or simply don’t have the time to configure a full-blown solution? Rsync over SSH can provide a simple alternative, provided that all you really want is to have logs accessible from a central place, and that you are a happy Grep user. You’ve probably got the packages installed already, too.

Aim: One or more servers shipping logs to a central log master, using Rsync over SSH. System: Ubuntu Server 14.04 LTS, managed using Puppet.

set up log satellites

Set up a new SSH key for the log satellites, which will be connecting to the log master. This outgoing user could be root, which has full read-access. If root concerns you and you choose another user, be sure it is a member of appropriate groups for read-access to /var/log/.
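As a minimal sketch, a dedicated key pair could be generated with ssh-keygen along these lines. The scratch directory, the choice of 4096-bit RSA and the comment string are assumptions for illustration; the empty passphrase is assumed so that an unattended job can use the key.

```shell
# Generate a dedicated, passphrase-less key pair for log shipping.
# Assumptions: scratch directory, RSA 4096 and the comment string.
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -b 4096 -N '' -C 'ops-root@example.com' -f "$keydir/id_rsa"

# id_rsa is the private half (for the log satellites);
# id_rsa.pub is the public half (for the log master to authorise).
ls "$keydir"
```

Because the private half ends up on every log satellite, it makes sense to treat this key as disposable and to use it for nothing else.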

Distribute the SSH key, including the private component (which is why it must be a new key and not used for anything else), to the log satellites (e.g. /root/.ssh/id_rsa). This will be used to connect to the log master. Ensure that the log master is marked as a known host to prevent issues with the first connection; one way to do this is to use ssh-keyscan:

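$logger = 'logger.example.com'

# Append the log master's host keys unless the host is already listed.
# The shell provider is used because the command relies on redirection.
exec { "/usr/bin/ssh-keyscan ${logger} >> /root/.ssh/known_hosts":
  unless   => "/bin/grep -qF ${logger} /root/.ssh/known_hosts",
  provider => shell,
}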

set up log master

Open port 22 from the log satellites to the log master in your firewall. Create a new user (e.g. logmaster) on the log master for receiving the logs, authorise the public component of the SSH key you created, and choose somewhere to store the shipped logs (e.g. /srv/logs). You’ll probably want to include that directory in your backups.

user { 'logmaster':
  password   => 'PASSWORD_HASH',
  managehome => true,
  shell      => '/bin/bash',
}->

ssh_authorized_key { 'logmaster/ops-root@example.com':
  type    => 'ssh-rsa',
  key     => 'PUBLIC_KEY',
  user    => 'logmaster',
  options => [
    'command="rsync --server -ltrze.iLs . /srv/logs/${SSH_CLIENT%% *}"',
    'no-agent-forwarding',
    'no-port-forwarding',
    'no-x11-forwarding',
  ],
}->

file { '/srv/logs':
  ensure  => directory,
  mode    => '0730',
  owner   => 'root',
  group   => 'logmaster',
}

The ssh_authorized_key options restrict the logmaster user to running only the command specified, with received logs stored in a subdirectory named after the log satellite’s IP address. An IP address is harder for a log satellite to change than a hostname, although you should give some thought to what you’d like to happen if a log satellite reuses a previous IP address. (An easy alternative, although slower and more space-intensive, would be to copy /var/log to somewhere including the hostname before shipping, or somehow pass through the hostname. I’d love to hear if you’ve got a nice adjustment for this without relaxing the SSH command strictness.)
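To see how the forced command derives the subdirectory name, here is a small sketch of the ${SSH_CLIENT%% *} expansion on its own. The SSH_CLIENT value is a made-up example; on a real connection sshd sets it to the client IP, client port and server port.

```shell
# sshd sets SSH_CLIENT to "<client ip> <client port> <server port>".
# The value below is a made-up example for illustration.
SSH_CLIENT='192.0.2.10 51234 22'

# "%% *" strips the longest suffix that begins with a space,
# which leaves only the client IP address.
dest="/srv/logs/${SSH_CLIENT%% *}"
echo "$dest"   # prints /srv/logs/192.0.2.10
```

The expansion is plain POSIX shell, so it does not depend on any Bash-specific behaviour in the logmaster user’s login shell.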

Here, the permissions on /srv/logs (mode 0730) give the logmaster group write and traverse access without list access at the top level, which restricts what can be seen from within the logmaster user.
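If the octal mode is hard to read, a quick way to check it is to apply it to a scratch directory and print the symbolic form. This is only a sketch: the scratch path is hypothetical, and stat -c assumes GNU coreutils, as found on Ubuntu.

```shell
# Recreate the permission bits on a scratch directory (not /srv/logs itself).
demo=$(mktemp -d)
mkdir "$demo/logs"
chmod 0730 "$demo/logs"

# Owner: rwx. Group: -wx (create and enter, but not list). Others: none.
perms=$(stat -c '%A' "$demo/logs")
echo "$perms"
```

This should report drwx-wx---, where the missing group read bit is what prevents the logmaster user from listing the top-level directory.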

schedule log shipping

All that remains is to schedule the logs to be shipped. The actual shipping should usually be pretty quick, perhaps only a second or two, so Cron is good for this. You’ll probably want to test the command manually, too. (If you have problems and need to debug, consider temporarily relaxing the logmaster user SSH command to allow you to test the connection more easily.)

cron { 'rsync_var_log':
  command => '/usr/bin/rsync -rltz /var/log/ logmaster@logger.example.com:',
  user    => 'root',
  hour    => '*',
  minute  => '*/5',
}

Here, no --delete is used, so that shipping works nicely with log rotation: if log satellites rotate log files to timestamped filenames and keep a maximum number, the log master should build up an archive without further ado. If you choose to alter the Rsync options, you’ll need to ensure that the logmaster user SSH command is changed to match; you can see which command is needed by connecting manually with the -e 'ssh -v' option and observing the remote Rsync command in the debug output. No destination directory is needed on the satellite side, as it would be ignored anyway by the forced command on the log master.
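The effect of leaving out --delete can be tried locally before involving any servers. In this sketch, two scratch directories stand in for /var/log and the log master’s copy, and the file names are invented examples of timestamped rotation.

```shell
# Two scratch directories stand in for /var/log and the master's copy.
src=$(mktemp -d)
dst=$(mktemp -d)

echo 'first' > "$src/app.log-20150117"
rsync -rlt "$src/" "$dst/"

# Simulate rotation: the oldest file is dropped and a new one appears.
rm "$src/app.log-20150117"
echo 'second' > "$src/app.log-20150118"
rsync -rlt "$src/" "$dst/"

# Without --delete, both files remain on the receiving side.
ls "$dst"
```

The receiving directory keeps the rotated-away file alongside the new one, which is how the archive accumulates on the log master.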