Rsync is a program that can be used to easily do automated, readily-available backups. My experience is that it is secure, especially when used in conjunction with appropriate firewall (iptables) rules.
rsync must be installed both on the machines that do the backups and on the machines being backed up. On a Debian system, the command "apt-get install rsync" takes care of the installation.
We use two different machines to back up all of the machines in our network. The only requirement for these machines is that each has a large hard drive reserved for the backups. In our case, we use 120 GB IDE drives that cost under $350 each. Ideally, the computers used to do the backups should be in different rooms, or even different buildings, so that at least one backup survives a fire or other catastrophic event.
Our setup is as follows: One of the machines does a daily backup of everything we want backed up. Currently this includes student home directories, our Beowulf cluster home directories, individual faculty home directories, and the /etc directory on our server. It keeps two copies of the backups: one is updated on every odd day of the month, the other on every even day. In this way, users always have two recent backups they can access. The other machine does the same thing, but once a week instead of every day, so we also have at least one backup from approximately one week in the past. The amount that needs to be backed up is on the order of 30 GB, but the backups take only a few minutes because rsync transfers only the changes to files.
The backups are of a live file system, so it is possible that some backed up files could be corrupted. This is not a big concern, particularly since the backups are done at 3 AM. However, for files like those used by a mysql database, the mysql server could be shut down temporarily while the backup is taking place. This would entail a simple modification to the scripts we use. Since rsync simply copies files to another live file system, they are easy to retrieve. Our users have access to the backup using an nfs mount or ssh, so the system administrator does not even need to be contacted to do the most commonly needed restore - an accidentally deleted file.
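Because the backup is an ordinary directory tree, restoring a deleted file is just a copy. The following is a minimal sketch, not part of our actual setup: the function name restore_file and the user "alice" are hypothetical, and the backup layout is assumed to match the one produced by the script later in this document.

```shell
# restore_file BACKUP_ROOT RELATIVE_PATH DEST_ROOT
# Copy one file from a backup tree to the same relative location
# under DEST_ROOT, preserving its mode and timestamps (cp -p).
restore_file() {
    local backup_root="$1"
    local rel="$2"
    local dest_root="$3"
    if [ ! -e "$backup_root/$rel" ]; then
        echo "not in backup: $backup_root/$rel" >&2
        return 1
    fi
    # Recreate the parent directory in case it was deleted too.
    mkdir -p "$(dirname "$dest_root/$rel")"
    cp -p "$backup_root/$rel" "$dest_root/$rel"
}

# Hypothetical usage: recover notes.txt for user alice from copy 1.
# restore_file /backup/mach1_1/home alice/notes.txt /home
```

If the file is missing from one copy, the same call can be repeated against the other copy (for example /backup/mach1_2/home).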
Inetd configuration: While it is possible to run rsync as a daemon that starts up at boot, in most cases it makes more sense to have the rsync daemon started automatically as needed using inetd (or a similar service). On a Debian system, all one needs to do is make sure the following line appears in the file /etc/inetd.conf.
rsync stream tcp nowait root /usr/bin/rsync rsyncd --daemon
After adding this line to the file, either give the command 'killall -1 inetd' (which sends inetd a hangup signal so that it rereads its configuration) or reboot the machine to have it take effect.
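A quick way to confirm that the inetd entry is in place is to search the file for an uncommented rsync line. This is only a sketch; the function name has_rsync_inetd is made up for illustration, and the file path defaults to the Debian location mentioned above.

```shell
# has_rsync_inetd [FILE]
# Succeed if FILE (default /etc/inetd.conf) contains an
# uncommented rsync service line that starts the daemon.
has_rsync_inetd() {
    local conf="${1:-/etc/inetd.conf}"
    grep -Eq '^[[:space:]]*rsync[[:space:]].*--daemon' "$conf"
}

# Hypothetical usage on a machine that is to be backed up:
# has_rsync_inetd && echo "rsync is enabled in inetd.conf"
```

Once /etc/rsyncd.conf exists as well, running "rsync mach2::" from a backup machine (with mach2 standing in for the host being backed up) should list the configurations that host offers.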
Sample /etc/rsyncd.conf: This configuration file must exist on each machine that needs to be backed up. My example contains one configuration (rsync calls these modules) named "rsync", but it could be followed by other configurations, each beginning with the name of the configuration in brackets. This example says that the machines mach1 and mach2 have permission to do backups, that they will back up the entire /home directory, and that the password they will use to connect as the rsync user "root" is in the file /etc/.rs_sec.
[rsync]
path = /home
use chroot = no
max connections = 4
auth users = root
hosts allow = mach1.uchicago.edu mach2.uchicago.edu
secrets file = /etc/.rs_sec
uid = root
gid = root
Sample /etc/.rs_sec: This file, whose actual name is specified in the /etc/rsyncd.conf configuration file, must be present on the hosts that need to be backed up. It contains the user/password combination(s) that the backup machines can use to connect to the host using rsync. In our case just one line is needed that contains the password for the root user.
root:rootpassword
Sample /etc/.rs_pass: This file is not strictly necessary but makes the rsync connections more secure. The actual filename is arbitrary and is specified in the rsync command that actually does the backup. It needs to be present on the machines doing the backups, and it simply contains the rsync password that will be used to connect to the machine that needs to be backed up. It should be the same password that appears in the .rs_sec file on the machine being backed up.
rootpassword
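Both /etc/.rs_sec and /etc/.rs_pass hold a password in plain text, so they should be readable only by their owner. rsync itself refuses a password file that other users can access, and by default the daemon does the same for the secrets file. A minimal sketch for tightening and checking the permissions follows; the function name lock_down is hypothetical, and "stat -c" assumes GNU stat as found on Debian.

```shell
# lock_down FILE
# Restrict FILE to read/write by its owner only, then confirm
# that the resulting mode really is 600.
lock_down() {
    local file="$1"
    chmod 600 "$file" || return 1
    [ "$(stat -c %a "$file")" = "600" ]
}

# Hypothetical usage, run as root:
# lock_down /etc/.rs_sec     (on each machine being backed up)
# lock_down /etc/.rs_pass    (on each machine doing the backups)
```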
Sample iptables configuration: TCP port 873 must be open to the backup machines on the machines that need to be backed up. The following bash script generates the appropriate iptables rules. The IP addresses given in "rsynchosts" should be the addresses of the hosts allowed in the /etc/rsyncd.conf file.
# "rsynchosts" is a space-separated list of the ip addresses of
# the machines that will be doing the backups.
rsynchosts="192.168.1.1 192.168.1.2"
for rsynchost in $rsynchosts; do
    iptables -A INPUT -j ACCEPT -p tcp -s "$rsynchost" --dport 873
done
iptables -A INPUT -j DROP -p tcp --dport 873
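Since a mistake in a firewall rule can lock out the backups (or let in everyone else), it can help to print the rules and read them before applying them. The sketch below generates the same rules as text instead of running them; the function name print_rsync_rules is hypothetical.

```shell
# print_rsync_rules "HOST1 HOST2 ..."
# Print, without running them, the iptables commands that admit
# the listed hosts to tcp port 873 and drop everyone else.
print_rsync_rules() {
    local host
    # $1 is left unquoted on purpose so it splits on spaces.
    for host in $1; do
        echo "iptables -A INPUT -p tcp -s $host --dport 873 -j ACCEPT"
    done
    echo "iptables -A INPUT -p tcp --dport 873 -j DROP"
}

# Review the output first, then pipe it to sh as root to apply it:
# print_rsync_rules "192.168.1.1 192.168.1.2" | sh
```

The order matters: the ACCEPT rules must be appended before the DROP rule, because iptables acts on the first rule that matches.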
Sample crontab entry: Cron is used for regularly scheduled automated tasks. Here it will be used to tell the backup machines when and how to do the backups. Most systems have a confusing variety of methods for specifying cron jobs, but it is usually possible to create cron jobs using the file /etc/crontab. The example below consists of two lines for /etc/crontab that work on Debian systems (whose /etc/crontab has a user field that many other cron daemons do not have). The first line runs the script /etc/rsync.daily with the argument "1" as root at 3:00 AM on every odd-numbered day of the month. The second line does the same thing, but on even-numbered days and with the argument "2".
0 3 1-31/2 * * root /etc/rsync.daily 1
0 3 2-31/2 * * root /etc/rsync.daily 2
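The odd/even choice could also be made inside the script rather than in cron. The following sketch, which is an alternative and not what we actually run, maps the day of the month to the same "1" or "2" argument; the function name backup_slot is hypothetical.

```shell
# backup_slot DAY_OF_MONTH
# Print 1 for odd days and 2 for even days, matching the
# arguments passed to /etc/rsync.daily in the crontab lines.
backup_slot() {
    local day="${1#0}"    # strip a leading zero, so "08" becomes "8"
    if [ $((day % 2)) -eq 1 ]; then
        echo 1
    else
        echo 2
    fi
}

# Hypothetical usage in a script that cron runs every day:
# ext=$(backup_slot "$(date +%d)")
```

With either approach, note that the 31st and the following 1st are both odd, so after a 31-day month copy 1 is updated two days in a row and copy 2 is briefly two days older than usual.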
Sample /etc/rsync.daily script used in above crontab entry: The command 'chmod 700 /etc/rsync.daily' gives this file the permissions it needs. In this example, mach1 is the machine doing the backups, and it is backing up mach1, mach2, and mach3.
#!/bin/bash
# The argument this script is called with, either 1 or 2
ext=$1
# The full paths of the programs used in this script
rsync=/usr/bin/rsync
mount=/bin/mount
umount=/bin/umount
# Good rsync options for backups.
rsync_opts="-av --delete --delete-excluded"
# The name of the file containing the rsync connection password
password="--password-file=/etc/.rs_pass"
# A list of files and directories that do not need to be backed up
exclude_list="noback/ core .kde/ .gnome/ .netscape/cache/ Cookies/ backup/"
excludes=""
for exclude in $exclude_list; do
    excludes="$excludes --exclude=$exclude"
done
# Backup /home on mach1 to /backup/mach1_1/home or
# /backup/mach1_2/home depending on the argument the script
# was called with. Dump any output and error messages to
# /etc/backup/mach1_home_1 or /etc/backup/mach1_home_2
$rsync $rsync_opts $excludes /home /backup/mach1_${ext}/ > \
    /etc/backup/mach1_home_${ext} 2>&1
# Backup /profiles on mach1
$rsync $rsync_opts $excludes /profiles /backup/mach1_${ext}/ > \
    /etc/backup/mach1_profiles_${ext} 2>&1
# Backup /etc on mach1
$rsync $rsync_opts /etc /backup/mach1_${ext}/ > \
    /etc/backup/mach1_etc_${ext} 2>&1
# Backup mach2 and mach3 according to the [rsync] sections
# of the rsyncd.conf files on the two machines. Use the
# password given in /etc/.rs_pass.
$rsync $rsync_opts $excludes $password mach2::rsync \
    /backup/mach2_${ext}/home/ > /etc/backup/mach2_${ext} 2>&1
$rsync $rsync_opts $excludes $password mach3::rsync \
    /backup/mach3_${ext}/home/ > /etc/backup/mach3_${ext} 2>&1
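The script above writes all output to log files, so a failed backup goes unnoticed unless someone reads them. One possible refinement, shown here only as a sketch, is a small wrapper that logs as before but also prints a one-line warning when a command exits with a nonzero status. The function name run_logged is hypothetical.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND, send its stdout and stderr to LOGFILE, and print
# a one-line warning on stderr if it exits with a nonzero status.
# Returns the exit status of COMMAND.
run_logged() {
    local log="$1"
    shift
    local status=0
    "$@" > "$log" 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
        echo "FAILED (exit $status): $* (see $log)" >&2
    fi
    return "$status"
}

# Hypothetical usage inside /etc/rsync.daily:
# run_logged /etc/backup/mach1_etc_${ext} \
#     $rsync $rsync_opts /etc /backup/mach1_${ext}/
```

Because cron typically mails anything a job prints to the job's owner, the warning line is usually enough to bring a failure to the administrator's attention.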