Disaster Recovery Using Bacula
When disaster strikes, you must have a plan, and you must have prepared in advance; otherwise the work of recovering your system and your files will be considerably greater. For example, if you have not previously saved the partitioning information for your hard disk, how can you properly rebuild it if the disk must be replaced?
Unfortunately, many of the steps one must take before and immediately after a disaster are very operating system dependent. As a consequence, this chapter will discuss in detail disaster recovery (also called Bare Metal Recovery) for Linux and Solaris. For Solaris, the procedures are still quite manual. For FreeBSD the same procedures may be used but they are not yet developed. For Win32, a number of Bacula users have reported success using BartPE.
Here are a few important considerations concerning disaster recovery that you should take into account before a disaster strikes.
- If the building which houses your computers burns down or is otherwise destroyed, do you have off-site backup data?
- Disaster recovery is much easier if you have several machines. If you have a single machine, how will you handle unforeseen events if your only machine is down?
- Do you want to protect your whole system and use Bacula to recover everything? Or do you want to try to restore your system from the original installation disks, apply any other updates, and only restore user files?
- Create a rescue disk or CDROM for each of your Linux systems. Generally, they are offered by each distribution, and there are many good rescue disks on the Web (Knoppix, sysrescuecd, PLD Linux rescue CD, tomsrtbt, RIP, …).
- Create a bacula-hostname directory on each machine and save it somewhere – possibly on a USB key.
- Ensure that you always have a valid bootstrap file for your backup and that it is saved to an alternate machine. This will permit you to easily do a full restore of your system.
- If possible, copy your catalog nightly to an alternate machine. If you have a valid bootstrap file, this is not necessary, but it can be very useful if you do not want to reload everything.
- Ensure that you always have a valid bootstrap file for your catalog backup that is saved to an alternate machine. This will permit you to restore your catalog more easily if needed.
- Test using the Rescue CDROM before you are forced to use it in an emergency situation.
- Make a copy of your Bacula .conf files, particularly your bacula-dir.conf, and your bacula-sd.conf files, because if your server goes down, these files will be needed to get it back up and running, and they can be difficult to rebuild from memory.
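Several of the preparation steps above can be scripted. The following is a minimal sketch and not part of Bacula: the function name save_recovery_info, the output file names, and the choice to pass the configuration directory as an argument are all assumptions made for illustration. It copies the .conf files, records the mounted file systems, and, when a disk device is given and sfdisk is installed, dumps the partition table.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Bacula): gather the information
# you would want to have at hand after a disaster.
# Usage: save_recovery_info <conf-dir> <dest-dir> [disk-device]
save_recovery_info() {
    conf_dir=$1
    dest_dir=$2
    disk=${3:-}

    mkdir -p "$dest_dir/conf" || return 1

    # Copy every .conf file found in the configuration directory.
    for f in "$conf_dir"/*.conf; do
        [ -f "$f" ] && cp -p "$f" "$dest_dir/conf/"
    done

    # Record the mounted file systems and their sizes.
    df -k > "$dest_dir/df.txt" 2>/dev/null || true
    mount > "$dest_dir/mount.txt" 2>/dev/null || true

    # Optionally dump the partition table (normally requires root).
    if [ -n "$disk" ] && command -v sfdisk >/dev/null 2>&1; then
        sfdisk -d "$disk" > "$dest_dir/partitions.sfdisk" ||
            echo "warning: could not dump partition table of $disk" >&2
    fi
    return 0
}
```

For example, run as root, save_recovery_info /etc/bacula /root/bacula-$(uname -n) /dev/sda would fill a bacula-hostname directory, which you would then copy to another machine or a USB key. Both /etc/bacula and /dev/sda are assumptions about your system.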
As an alternative to creating a Rescue CD, please see the section below entitled Bare Metal Recovery using a LiveCD.
Bacula previously had a Rescue CD. Unfortunately, this CD did not work on every Linux Distro, and in addition, Linux is evolving with different boot methods, more and more complex hardware configurations (LVM, RAID, WiFi, USB, …). As a consequence, the Bacula Rescue CD as it was originally envisioned no longer exists.
However, there are many other good rescue disks available. A so-called “Bare Metal” recovery is one where you start with an empty hard disk and you restore your machine. There are also cases where you may lose a file or a directory and want it restored. Please see the previous chapter for more details for those cases.
Bare Metal Recovery assumes that you have the following items for your system:
- A Rescue CDROM containing a copy of your OS.
- Perhaps a copy of your hard disk information, as well as a statically linked version of the Bacula File daemon.
- A full Bacula backup of your system possibly including Incremental or Differential backups since the last Full backup
- A second system running the Bacula Director, the Catalog, and the Storage daemon. (this is not an absolute requirement, but how to get around it is not yet documented here)
Now, let’s assume that your hard disk has just died and that you have replaced it with a new identical drive. In addition, we assume that you have:
- A recent Bacula backup (Full plus Incrementals)
- A Rescue CDROM.
- Your Bacula Director, Catalog, and Storage daemon running on another machine on your local network.
This is a relatively simple case, and later in this chapter, as time permits, we will discuss how you might recover from a situation where the machine that crashes is your main Bacula server (i.e. has the Director, the Catalog, and the Storage daemon).
You will take the following steps to get your system back up and running:
- Boot with your Rescue CDROM.
- Start the Network (local network)
- Re-partition your hard disk(s) as they were before
- Re-format your partitions
- Restore the Bacula File daemon (static version)
- Perform a Bacula restore of all your files
- Re-install your boot loader
Now for the details …
Each rescue disk boots somewhat differently. Please see the instructions that go with your CDROM.
You can test it by pinging another machine, or by pinging your broken machine from another machine. Do not proceed until your network is up.
Put the static file daemon and its conf file in /tmp.
chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf
The above command starts the Bacula File daemon with the proper root disk location (i.e. /mnt/disk/tmp). If Bacula does not start, correct the problem and start it. You can check if it is running by entering:
ps fax
You can kill Bacula by entering:
kill -TERM <pid>
where pid is the first number printed in front of the first occurrence of bacula-fd in the ps fax command.
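Picking the pid out of the ps listing by eye can be replaced by a small filter. This is only a sketch: the function name first_bacula_fd_pid is hypothetical, and it assumes that ps prints the pid in the first column, as ps fax does on Linux.

```shell
#!/bin/sh
# Hypothetical helper: read a ps listing on standard input and print the
# pid (first column) of the first line that mentions bacula-fd.
# The pattern is written as [b]acula-fd so that the awk process itself,
# whose command line contains the pattern text, is never matched.
first_bacula_fd_pid() {
    awk '/[b]acula-fd/ { print $1; exit }'
}

# Typical use on the machine being restored (shown as comments so that
# nothing is killed when this file is merely sourced):
#   pid=$(ps fax | first_bacula_fd_pid)
#   [ -n "$pid" ] && kill -TERM "$pid"
```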
Now, you should be able to use another computer with Bacula installed to check the status by entering:
status client=xxxx
into the Console program, where xxxx is the name of the client you are restoring.
One common problem is that your bacula-dir.conf may contain machine addresses that are not properly resolved on the stripped down system to be restored because it is not running DNS. This is particularly true for the address in the Storage resource of the Director, which may be very well resolved on the Director’s machine, but not on the machine being restored and running the File daemon. In that case, be prepared to edit bacula-dir.conf to replace the name of the Storage daemon’s domain name with its IP address.
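If you have to make that edit under pressure, a small filter can help. The sketch below is an illustration only: the function name replace_sd_address is hypothetical, and it handles only the simple case of an unquoted line of the form Address = name, spelled exactly that way. It prints the modified configuration on standard output and leaves the original file untouched, so you can inspect the result before installing it.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a Bacula configuration file in
# which "Address = <host>" lines naming the given host use an IP instead.
# Usage: replace_sd_address <conf-file> <host-name> <ip-address>
replace_sd_address() {
    conf=$1
    host=$2
    ip=$3

    # Escape the dots so the host name is matched literally by sed.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')

    sed "s/^\([[:space:]]*Address[[:space:]]*=[[:space:]]*\)${host_re}[[:space:]]*\$/\1${ip}/" "$conf"
}
```

For example, replace_sd_address bacula-dir.conf sd.example.com 192.168.1.10 > bacula-dir.conf.new, where the host name and IP address are placeholders. Note that any Address line naming that host is rewritten, not only the one in the Storage resource.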
On the computer that is running the Director, you now run a restore command and select the files to be restored (normally everything). Before starting the restore, there is one final change you must make: just before running the job, use the mod option, select Where, and change the Where directory to be the root. Set it to:
/
then run the restore.
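As an alternative to changing Where through the mod menu, the destination can be given on the restore command line with the where= keyword. The sketch below only prints such a command so that you can paste it into the Console; the function name restore_command is hypothetical, and the keywords used (client=, where=, select, current, all, done) are the ones described for the restore command in the Console chapter. Check them against your Bacula version before relying on this.

```shell
#!/bin/sh
# Hypothetical helper: print a Console restore command that selects the
# most recent backup of every file for one client and restores to "/".
# Usage: restore_command <client-name>
restore_command() {
    client=$1
    printf 'restore client=%s where=/ select current all done\n' "$client"
}
```

Because the command omits the yes keyword, the Console still stops at the yes/mod/no prompt, which gives you a last chance to verify that Where is set to the root.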
You might be tempted to avoid using chroot and running Bacula directly and then using a Where to specify a destination of /mnt/disk. This is possible, however, the current version of Bacula always restores files to the new location, and thus any soft links that have been specified with absolute paths will end up with /mnt/disk prefixed to them. In general this is not fatal to getting your system running, but be aware that you will have to fix these links if you do not use chroot.
/sbin/grub-install --root-directory=/mnt/disk /dev/hda
Note, in this case, you omit the chroot command, and you must replace /dev/hda with your boot device. If you don’t know what your boot device is, run the ./run_grub script once and it will tell you.
Finally, I’ve even run into a case where grub-install was unable to rewrite the boot block. In my case, it produced the following error message:
/dev/hdx does not have any corresponding BIOS drive.
The solution is to ensure that all your disks are properly mounted on /mnt/disk, then do the following:
chroot /mnt/disk
mount /dev/pts
Then edit the file /boot/grub/grub.conf and uncomment the line that reads:
#boot=/dev/hda
So that it reads:
boot=/dev/hda
Note, the /dev/hda may be /dev/sda or possibly some other drive depending on your configuration, but in any case, it is the same as the one that you previously tried with grub-install.
Then, enter the following commands:
grub --batch --device-map=/boot/grub/device.map \
     --config-file=/boot/grub/grub.conf --no-floppy
root (hd0,0)
setup (hd0)
quit
If the grub call worked, you will get a prompt of grub before the root, setup, and quit commands, and after entering the setup command, it should indicate that it successfully wrote the MBR (master boot record).
First unmount all your hard disks, otherwise they will not be cleanly shut down. Then reboot your machine by entering exit until you get to the main prompt, then enter Ctrl-d. Once back at the main CDROM prompt, you will need to turn the power to your machine off and then back on to get it to reboot.
If everything went well, you should now be back up and running. If not, re-insert the emergency boot CDROM, boot, and figure out what is wrong.
Above, we considered how to recover a client machine where a valid Bacula server was running on another machine. However, what happens if your server goes down and you no longer have a running Director, Catalog, or Storage daemon? There are several solutions:
- Bring up static versions of your Director, Catalog, and Storage daemon on the damaged machine.
- Move your server to another machine.
- Use a Hot Spare Server on another Machine.
The first option is very difficult because it requires you to have created a static version of the Director and the Storage daemon as well as the Catalog. If the Catalog uses MySQL or PostgreSQL, this may or may not be possible. In addition to loading all these programs on a bare system (quite possible), you will need to make sure you have a valid driver for your tape drive.
The second suggestion is probably a much simpler solution, and one I have done myself. To do so, you might want to consider the following steps:
- If you are using MySQL or PostgreSQL, configure, build and install it from source (or use rpms) on your new system.
- Load the Bacula source code onto your new system, configure, install it, and create the Bacula database.
- Ideally, you will have a copy of all the Bacula conf files that were being used on your server. If not, you will at a minimum need to create a bacula-dir.conf that has the same Client resource that was used to back up your system.
- If you have a valid saved Bootstrap file as created for your damaged machine with WriteBootstrap, use it to restore the files to the damaged machine (where you have loaded a static Bacula File daemon using the Rescue disk). This is done by using the restore command and, at the yes/mod/no prompt, selecting mod and then specifying the path to the bootstrap file.
- If you have the Bootstrap file, you should now be back up and running. If you do not have a Bootstrap file, continue with the suggestions below.
- Using bscan, scan the last set of backup tapes into your MySQL, PostgreSQL or SQLite database.
- Start Bacula, and using the Console restore command, restore the last valid copy of the Bacula database and the Bacula configuration files.
- Move the database to the correct location.
- Start the database, and restart Bacula. Then use the Console restore command to restore all the files on the damaged machine, where you have loaded a Bacula File daemon using the Rescue disk.
For additional details of restoring your database, please see the Restoring When Things Go Wrong section of the Console Restore Command chapter of this manual.
Since every flavor and every release of Linux is different, there are likely to be some small difficulties with the scripts, so please be prepared to edit them in a minimal environment. A rudimentary knowledge of vi is very useful. Also, these scripts do not do everything. You will need to reformat Windows partitions by hand, for example.
Getting the boot loader back can be a problem if you are using grub because it is so complicated. If all else fails, reboot your system from your floppy but using the restored disk image, then proceed to a reinstallation of grub (looking at the run-grub script can help). By contrast, lilo is a piece of cake.
As an alternative to the old, now defunct Bacula Rescue CDROM, you can use any system rescue disk or LiveCD to recover your system. The big problem with most rescue disks or LiveCDs is that they are not designed to capture the current state of your system, so when you boot them on a damaged system, you might be somewhat lost; for example, how many of you remember your exact hard disk partitioning?
This lack can be easily corrected by running the part of the Bacula Rescue code that creates a directory containing a static-bacula-fd, a snapshot of your current system disk configuration, and scripts that help restore it.
Before a disaster strikes:
- Run only the make bacula part of the Bacula Rescue procedure to create the static Bacula File daemon, and system disk snapshot.
- Save the directory generated (more details below), preferably on a CDROM or alternatively to some other system.
- Possibly run make bacula every night as part of your backup process to ensure that you have a current snapshot of your system.
Then when disaster strikes, do the following:
- Boot with your system rescue disk or LiveCD (e.g. Knoppix).
- Start the Network (local network).
- Copy the Bacula recovery directory to the damaged system using ftp, scp, wget or, if your boot disk permits it, by reading it directly from a CDROM.
- Continue as documented above.
- Re-partition your hard disk(s) as they were before, if necessary.
- Re-format your partitions, if necessary.
- Restore the Bacula File daemon (static version).
- Perform a Bacula restore of all your files.
- Re-install your boot loader.
In order to create the Bacula recovery directory, you need a copy of the Bacula Rescue code as described above, and you must first configure that directory.
Once the configuration is done, you can do the following to create the Bacula recovery directory:
cd <bacula-rescue-source>/linux/cdrom
su (become root)
make bacula
The directory you want to save will be created in the current directory with the name bacula. You need only save that directory either as a directory or possibly as a compressed tar file. If you run this procedure on multiple machines, you will probably want to rename this directory to something like bacula-hostname.
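Renaming and compressing the directory can be done in one step. The following is a minimal sketch under stated assumptions: the function name archive_recovery_dir is hypothetical, the host name is taken from uname -n, and the tar in use is assumed to accept the -z and -C options (GNU and BSD tar both do).

```shell
#!/bin/sh
# Hypothetical helper: pack a recovery directory into
# <out-dir>/bacula-<hostname>.tar.gz, with bacula-<hostname> as the
# top-level directory name inside the archive.
# Usage: archive_recovery_dir <recovery-dir> <out-dir>
archive_recovery_dir() {
    src=$1
    out_dir=$2
    name="bacula-$(uname -n)"

    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    mkdir -p "$out_dir" || return 1

    # Copy under the new name first, so the original stays untouched.
    work=$(mktemp -d) || return 1
    cp -Rp "$src" "$work/$name" &&
        tar -czf "$out_dir/$name.tar.gz" -C "$work" "$name"
    status=$?

    rm -rf "$work"
    return $status
}
```

For example, after make bacula has finished, archive_recovery_dir ./bacula /var/tmp writes /var/tmp/bacula-hostname.tar.gz, which can then be copied to another system; /var/tmp is only a placeholder.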
The same basic techniques described above also apply to FreeBSD. Although we don’t yet have a fully automated procedure, Alex Torres Molina has provided us with the following instructions with a few additions from Jesse Guardiani and Dan Langille:
- Boot with the FreeBSD installation disk
- Go to Custom, Partition and create your slices and go to Label and create the partitions that you want. Apply changes.
- Go to Fixit to start an emergency console.
- Create devs ad0 .. .. if they don’t exist under /mnt2/dev (in my situation) with MAKEDEV. The device or devices you create depend on what hard drives you have. ad0 is your first ATA drive. da0 would be your first SCSI drive. Under OS version 5 and greater, your device files are most likely automatically created for you.
- mkdir /mnt/disk this is the root of the new disk
- mount /mnt2/dev/ad0s1a /mnt/disk
  mount /mnt2/dev/ad0s1c /mnt/disk/var
  mount /mnt2/dev/ad0s1d /mnt/disk/usr
  …..
  The same hard drive issues as above apply here too. Note, under OS version 5 or higher, your disk devices may be in /dev not /mnt2/dev.
- Network configuration (ifconfig xl0 ip/mask + route add default ip-gateway)
- mkdir /mnt/disk/tmp
- cd /mnt/disk/tmp
- Copy bacula-fd and bacula-fd.conf to this path
- If you need to, use sftp to copy files, after which you must do this: ln -s /mnt2/usr/bin /usr/bin
- chmod u+x bacula-fd
- Modify bacula-fd.conf to fit this machine
- Copy /bin/sh to /mnt/disk, necessary for chroot
- Don’t forget to put your bacula-dir’s IP address and domain name in /mnt/disk/etc/hosts if it’s not on a public net. Otherwise the FD on the machine you are restoring to won’t be able to contact the SD and DIR on the remote machine.
- mkdir -p /mnt/disk/var/db/bacula
- chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf to start bacula-fd
- Now you can go to bacula-dir and restore the job with the entire contents of the broken server.
- You must create /proc
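The hosts file step in the list above is easy to get wrong by hand, for example by adding the same entry twice over several attempts. The following sketch shows one way to add an entry only if the name is not already present. The function name add_hosts_entry is hypothetical, and it assumes that awk is available in the environment where you run it, which may not be the case in a minimal emergency console.

```shell
#!/bin/sh
# Hypothetical helper: append "<ip> <name>" to a hosts file unless a
# non-comment line already lists that name.
# Usage: add_hosts_entry <hosts-file> <ip-address> <host-name>
add_hosts_entry() {
    hosts_file=$1
    ip=$2
    name=$3

    # Look for the name in any field after the address, skipping comments.
    if [ -f "$hosts_file" ] && awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_file"; then
        return 0
    fi

    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

For example, add_hosts_entry /mnt/disk/etc/hosts 192.168.1.5 bacula-dir.example.com, where the address and name are placeholders for those of your own Director.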
The same basic techniques described above apply to Solaris:
- the same restrictions as those given for Linux apply
- you will need to create a Rescue disk
However, during the recovery phase, the boot and disk preparation procedures are different:
- there is no need to create an emergency boot disk since it is an integrated part of the Solaris boot.
- you must partition and format your hard disk by hand following manual procedures as described in W. Curtis Preston’s book “Unix Backup & Recovery”
Once the disk is partitioned, formatted and mounted, you can continue with bringing up the network and reloading Bacula.
As mentioned above, before a disaster strikes, you should prepare the information needed in the case of problems. To do so, in the rescue/solaris subdirectory enter:
su
./getdiskinfo
./make_rescue_disk
The getdiskinfo script will, as in the case of Linux described above, create a subdirectory diskinfo containing the output from several system utilities. In addition, it will contain the output from the SysAudit program as described in Curtis Preston’s book. This file, diskinfo/sysaudit.bsi, will contain the disk partitioning information that will allow you to manually follow the procedures in the “Unix Backup & Recovery” book to repartition and format your hard disk. In addition, the getdiskinfo script will create a start_network script.
Once you have your disks repartitioned and formatted, do the following:
- Start Your Network with the start_network script
- Restore the Bacula File daemon as documented above
- Perform a Bacula restore of all your files using the same commands as described above for Linux
- Re-install your boot loader using the instructions outlined in the “Unix Backup & Recovery” book using installboot
When a pre-1.30 version of Bacula restores a directory, it first must create the directory, then it populates the directory with its files and subdirectories. The act of creating the files and subdirectories updates both the modification and access times associated with the directory itself. As a consequence, all modification and access times of all directories will be updated to the time of the restore.
This has been corrected in Bacula version 1.30 and later. The directory modification and access times are reset to the value saved in the backup after all the files and subdirectories have been restored. This has been tested and verified on normal restore operations, but not verified during a bare metal recovery.
If any of you look closely at the bootstrap file that is produced and used for the restore (I sure do), you will probably notice that the FileIndex item does not include all the files saved to the tape. This is because in some instances there are duplicates (especially in the case of an Incremental save), and in such circumstances, Bacula restores only the last of multiple copies of a file or directory.
Due to open system files, and registry problems, Bacula cannot save and restore a complete Win2K/XP/NT environment.
A suggestion by Damian Coutts using Microsoft’s NTBackup utility in conjunction with Bacula should permit a Full bare metal restore of Win2K/XP (and possibly NT systems). His suggestion is to do an NTBackup of the critical system state prior to running a Bacula backup with the following command:
ntbackup backup systemstate /F c:\systemstate.bkf
The backup is the command, the systemstate says to back up only the system state and not all the user files, and the /F c:\systemstate.bkf specifies where to write the state file. This file must then be saved and restored by Bacula. This command can be put in a Client Run Before Job directive so that it is automatically run during each backup, and thus saved to a Bacula Volume.
To restore the system state, you first reload a base operating system, then use Bacula to restore all the user files and to recover the c:\systemstate.bkf file, and finally run NTBackup, catalogue the system state file, and select it for restore. The documentation says you can’t run a command line restore of the systemstate.
This procedure has been confirmed to work by Ludovic Strappazon – many thanks!
A new tool is provided in the form of a bacula plugin for the BartPE rescue CD. BartPE is a self-contained WindowsXP boot CD which you can make using the PeBuilder tools available at http://www.nu2.nu/pebuilder/ and a valid Windows XP SP1 CDROM. The plugin is provided as a zip archive. Unzip the file and copy the bacula directory into the plugin directory of your BartPE installation. Edit the configuration files to suit your installation and build your CD according to the instructions at Bart’s site. This will permit you to boot from the cd, configure and start networking, start the bacula file client and access your director with the console program. The programs menu on the booted CD contains entries to install the file client service, start the file client service, and start the WX-Console. You can also open a command line window and CD Programs\Bacula and run the command line console bconsole.
Bacula versions after 1.31 should properly restore ownership and permissions on all WinNT/XP/2K systems. If you do experience problems, generally in restores to alternate directories because higher level directories were not backed up by Bacula, you can correct them with the SetACL utility, available under the GPL license at: http://sourceforge.net/projects/setacl/.
Ludovic Strappazon has suggested an interesting way to backup and restore complete Win32 partitions. Simply boot your Win32 system with a Linux Rescue disk as described above for Linux, install a statically linked Bacula, and backup any of the raw partitions you want. Then to restore the system, you simply restore the raw partition or partitions. Here is the email that Ludovic recently sent on that subject:
I've just finished testing my brand new cd LFS/Bacula with a raw Bacula backup and restore of my portable. I can't resist sending you the results: look at the rates !!!
hunt-dir: Start Backup JobId 100, Job=HuntBackup.2003-04-17_12.58.26
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:14
JobId: 100
Job: HuntBackup.2003-04-17_12.58.26
FileSet: RawPartition
Backup Level: Full
Client: sauvegarde-fd
Start time: 17-Apr-2003 12:58
End time: 17-Apr-2003 13:14
Files Written: 1
Bytes Written: 10,058,586,272
Rate: 10734.9 KB/s
Software Compression: None
Volume names(s): 000103
Volume Session Id: 2
Volume Session Time: 1050576790
Last Volume Bytes: 10,080,883,520
FD termination status: OK
SD termination status: OK
Termination: Backup OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
hunt-dir: Start Restore Job RestoreFilesHunt.2003-04-17_13.21.44
hunt-sd: Forward spacing to file 1.
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:54
JobId: 101
Job: RestoreFilesHunt.2003-04-17_13.21.44
Client: sauvegarde-fd
Start time: 17-Apr-2003 13:21
End time: 17-Apr-2003 13:54
Files Restored: 1
Bytes Restored: 10,056,130,560
Rate: 5073.7 KB/s
FD termination status: OK
Termination: Restore OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
If for some reason you want to do a Full restore to a system that has a working kernel (not recommended), you will need to take care not to overwrite the following files:
/etc/grub.conf
/etc/X11/Conf
/etc/fstab
/etc/mtab
/lib/modules
/usr/modules
/usr/X11R6
/etc/modules.conf
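One way to take that care, offered here only as a sketch and not as a documented Bacula procedure, is to copy the listed files and directories aside before the restore so that they can be put back afterwards. The function name save_protected and its root prefix argument are hypothetical; the prefix exists so the function can be tried against a scratch directory tree first. Some of the listed paths, /usr/X11R6 for example, can be large, so make sure the destination has room.

```shell
#!/bin/sh
# Files and directories that a Full restore onto a running system
# should not overwrite (the list given above).
PROTECTED="/etc/grub.conf /etc/X11/Conf /etc/fstab /etc/mtab
/lib/modules /usr/modules /usr/X11R6 /etc/modules.conf"

# Hypothetical helper: copy each protected path that exists under <root>
# to the same relative location under <dest>.
# Usage: save_protected <root> <dest>    (use "" as <root> for the live system)
save_protected() {
    root=$1
    dest=$2
    for p in $PROTECTED; do
        if [ -e "$root$p" ]; then
            mkdir -p "$dest$(dirname "$p")" || return 1
            cp -Rp "$root$p" "$dest$p" || return 1
        fi
    done
}
```

After the restore, the saved copies can be compared with the restored versions and copied back where needed.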
Many thanks to Charles Curley, who wrote the Linux Complete Backup and Recovery HOWTO for the Linux Documentation Project. This is an excellent document on how to do Bare Metal Recovery on Linux systems, and it was this document that made me realize that Bacula could do the same thing.
You can find quite a few additional resources, both commercial and free at Storage Mountain, formerly known as Backup Central.
And finally, the O’Reilly book, “Unix Backup & Recovery” by W. Curtis Preston covers virtually every backup and recovery topic including bare metal recovery for a large range of Unix systems.