Question: Best tool for bare metal restore of Debian servers?
Posted by paulgear on Wed 4 Jun 2008 at 11:46
I've been doing a bit of searching through the Debian Administration archives and one thing that doesn't seem to have been discussed very much is full system recovery. There are plenty of discussions on different backup options, but nothing targeted at what seems to me the simplest possible backup scenario: protecting a single machine (specifically a server) so that if it is compromised it can be rolled back to a previous state.
My current disaster recovery plan for my (Internet accessible) server is: rebuild from scratch and copy in my rsnapshot backups as necessary. I'm very happy with the way rsnapshot gives me multiple point-in-time backups at very little cost in terms of disk space, but I'm a little concerned about the recovery time my current plan would involve (especially since I'm self-employed, and time is money).
I've done some web searches and the product that seems to keep popping up is mondo. However, the last D-A article on mondo is a little old. Is it a good choice? Are there other good options?
My expected recovery scenario is having a recovery DVD or USB hard disk drive sitting off-site somewhere (if it's a DVD, I'd probably make a few copies and keep one on site), recovering from that, and then restoring my latest rsnapshot files from a USB hard disk. How many people have actually tested a disaster recovery plan similar to this and found it to be suitable in terms of quality of recovery and time taken to recover?
[ Parent | Reply to this comment ]
Intro:
"Relax and Recover (abbreviated ReaR) is a highly modular disaster recovery framework for GNU/Linux based systems, but can be easily extended to other UNIX alike systems. The disaster recovery information (and maybe the backups) can be stored on the network, USB devices and DVD/CD-R. The result is bootable rescue system that can be booted via PXE, DVD/CD and USB media."
Sounds like what you need.
[ Parent | Reply to this comment ]
Afterwards cfengine automagically configures the server and I'm done.
It took a long time to set this up, but now it runs fine :)
[ Parent | Reply to this comment ]
I also use Bacula. Very nice backup solution. It's easier to restore from LVM, but Bacula is great for the average folder/partition restore.
I've been meaning to see if I can restore the logical volume with a live CD. Does anyone have experience restoring a logical volume with a live CD?
[ Parent | Reply to this comment ]
In the question, I asked about a single machine. I have more than that, but it certainly fits your "5 or 6 machines, each with different package sets" scenario better than the "industrial strength imaging solution" scenario. I would want to be able to deploy the same solution for my laptop, which sees a lot of hostile environments in my work as an independent consultant.
Most of the answers above didn't address how to get the data back onto the machine. It's all well and good to say "take a dd", or "use LVM snapshots", or "make full backups with rsnapshot", but it's the restore process I'm concerned about. How do you get a machine to the state where you can actually do the restores? Boot from a Knoppix disk and set up all the partitions and so on manually? Reinstall? Those are the sort of solutions I want to avoid due to the time they take.
[ Parent | Reply to this comment ]
Really, if the backup is offsite via rsnapshot, you've got two problems for a speedy recovery:
- Bootstrapping a working system so that you can transfer the files back.
- Transferring the data back, i.e. actually restoring.
Me? I take backups of 4-5 machines every few hours (including this host!) via a combination of rsnapshot & backuppc.
For recovery, thus far, I've never had to do a bare metal restoration - usually just restore specific files on request from users, or to repair my own mistakes. Those "repairs" are almost instant.
However, the most interesting part of your problem is the bare-metal nature of the restoration. Whilst it is true that booting into a Debian installation CD-ROM and running through the partitioning + minimal setup would take time, I think that you have to assume that a full restoration is an infrequent exercise, and that the most time-consuming part of a full restore is going to be the network transfer of all the data back onto the "new" machine.
So I'd probably go that route: either boot into a Knoppix CD-ROM via a PXE boot server, or run through a basic console-only installation of Debian from a CD-ROM/PXE boot. (Search the site for "pxe" to find instructions on how to set up a PXE environment. It is trivial to do, and very handy.)
If you find that you have too many machines, or that failures are common, then I guess systemimager would be a better path.
(IMHO using LVM snapshots can be useful; but it doesn't solve the problem that a restore, over a network, is going to take a lot longer than any other part of the job - realising that is probably key ..)
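To put rough numbers on the point about transfer time, here is a small shell sketch. The sizes and link speeds are made-up examples, `estimate_transfer` is a hypothetical helper name, and the calculation ignores protocol overhead and disk speed, so the result is best read as a lower bound.

```shell
# Rough lower bound on the time needed to pull a backup over the network.
# Usage: estimate_transfer SIZE_IN_GB LINK_SPEED_IN_MBIT
# Assumes decimal units (1 GB = 8000 Mbit) and a fully saturated link.
estimate_transfer() {
    local size_gb="$1" mbit="$2" secs
    secs=$(( size_gb * 8000 / mbit ))
    echo "$(( secs / 3600 ))h $(( secs % 3600 / 60 ))m"
}

estimate_transfer 100 10     # 100 GB over a 10 Mbit/s line: prints "22h 13m"
estimate_transfer 100 100    # same data over a 100 Mbit/s LAN: prints "2h 13m"
```

Compared with the few minutes a minimal install or a live CD boot takes, the copy itself is likely to dominate unless the backup disk is attached locally.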
[ Parent | Reply to this comment ]
Really, if the backup is offsite via rsnapshot, you've got two problems for a speedy recovery:
- Bootstrapping a working system so that you can transfer the files back.
- Transferring the data back, i.e. actually restoring.
I wasn't planning on leaving the backup off site to do the restore. :-)
...
For recovery, thus far, I've never had to do a bare metal restoration - usually just restore specific files on request from users, or to repair my own mistakes. Those "repairs" are almost instant.
However, the most interesting part of your problem is the bare-metal nature of the restoration. Whilst it is true that booting into a Debian installation CD-ROM and running through the partitioning + minimal setup would take time, I think that you have to assume that a full restoration is an infrequent exercise, and that the most time-consuming part of a full restore is going to be the network transfer of all the data back onto the "new" machine.
I think the most "interesting" (by which I mean difficult and annoying) part is what to restore from the full backup and what to leave on the restored machine. This page from a previous D-A article has a few hints, but I'd rather use a recovery system that has worked out the exact mechanism without requiring additional thought.
I guess that's the bottom line for me: what is the most bullet-proof recovery system? The big opportunity for mistakes is when I put my fat fingers on the keyboard and start messing with things. I want something that takes the human factor out of it as much as possible.
[ Parent | Reply to this comment ]
If you can be on-site, as you suggest, then I would imagine restoring the complete contents of the backup would be:
- Simple.
- Less confusing.
- Lower risk.
With onsite access you're literally talking about booting with a Knoppix CD and copying over the contents of your backup data-store to the new system. The only steps you should need are:
- Partition the disk(s).
- Do the copy.
- Install grub.
- Make sure that /etc/fstab is correct.
- Reboot.
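As a rough illustration, the five steps above could be sketched as a dry run that only prints the commands for review. Everything specific here is an assumption rather than something from the thread: a single ext3 root partition, GRUB legacy, the backup disk already mounted on the live system, and the helper name `print_restore_plan`.

```shell
# Dry-run sketch: prints the commands for the five steps, runs nothing.
# Assumptions: one ext3 root partition, GRUB legacy, and a backup tree
# that is already mounted somewhere on the live CD system.
print_restore_plan() {
    local disk="$1"              # e.g. /dev/sda (hypothetical)
    local backup="$2"            # e.g. /mnt/usb/daily.0/myhost (hypothetical)
    local target="${3:-/target}" # where the new root gets mounted
    local part="${disk}1"

    cat <<EOF
# 1. Partition the disk(s)
parted -s $disk mklabel msdos
parted -s $disk mkpart primary ext3 1MiB 100%
mkfs.ext3 $part
mkdir -p $target
mount $part $target
# 2. Do the copy
rsync -aH --numeric-ids $backup/ $target/
# 3. Install grub
grub-install --root-directory=$target $disk
# 4. Make sure that /etc/fstab is correct
cat $target/etc/fstab
# 5. Reboot
umount $target
reboot
EOF
}
```

Calling `print_restore_plan /dev/sda /mnt/usb/daily.0/myhost` prints the plan so it can be checked (and adapted to the real partition layout) before anything touches the disk.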
Still it sounds like you'd want a "one click" restoration system, and to the best of my knowledge nothing like that exists just yet. You need, in some way, to have a working system to accept a backup via rsync/scp/ssh. Failing that you need physical access if you're just going to do a straight copy via dd.
Regardless of what you use to restore the data, there will almost always be some fiddling once it is complete: things like grub, and ensuring that partition names etc. are correct.
[ Parent | Reply to this comment ]
It really pays off for large installations.
[ Parent | Reply to this comment ]
Check out Clonezilla LiveCD [6] if you want to give it a try.
BTW, if you need massive full system recovery for a PC classroom or PC cluster, you can check out DRBL [7] for multicast support in Clonezilla.
[1] partimage - http://www.partimage.org/
Linux.com :: Backup and Restore Linux Partitions Using Partimage
http://www.linux.com/feed/59730
[2] ntfsclone - http://www.linux-ntfs.org/doku.php?id=ntfsclone
[3] partimage-ng - http://partimage-ng.org/
[4] Clonezilla - http://clonezilla.org
Linux.com :: Manage partitions and disks with GParted-Clonezilla live CD
http://www.linux.com/feature/115208
[5] Clonezilla/DRBL apt repository -
deb http://drbl.nchc.org.tw/drbl-core drbl stable
[6] Clonezilla Live CD - http://www.clonezilla.org/clonezilla-live/
gparted-clonezilla / clonezilla-sysresccd - http://www.clonezilla.org/related-live-cd/
[7] DRBL(Diskless Remote Boot Linux) - http://drbl.sf.net
http://www.clonezilla.org/clonezilla-server-edition/
[ Parent | Reply to this comment ]
Recovery time is usually within minutes.
I also use it to clone identical PCs or servers. It's pretty bullet proof.
[ Parent | Reply to this comment ]
I also use DAR for doing backups and restoring them.
To answer your question on what to restore from the backup: it depends on what your backup contains. I restore everything, but I don't back up everything. My (very shortened) DAR command goes something like this:
dar -R / -X "*.dar" -X "*.iso" -X "ibdata1" -X "ib_logfile?" -P cdrom -P mnt -P media -P lost+found -P proc -P sys -P dev/pts -P tmp -P var/log/mysql -P var/tmp -D -c foobar
-X means exclude file, -P exclude directory.
The mount command will give you an idea which directories on your system are on a tmpfs and thus get wiped on shutdown. You generally don't need to back them up. Also, I don't back up my MySQL databases with DAR; there are special tools for doing that.
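The hint about reading the `mount` output can be turned into a small helper. This is only a sketch: `mount_to_prunes` is a hypothetical name, the list of filesystem types is my own guess at what is worth skipping, and mount points containing spaces are not handled.

```shell
# Turn `mount` output into dar -P (prune) options for virtual filesystems.
# Input lines are expected to look like: "tmpfs on /dev/shm type tmpfs (rw)"
mount_to_prunes() {
    awk '$2 == "on" && $4 == "type" && $5 ~ /^(tmpfs|proc|sysfs|devpts)$/ {
        path = substr($3, 2)    # -P paths are relative to -R /, so drop the leading "/"
        if (path != "") print "-P " path
    }'
}

# Usage: review the suggestions before adding them to the dar command line.
# mount | mount_to_prunes
```

The output is meant as a checklist to compare against the -P options already in the command, not as something to paste in blindly.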
For the bare-metal-restore:
Boot from a live CD (preferably a Debian/Ubuntu CD), enable the repositories in sources.list and apt-get install dar (or use any other method to get DAR running in your live system). Then partition your disks (don't forget the swap partition) and mount them. Now restore your backups (dar -x mybackup -R /where/to/restore): first the full backups, then the differential/incremental ones if you have any.
If the partition layout of the new system is different from the old backed-up system, you may have to modify /???/boot/grub/device.map, /???/boot/grub/menu.lst and /???/etc/fstab (replace ??? with the path where you restored the files). The same is true if you use UUIDs to identify your partitions (vol_id gives you the UUIDs of your new partitions).
If your NICs changed too, check /???/etc/network/interfaces and (if you use udev) /???/etc/udev/rules.d/70-persistent-net.rules.
Last step: install grub with "grub-install --root-directory=/??? /dev/sdxx" and "chroot /???/ update-grub", then reboot into your restored system.
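Before rebooting, the fstab check mentioned above can be partly mechanised. The following is a minimal sketch, assuming the backup was restored under some directory (the "/???" in the steps above); `missing_fstab_devices` is a hypothetical helper name, and it only looks at /dev/... and UUID= entries.

```shell
# List device entries in a restored fstab that do not exist on this machine.
# $1 is the directory where the backup was restored.
missing_fstab_devices() {
    local root="$1" dev rest
    while read -r dev rest; do
        case "$dev" in
            /dev/*) [ -e "$dev" ] || echo "missing: $dev" ;;
            UUID=*) echo "check by hand: $dev" ;;   # compare with the new partitions' UUIDs
        esac
    done < "$root/etc/fstab"
}

# Usage (the path is an example): missing_fstab_devices /mnt/restore
```

Anything reported as missing points at a line in fstab (and probably in menu.lst and device.map as well) that still describes the old disk layout.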
I don't think it's worth automating these steps, because with five servers you'll have to do this very rarely.
[ Parent | Reply to this comment ]
This only gives you a single day of backup unless you save the images, which can get very large. I think that is why most people use a mix of setting up a base system and then recovering from backup: they want snapshots of data for reasons besides disaster recovery. You know, the famous "oops, I need the old version of this file I just overwrote" :).
[ Parent | Reply to this comment ]
We had to do some hacking to get Mondo to work correctly with Sarge and RAID, I think more because of Debian than Mondo. With software RAID everything can get a bit more interesting.
[ Parent | Reply to this comment ]
IMHO the best practice is to plan for bare metal restore from the installation phase of a machine, and the best tool I have found for this is FAI[1], the Fully Automatic Installation framework for Linux.
I don't use it yet, but my plans are to use it to install all my servers.
You can read an article about it here[2] on Debian Administration.
With network boot you can restore a server in minutes, or you can boot from a fai-cd.
You can keep different versions of your configs in SVN or CVS.
For live data backup / restore you can use any solution.
[1] http://www.informatik.uni-koeln.de/fai/
[2] http://www.debian-administration.org/articles/240
[ Parent | Reply to this comment ]
Backup and restore with OpenVZ and VEs is soooo sweet and easy..
---
stoffell
[ Parent | Reply to this comment ]
Here's how I have my systems set up. I have separate physical partitions for our database storage, virtual Apache web servers, and /home. I do a bare metal backup of the / and /boot partitions, which encompass everything except the three partitions already mentioned, with Mondo. I then do daily backups of the other three partitions with rsnapshot.
I create DVD .iso images with Mondo and have tested them successfully. Mondo does a very good job in my estimation. The only problem I have run into so far is with the 2.6.24 amd64 kernel. I have to boot the systems into another kernel to do the Mondo backups, but they successfully restore to a working amd64 kernel.
[ Parent | Reply to this comment ]
Nice tools, agents, etc., but it costs money.
I've been using it for about a year now and it works very well.
I back up 6 servers with it.
[ Parent | Reply to this comment ]
We keep a full backup (excluding /proc, /sys and maybe /dev) of all the systems on a backup server.
We keep a file with the output of df -h in /etc/, so combined with /etc/fstab we know the layout and size of the different partitions (there is also /var/log/dmesg if you need more hardware information).
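Recording the df -h output is easy to script so that it is refreshed before every backup run. A minimal sketch follows; the file name `disk-layout.txt` and the function name `save_disk_layout` are placeholders of mine, not what the poster uses.

```shell
# Record the current filesystem layout so it ends up in the backup.
# $1 is the directory to write to (defaults to /etc, as described above).
save_disk_layout() {
    local outdir="${1:-/etc}"
    local outfile="$outdir/disk-layout.txt"   # hypothetical file name
    {
        echo "# recorded on $(date -u +%Y-%m-%dT%H:%M:%SZ) on $(uname -n)"
        LC_ALL=C df -h
    } > "$outfile"
}

# Usage, e.g. from a daily cron job run as root: save_disk_layout /etc
```

The header line makes it obvious at restore time how old the recorded layout is and which host it belongs to.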
When you want to do a restore:
- Pop in a live system (CD, USB, whatever).
- ssh into the backup server to look at the df output and fstab.
- Create the filesystems and mount points, for example in /target.
- Create /proc and /sys.
- rsync the files back.
Then log in to the newly created system. There are two ways:
1. chroot:
chroot /target /bin/login
2. Use the live CD to boot the newly created system:
specify the right kernel line at the live system's boot prompt
When you are logged into the system, run grub or lilo (or silo or whatever), with options if needed. Then you can boot into the newly restored system.
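The "create the mount points in /target" step can be derived from the saved fstab instead of being typed by hand. The sketch below only prints the commands; `fstab_to_mount_plan` is a hypothetical helper, it assumes the filesystems have already been created on the listed devices, and the output needs reviewing (for example, a CD-ROM entry under /dev would show up too).

```shell
# Print mkdir/mount commands for every /dev-backed entry in a saved fstab,
# ordered so that parent mount points come before their children.
# $1: path to the saved fstab, $2: restore root (default /target)
fstab_to_mount_plan() {
    local fstab="$1" target="${2:-/target}"
    awk '$1 ~ /^\/dev\// && $2 ~ /^\// { print $2, $1 }' "$fstab" |
        LC_ALL=C sort |
        while read -r mnt dev; do
            echo "mkdir -p ${target}${mnt%/}"
            echo "mount $dev ${target}${mnt%/}"
        done
}

# Usage (paths and host name are examples):
#   fstab_to_mount_plan /tmp/fstab.from-backup /target
#   rsync -aH --numeric-ids backupserver:/backups/myhost/ /target/
#   chroot /target /bin/login
```

Sorting on the mount point is what keeps / ahead of /var and /var ahead of /var/lib, which matters because a child can only be mounted once its parent is in place.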
So we haven't automated it, but it's not something that happens every day.
And it's just a few commands.
Although with /sys, /dev and udev added in later releases it has become more work; for example, this file also needs to be kept an eye on:
/etc/udev/rules.d/z25_persistent-net.rules
It specifies the name (ethN) of the network interface.
We use it for doing upgrade-testing as well (usually we create only one filesystem and maybe leave out some data).
[ Parent | Reply to this comment ]
As older Debian kernels might otherwise not understand them and refuse to mount the partition.
The chroot method allows you to do a lot of other things as well, such as installing a different kernel because you are restoring onto different hardware.
I started doing this a long time ago, when servers could still use floppy disks, and used a tomsrtbt disk and an extra disk with a compressed, statically linked rsync.
I still keep a statically linked rsync (available from the backupserver by ftp, http, scp, sftp).
You statically link it this way: CCFLAGS=-static make
You can see if it's statically linked with: file rsync
To make the rsync-file smaller you can use strip rsync (removes debug-symbols)
You can also use: ldd -v rsync
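The `file` check can be wrapped in a small function, for instance to verify the rsync binary on the backup server after each rebuild. This is a sketch under the assumption that `file` describes the binary with the usual "statically linked" or "dynamically linked" wording; both function names are hypothetical.

```shell
# Classify a description string as printed by `file -b`.
linkage_from_description() {
    case "$1" in
        *"statically linked"*)  echo static ;;
        *"dynamically linked"*) echo dynamic ;;
        *)                      echo unknown ;;
    esac
}

# Prints "static", "dynamic" or "unknown" for the binary given as $1.
linkage_of() {
    local desc
    desc="$(file -b "$1")" || return 1
    linkage_from_description "$desc"
}

# Usage: linkage_of ./rsync
```

Anything other than "static" would be a reason to rebuild before relying on the binary in a rescue environment that lacks the shared libraries.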
[ Parent | Reply to this comment ]
What I have done is place all the configs and package dependencies in my own repo and packages. I can just build the server from disk, point it at my repo, and install the package for that server, which is linked to, say, custom-env, custom-env-vim and slapd-domain.name.here.
As for the data, I have my own scripts that rsync/ssh/zip the stuff around the place to a couple of different servers in a couple of different areas.
Making my own packages, which are kept in CVS, gives me a lot of control. And yep, I have 2 working repos as well, just in case.
Alex
[ Parent | Reply to this comment ]
Make a new image after major changes and incremental rsnapshots after that?
You can store the image off-site for additional reliability or make several copies. I know some friends who use Ghost4Linux:
http://www.howtoforge.com/back_up_restore_harddrives_partitions_with_ghost4linux
[ Parent | Reply to this comment ]