Posted by docelic on Mon 3 Jan 2005 at 18:26
When you reboot a system remotely after a kernel upgrade, it's a good idea to have a safety net. LILO gives you just that: automatic rebooting if the machine panics or hangs.
When you reboot after an upgrade there are two things you want to be sure of:
- That the kernel will boot properly (mount the root filesystem)
- That the network interfaces will go up as expected
We will take care of the first problem by supplying the panic= argument on system boot, so that the machine reboots automatically in case of problems. We'll take care of the second problem by running a special script that performs a network connectivity test.
There are other things you might want to check for. For example, you could check whether the SSH daemon is running and properly accepting connections, or you could perform some site-specific checks. These ideas are not implemented in this article. If you enhance my procedure, please notify me of the results, because at some future time I might create a Debian package dedicated to setting things up for more reliable remote reboots.
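As an illustration of the SSH check mentioned above, here is a minimal sketch, not part of the original procedure. It assumes a netcat binary called nc that supports -w (timeout in seconds), and that the server sends its "SSH-..." identification string as the first line, as OpenSSH does. The function names is_ssh_banner and ssh_accepting are made up for this example, and netcat variants differ in how they behave once stdin is closed, so test it on your own system first.

```shell
#!/bin/sh
# Sketch only: check that something on host:port answers like an SSH server.

# Return 0 if the line in $1 looks like an SSH identification string.
is_ssh_banner() {
    case "$1" in
        SSH-*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Return 0 if the first line received from host $1 (default localhost),
# port $2 (default 22) is an SSH banner.
# Assumption: "nc -w 5" gives up after 5 seconds of inactivity.
ssh_accepting() {
    host=${1:-localhost}
    port=${2:-22}
    banner=$(nc -w 5 "$host" "$port" </dev/null 2>/dev/null | head -n 1)
    is_ssh_banner "$banner"
}
```

A boot-time check could then call ssh_accepting localhost 22 and treat a non-zero return the same way as a failed network test.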
Here are the steps needed:
* Add a special bootloader entry (LILO example):

    image=/vmlinuz.new
      label=Newkernel
      append="panic=5 newkernel"
      read-only
      optional

  This creates a separate image entry for testing the new kernel. panic=5 makes the kernel reboot automatically 5 seconds after a panic, and newkernel is an arbitrary "newinit"-like name we chose, so that we can later check whether we are in the test phase or in a normal run.
* Put the testnet script in /etc/init.d/ and activate it:

    update-rc.d testnet start 40 S .

  This installs a script that tests network connectivity after the reboot. It handles the case where the kernel does not panic (it at least mounts the root filesystem) but then fails to bring up the network properly, for example because of an incorrect kernel driver module setup. The script can be found at my website, or locally here.
* If you installed the kernel image from the .deb package, make sure the /vmlinuz link still points to the old kernel, and /vmlinuz.new to the new one.
* Also make sure there's an account left open for the colo facility personnel to access the system if you mess it up. Use adduser to create the account "support", then add it to the sudoers file without a password:

    echo "support ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

* Run lilo to install the new lilo.conf (which we modified in step 1), then make the test kernel (the newkernel image) the default for the next boot only:

    lilo
    lilo -R newkernel

* Reboot and see how it plays out. If you get into the new kernel, adjust the /vmlinuz and /vmlinuz.old symlinks appropriately, re-run lilo, and reboot once again into the new image. It is now the default, so no lilo -R is needed this time.
* When the new system comes up the second time, disable the 'support' account.
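The author's testnet script is not reproduced in this article. As a rough idea of what such a script could do, here is a minimal sketch under stated assumptions: it is not the original script, the function names are invented for this example, and CHECK_HOST is a placeholder address (192.0.2.1 is a documentation address) that you would replace with your gateway or another host that must be reachable. The -W option to ping is the Linux iputils reply timeout.

```shell
#!/bin/sh
# Sketch of a testnet-style boot check. NOT the author's script.

# Hypothetical host to ping; replace with your gateway or similar.
CHECK_HOST=${CHECK_HOST:-192.0.2.1}

# Return 0 if the marker word "newkernel" is one of the
# space-separated words of the kernel command line given in $1.
in_test_phase() {
    for word in $1; do
        if [ "$word" = "newkernel" ]; then
            return 0
        fi
    done
    return 1
}

# Return 0 as soon as one ping to host $1 succeeds; give up after
# $2 attempts (default 5), sleeping 2 seconds between attempts.
network_ok() {
    host=$1
    tries=${2:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    return 1
}

# Entry point: do nothing on a normal boot; on a test boot,
# reboot the machine if the network check fails.
testnet_main() {
    in_test_phase "$(cat /proc/cmdline)" || return 0
    if ! network_ok "$CHECK_HOST"; then
        echo "testnet: no network connectivity, rebooting" >&2
        reboot
    fi
}
```

An init script built on this would call testnet_main from its start action. Because lilo -R applies to the next boot only, the reboot triggered here brings the machine back up on the default (old) kernel.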
The original location of this article is http://colt.projectgamma.com/debian/remote-reboot.html . It was written by Davor Ocelic (docelic+mail.inet.hr) to help with the maintenance of The Internet Hosting Cooperative ( http://www.hcoop.net ) machines. Improvements to the process are of course welcome; please send them to the author's email address above.