Good ways to detect and restart failed daemons
Posted by endecotp on Sat 8 Mar 2008 at 23:45
Does anyone have any hints for automatically detecting and restarting daemons that have failed for some reason? I have a couple of concerns: my own code that could just crash, and "innocent" daemons that are killed by the OOM killer when memory is low (because my own code got into a loop eating memory...). In both cases these are things that are started by /etc/init.d scripts and there are probably /var/run/*.pid files for them.

I could easily knock together a cron script that would use the pid files to check for daemons that have gone away, and restart them. Of course automatic restart would not always be appropriate, but I'm thinking about the best thing to do when I'm on holiday and human intervention could be weeks away! I was wondering if there is any existing utility that would do this, maybe even one tied in to start-stop-daemon, for example, or to the metadata at the start of the init.d files.

Any ideas anyone, before I roll my own?
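
For reference, the cron-driven check described above might be sketched roughly as follows in Python. This is only an illustration under stated assumptions: each daemon writes its PID as the first token of its pid file, the function names (read_pid, is_running, check_and_restart) are made up for this sketch, and the restart command is whatever the init script accepts.

```python
import os
import subprocess
from pathlib import Path


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if missing or malformed."""
    try:
        return int(Path(pid_file).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def is_running(pid):
    """Check whether a process with this PID exists, using signal 0."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def check_and_restart(services, restart=subprocess.call):
    """Restart every service whose pid file points at no live process.

    services maps a pid file path to a restart command (argv list).
    Returns the list of pid files whose service was restarted.
    """
    restarted = []
    for pid_file, command in services.items():
        if not is_running(read_pid(pid_file)):
            restart(command)
            restarted.append(pid_file)
    return restarted
```

A cron job could call check_and_restart with a mapping such as {"/var/run/mydaemon.pid": ["/etc/init.d/mydaemon", "restart"]}, where "mydaemon" is a hypothetical name. Note that a stale pid file can match an unrelated process if the PID has been reused, so a check like this cannot tell whether the right daemon is the one still running.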

 

Comments on this Entry

Re: Good ways to detect and restart failed daemons
Posted by mwr (24.158.xx.xx) on Sun 9 Mar 2008 at 00:56
For the specific task of restarting processes, monit. For the more general case of restarting processes plus other configuration tasks, puppet.

Re: Good ways to detect and restart failed daemons
Posted by Steve (82.32.xx.xx) on Sun 9 Mar 2008 at 12:03
Using monit is definitely the way forward.

Steve

Re: Good ways to detect and restart failed daemons
Posted by Anonymous (201.208.xx.xx) on Mon 10 Mar 2008 at 03:08
I'll try it. My clamav has been dying for a few months now and I don't know why.

Re: Good ways to detect and restart failed daemons
Posted by dkg (216.254.xx.xx) on Mon 10 Mar 2008 at 07:18
You could also use runit as a process supervision suite. It's a beautifully clean design and implementation, and the maintainer (Gerrit Pape) is reasonable, responsive and easy to work with.

Re: Good ways to detect and restart failed daemons
Posted by endecotp (86.6.xx.xx) on Mon 10 Mar 2008 at 11:10
Thanks for the monit suggestions. I wasn't aware of its ability to restart things. I'll look into it some more.

Re: Good ways to detect and restart failed daemons
Posted by drgraefy (128.59.xx.xx) on Thu 13 Mar 2008 at 20:10
I agree with dkg that the proper way to manage daemons is with runit. It handles daemon supervision from top to bottom, including writing to logs, and can automatically restart daemons if they die.
