

Just routine
Posted by jparrella on Wed 9 Nov 2005 at 16:41
Tags: none.

Hey there, everybody. Yesterday I had to handle a bunch of problems on our mail server. Just for the record, we're running Sarge with the 2.6.8-1-686-smp kernel from Debian and OpenWebMail 2.41-9 under Apache 1.3.33-6-sarge1, and so on. No strange packages installed. This machine had 250 days of uptime.

Suddenly, one of the users (specifically, The Boss) had trouble with OpenWebMail: right after he logged in, he got a Server Error message. He could, however, download and send mail using POP/SMTP.

So, checking the logs, I found that:
gdbm fatal: read error
[Wed Nov 9 08:07:18 2005] [error] [client 172.16.0.1] Premature end of script headers: /usr/lib/cgi-bin/openwebmail/openwebmail-main.pl

So, the trouble was in libgdbm3, apparently. I checked the bugs filed against this package and found that several people hit this error frequently while using updatedb and man. I added my case to the report, since the library's own code seems to be at fault.

I tried Steve Langasek's patch, but this didn't work. I tried changing my kernel to 2.6.8-2-686, but this didn't work either. So I suspected file corruption, fscked the /home partition (which is where OpenWebMail stores its data) and rebooted with the new kernel.

Stop here and take into account that I work in a government office which opens from 0800 to 1630, and I usually work remotely. The fsck against /dev/md0 (my RAID0 array for incoming mail -> /var/spool/mail) somehow failed and the system hung. Woo-hoo, wait until tomorrow.

So this morning I rebooted and had to re-create the RAID0 using mdadm:
mdadm -C /dev/md0 --level=raid0 --raid-devices=2 /dev/sda7 /dev/sdb1

After this, I remounted. This solved the new problem, but not the original one. So I decided to do it the hard way. I copied /var/spool/mail/user and /home/user, deleted the account with deluser, added it again and copied all his files back. Now it's working ok.
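The "hard way" above can be sketched roughly as follows. This is only an illustration, not the exact commands that were run: the `run` and `recreate_user` helpers, the `DRY_RUN` switch and the backup path are hypothetical names, and the final ownership reset assumes the usual Debian layout where spool files belong to the user and the `mail` group. Note that `adduser` is interactive and the password has to be set again.

```shell
#!/bin/sh
# Sketch only: back up a user's mail spool and home directory, recreate the
# account, then restore the files. Helper names and paths are assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recreate_user() {
    user="$1"
    backup="${2:-/root/backup-$user}"   # hypothetical backup location

    # 1. Copy the spool file and the home directory somewhere safe.
    run mkdir -p "$backup" || return 1
    run cp -a "/var/spool/mail/$user" "$backup/mailspool" || return 1
    run cp -a "/home/$user" "$backup/home" || return 1

    # 2. Remove and recreate the account (adduser will prompt).
    run deluser "$user" || return 1
    run adduser "$user" || return 1

    # 3. Restore the files and reset ownership, since the UID may differ.
    run cp -a "$backup/home/." "/home/$user/" || return 1
    run cp -a "$backup/mailspool" "/var/spool/mail/$user" || return 1
    run chown -R "$user:$user" "/home/$user" || return 1
    run chown "$user:mail" "/var/spool/mail/$user" || return 1
}
```

Running it first as `DRY_RUN=1; recreate_user someuser` prints the commands without touching anything, which is worth doing before trying it as root on a live mail server.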

Conclusion: libgdbm will definitely fail again. This is not solved. This is just the beginning.

 

Comments on this Entry

Re: Just routine
Posted by simonw (84.45.xx.xx) on Wed 9 Nov 2005 at 17:52
I'm becoming a cynic of the whole file dbm database thing.

It seems natural that a hashed lookup should be such a simple database that it doesn't need a full-blown relational database, but I've had several experiences of corruption with various of these databases over the last few years (not all gdbm; in fact gdbm is probably one of the better code bases, and has some understandable tools for sorting things out).

Where possible I now migrate data to Postgres instead, since most of the boxes have Postgres on for administrative data, and having it all in one place is great for integrating stuff.

Unless the maps are a build-and-forget type of affair like many Postfix maps, which just recreate the entire database from a text file each time, in which case I'm reasonably happy. Indeed, if I have all the data in a nice simple text file to recreate the db from, I'm reasonably happy anyway.

Last database of this type I added was the Postgrey greylist database, but that one can always be "rm'ed" if I find it has become corrupt.
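A minimal sketch of the rebuild-from-text idea described in this comment, assuming the text file is the authoritative copy. The `rebuild_map` helper is a hypothetical name; the rebuild command itself is passed in, so nothing here depends on a particular dbm tool.

```shell
#!/bin/sh
# Sketch only: throw away a possibly corrupt lookup database and regenerate
# it from its plain-text source. rebuild_map is a hypothetical helper.
#
#   $1      plain-text source file
#   $2      database file derived from it
#   $3...   command that regenerates the database
rebuild_map() {
    src="$1"
    db="$2"
    shift 2

    # Refuse to delete anything if the source is not readable.
    [ -r "$src" ] || { echo "missing source: $src" >&2; return 1; }

    rm -f "$db" || return 1
    "$@"
}
```

For a Postfix hash map this could be called as `rebuild_map /etc/postfix/virtual /etc/postfix/virtual.db postmap hash:/etc/postfix/virtual`, where the map path is only an example.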


Re: Just routine
Posted by Anonymous (142.179.xx.xx) on Tue 14 Nov 2006 at 00:03
I'm a system admin with SD#57 in BC, Canada and run my own web email
host server at home for several companies.
I had this same issue but found there to be a problem with the
/var/mail spool files.
So I changed the group back to "mail" and set the permissions to 660
recursively (-R), and all was well again. Hope this helps.
Openwebmail 2.52-1

Ben Stewart = Technical Analyst ( LSCE, CIRA register )
551 - Reta Ave.
Willow River B.C. Canada V2J 3C0
email: benny@clanstewart.ca
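The permission fix from this comment might look roughly like the sketch below. `fix_spool_perms` is a hypothetical helper, and the defaults (`/var/mail`, group `mail`) are taken from the comment. It deliberately touches only regular files, because a recursive `chmod 660` on the directory itself would remove its execute bit and make the spool untraversable.

```shell
#!/bin/sh
# Sketch only: reset group and mode on every spool file in a directory.
# fix_spool_perms is a hypothetical helper; defaults follow the comment above.
fix_spool_perms() {
    spool="${1:-/var/mail}"
    group="${2:-mail}"

    for f in "$spool"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        chgrp "$group" "$f" || return 1
        chmod 660 "$f" || return 1
    done
}
```

Run as root with no arguments, it would apply group `mail` and mode 660 to each file directly under `/var/mail`.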
