
Splitting updatedb into daily and weekly

Posted by mcortese on Thu 20 Apr 2006 at 09:41

We all appreciate the locate command when we are in such a hurry that we cannot afford a full, inelegant find. What we like a little less, though, is the updatedb script eating up all our disk bandwidth at each boot, summoned by anacron.

Of course, this is only the case if you are running a "desktop" machine: since you turn it on only when you need to do some work, you long for a way to shorten the period of reduced usability forced by updatedb.

Conversely, if you run a server that never goes down, and you have successfully scheduled your updatedb runs late at night, then this article is not for you.

Two speeds

In any normal installation, there are directories that change more often than others. This reflects the traditional split between programs and data.

The /usr directory has a static nature: the files in it are not meant to be changed by normal users, and even root does not update its contents very often. On some installations, /usr is even mounted read-only or served by a remote host via NFS. A common scheme is to access /usr in read/write mode only when doing a software upgrade (e.g. via apt-get).

The /home and /var directories, on the other hand, contain data that changes continuously because of user and system activity.

So, it would be a good idea to have two databases for locate: one updated daily with the contents of the dynamic (and often small) directories, the other updated weekly with the contents of the static (and usually big) directories like /usr.

For the quick-changing database, I chose to keep the standard location /var/cache/locate/locatedb. For the rarely-modified one, a good choice could be /var/cache/locate/locatedb.usr.
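To check how static or dynamic a directory really is on your own machine, you can count the files modified recently. The following is only a minimal sketch: count_recent is a hypothetical helper name (not part of findutils), and the 7-day default is an arbitrary choice that matches the weekly schedule.

```shell
#!/bin/sh
# count_recent DIR [DAYS]: hypothetical helper, not part of findutils.
# Prints the number of regular files under DIR modified in the last DAYS days.
count_recent() {
    dir=$1
    days=${2:-7}
    # -xdev keeps find on a single filesystem; errors from unreadable
    # directories are discarded. File names containing newlines would be
    # counted more than once, which is acceptable for a rough estimate.
    find "$dir" -xdev -type f -mtime "-$days" 2>/dev/null | wc -l | tr -d ' '
}
```

For instance, comparing `count_recent /usr 7` with `count_recent /var 7` gives a rough idea of whether the split proposed here makes sense on your installation.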

Two cron scripts

The first thing to do is to duplicate the cron script that updates the locate database, so that one copy is run daily and the other weekly:

# cp /etc/cron.daily/find /etc/cron.weekly/find

The daily script must be modified to ignore the /usr path. So edit /etc/cron.daily/find, adding the following lines just after the part that sources the configuration file, but before the call to updatedb:

### Skip big discs rarely updated:
PRUNEPATHS="$PRUNEPATHS /usr"
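If you would rather make this edit idempotent (for instance because /usr might one day be listed in the configuration file as well), the append can be guarded. This is just a sketch: add_prune is a hypothetical helper name, and the initial PRUNEPATHS value below merely stands in for whatever your configuration file defines.

```shell
#!/bin/sh
# add_prune PATH: hypothetical helper that appends PATH to PRUNEPATHS
# only if it is not already listed there.
add_prune() {
    case " $PRUNEPATHS " in
        *" $1 "*) ;;                                   # already present: do nothing
        *) PRUNEPATHS="${PRUNEPATHS:+$PRUNEPATHS }$1" ;;
    esac
}

# Stand-in for the value normally read from the configuration file (assumption).
PRUNEPATHS="/tmp /var/spool"
add_prune /usr
add_prune /usr          # the second call changes nothing
echo "$PRUNEPATHS"      # prints: /tmp /var/spool /usr
```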

The weekly script needs to be changed as well. Edit /etc/cron.weekly/find at the line that invokes updatedb and modify it so that it reads:

  ### Search only /usr, since the rest is done daily:
  ARGS="--output=/var/cache/locate/locatedb.usr --localpaths=/usr"
  cd / && nice -n ${NICE:-10} updatedb $ARGS 2>/dev/null
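Since the weekly script discards updatedb errors with 2>/dev/null, a failing run could go unnoticed for a long time. A small check of the database age can help here. This is a sketch under a few assumptions: db_age_days is a hypothetical helper name, it relies on GNU stat (the -c %Y option) and on date +%s, and the 8-day threshold is arbitrary.

```shell
#!/bin/sh
# db_age_days FILE: hypothetical helper. Prints the age of FILE in whole
# days, or returns non-zero if FILE does not exist.
# Assumes GNU stat (-c %Y) and a date command that supports +%s.
db_age_days() {
    [ -f "$1" ] || return 1
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $(( (now - mtime) / 86400 ))
}

# Warn if the weekly database is missing or older than 8 days (arbitrary limit).
db=/var/cache/locate/locatedb.usr
if age=$(db_age_days "$db"); then
    if [ "$age" -gt 8 ]; then
        echo "warning: $db is $age days old" >&2
    fi
else
    echo "warning: $db not found" >&2
fi
```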

One command for two databases

The final step is to tell locate that it has to fetch its data from two files, not just one. This is done by listing the two filenames, separated by a colon, in the LOCATE_PATH environment variable. The variable must be exported, otherwise locate will not see it:

$ export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr

I suggest you make this setting the default for every user by adding the following lines to /etc/bash.bashrc:

### Locate the daily and weekly databases, if not defined yet:
if [ -z "$LOCATE_PATH" ]; then
  export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
fi
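A slightly more defensive variant lists only the databases that actually exist, which may be handy before the weekly script has run for the first time. Take it as a sketch: build_locate_path is a hypothetical helper name, not something provided by findutils or bash.

```shell
#!/bin/sh
# build_locate_path FILE...: hypothetical helper that joins the readable
# files among its arguments into a colon-separated list.
build_locate_path() {
    result=
    for db in "$@"; do
        if [ -r "$db" ]; then
            result="${result:+$result:}$db"
        fi
    done
    echo "$result"
}

# Same logic as above, but missing databases are left out of the list.
if [ -z "$LOCATE_PATH" ]; then
    dbs=$(build_locate_path \
        /var/cache/locate/locatedb \
        /var/cache/locate/locatedb.usr)
    if [ -n "$dbs" ]; then
        export LOCATE_PATH="$dbs"
    fi
fi
```

Note that the list is computed when the shell starts, so a database created later is picked up only by new shells.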

To test your setup, manually run the daily and weekly scripts, and then try to run locate with a filename present both inside and outside /usr:

# /etc/cron.weekly/find
# /etc/cron.daily/find
# export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
# locate dmesg

You should find both the dmesg log file in /var and the man page in /usr.
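If the output is long, a short summary can confirm that both databases contribute results. The sketch below assumes nothing about locate itself: split_summary is a hypothetical helper that simply reads paths on standard input, and the paths in the demonstration are made up for illustration.

```shell
#!/bin/sh
# split_summary: hypothetical helper. Reads one path per line on stdin and
# reports how many lie under /usr and how many elsewhere.
split_summary() {
    awk '
        /^\/usr\// { usr++; next }
        { other++ }
        END { printf "usr=%d other=%d\n", usr, other }
    '
}

# Demonstration with made-up paths:
printf '%s\n' /var/log/dmesg /usr/share/man/man1/dmesg.1.gz /bin/dmesg | split_summary
# prints: usr=1 other=2
```

On a real system you would run `locate dmesg | split_summary`; a count of zero on either side suggests that one of the two databases is missing or not listed in LOCATE_PATH.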

 

 


Re: Splitting updatedb into daily and weekly
Posted by Anonymous (62.147.xx.xx) on Fri 21 Apr 2006 at 08:53
Hello, this article is very helpful, but I run Ubuntu Breezy Badger, and this approach does not seem to work on that system.
There's no file called "find" in cron.daily, but there are slocate, find.noslocate and find.noslocate.dpkg-new...

These 3 files seem to run updatedb, but I'm not sure.
Can you explain what needs to be done on Ubuntu?
Thank you.


Re: Splitting updatedb into daily and weekly
Posted by mcortese (213.70.xx.xx) on Fri 21 Apr 2006 at 11:05

Not being familiar with Ubuntu, I cannot give you detailed instructions. I can only suggest:

  • in the daily version, you should add the directories you want to exclude to the shell variable PRUNEPATHS before updatedb is called;
  • in the weekly version, you should make sure that updatedb is invoked with the --output and --localpaths arguments, as shown in my article.

By the way, a trailing .dpkg-new indicates that during the last upgrade a new version of that file was available, but the system detected that you had manually changed the old file, so the new one was not installed with its regular name in order not to overwrite your changes.
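To find out which cron scripts actually call updatedb on a system like that, a simple grep over the cron directories is usually enough. This is only a sketch: scripts_calling is a hypothetical helper name, and the -w option (whole-word match) is assumed to be supported by your grep, as it is in GNU grep.

```shell
#!/bin/sh
# scripts_calling DIR COMMAND: hypothetical helper that lists the files in
# DIR mentioning COMMAND as a whole word.
scripts_calling() {
    dir=$1
    cmd=$2
    # -l prints only file names, -s hides errors about unreadable files,
    # -w matches whole words only.
    grep -lsw -- "$cmd" "$dir"/* 2>/dev/null
}

for d in /etc/cron.daily /etc/cron.weekly; do
    scripts_calling "$d" updatedb || echo "no script in $d mentions updatedb"
done
```

A match only shows that the script mentions updatedb, so the file still has to be read to locate the actual invocation.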


Re: Splitting updatedb into daily and weekly
Posted by Anonymous (68.149.xx.xx) on Fri 21 Apr 2006 at 20:30
This is very useful for laptop users! Thanks.




Re: Splitting updatedb into daily and weekly
Posted by Anonymous (193.57.xx.xx) on Wed 12 Dec 2007 at 08:07
Hello, I had been using this trick until recently, when locate started being shipped in a separate package from findutils.

I installed mlocate as replacement. The package control file advertises that "instead of re-reading all the contents of all directories each time the database is updated, mlocate keeps timestamp information in its database and can know if the contents of a directory changed without reading them again. This makes updates much faster and less demanding on the hard drive. This feature is only found in mlocate".

I am thus wondering whether it is still worth splitting updatedb into daily and weekly runs with mlocate. After a few tests, it seems that the database update is indeed very fast.
