
Searching mail with mboxgrep

Posted by Steve on Fri 25 Mar 2005 at 13:25

I've always kept my email, ever since I first started receiving it. Every message addressed to me, except spam and viral mail, is archived: filed away safe and sound in case I need to look at it again.

Barring mailing list messages, which I keep only if they particularly interest me or for as long as I'm participating in a particular thread, I've kept every mail addressed to me.

The benefits of keeping mail are probably obvious. Amongst other things, I can keep track of people, interests, purchases and more.

The biggest drawback, other than the disk space needed to hold the archives, is that searching becomes more difficult as your mail archive grows.

Thankfully there are solutions for searching mail, some technical and some a matter of organisation.

Whenever I start receiving mail from a new company, for example, I save it in a new mailbox. So I have a box for Amazon called Amazon.com, another for Openstuff (openstuff.net), yet another for Slashdot (slashdot.org), and so on.

I've found naming mailboxes after the domain the mail comes from very useful: it makes it obvious that the mail is from a company, and it keeps all of that company's mail together cleanly.
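If your incoming mail passes through a sorting script, this naming scheme is easy to automate. The Python sketch below illustrates the idea under stated assumptions. It is not the setup described above. The helper `mailbox_name_for` is hypothetical. It lower-cases the domain part of the From: address, so it would produce amazon.com rather than Amazon.com, and it makes no attempt to strip subdomains.

```python
from email import message_from_string
from email.utils import parseaddr


def mailbox_name_for(from_header: str, default: str = "unsorted") -> str:
    """Hypothetical helper: name a mailbox after the sender's domain."""
    # parseaddr("Amazon.com <orders@amazon.com>") -> ("Amazon.com", "orders@amazon.com")
    _, address = parseaddr(from_header)
    if "@" not in address:
        # No usable address: fall back to a catch-all box.
        return default
    # Everything after the last "@" is the domain; normalise its case.
    return address.rsplit("@", 1)[1].lower()


if __name__ == "__main__":
    msg = message_from_string(
        "From: Slashdot <daemon@slashdot.org>\nSubject: Welcome\n\nHello\n"
    )
    print(mailbox_name_for(msg["From"]))  # slashdot.org
```

A real filter would then deliver the message into the box with that name. That part depends entirely on how your mail is delivered.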

Beyond that simple level of organisation, I can take advantage of the searching facilities of my mail client.

I use Mutt as my mail client. It's a console-based application with a very powerful search facility built in.

Mutt's searching is mostly done via the l command, which stands for "limit". If you have a mailbox open and you're at the index, you can press l to limit the view of the box to messages which match a particular pattern.

For example, if you wish to view only messages whose subject contains the word "job", you can use:

~s job

This pattern uses "~s" to match the Subject of a message. Other modifiers are available; for example, to match the word "job" in the body of a message you'd use:

~b job
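To make the difference between ~s and ~b concrete, here is a rough Python approximation of what each modifier selects. It is a sketch of the idea, not how Mutt works internally.

- The functions `limit` and `_compile` are hypothetical.
- Only single-part plain-text messages are handled.
- It copies the rule described in Mutt's manual: a pattern is matched case-insensitively unless it contains an upper-case letter.

```python
import re
from email import message_from_string
from email.message import Message


def _compile(pattern: str) -> re.Pattern:
    # Mutt-style rule: an all-lowercase pattern matches case-insensitively.
    flags = 0 if any(c.isupper() for c in pattern) else re.IGNORECASE
    return re.compile(pattern, flags)


def limit(messages: list[Message], modifier: str, pattern: str) -> list[Message]:
    """Keep messages whose Subject (~s) or body (~b) matches pattern."""
    regex = _compile(pattern)
    kept = []
    for msg in messages:
        if modifier == "~s":
            text = msg.get("Subject", "")
        elif modifier == "~b":
            # Sketch only: assumes a single-part, plain-text body.
            text = msg.get_payload()
        else:
            raise ValueError(f"unsupported modifier: {modifier}")
        if regex.search(text):
            kept.append(msg)
    return kept


if __name__ == "__main__":
    inbox = [
        message_from_string("Subject: Job offer\n\nAre you free on Monday?\n"),
        message_from_string("Subject: Lunch\n\nI heard about the new job.\n"),
    ]
    print([m["Subject"] for m in limit(inbox, "~s", "job")])  # ['Job offer']
    print([m["Subject"] for m in limit(inbox, "~b", "job")])  # ['Lunch']
```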

All the limiting patterns for Mutt are explained in its manual, which is worth a read if you use it.

If you use a different email client, it probably has searching facilities built in which you can explore.

However, if your client doesn't have a very flexible way of searching your mailboxes, all is not lost. There is a perfectly usable stand-alone tool which lets you search your mail flexibly: mboxgrep.

As its name suggests, mboxgrep is a package which allows you to search mailboxes with grep-like facilities.

(If you're not familiar with grep, I highly suggest you have a look at it via your favourite search engine or its manpage. I briefly discussed regular expressions when writing about ngrep.)

Unix mail can be stored in many different formats. The oldest was simply to append messages together in a single file; this is the classic mbox format. (I'm simplifying here!)
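To see what "appending messages to a single file" looks like, this sketch uses Python's standard `mailbox` module, not mboxgrep, to write two messages into a temporary mbox file. The function name `build_mbox` and the example addresses are made up for illustration. Each message in the resulting file begins with a "From " separator line. A body line that itself starts with "From " is escaped as ">From " so it isn't mistaken for a new message; this is one of the details glossed over above.

```python
import mailbox
import os
import tempfile
from email import message_from_string


def build_mbox(path: str) -> None:
    """Append two example messages to a single mbox file at path."""
    box = mailbox.mbox(path)  # creates the file if it doesn't exist
    box.lock()
    try:
        box.add(message_from_string(
            "From: alice@example.com\nSubject: First\n\nHello.\n"))
        # This body line starts with "From " and will be written as ">From ".
        box.add(message_from_string(
            "From: bob@example.com\nSubject: Second\n\nFrom here on, it's mbox.\n"))
        box.flush()
    finally:
        box.unlock()
        box.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "example.mbox")
    build_mbox(path)
    with open(path) as fh:
        print(fh.read())  # two "From MAILER-DAEMON <date>" separators
```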

Another method of storing messages is to place each individual message in its own file, inside a so-called Maildir directory.
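The Maildir equivalent can be sketched the same way, again with Python's standard `mailbox` module standing in for a real mail delivery agent; `build_maildir` is a hypothetical name. A Maildir is a directory containing tmp, new and cur subdirectories. Each newly delivered message becomes a separate file under new.

```python
import mailbox
import os
import tempfile
from email import message_from_string


def build_maildir(path: str) -> list[str]:
    """Store two messages in a Maildir and return the file names created."""
    box = mailbox.Maildir(path, create=True)  # makes path/{cur,new,tmp}
    for n in (1, 2):
        box.add(message_from_string(f"Subject: Message {n}\n\nBody {n}\n"))
    # Every message lives in its own uniquely named file under new/.
    return sorted(os.listdir(os.path.join(path, "new")))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "Maildir")
    print(build_maildir(path))       # two generated file names
    print(sorted(os.listdir(path)))  # ['cur', 'new', 'tmp']
```

The one-file-per-message layout is what lets several programs deliver into the same Maildir without the locking an mbox file needs.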

There are yet more formats for archiving mail, but these two, mbox and Maildir, are probably the most widely used. Happily, mboxgrep can work with either.

The basic usage is:

skx@mystery:~$ mboxgrep [flags] pattern Mailbox

Here pattern is a regular expression, and Mailbox is the file or directory you wish to search.
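As a rough illustration of "pattern plus file or directory", here is a small Python sketch using the standard `mailbox` and `re` modules. It is not mboxgrep's implementation.

- The names `open_mailbox` and `grep_mailbox` are hypothetical.
- The format rule is an assumption for illustration: a directory with cur, new and tmp subdirectories is treated as a Maildir, and anything else as an mbox file. This does not describe how mboxgrep itself decides.
- The pattern is compiled with re.MULTILINE, so ^ anchors at the start of any line in the message.

```python
import mailbox
import os
import re
import tempfile
from email import message_from_string


def open_mailbox(path: str) -> mailbox.Mailbox:
    """Open path as a Maildir if it has cur/new/tmp subdirectories, else as mbox."""
    if all(os.path.isdir(os.path.join(path, sub)) for sub in ("cur", "new", "tmp")):
        return mailbox.Maildir(path, create=False)
    return mailbox.mbox(path, create=False)


def grep_mailbox(pattern: str, path: str) -> list[str]:
    """Return the text of every message in the mailbox at path matching pattern."""
    regex = re.compile(pattern, re.MULTILINE)  # ^ and $ match at each line
    box = open_mailbox(path)
    matches = []
    try:
        for msg in box:
            text = msg.as_string()
            if regex.search(text):
                matches.append(text)
    finally:
        box.close()
    return matches


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    inbox = mailbox.mbox(os.path.join(root, "inbox"))         # a single file
    inbox.add(message_from_string("Subject: hello\n\nfoo\n"))
    inbox.close()
    archive = mailbox.Maildir(os.path.join(root, "archive"))  # a directory
    archive.add(message_from_string("Subject: old\n\nfoo bar\n"))
    for name in ("inbox", "archive"):
        print(name, len(grep_mailbox("foo", os.path.join(root, name))))
```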

Assuming for the moment that your mail is stored beneath ~/Mail in the mbox format, you could search for any message whose Subject contains the word Kemp as follows:

mboxgrep '^Subject:.*Kemp' ~/Mail/*

This prints every matching message to the console, where you can examine it or redirect the output to a file.
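The same search can be sketched in Python if you'd rather collect the matches straight into a new mbox file than redirect output. This is an illustration with the standard `mailbox`, `re` and `glob` modules, not mboxgrep itself.

- `save_matches` is a hypothetical name.
- A temporary directory stands in for ~/Mail, and the input files are assumed to be mbox.
- Because the whole message text is searched, ^Subject: could also match a body line that happens to start with "Subject:". That is the same caveat as with a plain grep.

```python
import glob
import mailbox
import os
import re
import tempfile
from email import message_from_string


def save_matches(pattern: str, paths: list[str], out_path: str) -> int:
    """Copy messages matching pattern from the mbox files in paths into out_path."""
    regex = re.compile(pattern, re.MULTILINE)
    out = mailbox.mbox(out_path)
    count = 0
    try:
        for path in paths:
            box = mailbox.mbox(path, create=False)
            for msg in box:
                if regex.search(msg.as_string()):
                    out.add(msg)  # keeps the message's original "From " line
                    count += 1
            box.close()
        out.flush()
    finally:
        out.close()
    return count


if __name__ == "__main__":
    mail_dir = tempfile.mkdtemp()  # stands in for ~/Mail
    for name, subject in (("family", "Re: Kemp family photos"), ("work", "Meeting")):
        box = mailbox.mbox(os.path.join(mail_dir, name))
        box.add(message_from_string(f"Subject: {subject}\n\nText\n"))
        box.close()
    results = os.path.join(tempfile.mkdtemp(), "kemp.mbox")
    paths = sorted(glob.glob(os.path.join(mail_dir, "*")))  # like ~/Mail/*
    print(save_matches(r"^Subject:.*Kemp", paths, results))  # 1
```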

There are several flags you can use; some useful ones are:

-r, --recursive      Descend into directories recursively.
-i, --ignore-case    Ignore case distinctions in matches.
-H, --headers        Only search message headers.
-B, --body           Only search message bodies.
-m, --mailbox-format Allows you to specify the mailbox format.
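The effect of -i, -H and -B can be approximated in Python as below. `match_message` is a hypothetical helper, not mboxgrep's code.

- -i becomes re.IGNORECASE.
- -H searches only the header block.
- -B searches only the text after the first blank line, which ends the headers.
- Treating -H and -B as mutually exclusive is a choice made for this sketch.
- -r and -m are left out to keep it short.

```python
import re
from email import message_from_string
from email.message import Message


def match_message(msg: Message, pattern: str, ignore_case: bool = False,
                  headers_only: bool = False, body_only: bool = False) -> bool:
    """Approximate mboxgrep's -i, -H and -B flags for a single message."""
    if headers_only and body_only:
        # Sketch choice: treat the header-only and body-only modes as exclusive.
        raise ValueError("choose at most one of headers_only and body_only")
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    text = msg.as_string()
    # The headers end at the first blank line; everything after it is the body.
    head, _, body = text.partition("\n\n")
    if headers_only:
        text = head
    elif body_only:
        text = body
    return re.search(pattern, text, flags) is not None


if __name__ == "__main__":
    msg = message_from_string(
        "From: shop@example.com\nSubject: Invoice\n\nPlease find the invoice attached.\n"
    )
    print(match_message(msg, "invoice", headers_only=True))                    # False
    print(match_message(msg, "invoice", headers_only=True, ignore_case=True))  # True
```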

All these options are documented in the manpage, which you can read by installing the package and typing:

man mboxgrep

Between this tool and my mail client's built-in tools, I've usually been able to find mail I've received without too much effort.

There are other, more complex solutions available, such as running a scanner over your mail and searching it via a local search engine, but I've not had to investigate these further.