
Providing a website search facility with the namazu indexer

Posted by Steve on Mon 21 Sep 2009 at 15:29

Adding a search facility to a website makes it much easier for visitors to find content. When a site is dynamically constructed it is often simple to update the application code to perform the search itself, but for sites built from static pages an indexer such as namazu can give you a great interface in a very short space of time.

The namazu2 package allows you to create an index of a local directory of content, and then search against that index. The package includes a handy CGI script which your site's users can use for that purpose. To get started you'll need to install the packages:

root@skx:~# aptitude update
root@skx:~# aptitude install namazu2 namazu2-index-tools

Indexing your content

Once you have the packages installed you'll need to create an index of your content. For this demonstration I've got a site located in the directory:

  • /home/www/blog.example.org/htdocs/

I'm going to create a new directory to store the index, and call that /index/ - so we'll run the indexer like this:

root@skx:~# mkdir /home/www/blog.example.org/index
root@skx:~#
root@skx:~# mknmz --output-dir /home/www/blog.example.org/index/  \
    /home/www/blog.example.org/htdocs/

This will very quickly perform the indexing (the next time you run this you'll find it skips content which hasn't changed since the last time you ran it), and create a number of files in the index/ directory:

root@skx:~# cd /home/www/blog.example.org/index/
root@skx:~# ls -l | wc -l
64
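
Because repeat runs skip unchanged content, the indexing step is cheap enough to run on a schedule. The following is a minimal sketch of a wrapper you might call from cron. The function name build_index and the MKNMZ override variable are my own illustrative choices, not part of namazu; the only namazu feature used is the --output-dir option shown above.

```shell
# build_index SITE_ROOT
# (Re)index SITE_ROOT/htdocs/ into SITE_ROOT/index/.
# MKNMZ is a hypothetical override: set it to "echo mknmz" to preview
# the command instead of running the indexer.
build_index() {
    site_root="$1"

    # Create the index directory on first use; harmless if it exists.
    mkdir -p "$site_root/index"

    # Left unquoted on purpose so that "echo mknmz" splits into two words.
    ${MKNMZ:-mknmz} --output-dir "$site_root/index/" "$site_root/htdocs/"
}
```

Calling build_index /home/www/blog.example.org would then repeat the indexing run shown above.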

Configuring the CGI Script

Now that you have your content indexed you can allow visitors to your website to actually use that index to search your site.

The search script is located at /usr/lib/cgi-bin/namazu.cgi, so you'll need to ensure that your site can execute it. Alternatively you can do what I do, which is to create a symlink for your site:

root@skx:~# mkdir /home/www/blog.example.org/cgi-bin
root@skx:~# ln -s /usr/lib/cgi-bin/namazu.cgi /home/www/blog.example.org/cgi-bin/namazu.cgi

Now that you have the CGI script available you need to configure it to use the index that is present. To do this you need to create a file .namazurc in the same directory as the script.

The most basic file would look like this:

##
## Index: Specify the directory where the indexes are located.
##
Index         /home/www/blog.example.org/index

## Replace: Replace TARGET with REPLACEMENT in URIs in search
## results.
Replace       /home/www/blog.example.org/htdocs/  http://blog.example.org/
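
If you look after several sites, the same two directives can be generated rather than typed by hand. This is only a sketch: the function name write_namazurc and its two arguments are assumptions of mine, and it presumes every site follows the htdocs/, index/ and cgi-bin/ layout used in this article.

```shell
# write_namazurc SITE_ROOT SITE_URL
# Create SITE_ROOT/cgi-bin/.namazurc with the two basic directives.
# Assumes the index lives in SITE_ROOT/index and the pages in SITE_ROOT/htdocs.
write_namazurc() {
    site_root="$1"
    site_url="$2"

    mkdir -p "$site_root/cgi-bin"

    # Overwrites any existing .namazurc for the site.
    cat > "$site_root/cgi-bin/.namazurc" <<EOF
## Index: Specify the directory where the indexes are located.
Index         $site_root/index

## Replace: Replace TARGET with REPLACEMENT in URIs in search results.
Replace       $site_root/htdocs/  $site_url
EOF
}
```

For the example site this would be run as write_namazurc /home/www/blog.example.org http://blog.example.org/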

With that done, you should be able to point your browser at the search URL and find that it works.

The results will be presented to the user via the stock templates, but these can be updated.

Customizing things

By default the search interface uses a number of template files from the location /usr/share/namazu/template/ but you can copy the files from that location somewhere else, edit them, and then point the CGI script at them.

To specify an alternative location edit the .namazurc file to include:

##
## Template: Set the template directory containing
## NMZ.{head,foot,body,tips,result} files.
##
#Template      /usr/share/namazu/index
Template       /home/www/blog.example.org/cgi-bin/template
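
Copying the stock templates and adding the directive can be scripted as well. The sketch below assumes the stock files are the NMZ.* files in /usr/share/namazu/template/ as described above; the function name install_templates and its optional second argument are hypothetical, and you would still edit the copied files by hand afterwards.

```shell
# install_templates SITE_ROOT [SOURCE_DIR]
# Copy the stock NMZ.* templates into SITE_ROOT/cgi-bin/template and
# make sure the site's .namazurc points at the copy.
install_templates() {
    site_root="$1"
    src="${2:-/usr/share/namazu/template}"
    dest="$site_root/cgi-bin/template"
    rc="$site_root/cgi-bin/.namazurc"

    mkdir -p "$dest"

    # Copy the template files, preserving modes and timestamps.
    cp -p "$src"/NMZ.* "$dest"/

    # Append a Template directive only if the file does not have one yet,
    # so running this twice does not duplicate the line.
    if ! grep -q '^Template' "$rc" 2>/dev/null; then
        printf 'Template       %s\n' "$dest" >> "$rc"
    fi
}
```

For the example site: install_templates /home/www/blog.example.org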

Had you not installed your own .namazurc file, the global one in /etc/namazu/namazurc would have been read. That file contains the default settings, and is worth reading as a good example.

With the namazu2 package, constructing a website search is a quick and painless operation, and by using a per-domain configuration file you can use the same script and templates across any number of sites.

 

 


Re: Providing a website search facility with the namazu indexer
Posted by ajt (195.112.xx.xx) on Wed 23 Sep 2009 at 20:25

If you install the ikiwiki package as your wiki, Debian suggests the Xapian search engine. They seem to work very well together, though I've not got round to using it on a traditional web site yet. Did you look at this or others when you were looking for a search engine?

--
"It's Not Magic, It's Work"
Adam


Re: Providing a website search facility with the namazu indexer
Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Thu 24 Sep 2009 at 04:18

No, I didn't look at the alternatives. This was the first package that "apt-cache search .. " led me to, and it worked nicely.

The poorly disguised example was based upon my blog search, and I was impressed at how well it worked.

(Not to mention the regexp search it allows. e.g. /CVE-200\d-\d/.)

Steve


Re: Providing a website search facility with the namazu indexer
Posted by ajt (195.112.xx.xx) on Thu 24 Sep 2009 at 20:38

It installed okay and seems to work. I had to fiddle with an Apache redirect rule, as it indexed my raw blosxom files, which are not HTML files and are not designed to be viewed outside of the blosxom engine itself. Otherwise it seems to be okay. I'm just tinkering with the templates at the moment to make the whole thing look like it belongs.

--
"It's Not Magic, It's Work"
Adam


Re: Providing a website search facility with the namazu indexer
Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Thu 24 Sep 2009 at 20:53

If it's indexing things it shouldn't, take a look at --exclude. For example I have:

--exclude='(/archive|/tags|/stats|\.txt$|\.inc$)'

Steve


Re: Providing a website search facility with the namazu indexer
Posted by ajt (195.112.xx.xx) on Thu 24 Sep 2009 at 21:08

Ta. However, in this instance I want it to index the blog entries; I just don't want it to point at the raw files, but rather at the dynamically generated content. I could turn on Blosxom's static generation but I can't be bothered. A simple Apache RedirectMatch seemed to work fine.

--
"It's Not Magic, It's Work"
Adam


Last link does not point to packages directly.
Posted by PaulePanter (78.53.xx.xx) on Wed 30 Sep 2009 at 10:51

The last link does not point to the namazu2 package page, but just to http://packages.debian.org/.


Re: Last link does not point to packages directly.
Posted by Steve (2001:0xx:0xx:0xxx:0xxx:0xxx:xx) on Wed 30 Sep 2009 at 10:59

Thanks - I've updated it now.

Steve
