good OCR tools under debian?
Posted by dkg on Fri 26 Jan 2007 at 18:39
I have never needed to do Optical Character Recognition (turning scanned documents back into text form), but it appears i may soon need to (in english, FWIW).

Does anyone have a preferred tool/suite that is packaged for debian?

A scan of the archive turns up

  • gocr
  • ocrad
  • unpaper
  • clara
none of which i've ever used, and some of which seem stale (clara's version number is 20031214-2). Suggestions? Things to avoid? Have i missed something important?


Comments on this Entry

Re: good OCR tools under debian?
Posted by redbeard (216.49.xx.xx) on Tue 6 Feb 2007 at 11:57
I haven't used it, but tesseract-ocr (currently only in etch) is supposed to be a commercial-quality OCR package that far outperforms any other open source package currently available.

Apparently, Google released it after acquiring it from UNLV, which acquired it from HP. The reason Google got it is that the original developer was working at Google at the time. Check out this article on it for more info.

Good luck.

