SWISH::Filters::Doc2txt - Perl extension for filtering MSWord documents with Swish-e
This is a plug-in module that uses the "catdoc" program to convert MS Word documents to text for indexing by Swish-e. "catdoc" can be downloaded from:
The program "catdoc" must be installed and in your PATH before running Swish-e.
This filter does not specify input or output character encodings. This will change in the future to allow use of the user_data to set the encoding.
A minor optimization during spidering (i.e. when documents are in memory instead of on disk) would be to use an open2() call to let catdoc read from stdin instead of from a file.
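
As a rough illustration of that idea, here is a minimal sketch using the core IPC::Open2 module. It is not what this filter currently does. The helper name filter_via_stdin is hypothetical, and the sketch assumes the converter reads the document from stdin when no file name is given, which should be checked against the installed catdoc. The input is written from a forked child so that the parent can read the output at the same time and the two pipes cannot block each other on large documents.

```perl
use strict;
use warnings;
use IPC::Open2;
use POSIX ();

# Hypothetical helper: pipe an in-memory document through an external
# converter and return whatever the converter prints on stdout.
#   $cmd is an array ref such as ['catdoc'] (assumed to read stdin)
#   $doc is the raw document content
sub filter_via_stdin {
    my ( $cmd, $doc ) = @_;

    # open2() dies if the command cannot be started.
    my $pid = open2( my $from_child, my $to_child, @$cmd );
    binmode $to_child;    # Word documents are binary data

    # Write the input from a separate process so that the parent can
    # drain the converter's output concurrently.
    my $writer = fork();
    die "fork failed: $!" unless defined $writer;
    if ( $writer == 0 ) {
        close $from_child;
        print {$to_child} $doc;
        close $to_child;      # flushes and signals end of input
        POSIX::_exit(0);      # leave without running END blocks
    }

    close $to_child;          # parent does not write
    my $text = do { local $/; <$from_child> };
    close $from_child;

    waitpid $writer, 0;
    waitpid $pid,    0;
    return $text;
}
```

A call might look like filter_via_stdin( ['catdoc'], $content ), where $content holds the document the spider already has in memory. Forking on every document has its own cost, so whether this beats writing a temporary file would need to be measured.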