The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

msdoc - module to replace MS document by its text contents

VERSION

Version 0.05

SYNOPSIS

optex command -Mmsdoc

NOTICE

There is more general successor version of this module. Use https://github.com/kaz-utashiro/optex-textconv.

DESCRIPTION

This module replaces argument which terminate with .docx, pptx or xlsx files by node representing its text information. File itself is not altered.

For example, you can check the text difference between MS word files like this:

    $ optex diff -Mmsdoc OLD.docx NEW.docx

If you have symbolic link named diff to optex, and following setting in your ~/.optex.d/diff.rc:

    option default --msdoc
    option --msdoc -Mmsdoc $<move>

Next command simply produces the same result.

    $ diff OLD.docx NEW.docx

Text data is extracted by greple command with -Mmsdoc module, and above command is almost equivalent to below bash command using process substitution.

    $ diff <(greple -Mmsdoc --dump OLD.docx) \
           <(greple -Mmsdoc --dump NEW.docx)

ENVIRONMENT

This version experimentally support other converter program. If the environment variable OPTEX_MSDOC_CONVERTER is set, it is used instead of greple. Choose one from greple, pandoc or tika.

SEE ALSO

https://github.com/kaz-utashiro/optex-msdoc

It is possible to use other data conversion program, like pandoc or "Apache Tika". Feel to free to modify this module. I'm reluctant to use them, because they work quite leisurely.

https://github.com/kaz-utashiro/optex-textconv

LICENSE

Copyright 2020 Kazumasa Utashiro.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Kazumasa Utashiro