NAME

Text::Amuse::Preprocessor - Helpers for Text::Amuse document formatting.

VERSION

Version 0.64

SYNOPSIS

  use Text::Amuse::Preprocessor;
  my $pp = Text::Amuse::Preprocessor->new(
                                          input => $infile,
                                          output => $outfile,
                                          html           => 1,
                                          fix_links      => 1,
                                          fix_typography => 1,
                                          fix_nbsp       => 1,
                                          fix_footnotes  => 1
                                         );
  $pp->process;

DESCRIPTION

This module applies a set of common fixes to muse files.

With no options other than input and output (which are mandatory), the module only removes carriage returns, replaces character ligatures and other characters which should not appear in the text at all, and expands each tab to 4 spaces (no smart expanding).
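
To make the default behaviour concrete, here is a rough sketch in plain Perl of the three fixes listed above. It is an illustration only, not the module's actual code: the helper name basic_cleanup is hypothetical, and the ligature table is a small assumed subset.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::Amuse::Preprocessor.
sub basic_cleanup {
    my ($text) = @_;
    $text =~ s/\r//g;         # remove carriage returns
    $text =~ s/\t/    /g;     # each tab becomes exactly 4 spaces (no smart expanding)
    # Assumed, illustrative subset of ligatures: U+FB01 (fi) and U+FB02 (fl)
    my %ligatures = ("\x{FB01}" => 'fi', "\x{FB02}" => 'fl');
    $text =~ s/([\x{FB01}\x{FB02}])/$ligatures{$1}/g;
    return $text;
}

my $cleaned = basic_cleanup("A \x{FB01}ne\tline\r\n");
print $cleaned;
```

The real module may handle more characters than this sketch does.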

LANGUAGE SUPPORT

The following languages are supported:

english

smart quotes, dashes, and the common superscripts (like 11th)

russian

smart quotes, dashes and non-breaking spaces

spanish

smart quotes and dashes

finnish

smart quotes and dashes

swedish

smart quotes and dashes

serbian

smart quotes and dashes

croatian

smart quotes and dashes

italian

smart quotes and dashes

macedonian

smart quotes and dashes

german

smart quotes and dashes

ACCESSORS

The following values are read-only and must be passed to the constructor.

Mandatory

input

Can be a string (with the input file path) or a reference to a scalar with the text to process.

output

Can be a string (with the output file path) or a reference to a scalar with the processed text.
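
As a minimal sketch of the scalar-reference form, the following processes a string in memory instead of using file paths. The sample text and variable names are made up for illustration; only the input and output options come from this documentation.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Made-up sample text containing a tab and a carriage return.
my $input  = "#title Example\n#lang en\n\nFirst\tline\r\nSecond line\n";
my $output = '';

my $pp = Text::Amuse::Preprocessor->new(
    input  => \$input,     # reference to a scalar holding the text to process
    output => \$output,    # reference to a scalar receiving the processed text
);
$pp->process or die "Preprocessing failed\n";

print $output;
```

With no other options set, only the default fixes described in DESCRIPTION should be applied.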

Optional

html

Before doing anything else, convert the HTML input into a muse file. Although it is possible, doing the HTML import and the fixing in the same pass is discouraged. Instead, create two objects: first do the HTML to muse conversion, save the result somewhere, add the headers, then reprocess it with the required fixes.

Notably, the output will have no header, so the language will not be detected.

Defaults to false.
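
The two-pass workflow recommended above could look roughly like this. It is a sketch under assumptions: the HTML snippet, the header lines and the variable names are invented for the example, and the fixes chosen for the second pass are arbitrary.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Pass 1: HTML to muse only. The HTML snippet is made up for illustration.
my $html = '<p>Hello <em>world</em>, this is "quoted" text.</p>';
my $body = '';
my $importer = Text::Amuse::Preprocessor->new(
    input  => \$html,
    output => \$body,
    html   => 1,
);
$importer->process or die "HTML import failed\n";

# Add the headers by hand, so the language can be detected in pass 2.
my $muse = "#title Imported page\n#lang en\n\n" . $body;

# Pass 2: apply the fixes to the muse text.
my $fixed = '';
my $fixer = Text::Amuse::Preprocessor->new(
    input          => \$muse,
    output         => \$fixed,
    fix_links      => 1,
    fix_typography => 1,
);
$fixer->process or die "Fixing failed\n";

binmode STDOUT, ':encoding(UTF-8)';   # smart quotes are non-ASCII characters
print $fixed;
```

Keeping the two passes separate gives you a chance to inspect the imported muse text and set the #lang header before any language-dependent fix runs.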

fix_links

Find the links and add the markup if needed. Defaults to false.

fix_typography

Apply the typographical fixes. Defaults to false. This adds the "smart quotes" feature.

remove_nbsp

Remove all the non-breaking spaces in the document, unconditionally. This option does not conflict with the following one. If both are provided, the non-breaking spaces are first removed, then reinserted.

fix_nbsp

Add non-breaking spaces where appropriate (whatever this means for the document's language).

show_nbsp

Make the non-breaking spaces visible and explicit as ~~ (available in Text::Amuse since version 0.94).

fix_footnotes

Rearrange the footnotes if needed. Defaults to false.

debug

Don't unlink the temporary files and be verbose.

METHODS

new(%options)

Constructor. Accepts the above options.

process

Process the input according to the options passed and write the result into output. Returns output on success, false otherwise.

html_to_muse

Can be called on the class. It invokes the html_to_muse function of Text::Amuse::Preprocessor::HTML on the argument and returns the converted chunk.
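
For a quick conversion of an HTML fragment, without building an object, a call on the class might look like this. The HTML snippet is invented for the example.

```perl
use strict;
use warnings;
use Text::Amuse::Preprocessor;

# Made-up HTML fragment.
my $html = '<p>Some <em>emphasized</em> text.</p>';

# Class method call: no input/output options are involved here.
my $muse = Text::Amuse::Preprocessor->html_to_muse($html);

print $muse, "\n";
```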

error

This is set only when processing footnotes. See Text::Amuse::Preprocessor::Footnotes documentation for the hashref returned when an error has been detected.
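
A possible way to combine process and error when footnote fixing is enabled is sketched below. The sample document is made up, and the structure of the error hashref is deliberately not assumed: it is simply dumped with Data::Dumper.

```perl
use strict;
use warnings;
use Data::Dumper;
use Text::Amuse::Preprocessor;

# Made-up document with one footnote reference and its body.
my $input  = "#title Notes\n#lang en\n\nA claim. [1]\n\n[1] Its source.\n";
my $output = '';

my $pp = Text::Amuse::Preprocessor->new(
    input         => \$input,
    output        => \$output,
    fix_footnotes => 1,
);

my $report;
if ($pp->process) {
    $report = "Processing succeeded";
}
elsif (my $err = $pp->error) {
    # The keys are described in Text::Amuse::Preprocessor::Footnotes.
    local $Data::Dumper::Sortkeys = 1;
    $report = "Footnote problem detected:\n" . Dumper($err);
}
else {
    $report = "Processing failed without footnote error details";
}
print $report, "\n";
```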

tmpdir

Return the directory name used internally to hold the temporary files.

AUTHOR

Marco Pessotto, <melmothx at gmail.com>

BUGS

Please report any bugs or feature requests to the author's email. If you find a bug, please provide a minimal muse file which reproduces the problem (so I can add it to the test suite).

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Text::Amuse::Preprocessor

Repository available at GitHub: https://github.com/melmothx/text-amuse-preprocessor

SEE ALSO

The original documentation for the Emacs Muse markup can be found at: http://mwolson.org/static/doc/muse/Markup-Rules.html

The parser itself is Text::Amuse.

This distribution ships the following executables:

  • html-to-muse.pl (HTML to muse converter)

  • muse-check-footnotes.pl (footnote checker)

  • muse-rearrange-footnotes.pl (fix footnote numbering)

  • pod-to-muse.pl (POD to muse converter)

  • muse-preprocessor.pl (script which uses this module)

See the manpage or pass --help to the scripts for usage.

LICENSE

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.