Shlomi Fish

NAME

MediaWiki::CleanupHTML - clean up MediaWiki-generated HTML by removing MediaWiki embellishments.

VERSION

version 0.0.3

SYNOPSIS

    use MediaWiki::CleanupHTML;

    open my $fh, '<:encoding(UTF-8)', $filename
        or die "Cannot open '$filename' - $!";

    my $cleaner = MediaWiki::CleanupHTML->new({ fh => $fh });

    open my $out_fh, '>:encoding(UTF-8)', $processed_filename
        or die "Cannot open '$processed_filename' for output - $!";

    $cleaner->print_into_fh($out_fh);

    $cleaner->destroy_resources();

DESCRIPTION

The HTML rendered on MediaWiki pages is full of MediaWiki-specific embellishments such as edit sections. This module attempts to clean it up and return more straightforward HTML. Note that the HTML returned by the MediaWiki APIs may not always be available (for instance, if the wiki is down), so this module should be considered a fallback.

SUBROUTINES/METHODS

MediaWiki::CleanupHTML->new({fh => $fh})

The constructor - accepts the filehandle from which to read the XHTML.

$cleaner->print_into_fh($fh)

Prints the cleaned-up HTML to the given filehandle. The filehandle should be able to process UTF-8 output.
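When the cleaned HTML is wanted as a string rather than in a file, one option is an in-memory filehandle opened with a UTF-8 encoding layer. The following is a minimal sketch using only core Perl; capture_utf8 is a hypothetical helper name and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MediaWiki::CleanupHTML): runs $writer with
# an in-memory filehandle that encodes to UTF-8, and returns the octets written.
sub capture_utf8 {
    my ($writer) = @_;

    my $octets = '';
    open my $fh, '>:encoding(UTF-8)', \$octets
        or die "Cannot open in-memory filehandle - $!";

    $writer->($fh);

    # Closing the handle flushes the encoding layer into $octets.
    close $fh
        or die "Cannot close in-memory filehandle - $!";

    return $octets;
}

# Intended use, assuming $cleaner was built as in the SYNOPSIS:
#
#     my $html_octets = capture_utf8(sub { $cleaner->print_into_fh(shift) });
```

The returned value holds UTF-8 encoded octets, not decoded characters, so it can be written to a binary-mode handle as-is.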

$cleaner->destroy_resources()

Destroys the allocated resources (the HTML::TreeBuilder tree, etc.). Must be called before the object is destroyed.
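Because destroy_resources() has to be called explicitly, an exception thrown while printing could skip it. One way to guard against that is a small wrapper that always performs the cleanup. This is only a sketch: with_cleaner is a hypothetical helper name, and it relies on nothing beyond the documented destroy_resources() method.

```perl
use strict;
use warnings;

# Hypothetical helper: runs $code with $cleaner, then always calls
# destroy_resources(), even if $code dies. The original error is rethrown.
sub with_cleaner {
    my ($cleaner, $code) = @_;

    my $ok  = eval { $code->($cleaner); 1 };
    my $err = $@;

    $cleaner->destroy_resources();

    die $err if !$ok;
    return;
}

# Intended use, assuming $fh and $out_fh were opened as in the SYNOPSIS:
#
#     with_cleaner(
#         MediaWiki::CleanupHTML->new({ fh => $fh }),
#         sub { $_[0]->print_into_fh($out_fh) },
#     );
```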

AUTHOR

Shlomi Fish, http://www.shlomifish.org/

BUGS

Please report any bugs or feature requests to bug-mediawiki-cleanuphtml at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=MediaWiki-CleanupHTML. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc MediaWiki::CleanupHTML

ACKNOWLEDGEMENTS

The developers of HTML::TreeBuilder::XPath, HTML::TreeBuilder and related modules for their helpful code.

LICENSE AND COPYRIGHT

Copyright 2012 Shlomi Fish.

This program is distributed under the MIT (X11) License: http://www.opensource.org/licenses/mit-license.php

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

AUTHOR

Shlomi Fish <shlomif@cpan.org>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2012 by Shlomi Fish.

This is free software, licensed under:

  The MIT (X11) License

BUGS

Please report any bugs or feature requests on the bugtracker website http://rt.cpan.org/NoAuth/Bugs.html?Dist=MediaWiki-CleanupHTML or by email to bug-mediawiki-cleanuphtml@rt.cpan.org.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.

SUPPORT

Websites

The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.

Bugs / Feature Requests

Please report any bugs or feature requests by email to bug-mediawiki-cleanuphtml at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=MediaWiki-CleanupHTML. You will be automatically notified of any progress on the request by the system.

Source Code

The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)

http://bitbucket.org/shlomif/perl-mediawiki-cleanuphtml

  hg clone ssh://hg@bitbucket.org/shlomif/perl-mediawiki-cleanuphtml