NAME

HTML::Copy - copy an HTML file without breaking links.

VERSION

Version 1.12

SYNOPSIS

  use HTML::Copy;
  
  HTML::Copy->htmlcopy($source_path, $destination_path);
  
  # or
  
  $p = HTML::Copy->new($source_path);
  $p->copy_to($destination_path);

DESCRIPTION

This module copies an HTML file without breaking the links in the file; the links are changed as needed for the new location. It is a subclass of HTML::Parser.
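
To illustrate what "without breaking links" means for a relative link, here is a minimal sketch that uses only core modules. The helper name relink and the example paths are hypothetical and are not part of HTML::Copy; the module's actual implementation may differ.

```perl
use strict;
use warnings;
use File::Spec::Unix;
use File::Basename qw(dirname);

# Hypothetical helper, not part of HTML::Copy: recompute a relative link
# so that it still points at the same file after the HTML file has moved.
# Paths are treated as Unix-style, like the path part of a URL.
sub relink {
    my ($link, $source_path, $destination_path) = @_;

    # Resolve the link against the directory of the source file.
    my $target = File::Spec::Unix->rel2abs($link, dirname($source_path));

    # Express the same target relative to the destination directory.
    return File::Spec::Unix->abs2rel($target, dirname($destination_path));
}

# prints ../../docs/images/logo.png
print relink('images/logo.png',
             '/site/docs/index.html',
             '/site/archive/2024/index.html'), "\n";
```

A plain byte-for-byte copy would leave images/logo.png in place, and the link would then point at /site/archive/2024/images/logo.png, which is a different file.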

REQUIRED MODULES

HTML::Parser

CLASS METHODS

htmlcopy

    HTML::Copy->htmlcopy($source_path, $destination_path);

Parses the contents of $source_path, changes its links, and writes the result to $destination_path.

parse_file

    $html_text = HTML::Copy->parse_file($source_path, $destination_path);

Parses the contents of $source_path and changes its links as if the file were being copied to $destination_path, but does not create $destination_path. The modified HTML is returned instead. The returned string is converted to utf8.

CONSTRUCTOR METHODS

new

    $p = HTML::Copy->new($source_path);

Creates an instance of this class for $source_path.

INSTANCE METHODS

copy_to

    $p->copy_to($destination_path);

Parses the contents of the $source_path given to new, changes its links, and writes the result to $destination_path.

parse_to

    $p->parse_to($destination_path);

Parses the contents of the $source_path given to new, changes its links, and returns the HTML that would be written to $destination_path. Unlike copy_to, $destination_path is not created.

ACCESSOR METHODS

io_layer

    $p->io_layer;
    $p->io_layer(':utf8');

Gets or sets the PerlIO layer used to read $source_path and to write $destination_path. Usually it is determined automatically from the charset tag of $source_path. If no charset is specified, the Encode::Guess module is used.
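
The following is a rough sketch of how a layer could be chosen along these lines. It is not the module's own code: the helper name layer_for is hypothetical, and reading "charset tag" as a charset attribute inside a meta element is an assumption.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper, not part of HTML::Copy: choose a PerlIO layer
# for the raw bytes of an HTML document.
sub layer_for {
    my ($bytes) = @_;

    # Prefer a charset declared in the document itself (assumed here to
    # be a charset=... attribute inside a <meta> element).
    if ($bytes =~ /<meta[^>]*charset\s*=\s*["']?([\w.:-]+)/i) {
        return ":encoding($1)";
    }

    # Otherwise fall back to Encode::Guess. On failure guess_encoding()
    # returns an error string instead of an encoding object.
    my $enc = guess_encoding($bytes);
    return ref($enc) ? ':encoding(' . $enc->name . ')' : undef;
}

# prints :encoding(euc-jp)
print layer_for('<meta charset="euc-jp"><p>text</p>'), "\n";
```

A layer found this way has the same form as the ':utf8' argument shown above and could be passed to io_layer when the automatic choice is wrong.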

encode_suspects

    @suspects = $p->encode_suspects;
    $p->encode_suspects(qw/shiftjis euc-jp/);

Adds suspect encodings used to guess the text encoding of the source HTML. If the source HTML has a charset tag, adding suspects is not required.
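
To show why suspects matter, the sketch below calls Encode::Guess directly on a sample byte string, once without suspects and once with the two encodings named above. The helper describe_guess and the sample text are illustrative assumptions; how HTML::Copy passes its suspects to Encode::Guess internally is not shown here.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Encode::Guess;    # exports guess_encoding()

# Sample input: three kanji encoded as EUC-JP bytes.
my $bytes = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}");

# Hypothetical helper: report what Encode::Guess makes of some bytes.
sub describe_guess {
    my ($octets, @suspects) = @_;

    # guess_encoding() returns an encoding object on success and an
    # error string when it finds no match or more than one match.
    my $enc = guess_encoding($octets, @suspects);
    return ref($enc) ? 'guessed: ' . $enc->name
                     : "no unique guess: $enc";
}

# Without suspects, only ASCII, UTF-8 and BOM-marked UTF-16/32 are tried.
print describe_guess($bytes), "\n";

# With suspects, the listed legacy encodings are tried as well.
print describe_guess($bytes, qw/shiftjis euc-jp/), "\n";
```

Short inputs can match more than one suspect, in which case the guess is reported as ambiguous, so it helps to list only encodings that are actually expected.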

source_html

    $p->source_html;

Returns the contents of the source HTML.

AUTHOR

Tetsuro KURITA <tkurita@mac.com>