NAME

    html2alvis - HTML to Alvis XML converter
    

SYNOPSIS

    html2alvis [options] [source directory ...]

  Options:

    --html-ext                 filename extension identifying HTML files
    --meta-ext                 filename extension identifying meta files
    --out-dir                  output directory
    --N-per-out-dir            number of records per output directory
    --meta-encoding            the encoding of the meta files
    --html-encoding            the encoding of all HTML files
    --html-encoding-from-meta  take the encoding of the HTML files from
                               the meta files (attribute 'detected-charset')
    --[no]original             include original document?
    --help                     brief help message
    --man                      full documentation
    --[no]warnings             warnings output flag
    

OPTIONS

--html-ext
    Sets the filename extension that identifies HTML files.
    Default value: 'html'.
--meta-ext
    Sets the filename extension that identifies meta files.
    The meta file syntax is

          <feature name>\t<feature value>\n

    Special features are url, title, date, detectedCharSet.
    Default value: 'meta'.
--out-dir
    Sets the output directory. Default value: '.'.
--N-per-out-dir
    Sets the number of records per output directory. Default value: 1000.
--meta-encoding
    Specifies the encoding of all meta files. Default value: 'iso-8859-1'.
--html-encoding
    Specifies the encoding of all HTML files.
    Default: undef (meaning 'guess').
--html-encoding-from-meta
    Specifies whether the encoding of an HTML file should be read from
    the corresponding meta file. If the meta file gives no encoding,
    --html-encoding is used; if that is not given either, the encoding
    is guessed. Default: no.
--[no]original
    Specifies whether the original document is included in the
    output. Default value: yes.
--help
    Prints a brief help message and exits.
--man
    Prints the manual page and exits.
--[no]warnings
    Output (or suppress) warnings. Default value: yes.
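
    The meta file syntax and the encoding fallback described above can
    be sketched as follows. This is an illustrative Python sketch, not
    the tool's own (Perl) code: the function names parse_meta and
    resolve_html_encoding are hypothetical, and the handling of blank
    or tab-less lines is an assumption. The charset feature name is a
    parameter because this manual spells it both 'detected-charset'
    and 'detectedCharSet'.

```python
from typing import Dict, Optional


def parse_meta(text: str) -> Dict[str, str]:
    """Parse '<feature name>\\t<feature value>\\n' lines into a dict."""
    features: Dict[str, str] = {}
    for line in text.split("\n"):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        name, sep, value = line.partition("\t")
        if not sep:
            continue  # assumption: lines without a tab are skipped
        features[name] = value
    return features


def resolve_html_encoding(
    meta: Dict[str, str],
    html_encoding: Optional[str] = None,
    from_meta: bool = False,
    charset_key: str = "detectedCharSet",  # spelling is an assumption
) -> Optional[str]:
    """Return the encoding to use, or None when it has to be guessed.

    Order: meta file (only with --html-encoding-from-meta), then
    --html-encoding, then guessing (represented here by None).
    """
    if from_meta:
        charset = meta.get(charset_key)
        if charset:
            return charset
    return html_encoding
```

    A caller would treat a None result as "guess the encoding"; how
    the guess itself is made is outside this sketch.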

DESCRIPTION

    Goes recursively through the files under the source directory
    and converts them to Alvis XML files. Meta information (such
    as the URL, the detected character set, or the title of the
    document) can be given in a separate meta file, one per document,
    recognized by the shared basename. For example, if the HTML
    document is called foo.original and the meta information is in
    foo.meta, html2alvis should be called like this:
   
          html2alvis --html-ext original --meta-ext meta
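
    The pairing of documents and meta files by shared basename can be
    pictured with the short Python sketch below. It only illustrates
    the idea and is not how html2alvis itself is implemented; the
    function name find_pairs is hypothetical, and reporting a missing
    meta file as None is an assumption.

```python
import os
from typing import Iterator, Optional, Tuple


def find_pairs(
    source_dir: str, html_ext: str = "html", meta_ext: str = "meta"
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (html_path, meta_path or None) for each HTML file found
    recursively under source_dir."""
    html_suffix = "." + html_ext
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(html_suffix):
                continue
            base = name[: -len(html_suffix)]  # e.g. 'foo' from 'foo.original'
            meta_path = os.path.join(dirpath, base + "." + meta_ext)
            yield (
                os.path.join(dirpath, name),
                meta_path if os.path.isfile(meta_path) else None,
            )
```

    With --html-ext original and --meta-ext meta, a directory holding
    foo.original and foo.meta would yield that pair of paths.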