NAME

    news_xml2alvis.pl - news XML to Alvis XML converter
    

SYNOPSIS

    news_xml2alvis.pl [options] [source directory ...]

  Options:

    --xml-ext            filename extension identifying XML files
    --meta-ext           filename extension identifying meta files
    --out-dir            output directory
    --N-per-out-dir      number of records per output directory
    --meta-encoding      the encoding of the meta files
    --help               brief help message
    --man                full documentation
    --[no]warnings       warnings output flag
    

OPTIONS

--xml-ext
    Sets the filename extension that identifies XML files.
    Default value: 'xml'.
--meta-ext
    Sets the filename extension that identifies meta files.
    Default value: 'meta'.
--out-dir
    Sets the output directory. Default value: '.'.
--N-per-out-dir
    Sets the number of records per output directory. Default value: 1000.
--meta-encoding
    Specifies the encoding of the meta files. Default value: 'iso-8859-1'.
--help
    Prints a brief help message and exits.
--man
    Prints the manual page and exits.
--[no]warnings
    Outputs (or suppresses) warnings. Default value: yes.

DESCRIPTION

    Recursively processes the files under each source directory
    and converts them to Alvis XML files. Meta information (such
    as the URL, the detected character set, or the title of the
    document) can be given in a separate meta file, one per document,
    matched to its XML file by their shared basename. For example,
    if the XML document is called foo.news and its meta information
    is in foo.meta, news_xml2alvis.pl should be called like this:
   
          news_xml2alvis.pl --xml-ext news --meta-ext meta
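
    The basename matching described above can be sketched as follows.
    This is a minimal Python illustration, not the script's actual
    implementation: the function name pair_documents is hypothetical,
    and treating a missing meta file as "no meta information" is an
    assumption.

```python
import os


def pair_documents(source_dir, xml_ext="xml", meta_ext="meta"):
    """Yield (xml_path, meta_path) for every XML file under source_dir.

    meta_path is None when no meta file shares the XML file's basename
    (assumption: a missing meta file is not an error).
    """
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            base, ext = os.path.splitext(name)
            if ext != "." + xml_ext:
                continue  # not an XML document
            meta_path = os.path.join(dirpath, base + "." + meta_ext)
            if not os.path.isfile(meta_path):
                meta_path = None
            yield os.path.join(dirpath, name), meta_path
```

    With the options from the example above, calling
    pair_documents(directory, xml_ext="news", meta_ext="meta") would
    pair foo.news with foo.meta in the same directory.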
    
    The news XML files are expected to be of the format

    <DOCUMENT>
      <article>
        <date></date>
        <iso-date></iso-date>
        <title></title>
        <content></content>
        <links>
            <link type="a">
                <location></location>
            </link>
        </links>
      </article>
    </DOCUMENT>

    and meta files of the format 

          <feature name>\t<feature value>\n

    Special features are url, title, date, and detectedCharSet.
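
    The two input formats can be read along these lines. This is a
    hedged Python sketch, not the converter's own code: the function
    names are hypothetical, the input is assumed to be well-formed XML
    with a single root element, and the handling of blank lines, lines
    without a tab, and repeated feature names is an assumption.

```python
import xml.etree.ElementTree as ET


def parse_meta(text):
    """Parse '<feature name>\\t<feature value>' lines into a dict."""
    features = {}
    for line in text.split("\n"):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # skip blank lines (assumption)
        name, sep, value = line.partition("\t")
        if not sep:
            continue  # no tab on the line: ignored here (assumption)
        features[name] = value  # a repeated name overwrites (assumption)
    return features


def read_meta(path, encoding="iso-8859-1"):
    """Read a meta file, defaulting to the documented default encoding."""
    with open(path, encoding=encoding) as handle:
        return parse_meta(handle.read())


def parse_articles(xml_text):
    """Return one dict per <article> element in a news XML document."""
    root = ET.fromstring(xml_text)
    articles = []
    for article in root.iter("article"):
        articles.append({
            "date": article.findtext("date", default=""),
            "iso_date": article.findtext("iso-date", default=""),
            "title": article.findtext("title", default=""),
            # findtext returns only the text before any nested markup
            "content": article.findtext("content", default=""),
            "links": [loc.text or "" for loc in article.iter("location")],
        })
    return articles
```

    Only the first tab on a line separates the feature name from its
    value, so a value may itself contain tabs.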