Combine::XWI - class for internal representation of a document record


 use Combine::XWI;
 my $xwi = Combine::XWI->new;

 #single value record variables

 my $server = $xwi->server();

 #original content

 my $text = ${$xwi->content()};

 #multiple value record variables

 $xwi->meta_rewind;   #start from the first stored value
 my ($name,$content);
 while (1) {
  ($name,$content) = $xwi->meta_get;
  last unless $name;
  #use $name and $content here
 }


Provides methods for storing and retrieving structured records representing crawled documents.




Any other method name acts as an accessor via AUTOLOAD: called with a value $val, it saves $val under that name. The value can be retrieved later, e.g.

    $xwi->MyVar('My value');
    $t = $xwi->MyVar;

will set $t to 'My value'.
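To illustrate how an AUTOLOAD-based accessor of this kind can work in Perl, here is a minimal sketch. The package name My::Record and its internals are hypothetical and are not taken from the Combine::XWI source; only the calling convention (set with an argument, get without) follows the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the Combine::XWI implementation.
package My::Record;

our $AUTOLOAD;

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Catch calls to undefined methods and use the method name as a field name.
sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                    # strip the package prefix
    return if $name eq 'DESTROY';         # ignore object destruction
    $self->{$name} = $args[0] if @args;   # setter: called with a value
    return $self->{$name};                # getter: return the stored value
}

package main;

my $rec = My::Record->new;
$rec->MyVar('My value');
my $t = $rec->MyVar;
print "$t\n";    # prints "My value"
```

A field that was never set simply returns undef in this sketch; whether Combine::XWI behaves the same way is not stated here.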


*_reset()

Forgets all stored values.

*_rewind()

Resets the read position, so that the next *_get will start with the first value.

*_add()

Stores values in the data structure.

*_get()

Retrieves the next values from the data structure.
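The four operations above amount to a list with a read cursor. The following is a minimal sketch of that pattern, assuming a hypothetical class My::List with an item_* field; it is not the Combine::XWI implementation, and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for one multi-valued field; NOT the Combine::XWI source.
package My::List;

sub new {
    my ($class) = @_;
    return bless { rows => [], pos => 0 }, $class;
}

# Append one tuple of values.
sub item_add {
    my ($self, @vals) = @_;
    push @{ $self->{rows} }, [@vals];
    return;
}

# Move the read cursor back to the first tuple.
sub item_rewind { my ($self) = @_; $self->{pos} = 0; return; }

# Forget all tuples and reset the cursor.
sub item_reset { my ($self) = @_; $self->{rows} = []; $self->{pos} = 0; return; }

# Return the next tuple, or an empty list once all have been read.
sub item_get {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{rows} };
    return @{ $self->{rows}[ $self->{pos}++ ] };
}

package main;

my $list = My::List->new;
$list->item_add('title',  'Example page');
$list->item_add('author', 'A. N. Other');

$list->item_rewind;
my @seen;
while (1) {
    my ($name, $content) = $list->item_get;
    last unless $name;
    push @seen, "$name=$content";
}
print join(', ', @seen), "\n";    # prints "title=Example page, author=A. N. Other"
```

Note that a loop ending on `last unless $name` also stops at a tuple whose first value is empty or '0', which is worth keeping in mind when choosing names.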

meta_reset() / meta_rewind() / meta_add() / meta_get()

Stores the content of Meta-tags

Takes/Returns 2 parameters: Name, Content


 $xwi->meta_rewind;   #start from the first stored value
 my ($name,$content);
 while (1) {
  ($name,$content) = $xwi->meta_get;
  last unless $name;
  #use $name and $content here
 }

xmeta_reset() / xmeta_rewind() / xmeta_add() / xmeta_get()

Extended information from Meta-tags. Not used.

url_remove() / url_reset() / url_rewind() / url_add() / url_get()

Stores all URLs for this record (i.e. when the same page is reachable under multiple URLs)

Takes/Returns 1 parameter: URL

heading_reset() / heading_rewind() / heading_add() / heading_get()

Stores headings from HTML documents

Takes/Returns 1 parameter: Heading text

link_reset() / link_rewind() / link_add() / link_get()

Stores links from documents

Takes/Returns 5 parameters: URL, netlocid, urlid, Anchor text, Link type

robot_reset() / robot_rewind() / robot_add() / robot_get()

Stores calculated information, like genre, language, etc

Takes/Returns 2 parameters: Name, Value. Both are strings, with max lengths Name: 15, Value: 20

topic_reset() / topic_rewind() / topic_add() / topic_get()

Stores result of topic classification.

Takes/Returns 5 parameters: Class, Absolute score, Normalized score, Terms, Algorithm id

Class, Terms, and Algorithm id are strings, with max lengths Class: 50 and Algorithm id: 25

Absolute score and Normalized score are integers

Normalized score and Terms are optional and may be replaced with 0 and '' respectively
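The constraints on the five topic values can be checked before a record is filled in. The sketch below assumes a hypothetical helper, check_topic, which is not part of Combine::XWI; the class name 'physics.optics' and algorithm id 'std' are made-up sample values. It applies the documented length limits, checks that both scores are integers, and substitutes 0 and '' for the optional values.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Combine::XWI: validate the five topic
# values against the documented limits and fill in the optional ones.
sub check_topic {
    my ($class, $abs, $norm, $terms, $alg) = @_;
    $norm  = 0  unless defined $norm;     # optional: replaced with 0
    $terms = '' unless defined $terms;    # optional: replaced with ''
    die "Class longer than 50 characters\n"        if length($class) > 50;
    die "Algorithm id longer than 25 characters\n" if length($alg)   > 25;
    for my $score ($abs, $norm) {
        die "Score '$score' is not an integer\n" unless $score =~ /\A-?\d+\z/;
    }
    return ($class, $abs, $norm, $terms, $alg);
}

my @topic = check_topic('physics.optics', 42, undef, undef, 'std');
print join('|', @topic), "\n";    # prints "physics.optics|42|0||std"

# With a real record, the checked values could then be stored with:
#   $xwi->topic_add(@topic);
```

Whether Combine::XWI itself rejects or truncates over-long values is not stated above, so checking up front is simply the cautious choice.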


Combine focused crawler main site


Yong Cao, v0.05 1997-03-13

Anders Ardö


Copyright (C) 2005,2006 Anders Ardö

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

See the file LICENCE included in the distribution at
