NAME

RePrec::Collection - Parse relevance judgements for evaluation purposes

SYNOPSIS

  require RePrec::Collection;

DESCRIPTION

To evaluate the effectiveness of information retrieval methods, one needs relevance judgements for a set of queries on the collection under consideration. These judgements must be parsed before the evaluation can be done. Class RePrec::Collection provides the means to do so and should suit most formats of relevance judgements; if it does not, one can subclass it. From each relevance judgement one needs to extract the query ID (QID), the document ID (DOCID) and a judgement (JUDGE) stating whether DOCID is relevant with respect to QID. As an additional parameter, the number of documents in the collection under consideration is needed.
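To illustrate what has to be extracted, the following sketch pulls the three fields out of rows laid out like TREC qrels files (topic, iteration, document number, relevance). The helper name extract_triple, the column positions and the example rows are assumptions made up for this illustration; they are not part of RePrec::Collection.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RePrec::Collection: split one
# judgement row on whitespace and return (QID, DOCID, JUDGE).
# Column positions assume a TREC qrels-style row:
#   topic  iteration  docno  relevance
sub extract_triple {
    my ($row) = @_;
    my @cols = split ' ', $row;    # special ' ' pattern: split on whitespace runs
    return @cols[0, 2, 3];         # QID, DOCID, JUDGE
}

# Made-up example rows.
for my $row ("1 0 doc1 0", "1 0 doc2 1") {
    my ($qid, $docid, $judge) = extract_triple($row);
    printf "%s / %s: %s\n", $qid, $docid,
        $judge == 1 ? 'relevant' : 'not relevant';
}
```

This prints "1 / doc1: not relevant" followed by "1 / doc2: relevant".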

METHODS

new %parms

Constructor; parses the given judgements file. To do the parsing it calls the private method _init with %parms as argument. The parameters in %parms are described in the documentation of that method.
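The delegation from new to _init, and the subclassing it is meant to enable, can be sketched as follows. The package names and the pairs parameter are hypothetical, chosen only to show a subclass that replaces _init while inheriting new unchanged; this is not the module's actual code.

```perl
use strict;
use warnings;

package Sketch::Collection;    # hypothetical stand-in for RePrec::Collection

# Constructor: create the object, then leave all parsing to _init.
sub new {
    my ($class, %parms) = @_;
    my $self = bless {}, $class;
    $self->_init(%parms);
    return $self;
}

# Base-class parser, reduced to a stub for this sketch.
sub _init {
    my ($self, %parms) = @_;
    $self->{numdocs} = $parms{numdocs};
    $self->{rel}     = {};
    return;
}

package Sketch::Collection::Pairs;    # hypothetical subclass
our @ISA = ('Sketch::Collection');

# Only _init is replaced; new is inherited. Hypothetical input format:
# every "QID DOCID" string in the `pairs` list is a relevant pair.
sub _init {
    my ($self, %parms) = @_;
    $self->{numdocs} = $parms{numdocs};
    $self->{rel}     = {};
    for my $pair (@{ $parms{pairs} || [] }) {
        my ($qid, $docid) = split ' ', $pair;
        $self->{rel}{$qid}{$docid} = 1;
    }
    return;
}

package main;

my $c = Sketch::Collection::Pairs->new(pairs => ['1 d7', '1 d9'], numdocs => 100);
print join(',', sort keys %{ $c->{rel}{1} }), "\n";    # prints "d7,d9"
```

Because new only blesses the object and hands %parms on, a subclass for a different judgement format needs nothing but its own _init.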

_init %parms

The file parsing method; it should be the only method that subclasses of RePrec::Collection need to replace. This base class assumes that the data in $file come as a table, each row containing a QID, a DOCID and the judgement (JUDGE) itself. A document is marked relevant if the value of JUDGE equals 1. The argument %parms holds the following parameters (defaults are given in parentheses):

separator (' +')

Perl regular expression separating the columns

qid

column which holds the QIDs

docid

column which holds the DOCIDs

judge

column which holds the JUDGEs

ignore (undef)

Perl regular expression; rows matching it are ignored

numdocs (undef)

number of documents in the collection under consideration.
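A minimal sketch of the table parsing described above, written as a standalone function rather than the real _init. It assumes 0-based column numbers for qid, docid and judge (the documentation does not say how columns are counted) and reads from a filehandle instead of $file; the function name parse_judgements is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the parsing rules described above:
# split each row on `separator`, skip rows matching `ignore`, and mark
# DOCID relevant for QID when the JUDGE column equals 1.
# Column numbers are assumed to be 0-based.
sub parse_judgements {
    my ($fh, %parms) = @_;
    my $sep    = defined $parms{separator} ? $parms{separator} : ' +';
    my $ignore = $parms{ignore};
    my %rel;
    while (my $row = <$fh>) {
        chomp $row;
        next if defined $ignore and $row =~ /$ignore/;
        my @cols = split /$sep/, $row;
        my ($qid, $docid, $judge) = @cols[ @parms{qw(qid docid judge)} ];
        next unless defined $judge;    # skip short or empty rows
        $rel{$qid}{$docid} = 1 if $judge == 1;
    }
    return \%rel;
}

# Made-up judgements, read through an in-memory filehandle.
my $data = <<'END';
# qid iter docid judge
1 0 d1 1
1 0 d2 0
2 0 d1 1
END
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $rel = parse_judgements($fh, qid => 0, docid => 2, judge => 3, ignore => '^#');

for my $qid (sort keys %$rel) {
    print "$qid: ", join(',', sort keys %{ $rel->{$qid} }), "\n";
}
# prints "1: d1" and "2: d1"
```

Rows judged non-relevant are simply not stored in this sketch, so a query whose judged documents are all non-relevant does not appear in the result at all.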

relevant $qid, $docid

Returns 1 if the document with ID $docid is relevant with respect to the query with ID $qid; otherwise returns undef.
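Assuming the judgements are held in a nested hash of the form $rel{QID}{DOCID} = 1 (an assumption about the internal layout, not documented behaviour), the lookup could be sketched like this. The standalone function relevant is hypothetical and takes the hash as an explicit argument.

```perl
use strict;
use warnings;

# Hypothetical lookup over %rel with $rel{QID}{DOCID} = 1.
# The exists() test on the QID avoids autovivifying an empty entry
# for an unknown query, which would otherwise become a key of %rel.
sub relevant {
    my ($rel, $qid, $docid) = @_;
    return undef unless exists $rel->{$qid};
    return $rel->{$qid}{$docid} ? 1 : undef;
}

my %rel = (1 => { d1 => 1 });
for my $q (1, 2) {
    my $r = relevant(\%rel, $q, 'd1');
    printf "query %d, d1: %s\n", $q, defined $r ? 'relevant' : 'undef';
}
print exists $rel{2} ? "autovivified\n" : "clean\n";    # prints "clean"
```

Without the exists() guard, merely asking about query 2 would create an empty $rel{2}, which could distort anything that later iterates over the known queries.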

get_numdocs

Returns the number of documents in the collection under consideration.

get_numrels $qid

Returns the number of documents judged relevant for the query with ID $qid in the collection under consideration.
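Under the same assumed nested-hash layout, the two counting accessors might be sketched as below. The package Sketch::Counts is hypothetical, and returning 0 for a query without relevant documents is a choice of this sketch; the documentation does not specify the behaviour for unknown queries.

```perl
use strict;
use warnings;

package Sketch::Counts;    # hypothetical, not RePrec::Collection itself

# Holds numdocs and relevant documents as $self->{rel}{QID}{DOCID} = 1.
sub new { my ($class, %args) = @_; return bless { rel => {}, %args }, $class }

# Collection size as supplied via numdocs; undef if it was not given.
sub get_numdocs { return $_[0]{numdocs} }

# Number of relevant documents for $qid: the size of that query's inner
# hash, or 0 for a query without any relevant document.
sub get_numrels {
    my ($self, $qid) = @_;
    return 0 unless exists $self->{rel}{$qid};
    return scalar keys %{ $self->{rel}{$qid} };
}

package main;

my $c = Sketch::Counts->new(numdocs => 1000,
                            rel     => { 1 => { d1 => 1, d5 => 1 } });
printf "%d docs, %d relevant, %d non-relevant for query 1\n",
    $c->get_numdocs(), $c->get_numrels(1),
    $c->get_numdocs() - $c->get_numrels(1);
# prints "1000 docs, 2 relevant, 998 non-relevant for query 1"
```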

BUGS

Yes. Please let me know!

SEE ALSO

perl(1).

AUTHOR

Norbert Gövert <goevert@ls6.cs.uni-dortmund.de>