
NAME

NLP::GATE::AnnotationSet - A class for representing GATE-like annotation sets

VERSION

Version 0.6

SYNOPSIS

  use NLP::GATE::AnnotationSet;
  my $annset = NLP::GATE::AnnotationSet->new();
  $annset->add($annotation);
  my $newannset = $annset->get($type, $featuremap);   # $featuremap is optional
  my $arrayref  = $annset->getAsArrayRef();
  my $ann       = $annset->getByIndex($n);
  my $size      = $annset->size();

DESCRIPTION

This is a simple class representing an annotation set for documents in the format used by the GATE software (http://gate.ac.uk/).

An annotation set can contain any number of NLP::GATE::Annotation objects. Currently nothing prevents the same annotation from being added more than once.

Annotation sets behave a bit like arrays in that each annotation can be addressed by an index and each set always contains a known number of annotations.

TODO: use the offset indices in method getByOffset()

METHODS

new()

Create a new annotation set. The name of an annotation set is not a property of the set itself; instead, a set is associated with a name when it is stored in an NLP::GATE::Document object with the setAnnotationSet() method.

add($annotation)

Add an annotation object to the annotation set.

getByIndex($n)

Return the annotation for index $n or signal an error.

get($type[,$featureset[,$matchtype]])

Return a new annotation set containing all the annotations from this set that match the given type and, if specified, all the feature/value pairs given in the $featureset hash reference. If no annotations match, an empty annotation set is returned.

The parameter $matchtype specifies how features are matched: "exact" does an exact string comparison, "nocase" compares after converting both strings to lower case with Perl's lc function, and "regexp" interprets the string given in the parameter as a regular expression. The default is "exact".

Every feature specified in $featureset MUST occur among the features of the annotation AND its value must match the given value according to $matchtype.
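The matching rules above can be sketched in plain Perl as follows. This is not the module's own code: features_match is a hypothetical helper that receives the annotation's features as a hash reference. It assumes that "regexp" patterns are applied unanchored (put ^...$ in the pattern for a whole-string match) and that an undefined feature value compares as the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NLP::GATE::AnnotationSet): does a hash of
# annotation features satisfy every feature/value pair in $featureset?
sub features_match {
    my ($features, $featureset, $matchtype) = @_;
    $matchtype //= 'exact';                          # documented default
    return 1 unless defined $featureset;             # no feature constraints
    for my $name (keys %$featureset) {
        return 0 unless exists $features->{$name};   # feature MUST occur
        my $have = $features->{$name} // '';         # assumption: undef acts as ''
        my $want = $featureset->{$name};
        if ($matchtype eq 'exact') {
            return 0 unless $have eq $want;
        } elsif ($matchtype eq 'nocase') {
            return 0 unless lc($have) eq lc($want);
        } elsif ($matchtype eq 'regexp') {
            return 0 unless $have =~ /$want/;        # assumption: unanchored match
        } else {
            die "unknown match type '$matchtype'\n";
        }
    }
    return 1;                                        # all constraints satisfied
}

my %features = (string => 'Vienna', kind => 'City');
print features_match(\%features, { kind => 'city' }, 'nocase') ? "yes\n" : "no\n";  # yes
print features_match(\%features, { kind => 'city' })           ? "yes\n" : "no\n";  # no
```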

The new set contains the same annotation objects as the original set, not copies, so modifying an annotation through one set also changes it in the other!
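A small plain-Perl illustration of why this happens, assuming the new set simply holds references to the same objects: filtering a list of references with grep copies the references, not what they point to. Plain hash references stand in for NLP::GATE::Annotation objects here.

```perl
use strict;
use warnings;

# Hash references stand in for annotation objects (illustration only).
my @original = ({ type => 'Token',  string => 'Wien' },
                { type => 'Person', string => 'Anna' });

# Filtering keeps the same references; the hashes themselves are shared.
my @filtered = grep { $_->{type} eq 'Token' } @original;

$filtered[0]{string} = 'Vienna';     # change through the filtered list...
print $original[0]{string}, "\n";    # ...and the original sees it: prints "Vienna"
```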

getByOffset($from,$to,$type,$featureset,$featurematchtype,$rangematchtype)

Return all annotations whose offsets match the given range according to $rangematchtype, optionally filtered further by type and features in the same way as the get method.

If one of the parameters is undef, it does not constrain the match: any value is accepted.

The parameter $featurematchtype specifies how features are matched: "exact" does an exact string comparison, "nocase" compares after converting both strings to lower case with Perl's lc function, and "regexp" interprets the string given in the parameter as a regular expression. The default is "exact".

The $rangematchtype argument specifies how offsets are compared, if they are specified (case does not matter):

"COVER" - any annotation whose from offset is less than or equal to $from and whose to offset is greater than or equal to $to: annotations that contain this range.

"EXACT" - any annotation whose from and to offsets are exactly as specified: annotations that are co-extensive with this range. This is the default.

"WITHIN" - any annotation that lies fully within the range.

"OVERLAP" - any annotation that overlaps with the given range.

For example, to find an annotation that fully contains the text from offset 12 to offset 17, use getByOffset(12,17,undef,undef,undef,"cover").
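The four range match types can be expressed as simple offset comparisons. The sketch below uses a hypothetical helper, range_matches, that is not part of the module. It treats an undef bound as unconstrained, as described for getByOffset, and assumes GATE-style offsets where the to offset is exclusive; that assumption only matters for OVERLAP, where annotations that merely touch the range are not counted.

```perl
use strict;
use warnings;

# Hypothetical helper: does an annotation spanning ($afrom, $ato) match the
# query range ($from, $to) under the given range match type?
sub range_matches {
    my ($afrom, $ato, $from, $to, $type) = @_;
    $type = uc($type // 'EXACT');    # case does not matter; EXACT is the default
    my ($lo_ok, $hi_ok);
    if ($type eq 'COVER') {          # annotation contains the range
        $lo_ok = !defined $from || $afrom <= $from;
        $hi_ok = !defined $to   || $ato   >= $to;
    } elsif ($type eq 'EXACT') {     # annotation is co-extensive with the range
        $lo_ok = !defined $from || $afrom == $from;
        $hi_ok = !defined $to   || $ato   == $to;
    } elsif ($type eq 'WITHIN') {    # annotation lies fully inside the range
        $lo_ok = !defined $from || $afrom >= $from;
        $hi_ok = !defined $to   || $ato   <= $to;
    } elsif ($type eq 'OVERLAP') {   # assumption: half-open offsets, touching is not overlap
        $lo_ok = !defined $to   || $afrom < $to;
        $hi_ok = !defined $from || $ato   > $from;
    } else {
        die "unknown range match type '$type'\n";
    }
    return $lo_ok && $hi_ok;
}

# An annotation spanning 10..20, tested against the range 12..17:
print range_matches(10, 20, 12, 17, 'cover') ? "match\n" : "no match\n";  # match
print range_matches(10, 20, 12, 17)          ? "match\n" : "no match\n";  # no match (EXACT)
```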

getAsArrayRef()

Return an array reference whose elements are the Annotation objects in this set.

getAsArray()

Return an array whose elements are the Annotation objects in this set.

size()

Return the number of annotations in the set.

getTypes()

Return an array of all distinct annotation types in the set.

NOTE: this currently iterates over all annotations in the set to collect the types. Type names are not cached, neither by this method nor when the set is created.
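One way to collect the distinct types in a single pass, in line with the note above, is to record each type as a hash key. distinct_types is a hypothetical stand-in for getTypes(), the annotations are plain hash references with a type key, and the result is sorted only to make it deterministic; the order returned by the real method is not documented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for getTypes(): one pass over all annotations,
# using hash keys so that repeated types collapse. Nothing is cached.
sub distinct_types {
    my @annotations = @_;
    my %seen;
    $seen{ $_->{type} } = 1 for @annotations;
    return sort keys %seen;    # sorted only for a deterministic result
}

my @types = distinct_types({ type => 'Token' }, { type => 'Person' }, { type => 'Token' });
print "@types\n";    # prints "Person Token"
```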

indexByOffsetFrom()

Creates an index for the set that speeds up the retrieval of annotations by offset or offset interval. Unlike in GATE, this is not done automatically; it must be called explicitly before retrieving annotations.

If an index already exists, it is discarded and a new index is built.
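A sketch of what an index on from offsets can look like: keep the annotations sorted by from offset and use binary search to jump to the first candidate instead of scanning the whole set. All names here (build_from_index, first_from_at_least, starting_in) and the hash-reference annotations are hypothetical; the module's actual index structure may differ. Rebuilding the index, as described above, would amount to calling build_from_index again.

```perl
use strict;
use warnings;

# Hypothetical index: annotations (hash refs with 'from'/'to') sorted by from.
sub build_from_index {
    my @annotations = @_;
    return [ sort { $a->{from} <=> $b->{from} } @annotations ];
}

# Binary search: position of the first annotation whose from offset >= $offset.
sub first_from_at_least {
    my ($index, $offset) = @_;
    my ($lo, $hi) = (0, scalar @$index);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($index->[$mid]{from} < $offset) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}

# Annotations starting in [$from, $to): jump to the first candidate and
# stop at the first annotation that starts at or after $to.
sub starting_in {
    my ($index, $from, $to) = @_;
    my @hits;
    for (my $i = first_from_at_least($index, $from); $i < @$index; $i++) {
        last if $index->[$i]{from} >= $to;
        push @hits, $index->[$i];
    }
    return @hits;
}

my $idx = build_from_index(
    { from => 12, to => 17 }, { from => 0, to => 5 }, { from => 6, to => 11 },
);
print join(',', map { $_->{from} } starting_in($idx, 5, 13)), "\n";  # prints "6,12"
```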

AUTHOR

Johann Petrak, <firstname.lastname-at-jpetrak-dot-com>

BUGS

Please report any bugs or feature requests to bug-gate-document at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=NLP::GATE. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc NLP::GATE

You can also look for information at: