NAME

semcor-reformat.pl - Reformat SemCor sense-tagged files for use by wsd.pl

SYNOPSIS

 semcor-reformat.pl {--semcor DIR | --file FILE [FILE ...]} [--key] 

EXAMPLE

 semcor-reformat.pl --semcor ~/semcor2.0

DESCRIPTION

This script reads SemCor-formatted files and produces text that can be used as input to wsd.pl. Alternatively, if the --key option is specified, the output also includes the sense number for each word, and this output can be used as a key file.

Several data sets use the SemCor format, including SemCor itself and the Senseval-2 and Senseval-3 all-words data sets. Rada Mihalcea has made them available for download at:

http://www.cs.unt.edu/~rada/downloads.html

Only words that are assigned valid sense numbers are passed through this program; all other words are discarded. In practice, this means that only open-class words that appear in WordNet are kept, while closed-class words (pronouns, conjunctions, etc.) and other words not in WordNet are discarded.
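The filtering rule above can be illustrated with a short sketch. It is written in Python rather than Perl and is not the script's own implementation. It assumes the usual SemCor markup, where each token is a <wf ...>word</wf> element with unquoted attributes and sense-tagged tokens carry a wnsn attribute. The function name tagged_words, the sample fragment, and the exact test for a "valid" sense number are illustrative assumptions; the sketch does not reproduce the output layout that wsd.pl or the key file expects.

```python
import re

# One SemCor token, e.g.
#   <wf cmd=done pos=NN lemma=jury wnsn=1 lexsn=1:14:00::>jury</wf>
# Attribute values are assumed to be unquoted and space-separated.
WF_RE = re.compile(r"<wf\s+([^>]*)>([^<]*)</wf>")
ATTR_RE = re.compile(r"(\w+)=(\S+)")


def tagged_words(lines):
    """Yield (word, lemma, pos, wnsn) for tokens with a valid sense number.

    Assumption: a valid sense is a wnsn value made of positive integers,
    possibly several joined by ';'. Tokens without wnsn (closed-class
    words) and non-<wf> lines such as <punc> or <s> are discarded.
    """
    for line in lines:
        match = WF_RE.search(line)
        if match is None:
            continue  # not a word token
        attrs = dict(ATTR_RE.findall(match.group(1)))
        wnsn = attrs.get("wnsn", "")
        if not all(s.isdigit() and int(s) > 0 for s in wnsn.split(";")):
            continue  # no usable WordNet sense: drop the token
        word = match.group(2)
        yield word, attrs.get("lemma", word), attrs.get("pos", ""), wnsn


# A made-up four-line fragment in SemCor markup.
sample = [
    "<s snum=1>",
    "<wf cmd=ignore pos=DT>The</wf>",
    "<wf cmd=done pos=NN lemma=jury wnsn=1 lexsn=1:14:00::>jury</wf>",
    "<punc>.</punc>",
]
print(list(tagged_words(sample)))  # [('jury', 'jury', 'NN', '1')]
```

Only the noun "jury" survives; the determiner and the punctuation carry no sense number and are dropped, mirroring the behavior described above.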

OPTIONS

--semcor=DIRECTORY

The location of the top-level SemCor directory. This directory contains several sub-directories, including 'brown1' and 'brown2'. Specify only the directory that contains them, not the sub-directories themselves. For example, if /home/user/semcor2.0 contains the brown1 and brown2 directories, specify /home/user/semcor2.0 as the value of this option. Do not use this option together with the --file option.
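One way to see why only the top-level directory should be given is to sketch how tagged files could be located beneath it. This Python sketch is not taken from semcor-reformat.pl; it assumes that each sub-directory keeps its tagged files in a tagfiles folder, as in the SemCor 2.0 distribution, and the helper name semcor_files is hypothetical.

```python
import tempfile
from pathlib import Path


def semcor_files(semcor_dir):
    """Return the tagged files found under a top-level SemCor directory.

    Assumption: each sub-directory (brown1, brown2, ...) keeps its tagged
    files in a 'tagfiles' folder. In this sketch, passing brown1 itself
    matches nothing, because the pattern expects one more directory level.
    """
    root = Path(semcor_dir).expanduser()
    return sorted(p for p in root.glob("*/tagfiles/*") if p.is_file())


# Build a throwaway tree (the file placement is made up) and list it.
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in [("brown1", "br-a01"), ("brown2", "br-k18")]:
        tagdir = Path(tmp, sub, "tagfiles")
        tagdir.mkdir(parents=True)
        (tagdir / name).write_text("<s snum=1>\n</s>\n")
    found = [p.name for p in semcor_files(tmp)]
    nested = semcor_files(Path(tmp, "brown1"))

print(found)  # ['br-a01', 'br-k18']
```

Pointing the sketch at the parent directory picks up files from every sub-directory, while pointing it at brown1 returns an empty list.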

--file=FILE

A SemCor-formatted file to process. This can be used instead of the --semcor option to process only a few SemCor files or to process Senseval files. When this option is used, multiple files can be specified on the command line. For example:

 semcor-reformat.pl --file br-a01 br-a02 br-k18 br-m02 br-r05

Do not use this option together with the --semcor option.
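The rule that --semcor and --file must not be combined amounts to a mutually exclusive option group. The following Python argparse sketch models the command line described in SYNOPSIS; it is not how semcor-reformat.pl itself parses its options, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Model the documented command line: exactly one input source, optional --key."""
    parser = argparse.ArgumentParser(prog="semcor-reformat.pl")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--semcor", metavar="DIRECTORY",
                        help="top-level SemCor directory")
    source.add_argument("--file", metavar="FILE", nargs="+",
                        help="one or more SemCor-formatted files")
    parser.add_argument("--key", action="store_true",
                        help="emit a key file instead of wsd.pl input")
    return parser


args = build_parser().parse_args(["--file", "br-a01", "br-a02", "--key"])
print(args.file, args.key)  # ['br-a01', 'br-a02'] True
```

With this model, supplying both --semcor and --file (or neither) makes argparse print a usage error and exit, which matches the restriction stated in the option descriptions.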

--key

Generates a key file for use by the allwords-scorer2.pl program instead of input for wsd.pl. The allwords-scorer2.pl program measures the performance of a word sense disambiguation program. See the documentation for scorer2-format.pl and allwords-scorer2.pl for more information.

AUTHORS

 Jason Michelizzi

 Varada Kolhatkar, University of Minnesota, Duluth
 kolha002 at d.umn.edu

 Ted Pedersen, University of Minnesota, Duluth
 tpederse at d.umn.edu

This document last modified by: $Id: semcor-reformat.pl,v 1.17 2009/05/22 19:16:38 kvarada Exp $

SEE ALSO

 wsd-experiments.pl, scorer2-format.pl, scorer2-sort.pl, allwords-scorer2.pl

COPYRIGHT AND LICENSE

Copyright (C) 2005-2008 by Jason Michelizzi and Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.