Ted Pedersen

NAME

CONFIG - [documentation] Description of all configuration options for measures

DESCRIPTION

The following is a list of options supported by the measures of semantic relatedness. This is intended to serve as a "master list" of options so that descriptions can be copied from here and pasted into the documentation for specific modules.

trace

This option is supported by all measures.

The value of this parameter specifies the level of tracing. The value must be an integer equal to 0, 1, or 2. If the value is omitted, then the default value, 0, is used. A value of 0 switches tracing off. A value of 1 or 2 switches tracing on. The difference between the values 1 and 2 depends upon the measure being used.

For vector_pairs and lesk, a value of 1 displays as traces only the gloss overlaps found. A value of 2 displays as traces all the text being compared.

For the res, lin, jcn, wup, lch, path, and hso measures, a trace of level 1 means the synsets are represented as word#pos#sense strings, while for level 2, the synsets are represented as word#pos#offset strings.
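The effect of the trace level on synset labels can be sketched as follows. This is only an illustration in Python: format_synset is a hypothetical helper, not a function of the Similarity package, and the offset shown in the usage note is a made-up placeholder.

```python
# Hypothetical helper, not part of the Similarity package. It mirrors the
# description above for the res, lin, jcn, wup, lch, path, and hso measures.

def format_synset(word, pos, sense, offset, trace=0):
    """Return the synset label used in a trace, or None when tracing is off."""
    if trace not in (0, 1, 2):
        raise ValueError("trace must be 0, 1, or 2")
    if trace == 0:
        return None                      # default: tracing switched off
    if trace == 1:
        return f"{word}#{pos}#{sense}"   # level 1: word#pos#sense
    return f"{word}#{pos}#{offset}"      # level 2: word#pos#offset
```

For example, format_synset("dog", "n", 1, "12345678", trace=1) yields the word#pos#sense form, and the same call with trace=2 yields the word#pos#offset form.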

cache

This option is supported by all measures.

The value of this parameter specifies whether or not caching of the relatedness values should be performed. This value is an integer equal to 0 or 1. If the value is omitted, then the default value, 1, is used. A value of 0 switches caching 'off', and a value of 1 switches caching 'on'.

maxCacheSize

This option is supported by all measures.

The value of this parameter indicates the size of the cache used for storing the computed relatedness values. The specified value must be a non-negative integer. If the value is omitted, then the default value, 5,000, is used. Setting maxCacheSize to zero has the same effect as setting cache to zero, but setting cache to zero is likely to be more efficient. Caching and tracing at the same time can result in excessive memory usage because the trace strings are also cached. If you intend to perform a large number of relatedness queries, then you might want to turn tracing off.
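The interaction between cache and maxCacheSize can be sketched as follows. This is a minimal Python illustration under stated assumptions: RelatednessCache is a hypothetical class, and dropping the oldest entry when the cache is full is an assumption, since the eviction policy is not described here.

```python
from collections import OrderedDict

class RelatednessCache:
    """Hypothetical bounded cache; not the package's implementation."""

    def __init__(self, cache=1, max_cache_size=5000):
        if cache not in (0, 1):
            raise ValueError("cache must be 0 or 1")
        if not isinstance(max_cache_size, int) or max_cache_size < 0:
            raise ValueError("maxCacheSize must be a non-negative integer")
        # A size of zero has the same effect as cache = 0.
        self.enabled = cache == 1 and max_cache_size > 0
        self.max_size = max_cache_size
        self._store = OrderedDict()

    def get(self, key):
        """Return the cached (value, trace) pair, or None if absent."""
        return self._store.get(key) if self.enabled else None

    def put(self, key, value, trace=None):
        """Store a relatedness value together with its trace string."""
        if not self.enabled:
            return
        if key not in self._store and len(self._store) >= self.max_size:
            self._store.popitem(last=False)  # assumption: drop the oldest entry
        # The trace string is stored too, which is why caching and tracing
        # together can use a lot of memory.
        self._store[key] = (value, trace)
```

The sketch stores the trace string alongside each value, which shows why turning tracing off reduces the memory held by the cache.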

rootNode

This option is supported by the res, lin, jcn, wup, path, and lch measures.

The value of this parameter indicates whether or not a unique root node should be used. In WordNet, there is no unique root node for the noun and verb taxonomies. If this parameter is set to 1, then the measures that support this option (wup, path, lch, res, lin, and jcn) will "fake" a unique root node. If the value is set to 0, then no unique root node will be used. If the value is omitted, then the default value, 1, is used.
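To see why a faked root matters, consider a toy taxonomy with two separate roots. The following Python sketch is illustrative only: the concept names, the FAKE_ROOT label, and the node-counting path length are assumptions made for the example, not the package's data or code.

```python
# Toy hypernym links (child -> parent). "entity" and "abstraction" are two
# separate roots here; this is illustrative data, not WordNet.
PARENT = {
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "idea": "abstraction",
}
FAKE_ROOT = "*ROOT*"  # hypothetical label for the faked unique root

def ancestors(node, root_node=1):
    """Return node followed by its ancestors, ordered from node upward."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    if root_node == 1:
        chain.append(FAKE_ROOT)  # join every taxonomy under one root
    return chain

def path_length(a, b, root_node=1):
    """Count the nodes on the shortest path from a to b; None if no path."""
    up_a = ancestors(a, root_node)
    up_b = ancestors(b, root_node)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node) + 1
    return None
```

With root_node set to 1, path_length("dog", "idea") finds a path through the faked root. With root_node set to 0, the two concepts share no ancestor and the function returns None.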

infocontent

This option is supported by the res, lin, and jcn measures.

The value for this parameter should be a string that specifies the path of an information content file containing the frequency of occurrence of every WordNet concept in a large corpus. A number of utility programs are included in this distribution that can be used to generate an infocontent file (see utils.pod). If no path is specified, then the default infocontent file is used, which was generated from SemCor using the sense-tags.
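As a rough illustration of how concept frequencies are typically turned into information content, the sketch below uses the common negative-log-probability formulation. This is an assumption for the sake of the example: the counts are invented, the function name is hypothetical, and neither the file format nor the exact computation used by the measures is shown here.

```python
import math

def information_content(counts, concept, root):
    """Return -log(count(concept) / count(root)) for a concept.

    Assumes the root count includes the counts of all its descendants,
    so the root has probability 1 and information content 0.
    """
    if counts[concept] <= 0:
        raise ValueError("information content is undefined for a zero count")
    probability = counts[concept] / counts[root]
    return -math.log(probability)

# Invented frequencies, for illustration only.
toy_counts = {"entity": 1000, "animal": 100, "dog": 10}
```

In this toy example the rarer concept "dog" receives a higher information content than the more general "animal".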

taxonomyDepthsFile

This option is supported only by the lch measure.

The value for this parameter should be a string that specifies the location of a taxonomy depths file (as generated by wnDepths.pl). If no path is specified, then the default file is used, which was generated when the Similarity package was installed.

synsetDepthsFile

This option is supported only by the wup measure.

The value for this parameter should be a string that specifies the location of a synset depths file (as generated by wnDepths.pl). If no path is specified, then the default file is used, which was generated when the Similarity package was installed.

relation

This option is supported only by the lesk and vector_pairs measures.

The value of this parameter is the path to a file that contains a list of WordNet relations. The path may be either an absolute path or a relative path.

The vector_pairs module combines the glosses of synsets related to the target synsets by these relations and forms the gloss-vector from this combined gloss.

The lesk module combines glosses of synsets related to the target synsets by these relations and then searches for overlaps in these "super-glosses."

WARNING: the format of the relation file is different for the vector_pairs and lesk measures. The documentation for each measure describes the format of its relation file. See WordNet::Similarity::vector_pairs(3pm) and WordNet::Similarity::lesk(3pm).

stop

This option is supported only by the lesk and vector_pairs measures.

The value of this parameter is the path of a file containing a list of stop words that should be ignored in the glosses. The path may be either an absolute path or a relative path.

stem

This option is supported only by the lesk and vector_pairs measures.

The value of this parameter indicates whether or not stemming should be performed. The value must be an integer equal to 0 or 1. If the value is omitted, then the default value, 0, is used. A value of 1 switches stemming 'on', and a value of 0 switches stemming 'off'. When stemming is enabled, all the words of the glosses are stemmed before their vectors are created for the vector_pairs measure or their overlaps are compared for the lesk measure.

normalize

This option is supported only by the lesk measure.

The value of this parameter indicates whether or not normalization of scores is performed. The value must be an integer equal to 0 or 1. If the value is omitted, then the default value, 0, is used. A value of 1 switches normalization 'on', and a value of 0 switches normalization 'off'. When normalization is enabled, the score obtained by counting the gloss overlaps is normalized by the size of the glosses. The details are described in Banerjee and Pedersen (2002).

vectordb

This option is supported only by the vector_pairs measure.

The value of this parameter is the path to a Vectors file containing word vectors, i.e. co-occurrence vectors for all the words in the WordNet glosses. The value of this parameter may not be omitted: the vector_pairs measure will not run unless this file is specified in a configuration file.
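Because vectordb is the one option that may not be omitted, it can be useful to check a set of options before running a measure. The following Python sketch encodes the "supported by" statements from this document in a table. The check_options function and its messages are hypothetical and are not part of the Similarity package.

```python
# Which measures support each option, as listed in this document.
# None means the option is supported by all measures.
LESK_AND_VECTOR_PAIRS = {"lesk", "vector_pairs"}
SUPPORTED_BY = {
    "trace": None,
    "cache": None,
    "maxCacheSize": None,
    "rootNode": {"res", "lin", "jcn", "wup", "path", "lch"},
    "infocontent": {"res", "lin", "jcn"},
    "taxonomyDepthsFile": {"lch"},
    "synsetDepthsFile": {"wup"},
    "relation": LESK_AND_VECTOR_PAIRS,
    "stop": LESK_AND_VECTOR_PAIRS,
    "stem": LESK_AND_VECTOR_PAIRS,
    "normalize": {"lesk"},
    "vectordb": {"vector_pairs"},
    "maxrand": {"random"},
}
# Options that may not be omitted for a given measure.
REQUIRED = {"vector_pairs": {"vectordb"}}

def check_options(measure, options):
    """Return a list of problems found; an empty list means none."""
    problems = []
    for name in options:
        if name not in SUPPORTED_BY:
            problems.append(f"unknown option: {name}")
        elif SUPPORTED_BY[name] is not None and measure not in SUPPORTED_BY[name]:
            problems.append(f"{name} is not supported by {measure}")
    for name in sorted(REQUIRED.get(measure, ())):
        if name not in options:
            problems.append(f"{name} is required by {measure}")
    return problems
```

For instance, calling check_options("vector_pairs", {"trace": 1}) reports that vectordb is missing, while check_options("wup", {"synsetDepthsFile": "depths.dat"}) reports no problems (the file name is a placeholder).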

maxrand

This option is supported only by the random measure.

The value of this option is the maximum random number that will be generated. The value of this option must be a positive floating-point number. The default value is 1.0. All random numbers generated will be in the range [0, maxrand).
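The half-open range [0, maxrand) can be illustrated with a short Python sketch. The function name is hypothetical, and scaling a uniform draw from [0, 1) is an assumption about how such a value could be produced, not a description of the random measure's code.

```python
import random

def random_relatedness(maxrand=1.0, rng=random):
    """Return a pseudo-random number in the range [0, maxrand)."""
    if not maxrand > 0:
        raise ValueError("maxrand must be a positive number")
    # rng.random() is uniform on [0.0, 1.0); scaling keeps 0 reachable
    # and keeps the result below maxrand.
    return rng.random() * maxrand
```

Passing a seeded random.Random instance as rng makes the sequence repeatable, which can help when comparing against a random baseline.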

SEE ALSO

intro.pod

Mailing list: http://groups.yahoo.com/group/wn-similarity

Project Home page: http://wn-similarity.sourceforge.net

AUTHORS

 Ted Pedersen, University of Minnesota Duluth
 tpederse at d.umn.edu

 Siddharth Patwardhan, University of Utah, Salt Lake City
 sidd at cs.utah.edu

 Satanjeev Banerjee, Carnegie Mellon University, Pittsburgh
 banerjee+ at cs.cmu.edu

 Jason Michelizzi

COPYRIGHT

Copyright (c) 2005-2008, Ted Pedersen, Siddharth Patwardhan, Satanjeev Banerjee, and Jason Michelizzi

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Note: a copy of the GNU Free Documentation License is available on the web at http://www.gnu.org/copyleft/fdl.html and is included in this distribution as FDL.txt.



