
NAME

WordNet::Extend::Locate - Perl modules for locating where in WordNet a lemma should be inserted.

SYNOPSIS

Basic Usage Example

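 use strict;
 use warnings;
 use WordNet::Extend::Locate;

 my $locate = WordNet::Extend::Locate->new();

 # Stop words, given as a regex alternation
 $locate->stopList('(the|is|at)');

 # Clean up glosses before comparing them
 $locate->setCleanUp(1);

 # Make the expensive external lookups once, up front
 $locate->preProcessing();

 # Compare against hypernym and hyponym glosses, but not synset glosses
 $locate->toggleCompareGlosses(1,1,0);

 $locate->setBonus(25);

 $locate->toggleRefineSense(0);

 print "Finding location for 'dog noun withdef.1 man's best friend'\n";

 my @location = @{ $locate->locate("dog\tnoun\twithdef.1\tman's best friend") };

 print "Location found: @location\n";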

DESCRIPTION

Introduction

WordNet is a widely used resource in NLP and other research areas. One drawback of WordNet is the long interval between updates: it was last updated and released in December 2006, and no further updates are planned. WordNet::Extend::Locate helps users decide where new lemmas should be inserted into WordNet by offering several different methods for finding a location. Users can then pass Locate's suggestion to WordNet::Extend::Insert, or simply use it as a starting point and choose their own location.

Methods

The following methods are defined in this package:

Public methods

$obj->new()

The constructor for WordNet::Extend::Locate objects.

Parameters: none.

Return value: the new blessed object

$obj->getError()

Allows the caller to check whether any errors have occurred. Returns an array ($error, $errorString), where an $error value of 1 represents a warning and 2 represents an error, and $errorString describes the problem. For example, if a user forgets to run preProcessing() before a method that relies on it, $error would be 2 and $errorString would mention that preProcessing() had not been run.

Parameter: None

Returns: array of the form ($error, $errorString).
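The 1 = warning / 2 = error convention can be handled with a small helper. This is a minimal sketch: describe_error() is a hypothetical name that is not part of WordNet::Extend::Locate, and treating any other $error value (for example 0) as "no error" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::Extend::Locate): interprets the
# ($error, $errorString) pair that getError() is documented to return.
# 1 = warning, 2 = error; any other value is ASSUMED to mean "no error".
sub describe_error {
    my ($error, $errorString) = @_;
    return 'ok' unless defined $error && ($error == 1 || $error == 2);
    my $level = $error == 1 ? 'warning' : 'error';
    return "$level: $errorString";
}

# With a real object this would be used roughly as:
#   my ($error, $errorString) = $locate->getError();
#   my $msg = describe_error($error, $errorString);
#   die "$msg\n" if $error == 2;
```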

$obj->locateFile($input_file, $output_file)

Attempts to find the best WordNet position for each lemma in the input file and writes the results to the output file.

Parameters: the paths of the input file and the output file, respectively.

Returns: nothing
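The file format for locateFile() is not specified above, so the following is only a sketch of what such a loop could look like, assuming each input line holds one lemma in the same tab-separated format that locate() accepts and each result is written as one tab-separated line. locate_all() and the $locator callback are hypothetical; $locator stands in for a call such as sub { $locate->locate($_[0]) }.

```perl
use strict;
use warnings;

# Hypothetical locateFile()-style loop. ASSUMPTIONS: one lemma per input line
# in 'word\tpos\titem-id\tdef' form; one tab-separated result per output line.
sub locate_all {
    my ($in_fh, $out_fh, $locator) = @_;
    my $count = 0;
    while (my $line = <$in_fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        my $result = $locator->($line);    # array ref: (item-id, sense, operation)
        print {$out_fh} join("\t", @$result), "\n";
        $count++;
    }
    return $count;                          # number of lemmas processed
}
```

Passing filehandles and a callback keeps the loop testable without WordNet installed; in-memory filehandles (open on a scalar reference) work as well as real files.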

$obj->locate($wordPosGloss)

Takes a single lemma with its gloss and returns the best insertion point in WordNet.

Parameter: a lemma string in the format 'word\tpos\titem-id\tdef'. NOTE: fields must be separated by tab characters (\t) only, not spaces.

Returns: a reference to an array of the form (item-id, WordNet sense, operation)
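Because a stray space instead of a tab is easy to miss, it can help to build and check the input string before calling locate(). make_lemma_string() and parse_lemma_string() below are hypothetical helpers, not part of the module; they only enforce the four tab-separated fields described above.

```perl
use strict;
use warnings;

# Hypothetical helpers for the 'word\tpos\titem-id\tdef' format.
sub make_lemma_string {
    my @fields = @_;
    die "expected 4 fields (word, pos, item-id, def)\n" unless @fields == 4;
    for my $f (@fields) {
        die "fields must not contain tab characters\n" if $f =~ /\t/;
    }
    return join "\t", @fields;
}

sub parse_lemma_string {
    my ($str) = @_;
    my @fields = split /\t/, $str, -1;     # -1 keeps empty trailing fields
    die "expected 4 tab-separated fields, got " . scalar(@fields) . "\n"
        unless @fields == 4;
    return @fields;
}

# With a real object:
#   my $s = make_lemma_string('dog', 'noun', 'withdef.1', "man's best friend");
#   my ($item_id, $sense, $operation) = @{ $locate->locate($s) };
```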

$obj->stopList($newStopList)

Sets a new stop list, given as a regular expression.

Parameter: the new stop list as a regex alternation of the form (w1|w2|...|wn)

Returns: nothing
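To show how a stop list in the (w1|w2|...|wn) form can drive a substitution, here is a minimal sketch. Whole-word matching with \b and case-insensitive matching are assumptions; the module's own substitution may differ, and remove_stop_words() is a hypothetical name.

```perl
use strict;
use warnings;

# Sketch: remove stop words given as a '(w1|w2|...|wn)' alternation.
# ASSUMPTIONS: whole-word (\b) and case-insensitive matching.
sub remove_stop_words {
    my ($gloss, $stop_list) = @_;
    $gloss =~ s/\b$stop_list\b//gi;   # drop each stop word
    $gloss =~ s/\s+/ /g;              # collapse the gaps left behind
    $gloss =~ s/^\s+|\s+$//g;         # trim leading/trailing space
    return $gloss;
}
```

The word boundaries keep a stop word such as "the" from being cut out of a longer word like "theory".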

$obj->setCleanUp($switch)

Allows the user to toggle whether or not glosses should be cleaned up.

Parameter: 0 or 1 to turn clean up off or on respectively

Returns: nothing

$obj->addCleanUp($cleanUp)

Allows the user to add their own regex for cleaning up the glosses.

Parameter: Regex representing the cleanup the user wants performed.

Returns: Nothing
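The following sketch illustrates the idea behind setCleanUp() and addCleanUp(): an on/off switch plus an ordered list of regexes whose matches are deleted from each gloss. The default punctuation rule, the delete-on-match behaviour, and the names add_cleanup() and clean_gloss() are all assumptions; the module's internal rules are not documented here.

```perl
use strict;
use warnings;

# Hypothetical cleanup pipeline; rules are applied in the order added.
my @cleanup_rules = (
    qr/[^\w\s']/,    # strip punctuation (ASSUMED default-style rule)
);

sub add_cleanup { push @cleanup_rules, qr/$_[0]/ }

sub clean_gloss {
    my ($gloss, $enabled) = @_;
    return $gloss unless $enabled;          # mirrors setCleanUp(0)
    $gloss =~ s/$_//g for @cleanup_rules;   # delete every match of each rule
    return $gloss;
}
```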

$obj->preProcessing()

Greatly speeds up the program by making as many external calls as possible up front and storing the results for later use.

Parameter: none

Returns: nothing
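The speed-up described for preProcessing() comes from caching: expensive lookups are done once and later requests are answered from memory. This is a generic sketch of that pattern, not the module's code; fetch_gloss_slow() is a hypothetical stand-in for a real WordNet query, and the sense strings are example data.

```perl
use strict;
use warnings;

# Generic caching sketch; fetch_gloss_slow() is a hypothetical stand-in.
my %gloss_cache;
my $external_calls = 0;

sub fetch_gloss_slow {
    my ($sense) = @_;
    $external_calls++;
    return "gloss of $sense";               # placeholder result
}

# Warm the cache up front, as preProcessing() does for its own data.
sub preprocess {
    my @senses = @_;
    $gloss_cache{$_} //= fetch_gloss_slow($_) for @senses;
}

# Later lookups hit the cache; only unseen senses trigger a slow call.
sub cached_gloss {
    my ($sense) = @_;
    return $gloss_cache{$sense} //= fetch_gloss_slow($sense);
}
```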

$obj->processLemma(@inLemma)

Determines where the out-of-vocabulary (OOV) lemma should be inserted into WordNet and returns the result.

Parameter: the lemma to be inserted in array form (lemma, part-of-speech, item-id, definition, def source)

Returns: the chosen insertion point in array form (item-id, WordNet sense, operation)
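At its core, choosing a location means scoring every candidate sense and keeping the best one. The sketch below shows only that selection step: pick_best_sense() is hypothetical, $score stands in for scoreSense(), and the fixed 'attach' operation label is a placeholder assumption, since the operations the module actually emits are not listed here.

```perl
use strict;
use warnings;

# Hypothetical selection step: highest-scoring candidate wins.
sub pick_best_sense {
    my ($in_lemma, $candidates, $score) = @_;
    my ($best_sense, $best_score);
    for my $sense (@$candidates) {
        my $s = $score->($in_lemma, $sense);
        ($best_sense, $best_score) = ($sense, $s)
            if !defined $best_score || $s > $best_score;
    }
    # (item-id, WordNet sense, operation); 'attach' is a placeholder
    return ($in_lemma->[2], $best_sense, 'attach');
}
```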

$obj->toggleCompareGlosses($hype,$hypo,$syns)

Toggles which glosses are used by scoreSense(). By default, the sense's own gloss and the glosses of its hypernyms, hyponyms, and synsets are all used. This method toggles the hypernym, hyponym, and synset glosses through three parameters, 1 for on and 0 for off. For example, toggleCompareGlosses(0,0,0) turns all three off.

Parameters: 0 or 1 for toggling hypernyms, hyponyms, and synset comparisons.

Returns: nothing

$obj->setBonus($bonus)

Allows the user to set the bonus that will be used when scoring lemmas that contain the new lemma.

Parameter: the multiplier that should be used in calculating the bonus.

Returns: nothing

 sub setBonus {
     # Called as a method, @_ is ($self, $bonus); called as a plain
     # function, @_ is just ($bonus).
     my $base = 0;
     if (scalar @_ == 2) {    # checks if method entered by object
         $base = 1;
     }

     $bonus = $_[$base];
 }
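The exact bonus formula is not documented, so the following is only an assumed, simplified illustration of a bonus multiplier: when a candidate's gloss mentions the new lemma itself, its score is raised by $bonus per mention. apply_bonus() is a hypothetical name.

```perl
use strict;
use warnings;

# ASSUMED formula for illustration: score + bonus * (whole-word mentions).
sub apply_bonus {
    my ($score, $gloss, $lemma, $bonus) = @_;
    my $mentions = () = $gloss =~ /\b\Q$lemma\E\b/gi;   # count matches
    return $score + $bonus * $mentions;
}
```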
$obj->scoreSense(@inLemma, $compareSense)

Serves as a wrapper that directs the main program to the currently selected scoring method. By default, the average highest scoring method is chosen; this can be changed with setScoreMethod().

Parameters: the input lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.

Returns: a score indicating how closely the input lemma is related to $compareSense.

$obj->setScoreMethod($scoreMethod)

Allows the user to choose which scoring method is used by default when running the program from the top. Options are 'baseline' (the baseline system) and 'BwS' (the baseline system with stemming and lemmatization). Further options will be listed here as they are added.

Parameter: the chosen scoring method

Returns: nothing.
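A wrapper that forwards to "the currently chosen scoring method" is commonly written in Perl as a dispatch table: a hash from method names to code references. This sketch shows the pattern only; the scorer bodies are placeholders, and the 'baseline' default and the names set_score_method() and score_sense() are hypothetical.

```perl
use strict;
use warnings;

# Dispatch-table sketch; scorer bodies are placeholders.
my %scorers = (
    baseline => sub { 'baseline score' },
    BwS      => sub { 'stemmed score' },
);
my $current = 'baseline';   # hypothetical default for this sketch

sub set_score_method {
    my ($name) = @_;
    die "unknown scoring method '$name'\n" unless exists $scorers{$name};
    $current = $name;
}

# Forward all arguments to whichever scorer is selected.
sub score_sense { return $scorers{$current}->(@_) }
```

Rejecting unknown names up front keeps a typo in the method name from failing later, in the middle of a scoring run.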

$obj->Similarity(@inLemma, $compareSense)

Calculates a score for the passed sense and returns that score.

Parameters: the input lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.

Returns: a score indicating how closely the input lemma is related to $compareSense.

$obj->BwS(@inLemma, $compareSense)

Calculates a score for the passed sense and returns that score. This is a modified baseline() method which adds stemming to the data.

Parameters: the input lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.

Returns: a score indicating how closely the input lemma is related to $compareSense.

$obj->baseline(@inLemma, $compareSense)

Calculates a score for the passed sense and returns that score. This method is a wrapper around simpleScoreSense() that ensures no stemming or lemmatization is applied during preProcessing().

Parameters: the input lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.

Returns: a score indicating how closely the input lemma is related to $compareSense.

$obj->word2VecCompare(@inLemma, $compareSense)

Calculates a score for the passed sense using the gensim Word2Vec model trained on the Google News vectors.

Parameters: the input lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.

Returns: a score indicating how closely the input lemma is related to $compareSense.
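Word2Vec similarity scores are usually cosine similarities between vectors. The pure-Perl sketch below computes cosine similarity on toy 3-dimensional vectors; it does not reproduce gensim or the Google News model, and how the module turns a lemma and a sense into vectors is not documented here.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Cosine similarity: dot(u, v) / (|u| * |v|), in the range [-1, 1].
sub cosine {
    my ($u, $v) = @_;
    my $dot = sum0 map { $u->[$_] * $v->[$_] } 0 .. $#$u;
    my $nu  = sqrt(sum0 map { $_ ** 2 } @$u);
    my $nv  = sqrt(sum0 map { $_ ** 2 } @$v);
    return 0 if $nu == 0 || $nv == 0;   # avoid dividing by zero
    return $dot / ($nu * $nv);
}
```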

$obj->setConfidenceValue($confidenceValue)

Allows the user to set the confidence value for word2VecCompare(). The confidence value is the cutoff for the similarity score: any candidate whose similarity score is below it is dropped. This aims to increase accuracy at the cost of recall.

Parameters: the new confidence value; the default is 0.

Returns: Nothing
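The cutoff behaviour can be sketched as a simple filter that keeps candidates scoring at or above the threshold and drops the rest. apply_confidence() is a hypothetical name, and the sense names and scores in the test are made-up example data.

```perl
use strict;
use warnings;

# Keep only candidates whose similarity is not below the threshold.
sub apply_confidence {
    my ($threshold, %scores) = @_;
    return grep { $scores{$_} >= $threshold } sort keys %scores;
}
```

A higher threshold leaves fewer, more reliable suggestions, which is the accuracy-versus-recall trade-off described above.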

$obj->simpleScoreSense(@inLemma, $compareSense)

Calculates a score for the passed sense and returns that score. This is the baseline system submitted to SemEval-2016 Task 14. The algorithm scores a sense by counting the words that overlap between the input lemma's gloss and the compared sense's gloss, as well as the glosses of its hypernyms and hyponyms.

Parameters: the input lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.

Returns: a score indicating how closely the input lemma is related to $compareSense.
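A word-overlap score of this kind can be sketched as counting the distinct words shared by the new lemma's definition and a candidate's extended gloss. Lowercasing, splitting on non-word characters, and counting each shared word once are assumptions; the module's own tokenization (and any stemming or stop-list handling) may differ. overlap_score() is a hypothetical name.

```perl
use strict;
use warnings;

# Count distinct words shared by the definition and the extended gloss.
sub overlap_score {
    my ($definition, @extended_gloss) = @_;
    my %in_def = map { $_ => 1 } grep { length } split /\W+/, lc($definition);
    my %seen;
    my $score = 0;
    for my $word (grep { length } split /\W+/, lc(join(' ', @extended_gloss))) {
        $score++ if $in_def{$word} && !$seen{$word}++;
    }
    return $score;
}
```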

$obj->getExtendedGloss($compareSense)

Calculates the extended gloss based on which glosses are toggled and returns an array which contains the full glosses.

Parameter: the sense which the extended gloss is based on

Returns: an array which contains the extended gloss
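Assembling an extended gloss from the three toggles can be sketched as below. The %related hash layout is a hypothetical input format, and always including the sense's own gloss follows the description of toggleCompareGlosses(), which only switches the hypernym, hyponym, and synset glosses.

```perl
use strict;
use warnings;

# Sketch: extended gloss controlled by ($hype, $hypo, $syns) switches.
sub extended_gloss {
    my ($related, $hype, $hypo, $syns) = @_;
    my @gloss = ($related->{gloss});                  # own gloss always used
    push @gloss, @{ $related->{hypernyms} } if $hype;
    push @gloss, @{ $related->{hyponyms} }  if $hypo;
    push @gloss, @{ $related->{synset} }    if $syns;
    return @gloss;
}
```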

$obj->toggleRefineSense($toggle)

Allows the user to toggle refineSense() on or off.

Parameter: 0 or 1 to toggle the refine sense method on or off respectively in the processLemma method.

Returns: nothing

$obj->refineSense(@inLemma, $highSense)

Refines the chosen sense by determining which numbered sense should be chosen.

Parameters: the input lemma in the form (lemma, part-of-speech, item-id, definition, def source) and the sense that currently best matches the input lemma.

Returns: the new highest scoring sense

3 POD Errors

The following errors were encountered while parsing the POD:

Around line 127:

You forgot a '=back' before '=head2'

Around line 133:

=over without closing =back

Around line 648:

Unknown directive: =ctu