
NAME

Word2vec::Lesk - Word2vec-Interface Utility Module.

SYNOPSIS

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my $string_a = "This is a test string";
 my $string_b = "This is another test string";

 my $lesk_score   = $lesk->CalculateLeskScore( $string_a, $string_b );
 my $cosine_score = $lesk->CalculateCosineScore( $string_a, $string_b );
 my $f_score      = $lesk->CalculateFScore( $string_a, $string_b );

 print( "Lesk Score: $lesk_score\n"     );
 print( "Cosine Score: $cosine_score\n" );
 print( "F Score: $f_score\n"           );

 undef( $lesk );

 or

 my $lesk = Word2vec::Lesk->new();

 my $string_a = "This is a test string";
 my $string_b = "This is another test string";

 my %results  = %{ $lesk->CalculateAllScores( $string_a, $string_b ) };

 for my $key ( sort keys %results )
 {
    print "$key: $results{ $key }\n";
 }

 undef( %results );
 undef( $lesk    );

DESCRIPTION

Word2vec::Lesk is a module of Lesk functions for the Word2vec::Interface package. It calculates Lesk, Raw Lesk, Cosine, F, Recall and Precision scores from the phrase and feature overlap between two strings and returns them to the user.

Main Functions

new

Description:

 Returns a new "Word2vec::Lesk" module object.

 Note: Specifying no parameters implies default options.

 Default Parameters:
    debugLog = 0
    writeLog = 0

Input:

 $debugLog -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
 $writeLog -> Instructs module to print debug statements to a log file.  (1 = True / 0 = False)

Output:

 Word2vec::Lesk object.

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 undef( $lesk );
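A minimal sketch of how the documented defaults (debugLog = 0, writeLog = 0) could behave when new is called with no parameters. It assumes the two options are passed positionally as ( $debugLog, $writeLog ); LeskSketch is a hypothetical stand-in class written for illustration, not the module's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real Word2vec::Lesk source.
# Assumes positional ( $debugLog, $writeLog ) constructor arguments.
package LeskSketch;

sub new
{
    my ( $class, $debugLog, $writeLog ) = @_;

    # Missing arguments fall back to the documented defaults (0 = False).
    my $self = {
        _debugLog => defined( $debugLog ) ? $debugLog : 0,
        _writeLog => defined( $writeLog ) ? $writeLog : 0,
    };

    return bless( $self, $class );
}

sub GetDebugLog { return $_[0]->{ _debugLog }; }
sub GetWriteLog { return $_[0]->{ _writeLog }; }

package main;

my $quiet   = LeskSketch->new();          # both options default to 0
my $verbose = LeskSketch->new( 1, 0 );    # console debug output only

print "quiet:   ", $quiet->GetDebugLog(),   " ", $quiet->GetWriteLog(),   "\n";
print "verbose: ", $verbose->GetDebugLog(), " ", $verbose->GetWriteLog(), "\n";
```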

DESTROY

Description:

 Removes Word2vec::Lesk object from memory.

Input:

 None

Output:

 None

Example:

 See the example for the "new" function above.

 Note: Perl also calls DESTROY automatically when the last reference to the object goes away, including during global destruction when the program exits.

GetMatchingFeatures

Description:

 Given two strings, this returns a hash reference of all overlapping (matching) features between both strings, with their frequency counts.

Input:

 $string_a -> First comparison string
 $string_b -> Second comparison string

Output:

 $hash_ref -> Hash table reference whose keys are the unique features shared by both input strings and whose values are each feature's frequency count.

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my %matching_features = %{ $lesk->GetMatchingFeatures( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

 for my $feature ( sort keys %matching_features )
 {
    print "$feature : $matching_features{ $feature }\n";
 }

 undef( %matching_features );
 undef( $lesk );
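To illustrate the idea of matching features with frequency counts, here is a small self-contained sketch. It assumes features are whitespace-separated words compared case-sensitively, and that a shared word's count is the smaller of its two frequencies; the helper matching_features is hypothetical, and the module's own tokenization and counting may differ.

```perl
use strict;
use warnings;

# Sketch only: features are assumed to be whitespace-separated words,
# compared case-sensitively. A shared word's count is the smaller of its
# frequencies in the two strings.
sub matching_features
{
    my ( $string_a, $string_b ) = @_;

    # Build a frequency table for each string.
    my ( %freq_a, %freq_b );
    $freq_a{ $_ }++ for split( ' ', $string_a );
    $freq_b{ $_ }++ for split( ' ', $string_b );

    # Keep only words present in both tables.
    my %matches;
    for my $word ( keys %freq_a )
    {
        next unless exists $freq_b{ $word };
        $matches{ $word } = $freq_a{ $word } < $freq_b{ $word } ? $freq_a{ $word } : $freq_b{ $word };
    }

    return \%matches;
}

my %matching = %{ matching_features( "I like to eat cookies", "Sometimes I like to eat cookies" ) };
print "$_ : $matching{ $_ }\n" for sort keys %matching;
```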

GetPhraseOverlap

Description:

 Given two strings, this returns a hash reference of all overlapping (matching) phrases between both strings, with their frequency counts. Longer phrases take priority over shorter ones when matching.

Input:

 $string_a -> First comparison string
 $string_b -> Second comparison string

Output:

 $hash_ref -> Hash table reference whose keys are the unique phrases shared by both input strings and whose values are each phrase's frequency count.

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my %phrase_overlaps = %{ $lesk->GetPhraseOverlap( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

 for my $phrase ( sort keys %phrase_overlaps )
 {
    print "$phrase : $phrase_overlaps{ $phrase }\n";
 }

 undef( %phrase_overlaps );
 undef( $lesk );
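The "longer phrases first" rule can be sketched as a greedy loop: find the longest run of consecutive words shared by both strings, record it, mark those words as used, and repeat until nothing matches. This is an illustrative brute-force sketch assuming phrases are runs of whitespace-separated words; phrase_overlap is a hypothetical helper, not the module's implementation.

```perl
use strict;
use warnings;

# Sketch of longest-first phrase overlap (assumption: phrases are runs of
# consecutive whitespace-separated words). Matched words are replaced by
# undef so they cannot take part in a later, shorter match.
sub phrase_overlap
{
    my ( $string_a, $string_b ) = @_;
    my @a = split( ' ', $string_a );
    my @b = split( ' ', $string_b );
    my %phrases;

    while ( 1 )
    {
        my ( $best_len, $best_i, $best_j ) = ( 0, -1, -1 );

        # Brute-force search for the longest common run of unused words.
        for my $i ( 0 .. $#a )
        {
            for my $j ( 0 .. $#b )
            {
                my $len = 0;
                $len++ while $i + $len <= $#a && $j + $len <= $#b
                          && defined $a[ $i + $len ] && defined $b[ $j + $len ]
                          && $a[ $i + $len ] eq $b[ $j + $len ];
                ( $best_len, $best_i, $best_j ) = ( $len, $i, $j ) if $len > $best_len;
            }
        }

        last if $best_len == 0;

        # Record the phrase, then mark its words as used in both strings.
        my $phrase = join( ' ', @a[ $best_i .. $best_i + $best_len - 1 ] );
        $phrases{ $phrase }++;
        $a[ $_ ] = undef for $best_i .. $best_i + $best_len - 1;
        $b[ $_ ] = undef for $best_j .. $best_j + $best_len - 1;
    }

    return \%phrases;
}

my %overlaps = %{ phrase_overlap( "I like to eat cookies", "Sometimes I like to eat cookies" ) };
print "$_ : $overlaps{ $_ }\n" for sort keys %overlaps;
```

The quadratic search keeps the sketch short; a real implementation on long texts would likely want a more efficient longest-common-substring method.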

CalculateLeskScore

Description:

 Given two strings, this returns a Lesk score based on overlapping (matching) features between both strings.

Input:

 $string_a -> First comparison string
 $string_b -> Second comparison string

Output:

 $score    -> Lesk Score (Float)

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my $lesk_score = $lesk->CalculateLeskScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

 print "Lesk Score: $lesk_score\n";

 undef( $lesk );
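As a rough illustration of how a Lesk-style score can be built from phrase overlaps, the sketch below uses the squared-length weighting of the extended gloss-overlap variant of Lesk, where an overlap of n consecutive words scores n squared. Dividing the raw score by the product of the two string lengths is an assumption made for this example; the module's actual Lesk and Raw Lesk formulas are not given here, and lesk_scores is a hypothetical helper.

```perl
use strict;
use warnings;

# Sketch only: each phrase contributes count * (words in phrase)^2 to the
# raw score. Normalizing by len_a * len_b is an assumption, not the
# module's documented formula.
sub lesk_scores
{
    my ( $phrase_counts, $len_a, $len_b ) = @_;

    my $raw = 0;
    for my $phrase ( keys %{ $phrase_counts } )
    {
        my @words = split( ' ', $phrase );
        $raw += $phrase_counts->{ $phrase } * scalar( @words ) ** 2;
    }

    my $normalized = ( $len_a && $len_b ) ? $raw / ( $len_a * $len_b ) : 0;
    return ( $raw, $normalized );
}

# Example: one five-word overlap between a 5-word and a 6-word string.
my ( $raw, $lesk ) = lesk_scores( { "I like to eat cookies" => 1 }, 5, 6 );
print "Raw Lesk: $raw\n";
printf( "Lesk: %.4f\n", $lesk );
```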

CalculateCosineScore

Description:

 Given two strings, this returns a cosine score based on overlapping (matching) features between both strings.

Input:

 $string_a -> First comparison string
 $string_b -> Second comparison string

Output:

 $score    -> Cosine Score (Float)

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my $cosine_score = $lesk->CalculateCosineScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

 print "Cosine Score: $cosine_score\n";

 undef( $lesk );
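For reference, a common way to compute a cosine score between two strings is to treat each as a word-frequency vector and divide their dot product by the product of their lengths (Euclidean norms). The sketch below does exactly that, assuming whitespace tokenization and case-sensitive comparison; cosine_score is a hypothetical helper, and the module may weight or tokenize differently.

```perl
use strict;
use warnings;

# Sketch: cosine similarity between the word-frequency vectors of two
# strings (assumption: whitespace tokens, case-sensitive). The module's
# exact formula may differ.
sub cosine_score
{
    my ( $string_a, $string_b ) = @_;

    my ( %freq_a, %freq_b );
    $freq_a{ $_ }++ for split( ' ', $string_a );
    $freq_b{ $_ }++ for split( ' ', $string_b );

    # Squared Euclidean norms of each frequency vector.
    my ( $dot, $norm_a, $norm_b ) = ( 0, 0, 0 );
    $norm_a += $_ ** 2 for values %freq_a;
    $norm_b += $_ ** 2 for values %freq_b;

    # Dot product over the words the two vectors share.
    for my $word ( keys %freq_a )
    {
        $dot += $freq_a{ $word } * $freq_b{ $word } if exists $freq_b{ $word };
    }

    return 0 if $norm_a == 0 || $norm_b == 0;
    return $dot / ( sqrt( $norm_a ) * sqrt( $norm_b ) );
}

printf( "Cosine Score: %.4f\n", cosine_score( "I like to eat cookies", "Sometimes I like to eat cookies" ) );
```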

CalculateFScore

Description:

 Given two strings, this returns an F score based on overlapping (matching) features between both strings.

Input:

 $string_a -> First comparison string
 $string_b -> Second comparison string

Output:

 $score    -> F Score (Float)

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my $f_score = $lesk->CalculateFScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

 print "F Score: $f_score\n";

 undef( $lesk );
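The F score is conventionally the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below derives all three from a count of matching words; which string's length serves as the precision denominator and which as the recall denominator is an assumption here (string A and string B respectively), and f_score is a hypothetical helper.

```perl
use strict;
use warnings;

# Sketch: precision, recall and balanced F score from a count of matching
# words. Using string A's length for precision and string B's length for
# recall is an assumption, not taken from the module.
sub f_score
{
    my ( $matches, $len_a, $len_b ) = @_;

    my $precision = $len_a ? $matches / $len_a : 0;
    my $recall    = $len_b ? $matches / $len_b : 0;

    # Harmonic mean; guard against 0 / 0 when nothing matches.
    my $f = ( $precision + $recall ) ? 2 * $precision * $recall / ( $precision + $recall ) : 0;

    return ( $f, $precision, $recall );
}

# Five matching words between a 5-word and a 6-word string.
my ( $f, $p, $r ) = f_score( 5, 5, 6 );
printf( "F: %.4f  Precision: %.4f  Recall: %.4f\n", $f, $p, $r );
```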

CalculateAllScores

Description:

 Given two strings, this returns a hash reference containing scores (F, Cosine, Lesk, Raw Lesk, Precision, Recall) and frequency counts (matching features, matching phrases, string lengths).

Input:

 $string_a    -> First comparison string
 $string_b    -> Second comparison string

Output:

 $result_hash -> Hash reference containing: Lesk, Raw Lesk, F, Precision, Recall, Cosine, Matching Feature Frequency, Matching Phrase Frequency, String A Length and String B Length.

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();

 my %scores = %{ $lesk->CalculateAllScores( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

 for my $score_name ( sort keys %scores )
 {
    print "$score_name : $scores{ $score_name }\n";
 }

 undef( $lesk );

Accessor Functions

GetDebugLog

Description:

 Returns the _debugLog member variable, which is set when the Word2vec::Lesk object is created by the new function.

Input:

 None

Output:

 $value -> '0' = False, '1' = True

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();
 my $debugLog = $lesk->GetDebugLog();

 print( "Debug Logging Enabled\n" ) if $debugLog == 1;
 print( "Debug Logging Disabled\n" ) if $debugLog == 0;

 undef( $lesk );

GetWriteLog

Description:

 Returns the _writeLog member variable, which is set when the Word2vec::Lesk object is created by the new function.

Input:

 None

Output:

 $value -> '0' = False, '1' = True

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();
 my $writeLog = $lesk->GetWriteLog();

 print( "Write Logging Enabled\n" ) if $writeLog == 1;
 print( "Write Logging Disabled\n" ) if $writeLog == 0;

 undef( $lesk );

Debug Functions

WriteLog

Description:

 Prints passed string parameter to the console, log file or both depending on user options.

 Note: The $printNewLine parameter controls the trailing newline. If it is 0, no newline
 is printed after the string; any other value, including undef, prints one.

Input:

 $string       -> String to print to the console/log file.
 $printNewLine -> 0 = Do not print a newline character after the string; any other value, including 'undef', prints one.

Output:

 None

Example:

 use Word2vec::Lesk;

 my $lesk = Word2vec::Lesk->new();
 $lesk->WriteLog( "Hello World" );

 undef( $lesk );
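The newline rule for WriteLog can be checked in isolation. This sketch mirrors only the documented behavior (0 suppresses the newline; any other value, including undef, appends one) and leaves out the console and log-file routing; format_log_line is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented WriteLog newline rule.
# Console/log-file output selection is intentionally omitted.
sub format_log_line
{
    my ( $string, $printNewLine ) = @_;

    # Only an explicit numeric 0 suppresses the newline; undef does not.
    my $suppress = defined( $printNewLine ) && $printNewLine == 0;
    return $suppress ? $string : "$string\n";
}

print format_log_line( "Hello World" );        # newline appended
print format_log_line( "Loading... ", 0 );     # no newline
print format_log_line( "done" );
```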

AUTHOR

 Clint Cuffy, Virginia Commonwealth University

COPYRIGHT

Copyright (c) 2016

 Bridget T McInnes, Virginia Commonwealth University
 btmcinnes at vcu dot edu

 Clint Cuffy, Virginia Commonwealth University
 cuffyca at vcu dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.