Word2vec::Word2phrase - word2vec's word2phrase wrapper module.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetMinCount( 12 );
    $w2p->SetThreshold( 20 );
    $w2p->SetTrainFilePath( "textCorpus.txt" );
    $w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
    $w2p->ExecuteTraining();
    undef( $w2p );

    # or

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->ExecuteTraining( $trainFilePath, $outputFilePath, $minCount, $threshold, $debug, $overwrite );
    undef( $w2p );
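Word2vec::Word2phrase is a word2vec package tool that "compoundifies" bi-grams in a text corpus, joining frequently co-occurring word pairs into single tokens (e.g. "new york" becomes "new_york"). Which bi-grams are joined is controlled by a minimum word count (minCount) and a score threshold (threshold).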
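Assuming the module passes its minCount and threshold values straight through to the original word2phrase program, the threshold is a score cutoff rather than a frequency. The sketch below models that score on word2phrase.c: a pair is skipped if either word occurs fewer than min_count times, otherwise it is scored and joined when the score exceeds the threshold. The helper names and the counts in the usage line are hypothetical.

```perl
use strict;
use warnings;

# Score modeled on the original word2phrase.c (assumption: this module
# delegates to it unchanged):
#   score = (count(a b) - min_count) / count(a) / count(b) * total_words
sub bigram_score {
    my ( $count_ab, $count_a, $count_b, $total_words, $min_count ) = @_;
    # Pairs containing a word rarer than min_count are never joined.
    return 0 if $count_a < $min_count || $count_b < $min_count;
    return ( $count_ab - $min_count ) / $count_a / $count_b * $total_words;
}

# A bi-gram is written as "a_b" only when its score is strictly above the threshold.
sub should_join {
    my ( $score, $threshold ) = @_;
    return $score > $threshold ? 1 : 0;
}

# Hypothetical counts: the pair seen 50 times in a 100,000-word corpus.
my $score = bigram_score( 50, 60, 55, 100_000, 5 );
printf( "score = %.1f, join = %d\n", $score, should_join( $score, 100 ) );
```

Because the pair count is divided by both unigram counts, raising the threshold yields fewer phrases, which matches the "higher means fewer phrases" behavior of word2phrase's -threshold option.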
Description:
Returns a new 'Word2vec::Word2phrase' module object.

Note: Specifying no parameters implies default options.

Default Parameters:
    debugLog          = 0
    writeLog          = 0
    trainFilePath     = ""
    outputFilePath    = ""
    minCount          = 5
    threshold         = 100
    setW2PDebug       = 2
    workingDir        = Current Directory
    word2PhraseExeDir = Word2Phrase Executable Directory
    overwriteOldFile  = 0
Input:
$debugLog          -> Instructs the module to print debug statements to the console. (1 = True / 0 = False)
$writeLog          -> Instructs the module to print debug statements to a log file. (1 = True / 0 = False)
$trainFilePath     -> Specifies the training text corpus for word2phrase training. (String)
$outputFilePath    -> Specifies the output file path for the word2phrase-processed corpus. (String)
$minCount          -> Specifies the minimum word count; words occurring fewer times are not considered for bi-gram 'compoundification'. (Positive Integer)
$threshold         -> Specifies the score threshold for bi-gram 'compoundification'; higher values form fewer phrases. (Positive Integer)
$setW2PDebug       -> Specifies the word2phrase debug level shown during training. (Integer)
$workingDir        -> Specifies the current working directory. (String)
$word2PhraseExeDir -> Specifies the word2phrase executable directory. (String)
$overwriteOldFile  -> Instructs the module whether to overwrite an existing file with the same output file name and path. ( '1' or '0' )

Note: Specifying all new() parameters at once is not recommended, as this has not been thoroughly tested.
Output:
Word2vec::Word2phrase object.
Example:
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    undef( $w2p );
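The default list for new() can be read as "any value the caller supplies overrides the default". A minimal sketch of that merge follows, assuming a hash-based constructor. The package Hypothetical::Word2phraseOptions and its key/value calling convention are illustrative only and are not the real new() signature; word2PhraseExeDir is omitted because its default location is not specified.

```perl
use strict;
use warnings;
use Cwd ();

# Illustrative package only; this is NOT the real Word2vec::Word2phrase constructor.
package Hypothetical::Word2phraseOptions;

# Defaults copied from the new() documentation.
my %DEFAULTS = (
    debugLog         => 0,
    writeLog         => 0,
    trainFilePath    => "",
    outputFilePath   => "",
    minCount         => 5,
    threshold        => 100,
    setW2PDebug      => 2,
    workingDir       => Cwd::getcwd(),
    overwriteOldFile => 0,
);

sub new {
    my ( $class, %args ) = @_;
    # Later keys win, so caller-supplied values override the defaults.
    my %self = ( %DEFAULTS, %args );
    return bless \%self, $class;
}

package main;

my $opts = Hypothetical::Word2phraseOptions->new( minCount => 12 );
print "minCount = $opts->{minCount}, threshold = $opts->{threshold}\n";
```

Flattening the defaults first and the caller's arguments second is a common Perl idiom for this kind of override, since later duplicate keys replace earlier ones in a hash list assignment.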
Removes member variables and file handle from memory.
None
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->DESTROY();
    undef( $w2p );
Executes word2phrase training based on the given parameters. Parameters take precedence over member variables: any parameter specified overrides its respective member variable. Note: If no parameters are specified, this method executes word2phrase training based on the preset member variables. Returns the training status.
$trainFilePath  -> Training text corpus file path
$outputFilePath -> Output (phrase-compounded) text corpus file path
$minCount       -> Minimum word count for bi-gram 'compoundification' (Positive Integer)
$threshold      -> Bi-gram score threshold; higher values form fewer phrases (Positive Integer)
$debug          -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite      -> Overwrites an existing output file when executing training. (0 = False / 1 = True)
$value -> '0' = Successful / '-1' = Un-successful
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetMinCount( 12 );
    $w2p->SetThreshold( 20 );
    $w2p->SetTrainFilePath( "textCorpus.txt" );
    $w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
    $w2p->ExecuteTraining();
    undef( $w2p );

    # Or

    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->ExecuteTraining( "textCorpus.txt", "phraseTextCorpus.txt", 12, 20, 2, 1 );
    undef( $w2p );
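ExecuteTraining() lets each parameter override its member variable. Assuming the wrapper ultimately invokes the original word2phrase program, whose command-line flags include -train, -output, -min-count, -threshold and -debug, that precedence rule can be sketched as below. The helper build_word2phrase_args and the %member hash are hypothetical stand-ins, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the object's member variables.
my %member = (
    trainFilePath  => "textCorpus.txt",
    outputFilePath => "phraseTextCorpus.txt",
    minCount       => 5,
    threshold      => 100,
    w2pDebug       => 2,
);

# Each defined parameter overrides its member; undef falls back to the member value.
sub build_word2phrase_args {
    my ( $members, $train, $output, $min_count, $threshold, $debug ) = @_;
    return (
        "-train",     $train     // $members->{trainFilePath},
        "-output",    $output    // $members->{outputFilePath},
        "-min-count", $min_count // $members->{minCount},
        "-threshold", $threshold // $members->{threshold},
        "-debug",     $debug     // $members->{w2pDebug},
    );
}

# Override only minCount and threshold, as in the example above.
my @args = build_word2phrase_args( \%member, undef, undef, 12, 20 );
print join( " ", "word2phrase", @args ), "\n";
```

Passing such a list to system() in list form (system( $exe, @args )) avoids shell quoting problems with file paths; whether the module itself does this is not documented here.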
$trainingString -> String to train
$outputFilePath -> Output (phrase-compounded) text corpus file path
$minCount       -> Minimum word count for bi-gram 'compoundification' (Positive Integer)
$threshold      -> Bi-gram score threshold; higher values form fewer phrases (Positive Integer)
$debug          -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite      -> Overwrites an existing output file when executing training. (0 = False / 1 = True)
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetMinCount( 12 );
    $w2p->SetThreshold( 20 );
    $w2p->SetTrainFilePath( "large string to train here" );
    $w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
    $w2p->ExecuteTraining();
    undef( $w2p );

    # Or

    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->ExecuteTraining( "large string to train here", "phraseTextCorpus.txt", 12, 20, 2, 1 );
    undef( $w2p );
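The original word2phrase program reads its corpus from a file. One plausible way for string-based training to reuse that file-based path is to write the string to a temporary file first. The sketch below assumes that approach; string_to_temp_corpus is a hypothetical helper and is not taken from the module's source.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write a training string to a temporary corpus file
# so it can be handed to a file-based trainer.
sub string_to_temp_corpus {
    my ($text) = @_;
    # UNLINK => 1 asks File::Temp to delete the file when the program exits.
    my ( $fh, $path ) = tempfile( UNLINK => 1 );
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $path;
}

my $corpus = string_to_temp_corpus("large string to train here");
print "Temporary corpus written to $corpus\n";
```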
Returns the operating system type string.
$string -> Operating system string.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $operatingSystem = $w2p->GetOSType();
    print( "Operating System: $operatingSystem\n" ) if defined( $operatingSystem );
    undef( $w2p );
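In Perl, the operating system name is available in the built-in variable $^O (for example "linux", "darwin" or "MSWin32"). A minimal sketch, assuming GetOSType() reports a value of this kind (the module source is not shown here); the platform-specific executable name in word2phrase_exe_name is a hypothetical use, not documented module behavior.

```perl
use strict;
use warnings;

# $^O holds the name of the OS this perl was built for.
sub os_type { return $^O; }

# Hypothetical use: choose a platform-specific executable file name.
sub word2phrase_exe_name {
    my ($os) = @_;
    return $os eq 'MSWin32' ? 'word2phrase.exe' : 'word2phrase';
}

my $os = os_type();
print "Operating System: $os, executable: ", word2phrase_exe_name($os), "\n";
```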
Returns the _debugLog member variable, which is set when the Word2vec::Word2phrase object is initialized by new().
$value -> 0 = False, 1 = True
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $debugLog = $w2p->GetDebugLog();
    print( "Debug Logging Enabled\n" ) if $debugLog == 1;
    print( "Debug Logging Disabled\n" ) if $debugLog == 0;
    undef( $w2p );
Returns the _writeLog member variable, which is set when the Word2vec::Word2phrase object is initialized by new().
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $writeLog = $w2p->GetWriteLog();
    print( "Write Logging Enabled\n" ) if $writeLog == 1;
    print( "Write Logging Disabled\n" ) if $writeLog == 0;
    undef( $w2p );
Returns the file handle used by the WriteLog() method.
$fileHandle -> File handle used by 'WriteLog()', or undef.
<This should not be called.>
Returns (string) training file path.
$string -> word2phrase training file path
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $filePath = $w2p->GetTrainFilePath();
    print( "Training File Path: $filePath\n" ) if defined( $filePath );
    undef( $w2p );
Returns (string) output file path.
$string -> word2phrase output file path
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $filePath = $w2p->GetOutputFilePath();
    print( "Output File Path: $filePath\n" ) if defined( $filePath );
    undef( $w2p );
Returns (integer) minimum word count used for bi-gram 'compoundification'.
$value -> Minimum word count (Positive Integer)
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $mincount = $w2p->GetMinCount();
    print( "MinCount: $mincount\n" ) if defined( $mincount );
    undef( $w2p );
Returns (integer) bi-gram score threshold.
$value -> Bi-gram score threshold; higher values form fewer phrases (Positive Integer)
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $threshold = $w2p->GetThreshold();
    print( "Threshold: $threshold\n" ) if defined( $threshold );
    undef( $w2p );
Returns word2phrase debug parameter value.
$value -> 0 = No debugging, 1 = Show debugging, 2 = Show even more debugging
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $w2pdebug = $w2p->GetW2PDebug();
    print( "Word2Phrase Debug Level: $w2pdebug\n" ) if defined( $w2pdebug );
    undef( $w2p );
Returns (string) working directory path.
$string -> Current working directory path
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $workingDir = $w2p->GetWorkingDir();
    print( "Working Directory: $workingDir\n" ) if defined( $workingDir );
    undef( $w2p );
Returns (string) word2phrase executable directory path.
$string -> Word2Phrase executable directory path
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $exeDir = $w2p->GetWord2PhraseExeDir();
    print( "Word2Phrase Executable Directory: $exeDir\n" ) if defined( $exeDir );
    undef( $w2p );
Returns the current value of the overwrite training file variable.
$value -> 1 = True/Overwrite or 0 = False/Append to current file
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $overwrite = $w2p->GetOverwriteOldFile();

    if ( defined( $overwrite ) )
    {
        print( "Overwrite Old File: " );
        print( "Yes\n" ) if $overwrite == 1;
        print( "No\n" ) if $overwrite == 0;
    }

    undef( $w2p );
Sets training file path.
$string -> Training file path
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetTrainFilePath( "filePath" );
    undef( $w2p );
Sets word2phrase output file path.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetOutputFilePath( "filePath" );
    undef( $w2p );
Sets the minimum word count used for bi-gram 'compoundification'.
$value -> Minimum word count (Positive integer)
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetMinCount( 1 );
    undef( $w2p );
Sets the bi-gram score threshold.
$value -> Bi-gram score threshold; higher values form fewer phrases (Positive integer)
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetThreshold( 100 );
    undef( $w2p );
Sets word2phrase debug parameter.
$value -> word2phrase debug parameter (0 = No debug info, 1 = Show debug info, 2 = Show more debug info.)
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetW2PDebug( 2 );
    undef( $w2p );
Sets working directory path.
$string -> Current working directory path.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetWorkingDir( "filePath" );
    undef( $w2p );
Sets word2phrase executable file directory path.
$string -> Word2Phrase executable directory path.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetWord2PhraseExeDir( "filePath" );
    undef( $w2p );
Enables overwriting the word2phrase output file if one with the same name already exists.
$value -> Integer: 1 = Overwrite old file, 0 = Do not overwrite old file.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->SetOverwriteOldFile( 1 );
    undef( $w2p );
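GetOverwriteOldFile() describes 1 as overwrite and 0 as append to the current file. A minimal sketch of how such a flag can map onto Perl's open() modes ('>' truncates, '>>' appends) follows; open_output is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: 1 = overwrite (truncate), 0 = append to the existing file.
sub open_output {
    my ( $path, $overwrite ) = @_;
    my $mode = $overwrite ? '>' : '>>';
    open( my $fh, $mode, $path ) or die "Cannot open $path: $!";
    return $fh;
}

# Two appending runs followed by one overwriting run on the same file.
my ( undef, $path ) = tempfile( UNLINK => 1 );
for my $flag ( 0, 0, 1 ) {
    my $fh = open_output( $path, $flag );
    print {$fh} "run(overwrite=$flag)\n";
    close($fh) or die "close failed: $!";
}
```

After the loop the file holds only the last line, because the final open with '>' truncates what the two appending runs wrote.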
Returns current time string in "Hour:Minute:Second" format.
$string -> XX:XX:XX ("Hour:Minute:Second")
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $time = $w2p->GetTime();
    print( "Current Time: $time\n" ) if defined( $time );
    undef( $w2p );
Returns the current date as a string in "Month/Day/Year" format.
$string -> XX/XX/XXXX ("Month/Day/Year")
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    my $date = $w2p->GetDate();
    print( "Current Date: $date\n" ) if defined( $date );
    undef( $w2p );
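Both documented formats ("Hour:Minute:Second" and "Month/Day/Year") can be produced with POSIX::strftime. The sketch below is one way to do it, not necessarily how GetTime() and GetDate() are implemented; the optional epoch argument on format_time and format_date is a hypothetical addition for testing.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# "Hour:Minute:Second" for the given epoch seconds (defaults to now), in local time.
sub format_time { return strftime( "%H:%M:%S", localtime( $_[0] // time ) ); }

# "Month/Day/Year" for the given epoch seconds (defaults to now), in local time.
sub format_date { return strftime( "%m/%d/%Y", localtime( $_[0] // time ) ); }

print "Current Time: ", format_time(), "\n";
print "Current Date: ", format_date(), "\n";
```

strftime zero-pads each field, so the output always matches the fixed-width XX:XX:XX and XX/XX/XXXX patterns listed above.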
Prints the passed string parameter to the console, a log file, or both, depending on user options. Note: the printNewLine parameter controls the trailing newline: a newline character is printed after the string unless the parameter is 0, including when the parameter is undefined.
$string -> String to print to the console/log file.
$value  -> 0 = Do not print a newline character after the string; any other value, including 'undef', prints a newline character.
    use Word2vec::Word2phrase;

    my $w2p = Word2vec::Word2phrase->new();
    $w2p->WriteLog( "Hello World" );
    undef( $w2p );
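The newline rule for WriteLog() is easy to misread, so here is a minimal sketch of the documented behavior: the newline is omitted only when the flag is defined and equal to 0. write_log and its debugLog/writeLog/logHandle options are hypothetical names chosen to mirror the documented member variables; this is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical sketch: print to the console and/or a log handle, appending
# "\n" unless $printNewLine is defined and equal to 0.
sub write_log {
    my ( $string, $printNewLine, %opt ) = @_;
    my $suffix = ( defined($printNewLine) && $printNewLine == 0 ) ? "" : "\n";
    my $text   = $string . $suffix;
    print {*STDOUT} $text if $opt{debugLog};
    print { $opt{logHandle} } $text if $opt{writeLog} && $opt{logHandle};
    return $text;
}

write_log( "Hello World", undef, debugLog => 1 );    # undef: newline added
write_log( "No newline: ", 0, debugLog => 1 );       # 0: no newline
write_log( "done", 1, debugLog => 1 );               # any other value: newline added
```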
Clint Cuffy, Virginia Commonwealth University
Copyright (c) 2016
Bridget T McInnes, Virginia Commonwealth University btmcinnes at vcu dot edu Clint Cuffy, Virginia Commonwealth University cuffyca at vcu dot edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
To install Word2vec::Interface, copy and paste the appropriate command into your terminal.
cpanm
cpanm Word2vec::Interface
CPAN shell
perl -MCPAN -e shell
install Word2vec::Interface
For more information on module installation, please visit the detailed CPAN module installation guide.