NAME

Word2vec::Word2phrase - word2vec's word2phrase wrapper module.

SYNOPSIS

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetMinCount( 12 );
 $w2p->SetThreshold( 20 );
 $w2p->SetTrainFilePath( "textCorpus.txt" );
 $w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
 $w2p->ExecuteTraining();
 undef( $w2p );

 # or

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->ExecuteTraining( $trainFilePath, $outputFilePath, $minCount, $threshold, $debug, $overwrite );
 undef( $w2p );

DESCRIPTION

Word2vec::Word2phrase wraps the word2phrase tool from the word2vec package, which "compoundifies" bi-grams in a text corpus based on a minimum and maximum frequency (the minCount and threshold parameters).
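For intuition about how the two parameters interact: in the upstream word2phrase tool shipped with word2vec, the threshold is compared against a score rather than a raw frequency. Each bi-gram "a b" is scored as (count(a b) - minCount) / count(a) / count(b) * totalWords, and the pair is joined with an underscore when the score exceeds the threshold. The following is a simplified, self-contained Perl sketch of that rule on a toy corpus. It does not call this module, and the names Score, Compoundify, %unigram and %bigram are hypothetical.

```perl
use strict;
use warnings;

# Toy corpus; real corpora are far larger, so practical minCount/threshold values differ.
my @words = split( ' ', 'new york is big and new york is busy and new york is loud' );

# Count unigrams and adjacent bi-grams.
my ( %unigram, %bigram );
$unigram{ $_ }++ for @words;
$bigram{ join( ' ', @words[ $_ - 1, $_ ] ) }++ for 1 .. $#words;

# Score one bi-gram; words rarer than $minCount never form a phrase.
sub Score
{
    my ( $first, $second, $minCount ) = @_;
    my $countA  = $unigram{ $first }          // 0;
    my $countB  = $unigram{ $second }         // 0;
    my $countAB = $bigram{ "$first $second" } // 0;
    return 0 if $countA < $minCount || $countB < $minCount;
    return ( $countAB - $minCount ) / $countA / $countB * scalar( @words );
}

# Join adjacent words whose score exceeds $threshold.
sub Compoundify
{
    my ( $minCount, $threshold ) = @_;
    my @out    = ( $words[ 0 ] );
    my $joined = 0;   # A word just merged into a phrase is not merged again.
    for my $i ( 1 .. $#words )
    {
        if( !$joined && Score( $words[ $i - 1 ], $words[ $i ], $minCount ) > $threshold )
        {
            $out[ -1 ] .= '_' . $words[ $i ];
            $joined = 1;
        }
        else
        {
            push( @out, $words[ $i ] );
            $joined = 0;
        }
    }
    return join( ' ', @out );
}

print( Compoundify( 2, 1 ), "\n" );
```

With minCount 2 and threshold 1, only "new york" scores above the threshold (14/9), so the sketch prints "new_york is big and new_york is busy and new_york is loud". Raising the threshold to 100 leaves the toy corpus unchanged.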

Main Functions

new

Description:

 Returns a new 'Word2vec::Word2phrase' module object.

 Note: Specifying no parameters implies default options.

 Default Parameters:
    debugLog                    = 0
    writeLog                    = 0
    trainFilePath               = ""
    outputFilePath              = ""
    minCount                    = 5
    threshold                   = 100
    setW2PDebug                 = 2
    workingDir                  = Current Directory
    word2PhraseExeDir           = Word2Phrase Executable Directory
    overwriteOldFile            = 0

Input:

 $debugLog                    -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
 $writeLog                    -> Instructs module to print debug statements to a log file. (1 = True / 0 = False)
 $trainFilePath               -> Specifies the training text corpus for word2phrase training. (String)
 $outputFilePath              -> Specifies the output path for post word2phrase training. (String)
 $minCount                    -> Specifies the minimum range value for bi-gram 'compoundification'. (Positive Integer)
 $threshold                   -> Specifies the maximum range value for bi-gram 'compoundification'. (Positive Integer)
 $setW2PDebug                 -> Specifies the word2phrase debug information parameter value to show during training. (Integer)
 $workingDir                  -> Specifies the current working directory. (String)
 $word2PhraseExeDir           -> Specifies word2phrase executable directory. (String)
 $overwriteOldFile            -> Instructs the module whether to overwrite any existing file with the same output file name and path. (1 = Overwrite / 0 = Do not overwrite)

 Note: Specifying all new() parameters is not recommended, as this usage has not been thoroughly tested.

Output:

 Word2vec::Word2phrase object.

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();

 undef( $w2p );

DESTROY

Description:

 Removes member variables and file handle from memory.

Input:

 None

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();

 $w2p->DESTROY();
 undef( $w2p );

ExecuteTraining

Description:

 Executes word2phrase training based on parameters. Parameter variables have higher precedence than member variables.
 Any parameter specified will override its respective member variable.

 Note: If no parameters are specified, this module executes word2phrase training based on preset member
 variables. Returns a value indicating training status.

Input:

 $trainFilePath  -> Training text corpus file path
 $outputFilePath -> word2phrase output file path
 $minCount       -> Minimum bi-gram frequency (Positive Integer)
 $threshold      -> Maximum bi-gram frequency (Positive Integer)
 $debug          -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
 $overwrite      -> Overwrites old training file when executing training. (0 = False / 1 = True)

Output:

 $value          -> '0' = Successful / '-1' = Un-successful

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetMinCount( 12 );
 $w2p->SetThreshold( 20 );
 $w2p->SetTrainFilePath( "textCorpus.txt" );
 $w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
 $w2p->ExecuteTraining();
 undef( $w2p );

 # Or

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->ExecuteTraining( "textCorpus.txt", "phraseTextCorpus.txt", 12, 20, 2, 1 );
 undef( $w2p );
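
The precedence rule described above (an explicitly passed parameter overrides the corresponding member variable) can be sketched in plain Perl with the defined-or operator, available since Perl 5.10. This is an illustration only, not the module's implementation; ResolveOption and the %member hash are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for values previously stored through the Set* mutators (hypothetical).
my %member = ( minCount => 5, threshold => 100 );

# Return the passed parameter when defined, otherwise the stored member value.
sub ResolveOption
{
    my ( $name, $parameter ) = @_;
    return $parameter // $member{ $name };
}

print( "minCount:  ", ResolveOption( "minCount", 12 ), "\n" );      # Parameter wins
print( "threshold: ", ResolveOption( "threshold", undef ), "\n" );  # Member value used
```

Passing undef for a parameter is how a caller would skip it positionally while still falling back to the stored value, assuming the module treats undefined parameters this way.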

ExecuteStringTraining

Description:

 Executes word2phrase training on a string based on parameters. Parameter variables have higher precedence than member variables.
 Any parameter specified will override its respective member variable.

 Note: If no optional parameters are specified, this module executes word2phrase training based on preset member
 variables. Returns a value indicating training status.

Input:

 $trainingString -> String to train
 $outputFilePath -> word2phrase output file path
 $minCount       -> Minimum bi-gram frequency (Positive Integer)
 $threshold      -> Maximum bi-gram frequency (Positive Integer)
 $debug          -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
 $overwrite      -> Overwrites old training file when executing training. (0 = False / 1 = True)

Output:

 $value          -> '0' = Successful / '-1' = Un-successful

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetMinCount( 12 );
 $w2p->SetThreshold( 20 );
 $w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
 $w2p->ExecuteStringTraining( "large string to train here" );
 undef( $w2p );

 # Or

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->ExecuteStringTraining( "large string to train here", "phraseTextCorpus.txt", 12, 20, 2, 1 );
 undef( $w2p );

GetOSType

Description:

 Returns the operating system type string.

Input:

 None

Output:

 $string -> Operating system string.

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $operatingSystem = $w2p->GetOSType();
 print( "Operating System: $operatingSystem\n" ) if defined( $operatingSystem );
 undef( $w2p );
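
Perl exposes the operating system name through the built-in $^O variable (for example "linux", "darwin" or "MSWin32"). A caller might use an OS string like this to pick a platform-specific executable name. The sketch below is an assumption about typical usage, not the module's internals: whether GetOSType() returns exactly $^O is not stated here, and ExecutableNameFor and the file names are hypothetical.

```perl
use strict;
use warnings;

# Map an OS string to a hypothetical executable file name.
sub ExecutableNameFor
{
    my ( $os ) = @_;
    return $os eq "MSWin32" ? "word2phrase.exe" : "word2phrase";
}

print( "Operating System: $^O\n" );
print( "Executable Name:  ", ExecutableNameFor( $^O ), "\n" );
```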

Accessor Functions

GetDebugLog

Description:

 Returns the _debugLog member variable set when the Word2vec::Word2phrase object is initialized by the new function.

Input:

 None

Output:

 $value -> 0 = False, 1 = True

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $debugLog = $w2p->GetDebugLog();

 print( "Debug Logging Enabled\n" ) if $debugLog == 1;
 print( "Debug Logging Disabled\n" ) if $debugLog == 0;

 undef( $w2p );

GetWriteLog

Description:

 Returns the _writeLog member variable set when the Word2vec::Word2phrase object is initialized by the new function.

Input:

 None

Output:

 $value -> 0 = False, 1 = True

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $writeLog = $w2p->GetWriteLog();

 print( "Write Logging Enabled\n" ) if $writeLog == 1;
 print( "Write Logging Disabled\n" ) if $writeLog == 0;

 undef( $w2p );

GetFileHandle

Description:

 Returns file handle used by WriteLog() method.

Input:

 None

Output:

 $fileHandle -> Returns file handle blob used by 'WriteLog()' function or undefined.

Example:

 <This should not be called.>

GetTrainFilePath

Description:

 Returns (string) training file path.

Input:

 None

Output:

 $string -> word2phrase training file path

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $filePath = $w2p->GetTrainFilePath();

 print( "Training File Path: $filePath\n" ) if defined( $filePath );
 undef( $w2p );

GetOutputFilePath

Description:

 Returns (string) output file path.

Input:

 None

Output:

 $string -> word2phrase output file path

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $filePath = $w2p->GetOutputFilePath();

 print( "Output File Path: $filePath\n" ) if defined( $filePath );
 undef( $w2p );

GetMinCount

Description:

 Returns (integer) minimum bi-gram range.

Input:

 None

Output:

 $value ->  Minimum bi-gram frequency (Positive Integer)

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $mincount = $w2p->GetMinCount();

 print( "MinCount: $mincount\n" ) if defined( $mincount );
 undef( $w2p );

GetThreshold

Description:

 Returns (integer) maximum bi-gram range.

Input:

 None

Output:

 $value ->  Maximum bi-gram frequency (Positive Integer)

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $threshold = $w2p->GetThreshold();

 print( "Threshold: $threshold\n" ) if defined( $threshold );
 undef( $w2p );

GetW2PDebug

Description:

 Returns word2phrase debug parameter value.

Input:

 None

Output:

 $value -> 0 = No debugging, 1 = Show debugging, 2 = Show even more debugging

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $w2pdebug = $w2p->GetW2PDebug();

 print( "Word2Phrase Debug Level: $w2pdebug\n" ) if defined( $w2pdebug );

 undef( $w2p );

GetWorkingDir

Description:

 Returns (string) working directory path.

Input:

 None

Output:

 $string -> Current working directory path

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $workingDir = $w2p->GetWorkingDir();

 print( "Working Directory: $workingDir\n" ) if defined( $workingDir );

 undef( $w2p );

GetWord2PhraseExeDir

Description:

 Returns (string) word2phrase executable directory path.

Input:

 None

Output:

 $string -> Word2Phrase executable directory path

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $exeDir = $w2p->GetWord2PhraseExeDir();

 print( "Word2Phrase Executable Directory: $exeDir\n" ) if defined( $exeDir );

 undef( $w2p );

GetOverwriteOldFile

Description:

 Returns the current value of the overwrite training file variable.

Input:

 None

Output:

 $value -> 1 = True/Overwrite or 0 = False/Append to current file

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $overwrite = $w2p->GetOverwriteOldFile();

 if( defined( $overwrite ) )
 {
    print( "Overwrite Old File: " );
    print( "Yes\n" ) if $overwrite == 1;
    print( "No\n" ) if $overwrite == 0;
 }

 undef( $w2p );

Mutator Functions

SetTrainFilePath

Description:

 Sets training file path.

Input:

 $string -> Training file path

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetTrainFilePath( "filePath" );

 undef( $w2p );

SetOutputFilePath

Description:

 Sets word2phrase output file path.

Input:

 $string -> word2phrase output file path

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetOutputFilePath( "filePath" );

 undef( $w2p );

SetMinCount

Description:

 Sets minimum range value.

Input:

 $value -> Minimum frequency value (Positive integer)

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetMinCount( 1 );

 undef( $w2p );

SetThreshold

Description:

 Sets maximum range value.

Input:

 $value -> Maximum frequency value (Positive integer)

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetThreshold( 100 );

 undef( $w2p );

SetW2PDebug

Description:

 Sets word2phrase debug parameter.

Input:

 $value -> word2phrase debug parameter (0 = No debug info, 1 = Show debug info, 2 = Show more debug info.)

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetW2PDebug( 2 );

 undef( $w2p );

SetWorkingDir

Description:

 Sets working directory path.

Input:

 $string -> Current working directory path.

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetWorkingDir( "filePath" );

 undef( $w2p );

SetWord2PhraseExeDir

Description:

 Sets word2phrase executable file directory path.

Input:

 $string -> Word2Phrase executable directory path.

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetWord2PhraseExeDir( "filePath" );

 undef( $w2p );

SetOverwriteOldFile

Description:

 Enables overwriting word2phrase output file if one already exists with the same output file name.

Input:

 $value -> Integer: 1 = Overwrite old file, 0 = Do not overwrite old file.

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->SetOverwriteOldFile( 1 );

 undef( $w2p );

Debug Functions

GetTime

Description:

 Returns current time string in "Hour:Minute:Second" format.

Input:

 None

Output:

 $string -> XX:XX:XX ("Hour:Minute:Second")

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $time = $w2p->GetTime();

 print( "Current Time: $time\n" ) if defined( $time );

 undef( $w2p );

GetDate

Description:

 Returns current month, day and year string in "Month/Day/Year" format.

Input:

 None

Output:

 $string -> XX/XX/XXXX ("Month/Day/Year")

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 my $date = $w2p->GetDate();

 print( "Current Date: $date\n" ) if defined( $date );

 undef( $w2p );
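
The "Hour:Minute:Second" and "Month/Day/Year" formats described for GetTime and GetDate can be reproduced with the core POSIX strftime function. This is a minimal sketch that does not use the module; FormatTime and FormatDate are hypothetical helpers, and the zero-padded two-digit fields are an assumption about the module's exact output.

```perl
use strict;
use warnings;
use POSIX qw( strftime );

# Format an epoch timestamp as "Hour:Minute:Second" in local time.
sub FormatTime
{
    my ( $epoch ) = @_;
    return strftime( "%H:%M:%S", localtime( $epoch ) );
}

# Format an epoch timestamp as "Month/Day/Year" in local time.
sub FormatDate
{
    my ( $epoch ) = @_;
    return strftime( "%m/%d/%Y", localtime( $epoch ) );
}

my $now = time();
print( "Current Time: ", FormatTime( $now ), "\n" );
print( "Current Date: ", FormatDate( $now ), "\n" );
```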

WriteLog

Description:

 Prints passed string parameter to the console, log file or both depending on user options.

 Note: A new line character is printed after the string unless the printNewLine parameter is 0.
 An undefined printNewLine parameter also prints the new line character.

Input:

 $string -> String to print to the console/log file.
 $value  -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.

Output:

 None

Example:

 use Word2vec::Word2phrase;

 my $w2p = Word2vec::Word2phrase->new();
 $w2p->WriteLog( "Hello World" );

 undef( $w2p );
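
The new line rule described for WriteLog (append a new line unless the second parameter is 0, with undef treated as "print it") can be sketched as a small standalone function. FormatLogLine is a hypothetical helper written for illustration, not the module's code.

```perl
use strict;
use warnings;

# Build the text a logger would emit: append a new line unless the second
# argument is defined and equal to 0.
sub FormatLogLine
{
    my ( $string, $printNewLine ) = @_;
    my $suppress = defined( $printNewLine ) && $printNewLine eq "0";
    return $suppress ? $string : "$string\n";
}

print( FormatLogLine( "Training started... ", 0 ) );  # No new line, next output continues the line
print( FormatLogLine( "done" ) );                     # New line appended
```

Suppressing the new line this way lets two consecutive calls build a single console line, as in the two print statements above.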

Author

 Clint Cuffy, Virginia Commonwealth University

COPYRIGHT

Copyright (c) 2016

 Bridget T McInnes, Virginia Commonwealth University
 btmcinnes at vcu dot edu

 Clint Cuffy, Virginia Commonwealth University
 cuffyca at vcu dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.