NAME

Algorithm::LDA

SYNOPSIS

 use Algorithm::LDA;
 
 my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");

DESCRIPTION

Algorithm::LDA is an implementation of Latent Dirichlet Allocation in Perl.

add

description:

 Adds a document to the array of documents ($self->documents)

input:

 %args -> hash containing data

output:

example:

 while (my $line = <$fh2>) {
    my $obj = decode_json($line);
    add(%$obj);
 }
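
The loop above reads one JSON record per line. A minimal, self-contained sketch of that pattern follows, assuming the core JSON::PP module supplies decode_json. The record keys ("id", "data") and the @documents array are hypothetical stand-ins for illustration; they are not taken from the module.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Two JSON records, one per line, standing in for the file behind $fh2.
# The keys "id" and "data" are assumptions made for this sketch.
my $json_lines = <<'END';
{"id": 1, "data": ["topic", "model"]}
{"id": 2, "data": ["gibbs", "sampling"]}
END

# Open the string as an in-memory file so the sketch needs no files on disk.
open my $fh, '<', \$json_lines or die "Cannot open in-memory file: $!";

my @documents;    # hypothetical stand-in for $self->documents
while (my $line = <$fh>) {
    chomp $line;
    next unless length $line;       # skip blank lines
    my $obj = decode_json($line);   # hash reference for one record
    push @documents, $obj;          # a real caller would use add(%$obj) here
}
close $fh;

printf "%d documents read\n", scalar @documents;
```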

init

description:

 Initializes alpha, initializes beta, loads documents, starts main loop

input:

 None

output:

example:

 init();

printResults

description:

 Prints words in each topic, topics in each document, phi values, 
 and theta values to text files in the 'Results/$data' directory

input:

 None

output:

 None

example:

 printResults();

load

description:

 Loads documents from text files (in "data/$data") or from a JSON file (in "Documents")

input:

 None

output:

 None

example:

 load();

wordsPerTopic

description:

 Creates an array of words in each topic

input:

 %args -> hash containing topic

output:

 @words -> Array containing words and probabilities (phi value) for $args{topic}

example:

 my $words_on_topic = wordsPerTopic(topic => $topic);

topicsPerDocument

description:

 Creates an array of topics in each document

input:

 %args -> hash containing document

output:

 @topics -> Array containing topics and probabilities (theta value) for $args{document}

example:

 my $topics_on_document = topicsPerDocument(document => $doc);

sample_topic

description:

 Uses Gibbs Sampling to determine a topic given a document and word

input:

 $document -> ID of document word is in
 $word -> word that is to be evaluated

output:

 $topic -> topic ID
 $k -> last topic if topic can't be found

example:

 my $topic = $self->sample_topic($document, $word);
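
The sampling step can be illustrated with a generic draw from unnormalised topic weights. This is a sketch of the usual technique, not the module's code: sample_from_weights is a hypothetical helper, and the weights are assumed to be one non-negative score per topic. The fallback to the last topic mirrors the documented behaviour when no topic is selected.

```perl
use strict;
use warnings;

# Hypothetical helper: draw a topic index from unnormalised weights.
# $r is a uniform random number in [0, 1); pass rand() in real use.
sub sample_from_weights {
    my ($weights, $r) = @_;

    my $total = 0;
    $total += $_ for @$weights;

    # Walk the cumulative sum until it passes the scaled random target.
    my $target     = $r * $total;
    my $cumulative = 0;
    for my $topic (0 .. $#$weights) {
        $cumulative += $weights->[$topic];
        return $topic if $target < $cumulative;
    }

    # Nothing selected (for example, all weights are zero): use the last topic.
    return $#$weights;
}

my @weights = (1, 1, 2);    # toy scores for three topics
my $topic   = sample_from_weights(\@weights, rand());
print "sampled topic: $topic\n";
```

Passing the random number in as an argument keeps the helper deterministic under test.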

computePhi

description:

 Computes the expected phi value for a word given a topic ID

input:

 $topic -> ID of topic (iteration 0..$k)
 $word -> word that is to be evaluated

output:

 Phi value

example:

 $dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));

computeTheta

description:

 Computes the expected theta value for a topic given a document ID

input:

 $document -> ID of document
 $topic -> ID of topic (iteration 0..$k)

output:

 Theta value

example:

 $dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
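
The example line sums phi * theta over topics, which gives the probability of a word in a document. A minimal sketch follows, assuming the standard smoothed count estimates used in collapsed Gibbs sampling for LDA; whether the module computes exactly these formulas is an assumption. phi_estimate, theta_estimate, and the toy counts are hypothetical.

```perl
use strict;
use warnings;

# Smoothed estimate of P(word | topic):
#   (count of word in topic + beta) / (tokens in topic + vocabulary size * beta)
sub phi_estimate {
    my ($n_kw, $n_k, $vocab_size, $beta) = @_;
    return ($n_kw + $beta) / ($n_k + $vocab_size * $beta);
}

# Smoothed estimate of P(topic | document):
#   (tokens of topic in document + alpha) / (tokens in document + topics * alpha)
sub theta_estimate {
    my ($n_dk, $n_d, $num_topics, $alpha) = @_;
    return ($n_dk + $alpha) / ($n_d + $num_topics * $alpha);
}

# Toy counts for one word and one document across two topics.
my @topics = (
    { n_kw => 3, n_k => 10, n_dk => 4 },
    { n_kw => 1, n_k => 20, n_dk => 6 },
);
my ($vocab_size, $n_d, $alpha, $beta) = (5, 10, 0.5, 0.1);

# Accumulate phi * theta over topics, as in the example line above.
my $dist = 0;
for my $t (@topics) {
    $dist += phi_estimate($t->{n_kw}, $t->{n_k}, $vocab_size, $beta)
           * theta_estimate($t->{n_dk}, $n_d, scalar @topics, $alpha);
}
printf "P(word | document) is approximately %.4f\n", $dist;
```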

increaseMap

description:

 Increments the counts in all of the hashmaps for the given document, topic, and word

input:

 $document -> ID of document
 $topic -> ID of topic
 $word -> word in document $document

output:

 None

example:

 $self->increaseMap($data->{document}, $data->{topic}, $data->{word});

decreaseMap

description:

 Decrements the counts in all of the hashmaps for the given document, topic, and word

input:

 $document -> ID of document
 $topic -> ID of topic
 $word -> word in document $document

output:

 None

example:

 $self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
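
In a Gibbs sampler, the increase and decrease steps typically bracket each resampling: the token's old assignment is removed from the counts, a new topic is drawn, and the counts are restored under the new topic. The sketch below assumes three count tables (%doc_topic, %topic_word, %topic_total); these names and the adjust_counts helper are hypothetical, not the module's internals.

```perl
use strict;
use warnings;

# Hypothetical count tables kept in step with one another.
my (%doc_topic, %topic_word, %topic_total);

# Apply +1 or -1 to every table for one (document, topic, word) assignment.
sub adjust_counts {
    my ($document, $topic, $word, $delta) = @_;
    $doc_topic{$document}{$topic} += $delta;   # tokens of this topic in the document
    $topic_word{$topic}{$word}    += $delta;   # occurrences of the word under the topic
    $topic_total{$topic}          += $delta;   # all tokens assigned to the topic
}

adjust_counts(0, 2, 'gene', +1);   # initial assignment: word in document 0 gets topic 2
adjust_counts(0, 2, 'gene', -1);   # remove it before resampling
adjust_counts(0, 1, 'gene', +1);   # record the newly sampled topic 1

print "topic 1 total: $topic_total{1}\n";
```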

valid

description:

 Returns whether or not $data is a valid array (able to be added to the dataset)

input:

 $data -> data to be evaluated

output:

 Boolean/Integer -> true/1 if $data is an array, false/0 if it is not

example:

 return unless (valid($args{data}));
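
One common way to perform such a check is with ref(), which returns 'ARRAY' for an array reference. The sketch below assumes that $data is expected to be an array reference; is_array_ref is a hypothetical name and may differ from how valid is actually written.

```perl
use strict;
use warnings;

# Hypothetical check: 1 if $data is an array reference, 0 otherwise.
sub is_array_ref {
    my ($data) = @_;
    return ref($data) eq 'ARRAY' ? 1 : 0;
}

for my $candidate ([ 'topic', 'model' ], { topic => 1 }, 'plain string') {
    printf "%s\n", is_array_ref($candidate) ? 'valid' : 'not valid';
}
```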

removeSpecialChars

description:

 Removes special characters from a word (non-ascii/non-letter characters)

input:

 $word -> word to be cleaned

output:

 $newWord -> $word without non-ascii/non-letter characters

example:

 @ws = map { removeSpecialChars($_) } @ws;
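
A minimal sketch of this kind of cleaning, assuming "special characters" means anything other than the ASCII letters A-Z and a-z. strip_non_letters is a hypothetical name, and the module's exact character class may differ.

```perl
use strict;
use warnings;

# Hypothetical cleaner: keep ASCII letters only.
sub strip_non_letters {
    my ($word) = @_;
    (my $clean = $word) =~ s/[^A-Za-z]//g;   # drop digits, punctuation, non-ASCII
    return $clean;
}

my @ws = map { strip_non_letters($_) } ("don't", "LDA-2003", "topic");
print join(' ', @ws), "\n";
```

Words made up entirely of removed characters become empty strings, so a caller may want to filter those out afterwards.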

beta

description:

 Randomly initializes beta values

input:

 None

output:

 None

example:

 beta();

stop

description:

 Stopword subroutine. Generates a regex that matches the words in the stopword list so that they can be removed

input:

 None

output:

 $stop_regex -> regex containing stopwords

example:

 my $stop = stop();
 my $regex = qr/($stop)/;
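
A stopword regex of this kind can be built by joining the words into one alternation. The sketch below assumes whole-word matching with \b and takes the words as arguments instead of reading a stoplist file; build_stop_regex is a hypothetical name, not the module's routine.

```perl
use strict;
use warnings;

# Hypothetical builder: one alternation, each word escaped and anchored
# at word boundaries so that "the" does not match inside "theory".
sub build_stop_regex {
    my @words = @_;
    return join '|', map { '\b' . quotemeta($_) . '\b' } @words;
}

my $stop  = build_stop_regex(qw(the of and));
my $regex = qr/($stop)/;

# Keep only the tokens that do not match a stopword.
my @tokens = grep { $_ !~ $regex } qw(the theory of topics);
print "@tokens\n";
```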

REFERENCING

    If you have a reference paper for this module put it here in bibtex form

CONTACT US

  If you have any trouble installing and using Algorithm::LDA,
  please contact us at:

      Bridget T. McInnes: btmcinnes at vcu.edu

AUTHORS

  Nick Jordan, Virginia Commonwealth University 

  Bridget McInnes, Virginia Commonwealth University

COPYRIGHT AND LICENSE

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.
