NAME

Algorithm::LDA

SYNOPSIS

 use Algorithm::LDA;
 
 my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");
 

DESCRIPTION

Algorithm::LDA is an implementation of Latent Dirichlet Allocation (LDA) in Perl.

add

description:

 Used to add to array of documents ($self->documents)

input:

 %args -> hash containing data

output:

 1

example:

 while (my $line = <$fh2>) {
    my $obj = decode_json($line);
    add(%$obj);
 }
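
The loop above can be sketched as a self-contained script. This is only an illustration: add_document is a hypothetical stand-in for add(), the @documents array stands in for $self->documents, and the "name" and "data" keys in the JSON records are assumed rather than taken from the module.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical stand-in for the module's document store ($self->documents).
my @documents;

# Stand-in for add(): stores the named arguments as one record and returns 1.
sub add_document {
    my (%args) = @_;
    push @documents, {%args};
    return 1;
}

# Two JSON-lines records; the "name" and "data" keys are assumptions.
my @lines = (
    '{"name":"doc1","data":["topic","model"]}',
    '{"name":"doc2","data":["gibbs","sampling"]}',
);

for my $line (@lines) {
    my $obj = decode_json($line);    # one JSON object per line
    add_document(%$obj);
}

printf "%d documents loaded\n", scalar @documents;
```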

init

description:

 Initializes alpha, initializes beta, loads documents, starts main loop

input:

 None

output:

 1

example:

 init();

printResults

description:

 Prints words in each topic, topics in each document, phi values, 
 and theta values to text files in the 'Results/$data' directory

input:

 None

output:

 None

example:

 printResults();

load

description:

 Loads documents from text files (in "data/$data") or JSON file (in "Documents")

input:

 None

output:

 None

example:

 load();

wordsPerTopic

description:

 Creates an array of words in each topic

input:

 %args -> hash containing topic

output:

 @words -> Array containing words and probabilities (phi value) for $args{topic}

example:

 my $words_on_topic = wordsPerTopic(topic => $topic);
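
As a rough illustration of the kind of result described above, the sketch below ranks the words of one topic by their phi value. The %phi table, its contents, and the words_for_topic name are hypothetical; the module's internal storage and return format may differ.

```perl
use strict;
use warnings;

# Hypothetical phi table: $phi{topic}{word} = P(word | topic).
my %phi = (
    0 => { gene   => 0.50, cell  => 0.30, protein => 0.20 },
    1 => { market => 0.60, stock => 0.40 },
);

# Return [word, phi] pairs for one topic, highest probability first.
# Ties are broken alphabetically so the order is deterministic.
sub words_for_topic {
    my (%args) = @_;
    my $row = $phi{ $args{topic} } or return;
    return map  { [ $_, $row->{$_} ] }
           sort { $row->{$b} <=> $row->{$a} || $a cmp $b }
           keys %$row;
}

my @words = words_for_topic(topic => 0);
printf "%-8s %.2f\n", @$_ for @words;
```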

topicsPerDocument

description:

 Creates an array of topics in each document

input:

 %args -> hash containing document

output:

 @topics -> Array containing topics and probabilities (theta value) for $args{document}

example:

 my $topics_on_document = topicsPerDocument(document => $doc);

sample_topic

description:

 Uses Gibbs Sampling to determine a topic given a document and word

input:

 $document -> ID of document word is in
 $word -> word that is to be evaluated

output:

 $topic -> topic ID
 $k -> last topic if topic can't be found

example:

 my $topic = $self->sample_topic($document, $word);
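
The sampling step can be sketched as a draw from per-topic weights, with the last topic as the fallback mentioned in the output above. This is a minimal sketch, not the module's code: draw_topic and the weights are hypothetical, and the random number is passed in as an argument only to make the function easy to check.

```perl
use strict;
use warnings;

# Draw a topic index from unnormalized weights by walking the cumulative sum.
# $rand is a number in [0, 1).
sub draw_topic {
    my ($weights, $rand) = @_;
    my $total = 0;
    $total += $_ for @$weights;
    my $threshold  = $rand * $total;
    my $cumulative = 0;
    for my $topic (0 .. $#$weights) {
        $cumulative += $weights->[$topic];
        return $topic if $threshold < $cumulative;
    }
    return $#$weights;    # fallback: last topic (guards against rounding)
}

# Assumed weights, one per topic (for example phi * theta for the word).
my @weights = (1, 2, 1);
my $topic = draw_topic(\@weights, rand());
print "sampled topic $topic\n";
```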

computePhi

description:

 Computes the expected phi value for a word given a topic ID

input:

 $topic -> ID of topic (iteration 0..$k)
 $word -> word that is to be evaluated

output:

 Phi value

example:

 $dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
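
The exact formula used by computePhi is not shown here. A common smoothed estimate in Gibbs-sampled LDA is sketched below, assuming count tables and a symmetric beta prior; phi_estimate and all of its arguments are hypothetical names.

```perl
use strict;
use warnings;

# Smoothed estimate of P(word | topic) from assumed counts:
#   (count of word in topic + beta) / (tokens in topic + vocabulary size * beta)
sub phi_estimate {
    my ($word_in_topic, $topic_total, $vocab_size, $beta) = @_;
    return ($word_in_topic + $beta) / ($topic_total + $vocab_size * $beta);
}

# Illustrative numbers only: the word was assigned to the topic 3 times,
# the topic holds 8 tokens in total, and the vocabulary has 4 words.
my $phi = phi_estimate(3, 8, 4, 0.5);
printf "phi = %.2f\n", $phi;    # prints "phi = 0.35"
```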

computeTheta

description:

 Computes the expected theta value for a topic given a document ID

input:

 $document -> ID of document
 $topic -> ID of topic (iteration 0..$k)

output:

 Theta value

example:

 $dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
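
A matching sketch for theta, again assuming count tables and a symmetric alpha prior rather than reproducing the module's code. theta_estimate and its arguments are hypothetical names; the example shows that the estimates for one document sum to 1 across topics.

```perl
use strict;
use warnings;

# Smoothed estimate of P(topic | document) from assumed counts:
#   (tokens of the document in topic + alpha) / (tokens in document + topics * alpha)
sub theta_estimate {
    my ($topic_in_doc, $doc_total, $num_topics, $alpha) = @_;
    return ($topic_in_doc + $alpha) / ($doc_total + $num_topics * $alpha);
}

# Illustrative counts: a 4-token document with 3 tokens in topic 0 and 1 in topic 1.
my @topic_counts = (3, 1);
my $doc_total    = 4;
my @theta = map { theta_estimate($_, $doc_total, scalar @topic_counts, 0.5) }
            @topic_counts;

printf "theta[%d] = %.2f\n", $_, $theta[$_] for 0 .. $#theta;
```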

increaseMap

description:

 Increments the hashmap entries for the given document, topic, and word

input:

 $document -> ID of document
 $topic -> ID of topic
 $word -> word in document $document

output:

 None

example:

 $self->increaseMap($data->{document}, $data->{topic}, $data->{word});

decreaseMap

description:

 Decrements the hashmap entries for the given document, topic, and word

input:

 $document -> ID of document
 $topic -> ID of topic
 $word -> word in document $document

output:

 None

example:

 $self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
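
The bookkeeping done by increaseMap and decreaseMap can be pictured as paired count updates around each reassignment of a word. The sketch below assumes four count tables; their names and the adjust_counts helper are hypothetical, since the module's hashmaps are not listed here.

```perl
use strict;
use warnings;

# Assumed count tables (the module's actual hashmap names are not documented).
my (%doc_topic, %topic_word, %topic_total, %doc_total);

# Add $delta (+1 or -1) to every count touched by one word assignment.
sub adjust_counts {
    my ($document, $topic, $word, $delta) = @_;
    $doc_topic{$document}{$topic} += $delta;
    $topic_word{$topic}{$word}    += $delta;
    $topic_total{$topic}          += $delta;
    $doc_total{$document}         += $delta;
    return;
}

adjust_counts('doc1', 0, 'gene', +1);    # plays the role of increaseMap
adjust_counts('doc1', 0, 'cell', +1);
adjust_counts('doc1', 0, 'gene', -1);    # plays the role of decreaseMap

print "tokens in topic 0: $topic_total{0}\n";
```

Applying the decrement before resampling a word and the increment afterwards keeps the tables consistent with the current assignments.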

valid

description:

 Returns whether or not $data is a valid array (able to be added to the dataset)

input:

 $data -> data to be evaluated

output:

 Boolean/Integer -> true/1 if $data is an array, false/0 otherwise

example:

 return unless (valid($args{data}));
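
One plausible way to perform such a check is to test for an array reference with ref(). The helper below is a hypothetical sketch, not the module's implementation.

```perl
use strict;
use warnings;

# Return 1 when $data is an array reference, 0 otherwise.
sub is_array_ref {
    my ($data) = @_;
    return ref($data) eq 'ARRAY' ? 1 : 0;
}

my @checks = map { is_array_ref($_) } ( [ 'topic', 'model' ], 'topic model', undef );
print "@checks\n";    # prints "1 0 0"
```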

removeSpecialChars

description:

 Removes special characters from a word (non-ASCII/non-letter characters)

input:

 $word -> word to be cleaned

output:

 $newWord -> $word without non-ASCII/non-letter characters

example:

 @ws = map { removeSpecialChars($_) } @ws;
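
A minimal sketch of this kind of cleaning, assuming that only the ASCII letters A-Z and a-z are kept. strip_non_letters is a hypothetical name, and the module's actual character class may differ.

```perl
use strict;
use warnings;

# Remove every character that is not an ASCII letter.
sub strip_non_letters {
    my ($word) = @_;
    (my $clean = $word) =~ s/[^A-Za-z]//g;
    return $clean;
}

my @ws = map { strip_non_letters($_) } ( "model,", "it's", "(LDA)" );
print "@ws\n";    # prints "model its LDA"
```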

beta

description:

 Randomly initializes beta values

input:

 None

output:

 None

example:

 beta();

stop

description:

 Stopword subroutine.  Generates a regex to remove words in a stopword list

input:

 None

output:

 $stop_regex -> regex containing stopwords

example:

 my $stop = stop();
 my $regex = qr/($stop)/;
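
The idea of a stopword regex can be sketched as follows, assuming the stopwords are already held in memory as a plain list. build_stop_pattern is a hypothetical helper, and the anchored, case-insensitive match is a design choice of this sketch rather than the module's documented behavior.

```perl
use strict;
use warnings;

# Assumed stopword list; in practice it would be read from the stoplist file.
my @stopwords = qw(the of and);

# Join the stopwords into one alternation, escaping regex metacharacters.
sub build_stop_pattern {
    my @words = @_;
    return join '|', map { quotemeta($_) } @words;
}

my $stop  = build_stop_pattern(@stopwords);
my $regex = qr/^(?:$stop)$/i;    # match whole tokens only, ignoring case

my @kept = grep { $_ !~ $regex } qw(The allocation of topics and words);
print "@kept\n";    # prints "allocation topics words"
```

Anchoring the pattern keeps a token such as "theory" from being dropped just because it contains "the".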

CONTACT US

  If you have any trouble installing and using Algorithm::LDA,
  please contact us via:

      Bridget T. McInnes: btmcinnes at vcu.edu

AUTHORS

  Nick Jordan, Virginia Commonwealth University 

  Bridget McInnes, Virginia Commonwealth University

COPYRIGHT AND LICENSE

Copyright 2016 by Bridget McInnes, Nicholas Jordan

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.