Algorithm::LDA
use Algorithm::LDA;
my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");
Algorithm::LDA is an implementation of Latent Dirichlet Allocation (LDA) in Perl.
description: Adds a document to the array of documents ($self->documents)
input: %args -> hash containing the document data
output: 1
example:
while (my $line = <$fh2>) {
    my $obj = decode_json($line);
    add(%$obj);
}
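The loop above reads one JSON object per line and hands each one to add(). The following is a minimal, self-contained sketch of the same pattern. It uses the core JSON::PP module for decode_json. The names add_document, load_json_lines and @documents, and the id/data fields in the usage, are hypothetical stand-ins and not the module's API.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # core module since Perl 5.14

# Hypothetical stand-in for the module's document array ($self->documents).
my @documents;

# Hypothetical sketch of add(): store one document passed as a flat hash, return 1.
sub add_document {
    my (%args) = @_;
    push @documents, {%args};
    return 1;
}

# Read one JSON object per line from a filehandle and add each one as a document.
# Returns the number of documents added.
sub load_json_lines {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        my $obj = decode_json($line);
        add_document(%$obj);
        $count++;
    }
    return $count;
}
```

An in-memory filehandle is convenient for trying this out, for example `open(my $fh, '<', \$json_text)`.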
description: Initializes alpha, initializes beta, loads the documents, and starts the main loop
input: None
example:
init();
description: Prints the words in each topic, the topics in each document, and the phi and theta values to text files in the 'Results/$data' directory
example:
printResults();
description: Loads documents from text files (in "data/$data") or from a JSON file (in "Documents")
example:
load();
description: Creates an array of the words in a given topic
input: %args -> hash containing the topic
output: @words -> array containing the words and their probabilities (phi values) for $args{topic}
example:
my @words_on_topic = wordsPerTopic(topic => $topic);
description: Creates an array of the topics in a given document
input: %args -> hash containing the document
output: @topics -> array containing the topics and their probabilities (theta values) for $args{document}
example:
my @topics_on_document = topicsPerDocument(document => $doc);
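wordsPerTopic and topicsPerDocument both return entries paired with a probability. The sketch below assumes the results are ranked from most to least probable, although the text above does not say so. The %phi and %theta tables, their layout, and the subroutine names are hypothetical. The sketch illustrates the idea and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical probability tables:
#   $phi{$topic}{$word}       = P(word | topic)
#   $theta{$document}{$topic} = P(topic | document)
my %phi   = (0 => { cat => 0.5, dog => 0.3, fish => 0.2 });
my %theta = (doc1 => { 0 => 0.7, 1 => 0.3 });

# Rank the keys of one table row by probability, highest first
# (ties broken alphabetically), returning [key, probability] pairs.
sub ranked_pairs {
    my ($row) = @_;
    return () unless $row;
    return map { [ $_, $row->{$_} ] }
           sort { $row->{$b} <=> $row->{$a} or $a cmp $b } keys %$row;
}

# Hypothetical sketches of wordsPerTopic() and topicsPerDocument().
sub words_per_topic     { my (%args) = @_; return ranked_pairs($phi{ $args{topic} }) }
sub topics_per_document { my (%args) = @_; return ranked_pairs($theta{ $args{document} }) }

my @words  = words_per_topic(topic => 0);
my @topics = topics_per_document(document => 'doc1');
print join(', ', map { "$_->[0]=$_->[1]" } @words),  "\n";   # cat=0.5, dog=0.3, fish=0.2
print join(', ', map { "$_->[0]=$_->[1]" } @topics), "\n";   # 0=0.7, 1=0.3
```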
description: Uses Gibbs sampling to determine a topic given a document and a word
input: $document -> ID of the document the word is in; $word -> word to be evaluated
output: $topic -> sampled topic ID, or $k (the last topic) if no topic can be found
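The draw itself can be sketched as an inverse-CDF sample over per-topic scores. The scores could be the phi * theta products that the $dist line in the computePhi example accumulates. The last-topic fallback mirrors the output described above and covers floating-point round-off. sample_topic and its optional $u argument are hypothetical, and the module may build its weights differently.

```perl
use strict;
use warnings;

# Hypothetical sketch: draw a topic index from unnormalized weights
# (one weight per topic, e.g. phi * theta). $u is a uniform draw in
# [0, 1); it defaults to rand() and can be fixed for testing.
sub sample_topic {
    my ($weights, $u) = @_;
    $u = rand() unless defined $u;
    my $k = $#$weights;                 # index of the last topic
    my $total = 0;
    $total += $_ for @$weights;
    my $target = $u * $total;
    my $cumulative = 0;
    for my $topic (0 .. $k) {
        $cumulative += $weights->[$topic];
        return $topic if $target < $cumulative;
    }
    return $k;                          # fallback: no topic selected (round-off)
}
```

With weights (1, 1, 2), for example, topic 2 is drawn about half of the time.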
description: Computes the expected phi value for a word given a topic ID
input: $topic -> ID of the topic (iterating over 0..$k); $word -> word to be evaluated
output: Phi value
example:
$dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
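For reference, the usual collapsed-Gibbs estimate of phi is shown below. This is the standard formulation, assumed here and not confirmed from the module's source. In it, n_{k,w} is the number of times word w is assigned to topic k, n_k is the total number of words assigned to topic k, V is the vocabulary size, and beta is treated as a single symmetric value (the module's randomly initialized beta values may differ). theta_{d,k} is the value returned by computeTheta, so the $dist line accumulates one term of the proportionality on the right for each topic.

```latex
\phi_{k,w} = \frac{n_{k,w} + \beta}{n_k + V\beta},
\qquad
p(z = k \mid d, w) \propto \phi_{k,w}\,\theta_{d,k}
```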
description: Computes the expected theta value for a topic given a document ID
input: $document -> ID of the document; $topic -> ID of the topic (iterating over 0..$k)
output: Theta value
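The matching standard estimate for theta is shown below. It is again an assumption about the formulation and not taken from the module's code. Here n_{d,k} is the number of words in document d assigned to topic k, n_d is the number of words in d, K is the number of topics, and alpha is a symmetric prior. During sampling, the counts usually exclude the word currently being resampled.

```latex
\theta_{d,k} = \frac{n_{d,k} + \alpha}{n_d + K\alpha}
```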
description: Increases the values in all of the hashmaps for the given document, topic, and word
input: $document -> ID of the document; $topic -> ID of the topic; $word -> word in document $document
example:
$self->increaseMap($data->{document}, $data->{topic}, $data->{word});
description: Decreases the values in all of the hashmaps for the given document, topic, and word
example:
$self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
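The "hashmaps" are presumably the count tables that Gibbs sampling keeps for each document, topic, and word. The sketch below assumes three such tables. Their names, their layout, and the rule that counts stop at zero are all hypothetical and do not reflect the module's actual fields.

```perl
use strict;
use warnings;

# Hypothetical count tables for collapsed Gibbs sampling.
my %doc_topic;    # $doc_topic{$document}{$topic}: words in $document assigned to $topic
my %topic_word;   # $topic_word{$topic}{$word}: times $word is assigned to $topic
my %topic_total;  # $topic_total{$topic}: all words assigned to $topic

# Hypothetical sketch of increaseMap(): record one (document, topic, word) assignment.
sub increase_map {
    my ($document, $topic, $word) = @_;
    $doc_topic{$document}{$topic}++;
    $topic_word{$topic}{$word}++;
    $topic_total{$topic}++;
    return 1;
}

# Hypothetical sketch of decreaseMap(): remove one assignment; counts never go below zero.
sub decrease_map {
    my ($document, $topic, $word) = @_;
    for my $ref (\$doc_topic{$document}{$topic},
                 \$topic_word{$topic}{$word},
                 \$topic_total{$topic}) {
        $$ref-- if ($$ref // 0) > 0;
    }
    return 1;
}
```

In a typical Gibbs sweep, the word's current assignment is removed with decrease_map, a new topic is sampled, and the new assignment is recorded with increase_map.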
description: Returns whether or not $data is a valid array (i.e., one that can be added to the dataset)
input: $data -> data to be evaluated
output: Boolean/Integer -> true/1 if $data is an array; false/0 if $data is not an array
example:
return unless (valid($args{data}));
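This sketch assumes that "valid array" means an array reference, which is what decode_json produces for a JSON list. Under that assumption the check reduces to Perl's built-in ref. The module's own check may also reject, for example, empty arrays.

```perl
use strict;
use warnings;

# Hypothetical sketch of valid(): 1 for an array reference, 0 for anything else
# (plain scalars, hash references, undef).
sub valid {
    my ($data) = @_;
    return (ref($data) eq 'ARRAY') ? 1 : 0;
}
```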
description: Removes special characters (non-ASCII and non-letter characters) from a word
input: $word -> word to be cleaned
output: $newWord -> $word without non-ASCII/non-letter characters
example:
@ws = map { removeSpecialChars($_) } @ws;
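Removing non-ASCII and non-letter characters means keeping only A-Z and a-z. A one-substitution sketch follows. The name remove_special_chars is hypothetical, and the sketch does not change letter case because the text above does not mention lowercasing.

```perl
use strict;
use warnings;

# Hypothetical sketch of removeSpecialChars(): keep ASCII letters only.
sub remove_special_chars {
    my ($word) = @_;
    (my $new_word = $word) =~ s/[^A-Za-z]//g;
    return $new_word;
}

my @ws = ("don't", "C3PO!", "caf\x{e9}");
@ws = map { remove_special_chars($_) } @ws;
print "@ws\n";   # dont CPO caf
```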
description: Randomly initializes the beta values
example:
beta();
description: Stopword subroutine; generates a regex that matches the words in a stopword list so that they can be removed
output: $stop_regex -> regex containing the stopwords
example:
my $stop = stop();
my $regex = qr/($stop)/;
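Building such a regex amounts to joining the escaped stopwords into one alternation. The sketch below takes the words as a list and does not read stoplist.txt. stop_pattern is a hypothetical name. Unlike the example above, the sketch anchors the pattern with \b and matches case-insensitively, so that "a" does not match inside "cat". The text does not say whether the module's pattern does the same.

```perl
use strict;
use warnings;

# Hypothetical sketch of stop(): join quotemeta-escaped stopwords into an alternation.
sub stop_pattern {
    my (@stopwords) = @_;
    return join '|', map { quotemeta($_) } @stopwords;
}

my $stop  = stop_pattern(qw(the a there));
my $regex = qr/\b(?:$stop)\b/i;   # whole words only, case-insensitive

my @tokens = grep { $_ !~ $regex } qw(The cat sat there on a mat);
print "@tokens\n";   # cat sat on mat
```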
If you have any trouble installing or using Algorithm::LDA, please contact us: Bridget T. McInnes, btmcinnes at vcu.edu
Additional modules associated with the package
Nick Jordan, Virginia Commonwealth University; Bridget McInnes, Virginia Commonwealth University
Copyright 2016 by Bridget McInnes, Nicholas Jordan
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
To install Algorithm::LDA, copy and paste the appropriate command into your terminal.
cpanm
cpanm Algorithm::LDA
CPAN shell
perl -MCPAN -e shell
install Algorithm::LDA
For more information on module installation, please visit the detailed CPAN module installation guide.