NAME

Lingua::EN::Opinion - Measure the emotional sentiment of text

VERSION

version 0.1702

SYNOPSIS

  use Lingua::EN::Opinion;

  # Positive/Negative:
  my $opinion = Lingua::EN::Opinion->new( file => '/some/file.txt', stem => 1 );
  $opinion->analyze();

  my $scores = $opinion->scores;

  my $ratio = $opinion->ratio(); # Knowns / ( Knowns + Unknowns )
  $ratio = $opinion->ratio(1); # Unknowns / ( Knowns + Unknowns )

  $scores = $opinion->averaged_scores(5);

  my $score = $opinion->get_word('foo');
  my ( $known, $unknown );
  my $sentence = 'Mary had a little lamb.';
  ( $score, $known, $unknown ) = $opinion->get_sentence($sentence);

  # NRC:
  $opinion = Lingua::EN::Opinion->new( text => "$sentence Its fleece was ..." );
  $opinion->nrc_analyze();

  $scores = $opinion->nrc_scores;

  $ratio = $opinion->ratio();
  $ratio = $opinion->ratio(1);

  $score = $opinion->nrc_get_word('happy');
  ( $score, $known, $unknown ) = $opinion->nrc_get_sentence($sentence);
  $score = $opinion->nrc_get_sentence($sentence);

  $opinion->set_word(foo => 1);
  $opinion->nrc_set_word(foo => { anger => 0, etc => '...' });

DESCRIPTION

A Lingua::EN::Opinion object measures the emotional sentiment of text and saves the results in the scores and nrc_scores attributes.

When run against the positive and negative classified training reviews in the dataset referenced under "SEE ALSO", this module performs moderately well. Out of 25k reviews, the eg/pos-neg program classifies about 70% correctly.

ATTRIBUTES

file

  $file = $opinion->file;

The text file to analyze.

text

  $text = $opinion->text;

A text string to analyze instead of a text file.

stem

  $stem = $opinion->stem;

Boolean flag to indicate that word stemming should take place.

For example, "horses" becomes "horse" and "hooves" becomes "hoof."

Stemming is the recommended way to use this module, but it takes considerably longer.

stemmer

  $stemmer = $opinion->stemmer;

Require the WordNet::QueryData and WordNet::stem modules to stem each word of the provided file or text.

* These modules must be installed and working to use this feature.

This is a computed result. A value provided in the constructor will be ignored.

sentences

  $sentences = $opinion->sentences;

Computed result. An array reference of every sentence in the analyzed text.

scores

  $scores = $opinion->scores;

Computed result. An array reference of the score of each sentence.

nrc_scores

  $scores = $opinion->nrc_scores;

Computed result. An array reference of hash references containing the NRC scores for each sentence.

positive

  $positive = $opinion->positive;

Computed result. The module that the "analyze" method uses for positive words.

negative

  $negative = $opinion->negative;

Computed result. The module that the "analyze" method uses for negative words.

emotion

  $emotion = $opinion->emotion;

Computed result. The module used to find the "nrc_sentiment".

familiarity

  $familiarity = $opinion->familiarity;

Computed result. Hash reference of total known and unknown words:

 { known => $x, unknown => $y }

METHODS

new

  $opinion = Lingua::EN::Opinion->new(
    file => $file,
    text => $text,
    stem => $stem,
  );

Create a new Lingua::EN::Opinion object.

analyze

  $scores = $opinion->analyze();

Measure the positive/negative emotional sentiment of text.

This method sets the familiarity, scores and sentences attributes.

averaged_score

Synonym for the "averaged_scores" method.

averaged_scores

  $scores = $opinion->averaged_scores($bins);

Compute the averaged scores given a number of (integer) bins.

Default: 10

This reduces the amount of "noise" in the original signal. As a result, some detail is lost.

For example, if there are 400 sentences, bins of 10 will result in 40 data points. Each point will be the mean of each successive bin-sized set of points in the analyzed scores.
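
The binning described above can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's own code: bin_means is a hypothetical helper, the scores are made up, and averaging a short final bin is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper, not part of Lingua::EN::Opinion:
# average each successive run of $bins scores.
sub bin_means {
    my ( $scores, $bins ) = @_;
    $bins ||= 10;    # Same default as documented above
    my @means;
    for ( my $i = 0; $i < @$scores; $i += $bins ) {
        my $end = $i + $bins - 1;
        # Assumption: a short final bin is still averaged
        $end = $#$scores if $end > $#$scores;
        my @chunk = @{$scores}[ $i .. $end ];
        push @means, sum(@chunk) / @chunk;
    }
    return \@means;
}

# 400 made-up sentence scores cycling through -1, 0 and 1
my @scores = map { ( $_ % 3 ) - 1 } 0 .. 399;
my $means  = bin_means( \@scores, 10 );
print scalar(@$means), " data points\n";    # 400 / 10 = 40
```

Larger bins give a smoother curve with fewer points, so the bin size is a trade-off between readability and detail.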

nrc_sentiment

Synonym for the "nrc_analyze" method.

nrc_analyze

  $scores = $opinion->nrc_analyze();

Compute the NRC sentiment of the given text.

This is given by a 0/1 list of these 10 emotional elements:

  anger
  anticipation
  disgust
  fear
  joy
  negative
  positive
  sadness
  surprise
  trust

This method sets the familiarity, nrc_scores and sentences attributes.

get_word

  $sentiment = $opinion->get_word($word);

Get the positive/negative sentiment for a given word. Return undef, 1 or -1 for "does not exist", "is positive" or "is negative", respectively.

set_word

  $opinion->set_word($word => $value);

Set the positive/negative sentiment for a given word as 1, -1 or undef.

nrc_get_word

  $sentiment = $opinion->nrc_get_word($word);

Get the NRC emotional sentiment for a given word. Return a hash reference of the NRC emotions as detailed in "nrc_analyze". If the word does not exist, return undef.

nrc_set_word

  $opinion->nrc_set_word($word => $value);

Set the NRC emotional sentiment for a given word.

The value must be given as a hash reference with any of the keys detailed in the "nrc_analyze" method.

get_sentence

  ( $score, $known, $unknown ) = $opinion->get_sentence($sentence);
  ( $score, $known, $unknown ) = $opinion->get_sentence( $sentence, $known, $unknown );

Return the integer value for the sum of the word scores of the given sentence. Also return known and unknown values for the number of familiar words.

The known and unknown arguments refer to the "familiarity" and are incremented by this routine.
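
As a rough illustration of the summing and counting described above, here is a minimal sketch. It assumes word values of 1 and -1 (as in "set_word"), a tiny made-up lexicon, and a hypothetical score_sentence helper. It is not the module's implementation.

```perl
use strict;
use warnings;

# Made-up lexicon: 1 = positive, -1 = negative (assumed values)
my %lexicon = ( good => 1, happy => 1, bad => -1 );

# Hypothetical helper, not the module's own code
sub score_sentence {
    my ( $sentence, $known, $unknown ) = @_;
    $known   //= 0;    # Running totals start at zero unless passed in
    $unknown //= 0;
    my $score = 0;
    # Lower-case, then drop punctuation and digits
    ( my $clean = lc $sentence ) =~ s/[[:punct:]\d]//g;
    for my $word ( split ' ', $clean ) {
        if ( defined $lexicon{$word} ) {
            $score += $lexicon{$word};
            $known++;
        }
        else {
            $unknown++;
        }
    }
    return ( $score, $known, $unknown );
}

my ( $score, $known, $unknown )
    = score_sentence('A good day, a happy day, a bad end.');
print "score=$score known=$known unknown=$unknown\n";

# Passing the counts back in keeps a running total across sentences
( $score, $known, $unknown ) = score_sentence( 'Good!', $known, $unknown );
print "score=$score known=$known unknown=$unknown\n";
```

Only the counts accumulate between calls. The score is computed fresh for each sentence.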

nrc_get_sentence

  ( $score, $known, $unknown ) = $opinion->nrc_get_sentence($sentence);
  ( $score, $known, $unknown ) = $opinion->nrc_get_sentence( $sentence, $known, $unknown );

Return the summed NRC emotion values for each word of the given sentence as a hash reference. Also return known and unknown values for the number of familiar words.
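
The per-emotion summing can be pictured with the sketch below. The lexicon entries are invented for illustration and are not taken from the NRC data, nrc_sum is a hypothetical helper, and treating missing keys as 0 is an assumption.

```perl
use strict;
use warnings;

my @emotions = qw(
    anger anticipation disgust fear joy
    negative positive sadness surprise trust
);

# Made-up entries in the style accepted by nrc_set_word
my %nrc = (
    happy => { anticipation => 1, joy => 1, positive => 1, trust => 1 },
    lost  => { negative => 1, sadness => 1 },
);

# Hypothetical helper, not the module's own code
sub nrc_sum {
    my ($sentence) = @_;
    my %total = map { $_ => 0 } @emotions;
    my ( $known, $unknown ) = ( 0, 0 );
    ( my $clean = lc $sentence ) =~ s/[[:punct:]\d]//g;
    for my $word ( split ' ', $clean ) {
        my $entry = $nrc{$word};
        if ($entry) {
            $known++;
            # Assumption: an emotion key missing from an entry counts as 0
            $total{$_} += $entry->{$_} // 0 for @emotions;
        }
        else {
            $unknown++;
        }
    }
    return ( \%total, $known, $unknown );
}

my ( $total, $known, $unknown ) = nrc_sum('Happy but lost.');
print join( ', ', map {"$_=$total->{$_}"} @emotions ), "\n";
print "known=$known unknown=$unknown\n";
```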

ratio

Return the ratio of either the known or the unknown words to the total number of known and unknown words.

Default: 0

If the method is given 1 as an argument, the ratio of unknown words is returned. Otherwise, the ratio of known words is returned.
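
The arithmetic is small enough to show directly. The sketch below uses a hypothetical familiarity_ratio helper and made-up counts in the shape of the "familiarity" attribute. The guard against an empty total is an assumption, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented formula
sub familiarity_ratio {
    my ( $familiarity, $flag ) = @_;
    my $total = $familiarity->{known} + $familiarity->{unknown};
    return 0 unless $total;    # Assumption: avoid dividing by zero
    my $numerator
        = $flag ? $familiarity->{unknown} : $familiarity->{known};
    return $numerator / $total;
}

my $familiarity = { known => 3, unknown => 1 };    # Made-up counts
printf "known: %.2f, unknown: %.2f\n",
    familiarity_ratio($familiarity),
    familiarity_ratio( $familiarity, 1 );    # known: 0.75, unknown: 0.25
```

The two ratios always sum to 1 when at least one word has been counted, so a low known ratio signals that the score rests on few recognized words.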

tokenize

  @words = $opinion->tokenize($sentence);

Drop punctuation and digits, then split the sentence by whitespace and return the resulting lower-cased "word" list.
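
A minimal sketch of these steps, assuming ASCII punctuation and using a hypothetical simple_tokenize function rather than the method itself:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the tokenize method
sub simple_tokenize {
    my ($sentence) = @_;
    # Drop punctuation and digits
    ( my $clean = $sentence ) =~ s/[[:punct:]\d]//g;
    # Split on whitespace and lower-case each word
    return map { lc } split ' ', $clean;
}

my @words = simple_tokenize('Mary had 1 little lamb.');
print join( ' ', @words ), "\n";    # mary had little lamb
```

In this sketch an apostrophe is removed like any other punctuation, so "It's" becomes "its".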

SEE ALSO

The eg/ and t/ scripts

Moo

File::Slurper

Lingua::EN::Sentence

Statistics::Lite

Try::Tiny

WordNet::QueryData and WordNet::stem for stemming

https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon

http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

https://ology.github.io/2018/03/04/book-of-revelation-sentiment-analysis/ is a write-up using this technique.

https://ai.stanford.edu/~amaas/data/sentiment/ is the "Large Movie Review Dataset"

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2022 by Gene Boggs.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.