NAME

Text::Summarize - Routine to compute summaries of text.

SYNOPSIS

  use strict;
  use warnings;
  use Text::Summarize;
  use Data::Dump qw(dump);
  my $listOfSentences = [
    { id => 0, listOfTokens => [qw(all people are equal)] },
    { id => 1, listOfTokens => [qw(all men are equal)] },
    { id => 2, listOfTokens => [qw(all are equal)] },
  ];
  dump getSumbasicRankingOfSentences(listOfSentences => $listOfSentences);

DESCRIPTION

Text::Summarize contains a routine that scores a list of sentences for inclusion in a summary of the text. It uses the SumBasic algorithm from the report "Beyond SumBasic: Task-Focused Summarization with Sentence Simplification and Lexical Expansion" by L. Vanderwende, H. Suzuki, C. Brockett, and A. Nenkova.

ROUTINES

`getSumbasicRankingOfSentences`

  use Text::Summarize;
  use Data::Dump qw(dump);
  my $listOfSentences = [
    { id => 0, listOfTokens => [qw(all people are equal)] },
    { id => 1, listOfTokens => [qw(all men are equal)] },
    { id => 2, listOfTokens => [qw(all are equal)] },
  ];
  dump getSumbasicRankingOfSentences(listOfSentences => $listOfSentences);

getSumbasicRankingOfSentences computes the SumBasic score of each sentence in the list provided. It returns an array reference containing the pairs [id, score] sorted in descending order of score, where id is the sentence identifier from listOfSentences.
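For readers unfamiliar with SumBasic, the following is a minimal sketch of the basic selection loop, assuming frequency-based token weights and the squared update rule. The helper name sumbasic_rank_sketch is hypothetical and is not part of Text::Summarize; the module's actual implementation may differ in tie-breaking, filtering, and ordering of the result.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Summarize: a plain SumBasic loop.
# Returns [id, score] pairs in the order the sentences were selected.
sub sumbasic_rank_sketch {
    my ($sentences) = @_;

    # Initial token weight: relative frequency over all tokens.
    my (%weight, $total);
    for my $sentence (@$sentences) {
        for my $token (@{ $sentence->{listOfTokens} }) {
            $weight{$token}++;
            $total++;
        }
    }
    return [] unless $total;
    $_ /= $total for values %weight;

    my %remaining = map { $_->{id} => $_->{listOfTokens} } @$sentences;
    my @ranking;
    while (%remaining) {
        # Score each remaining sentence by the average weight of its tokens.
        my ($bestId, $bestScore);
        for my $id (sort keys %remaining) {
            my $tokens = $remaining{$id};
            my $sum    = 0;
            $sum += $weight{$_} for @$tokens;
            my $score = @$tokens ? $sum / scalar(@$tokens) : 0;
            ($bestId, $bestScore) = ($id, $score)
                if !defined $bestScore || $score > $bestScore;
        }
        push @ranking, [$bestId, $bestScore];

        # Square the weight of each distinct token in the selected sentence,
        # so that tokens already covered contribute less to later picks.
        my %seen;
        for my $token (@{ delete $remaining{$bestId} }) {
            next if $seen{$token}++;
            $weight{$token} **= 2;
        }
    }
    return \@ranking;
}

my $ranking = sumbasic_rank_sketch([
    { id => 0, listOfTokens => [qw(all people are equal)] },
    { id => 1, listOfTokens => [qw(all men are equal)] },
    { id => 2, listOfTokens => [qw(all are equal)] },
]);
printf "%s\t%.4f\n", @$_ for @$ranking;
```

In this sketch, ties are broken by the string order of the ids, which is an arbitrary choice made only to keep the output deterministic.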

listOfSentences

 listOfSentences => [{id => '..', listOfTokens => [...]}, ..., {id => '..', listOfTokens => [...]}]

listOfSentences holds the list of sentences that are to be scored. Each item in the list is a hash reference of the form {id => '..', listOfTokens => [...]} where id is a unique identifier for the sentence and listOfTokens is an array reference of the list of tokens comprising the sentence.

tokenWeight

 tokenWeight => {}

tokenWeight is an optional hash reference that provides the weight of the tokens defined in listOfSentences. If tokenWeight is defined but has no entry for a token in a sentence, then the token's weight defaults to zero unless ignoreUndefinedTokens is true, in which case the token is ignored and not used to compute the average weight of the sentences containing it. If tokenWeight is undefined, then the weights of the tokens are either their frequency of occurrence in the filtered text, or their textranks if textRankParameters is defined.

ignoreUndefinedTokens

 ignoreUndefinedTokens => 0

If ignoreUndefinedTokens is true, then any tokens for which tokenWeight is undefined are ignored and not used to compute the average weight of a sentence; the default is false.
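The interaction between tokenWeight and ignoreUndefinedTokens can be illustrated with a small sketch of the averaging rule described above. The helper average_token_weight_sketch is hypothetical and not part of the module; returning zero for a sentence with no counted tokens is an assumption made here.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented rule; not the module's code.
sub average_token_weight_sketch {
    my ($tokens, $tokenWeight, $ignoreUndefinedTokens) = @_;
    my ($sum, $count) = (0, 0);
    for my $token (@$tokens) {
        my $weight = $tokenWeight->{$token};
        if (!defined $weight) {
            next if $ignoreUndefinedTokens;  # token is left out of the average
            $weight = 0;                     # otherwise it counts with weight zero
        }
        $sum += $weight;
        $count++;
    }
    # Assumption: a sentence with no counted tokens gets an average of zero.
    return $count ? $sum / $count : 0;
}

my %tokenWeight = (all => 0.2, equal => 0.4);
my @tokens      = qw(all men are equal);

# ignoreUndefinedTokens false: (0.2 + 0 + 0 + 0.4) / 4
print average_token_weight_sketch(\@tokens, \%tokenWeight, 0), "\n";
# ignoreUndefinedTokens true: (0.2 + 0.4) / 2
print average_token_weight_sketch(\@tokens, \%tokenWeight, 1), "\n";
```

With ignoreUndefinedTokens true, sentences made up mostly of unweighted tokens are no longer penalized for them, which can change the ranking noticeably.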

tokenWeightUpdateFunction

 tokenWeightUpdateFunction => \&subroutine (currentTokenWeight, initialTokenWeight, token, selectedSentenceId, selectedSentenceWeight)

tokenWeightUpdateFunction is an optional parameter for defining the function that updates the weight of a token when it is contained in a selected sentence. Five parameters are passed to the subroutine: the token's current weight (float), the token's initial weight (float), the token (string), the id of the selected sentence (string), and the current average weight of the tokens in the selected sentence (float). The default is tokenWeightUpdateFunction_Squared.
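As an illustration of the five-argument interface, here is a sketch of a custom update rule that halves a token's weight each time it appears in a selected sentence. The name tokenWeightUpdateFunction_Halved and the halving rule are hypothetical; the commented call assumes the parameter accepts a code reference.

```perl
use strict;
use warnings;

# Hypothetical update rule: halve the token's weight on each selection.
# The arguments follow the order listed in the documentation above.
sub tokenWeightUpdateFunction_Halved {
    my ($currentTokenWeight, $initialTokenWeight, $token,
        $selectedSentenceId, $selectedSentenceWeight) = @_;
    return $currentTokenWeight / 2;
}

# Direct call with made-up values, to show the calling convention.
print tokenWeightUpdateFunction_Halved(0.25, 0.25, 'equal', '2', 0.27), "\n";

# Assumed usage with the module (not run here):
# getSumbasicRankingOfSentences(
#     listOfSentences           => $listOfSentences,
#     tokenWeightUpdateFunction => \&tokenWeightUpdateFunction_Halved,
# );
```

A gentler decay such as halving penalizes repeated tokens less sharply than squaring when weights are small probabilities.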

textRankParameters

  textRankParameters => undef

If textRankParameters is defined, then the token weights are computed using Text::Categorize::Textrank. The parameters to use for Text::Categorize::Textrank, excluding the listOfTokens parameters, can be set using the hash reference defined by textRankParameters. For example, textRankParameters => {directedGraph => 1} would make the textrank weights be computed using a directed token graph.

`tokenWeightUpdateFunction_Squared`

Returns the token's current weight squared.

`tokenWeightUpdateFunction_Multiplicative`

Returns the token's current weight times its initial weight.

`tokenWeightUpdateFunction_Sentence`

Returns the token's current weight times the average weight of the tokens in the selected sentence.
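The three update rules can be compared side by side with the sketch below. The *_sketch subroutines are hypothetical re-implementations written from the one-line descriptions above, not the module's own code, and the input values are made up.

```perl
use strict;
use warnings;

# Hypothetical re-implementations based only on the descriptions above.
sub squared_sketch {
    my ($current) = @_;
    return $current**2;
}

sub multiplicative_sketch {
    my ($current, $initial) = @_;
    return $current * $initial;
}

sub sentence_sketch {
    my ($current, $initial, $token, $sentenceId, $sentenceWeight) = @_;
    return $current * $sentenceWeight;
}

# Made-up inputs: current weight, initial weight, token, sentence id,
# and average token weight of the selected sentence.
my @args = (0.25, 0.5, 'equal', '2', 0.4);
printf "squared:        %.4f\n", squared_sketch(@args);
printf "multiplicative: %.4f\n", multiplicative_sketch(@args);
printf "sentence:       %.4f\n", sentence_sketch(@args);
```

On a token's first selection, its current and initial weights are equal, so the squared and multiplicative rules should agree; they diverge only on later selections.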

INSTALLATION

Use CPAN to install the module and all its prerequisites:

  perl -MCPAN -e shell
  >install Text::Summarize

BUGS

Please email bug reports or feature requests to bug-text-summarize@rt.cpan.org, or submit them through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Summarize. The author will be notified, and you can be automatically notified of progress on the bug fix or feature request.

AUTHOR

 Jeff Kubina <jeff.kubina@gmail.com>

COPYRIGHT

The full text of the license can be found in the LICENSE file included with this module.

KEYWORDS

information processing, summary, summaries, summarization, summarize, sumbasic, textrank
