NAME

Text::Summarize::En - Routine to summarize English text.

SYNOPSIS

use strict;
use warnings;
use Text::Summarize::En;
use Data::Dump qw(dump);
my $summarizerEn = Text::Summarize::En->new();
my $text         = 'All people are equal. All men are equal. All are equal.';
dump $summarizerEn->getSummaryUsingSumbasic(listOfText => [$text]);

DESCRIPTION

Text::Summarize::En contains routines for ranking the sentences of English text for inclusion in a summary using the sumBasic algorithm.
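To give a feel for what sentence ranking with sumBasic involves, here is a minimal, self-contained sketch of the algorithm as it is commonly described: token probabilities are relative frequencies, a sentence is scored by the mean probability of its tokens, and after a sentence is picked the probabilities of its tokens are squared. This is not the module's implementation. The helper names split_sentences, tokenize, and sumbasic_rank are hypothetical, and the sketch tokenizes naively instead of using the stemming and part-of-speech filtering the module performs.

```perl
use strict;
use warnings;

# Naive sentence splitter (assumption: sentences end in ., ! or ?).
sub split_sentences {
    my ($text) = @_;
    return grep { /\w/ } split /(?<=[.!?])\s+/, $text;
}

# Naive tokenizer: lowercase word characters only, no stemming or POS filtering.
sub tokenize {
    my ($sentence) = @_;
    return [ map { lc $_ } $sentence =~ /(\w+)/g ];
}

# Takes a list of token array refs (one per sentence) and returns a list of
# [sentenceIndex, score] pairs in the order the sentences were selected.
sub sumbasic_rank {
    my (@tokenLists) = @_;

    # Initial token probability = relative frequency over all tokens.
    my (%prob, $total);
    for my $tokens (@tokenLists) {
        $prob{$_}++ for @$tokens;
        $total += @$tokens;
    }
    return () unless $total;
    $_ /= $total for values %prob;

    my @remaining = grep { scalar @{ $tokenLists[$_] } } 0 .. $#tokenLists;
    my @ranking;
    while (@remaining) {
        # Score each remaining sentence by the mean probability of its tokens.
        my ($best, $bestScore);
        for my $i (@remaining) {
            my $tokens = $tokenLists[$i];
            my $score  = 0;
            $score += $prob{$_} for @$tokens;
            $score /= @$tokens;
            ($best, $bestScore) = ($i, $score)
                if !defined $bestScore || $score > $bestScore;
        }
        push @ranking, [ $best, $bestScore ];
        @remaining = grep { $_ != $best } @remaining;

        # Down-weight tokens already covered by squaring their probability.
        my %seen;
        $prob{$_} **= 2 for grep { !$seen{$_}++ } @{ $tokenLists[$best] };
    }
    return @ranking;
}

my @sentences = split_sentences('All people are equal. All men are equal. All are equal.');
for my $entry (sumbasic_rank(map { tokenize($_) } @sentences)) {
    my ($index, $score) = @$entry;
    printf "%.3f  %s\n", $score, $sentences[$index];
}
```

With the SYNOPSIS text, this sketch selects "All are equal." first, because all of its tokens are high-frequency; the two remaining sentences tie and are taken in their original order.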

CONSTRUCTOR

`new`

The method new creates an instance of the Text::Summarize::En class with the following parameters:

endingSentenceTag

endingSentenceTag => 'PP'

endingSentenceTag is the part-of-speech tag that should be used to indicate the end of a sentence. The default is 'PP'. The value of this tag must be a tag generated by the module Lingua::EN::Tagger.

listOfPOSTypesToKeep

listOfPOSTypesToKeep => [qw(CONTENT_WORDS)]

The sumBasic algorithm preprocesses the text so that only certain parts-of-speech (POS) are retained and used to rank the sentences. The module Lingua::EN::Tagger is used to tag the parts-of-speech of the text. The parts-of-speech retained can be specified by word types, where the types are any combination of 'ALL', 'ADJECTIVES', 'ADVERBS', 'CONTENT_ADVERBS', 'CONTENT_WORDS', 'NOUNS', 'PUNCTUATION', 'TEXTRANK_WORDS', and 'VERBS'. The default is [qw(CONTENT_WORDS)], which is equivalent to [qw(CONTENT_ADVERBS VERBS ADJECTIVES NOUNS)].

listOfPOSTagsToKeep

listOfPOSTagsToKeep => [...]

listOfPOSTagsToKeep provides finer control over the parts-of-speech to be retained when filtering the tagged text. For a list of all possible tags, call getListOfPartOfSpeechTags().
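As an illustration of the constructor parameters, the sketch below builds a summarizer that keeps only nouns and verbs. It uses only the parameter names and values documented above. The wrapper build_summarizer is a hypothetical helper, and the module is loaded at run time only so that a missing installation is reported instead of aborting compilation.

```perl
use strict;
use warnings;

# Hypothetical helper: construct a summarizer restricted to nouns and verbs.
sub build_summarizer {
    require Text::Summarize::En;    # assumes the module is installed
    return Text::Summarize::En->new(
        endingSentenceTag    => 'PP',                 # the documented default
        listOfPOSTypesToKeep => [qw(NOUNS VERBS)],    # instead of CONTENT_WORDS
    );
}

my $summarizerEn = eval { build_summarizer() };
if ($summarizerEn) {
    print "summarizer ready\n";
}
else {
    print "could not construct summarizer: $@";
}
```

Note that qw() separates its items with whitespace, so the type names must not be separated by commas.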

METHODS

`getSummaryUsingSumbasic`

getSummaryUsingSumbasic computes the summary of text using the sumBasic algorithm.

listOfStemmedTaggedSentences

listOfStemmedTaggedSentences => [...]

listOfStemmedTaggedSentences is an array reference containing the list of stemmed and part-of-speech tagged sentences from Text::StemTagPOS. If listOfStemmedTaggedSentences is not defined, then the text to be processed should be provided via listOfText.

listOfText

listOfText => [...]

listOfText is an array reference containing the strings of text to be summarized. listOfText is only used if listOfStemmedTaggedSentences is undefined.

tokenWeight

tokenWeight => {}

tokenWeight is an optional hash reference that can provide the weights for the tokens provided by listOfStemmedTaggedSentences or listOfText. If tokenWeight is not defined, then the weight of a token is just its frequency of occurrence in the filtered text. If textRankParameters is defined, then the token weights are computed using Text::Categorize::Textrank.

textRankParameters

textRankParameters => undef

If textRankParameters is defined, then the token weights for the sumBasic algorithm are computed using Text::Categorize::Textrank. The parameters to use for Text::Categorize::Textrank, excluding the listOfTokens parameters, can be set using the hash reference defined by textRankParameters. For example, textRankParameters => {directedGraph => 1} would make the textrank weights be computed using a directed token graph.
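The following sketch shows one way to call getSummaryUsingSumbasic with textrank-based token weights, using only the parameters documented above. The wrapper summarize_with_textrank is a hypothetical helper, and because the structure of the returned value is not described in this document, the result is simply dumped with the core module Data::Dumper.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: summarize the given strings with textrank token weights.
sub summarize_with_textrank {
    my (@texts) = @_;
    require Text::Summarize::En;    # assumes the module is installed
    my $summarizerEn = Text::Summarize::En->new();
    return $summarizerEn->getSummaryUsingSumbasic(
        listOfText         => \@texts,
        textRankParameters => { directedGraph => 1 },    # directed token graph
    );
}

my $text   = 'All people are equal. All men are equal. All are equal.';
my @result = eval { summarize_with_textrank($text) };
if ($@) {
    print "summary not computed: $@";
}
else {
    print Dumper(\@result);
}
```

Omitting textRankParameters in the call falls back to the frequency-based weights described under tokenWeight.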

INSTALLATION

Use CPAN to install the module and all its prerequisites:

perl -MCPAN -e shell
cpan> install Text::Summarize::En

BUGS

Please email bug reports or feature requests to bug-text-summarize@rt.cpan.org, or submit them through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Summarize. The author will be notified, and you can be automatically notified of progress on the bug fix or feature request.

AUTHOR

Jeff Kubina <jeff.kubina@gmail.com>

COPYRIGHT

The full text of the license can be found in the LICENSE file included with this module.

KEYWORDS

information processing, summary, summaries, summarization, summarize, sumbasic, textrank
