NAME

Lingua::EN::Sentence - Module for splitting text into sentences.

SYNOPSIS

        use Lingua::EN::Sentence qw( get_sentences add_acronyms );

        add_acronyms(('lt','gen'));             ## adding support for 'Lt. Gen.'
        my $sentences=get_sentences($text);     ## Get the sentences.
        foreach my $sentence (@$sentences) {
                ## do something with $sentence
        }

DESCRIPTION

The Lingua::EN::Sentence module provides the function get_sentences, which splits text into its constituent sentences using a regular expression and a list of abbreviations (both built-in and user-supplied).

Certain well-known exceptions, such as abbreviations, may cause incorrect segmentation. Some of them are already integrated into this code and are handled correctly. If you find other words that cause get_sentences() to split in the wrong place, you can add them to the module's abbreviation list so that it recognizes them.
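The general idea described above can be illustrated with a minimal, self-contained sketch. This is not the module's actual implementation: the helper name split_sentences and its regular expressions are assumptions made for illustration only. It breaks the text at whitespace that follows '.', '?' or '!', then rejoins any piece that ends in a known abbreviation.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; NOT part of Lingua::EN::Sentence.
# Takes the text and a list of abbreviations (without the trailing period).
sub split_sentences {
    my ($text, @abbrevs) = @_;
    my %is_abbrev = map { lc($_) => 1 } @abbrevs;

    # Candidate breaks: whitespace that follows '.', '?' or '!'.
    my @pieces = split /(?<=[.?!])\s+/, $text;

    my @sentences;
    my $buffer = '';
    for my $piece (@pieces) {
        $buffer = $buffer eq '' ? $piece : "$buffer $piece";

        # A piece ending in "<abbreviation>." is not a sentence end,
        # so keep accumulating into the same sentence.
        next if $piece =~ /\b(\w+)\.$/ && $is_abbrev{ lc $1 };

        push @sentences, $buffer;
        $buffer = '';
    }
    push @sentences, $buffer if $buffer ne '';

    # Trim each sentence and drop strings without alphanumeric characters.
    my @clean;
    for my $s (@sentences) {
        $s =~ s/^\s+//;
        $s =~ s/\s+$//;
        push @clean, $s if $s =~ /[[:alnum:]]/;
    }
    return \@clean;
}

my $text = 'Dr. Smith met Mr. Jones at Acme Inc. yesterday. '
         . 'They talked for hours! Was it useful?';
print "$_\n" for @{ split_sentences($text, qw( dr mr inc )) };
```

Without 'dr', 'mr' and 'inc' in the abbreviation list, the same text would be cut after each of those words, which is the kind of incorrect segmentation that the abbreviation list is meant to prevent.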

FUNCTIONS

All functions used should be requested in the 'use' clause. None is exported by default.

get_sentences( $text )

The get_sentences function takes a scalar containing ASCII text as an argument and returns a reference to an array of the sentences that the text has been split into. Leading and trailing whitespace is trimmed from each returned sentence. Strings that contain no alphanumeric characters are not returned as sentences.

add_acronyms( @acronyms )

This function adds acronyms that are not already supported by this code. See the 'Acronym/Abbreviations list' section below for the abbreviations this module already supports.

get_acronyms( )

This function will return the defined list of acronyms.

set_acronyms( @my_acronyms )

This function replaces the predefined acronym list with the given list.
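The acronym functions above can be combined to extend the list temporarily and then restore it. The following is a sketch only: it assumes the module is installed and that get_acronyms returns a flat list, as the description above suggests. The sample text is made up for illustration.

```perl
use strict;
use warnings;
use Lingua::EN::Sentence qw( get_sentences add_acronyms get_acronyms set_acronyms );

# Save the current list (assumed to come back as a flat list).
my @original = get_acronyms();

# Temporarily add support for 'Lt.' and 'Gen.'.
add_acronyms('lt', 'gen');

my $sentences = get_sentences('Lt. Gen. Smith arrived. He spoke briefly.');
print "$_\n" for @$sentences;

# Put the list back the way it was.
set_acronyms(@original);
```

Saving and restoring the list this way keeps a one-off addition from affecting later calls to get_sentences in the same program.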

Acronym/Abbreviations list

Currently supported acronym lists are:

        PEOPLE ( 'jr', 'mr', 'mrs', 'ms', 'dr', 'prof' )
        INSTITUTES ( 'dept', 'univ' )
        COMPANIES ( 'inc', 'ltd' )
        MISC ( 'vs', 'etc', 'no' )

If I come across a good general-purpose list, I'll incorporate it into this module. Feel free to suggest such lists.

SEE ALSO

        Text::Sentence

AUTHOR

Shlomo Yona <Shlomo.Yona@Siftology.com>

COPYRIGHT

Copyright (c) 2001 Siftology Inc. All rights reserved.

This library is free software. You can redistribute it and/or modify it under the same terms as Perl itself.
