NAME

Lingua::ZH::Summarize - Summarizing bodies of Chinese text

SYNOPSIS

    use Lingua::ZH::Summarize;

    print summarize( $text );                    # Easy, no? :-)
    print summarize( $text, maxlength => 500 );  # 500-byte summary
    print summarize( $text, wrap => 75 );        # Wrap output to 75 col.

DESCRIPTION

This is a simple module that makes an unscientific effort at summarizing Chinese text. It recognizes simple patterns that look like statements, abridges them, and concatenates them into something vaguely resembling a summary. At the moment it seems to work decently on small inputs, but it needs more work on large bodies of text.
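The process described above has three steps: split the text into statement-like units, abridge each one, and concatenate the results. The Perl sketch below illustrates that pipeline under its own assumptions. The helper names (`split_statements`, `looks_like_statement`, `abridge`, `naive_summarize`), the 6-character threshold, and the rule of dropping full-width parenthetical remarks are all hypothetical and are not taken from Lingua::ZH::Summarize's source. The sketch also assumes the input is a Perl character string, not raw bytes.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Split text into candidate statements: runs of characters that end in a
# full-width sentence terminator (。！？). Assumed rule, for illustration.
sub split_statements {
    my ($text) = @_;
    my @parts = $text =~ /([^。！？]+[。！？])/g;
    for (@parts) { s/^\s+// }    # trim leading whitespace and newlines
    return @parts;
}

# Hypothetical filter: treat a unit as "statement-like" only if it has at
# least $min_chars characters, which drops fragments such as "好。".
sub looks_like_statement {
    my ($s, $min_chars) = @_;
    return length($s) >= $min_chars;
}

# Hypothetical abridging rule: remove remarks in full-width parentheses.
sub abridge {
    my ($s) = @_;
    $s =~ s/（[^）]*）//g;
    return $s;
}

# Split, filter, abridge, then concatenate whatever survives.
sub naive_summarize {
    my ($text, $min_chars) = @_;
    $min_chars //= 6;
    return join '',
        map  { abridge($_) }
        grep { looks_like_statement($_, $min_chars) }
        split_statements($text);
}

my $text = "好。今天（星期一）天氣很好。我們去公園散步吧！";
print naive_summarize($text), "\n";
```

With this sample input, the one-character fragment is filtered out and the parenthetical is dropped, so the sketch prints 今天天氣很好。我們去公園散步吧！. The real module's patterns are likely more elaborate.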

Lingua::ZH::Summarize exports one function, summarize(), which takes the text to summarize as its first argument, followed by any number of optional directives in name => value form. The options it accepts are:

maxlength

Specifies the maximum length, in bytes, of the generated summary.

wrap

Pretty-prints the summary by wrapping it to the specified number of columns. This requires the Lingua::ZH::Wrap module.
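The two options can be sketched as post-processing steps applied to an already generated summary. In this hypothetical Perl sketch, `truncate_bytes` enforces a maxlength-style limit on the UTF-8 byte length without cutting a multi-byte character in half. `wrap_columns` is a simple stand-in for the wrapping that Lingua::ZH::Wrap performs, and it assumes that every non-ASCII character occupies two columns. `apply_options` and all helper names are illustrative only. The module's actual encoding handling and wrapping rules may differ.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
binmode STDOUT, ':encoding(UTF-8)';

# Keep whole characters until the UTF-8 encoding would exceed $max_bytes,
# so a multi-byte character is never split.
sub truncate_bytes {
    my ($text, $max_bytes) = @_;
    my ($out, $used) = ('', 0);
    for my $ch (split //, $text) {
        my $n = length encode('UTF-8', $ch);
        last if $used + $n > $max_bytes;
        $out  .= $ch;
        $used += $n;
    }
    return $out;
}

# Assumed display width: ASCII is 1 column, everything else is 2 columns.
sub char_width { return ord($_[0]) < 0x80 ? 1 : 2 }

# Hard-wrap to $cols columns, breaking between any two characters
# (workable for Chinese, which has no spaces between words).
sub wrap_columns {
    my ($text, $cols) = @_;
    my @lines = ('');
    my $width = 0;
    for my $ch (split //, $text) {
        my $w = char_width($ch);
        if ($width > 0 && $width + $w > $cols) {
            push @lines, '';
            $width = 0;
        }
        $lines[-1] .= $ch;
        $width += $w;
    }
    return join "\n", @lines;
}

# Hypothetical option handling modeled on summarize()'s
# maxlength => ... and wrap => ... directives.
sub apply_options {
    my ($summary, %opt) = @_;
    $summary = truncate_bytes($summary, $opt{maxlength}) if defined $opt{maxlength};
    $summary = wrap_columns($summary, $opt{wrap})        if defined $opt{wrap};
    return $summary;
}

my $summary = "今天天氣很好。我們去公園散步吧！";
print apply_options($summary, maxlength => 21, wrap => 8), "\n";
```

Under these assumptions, the final call prints 今天天氣 and 很好。 on two lines. A limit of 21 bytes keeps seven three-byte characters, and 8 columns fit four double-width characters per line. The sketch truncates before wrapping so that the inserted newlines do not count toward the byte limit. Whether the real module uses the same order is an assumption.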

Needless to say, this is a very simple scheme and far from universally effective, but it's good enough for a first draft, and I'll bang on it more later. As I said, it's not a scientific approach to the problem, but it's better than nothing.

SEE ALSO

Lingua::ZH::Toke, Lingua::ZH::Wrap, Lingua::EN::Summarize

ACKNOWLEDGEMENTS

Algorithm adapted from the Lingua::EN::Summarize module by Dennis Taylor, <dennis@funkplanet.com>.

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

Copyright 2003 by Autrijus Tang <autrijus@autrijus.org>.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html