Lingua::ZH::Summarize - Summarizing bodies of Chinese text


    use Lingua::ZH::Summarize;

    print summarize( $text );                    # Easy, no? :-)
    print summarize( $text, maxlength => 500 );  # 500-byte summary
    print summarize( $text, wrap => 75 );        # Wrap output to 75 col.


This module makes a simple, unscientific attempt at summarizing Chinese text. It recognizes simple patterns that look like statements, abridges them, and concatenates them into something vaguely resembling a summary. It has a decent effect on small inputs at the moment, but needs more work on large bodies of text.

Lingua::ZH::Summarize exports one function, summarize(), which takes the text to summarize as its first argument, followed by any number of optional directives in name => value form. The options it accepts are:


maxlength

Specifies the maximum length, in bytes, of the generated summary.
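Because the maxlength limit is counted in bytes rather than characters, the number of Chinese characters that fit depends on how the text is encoded. The sketch below is only an illustration using the core Encode module; the sample string is made up, and which encoding Lingua::ZH::Summarize actually counts bytes in is an assumption you should check against your own input.

```perl
use strict;
use warnings;
use utf8;                      # the literal below is written in UTF-8
use Encode qw(encode);

# Hypothetical sample sentence, for illustration only.
my $text = "今天天氣很好。";

# Number of characters in the decoded string.
my $chars = length $text;

# Number of bytes the same string occupies in two common encodings.
my $utf8_bytes = length encode('UTF-8',     $text);
my $big5_bytes = length encode('big5-eten', $text);

printf "%d characters, %d bytes as UTF-8, %d bytes as Big5\n",
    $chars, $utf8_bytes, $big5_bytes;
```

So a byte limit such as maxlength => 500 corresponds to noticeably fewer characters than 500, and to fewer still when the text is UTF-8 rather than Big5.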


wrap

Pretty-prints the summary output by wrapping it to the number of columns you specify. This requires the Lingua::ZH::Wrap module.

Needless to say, this is a very simple scheme and not a universally effective one, but it's good enough for a first draft, and I'll bang on it more later. As noted above, it's not a scientific approach to the problem, but it's better than nothing.


See also: Lingua::ZH::Toke, Lingua::ZH::Wrap, Lingua::EN::Summarize


Algorithm adapted from the Lingua::EN::Summarize module by Dennis Taylor.


Autrijus Tang


Copyright 2003 by Autrijus Tang.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.