NAME

Lingua::Word::Parser - Parse a word into scored known and unknown parts

VERSION

version 0.0804

SYNOPSIS

 use Data::Dumper;
 use Lingua::Word::Parser;

 # With a database source:
 my $p = Lingua::Word::Parser->new(
    word   => 'abioticaly',
    dbname => 'fragments',
    dbuser => 'akbar',
    dbpass => 's3kr1+',
 );

 # With a file source:
 $p = Lingua::Word::Parser->new(
    word => 'abioticaly',
    file => 'eg/lexicon.dat',
 );

 my $known  = $p->knowns;
 my $combos = $p->power;
 my $score  = $p->score;    # Stringified output
 #my $score  = $p->score_parts; # "Raw" output

 # The best guess is the last sorted scored set:
 print Dumper $score->{ [ sort keys %$score ]->[-1] };

DESCRIPTION

A Lingua::Word::Parser object breaks a word into known affixes and unknown remainders, and scores the resulting combinations.

A word-part lexicon file must have "regular-expression definition" lines of the form:

 a(?=\w)        opposite
 ab(?=\w)       away
 (?<=\w)o(?=\w) combining
 (?<=\w)tic     possessing

Please see the included eg/lexicon.dat example file.
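As an illustration of the line format above, here is a minimal sketch that reads such lines into a hash of compiled patterns and definitions. It assumes the pattern and the definition are separated by whitespace. The helper name parse_lexicon_lines and the hash layout are hypothetical and are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "regex definition" lines into a lookup hash.
# Assumption: the pattern contains no whitespace and the definition is
# everything after the first run of whitespace.
sub parse_lexicon_lines {
    my @lines = @_;
    my %lex;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                  # skip blank lines
        my ($re, $defn) = split ' ', $line, 2;     # pattern, then the rest
        $defn = '' unless defined $defn;
        $defn =~ s/\s+$//;                         # trim trailing whitespace
        $lex{$re} = { defn => $defn, re => qr/$re/ };
    }
    return \%lex;
}

my $lex = parse_lexicon_lines(
    'a(?=\w)        opposite',
    'ab(?=\w)       away',
    '(?<=\w)o(?=\w) combining',
    '(?<=\w)tic     possessing',
);

# Report which lexicon entries occur in the example word.
for my $re (sort keys %$lex) {
    my $hit = 'abioticaly' =~ $lex->{$re}{re} ? 'matches' : 'no match';
    print "$re ($lex->{$re}{defn}): $hit\n";
}
```

The lookarounds in the patterns, such as (?=\w), restrict where a part may occur: a prefix must be followed by another word character, and a suffix must be preceded by one.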

A database lexicon must have records as above, stored in columns named affix and definition. Please see the included eg/word_part.sql example file.

METHODS

new()

  $x = Lingua::Word::Parser->new(%arguments);

Create a new Lingua::Word::Parser object.

Arguments and defaults:

  word:   undef
  dbuser: undef
  dbpass: undef
  dbname: undef
  dbtype: mysql
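  dbhost: localhost

As the SYNOPSIS shows, a file argument naming a lexicon file can be given in place of the database arguments.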

knowns()

 my $known = $p->knowns;

Find the known word parts and their bitstring masks.
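To make the idea of a bitstring mask concrete, the following sketch marks the positions that a matched part covers with 1 and all other positions with 0. It is not the module's implementation. The helper name masks_for and the choice of one mask per match are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return one 0/1 mask string per match of $re in $word.
# Each mask is as long as the word; covered positions are set to 1.
sub masks_for {
    my ($word, $re) = @_;
    my @masks;
    while ($word =~ /$re/g) {
        my ($start, $end) = ($-[0], $+[0]);    # offsets of the whole match
        next if $end == $start;                # ignore zero-length matches
        my $mask = '0' x length($word);
        substr($mask, $start, $end - $start) = '1' x ($end - $start);
        push @masks, $mask;
    }
    return @masks;
}

my $word = 'abioticaly';
for my $re (qr/a(?=\w)/, qr/ab(?=\w)/, qr/(?<=\w)tic/) {
    printf "%-14s %s\n", $re, join ' ', masks_for($word, $re);
}
```

In this sketch the pattern a(?=\w) yields two masks for abioticaly, because the letter a is followed by a word character at two positions.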

power()

 my $combos = $p->power();

Find the set of non-overlapping known word parts by considering the power set of all masks.
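The power-set step can be pictured with a small sketch: enumerate every non-empty subset of the masks and keep the subsets in which no character position is claimed twice. The helper names non_overlapping and combos are hypothetical, the masks are plain 0/1 strings chosen for illustration, and the module itself may represent and combine masks differently.

```perl
use strict;
use warnings;

# Hypothetical helper: true if no position is set in more than one mask.
# Relies on Perl's bitwise string operators: '1' & '1' is '1', and any
# combination involving '0' ANDs to '0'.
sub non_overlapping {
    my @masks = @_;
    return 1 unless @masks;
    my $seen = '0' x length($masks[0]);
    for my $mask (@masks) {
        return 0 if ($seen & $mask) =~ /1/;    # a position is already taken
        $seen |= $mask;                        # remember covered positions
    }
    return 1;
}

# Hypothetical helper: every non-empty subset of masks without overlaps.
sub combos {
    my @masks = @_;
    my @ok;
    for my $bits (1 .. 2 ** scalar(@masks) - 1) {
        # Bit $_ of the counter decides whether mask $_ is in the subset.
        my @subset = map { $masks[$_] } grep { $bits & (1 << $_) } 0 .. $#masks;
        push @ok, \@subset if non_overlapping(@subset);
    }
    return @ok;
}

# Masks for "a", "ab" and "tic" in the word "abioticaly".
my @ok = combos('1000000000', '1100000000', '0000111000');
print join(' + ', @$_), "\n" for @ok;
```

Enumerating a power set grows as 2 to the number of masks, so this brute-force approach only suits words with a modest number of known parts.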

score()

  $score = $p->score();
  $score = $p->score( $open_separator, $close_separator );

Score the known vs unknown word part combinations into ratios of characters and chunks, word familiarity, partitions and definitions.

This method sets the score member to a list of hashrefs with keys:

  partition
  definition
  score
  familiarity

If not given, the $open_separator and $close_separator are '<' and '>' by default.
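One plausible reading of the separators is that they wrap each known part inside the partition string. The sketch below shows that reading only. The helper name render_partition, the 0/1 mask strings, and the exact output layout are assumptions and may not match what score() produces.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap the characters covered by each mask in separators.
# Assumptions: masks do not overlap and each holds one contiguous run of 1s.
sub render_partition {
    my ($word, $masks, $open, $close) = @_;
    $open  = '<' unless defined $open;
    $close = '>' unless defined $close;

    # Map the start offset of each known part to its length.
    my %span;
    for my $mask (@$masks) {
        $span{ $-[0] } = $+[0] - $-[0] if $mask =~ /1+/;
    }

    my ($out, $i) = ('', 0);
    while ($i < length($word)) {
        if (my $len = $span{$i}) {
            $out .= $open . substr($word, $i, $len) . $close;    # known part
            $i += $len;
        }
        else {
            $out .= substr($word, $i, 1);                        # unknown character
            $i++;
        }
    }
    return $out;
}

my @masks = ('1000000000', '0000111000');    # "a" and "tic"
print render_partition('abioticaly', \@masks), "\n";
print render_partition('abioticaly', \@masks, '[', ']'), "\n";
```

Passing different separators can be useful when angle brackets would clash with the output medium, for example in HTML.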

score_parts()

  $score_parts = $p->score_parts();
  $score_parts = $p->score_parts( $open_separator, $close_separator );
  $score_parts = $p->score_parts( $open_separator, $close_separator, $line_terminator );

Score the known vs unknown word part combinations into ratios of characters and chunks, word familiarity, partitions and definitions, as score() does, but return the "raw" output rather than the stringified form.

If not given, the $open_separator and $close_separator are '<' and '>' by default.

The $line_terminator can be any string, such as a newline (\n) or an HTML line break, but is the empty string ('') by default.

SEE ALSO

Lingua::TokenParse - The predecessor of this module.

http://en.wikipedia.org/wiki/Affix is the tip of the iceberg...

https://github.com/ology/Word-Part - A friendly Dancer user interface.

The t/* and eg/* files in this distribution!

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by Gene Boggs.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.