NAME

Lingua::NATools::Dict - Perl extension to encapsulate Dict interface

SYNOPSIS

  use Lingua::NATools::Dict;

  $dic = Lingua::NATools::Dict::open("file.bin");

  $dic->save($filename);
  $dic->close;

  $dic->add($dic2);

  $dic->size();

  $dic->exists($id);
  $dic->occ($id);
  $dic->vals($id);

  $dic->for_each( sub{ ... } );

DESCRIPTION

The Dict files (with extension .bin) created by NATools are mappings from identifiers of words in one corpus to identifiers of words in another corpus. Thus, all operations performed by this module use identifiers instead of words.

You can open the dictionary using

  $dic = Lingua::NATools::Dict::open("dic.bin");

Then, all operations are available as methods, in an OO fashion. After using the dictionary, do not forget to close it using

  $dic->close();

The add method receives a dictionary object and merges it into the current contents. Notice that both dictionaries need to be congruent with respect to word identifiers. After adding, do not forget to save the result, if you wish, with

  $dic->save("new.dic.bin");
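The merge-and-save step described above can be sketched as a small helper. The name merge_and_save is hypothetical and not part of the module. It only calls the add and save methods documented here, and it assumes both objects came from Lingua::NATools::Dict::open and share the same word identifiers.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::NATools::Dict):
# merge $other into $dic, then write the result to $outfile.
# Both objects are assumed to come from Lingua::NATools::Dict::open
# and to be built over the same word identifiers.
sub merge_and_save {
    my ($dic, $other, $outfile) = @_;
    $dic->add($other);       # add the contents of $other to $dic
    $dic->save($outfile);    # write the merged dictionary to disk
    return $dic;
}
```

It could be called as merge_and_save($dic, $dic2, "new.dic.bin"), followed by close on both objects.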

The size method returns the total number of words in the corpus (the sum of all word occurrences). To get the number of occurrences of a specific word, use the occ method, passing the word identifier as its parameter.

To check whether an identifier exists in the dictionary, use the exists method, which returns a boolean value.
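As an illustration of how size, occ and exists fit together, the following sketch computes the share of the corpus taken up by one word. The helper name relative_frequency is hypothetical. It takes any object that offers the three methods described above.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of all word occurrences that belong
# to the word with identifier $id (0 if the identifier is unknown).
sub relative_frequency {
    my ($dic, $id) = @_;
    return 0 unless $dic->exists($id);   # unknown identifier
    my $total = $dic->size();            # sum of all word occurrences
    return 0 unless $total;              # guard against an empty corpus
    return $dic->occ($id) / $total;
}
```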

The vals method returns a hash of probable translations for the supplied identifier AS AN ARRAY REFERENCE. The hash has the identifiers of the possible translations as keys, and the probability of each being a translation as values.
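A possible way to consume the result of vals is sketched below. The helper name best_translation is hypothetical, and the exact shape of the returned value is an assumption: the code accepts either a hash reference or an array reference holding alternating identifier and probability pairs, since the description above allows both readings.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the most probable translation of $id.
# Assumption: vals returns either a hash reference or an array
# reference of alternating (identifier, probability) pairs.
sub best_translation {
    my ($dic, $id) = @_;
    my $vals = $dic->vals($id) or return;
    my %prob = ref($vals) eq 'HASH' ? %$vals : @$vals;
    # Sort translation identifiers by descending probability.
    my ($best) = sort { $prob{$b} <=> $prob{$a} } keys %prob;
    return unless defined $best;
    return ($best, $prob{$best});
}
```

In list context the helper returns the translation identifier and its probability, or an empty list when no translation is available.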

Finally, the for_each method lets you iterate over every word in the dictionary. It receives a function reference as its argument.

  $dic->for_each( sub{ ... } );

Each time the function is called, the following is passed as @_:

  word => $id , occ => $occ , vals => $vals

where $id is the word identifier, $occ the result of calling occ with that word, and $vals is the result of calling vals with that word.
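The callback convention above can be sketched as follows: the argument list is copied into a hash and the word, occ and vals keys are read from it. The helper name frequent_words and its $min threshold are hypothetical and only serve as an example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the identifiers of all words whose
# occurrence count is at least $min, using the for_each callback.
sub frequent_words {
    my ($dic, $min) = @_;
    my @ids;
    $dic->for_each(sub {
        my %arg = @_;    # word => $id, occ => $occ, vals => $vals
        push @ids, $arg{word} if $arg{occ} >= $min;
    });
    return @ids;
}
```

The identifiers come back in whatever order for_each visits the words.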

SEE ALSO

See perl(1) and NATools documentation.

AUTHOR

Alberto Manuel Brandao Simoes, <albie@alfarrabio.di.uminho.pt>

COPYRIGHT AND LICENSE

Copyright 2002-2012 by NATURA Project http://natura.di.uminho.pt

This library is free software; you can redistribute it and/or modify it under the GNU General Public License 2, which you should find in the parent directory. Distribution of this module should include the whole NATools package, with the respective copyright notice.