NAME

OpenOffice::Wordlist - Read/write OpenOffice.org wordlists

SYNOPSIS

This module allows reading and writing of OpenOffice.org wordlists (dictionaries).

For example:

use OpenOffice::Wordlist;

my $dict = OpenOffice::Wordlist->new;
$dict->read(".openoffice.org/3/user/wordlist/standard.dic");

# Print all words.
foreach my $word ( @{ $dict->words } ) {
    print $word, "\n";
}

# Add some words.
$dict->append( "openoffice", "great" );

# Write a new dictionary.
$dict->write("new.dic");

When used as a program, this module reads all dictionaries given on the command line and writes the resulting list of words to standard output. For example:

$ perl OpenOffice/Wordlist.pm standard.dic
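A Perl module file can double as a program by checking caller at file scope, an idiom often called a "modulino". The sketch below shows that idiom in isolation. The package name My::WordlistTool and its run method are hypothetical, and whether OpenOffice::Wordlist is structured exactly this way is an assumption.

```perl
#!/usr/bin/perl
# Minimal "modulino" sketch: a file that works both as a module
# (when loaded with use/require) and as a program (when run directly).
# The package name and the body of run() are hypothetical.
package My::WordlistTool;

use strict;
use warnings;

# Hypothetical entry point: print each command-line argument on its own line.
# A real tool would read each dictionary file instead.
sub run {
    my ( $class, @files ) = @_;
    print "$_\n" for @files;
    return scalar @files;    # number of files processed
}

# At file scope, caller() is false only when this file is the main program,
# so run() is invoked for "perl My/WordlistTool.pm ..." but not for "use My::WordlistTool".
__PACKAGE__->run(@ARGV) unless caller;

1;
```

Run directly, as in perl My/WordlistTool.pm a.dic b.dic, the file calls run; loaded with use, it only defines the package.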

METHODS

$dict = OpenOffice::Wordlist->new( [ type => 'WBSWG6', language => 2057, neg => 0 ] )

Creates a new dict object.

Optional arguments:

type => 'WBSWG6' or 'WBSWG2' or 'WBSWG5'.

'WBSWG6' (default) indicates a UTF-8 encoded dictionary; the others indicate an ISO-8859-1 encoded dictionary.
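The type-to-encoding mapping can be made concrete with the core Encode module. This is a minimal sketch: the helper word_bytes and the %ENCODING_FOR_TYPE table are hypothetical, not part of OpenOffice::Wordlist, and only illustrate how the same word turns into different bytes depending on the dictionary type.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encoding implied by each dictionary type, as described above:
# WBSWG6 is UTF-8, WBSWG2 and WBSWG5 are ISO-8859-1.
my %ENCODING_FOR_TYPE = (
    WBSWG6 => 'UTF-8',
    WBSWG5 => 'iso-8859-1',
    WBSWG2 => 'iso-8859-1',
);

# Convert a Perl character string into the bytes a dictionary of the
# given type would store. Hypothetical helper, for illustration only.
sub word_bytes {
    my ( $type, $word ) = @_;
    my $enc = $ENCODING_FOR_TYPE{$type}
        or die "Unknown dictionary type '$type'\n";
    return encode( $enc, $word );
}
```

For example, word_bytes('WBSWG6', "caf\x{e9}") returns 5 bytes, while word_bytes('WBSWG2', "caf\x{e9}") returns 4.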

language => code

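The numeric code for the language. These appear to be the same language identifiers (LCIDs) that Microsoft Windows uses, so Microsoft's list of locale identifiers is a likely reference for other languages. Some values, determined experimentally: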

 255   All
1031   German (Germany)
1033   English (USA)
1036   French (France)
1043   Dutch (Netherlands)
2057   English (UK)
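Assuming these codes are Windows-style language identifiers (LCIDs), each one packs a primary language into its low 10 bits and a sub-language (usually the region) into the remaining high bits. The hypothetical helper below splits a code accordingly; for example, 2057 is 0x0809, which gives primary language 0x09 (English) and sub-language 2.

```perl
use strict;
use warnings;

# Assumption: language codes are Windows-style language identifiers,
# where the low 10 bits hold the primary language and the high bits
# hold the sub-language (region). Hypothetical helper name.
sub split_language_code {
    my ($code) = @_;
    my $primary = $code & 0x3FF;    # e.g. 0x09 = English, 0x13 = Dutch
    my $sub     = $code >> 10;      # region variant within that language
    return ( $primary, $sub );
}

for my $code ( 1031, 1036, 1043, 2057 ) {
    my ( $primary, $sub ) = split_language_code($code);
    printf "%4d = 0x%04X  primary 0x%02X  sub %d\n", $code, $code, $primary, $sub;
}
```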

neg => 0 or 1

Whether the dictionary contains exceptions (neg = 1) or regular words (neg = 0).

If language and neg are not specified, they are taken from the first file read, if any.

$dict->read( $file )

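Reads the contents of the indicated file into the dictionary. Returns the dictionary object, so the call can be chained, as in OpenOffice::Wordlist->new->read($file).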

$dict->append( @words )

Appends a list of words to the dictionary. To avoid unpleasant surprises, the words must be Perl character strings (that is, in Perl's internal encoding, not raw encoded bytes).

The arguments may be plain strings or references to arrays of strings, such as the reference returned by words.
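The two argument forms accepted by append can be flattened into a single list with map. This sketch, using the hypothetical helper flatten_words, also shows decoding raw UTF-8 bytes into a character string first, as append expects; it does not call OpenOffice::Wordlist itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Flatten arguments of the kind append() accepts: plain strings are kept,
# array references are expanded. Hypothetical helper, not part of the module.
sub flatten_words {
    return map { ref($_) eq 'ARRAY' ? @$_ : $_ } @_;
}

# Words obtained as raw UTF-8 bytes (e.g. read from a file opened without
# an encoding layer) must be decoded to character strings first.
my $raw   = "na\xc3\xafve";             # UTF-8 bytes for "naïve"
my $word  = decode( 'UTF-8', $raw );    # character string of length 5
my @words = flatten_words( $word, [ 'openoffice', 'great' ], 'extra' );
print scalar(@words), " words\n";       # prints "4 words"
```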

$dict->words

Returns a reference to the list of words in the dictionary.

The words are encoded in Perl's internal encoding.
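Because words returns character strings, printing them safely needs an output encoding layer. A minimal sketch; the helper print_words and the sample words are hypothetical. The in-memory handle at the end only serves to show the bytes that are actually written.

```perl
use strict;
use warnings;

# Print character strings, one per line, to the given handle.
# Hypothetical helper; the handle is expected to have an encoding layer.
sub print_words {
    my ( $fh, @words ) = @_;
    print {$fh} "$_\n" for @words;
}

my @words = ( "caf\x{e9}", "na\x{ef}ve", "\x{20ac}uro" );

# Without this layer, the euro sign (above U+00FF) would trigger a
# "Wide character in print" warning.
binmode STDOUT, ':encoding(UTF-8)';
print_words( \*STDOUT, @words );

# The same output captured in memory, to inspect the encoded bytes.
open my $mem, '>:encoding(UTF-8)', \my $bytes
    or die "Cannot open in-memory handle: $!";
print_words( $mem, @words );
close $mem;
printf "%d bytes\n", length $bytes;    # 6 + 7 + 7 = 20 bytes
```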

$dict->write( $file [ , $type ] )

Writes the contents of the object to a new dictionary.

Arguments: the name of the file to be written and, optionally, the type of the file to be written (one of 'WBSWG6', 'WBSWG5', 'WBSWG2'), overriding the type of the dictionary as established when it was created.
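When writing with type 'WBSWG2' or 'WBSWG5', words containing characters outside ISO-8859-1 cannot be stored as-is. How write handles such words is not described here, so a cautious caller might check first. The helper non_latin1_words below is hypothetical.

```perl
use strict;
use warnings;

# ISO-8859-1 covers exactly the code points U+0000..U+00FF, so any word
# containing a higher code point needs a UTF-8 (WBSWG6) dictionary.
sub non_latin1_words {
    return grep { /[^\x00-\xFF]/ } @_;
}

my @words = ( "caf\x{e9}", "\x{20ac}uro", "great" );
my @bad   = non_latin1_words(@words);

# Use UTF-8 only when some word does not fit in ISO-8859-1.
my $type = @bad ? 'WBSWG6' : 'WBSWG2';
print "type $type, ", scalar(@bad), " word(s) outside ISO-8859-1\n";
```

The chosen $type could then be passed as the second argument of write.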

EXAMPLE

This example reads all dictionaries supplied on the command line, merges them, and writes a new dictionary.

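use OpenOffice::Wordlist;

# Start from the first dictionary; its settings are kept.
my $dict = OpenOffice::Wordlist->new( type => 'WBSWG6' );
$dict->read( shift );

# Append the words of each remaining dictionary.
foreach ( @ARGV ) {
  my $extra = OpenOffice::Wordlist->new->read($_);
  $dict->append( $extra->words );
}
$dict->write("new.dic");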

Settings such as the language and the exceptions flag (neg) are copied from the file that is read first.
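The documentation does not say whether append skips words that are already present. If the merged dictionaries overlap, one option is to remove duplicates yourself; the hypothetical merge_unique helper below keeps the first occurrence of each word and preserves order.

```perl
use strict;
use warnings;

# Merge several word lists (given as array references), keeping the
# first occurrence of each word. Hypothetical helper, not part of the module.
sub merge_unique {
    my %seen;
    return grep { !$seen{$_}++ } map { @$_ } @_;
}

my @merged = merge_unique( [qw(great openoffice)], [qw(openoffice wordlist)] );
print "@merged\n";    # prints "great openoffice wordlist"
```

Duplicates are detected by exact string comparison, so words differing only in case are kept separately.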

AUTHOR

Johan Vromans, <jv at cpan.org>

BUGS

Currently, dictionary type arguments are not validated.
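Until the module validates type arguments itself, a caller can check them before passing them to new or write. A minimal sketch; checked_type and %VALID_TYPE are hypothetical helpers, not part of the module.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Dictionary types named in this documentation.
my %VALID_TYPE = map { $_ => 1 } qw(WBSWG2 WBSWG5 WBSWG6);

# Return the type unchanged if it is known, otherwise die with a message.
sub checked_type {
    my ($type) = @_;
    return $type if defined $type && $VALID_TYPE{$type};
    croak sprintf "Invalid dictionary type '%s' (expected WBSWG2, WBSWG5 or WBSWG6)",
        defined $type ? $type : 'undef';
}
```

For example, checked_type('WDSWG6') dies with a clear message, whereas the module itself would accept the misspelled value unchecked.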

Please report any bugs or feature requests to bug-openoffice-wordlist at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=OpenOffice-Wordlist. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc OpenOffice::Wordlist



COPYRIGHT & LICENSE

Copyright 2010 Johan Vromans, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.