Text::Shoebox::Lexicon - an object-oriented interface to Shoebox lexicons


  use Text::Shoebox::Lexicon;
  my $lex = Text::Shoebox::Lexicon->read_file( "haida.sf" );
  my @entries = $lex->entries;
  print "See, it has ", scalar( @entries ), " entries!\n";


An object of class Text::Shoebox::Lexicon represents an SF-format lexicon. This mostly just means it's a container for a list of Text::Shoebox::Entry objects, which represent the entries in this lexicon.

This class (plus Text::Shoebox::Entry) exists basically to provide an OO interface around Text::Shoebox -- but you're free to directly use Text::Shoebox instead if you prefer a functional interface.


$lex = Text::Shoebox::Lexicon->new;

This method returns a new Text::Shoebox::Lexicon object, containing an empty list of entries.

$lex->read_file( $filespec );

This reads entries from $filespec (e.g., "./whatever.sf") into $lex. If $filespec doesn't exist or isn't readable, then this dies.
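Since read_file dies on failure, a caller that wants to recover can trap the exception with eval. What follows is only a minimal sketch: the file name "haida.sf" is a placeholder, and the exact wording of the error message is whatever the module happens to put in $@.

```perl
use strict;
use warnings;
use Text::Shoebox::Lexicon;

# Placeholder path; substitute your own lexicon file.
my $path = "haida.sf";

# eval returns undef if read_file dies, leaving the reason in $@.
my $lex = eval { Text::Shoebox::Lexicon->read_file($path) };

if ($lex) {
    my @entries = $lex->entries;
    print "Read ", scalar(@entries), " entries from $path\n";
}
else {
    warn "Could not read $path: $@";
}
```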

$lex = Text::Shoebox::Lexicon->read_file( $filespec );

This constructs a new lexicon object and reads entries from $filespec into it. I.e., it's basically a shortcut for:

  $lex = Text::Shoebox::Lexicon->new;
  $lex->read_file( $filespec );

$lex->read_handle( $filehandle );

$lex = Text::Shoebox::Lexicon->read_handle( $filehandle );

These work just like read_file except that the argument should be a filehandle instead of a filespec string.

$lex->write_file( $filespec );

This writes the entries from $lex to the given filespec. If they can't be written, this dies.

$lex->write_handle( $filehandle );

This works just like write_file except that the argument should be a filehandle instead of a filespec string.
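The read and write methods combine naturally into a round trip. The sketch below assumes two placeholder file names and assumes that write_handle accepts an ordinary lexical filehandle opened for writing; everything is wrapped in eval because any of these calls may die.

```perl
use strict;
use warnings;
use Text::Shoebox::Lexicon;

# Placeholder paths; substitute your own.
my ($in, $out_a, $out_b) = ("haida.sf", "copy_a.sf", "copy_b.sf");

my $ok = eval {
    my $lex = Text::Shoebox::Lexicon->read_file($in);

    # Write by filespec: the module opens and closes the file itself.
    $lex->write_file($out_a);

    # Write by handle: we open and close the file ourselves.
    open my $fh, '>', $out_b or die "Can't open $out_b for writing: $!";
    $lex->write_handle($fh);
    close $fh or die "Can't close $out_b: $!";

    1;    # signal success to the enclosing eval
};

warn "Round trip failed: $@" unless $ok;
```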


$lex->dump;

This prints (not returns!) a dump of the contents of $lex.

@them = $lex->entries;

This returns a list of the entry objects in $lex.

$them = $lex->entries_as_lol;

This returns a reference to the array of entry objects in $lex.

This can be useful for doing things like: push @$them, $newentry;

This is your only way of altering the entry-list in $lex, other than read_file and read_handle!
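For instance, the array reference makes it possible to append one lexicon's entries to another. This is a sketch only: append_entries is a hypothetical helper name, not part of the module, and it relies on nothing beyond the new, entries and entries_as_lol methods described above.

```perl
use strict;
use warnings;
use Text::Shoebox::Lexicon;

# Hypothetical helper: append every entry of $from to $into, in place.
# Returns the number of entries $into holds afterwards.
sub append_entries {
    my ($into, $from) = @_;
    my $list = $into->entries_as_lol;   # the lexicon's own array, not a copy
    push @$list, $from->entries;
    return scalar @$list;
}

my $main  = Text::Shoebox::Lexicon->new;
my $extra = Text::Shoebox::Lexicon->new;

my $total = append_entries($main, $extra);
print "Merged lexicon now holds $total entries\n";
```

Note that the entry objects are shared between the two lexicons after this call, not copied, so altering an entry through one lexicon alters it in the other too.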

Other Attributes

A lexicon object is mainly for just holding a list of entries. But besides that list, it also contains these attributes, which you usually don't have to know about:

The "no_scrunch" attribute

Right after read_file (or read_handle) has finished reading entries, it goes over all of them and calls $e->scrunch on each. (See Text::Shoebox::Entry for an explanation of the scrunch method.) But you can override this by calling $lex->no_scrunch(1) to set the "no_scrunch" attribute to a true value.

(You can also explicitly turn the attribute back off with $lex->no_scrunch(0), which re-enables scrunching, or check its current value with $lex->no_scrunch().)
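Because the scrunch pass runs at the end of reading, the attribute has to be set before the read, which means constructing the object with new first and not using the class-method form of read_file. A minimal sketch, with a placeholder file name:

```perl
use strict;
use warnings;
use Text::Shoebox::Lexicon;

# Placeholder path; substitute your own lexicon file.
my $path = "haida.sf";

my $lex = Text::Shoebox::Lexicon->new;
$lex->no_scrunch(1);    # must happen before read_file to have any effect

print "Entries will be left unscrunched\n" if $lex->no_scrunch;

# read_file dies on failure, so trap that here.
eval { $lex->read_file($path); 1 }
    or warn "Could not read $path: $@";
```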

The "rs" attribute

When Text::Shoebox::Lexicon reads or writes a lexicon, it normally lets Text::Shoebox determine the right value for the newline string (also known as the "RS", even though for SF format it's not a record separator at all), and that's usually the right thing.

But if that's not working right and you need to override that newline-guessing (notably, this might be necessary with read_handle, which isn't as good at guessing as read_file is), then you can set the lexicon's rs attribute directly, with $lex->rs("\cm\cj"). Or you can force it to the system-default value with $lex->rs($/). Or you can just check the value of the rs attribute with $lex->rs().
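As an illustration, here is one way to force CR LF newlines when reading from a handle. Treat it as a sketch under assumptions: the file name is a placeholder, and the binmode call is only a precaution on my part so that Perl's own newline translation doesn't interfere; nothing above says the module requires it.

```perl
use strict;
use warnings;
use Text::Shoebox::Lexicon;

# Placeholder path to a lexicon known to use CR LF line endings.
my $path = "dos_format.sf";

my $lex = Text::Shoebox::Lexicon->new;
$lex->rs("\cm\cj");    # CR LF; set before reading so no guessing is needed

if ( open my $fh, '<', $path ) {
    binmode $fh;       # precaution (assumption): read the raw bytes
    eval { $lex->read_handle($fh); 1 }
        or warn "Could not read from $path: $@";
    close $fh;
}
else {
    warn "Can't open $path: $!";
}
```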


Copyright 2004, Sean M. Burke, all rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.


Sean M. Burke

I hasten to point out, incidentally, that I am not in any way affiliated with the Summer Institute of Linguistics.