NAME

DiaColloDB::EnumFile - diachronic collocation db, symbol<->integer enum

SYNOPSIS

 ##========================================================================
 ## PRELIMINARIES
 
 use DiaColloDB::EnumFile;
 
 ##========================================================================
 ## Constructors etc.
 
 $enum = CLASS_OR_OBJECT->new(%args);
 
 ##========================================================================
 ## I/O: open/close (file)
 
 $enum_or_undef = $enum->open($base,$flags);
 $enum_or_undef = $enum->close();
 $bool = $enum->opened();
 $bool = $enum->dirty();
 $bool = $enum->loaded();
 $bool = $enum->rollback();
 $bool = $enum->flush();
 \@i2s = $enum->toArray();
 $enum = $enum->fromArray(\@i2s);
 $enum = $enum->fromHash(\%s2i);
 $enum = $enum->fromEnum($enum2);
 $bool = $enum->load();
 $enum = $enum->save();
 
 ##========================================================================
 ## I/O: header
 
 @keys = $enum->headerKeys();
 $bool = $enum->loadHeaderData($hdr);
 
 ##========================================================================
 ## I/O: text
 
 $enum = $CLASS_OR_OBJECT->loadTextFh($fh,%opts);
 $bool = $enum->saveTextFh($fh,%opts);
 
 ##========================================================================
 ## Methods: population (in-memory only)
 
 $size = $enum->size();
 $newsize = $enum->setsize($newsize);
 $newsize = $enum->addSymbols(@symbols);
 $newsize = $enum->appendSymbols(@symbols);
 $newsize = $enum->addEnum($enum2_or_undef);
 
 ##========================================================================
 ## Methods: lookup
 
 $s_or_undef = $enum->i2s($i);
 $i_or_undef = $enum->s2i($s);
 \@is = $enum->re2i($regex);
 

DESCRIPTION

DiaColloDB::EnumFile provides an object-oriented interface to static symbol<->integer mappings using direct file I/O for lookup. See DiaColloDB::EnumFile::MMap for a fast implementation using mmap().

Globals & Constants

Variable: @ISA

DiaColloDB::EnumFile inherits from DiaColloDB::Persistent.

Constructors etc.

new
 $enum = CLASS_OR_OBJECT->new(%args);

%args, object structure:

 base => $base,       ##-- database basename; use files "${base}.es", "${base}.esx", "${base}.eix", "${base}.hdr"
 perms => $perms,     ##-- default: 0666 & ~umask
 flags => $flags,     ##-- default: 'r'
 pack_i => $pack_i,   ##-- integer pack template (default='N')
 pack_o => $pack_o,   ##-- file offset pack template (default='N')
 pack_l => $pack_l,   ##-- string-length pack template (default='n')
 pack_s => $pack_s,   ##-- string pack template (default=undef) for text i/o
 size => $size,       ##-- number of mapped symbols, like scalar(@i2s)
 utf8 => $bool,       ##-- true iff strings are stored as utf8 (default, used by re2i())
 ##
 ##-- in-memory construction and caching
 s2i => \%s2i,        ##-- maps symbols to integers
 i2s => \@i2s,        ##-- maps integers to symbols
 dirty => $bool,      ##-- true if in-memory structures are not in-sync with file data
 loaded => $bool,     ##-- true if file data has been loaded to memory
 shared => $bool,     ##-- true to avoid closing filehandles on close() or DESTROY() (default=false)
 ##
 ##-- pack lengths (after open())
 len_i => $len_i,     ##-- packsize($pack_i)
 len_o => $len_o,     ##-- packsize($pack_o)
 len_l => $len_l,     ##-- packsize($pack_l)
 len_sx => $len_sx,   ##-- $len_o + $len_i
 ##
 ##-- filehandles (after open())
 sfh  => $sfh,        ##-- $base.es  : pack("(${pack_l}/A)*", @$i2s)
 ixfh => $ixfh,       ##-- $base.eix : [$i] => pack("${pack_o}",          $offset_in_sfh_of_string_with_id_i)
 sxfh => $sxfh,       ##-- $base.esx : [$j] => pack("${pack_o}${pack_i}", $offset_in_sfh_of_string_with_sortindex_j_and_id_i, $i)
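
The three data files can be illustrated with a small, self-contained Perl sketch that builds their byte contents in memory from an @i2s array, using the default pack templates listed above. The helper build_enum_files() is hypothetical and not part of DiaColloDB::EnumFile; the sketch also assumes that the .esx records are ordered by plain string comparison (cmp), so the module's actual sort order and write path may differ.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DiaColloDB::EnumFile): returns the byte
## contents of ${base}.es, ${base}.eix and ${base}.esx for the symbols in @$i2s.
## Assumes .esx is ordered by plain string comparison (cmp).
sub build_enum_files {
  my ($i2s, %opts) = @_;
  my $pack_i = $opts{pack_i} // 'N';   ## integer (ID) template
  my $pack_o = $opts{pack_o} // 'N';   ## file offset template
  my $pack_l = $opts{pack_l} // 'n';   ## string-length template

  ## pack lengths, cf. len_i, len_o, len_sx
  my $len_i  = length(pack($pack_i, 0));
  my $len_o  = length(pack($pack_o, 0));
  my $len_sx = $len_o + $len_i;

  my ($es, $eix, @offsets) = ('', '');
  foreach my $s (@$i2s) {
    push(@offsets, length($es));            ## offset of this string in .es
    $eix .= pack($pack_o, length($es));     ## .eix: [$i] => offset
    $es  .= pack("${pack_l}/A*", $s);       ## .es: length-prefixed string
  }

  ## .esx: [$j] => (offset, $i), ordered by symbol
  my @sorted = sort { $i2s->[$a] cmp $i2s->[$b] } (0 .. $#$i2s);
  my $esx = join('', map { pack("${pack_o}${pack_i}", $offsets[$_], $_) } @sorted);
  die "unexpected .esx size" if length($esx) != $len_sx * scalar(@$i2s);

  return ($es, $eix, $esx);
}
```

With the default templates each .eix record is 4 bytes and each .esx record is 8 bytes (len_sx = len_o + len_i), so record $j of .esx starts at byte offset $j*8.
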

DESTROY

destructor implicitly calls close().

promote
 $enum = $enum->promote($class,$force);

Promotes $enum to class $class. If $force is false (default), promotion via ref($enum)."::MMap" will be disabled.

I/O: open/close (file)

See also DiaColloDB::Persistent.

open
 $enum_or_undef = $enum->open($base,$flags);
 $enum_or_undef = $enum->open($base);
 $enum_or_undef = $enum->open();

opens file(s), clears {loaded} flag.

close
 $enum_or_undef = $enum->close();

closes the enum, implicitly calling flush() if opened for writing.

opened
 $bool = $enum->opened();

returns true iff enum is opened.

dirty
 $bool = $enum->dirty();

returns true iff some in-memory structures haven't been flushed to disk.

loaded
 $bool = $enum->loaded();

returns true iff in-memory structures have been populated from disk.

rollback
 $bool = $enum->rollback();
  • drops in-memory structures

  • invalidates any old references to {s2i}, {i2s}; the old structures are not emptied, so references you still hold keep their contents.

  • clears {dirty} flag

flush
 $bool = $enum->flush();
 $bool = $enum->flush($force);
  • flush in-memory structures to disk

  • no-op unless $force or $enum->dirty() is true

  • clobbers any old disk-file contents with in-memory maps

  • enum must be opened in write-mode

  • invalidates any old references to {s2i}, {i2s}; the old structures are not emptied, so references you still hold keep their contents.

  • clears {dirty} flag

toArray
 \@i2s = $enum->toArray();

returns an ARRAY-ref representing the mapping; array items are still byte-encoded.

fromArray
 $enum = $enum->fromArray(\@i2s);

clobbers $enum contents; steals \@i2s (uses the array itself rather than a copy)

fromHash
 $enum = $enum->fromHash(\%s2i);

clobbers $enum contents; steals \%s2i (uses the hash itself rather than a copy)
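
The two in-memory maps from the object structure, \%s2i and \@i2s, are inverses of each other. A minimal sketch of deriving @i2s from %s2i, assuming IDs are non-negative integers; s2i_to_i2s() is a hypothetical helper, and fromHash() itself may build its structures differently.

```perl
use strict;
use warnings;

## Hypothetical helper: invert a symbol=>ID hash into an ID=>symbol array.
## IDs are assumed to be non-negative integers; unused IDs stay undef.
sub s2i_to_i2s {
  my $s2i = shift;
  my @i2s;
  while (my ($s, $i) = each %$s2i) {
    $i2s[$i] = $s;
  }
  return \@i2s;   ## scalar(@i2s) corresponds to {size}
}
```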

fromEnum
 $enum = $enum->fromEnum($enum2);

clobbers $enum contents, does NOT steal $enum2->{i2s}

load
 $bool = $enum->load();

loads file contents into memory; enum must be opened

save
 $enum = $enum->save();
 $enum = $enum->save($base);

saves enum to $base; really just a wrapper for open() and flush()

See also DiaColloDB::Persistent.

I/O: header

headerKeys
 @keys = $enum->headerKeys();

returns the keys to save in the header

loadHeaderData
 $bool = $enum->loadHeaderData($hdr);

instantiates header data from $hdr; overrides DiaColloDB::Persistent implementation.

I/O: text

loadTextFh
 $enum = $CLASS_OR_OBJECT->loadTextFh($filename_or_fh);
 $enum = $CLASS_OR_OBJECT->loadTextFh($filename_or_fh, %opts);

Loads from a text file with lines of the form "ID SYMBOL...", clobbering the enum contents. %opts temporarily override %$enum, especially:

 pack_s => $pack_s

saveTextFh
 $bool = $enum->saveTextFh($fh,%opts);
  • saves to a text file with lines of the form "ID SYMBOL..."

  • %opts temporarily override %$enum, especially:

     pack_s => $pack_s
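
The "ID SYMBOL..." line format can be sketched as a round trip through an @i2s array. This assumes a single TAB between the integer ID and the symbol, with the rest of the line being the symbol; the separator is not specified above, and the pack_s handling is omitted. save_text() and load_text() are hypothetical helpers, not the module's methods.

```perl
use strict;
use warnings;

## Hypothetical helper: write "ID<TAB>SYMBOL" lines (TAB separator assumed).
sub save_text {
  my ($fh, $i2s) = @_;
  foreach my $i (0 .. $#$i2s) {
    next if !defined($i2s->[$i]);          ## skip unused IDs
    print {$fh} $i, "\t", $i2s->[$i], "\n";
  }
  return 1;
}

## Hypothetical helper: read "ID<TAB>SYMBOL" lines into a fresh @i2s.
sub load_text {
  my $fh = shift;
  my @i2s;
  while (defined(my $line = <$fh>)) {
    chomp($line);
    next if $line =~ /^\s*$/;              ## skip blank lines
    my ($i, $s) = split(/\t/, $line, 2);   ## everything after the first TAB is the symbol
    $i2s[$i] = $s // '';
  }
  return \@i2s;
}
```

Because the split is limited to two fields, symbols containing spaces survive the round trip unchanged.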

Methods: population (in-memory only)

size
 $size = $enum->size();

wraps {size} key

setsize
 $newsize = $enum->setsize($newsize);

really just sets the {size} key to $newsize

addSymbols
 $newsize = $enum->addSymbols(@symbols);
 $newsize = $enum->addSymbols(\@symbols);
  • adds all symbols in @symbols which don't already have an ID

  • enum must be loaded to memory
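
The "add only unknown symbols" rule can be sketched on a simplified in-memory enum, represented here as a plain hash { s2i=>\%s2i, i2s=>\@i2s, size=>$n } rather than a real DiaColloDB::EnumFile object. add_symbols() is hypothetical; like addSymbols(), it accepts either a list or an ARRAY-ref.

```perl
use strict;
use warnings;

## Hypothetical helper on a simplified enum hash (not the module's object):
## assigns fresh IDs only to symbols that are not yet mapped.
sub add_symbols {
  my $enum = shift;
  my @syms = (@_ == 1 && ref($_[0]) eq 'ARRAY') ? @{$_[0]} : @_;   ## list or ARRAY-ref
  foreach my $s (@syms) {
    next if exists $enum->{s2i}{$s};   ## already has an ID: keep it
    my $i = $enum->{size}++;           ## next free ID
    $enum->{s2i}{$s} = $i;
    $enum->{i2s}[$i] = $s;
  }
  return $enum->{size};                ## new size
}
```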

appendSymbols
 $newsize = $enum->appendSymbols(@symbols);
 $newsize = $enum->appendSymbols(\@symbols);

adds all symbols in @symbols in order, messily re-mapping them if they already have an ID.
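
One plausible reading of "messily re-mapping", sketched on the same kind of simplified enum hash: every symbol receives a fresh ID at the end, {s2i} is overwritten to point at the new ID, and the symbol's old {i2s} slot is left in place. This interpretation and the helper append_symbols() are assumptions, not the module's confirmed behavior.

```perl
use strict;
use warnings;

## Hypothetical helper on a simplified enum hash (not the module's object).
## ASSUMPTION: re-mapping means the old i2s slot is kept while s2i moves on.
sub append_symbols {
  my $enum = shift;
  my @syms = (@_ == 1 && ref($_[0]) eq 'ARRAY') ? @{$_[0]} : @_;   ## list or ARRAY-ref
  foreach my $s (@syms) {
    my $i = $enum->{size}++;   ## always allocate a fresh ID
    $enum->{s2i}{$s} = $i;     ## re-maps $s if it already had an ID
    $enum->{i2s}[$i] = $s;     ## any old slot still holds $s
  }
  return $enum->{size};
}
```

Under this reading, a re-appended symbol occupies two IDs, and s2i() would only find the newer one.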

addEnum
 $newsize = $enum->addEnum($enum2_or_undef);

ensures all symbols from $enum2_or_undef are defined (undef:'')

Methods: lookup

i2s
 $s_or_undef = $enum->i2s($i);

Returns symbol for ID $i, or undef if no such symbol exists. In-memory cache overrides file contents.
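
With the file layout from the object structure above, an ID can be resolved without loading the whole enum: one fixed-size read from .eix yields the string's offset, and the length prefix at that offset in .es gives the number of bytes to read. A minimal sketch of that path, assuming the default templates (pack_o='N', pack_l='n') and ignoring the in-memory cache; lookup_i2s() is a hypothetical helper.

```perl
use strict;
use warnings;

## Hypothetical helper: look up the symbol with ID $i via direct file I/O.
## $ixfh reads ${base}.eix, $sfh reads ${base}.es; default templates assumed.
sub lookup_i2s {
  my ($ixfh, $sfh, $size, $i) = @_;
  my ($pack_o, $pack_l) = ('N', 'n');
  my $len_o = length(pack($pack_o, 0));   ## packsize($pack_o)
  my $len_l = length(pack($pack_l, 0));   ## packsize($pack_l)
  return undef if !defined($i) || $i < 0 || $i >= $size;   ## no such ID

  my $buf = '';
  ## .eix record $i holds the offset of string $i in .es
  seek($ixfh, $i * $len_o, 0) or die "seek .eix: $!";
  read($ixfh, $buf, $len_o) == $len_o or die "short read on .eix";
  my $off = unpack($pack_o, $buf);

  ## .es record: length prefix followed by the string bytes
  seek($sfh, $off, 0) or die "seek .es: $!";
  read($sfh, $buf, $len_l) == $len_l or die "short read on .es";
  my $len = unpack($pack_l, $buf);
  return '' if $len == 0;
  read($sfh, $buf, $len) == $len or die "short read on .es";
  return $buf;   ## still byte-encoded
}
```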

s2i
 $i_or_undef = $enum->s2i($s);
 $i_or_undef = $enum->s2i($s, $ilo,$ihi);

Returns the ID for symbol $s, or undef if $s is not mapped. Uses binary search between sorted symbol positions $ilo and $ihi (default: the full enum). The in-memory cache overrides file contents.
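
The binary search over the sorted .esx index can be sketched as follows. Each probe reads one len_sx-byte record (offset, ID), fetches the string at that offset from .es and compares it with $s. The sketch assumes the default templates, that .esx is sorted by plain string comparison (cmp), and that $ihi is exclusive; it ignores the in-memory cache. read_at() and lookup_s2i() are hypothetical helpers.

```perl
use strict;
use warnings;

## Hypothetical helper: read $len bytes at byte offset $off of $fh.
sub read_at {
  my ($fh, $off, $len) = @_;
  return '' if $len == 0;
  my $buf = '';
  seek($fh, $off, 0) or die "seek: $!";
  read($fh, $buf, $len) == $len or die "short read";
  return $buf;
}

## Hypothetical helper: binary search for $s in sorted .esx records [$ilo, $ihi)
## ($ihi exclusive is an assumption); returns the ID or undef.
## $sfh reads ${base}.es, $sxfh reads ${base}.esx; default templates assumed.
sub lookup_s2i {
  my ($sfh, $sxfh, $size, $s, $ilo, $ihi) = @_;
  my $len_sx = length(pack('NN', 0, 0));   ## len_o + len_i
  my $len_l  = length(pack('n', 0));       ## packsize($pack_l)
  $ilo //= 0;
  $ihi //= $size;
  while ($ilo < $ihi) {
    my $mid = int(($ilo + $ihi) / 2);
    my ($off, $i) = unpack('NN', read_at($sxfh, $mid * $len_sx, $len_sx));
    my $len = unpack('n', read_at($sfh, $off, $len_l));
    my $cmp = read_at($sfh, $off + $len_l, $len) cmp $s;   ## stored bytes vs. $s
    return $i if $cmp == 0;
    if ($cmp < 0) { $ilo = $mid + 1 } else { $ihi = $mid }
  }
  return undef;   ## not found
}
```

Each lookup costs O(log size) record reads, which is why the sorted .esx file is kept alongside the ID-ordered .eix file.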

re2i
 \@is = $enum->re2i($regex);

Returns an ARRAY-ref of the IDs of all symbols matching $regex.
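
Since stored strings are byte-encoded (cf. toArray()) and the {utf8} flag is documented as being used by re2i(), a regex match on non-ASCII text only works after decoding. A minimal sketch over an in-memory array of byte strings, using the core Encode module; regex_to_ids() is a hypothetical helper, and a file-based implementation would iterate over the .es records instead.

```perl
use strict;
use warnings;
use Encode qw(decode);

## Hypothetical helper: IDs of all byte-encoded symbols in @$i2s matching $regex,
## decoding each symbol from UTF-8 first if $utf8 is true.
sub regex_to_ids {
  my ($i2s, $regex, $utf8) = @_;
  my $re = ref($regex) ? $regex : qr{$regex};
  my @is;
  foreach my $i (0 .. $#$i2s) {
    my $s = $i2s->[$i];
    next if !defined($s);                    ## skip unused IDs
    $s = decode('UTF-8', $s) if $utf8;       ## bytes -> characters
    push(@is, $i) if $s =~ $re;
  }
  return \@is;
}
```

Without decoding, a character class or literal such as "\x{e4}" would be compared against the raw UTF-8 bytes and fail to match.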

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2016 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::EnumFile::MMap(3pm), DiaColloDB::EnumFile::FixedLen(3pm), DiaColloDB::EnumFile::FixedMap(3pm), DiaColloDB::EnumFile::Tied(3pm), DiaColloDB::Persistent(3pm), DiaColloDB(3pm), perl(1), ...