NAME

File::Locate::Iterator -- read "locate" database with an iterator

SYNOPSIS

 use File::Locate::Iterator;
 my $it = File::Locate::Iterator->new;
 while (defined (my $entry = $it->next)) {
   print $entry,"\n";
 }

DESCRIPTION

File::Locate::Iterator reads a "locate" database file in iterator style. Each next() call on the iterator returns the next entry from the database.

Locate databases normally hold filenames as a way of finding files faster than churning through directories on the filesystem. Optional glob, suffix and regexp options on the iterator can restrict the entries returned.

Only "LOCATE02" format files are supported, per current versions of GNU locate, not the previous "slocate" format.

Iterators from this module are stand-alone; they don't need any of the various iterator frameworks. See Iterator::Locate, Iterator::Simple::Locate and MooseX::Iterator::Locate to inter-operate with those frameworks, whether for their style or for their convenient ways to grep, map or otherwise manipulate iterated sequences.

FUNCTIONS

Constructor

$it = File::Locate::Iterator->new (key=>value,...)

Create and return a new locate database iterator object. The following optional key/value pairs can be given,

database_file (string, default the system locate database)
database_fh (handle ref)

The file to read, either as filename or file handle. The default is the default_database_file below.

    $it = File::Locate::Iterator->new
            (database_file => '/foo/bar.db');

A filehandle is read with the usual PerlIO, so it can use layers and come from various sources but should be in binary mode.

database_str (string)

The database contents to read in the form of a byte string.
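
As a minimal sketch of database_str, the following builds a tiny two-entry database by hand and iterates over it. The byte layout is an assumption taken from the LOCATE02 description in the GNU Findutils manual (a "LOCATE02" header entry, then for each entry a count byte and a NUL-terminated string); it is not something this module documents.

```perl
use strict;
use warnings;
use File::Locate::Iterator;

# Hand-built LOCATE02 bytes (layout assumed from the Findutils manual):
#   "\0LOCATE02\0"   header
#   "\0/hello\0"     share 0 bytes with the previous entry, then "/hello"
#   "\006/world\0"   share 6 more bytes ("/hello"), then "/world"
my $db = "\0LOCATE02\0" . "\0/hello\0" . "\006/world\0";

my $it = File::Locate::Iterator->new (database_str => $db);
my @entries;
while (defined (my $entry = $it->next)) {
  push @entries, $entry;
}
print "$_\n" for @entries;   # /hello then /hello/world
```

A string database like this is mainly handy for tests and experiments, since it needs no file on disk.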

suffix (string)
suffixes (arrayref of strings)
glob (string)
globs (arrayref of strings)
regexp (string or regexp object)
regexps (arrayref of strings or regexp objects)

Restrict the entries returned to those with given suffix(es) or matching the given glob(s) or regexp(s). For example,

    # C code files on the system, .c and .h
    $it = File::Locate::Iterator->new
            (suffixes => ['.c','.h']);

If multiple patterns or suffixes are given then matches of any are returned.

Globs are in the style of the locate program, which means fnmatch with no options (see File::FnMatch). A pattern must match the full entry, except that a pattern with no wildcard "*", "?" or "[" is treated as a fixed string and can match anywhere in the entry.

    glob => '*.c'  # .c files, no .cxx files
    glob => '.c'   # fixed str, .cxx matches

Globs should be byte strings (not wide chars) since that's how the database entries are handled, and since fnmatch probably has no notion of charset coding in the strings or patterns.
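
The difference between a wildcard glob and a fixed string can be tried on a small in-memory database. This is only a sketch: the entry names are made up, the matching() helper is hypothetical, and the hand-built database bytes assume the LOCATE02 layout from the GNU Findutils manual with no prefix sharing between entries.

```perl
use strict;
use warnings;
use File::Locate::Iterator;

# Hypothetical entries, stored with no prefix sharing (count byte 0 each).
my @names = ('/src/a.c', '/src/b.cxx', '/src/d.h');
my $db = "\0LOCATE02\0" . join ('', map { "\0$_\0" } @names);

# Hypothetical helper: all entries returned for the given iterator options.
sub matching {
  my ($database_str, %options) = @_;
  my $it = File::Locate::Iterator->new (database_str => $database_str,
                                        %options);
  my @ret;
  while (defined (my $entry = $it->next)) {
    push @ret, $entry;
  }
  return @ret;
}

my @wild  = matching ($db, glob => '*.c');   # must match the whole entry
my @fixed = matching ($db, glob => '.c');    # fixed string, matches anywhere
print "wild:  @wild\n";
print "fixed: @fixed\n";
```

Per the rules above, the wildcard form should return only /src/a.c, while the fixed string should also return /src/b.cxx.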

use_mmap (string, default "if_sensible")

Whether to use mmap to access the database. This is fast and resource-efficient when it can be done. To use mmap you must have the File::Map module, the file must fit in available address space, and for a database_fh handle there mustn't be any transforming PerlIO layers. The options are

    undef           \
    "default"       | use mmap if sensible
    "if_sensible"   /
    "if_possible"   use mmap if possible, otherwise file I/O
    0               don't use mmap
    1               must use mmap, croak if cannot
    

A setting of "default" or undef, or omitting the option, means if_sensible. if_sensible uses mmap if it is available, the file size is reasonable, and, for a database_fh, the handle isn't already using an :mmap layer. if_possible skips those checks and just uses mmap whenever it can be done.

    $it = File::Locate::Iterator->new
            (use_mmap => 'if_possible');

When multiple iterators access the same file they share the mmap. The size check for if_sensible counts space in all File::Locate::Iterator mappings and won't go beyond 1/5 of the available data space, which is assumed to be a quarter of the address space for the wordsize, so on a 32-bit system at most about 200Mb in total. if_possible and if_sensible restrict themselves to ordinary files because the file size reported for char specials is generally not reliable.
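
The 200Mb figure can be checked with a little arithmetic. The function below is a hypothetical illustration of the stated rule (a quarter of the address space, then a fifth of that), not the module's actual code.

```perl
use strict;
use warnings;

# Assumed model: data space is 1/4 of the address space, and the
# if_sensible limit is 1/5 of that data space.
sub sketch_mmap_limit {
  my ($wordsize_bits) = @_;
  my $address_space = 2 ** $wordsize_bits;   # bytes addressable
  my $data_space    = $address_space / 4;
  return $data_space / 5;
}

my $limit = sketch_mmap_limit (32);
printf "%.1f MiB\n", $limit / (1024 * 1024);   # prints "204.8 MiB"
```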

Operations

$entry = $it->next

Return the next entry from the database, or no values at end of file. An empty return means undef in scalar context or no values in list context, so you can loop with either

    while (defined (my $filename = $it->next)) ...

or

    while (my ($filename) = $it->next) ...

The return is a byte string since it's normally a filename and as of Perl 5.10 filenames are handled as byte strings.

$it->rewind

Rewind $it back to the start of the database. The next $it->next call will return the first entry.

This is only possible when the underlying database file or handle is a plain file or something else seekable, perhaps with seekable PerlIO layers.
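
A short sketch of rewinding, using an in-memory database so that nothing needs to be seekable. The database bytes are hand-built on the assumption of the LOCATE02 layout from the GNU Findutils manual, and the entry names are made up.

```perl
use strict;
use warnings;
use File::Locate::Iterator;

# Two hypothetical entries with no prefix sharing.
my $db = "\0LOCATE02\0" . "\0/one\0" . "\0/two\0";
my $it = File::Locate::Iterator->new (database_str => $db);

my $first = $it->next;    # first entry
$it->next;                # skip the second entry
$it->rewind;              # back to the start of the database
my $again = $it->next;    # first entry again
print "$first $again\n";
```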

$filename = File::Locate::Iterator->default_database_file

Return the default database file used for new above. This is meant to be the same as the locate program uses and currently means $ENV{'LOCATE_PATH'} if set, otherwise /var/cache/locate/locatedb. In the future it might be possible to check how findutils has been installed.
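
The lookup described above amounts to something like the following. sketch_default_database_file() is a hypothetical stand-in written for illustration, taking "set" to mean "defined"; the module's own code may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lookup described above, not the module's code.
sub sketch_default_database_file {
  my ($env) = @_;                      # hashref, normally \%ENV
  return defined $env->{'LOCATE_PATH'}
    ? $env->{'LOCATE_PATH'}
    : '/var/cache/locate/locatedb';
}

my $from_env = sketch_default_database_file ({ LOCATE_PATH => '/tmp/my.db' });
my $fallback = sketch_default_database_file ({});
print "$from_env\n$fallback\n";
```

In real code, call File::Locate::Iterator->default_database_file rather than repeating this logic, since the rule might change.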

OTHER NOTES

On some systems mmap may be a bit too effective, giving a process more of the CPU than other processes which make periodic system calls. This is an OS scheduling matter, but you might have to lower the process priority with nice or ionice if doing a lot of mmapped work.

If an iterator using a file handle is cloned by a fork or new thread then generally it can be used by the parent or the child, but not both. If the handle is anything with a file descriptor then the underlying file position is shared by parent and child, so when one of them reads a block it upsets the position seen by the other. This problem affects almost all code working with file handles across fork or threads. Some CLONE code might let threads work correctly, though more slowly, but a fork is probably doomed.

Iterators using mmap work correctly for both forks and threads, although the mmap if_sensible size calculation and sharing is not thread-aware beyond those mmaps existing when the thread is forked off. Perhaps this will improve in the future.

A locate database is only designed to be read forwards, hence no prev method on the iterator. It's not possible to read backwards generally, since the start of a record can't be distinguished by its content, and the "front coding" means it might need data from other records an arbitrary distance yet further back.
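
To see why, here is a sketch of a forward decoder for the front coding. It follows the LOCATE02 layout as described in the GNU Findutils manual (an assumption here, not this module's code), and sketch_decode() is a hypothetical helper. Each entry is built from the previous entry plus a running shared-prefix count, so decoding has to start at the beginning.

```perl
use strict;
use warnings;

# Decode LOCATE02 bytes into a list of entries (layout assumed from the
# GNU Findutils manual; sketch_decode is a hypothetical helper).
sub sketch_decode {
  my ($bytes) = @_;
  my $header = "\0LOCATE02\0";
  substr ($bytes, 0, length $header) eq $header
    or die "not a LOCATE02 database";
  my $pos = length $header;
  my ($shared, $prev, @entries) = (0, '');
  while ($pos < length $bytes) {
    # Signed one-byte change in the shared prefix length; 0x80 (-128)
    # escapes to a signed two-byte big-endian value.
    my $delta = unpack ('c', substr ($bytes, $pos++, 1));
    if ($delta == -128) {
      $delta = unpack ('s>', substr ($bytes, $pos, 2));
      $pos += 2;
    }
    $shared += $delta;
    my $end = index ($bytes, "\0", $pos);
    $end >= 0 or die "unterminated entry";
    # The new entry reuses $shared bytes of the previous entry.
    $prev = substr ($prev, 0, $shared) . substr ($bytes, $pos, $end - $pos);
    push @entries, $prev;
    $pos = $end + 1;
  }
  return @entries;
}

my @entries = sketch_decode ("\0LOCATE02\0" . "\0/hello\0" . "\006/world\0"
                             . "\375lp\0");
print "$_\n" for @entries;   # /hello, /hello/world, /help
```

The last record here is just the bytes "\375lp\0", which mean nothing without the preceding entry and count.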

ENVIRONMENT VARIABLES

LOCATE_PATH

Default locate database.

FILES

/var/cache/locate/locatedb

Default locate database, if the LOCATE_PATH environment variable is not set.

OTHER WAYS TO DO IT

File::Locate reads a locate database with callbacks instead. Whether you prefer callbacks or an iterator is a matter of style. Iterators let you write your own loop and have multiple searches in progress simultaneously.
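
As a sketch of two searches in progress at once, the following steps a ".c" iterator and a ".h" iterator alternately over the same data. The entry names are made up and the in-memory database is hand-built assuming the LOCATE02 layout from the GNU Findutils manual.

```perl
use strict;
use warnings;
use File::Locate::Iterator;

# Hypothetical entries, stored with no prefix sharing (count byte 0 each).
my @names = ('/src/a.c', '/src/a.h', '/src/b.c', '/src/b.h');
my $db = "\0LOCATE02\0" . join ('', map { "\0$_\0" } @names);

my $c_it = File::Locate::Iterator->new (database_str => $db, suffix => '.c');
my $h_it = File::Locate::Iterator->new (database_str => $db, suffix => '.h');

# Advance both searches one step at a time, pairing up their results.
my @pairs;
while (1) {
  my $c = $c_it->next;
  my $h = $h_it->next;
  last unless defined $c && defined $h;
  push @pairs, "$c <-> $h";
}
print "$_\n" for @pairs;
```

With callbacks this kind of interleaving would need one search to run to completion and save its results before the other began.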

The speed of an iterator is about the same as callbacks when File::Locate::Iterator is built with its XSUB code (requires Perl 5.10.0 or higher currently).

Iterators are good for cooperative multitasking under an event loop such as POE or Gtk, where state must be held in some sort of variable to be progressed by calls from the main loop. Note that next() blocks on reading from the database, so the database generally should be a plain file rather than a socket or similar, so as not to hold up a main loop.

If you have the recommended mmap (File::Map module) then iterators share an mmap of the database file. Otherwise currently each holds a separate open handle to the database, which means a file descriptor and PerlIO buffering per iterator. Sharing a handle and making each one seek to its desired position would be possible, but a seek drops buffered data and so would go slower. Some PerlIO trickery might transparently share an fd and hold buffered blocks from multiple file positions.

SEE ALSO

Iterator::Locate, Iterator::Simple::Locate, MooseX::Iterator::Locate

File::Locate, locate(1) and the GNU Findutils manual, File::FnMatch, File::Map

HOME PAGE

http://user42.tuxfamily.org/file-locate-iterator/index.html

COPYRIGHT

Copyright 2009, 2010 Kevin Ryde

File-Locate-Iterator is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.

File-Locate-Iterator is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with File-Locate-Iterator. If not, see http://www.gnu.org/licenses/