NAME

File::Locate::Harder - when you're determined to use a locate db

SYNOPSIS

   use File::Locate::Harder;

   my $flh = File::Locate::Harder->new();
   my $results_aref = $flh->locate( $search_term );

   # using a defined db location, plus some locate options
   my $flh = File::Locate::Harder->new( { db => $db_file } );
   my $results_aref = $flh->locate( $search_pattern,
                                    { case_insensitive => 1,
                                      regexp           => 1,
                                    } );

   # creating your own locate db, (in this example for doing tests)
   use Test::More;
   SKIP:
    {
      my $flh = File::Locate::Harder->new( { db => undef } );
      $flh->create_database( $path_to_tree_to_index, $db_file );

      unless( $flh->check_locate ) {
         my $reason = "Can't get File::Locate::Harder to work";
         skip $reason, $test_count;
      }
      my $results_aref = $flh->locate( $search_term );
      is_deeply( $results_aref, $expected_aref, "Found expected files");
    }

   # introspection (is it reading db directly, or shelling out to locate?)
   my $report = $flh->how_works;
   print "This is how File::Locate::Harder is doing locates: $report\n";

DESCRIPTION

File::Locate::Harder provides a generalized "locate" method to access the file system indexes used by the "locate" command-line utility. It is intended to be a relatively portable way for perl code to quickly ascertain what files are present on the current system.

This code is essentially a wrapper around multiple different techniques of accessing a locate database: it makes an effort to use the fastest method it can find that works.

The "locate" command is a well-established utility to find files quickly by using a special index database (typically updated via a cron-job). This module is an attempt at providing a perl front-end to "locate" which should be portable across most unix-like systems.

Behind the scenes, File::Locate::Harder silently tries many ways of doing the requested "locate" operation. If it can't establish contact with the file system's locate database, it will error out; otherwise you can be reasonably sure that a "locate" will return a valid result (including an empty set if the search matches nothing).

If possible, File::Locate::Harder will use the perl/XS module File::Locate to access the locate db directly; otherwise, it will attempt to shell out to a command-line version of "locate".

If not told explicitly what locate db file to use, this module will try to find the file system's standard locate db using a number of reasonable guesses. If those all fail -- and it's possible for it to fail simply because file permissions make the db file effectively invisible -- as a last ditch effort, it will try shelling out to the command line "locate" without specifying a db for it (because it usually knows where to look).

Efficiency may be improved in some circumstances if you help File::Locate::Harder find the locate database, either by explicitly saying where it is (using the "db" attribute), or by setting the LOCATE_PATH environment variable. Also see the "introspection_results" method.

METHODS

new

Creates a new File::Locate::Harder object.

With no arguments, the newly created object (largely) has attributes that are undefined. All may be set later using accessors named according to the "set_*" convention.

Inputs:

An optional hashref, with named fields identical to the names of the object attributes. The attributes, in order of likely utility:

Settings for ways to run "locate"
case_insensitive

Like the usual command-line "-i".

regexp

The search term will be interpreted as a POSIX regexp.

posix_extended

The search term is a regexp with the standard POSIX extensions.

Overall settings (for "locate", "create_database", etc)
db

Locate database file, with full path. Use this to work with a non-standard location, or set it to "undef" if you don't want this module to waste time looking for it (e.g. you might be planning to generate your own db via "create_database").

For internal use, testing, and so on:

The following items are lists used in the probing process which determines what works on the current system. These lists are defined with hardcoded defaults that will normally remain untouched, though are sometimes over-ridden for testing purposes.

locate_db_location_candidates

Likely places for a locate db. See "define_probe_parameters".

test_search_terms

Common terms in unix file paths. See "define_probe_parameters".

The following are status fields where the results of system probing are stored. The user will normally be uninterested in these, though see "introspection_results" for a hint about performance improvements in repeated runs.

system_db_not_found

Could not find where the standard locate db is.

use_shell_locate

Shell out to locate and forget about using File::Locate

shell_locate_failed

So don't try probe_db_via_shell_locate again

shell_locate_cmd_idx

Integer: controls the choice of syntax of the locate shell cmd

init

Method that initializes object attributes and then locks them down to prevent accidental creation of new ones.

Not of interest to client coders, though inheriting code should have an init of its own that calls this one.

locate

Simple interface that performs the actual "locate" operation in a robust, reliable way. Uses the locate db file indicated by the object's "db" attribute (which is set automatically if not manually overridden).

Input:

A term to search for in the file name or path.

Return:

An array reference of matching files with full paths.

create_database

Tries to create a locate database file, indexing the tree indicated by a path given as the first argument. A required second argument specifies the db file: the "db" field in the object is ignored by this method, though if the database is successfully created, the object's "db" field will be set to the newly created database.

Inputs:

(1) full path of tree of files to index

(2) full path of db file to create

Return: false (undef) on failure.

introspection

check_locate

Returns true (1) if this module's 'locate' method is capable of working.

This is very similar to the "probe_db" method, except that when it is called with no arguments *and* the object's db setting is undefined, this will initiate a "determine_system_db" run to try to find the standard system locate db.

Example usage:

  my $flh = File::Locate::Harder->new( { db => undef } );
  $flh->create_database( $tree_location, $db_file );
  if ( $flh->probe_db ) {
    # checks the newly created db, which just indexes $tree_location
    my $files_aref = $flh->locate( "want_this" );
    # ...
  }

  # Then later, if you want to search the whole file system...
  $flh->set_db( undef );
  if ( $flh->check_locate ) {
      my $hits_aref = $flh->locate( "search_for_this" );
      # ...
  }

  # But even more convenient would be:
  if ( $flh->determine_system_db ) {
      my $hits_aref = $flh->locate( "search_for_this" );
      # ...
  }

(Thus I suspect that this is a redundant, useless method.)

Rule of thumb: if you want to search the whole system, you can use check_locate to verify that "locate" will (most likely) work, but if you're using your own custom db (e.g. created via "create_database"), you might as well just use "probe_db".

(Another rule of thumb: if this seems confusing, just ignore the issue for as long as you can.)

how_works

Returns a report on how this module has been doing "locate" operations (e.g. via the shell or the File::Locate module, and using which db).

introspection_results

Returns a hashref of the results of File::Locate::Harder's probing of the system's "locate" setup, so that it can be easily used again without re-doing that work.

Example:

  my $settings_href = $flh1->introspection_results;

  # save    $settings_href somehow (e.g. dump to yaml file)
  # restore $settings_href somehow

  my $flh2 = File::Locate::Harder->new( $settings_href );
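
The "save somehow" and "restore somehow" steps above are left open. One possible way to fill them in is sketched here, using JSON via the core JSON::PP module instead of the yaml mentioned in the comment. The helper names save_settings and load_settings are hypothetical (they are not part of File::Locate::Harder), and the sketch assumes the hashref holds only plain scalars, as the status fields listed earlier suggest.

```perl
use strict;
use warnings;
use JSON::PP ();

# One shared encoder/decoder; canonical gives stable key order on disk.
my $json = JSON::PP->new->canonical->pretty;

# Hypothetical helper: write a settings hashref to $file as JSON.
sub save_settings {
    my ( $settings_href, $file ) = @_;
    open my $fh, '>', $file or die "Can't write $file: $!";
    print {$fh} $json->encode($settings_href);
    close $fh or die "Can't close $file: $!";
    return $file;
}

# Hypothetical helper: read the JSON back into a hashref.
sub load_settings {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't read $file: $!";
    local $/;                      # slurp the whole file
    my $text = <$fh>;
    close $fh;
    return $json->decode($text);
}
```

An undefined value is stored as a JSON null and comes back as undef, so a deliberately undefined "db" should survive the round trip.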

shell_locate_version

Tries to determine the version of the shell's "locate" command.

This will work only with the GNU locate and Secure Locate variants, not the FreeBSD one.

Returns the version string on success, otherwise 0 for failure.

special purpose methods (usually, though not exclusively, for internal use)

locate_via_module

Uses the perl/XS module File::Locate to perform a locate operation on the given search term, using the db file indicated by the object's db attribute.

An optional second argument allows passing in a coderef, an anonymous routine that operates on each match (the match value is set to $_): this makes it possible to work with a large result without storing the entire set in memory.

Uses the three object attribute toggles ("case_insensitive", "regexp", "posix_extended") to control the way locate is performed.
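
To illustrate why the coderef argument helps with large results, here is a minimal stand-alone sketch of the same calling convention: each match is handed to the callback in $_ and nothing is accumulated. The each_match routine is hypothetical and reads a plain list of paths from a filehandle; it does not read a real locate db and is not how File::Locate works internally.

```perl
use strict;
use warnings;

# Call $callback once per matching line, with the path in $_.
# Returns the number of matches seen; no list of matches is kept.
sub each_match {
    my ( $fh, $term, $callback ) = @_;
    my $seen = 0;
    while ( my $path = <$fh> ) {
        chomp $path;
        next unless index( $path, $term ) >= 0;
        $seen++;
        $callback->() for $path;    # aliases $_ to $path for the callback
    }
    return $seen;
}

# Stand-in data: an in-memory listing instead of a locate db.
my $listing = "/home/a/song.mp3\n/usr/share/doc/README\n/home/b/tune.mp3\n";
open my $fh, '<', \$listing or die "Can't open in-memory listing: $!";

my $in_home = 0;
my $total = each_match( $fh, '.mp3', sub { $in_home++ if m{ ^ /home }x } );
print "$total matches, $in_home under /home\n";
```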

locate_via_shell

Given a search term returns an array reference of matches found from a "locate" search.

An optional second argument containing the locate command's "options string" (e.g. "-i", "-r", "-re", etc) may be passed in (otherwise it is generated from object data).

This method uses object data settings: "db", "shell_locate_cmd_idx"

And indirectly (via "build_opts_for_locate_via_shell"): "case_insensitive", "regexp", "posix_extended"

methods largely for internal use

determine_system_db

Internally used routine: looks for a useable system-wide locate db.

Returns the path to the db if found, and as a side effect sets the object attribute "db".

probe_db

For when the locate db file you're interested in is known, and you want to initialize access for it (and as a side-effect, find out if it works).

Input: db file name with full path (optional, defaults to object's setting).

Return: for success, the db file name, on failure undef.

Side-effect: sets use_shell_locate if the access via module didn't work.

probe_db_via_module_locate

Looks to see if it can find anything in the given db by using the File::Locate module.

probe_db_via_shell_locate

Tries the series of standard test searches by shelling out to the command-line form of locate to make sure that it can be used.

Tries to use the locate db file indicated by the object's "db" attribute, but this can be over-ridden with an optional argument.

Under some circumstances, the db may remain undefined, but this method will return "1" for success if it appears that command-line locate works in any case.

As a side-effect, saves the "shell_locate_cmd_idx" that indicates a form of the locate command that has been observed to work.

Returns: undef for failure, and for success either the db or 1 (because locate can work even if this code can't figure out what db file it's using).

generate_locate_cmd

Given an ordered list of four required parameters, returns a form of the locate command which can (in theory) be fed to the shell. In practice these different forms are expected to fail (some harder than others) on various different platforms, so some experimentation may be needed to find a form that works (which is the job of "probe_db_via_shell_locate").

Inputs:

  $cmd_idx: integer index (beginning with 0) that chooses the
            form of a command to return.

  $search_term: string (or possibly regexp) to search for.

  $db: full path to the locate db to search.

  $opt_str: options string, defaults to values generated by
            build_opts_for_locate_via_shell

Example usage:

  # with no arguments, generate_locate_cmd returns the highest valid index
  for my $i ( 0 .. $self->generate_locate_cmd ) {
     my $locate_cmd =
       $self->generate_locate_cmd( $i, $search_term, $db, $opt_str );
     my @result = `$locate_cmd 2>/dev/null`;
     if ( scalar(@result) > 0 ) {
       return $i;
     }
  }

Note: the various forms of locate are discussed below in "locate shell command"

Special case:

With no arguments (specifically, with $cmd_idx undefined), returns the count of available command forms minus 1 ($#cmd_forms).

build_opts_for_locate_via_shell

Converts the three object attribute toggles ("case_insensitive", "regexp", "posix_extended") into the command-line options string for locate.

The "posix_extended" feature is not supported for locates via the shell, and if used will issue a warning.
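
A rough sketch of what such a conversion could look like follows. This is not the module's actual code: the routine name shell_opts_from_toggles is hypothetical, the "-i" and "-r" flags are the ones mentioned above as example option strings, and bundling them into a single "-ir" string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hashref of toggles into an options string
# such as "-i", "-r" or "-ir" (or the empty string if nothing is set).
sub shell_opts_from_toggles {
    my ($toggles) = @_;
    my $flags = '';
    $flags .= 'i' if $toggles->{case_insensitive};
    $flags .= 'r' if $toggles->{regexp};

    # The text above says this toggle is unsupported via the shell,
    # so warn and otherwise ignore it.
    warn "posix_extended is not supported when shelling out to locate\n"
        if $toggles->{posix_extended};

    return length($flags) ? "-$flags" : '';
}

print shell_opts_from_toggles( { case_insensitive => 1, regexp => 1 } ), "\n";
```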

build_opts_for_locate_via_module

Converts the three object attribute toggles ("case_insensitive", "regexp", "posix_extended") into the form that File::Locate::locate requires: returns an array.

initialization utilities

define_probe_parameters

An internal method, used during the object init process.

Defines two arrays that are used to control the locate db "probe" process: the test_search_terms and the locate_db_location_candidates.

The locate_db_location_candidates are likely places for a system's locate db. See "details" below.

The test_search_terms are common terms in unix file paths, which we can check to see if what looks like the locate database really is one. See "checking if a form of locate works" below.

basic setters and getters

db

Getter for object attribute db

set_db

Setter for object attribute db

As a side-effect, unsets the shell_locate_failed flag (what if the last db file was bad, and this current setting will work?).

EXPERIMENTAL

Having some trouble straightening out the above code as-written. Going to work on some experimental routines here, that might have a use somewhere.

work_via

Try the db various ways, make a recommendation on how to access it. Return string: 'module' or 'shell'.

Q: how to handle the shell-but-undef-db case?

A1: could be a third how-type 'shell_unknown'.

A2: could be some sort of meta-field, a "system_db_indeterminate" flag.

automatic accessor generation

AUTOLOAD

Platforms

It's likely that this package will work on any unix-like system (including cygwin), though on some there might be a need for additional installation and setup (e.g. a "findutils" package).

Development was done on two varieties of linux (aka GNU/linux): Knoppix (32bit) on a Turion and Kubuntu on an Opteron machine. This covered two major varieties of the "locate" command: GNU locate and Secure Locate.

A serious attempt was made to support BSD locate on FreeBSD, but the testing has not been completed.

Note: at present the File::Locate module appears to fail silently on 64bit platforms, so there the command-line shell locate will always be used.

MOTIVATION

This module uses File::Locate, which is a perl XS interface to read locate (or slocate) dbs without shelling out to the command-line "locate" program.

File::Locate has one great limitation: it must be told which locate db to use (by explicit parameter, or by environment variable), it has no notion of a default location. Further, as of this writing, it appears to be limited to 32bit systems.

This module then is a wrapper around File::Locate that tries a number of common locations for the locate database, and instead of just giving up, it also tries the command-line locate, which has its own ways of knowing where the database can be (configuration file, compiled-in default, or command-line parameter).

The intention here is to make this module as portable as possible... it might, for example, be useful to use in portable CPAN modules that need to look for things in the filesystem.

(As a case in point: the job of File::Locate::Harder would be a lot easier if it could use "locate" to find the locate db...).

Additional Examples

forcing locate via File::Locate module or via shell command

  my $flh = File::Locate::Harder->new();
  $result_via_module = $flh->locate_via_module( $term );
  $result_via_shell  = $flh->locate_via_shell(  $term );

using the coderef feature of the File::Locate module

  my $count = 0;
  $flh->locate_via_module( $term, sub { $count++ } );
  print "There are $count matches of $term\n";


  $count = 0;    # reset before the second, filtered count
  $flh->locate_via_module( $term,
          sub { $count++ if $_ =~ m{ ^ /home }x } );
  print "There are $count matches of $term located in /home\n";

speeding up multiple searches if you know you're using shell locate

This reduces the number of calls to build_opts_for_locate_via_shell:

  my @searches = qw( .bashrc .bash_profile .emacs default.el );
  my $flh = File::Locate::Harder->new();
  my $opt_str = $flh->build_opts_for_locate_via_shell;
  foreach my $term (@searches) {
    $result_via_shell  = $flh->locate_via_shell( $term, $opt_str );
  }

SEE ALSO

File::Locate

Manual pages: locate, slocate, and/or updatedb.

NOTES

architecture

The general philosophy in use here is to just try things that are likely to work and then just try something else if they fail. This is probably better than attempting to guess which form of locate to use based on the current platform, because (a) no one (to my knowledge) has a capabilities database that specifies which locate is found on which platform (b) different variants may be installed at the whim of a sysadmin (c) there may after all be variants of locate I've never encountered.

So checking $^O is of limited utility, and similarly, some of the existing forms of locate lack introspection features (e.g. you can't get FreeBSD's locate to tell you what version it is).

details

The object creation process ("new" and "init") determines how to do system-wide locates, and saves its conclusions for use by future calls of the locate method on this object.

Some of this elaborate initialization process can be short-circuited if it's told which db file to use, or even just given a "db" option with an undefined value. That's convenient for cases where you want to use this module to create a locate db of your own (there's no point in searching for a system-wide db if we're going to use a specialized one).

If the db location is not known, the search process begins with making guesses about likely locations it might be found. It goes through this list:

 /var/lib/slocate/slocate.db  -- Secure Locate under Kubuntu
 /var/cache/locate/locatedb   -- GNU locate, under Knoppix
 /var/db/locate.database      -- BSD locate, under FreeBSD
 /usr/var/locatedb            -- mentioned: File::Locate docs and cygwin lists
 /var/lib/locatedb            -- mentioned on insecure.org
 /usr/local/var/locatedb      -- Solaris with findutils installed
 /var/lib/locate/locatedb     -- mentioned on a Debian list in 2000
 /var/spool/locate/locatedb   -- speculative mention on a cygwin list

So that's three names, in 8 locations. It also tries other permutations on speculation:

 /var/cache/locate/slocate.db
 /var/db/slocate.db
 /usr/var/slocate.db
 /usr/local/var/slocate.db
 /var/lib/locate/slocate.db
 /var/spool/locate/slocate.db

 /var/lib/slocate/locate.database
 /var/cache/locate/locate.database
 /usr/var/locate.database
 /usr/local/var/locate.database
 /var/lib/locate/locate.database
 /var/spool/locate/locate.database

 /var/lib/slocate/locatedb
 /var/db/locatedb

Each of these possibilities is checked for simple file existence, and then checked to see if it works. (See "checking if a form of locate works" below.)
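
As an illustration of the "names times locations" idea, here is a small sketch that builds candidate paths and filters them by file existence. It is not the module's own code: the helper names are hypothetical, and it takes the full cross product of the 8 directories and 3 names (24 paths), which is a slight superset of the 22 paths listed above and does not preserve their order.

```perl
use strict;
use warnings;

# Directories and file names taken from the lists above.
my @dirs  = qw( /var/lib/slocate /var/cache/locate /var/db /usr/var
                /var/lib /usr/local/var /var/lib/locate /var/spool/locate );
my @names = qw( slocate.db locatedb locate.database );

# Hypothetical helper: every directory/name combination.
sub candidate_dbs {
    my ( $dirs_aref, $names_aref ) = @_;
    my @candidates;
    for my $dir ( @{$dirs_aref} ) {
        push @candidates, map { "$dir/$_" } @{$names_aref};
    }
    return \@candidates;
}

# Hypothetical helper: keep only candidates that exist as plain files.
# Each survivor would still need a test search to confirm that it is
# really a working locate db.
sub existing_dbs {
    my ($candidates_aref) = @_;
    return [ grep { -f $_ } @{$candidates_aref} ];
}

my $candidates = candidate_dbs( \@dirs, \@names );
printf "%d candidates, %d present on this system\n",
    scalar @{$candidates}, scalar @{ existing_dbs($candidates) };
```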

locate shell command

If attempts at using File::Locate fail, the system falls back to shelling out to the locate command (which really should already know how to find the system-wide db, either from a compiled-in default or a config file setting).

But the locate shell command has its own problems. There are at least three variants, with some slight differences between GNU locate, slocate and FreeBSD locate.

The current architecture of locate_via_shell tries all of them in a certain order, and remembers the one that worked last time.

Briefly, here are the variations we need to account for:

-d or --database

-d is essentially more general, because FreeBSD has it but does not have --database. So, we try "-d" first, but also try "--database" just in case.

-q for quiet

As of this writing, with slocate, if you tell it explicitly which db to use, that works, but you also get an ignorable error about how you don't have permissions to mess with the system-wide database. You can get this warning to go away with the "-q" option, but neither GNU locate nor FreeBSD locate has it, and if you use it with them it's a fatal error. So here we try to use "-q" first, and if that dies, we run without it.

And still other variations exist in requesting version information. The FreeBSD form does not understand "--version", and in fact doesn't seem to have any sort of version option.
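
The "-d"/"--database" and "-q" variations above suggest a small table of command forms selected by an integer index. The sketch below is one way that could be laid out; the routine name locate_cmd_form, the exact ordering of the four forms, and the naive single-quote quoting are all assumptions rather than the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical table of command forms, ordered as the text describes:
# "-q" is tried before doing without it, and "-d" before "--database".
# Note: the quoting here is naive and breaks on values containing a
# single quote.
my @cmd_forms = (
    sub { my ( $term, $db, $opts ) = @_; "locate -q -d '$db' $opts '$term'" },
    sub { my ( $term, $db, $opts ) = @_; "locate -d '$db' $opts '$term'" },
    sub { my ( $term, $db, $opts ) = @_; "locate -q --database='$db' $opts '$term'" },
    sub { my ( $term, $db, $opts ) = @_; "locate --database='$db' $opts '$term'" },
);

# With no index, return the highest valid index; otherwise return the
# command string for that index (or nothing if it is out of range).
sub locate_cmd_form {
    my ( $idx, $term, $db, $opts ) = @_;
    return $#cmd_forms unless defined $idx;
    return unless $idx >= 0 && $idx <= $#cmd_forms;
    $opts = '' unless defined $opts;
    return $cmd_forms[$idx]->( $term, $db, $opts );
}

print locate_cmd_form( $_, 'README', '/var/lib/locatedb', '-i' ), "\n"
    for 0 .. locate_cmd_form();
```

A caller would try each form in turn and remember the index of the first one that returns results.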

(Ah, Cross-platform programming is such a joy.)

checking if a form of locate works

In order to check that a system-wide locate is working, we probe for files we know (or strongly suspect) will be there on the system. This module tries a series of guesses of decreasing specificity (there's no point in getting a huge number of hits if they're not needed), and stops working through the list as soon as a result is received.

The list in use here begins with files in the standard perl library (which should accompany almost any installation of perl, unless they were removed for some reason):

  MakeMaker
  SelfStubber
  DynaLoader

It then begins looking for strings that should be relatively common on most systems:

  README
  tmp
  bin
  the
  htm
  txt
  home

The presumption is that if there are no hits on those searches on a system-wide database, something is very wrong, and that particular form of "locate" just isn't working.
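
The probing loop can be sketched as follows, assuming the search itself is passed in as a coderef so that the same loop could serve either the module or the shell route. The routine first_term_with_hits and the fake index are hypothetical stand-ins, used here so that the sketch runs without any locate db.

```perl
use strict;
use warnings;

# The test terms listed above, most specific first.
my @test_search_terms = qw( MakeMaker SelfStubber DynaLoader
                            README tmp bin the htm txt home );

# $search takes a term and returns an array reference of matches.
# Return the first term that gets a hit, or nothing if none does.
sub first_term_with_hits {
    my ( $search, @terms ) = @_;
    for my $term (@terms) {
        my $hits = eval { $search->($term) };   # a dying search counts as a miss
        next unless $hits && @{$hits};
        return $term;                           # bail out on the first success
    }
    return;
}

# Stand-in search over a fixed list of paths.
my @fake_index  = qw( /usr/bin/perl /home/user/notes.txt );
my $fake_search = sub {
    my ($term) = @_;
    return [ grep { index( $_, $term ) >= 0 } @fake_index ];
};

print first_term_with_hits( $fake_search, @test_search_terms ), "\n";
```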

File::Locate

By using File::Locate with () to suppress import, we need to call 'locate' like so:

   File::Locate::locate

which makes it easy for us to define a new 'locate' method of our own.

The procedural syntax of File::Locate::locate has its ugly aspects, but the documentation is usually clear:

  my @mp3s = File::Locate::locate "mp3", "/usr/var/locatedb";

  # do regex search
  @hits = File::Locate::locate "^/usr", -rex => 1, "/usr/var/locatedb";

  @hits = File::Locate::locate "^/usr", -rexopt => 'ie', "/usr/var/locatedb";
  # i - case insensitive
  # e - POSIX extended regexps (say what?)

Note: it isn't abundantly clear from the documentation if -rexopt has to be used with -rex, but it appears that this is the case. (And there is a syntax diagram that indicates this).

Another oddity, though: there doesn't seem to be a way to do a case-insensitive search without using regexps. (Note: none of the tests use the "-rexopt" feature.)

A very cool touch is that you can hand it a coderef, and avoid building up a big result set:

   File::Locate::locate "*.mp3", sub { print "MP3 found: $_\n" };

Note: the order of arguments to File::Locate::locate is supposed to be irrelevant.

creating a database

Creating your own private locate database isn't done very often, but this module tries to support it largely for purposes of writing portable tests (we can't know what files are installed on a remote system, so it's difficult to know what a locate operation should have found... *unless* we generate a small locate database of our own that tracks a known set of files that we ship with the tests).

Unfortunately there are several different invocation forms for doing this, depending on the variant of locate you have installed. As usual, we try everything we can think of, and only give up if none of them work.

  my @cmd = ( "slocate -U $location -o $db",
              "updatedb --require-visibility 0 --output=$db --database-root='$location'",
              "updatedb --output=$db --localpaths='$location'",
            );
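
The "try everything, give up only if nothing works" approach for a command list like the one above can be sketched like this. The routine first_successful_cmd is hypothetical; by default it treats a zero exit status from system as success, and the runner can be swapped out (as the example does) so the loop can be exercised without running updatedb.

```perl
use strict;
use warnings;

# Try each command in order. $run is a coderef that executes one command
# and returns true on success; it defaults to checking system's exit status.
sub first_successful_cmd {
    my ( $cmds_aref, $run ) = @_;
    $run ||= sub { system( $_[0] ) == 0 };
    for my $cmd ( @{$cmds_aref} ) {
        return $cmd if $run->($cmd);
    }
    return;    # none of the forms worked
}

# Example with a fake runner that only "accepts" updatedb forms.
my @fake_cmds = ( 'slocate -U /tree -o /tmp/t.db',
                  'updatedb --output=/tmp/t.db --localpaths=/tree' );
my $worked = first_successful_cmd( \@fake_cmds, sub { $_[0] =~ m{ ^ updatedb }x } );
print "worked: $worked\n";
```

In real use it would also be worth checking afterwards that the db file exists and is not empty, since a zero exit status alone does not prove that a usable database was written.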

It probably comes as no surprise that "slocate" and "updatedb" have different forms. I was, uh, *interested* to see that my updatedb works differently now (2010) than when I wrote this code in 2007.

The man page for the version of updatedb installed on my Ubuntu "jaunty" box describes a version of "updatedb" written by "Miloslav Trmac <mitr@redhat.com>", where the option I need is called "--database-root". The old option name I was using, "--localpaths", was used by a version written by "Glenn Fowler <gsf@research.att.com>".

Also, with the RedHat version (which looks as though it thinks of itself as "mlocate"), the "--require-visibility 0" option is recommended for the creation of a small, private locate db.

system status fields

The system status fields (the ones that can be saved or inspected via introspection_results) no doubt seem redundant:

 db
 system_db_not_found
 use_shell_locate
 shell_locate_failed
 shell_locate_cmd_idx

It's possible that they *are* somewhat redundant: they were invented on-the-fly during development on an ad hoc basis.

However, despite the way it looks, this set is resistant to being reduced in size. Two-valued logic has its limitations: for our immediate purpose, there have to be ways to distinguish between "I don't know what this value is, and you should try to find out" and "I don't know what this value is, and it isn't worth trying to find it." For example, the "db" field alone isn't good enough; it needs to be supplemented with information about what we've done to try to determine the "db".
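
One way to picture the distinction, at least for arguments handed to "new", is perl's difference between a hash key that is absent and one that is present but undefined. The helper below is a hypothetical illustration of that idea only; the module itself tracks this state with the separate status fields listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: classify what the caller said about the db.
#   key absent          -> nothing known, worth searching for a system db
#   key present, undef  -> caller asked us not to bother searching
#   key present, value  -> use that file
sub db_intent {
    my ($args) = @_;
    return 'search'      unless exists $args->{db};
    return 'skip_search' unless defined $args->{db};
    return 'use_given';
}

print db_intent( {} ), "\n";                            # search
print db_intent( { db => undef } ), "\n";               # skip_search
print db_intent( { db => '/var/lib/locatedb' } ), "\n"; # use_given
```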

As for "use_shell_locate" and "shell_locate_failed": "shell_locate_failed" is used largely to skip doing a probe via shell if it's failed before (possibly its name should be expanded to "shell_locate_probe_failed"). Even if the system has been explicitly told to work via the shell, it's still necessary to do a probe to find out which form of the shell locate command will work ("shell_locate_cmd_idx").

AUTHOR

Joseph Brenner, <doom@kzsu.stanford.edu>, 29 May 2007

COPYRIGHT AND LICENSE

Copyright (C) 2007, 2010 by Joseph Brenner

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.

BUGS

None reported... yet.