package File::Locate::Harder;
use base qw( Class::Base );

=head1 NAME

File::Locate::Harder - when you're determined to use a locate db

=head1 SYNOPSIS

   use File::Locate::Harder;

   my $flh = File::Locate::Harder->new();
   my $results_aref = $flh->locate( $search_term );

   # using a defined db location, plus some locate options
   my $flh = File::Locate::Harder->new( db => $db_file );
   my $results_aref = $flh->locate( $search_pattern,
                                    { case_insensitive => 1,
                                      regexp           => 1,
                                    } );

   # creating your own locate db, (in this example for doing tests)
   use Test::More;
   SKIP:
    {
      my $flh = File::Locate::Harder->new( db => undef );
      $flh->create_database( $path_to_tree_to_index, $db_file );

      if( $flh->check_locate ) {
         my $reason = "Can't get File::Locate::Harder to work";
         skip "Can't run 'locate'", $test_count;
      }
      my $results_aref = $flh->locate( $search_term );
      is_deeply( $results_aref, $expected_aref, "Found expected files");
    }

   # introspection (is it reading db directly, or shelling out to locate?)
   my $report = $flh->how_works;
   print "This is how File::Locate::Harder is doing locates: $report\n";



=head1 DESCRIPTION

File::Locate::Harder provides a generalized "locate" method to access
the file system indexes used by the "locate" command-line utility.
It is intended to be a relatively portable way for perl code to
quickly ascertain what files are present on the current system.

This code is essentially a wrapper around multiple different techniques
of accessing a locate database: it makes an effort to use the fastest
method it can find that works.

The "locate" command is a well-established utility to find files
quickly by using a special index database (typically updated via a
cron-job).  This module is an attempt at providing a perl front-end
to "locate" which should be portable across most unix-like systems.

Behind the scenes, File::Locate::Harder silently tries many ways
of doing the requested "locate" operation.  If it can't establish
contact with the file system's locate database, it will error
out, otherwise you can be reasonably sure that a "locate" will
return a valid result (including an empty set if the search matches
nothing).

If possible, File::Locate::Harder will use the perl/XS module
L<File::Locate> to access the locate db directly, otherwise, it
will attempt to shell out to a command line version of "locate".

If not told explicitly what locate db file to use, this module will
try to find the file system's standard locate db using a number of
reasonable guesses.  If those all fail -- and it's possible for it to
fail simply because file permissions make the db file effectively
invisible -- as a last ditch effort, it will try shelling out to the
command line "locate" without specifying a db for it (because it
usually knows where to look).

Efficiency may be improved in some circumstances if you help
File::Locate::Harder find the locate database, either by explicitly
saying where it is (using the "db" attribute), or by setting the
LOCATE_PATH environment variable.  Also see the L</"introspection_results">
method.

=head2 METHODS

=over

=cut

use 5.006;
use strict;
use warnings;
use Carp;
use Data::Dumper;
use Hash::Util qw( lock_keys unlock_keys );
use File::Path     qw(mkpath);
use File::Basename qw(fileparse basename dirname);

# Note: File::Locate is now "require"ed during init instead of "use"ed.

our $VERSION = '0.06';

# for autoload generated accessors
our $AUTOLOAD;
my %ATTRIBUTES = ();

=item new

Creates a new File::Locate::Harder object.

With no arguments, the newly created object (largely) has
attributes that are undefined.  All may be set later using
accessors named according to the "set_*" convention.

Inputs:

An optional hashref, with named fields identical to the names of
the object attributes.  The attributes, in order of likely utility:

=over

=item  Settings for ways to run "locate"

=over

=item case_insensitive

Like the usual command-line "-i".

=item regexp

The search term will be interpeted as a POSIX regexp

=item posix_extended

The search term is a regexp with the standard POSIX extensions.

=back

=item Overall settings (for "locate", "create_database", etc)

=over

=item db

Locate database file, with full path.  Use this to work with a
non-standard location, or set it to "undef" if you don't want this
module to waste time looking for it (e.g. you might be planning to
generate your own db via L</create_database>).

=back

=item For internal use, testing, and so on:

The following items are lists used in the probing process which
determines what works on the current system.  These lists are
defined with hardcoded defaults that will normally remain
untouched, though are sometimes over-ridden for testing
purposes.

=over

=item locate_db_location_candidates

Likely places for a locate db.  See L</define_probe_parameters>.

=item test_search_terms

Common terms in unix file paths. See L</define_probe_parameters>.

=back

The following are status fields where the results of system probing
are stored.  The user not will normally be uninterested in these,
though see L</"introspection_results"> for a hint about performance
improvements in repeated runs.

=over

=item system_db_not_found

Could not find where the standard locate db is.

=item use_shell_locate

Shell out to locate and forget about using File::Locate

=item shell_locate_failed

So don't try probe_db_via_shell_locate again

=item shell_locate_cmd_idx

Integer: controls the choice of syntax of the locate shell cmd

=back

=back

=cut

# Note: "new" is inherited from Class::Base, and
# calls the following "init" routine automatically.

=item init

Method that initializes object attributes and then locks them
down to prevent accidental creation of new ones.

Not of interest to client coders, though inheriting code should have
an init of it's own that calls this one.

=cut

sub init {
  my $self = shift;
  my $args = shift;
  unlock_keys( %{ $self } );

  # all object attributes, including arguments that become attributes
  my @attributes =
    (
        'db',  # locate database file, with full path

        # results of system introspection on how 'locate' works here
        'system_db_not_found',
        'use_shell_locate',      # shell out to locate, forget File::Locate
        'shell_locate_failed',   # so don't try probe_db_via_shell_locate again
        'shell_locate_cmd_idx',  # integer, controls syntax of locate shell cmd

        # lists to try in sequence until one works
        'test_search_terms',     # common terms on perl/unix systems
        'locate_db_location_candidates',  # likely places for a locate db

        # options settings for different styles of "locate"
        'case_insensitive',
        'regexp',
        'posix_extended',
     );

  # transform args into attributes, if on the approved list
  foreach my $field (@attributes) {
    $ATTRIBUTES{ $field } = 1;
    $self->{ $field } = $args->{ $field };
  }
  # (all accessors are now fair game to use here)

  $self->define_probe_parameters;
     # that is, the test_search_terms and locate_db_location_candidates

  # Try to load module "File::Locate", if it fails we'll try shell locate
  eval { require File::Locate };
  if ($@) {
    $self->set_use_shell_locate( 1 );
  }

  my $probe_db = 1;
  # check for defined db field, but undef value
  if ( grep{ m/^db$/ } (keys %{ $args } )  ||
       ( not( defined( $args->{db} ) ) ) ) {
    # if found we should *not* probe for a file-system db
    $probe_db = 0;
  }

  # two issues: determining which db to use
  # and how to use it db (i.e. via module or shell)
  my $db;
  if ( $probe_db ) {
    if ( $db = $args->{ db } || $ENV{ LOCATE_PATH } ) {
      $self->set_db( $db );

      # even if we're told which db to use, must still determine how
      # But: we can't probe it if it's not created yet,
      # And: there's no point if we already know we're going via shell
      if ( -e $db  &&
           ( not ( $self->use_shell_locate ) ) ) {

        # will we use db fast way (module) or slow but sure (shell)
        $self->probe_db( $db ); # note: sets use_shell_locate
          # TODO check return for failure?

      }
    } elsif ( $db = $self->determine_system_db( ) ) {
      # using the standard file system locate db
    } elsif ( $self->probe_db_via_shell_locate ) {
             # the db is unknown to us, but locate may still know
      $self->set_use_shell_locate( 1 );
    } else {
      croak "File::Locate::Harder is not working. " .
        "Problem with 'locate' installation?";
    }
  } # end if don't probe db

  lock_keys( %{ $self } );
  return $self;
}

=item locate

Simple interface to performs the actual "locate" operation
in a robust, reliable way.  Uses the locate db file indicated
by the object's "db" attribute (which is set automatically if
not manually overridden).

Input:

A term to search for in the file name or path.

Return:

An array reference of matching files with full paths.

=cut

sub locate {
  my $self           = shift;
  my $search_term    = shift;
  my $locate_options = shift;

  # apply the current locate options but preserve object settings
  my $original_settings = {
        case_insensitive => $self->case_insensitive,
        regexp           => $self->regexp,
        posix_extended   => $self->posix_extended,
     };

  foreach my $field (keys (%{ $locate_options })){
    my $setter = "set_$field";
    $self->$setter( $locate_options->{ $field } );
  }

  # farm out the locate operation to "via_shell" or "via_module"
  my $result = [];
  if ( $self->use_shell_locate ) {
    $result = $self->locate_via_shell(  $search_term );
  } else {
    $result = $self->locate_via_module( $search_term );
  }

  # restore the original object settings of locate options
  foreach my $field (keys (%{ $original_settings })){
    my $setter = "set_$field";
    $self->$setter( $original_settings->{ $field } );
  }

  return $result;
}

=item create_database

Tries to create the locate database file indicated in the object
data, indexing the tree indicated by a path given as an argument.  A
required second argument specifys the db file: the "db" field in the
object is ignored by this method, though if the database is
successfully created, the object's "db" field will be set to the
newly created database.

Inputs:

(1) full path of tree of files to index
(2) full path of db file to create

Return:
false (undef) on failure.

=cut

sub create_database {
  my $self     = shift;
  my $location = shift;
  my $db       = shift;

  mkpath( dirname( $db ));

  my @cmd = ( "slocate -U $location -o $db",
              "updatedb --require-visibility 0 --output=$db --database-root='$location'",
              "updatedb --output=$db --localpaths='$location'",
            );

  my $status = undef;
  TRY_AGAIN:
  foreach my $cmd (@cmd) {
    $self->debug("Trying cmd:\n$cmd\n");
    my $ret;
    $ret = system( "$cmd 2>/dev/null" );
    if ( $ret != 0  ) {
      $self->debug( "Failed locate db create command:\n  $cmd\n" );
      $self->debug( "\$\?: $?\n" ) if $?;
      next TRY_AGAIN;
    } else {
      $status = 1;
      $self->set_db( $db );
      last;
    }
  }
  if ( not( $status ) ) {
    carp "Could not create db: $db to index $location";
  }
  if ( -e $db ) {
    my $mtime = (stat $db)[9];
    my $timestamp = ( localtime($mtime) );
    $self->debug("Looks like database has been created: $db at $timestamp\n");
  }

  return $status;
}

=back

=head2 introspection

=over

=item check_locate

Returns true (1) if this module's 'locate' method is capable of working.

This is very similar to the L</probe_db> method, except that with no
arguments *and* an undefined object's db setting, this will
initiate a L</determine_system_db> run to try to find the standard
system locate db.

Example usage:

  my $flh = File::Locate::Harder->new( { db => undef } );
  $flh->create_database( $tree_location, $db_file );
  if ( $flh->probe_db ) {
    my @files = $flh->locate( "want_this" ); # checks the newly created db,
                                             # just indexing $tree_location
    # ...
  }

  # Then later, if you want to search the whole file system...
  $flh->set_db( undef );
  if ( $flh->check_locate ) {
      my @hits = $flh->locate( "search_for_this" );
      * ...
  }

  # But even more convenient would be:
  if ( $flh->determine_system_db ) {
      my @hits = $flh->locate( "search_for_this" );
      * ...
  }

(Thus I suspect that this is a redundant, useless method.)

Rule of thumb: if you want to search the whole system, you can use check_locate
to verify that L</locate> will (most likely) work, but if you're using your own
custom db (e.g. created via L</create_database>), you might as well just use
</probe_db>.

(Another rule of thumb: if this seems confusing, just ignore the issue
for as long as you can.)

=cut

sub check_locate {
  my $self = shift;

  my $db = shift || $self->db || $self->determine_system_db;
  my $ret = $self->probe_db( $db );
  return $ret;
}

=item how_works

Returns a report on how this module has been doing "locate"
operations (e.g. via the shell or the File::Locate module,
and using which db).

=cut

sub how_works {
  my $self = shift;
  my $db = $self->db | 'unknown';
  my $report = '';
  if ( $self->use_shell_locate ) {
    my $version = $self->shell_locate_version || '';
    $report = "We shell out to locate version: $version\n using the locate db: $db\n";
  } else {
    $report = "Using File::Locate with the locate db: $db\n";
  }
  return $report;
}

=item introspection_results

Returns a hashref of the results of File::Locate::Harder's
probing of the system's "locate" setup, so that it can be
easily used again without re-doing that work.

Example:

  my $settings_href = $flh1->introspection_results;

  # save    $settings_href somehow (e.g. dump to yaml file)
  # restore $settings_href somehow

  my $flh2 = File::Locate::Harder->new( $settings_href );

=cut

sub introspection_results {
  my $self     = shift;
  my $settings =
    {
       db                   => $self->db,
       system_db_not_found  => $self->system_db_not_found,
       use_shell_locate     => $self->use_shell_locate,
       shell_locate_failed  => $self->shell_locate_failed,
       shell_locate_cmd_idx => $self->shell_locate_cmd_idx,
    };
  return $settings;
}

=item shell_locate_version

Tries to determine the version of the shell's "locate" command.

This will work only with the GNU locate and Secure Locate
variants, not the Free BSD.

Returns the version string on success, otherwise 0 for failure.

=cut

sub shell_locate_version {
  my $self = shift;

  my @cmd = ( 'locate --version',  # gnu & slocate
              'locate -V',         # slocate
            );                     # note: freebsd has no version option

  my $ret = 0;
  CMD:
  foreach my $cmd (@cmd) {
    $self->debug("Trying cmd:\n$cmd\n");
    chomp(
          $ret = `$cmd`
         );
    if ($ret) {
      last CMD;
    } else {
      $self->debug( "Failed locate version request of form:\n  $cmd\n" );
      $self->debug( "\$\?: $?\n" ) if $?;
      next CMD;
    }
  }

  if ( not( $ret ) ) {
    carp "Could not get version of locate shell command";
  }
  return $ret;
}

=back

=head2 special purpose methods (usually, though not exclusively, for internal use)

=over

=item locate_via_module

Uses the perl/XS module L<File::Locate> to perform a locate
operation on the given search term, using the db file
indicated by the object's db attribute.

An optional second argument allows passing in a coderef,
an anonymous routine that operates on each match (the match
value is set to $_): this makes it possible to work with
a large result without storing the entire set in memory.

Uses the three object attribute toggles
(L</"case_insensitive">, </"regexp">, </"posix_extended">)
to control the way locate is performed.

=cut

sub locate_via_module {
  my $self        = shift;
  my $search_term = shift;
  my $coderef     = shift;

  my $db = $self->db;

  my @opts =  $self->build_opts_for_locate_via_module;

  if( not( $coderef ) ) {
    my @results = File::Locate::locate( $search_term, @opts, $db );
    return \@results;
  } else {
    my $ret = File::Locate::locate( $search_term, @opts, $db, $coderef );
    return $ret;
  }
}

=item locate_via_shell

Given a search term returns an array reference of matches found
from a "locate" search.

An optional second argument containing the locate command's
"options string" (e.g. "-i", "-r", "-re", etc) may be passed
in (otherwise it is generated from object data).

This method uses object data settings:
L</"db">, L</"shell_locate_cmd_idx">

And indirectly (via L</build_opts_for_locate_via_shell>):
L</"case_insensitive">, L</"regexp">, L</"posix_extended">

=cut

sub locate_via_shell {
  my $self              = shift;
  my $search_term       = shift;
  my $opt_str_override  = shift;

  unless( defined( $self->shell_locate_cmd_idx ) ) {
    $self->probe_db_via_shell_locate; # side effect: determine cmd_idx
  }

  my $cmd_idx = $self->shell_locate_cmd_idx;
  my $db      = $self->db;
  my $opt_str = $opt_str_override || $self->build_opts_for_locate_via_shell;

  my ($locate_cmd, @results);

    $locate_cmd =
      $self->generate_locate_cmd( $cmd_idx, $search_term, $db, $opt_str );
    chomp(
          @results = `$locate_cmd`
         );

  return \@results;
}

=back

=head2 methods largely for internal use

=over

=item determine_system_db

Internally used routine: looks for a useable system-wide locate db.

Returns the path to the db if found, and as a side effect sets the
object attribute "db".

=cut

# Note: for efficiency reasons, this trys to access all
# candidates via module before falling back on via shell. That's
# the reason this routine does not use the probe_db method

sub determine_system_db {
  my $self = shift;
  if ( $self->system_db_not_found ) {
    return;  # might as well bail if we've failed before
  }

  my $candidates = $self->locate_db_location_candidates;
  my @exist = grep { -e $_ } @{ $candidates };

  foreach my $db (@exist) {
    if( $self->probe_db_via_module_locate( $db ) ) {
      $self->set_db( $db );
      return $db;
    }
  }
  foreach my $db (@exist) {
    if( $self->probe_db_via_shell_locate( $db ) ) {
      # $self->set_use_shell_locate(1); ### TODO -- why not do this here
      $self->set_db( $db );
      return $db;
    }
  }
  $self->set_system_db_not_found( 1 );
  return;
}


=item probe_db

For when the locate db file you're interested in is known,
and you want to initialize access for it (and as a side-effect,
find out if it works).

Input: db file name with full path (optional, defaults to object's setting).

Return: for success, the db file name, on failure undef.

Side-effect: set's use_shell_locate if the access via module
didn't work.

=cut

sub probe_db {
  my $self = shift;

  my $db = shift || $self->db;

  # will we use db fast way (module) or slow but sure (shell)
  if ( $self->probe_db_via_module_locate ) {
    # File::Locate module works, so use it
    return $db; # success
  } elsif ( $self->probe_db_via_shell_locate ) {
    $self->set_use_shell_locate( 1 );
    return $db; # success
  } else {
    return;     # failed
  }
}



=item probe_db_via_module_locate

Looks to see if it can find anything in the given db by using
the File::Locate module.

=cut

sub probe_db_via_module_locate {
  my $self = shift;
  my $db   = shift || $self->db;

  # bail immediately if we've already know via_module doesn't work
  if ( $self->use_shell_locate ) {
    return;
  }

  my @test_search_terms =
    @{ $self->test_search_terms  };

  foreach my $search_term (@test_search_terms) {
    my $found_one;
    eval {
      # in scalar context, this 'locate' returns a boolean
      $found_one = File::Locate::locate( $search_term, $db );
    };
    if ($@) { # traps errors reported by the File::Locate module
      $self->debug("File::Locate::locate had a problem with $db:\n$@");
      return;
    }
    if ( $found_one ) {
      return $db;
    } else {
       $self->debug("File::Locate::locate found no $search_term via $db:\n$?");
    }
  }
  return;
}

=item probe_db_via_shell_locate

Tries the series of standard test searches by shelling out to
the command-line form of locate to make sure that it can be used.

Tries to use the locate db file indicated by the objects "db"
attribute, but this can be over-ridden with an optional argument.

Under some circumstances, the db may remain undefined, but this
method will return "1" for success if it appears that command-line
locate works in any case.

As a side-effect, saves the L</"shell_locate_cmd_idx"> that
indicates a form of the locate command that has been observed
to work.

Returns: undef for failure, and for success either the db or 1
(because locate can work even if this code can't figure out what
db file it's using).

=cut

sub probe_db_via_shell_locate {
  my $self = shift;

  if ( $self->shell_locate_failed ) {
    return;  # bail now if we've failed before
  }

  my $default_db;
  if ( $self->system_db_not_found ) {
    $default_db = undef;  # locate may find the system db even if we can't
  } else {
    $default_db = $self->db;
  }
  my $db   = shift || $default_db;
  my $true = $db || 1;

  my $opt_str = $self->build_opts_for_locate_via_shell;

  # Nested loops of trials
  # The outer loop: different syntax variations of the locate cmd
  # The inner loop: a series of terms to try searching for.

  my $test_search_terms_aref = $self->test_search_terms;
  my @test_search_terms;
  @test_search_terms =
    @{ $test_search_terms_aref } if $test_search_terms_aref;

  my $lim = $self->generate_locate_cmd;
  for (my $cmd_idx = 0; $cmd_idx <= $lim; $cmd_idx++) {

    foreach my $search_term (@test_search_terms) {

      my $locate_cmd =
        $self->generate_locate_cmd( $cmd_idx, $search_term, $db, $opt_str );
      chomp(
            my @hits = `$locate_cmd 2>/dev/null`
           );

      if ( scalar( @hits )  > 0 ) {
        $self->set_shell_locate_cmd_idx( $cmd_idx );
        return $true;
      }
    }
  }
  $self->set_shell_locate_failed( 1 );
  return;
}

=item generate_locate_cmd

Given an ordered list of four required parameters, returns a form
of the locate command which can (in theory) be fed to the shell.
In practice these different forms are expected to fail (some
harder than others) on various different platforms, so some
experimentation may be needed to find a form that works (which
is the job of L</probe_db_via_shell_locate>).

Inputs:

  $cmd_idx: integer index (beginning with 0) that chooses the
            form of a command to return.

  $search_term: string (or possibly regexp) to search for.

  $db: full path to the locate db to search.

  $opt_str: options string, defaults to values generated by
            build_opts_for_locate_via_shell

Example usage:

  for ($i=0; $i<=$self->generate_locate_cmd; $i++) {
     my $locate_cmd =
       $self->generate_locate_cmd( $cmd_idx, $search_term, $db, $opt_str );
     my @result = `$locate_cmd 2 > /dev/null `;
     if ( scalar(@result) > 0 ) {
       return $i;
     }
  }

Note: the various forms of locate are discussed below in
L</"locate shell command">

Special case:

with no arguments (specifically, with $cmd_idx undefined) returns
the count of avaliable command forms minus 1 ($#cmd_forms);

=cut

sub generate_locate_cmd {
  my $self        = shift;
  my $cmd_idx     = shift;
  my $search_term = shift || '';  # suppressing warnings about subbing undefs
  my $db          = shift || '';
  my $opt_str     = shift || $self->build_opts_for_locate_via_shell || '';

  $self->debug("cmd_idx: $cmd_idx\n") if defined( $cmd_idx );
  $self->debug("generate_locate_cmd: " .
               "db: $db  search_term: $search_term\n");

  my @shell_locate_cmds;
  if( $db ) {
    @shell_locate_cmds =
      (
       "locate -q -d '$db' $opt_str $search_term",
       "locate -d '$db' $opt_str $search_term",
       "locate -q --database='$db' $opt_str $search_term",
       "locate  --database='$db' $opt_str $search_term",
      );
  } else {
    @shell_locate_cmds =
      (
       "locate -q $opt_str $search_term",
       "locate $opt_str $search_term",
       "locate -q $opt_str $search_term",
       "locate $opt_str $search_term",
      );
  }

  my $limit = $#shell_locate_cmds;
  if ( not( defined( $cmd_idx ) ) ) {
    return $limit;
  }

  if ( $cmd_idx > $limit ) {
    return; # undef
  }

  my $cmd = $shell_locate_cmds[ $cmd_idx ];
  $self->debug("generate_locate_cmd: returned cmd:\n$cmd\n");
  return $cmd;
}

=item build_opts_for_locate_via_shell

Converts the three object attribute toggles
(L</"case_insensitive">, </"regexp">, </"posix_extended">)
into the command-line options string for locate.

The "posix_extended" feature is not supported for locates
via the shell, and if used will issue a warning.

=cut

sub build_opts_for_locate_via_shell {
  my $self     = shift;
  my $opt_str  = '';
  if ( $self->case_insensitive ) {
    $opt_str .= 'i';
  };
  if ( $self->regexp ) {
    $opt_str .= 'r';
  }
  if ( $self->posix_extended ) {
    carp("Can't use posix extended regexps with locate via the shell");
  };
  $opt_str = "-$opt_str" if $opt_str;
  return $opt_str;
}

=item build_opts_for_locate_via_module

Converting three object attribute toggles
(L</"case_insensitive">, </"regexp">, </"posix_extended">)
into the form that the File::Locate::locate
requires: returns an array.

=cut

sub build_opts_for_locate_via_module {
  my $self = shift;
  my $rexopt_str = '';
  my @opts = ();
  if ( $self->case_insensitive ) {
    $rexopt_str .= 'i';
  };
  if ( $self->posix_extended ) {
    $rexopt_str .= 'e';
  };
  if ( $self->regexp || $rexopt_str ) { # any -rexopt (even 'i') implies
                                        # a need for -rex
    @opts = (-rex => 1);
  }
  push @opts, (-rexopt => $rexopt_str) if $rexopt_str;
  return @opts;
}

=back

=head2 initialization utilities

=over

=item define_probe_parameters

An internal method, used during the object init process.

Defines two arrays that are used to control the locate db "probe"
process: the test_search_terms and the
locate_db_location_candidates.

The locate_db_location_candidates are likely places for a
system's locate db.  See L</details> below.

The test_search_terms are common terms in unix file paths,
which we can check to see if what looks like the locate
database really is one.  See L</"checking if a form of locate works">
below.

=cut

sub define_probe_parameters {
  my $self = shift;

  # common strings in file paths on perl/unix systems,
  # in roughly increasing likelihood of size of search result
  my @test_search_terms =
    qw(
        MakeMaker
        SelfStubber
        DynaLoader
        README
        tmp
        bin
        the
        htm
        txt
        home
        e
        /
     );
  $self->set_test_search_terms( \@test_search_terms );

  # some places one might look for the system's
  # locate db, in roughly increasing order of likelihood
  my @candidates =
    qw(
        /var/lib/slocate/slocate.db
        /var/cache/locate/locatedb
        /var/db/locate.database
        /usr/var/locatedb
        /var/lib/locatedb
        /usr/local/var/locatedb
        /var/lib/locate/locatedb
        /var/spool/locate/locatedb

        /var/cache/locate/slocate.db
        /var/db/slocate.db
        /usr/var/slocate.db
        /usr/local/var/slocate.db
        /var/lib/locate/slocate.db
        /var/spool/locate/slocate.db

        /var/lib/slocate/locate.database
        /var/cache/locate/locate.database
        /usr/var/locate.database
        /usr/local/var/locate.database
        /var/lib/locate/locate.database
        /var/spool/locate/locate.database

        /var/lib/slocate/locatedb
        /var/db/locatedb
     );
  $self->set_locate_db_location_candidates( \@candidates );
  return $self;
}

=back

=head2 basic setters and getters

=over

=item db

Getter for object attribute system_db

=item set_db

Setter for object attribute set_db

As a side-effect, unsets the shell_locate_failed flag
(what if the last db file was bad, and this current
setting will work?).

=cut

sub set_db {
  my $self = shift;
  my $db = shift;
  $self->{ db } = $db;
  $self->set_shell_locate_failed( undef );
  return $db;
}

=back

=head2 EXPERIMENTAL

Having some trouble straightening out the above code as-written.
Going to work on some experimental routines here, that might
have a use somewhere.

=over

=item work_via

Try the db various ways, make a recommendation on how to access it.
Return string: 'module' or 'shell'.

Q: how to handle the shell-but-undef-db case?
A1: could be a third how-type 'shell_unknown'
A2: could be some sort of meta-field, a "system_db_indeterminate" flag

=cut

sub work_via {
  my $self = shift;
  my $db   = shift    || $self->db;

  my $how;

  if (     $self->probe_db_via_module_locate( $db ) ) {
    $how = 'module';
  } else {
         if( $self->probe_db_via_shell_locate(  $db ) ) {
           $how = 'shell';
         } else {
           $how = 'shell_unknown_db';
         }
  }
  return $how;
}





=back

=head2 automatic accessor generation

=over

=item AUTOLOAD

=cut

sub AUTOLOAD {
  return if $AUTOLOAD =~ /DESTROY$/;  # skip calls to DESTROY ()

  my ($name) = $AUTOLOAD =~ /([^:]+)$/; # extract method name
  (my $field = $name) =~ s/^set_//;

  # check that this is a valid accessor call
  croak("Unknown method '$AUTOLOAD' called")
    unless defined( $ATTRIBUTES{ $field } );

  { no strict 'refs';

    # create the setter and getter and install them in the symbol table

    if ( $name =~ /^set_/ ) {

      *$name = sub {
        my $self = shift;
        $self->{ $field } = shift;
        return $self->{ $field };
      };

      goto &$name;              # jump to the new method.
    } elsif ( $name =~ /^get_/ ) {
      carp("Apparent attempt at using a getter with unneeded 'get_' prefix.");
    }

    *$name = sub {
      my $self = shift;
      return $self->{ $field };
    };

    goto &$name;                # jump to the new method.
  }
}

1;

=back

=head1 Platforms

It's likely that this package will work on any unix-like system
(including cygwin), though on some there might be a need for
additional installation and setup (e.g. a "findutils" package).

Development was done on two varieties of linux (aka GNU/linux):
Knoppix (32bit) on a Turion and Kubuntu on an Opteron machine.
This covered two major varieties of the "locate" command:
GNU locate and Secure Locate.

A serious attempt was made to support BSD locate on Freebsd,
but the testing has not been completed.

Note: at present the File::Locate module appears to fail silently
on 64bit platforms, so there the command-line shell locate will
always be used.


=head1 MOTIVATION

This module uses L<File::Locate>, which is a a perl XS interface
to read locate (or slocate) dbs without shellling out to the
command-line "locate" program.

File::Locate has one great limitation: it must be told which locate
db to use (by explicit parameter, or by environment variable), it
has no notion of a default location.  Further, as of this writing,
it appears to be limited to 32bit systems.

This module then is a wrapper around File::Locate that tries a
number of common locations for the locate database, and instead
of just giving up, it also tries the command-line locate, which
has it's own ways of knowing where the database can be
(configuration file, compiled-in default, or command-line
parameter).

The intention here is to make this module as portable as
possible...  it might, for example, be useful to use in portable
CPAN modules that need to look for things in the filesystem.

(As a case in point: the job of File::Locate::Harder would be a lot
easier if it could use "locate" to find the locate db...).

=head1 Additional Examples

=head2 forcing locate via File::Locate module or via shell command

  my $flh = File::Locate::Harder->new();
  $result_via_module = $flh->locate_via_module( $term );
  $result_via_shell  = $flh->locate_via_shell(  $term );

=head2 using the coderef feature of the File::Locate module

  my $count = 0;
  $flh->locate_via_module( $term, sub { $count++ } );
  print "There are $count matches of $term\n";


  $flh->locate_via_module( $term,
          sub { $count++ if $_ =~ m{ ^ /home }x } );
  print "There are $count matches of $term located in /home\n";

=head2 speeding up multiple searches if you know you're using shell locate

This reduces the number of calls to build_opts_for_locate_via_shell:

  my @searches = qw( .bashrc .bash_profile .emacs default.el );
  my $flh = File::Locate::Harder->new();
  my $opt_str = $self->build_opts_for_locate_via_shell;
  foreach my $term (@searches) {
    $result_via_shell  = $flh->locate_via_shell( $term, $opt_str );
  }

=head1 SEE ALSO

L<File::Locate>

Manual pages: L<locate>, L<slocate>, and/or L<updatedb>.

=head1 NOTES

=head2 architecture

The general philosophy in use here is to just try things that
are likely to work and then just try something else if they
fail.  This is probably better than attempting to guess which
form of locate to use based on the current platform, because (a)
no one (to my knowledge) has a capabilities database that
specifies which locate is found on which platform (b) different
variants may be installed at the whim of a sysadmin (c) there
may after all be variants of locate I've never encountered.

So checking ^O is of limited utility, and similarly, some of the
existing forms of locate lack introspection features (e.g. you
can't get freebsd's locate to tell you what version it is).

=head2 details

The object creation process "new" and "init" determines how to do
system-wide locates, and saves it's conclusions for use by future
calls of the locate method on this object.

Some of this elaborate initialization process can be
short-circuited if it's told which db file to use, or even just
giving it an "db" option with an undefined value.  That's
convenient for cases where you want to use this module to create
a locate db of your own (there's no point in scoping for a
system-wide db if we're going to use a specialized one).

If the db location is not known, the search process begins
with making guesses about likely locations it might be found.
It goes through this list:

 /var/lib/slocate/slocate.db  -- Secure Locate under Kubuntu
 /var/cache/locate/locatedb   -- GNU locate, under Knoppix
 /var/db/locate.database      -- BSD locate, under FreeBSD
 /usr/var/locatedb            -- mentioned: File::Locate docs and cygwin lists
 /var/lib/locatedb            -- mentioned on insecure.org
 /usr/local/var/locatedb      -- Solaris with findutils installed
 /var/lib/locate/locatedb     -- mentioned on a Debian list in 2000
 /var/spool/locate/locatedb   -- speculative mention on a cygwin list

So that's three names, in 8 locations.  It also tries other
permutations on speculation:

 /var/cache/locate/slocate.db
 /var/db/slocate.db
 /usr/var/slocate.db
 /usr/local/var/slocate.db
 /var/lib/locate/slocate.db
 /var/spool/locate/slocate.db

 /var/lib/slocate/locate.database
 /var/cache/locate/locate.database
 /usr/var/locate.database
 /usr/local/var/locate.database
 /var/lib/locate/locate.database
 /var/spool/locate/locate.database

 /var/lib/slocate/locatedb
 /var/db/locatedb

Each of these possibilites is checked for simple file-existance,
and then checked to see if one works. (See
L</"checking if a form of locate works"> below.)

=head2 locate shell command

If attempts at using L<File::Locate> fails, the system falls back
to shelling out to the locate command (which really should already
know how to find the system-wide db, either from a compiled-in
default or a config file setting).

But the locate shell command has it's own problems.  There are at
least three variants, with some slight differences between GNU
locate, slocate and freebsd locate.

The current architecture of locate_via_shell tries all of them
in a certain order, and remembers the one that worked last time.

Briefly, here are the variations we need to account for:

=over

=item -d or --database

-d is essentially more general, because freebsd has it but does
not have --database.  So, we try "-d" first, but also try "--database"
just in case.

=item -q for quiet

As of this writing, with slocate, if you tell it explicitly
which db to use, that works, but you also get an ignorable error
about how you don't have permissions to mess with the system
wide database.  You can get this warning to go away with the
"-q" option, but neither Gnu locate or freebsd has it, and if
you use it with them it's a fatal error.  So here we try to use "-q"
first, and if that dies, we run without it.

=back

And still other variations exist in requesting version information.
The FreeBSD form does not understand "--version", and in fact
doesn't seem to have any sort of version option.

(Ah, Cross-platform programming is such a joy.)

=head2 checking if a form of locate works

In order to check that a system-wide locate is working, we probe for
files we know (or strongly suspect) will be there on the system.
This module tries a series of guesses of decreasing specificity
(there's no point in getting a huge number of hits if they're not
needed), then bails out on the list if a result is recieved.

The list in use here begins with files in the standard perl library
(which should accompany almost any installation of perl, unless they
were removed for some reason):

  MakeMaker
  SelfStubber
  DynaLoader

It then begins looking for strings that should be relatively common
on most systems:

  README
  tmp
  bin
  the
  htm
  txt
  home

The presumption is that if there are no hits on those searchs on a
system-wide database, something is very wrong, and that particular
form of "locate" just isn't working.

=head2 File::Locate

By using File::Locate with () to supress import, we need to call
'locate' like so:

   File::Locate::locate

which makes it easy for us to define a new 'locate' method of
our own.

The proceedural syntax of File::Locate::locate has it's ugly aspects,
but the documentation is usually clear:

  my @mp3s = File::Locate::locate "mp3", "/usr/var/locatedb";

  # do regex search
  @hits = File::Locate::locate "^/usr", -rex => 1, "/usr/var/locatedb";

  @hits = File::Locate::locate "^/usr", -rexopt => 'ie', "/usr/var/locatedb";
  # i - case insensitive
  # e - POSIX extended regexps (say what?)

Note: it isn't abundantly clear from the documentation if
-rexopt has to be used with -rex, but it appears that this is
the case.  (And there is a syntax diagram that indicates this).

Another oddity, though: there doesn't seem to be a way to do a
case-insensitive search without using regexps.
(Note: none of the tests use the "-rexopt" feature.)

A very cool touch is that you can hand it a coderef, and avoid
building up a big result set:

   File::Locate::locate "*.mp3", sub { print "MP3 found: $_n" };

Note: the order of arguments to File::Locate::locate is supposed
to be irrelevant.

=head2 creating a database

Creating your own private locate database isn't done very often,
but this module tries to support it largely for purposes of writing
portable tests (we can't know what files are installed on a remote system,
so it's difficult to know what a locate operation should have found...
*unless* we generate a small locate database of our own that tracks
a known set of files that we ship with the tests).

Unfortunately there are several different invocation forms for doing this,
depending on the variant of locate you have installed.  As usual,
we try everything we can think of, and only give up if none of them work.

  my @cmd = ( "slocate -U $location -o $db",
              "updatedb --require-visibility 0 --output=$db --database-root='$location'",
              "updatedb --output=$db --localpaths='$location'",
            );

It probably comes as no surprise that "slocate" and "updatedb" have
different forms.  I was, uh, *interested* to see that my updatedb
works differently now (2010) than when I wrote this code in 2007.

The man page for the version of updatedb installed on my Ubuntu "jaunty"
box has a version of "update" db written by: "Miloslav Trmac <mitr@redhat.com>"
where the option I need is called "--database-root", I see that the
old option name I was using, "--localpaths", was used by a version
written by "Glenn Fowler <gsf@research.att.com>".

Also, with the RedHat version-- which looks as though it thinks
of itself as "mlocate"-- the "-require-visibility 0" option is
recommended for the creation of a small, private locate db.

=head2 system status fields

The system status fields (the one's that can be saved or inspected
via L<introspection_results>) no doubt seem redundant:

 db
 system_db_not_found
 use_shell_locate
 shell_locate_failed
 shell_locate_cmd_idx

It's possible that they *are* somewhat redundant: they were
invented on-the-fly during development on an ad hoc basis.

However, despite the way it looks, this set is resistant to being
reduced in size.  Two-valued logic has it's limitations: for our
immediate purpose, there has to be ways to distinguish between "I
don't know what this value is, and you should try to find out"
and "I don't know what this value is, and it isn't worth trying
to find it."  For example, the "db" field alone isn't good
enough, it needs to be supplemented with information about what
we've done to try to determine the "db".

As for "use_shell_locate" and "shell_locate_failed":
"shell_locate_failed" is used largely to skip doing a probe via
shell if it's failed before (possibly it's name should be
expanded to "shell_locate_probe_failed").  Even if the system has
been explicitly told to work via the shell, it's still necessary
to do a probe to find out which form of the shell locate command
will work ("shell_locate_cmd_idx").

=head1 AUTHOR

Joseph Brenner, E<lt>doom@kzsu.stanford.eduE<gt>,
29 May 2007

=head1 COPYRIGHT AND LICENSE

Copyright (C) 2007, 2010 by Joseph Brenner

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.8.2 or,
at your option, any later version of Perl 5 you may have available.

=head1 BUGS

None reported... yet.

=cut