
NAME

Metadata::ByInode::Indexer - customizable file and directory indexer

DESCRIPTION

Part of Metadata::ByInode. Not meant to be used alone!

index()

The first argument is an absolute path to a file or directory.

If it is a directory, index() recurses into it non-inclusively: the directory *itself* is NOT indexed.

If it is a file, only that one file is indexed.

Returns the number of files indexed.

By default the indexer does not index hidden files. To index them, set index_hidden_files in the constructor:

 $m = new Metadata::ByInode::Indexer({ 
   abs_dbfile => '/tmp/mbi_test.db', 
   index_hidden_files => 1 
 });
 
 $m->index('/path/to/what'); # dir or file
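
The traversal rules described above (non-inclusive recursion, hidden entries skipped by default, a count returned) can be pictured with a small stand-alone sketch. It uses only the core File::Find module rather than the indexer itself. count_indexable() is a hypothetical helper name, not part of Metadata::ByInode::Indexer, and two things are assumptions: that "hidden" means a name starting with a dot, and that subdirectories count as entries.

```perl
use strict;
use warnings;
use File::Find ();

# Hypothetical helper: counts what an index() call might visit under $abs_path.
# Assumption: "hidden" means the file or directory name starts with a dot.
# Assumption: subdirectories count as entries, the starting dir does not.
sub count_indexable {
    my ( $abs_path, $index_hidden ) = @_;

    # a single file counts as exactly one entry
    return 1 if -f $abs_path;

    my $count = 0;
    File::Find::find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;

                # non-inclusive: the starting dir itself is not counted
                return if $path eq $abs_path;

                my ($name) = $path =~ m{([^/]+)$};
                if ( !$index_hidden && $name =~ /^\./ ) {
                    # skip the hidden entry and do not descend into it
                    $File::Find::prune = 1 if -d $path;
                    return;
                }
                $count++;
            },
        },
        $abs_path
    );
    return $count;
}
```

Calling count_indexable('/path/to/what') mirrors the default behaviour; passing a true second argument plays the role of index_hidden_files => 1.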
 

USING THE INDEXER

By default we just record abs_loc, filename, and ondisk (a timestamp of when we recorded the entry). You can use the finder() method, which returns a File::Find::Rule object, to do neat things:

        my $i = new Metadata::ByInode({ abs_dbfile => '/tmp/dbfile.db' });

        $i->finder->name( qr/\.mp3$|\.avi$/ );

        $i->index('/home/myself'); 

This would only index mp3 and avi files in your home dir.
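
The pattern given to name() is an ordinary Perl regex matched against each filename, so it can be tried on a few names before indexing. The sketch below is stand-alone and does not touch the indexer. The sample filenames are made up, and the case-insensitive variant is a suggestion, not something the module requires.

```perl
use strict;
use warnings;

my $as_written  = qr/\.mp3$|\.avi$/;         # pattern from the example above
my $ignore_case = qr/\.(?:mp3|avi)$/i;       # suggested variant

# made-up sample names
my @names = ( 'song.mp3', 'clip.avi', 'SONG.MP3', 'notes.txt', 'mp3' );

my @matched_as_written  = grep { $_ =~ $as_written } @names;
my @matched_ignore_case = grep { $_ =~ $ignore_case } @names;

print "as written:  @matched_as_written\n";
print "ignore case: @matched_ignore_case\n";
```

The pattern as written is case-sensitive, so a file named SONG.MP3 would not be indexed unless the /i flag is added.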

finder()

Returns a File::Find::Rule object. You can feed it rules before calling index().

CREATING YOUR OWN INDEXER

index_extra()

If you want to write your own indexer, this is the method to override. It runs once for every file found and inserts data into the record for that file. By default, every record has 'filename', 'abs_loc', and 'ondisk', which is a timestamp of when the file was seen (now).

For example, if you want the indexer to record mime types, override the index_extra() method like this:

        package Indexer::WithMime;
        use strict;
        use warnings;
        use File::MMagic;
        use base 'Metadata::ByInode::Indexer';

        sub index_extra {
                my $self = shift;

                # get hash with current record data
                my $record = $self->_record;

                # by default, record holds 'abs_loc', 'filename', and 'ondisk'

                # an extension is what distinguishes files from dirs here
                if ( $record->{filename} =~ /\.\w{1,4}$/ ) {

                        my $m    = new File::MMagic;
                        my $mime = $m->checktype_filename(
                                $record->{abs_loc} . '/' . $record->{filename}
                        );

                        if ($mime) {
                                # append another key and value pair to the record
                                $self->_set( 'mime_type', $mime );
                        }
                }

                return 1;
        }

        1;    # a package file must end with a true value

Then, in your script:

        use Indexer::WithMime;

        my $i = new Indexer::WithMime({ abs_dbfile => '/home/myself/dbfile.db' });

        $i->index('/home/myself');

        # now you can search files by mime type residing somewhere in that dir

        $i->search({ mime_type => 'mp3' });

        # or
        $i->search({
                mime_type => 'mp3',
                filename  => 'u2',
        });
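
The extension test used in index_extra() above is a heuristic: any name ending in a dot plus one to four word characters is treated as a file. A short stand-alone sketch shows where it can misfire. The helper name looks_like_file() and the sample names are illustrative only.

```perl
use strict;
use warnings;

# Same pattern as in index_extra(): a dot followed by 1-4 word characters.
# looks_like_file() is a hypothetical name used only for this sketch.
sub looks_like_file {
    my ($filename) = @_;
    return $filename =~ /\.\w{1,4}$/ ? 1 : 0;
}

# made-up sample names
for my $name ( 'song.mp3', 'page.html', 'README', 'notes.backup', 'conf.d' ) {
    printf "%-14s %s\n", $name, looks_like_file($name) ? 'checked' : 'skipped';
}
```

With this pattern, README and notes.backup would get no mime_type, while a directory named conf.d would still be handed to File::MMagic. Where that matters, testing the full path with -f is one possible alternative.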

_teststop()

Returns the number of files to index before stopping. The stop only happens if DEBUG is on. The default value is 1000; to change it, pass a new value before indexing.

        $self->_teststop(10000); # now set to 10k

You may also pass this amount to the constructor:

        my $i = new Metadata::ByInode( { _teststop => 500, abs_dbfile => '/tmp/index.db' });
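
The stop-after-N behaviour described above can be pictured as a guarded loop. This is only a sketch under stated assumptions: process_paths() is a hypothetical name, DEBUG is modelled as a compile-time constant, and the exact point at which the real indexer stops is not something this sketch claims to reproduce.

```perl
use strict;
use warnings;
use constant DEBUG => 1;    # assumption: a compile-time debug flag

# Hypothetical loop: walks the paths, stopping early only when DEBUG is on.
sub process_paths {
    my ( $paths, $teststop ) = @_;
    $teststop = 1000 unless defined $teststop;    # documented default

    my $count = 0;
    for my $path (@$paths) {
        last if DEBUG && $count >= $teststop;
        $count++;    # a real indexer would record $path here
    }
    return $count;
}
```

With DEBUG off the guard never fires, so a limit of this kind only affects test runs.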

_find_abs_paths()

The argument is the absolute path of the base directory to scan for indexing. Hidden files are not returned.

Returns an array ref with the absolute paths of everything within that directory:

        $self->_find_abs_paths('/var/www');

_save_stat_data()

By default we do not save stat data. If you want it, pass save_stat_data to the constructor:

        my $i = new Metadata::ByInode({ save_stat_data => 1 });

This will create the following for each entry indexed:

        ctime mtime is_dir is_file is_text is_binary size

If you are indexing 1k files, this makes little difference. But if you are indexing 1 million, it makes a lot of difference in time.
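
As a rough picture of what the keys listed above hold, the sketch below gathers the same seven values for one path with Perl's built-in stat and file test operators. It is not the indexer's code: stat_record() is a hypothetical helper, and mapping is_text and is_binary to the -T and -B tests on plain files is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref with the stat-related keys,
# or nothing if the path does not exist.
sub stat_record {
    my ($abs_path) = @_;

    my @st = stat($abs_path);
    return unless @st;

    my $is_file = ( -f $abs_path ) ? 1 : 0;

    return {
        ctime   => $st[10],
        mtime   => $st[9],
        size    => $st[7],
        is_dir  => ( -d $abs_path ) ? 1 : 0,
        is_file => $is_file,

        # assumption: text/binary is only meaningful for plain files
        is_text   => ( $is_file && -T $abs_path ) ? 1 : 0,
        is_binary => ( $is_file && -B $abs_path ) ? 1 : 0,
    };
}
```

Each value costs extra system calls per entry, and -T and -B also read the start of the file, which fits the slowdown on very large trees described above.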

CHANGES

The previous version used the system find to get a list of what to index. Now we use File::Find::Rule.

SEE ALSO

Metadata::ByInode and Metadata::ByInode::Search

                $self->{_open_handle}->{recursive_delete}->execute("$abs_path%");
                my $rows_deleted = $self->{_open_handle}->{recursive_delete}->rows;
                ### $rows_deleted       
                $self->dbh->commit;
                
        Doing a subselect like this takes forever:
                
                my $delete = $self->dbh->prepare( 
                                q{DELETE FROM metadata WHERE inode IN( 
                                        SELECT inode FROM (select * from metadata) as temptable WHERE temptable.mkey='abs_loc' AND temptable.mvalue LIKE ?)}
                ) or croak( "_delete_treeslice() ". $self->dbh->errstr );
                  
                print STDERR "done.\n" if DEBUG;
                
                print STDERR "executing.. " if DEBUG;
                $delete->execute("$abs_path\%");
                print STDERR "done.\n" if DEBUG;
                
                my $rows_deleted = $delete->rows;
                ## $rows_deleted        
                $self->dbh->commit;     

AUTHOR

Leo Charre leocharre at cpan dot org