NAME

MMM::Text::Search - Perl module for indexing and searching text files and web objects

SYNOPSIS

  use MMM::Text::Search;

  my $srch = MMM::Text::Search->new({         # for indexing...
      # index main file location...
      IndexPath      => "/tmp/myindex.db",
      # local files... (optional)
      FileMask       => '(?i)(\.txt|\.htm.?)$',
      Dirs           => [ "/usr/doc", "/tmp" ],
      FollowSymLinks => 0,                     # 0|1 (default = 0)
      # web objects... (optional)
      URLs           => [ "http://localhost/" ],   # more URLs may follow
      Level          => 0,                     # recursion level (0 = unlimited)
      # common options...
      IgnoreLimit    => 0.3,                   # default = 2/3
      Verbose        => 0,                     # 0|1
  });

  $srch->start_indexing_session();
  $srch->index_default_locations();
  $srch->index_content({ title   => '...',
                         content => '...',
                         id      => '...' });
  $srch->commit_indexing_session();

  $srch->makeindex;      # obsolete

  my $srch = MMM::Text::Search->new(          # for searching...
      "/tmp/myindex.db", $verbose_flag );

  my $hashref = $srch->query( "pizza", "ciao", "-pasta" );
  $hashref    = $srch->advanced_query( "(pizza OR ciao) AND NOT pasta" );

  $srch->errstr();       # returns the last error
                         # (for the moment, only query syntax errors)

  $srch->dump_word_stats(\*FH);

DESCRIPTION

Indexing

When an indexing session is committed, the following files will have been created (assuming IndexPath = /path/myindex.db; see the constructor):
```
        /path/myindex.db             word index database
        /path/myindex-locations.db   filename/URL database
        /path/myindex-titles.db      html title database
        /path/myindex.stopwords      stop-words list
        /path/myindex.filelist       readable list of indexed files/URLs
        /path/myindex.deadlinks      broken http links

        [... lots of important things missing ... ]
```
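All of these names follow from IndexPath by dropping the ".db" suffix and appending a fixed ending. The helper below, companion_files(), is hypothetical (it is not part of MMM::Text::Search) and assumes IndexPath always ends in ".db"; it is only a sketch for locating the generated files, e.g. to back them up or clean them out.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MMM::Text::Search): derive the files an
# indexing session creates from the IndexPath value.
# Assumes IndexPath ends in ".db".
sub companion_files {
    my ($index_path) = @_;
    (my $stem = $index_path) =~ s/\.db\z//;   # "/path/myindex.db" -> "/path/myindex"
    return {
        words     => $index_path,              # word index database
        locations => "$stem-locations.db",     # filename/URL database
        titles    => "$stem-titles.db",        # html title database
        stopwords => "$stem.stopwords",        # stop-words list
        filelist  => "$stem.filelist",         # readable list of indexed files/URLs
        deadlinks => "$stem.deadlinks",        # broken http links
    };
}

my $files = companion_files("/path/myindex.db");
print "$_\t$files->{$_}\n" for sort keys %$files;
```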
start_indexing_session() starts a new indexing session.

commit_indexing_session() commits and closes the current session.

index_default_locations() indexes all files and URLs specified in the constructor.
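To get a feel for which local files a given FileMask/Dirs/FollowSymLinks combination selects, the sketch below walks the directories with the core File::Find module. It is an approximation under assumptions, not the module's own traversal: select_files() is a hypothetical name, FileMask is matched against the full path, and only plain files are kept.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical sketch: list the files a FileMask/Dirs/FollowSymLinks
# configuration would pick up. MMM::Text::Search's own walker may differ.
sub select_files {
    my (%opt) = @_;
    my $mask = qr/$opt{FileMask}/;             # e.g. '(?i)(\.txt|\.htm.?)$'
    my @dirs = grep { -d } @{ $opt{Dirs} };    # skip missing directories
    return () unless @dirs;

    my @found;
    find(
        {
            # with no_chdir, $_ holds the full path of the current entry
            wanted   => sub { push @found, $File::Find::name if -f $_ && /$mask/ },
            follow   => $opt{FollowSymLinks} ? 1 : 0,
            no_chdir => 1,
        },
        @dirs,
    );
    return sort @found;
}
```

For example, select_files(FileMask => '(?i)(\.txt|\.htm.?)$', Dirs => [ "/usr/doc", "/tmp" ], FollowSymLinks => 0) returns the .txt, .htm and .html files (in any letter case) under those directories.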

index_content() pushes content into the indexing engine. Its argument must be a hash reference with the following structure:
```
 { title =>   '...', content=>  '...', id =>      '...'  }
```
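Content that does not live in a file or at a URL (a database row, a generated page) can be fed in this way. As an illustration, the hypothetical html_to_record() below builds the required hash reference from an HTML string. The regex-based tag stripping is deliberately crude and is an assumption of this sketch, not how the module parses HTML; falling back to the id when there is no <title> is likewise an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: build the { title, content, id } hash reference
# that index_content() expects from a string of HTML.
sub html_to_record {
    my ($id, $html) = @_;
    my ($title) = $html =~ m{<title[^>]*>(.*?)</title>}is;
    $title = $id unless defined $title;     # assumption: fall back to the id
    (my $text = $html) =~ s{<[^>]*>}{ }g;   # crude: replace every tag with a space
    $text =~ s/\s+/ /g;                     # collapse whitespace
    $text =~ s/^ //;
    $text =~ s/ $//;
    return { title => $title, content => $text, id => $id };
}

my $rec = html_to_record('doc-1',
    '<html><head><title>Pizza</title></head><body><p>Ciao!</p></body></html>');
# Inside an open indexing session:  $srch->index_content($rec);
```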
makeindex() is obsolete. It is equivalent to calling $srch->start_indexing_session(), $srch->index_default_locations() and $srch->commit_indexing_session(), in that order.

dump_word_stats(\*FH) dumps all words, sorted by occurrence frequency, to the FH file handle (or to STDOUT if no argument is given). Stop-words get a frequency value of 1.
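The listing is essentially a frequency table sorted in descending order. The stand-alone sketch below reproduces that idea with a plain hash; dump_counts() is a hypothetical name, and the tab-separated output format and the alphabetical tie-break are assumptions, not the module's actual output format.

```perl
use strict;
use warnings;

# Hypothetical sketch: print words by descending frequency to a file
# handle, defaulting to STDOUT (ties broken alphabetically).
sub dump_counts {
    my ($freq, $fh) = @_;
    $fh ||= \*STDOUT;
    for my $w (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
        print {$fh} "$w\t$freq->{$w}\n";
    }
}

my %freq;
$freq{ lc $_ }++ for 'pizza ciao pizza pasta Pizza' =~ /(\w+)/g;
dump_counts(\%freq);   # prints pizza 3, then ciao 1, then pasta 1 (tab-separated)
```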

Searching

Both query() and advanced_query() return a reference to a hash with the following structure:

        (
         ignored  => [ string, string, ... ],    # ignored words
         searched => [ string, string, ... ],    # words searched for
         entries  => [ hashref, hashref, ... ],  # list of records found
        )
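The two synopsis calls, query("pizza", "ciao", "-pasta") and advanced_query("(pizza OR ciao) AND NOT pasta"), suggest that plain query() terms are ORed together and that a leading "-" excludes a word. The toy model below builds a hash of the shape above under exactly that assumption. Everything in it is hypothetical: toy_query(), the in-memory index layout (word => { location => count }), the explicit stop-word set feeding 'ignored', and the sum-of-counts score are illustrations, not the module's engine.

```perl
use strict;
use warnings;

# Toy model of query(): plain terms ORed, "-term" excluded (an assumption).
# $index: word => { location => count }; $stop: set of words to ignore.
sub toy_query {
    my ($index, $stop, @terms) = @_;
    my (%score, %excluded, @ignored, @searched);
    for my $t (map { lc } @terms) {
        my $neg = $t =~ s/^-//;                  # strip and remember a leading "-"
        if ($stop->{$t}) { push @ignored, $t; next }
        if ($neg) { $excluded{$_} = 1 for keys %{ $index->{$t} || {} }; next }
        push @searched, $t;
        my $postings = $index->{$t} || {};
        $score{$_} += $postings->{$_} for keys %$postings;   # placeholder score
    }
    my @entries = map { +{ location => $_, score => $score{$_} } }
                  sort { $score{$b} <=> $score{$a} || $a cmp $b }
                  grep { !$excluded{$_} } keys %score;
    return { ignored => \@ignored, searched => \@searched, entries => \@entries };
}

my %index = (
    pizza => { 'a.txt' => 2, 'b.txt' => 1 },
    ciao  => { 'c.txt' => 1 },
    pasta => { 'b.txt' => 3 },
);
my $r = toy_query(\%index, { the => 1 }, qw(pizza ciao -pasta the));
```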

The 'entries' element is a reference to an array of hashes, each having the following structure:

        (
         location => string,  # file path or URL or anything
         score    => number,  # score 
         title    => string   # HTML title               
        )
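Putting the two structures together, a caller typically prints the searched/ignored words and then one line per entry. The sketch below does that for a hand-written sample hash; format_result() is a hypothetical helper, the sample data is invented, and sorting by score inside the helper is a defensive choice rather than something the module is documented to require.

```perl
use strict;
use warnings;

# Hypothetical helper: render a query()/advanced_query() result as text.
sub format_result {
    my ($r) = @_;
    my $out = "searched: @{ $r->{searched} }\n";
    $out .= "ignored: @{ $r->{ignored} }\n" if @{ $r->{ignored} };
    for my $e (sort { $b->{score} <=> $a->{score} } @{ $r->{entries} }) {
        my $title = defined $e->{title} ? $e->{title} : '(untitled)';
        $out .= sprintf "%6.2f  %s  [%s]\n", $e->{score}, $title, $e->{location};
    }
    return $out;
}

# Invented sample data with the documented shape.
my $sample = {
    ignored  => ['the'],
    searched => ['pizza'],
    entries  => [
        { location => '/usr/doc/b.txt',             score => 1,   title => undef },
        { location => 'http://localhost/menu.html', score => 2.5, title => 'Menu' },
    ],
};
print format_result($sample);
# With a real index:  print format_result($srch->query("pizza"));
```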

NOTES

Note on implementation: the indexing technique is substantially derived from the one described by Tim Kientzle in Dr. Dobb's magazine.

BUGS

Many, I guess.

AUTHOR

Max Muzi <maxim@comm2000.it>
