Michael De La Rue


WWW::Link::Selector - link selection functions.


    use MLDBM qw(DB_File);
    use CDB_File::BiIndex;
    use WWW::Link::Selector;
    use WWW::Link::Reporter;
    use Fcntl;   # provides O_RDONLY, used in the tie below

    #generate a function which uses lists of regexps to include or
    #exclude links
    $::include=WWW::Link::Selector::gen_include_exclude @::exclude, @::include;

    $::index = new CDB_File::BiIndex $::page_index, $::link_index;
    $::linkdbm = tie %::links, "MLDBM", $::links, O_RDONLY, 0666, $::DB_HASH
      or die $!;
    $::reporter=new WWW::Link::Reporter::HTML \*STDOUT, $::index;

    #generate a function which will use all 
    $::selectfunc = WWW::Link::Selector::generate_select_func
      ( \%::links, $::reporter, $::include, $::index, );

    #report on all selected links


This is a package (not a class) which builds functions that select the links to report to a user. So far there are two ways of doing this: scanning the entire database, or using an index to look up the links.


This function creates a URL function which acts on each of the links for the URLs given in its arguments. If any argument contains spaces, it is split into separate URLs at those spaces.
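The argument splitting described above can be sketched as follows. This is only an illustration: split_url_args is a hypothetical helper, not a function of WWW::Link::Selector, and the URLs are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Link::Selector): expand each
# argument into one or more URLs by splitting it on whitespace.
sub split_url_args {
    my @args = @_;
    return map { split ' ', $_ } @args;
}

my @urls = split_url_args(
    'http://example.com/a http://example.com/b',
    'http://example.com/c',
);
print "$_\n" for @urls;
```

Using split with a single-space string also drops leading whitespace, so an argument with stray spaces does not produce an empty URL.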

generate_select_func generates a selector function which works in one of two modes.

In the first, no index is given and the generated function iterates through all of the links in the database.

In the second, an index is given and the generated function iterates through the index, working on each url.

For each url it finds, it calls the given link reporter if the include_func returns true for that url.

This function returns a function which iterates through all of the links found in the index, calling $reporter->examine() for each link.

In this select function, the include_func is a function which is called on each page url in our own pages to decide whether or not to report the link.
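The two modes can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: plain hashes stand in for the link database and the index, Sketch::Reporter is a stand-in for a real reporter, and sketch_select_func is a hypothetical name. It shows one reading of the description above, not the module's actual implementation.

```perl
use strict;
use warnings;

# Stand-in reporter: just records whatever it is asked to examine.
package Sketch::Reporter;
sub new     { my $class = shift; return bless { seen => [] }, $class }
sub examine { my ($self, $link) = @_; push @{ $self->{seen} }, $link; return }

package main;

# Hypothetical sketch. $links maps link URL => link record; $index, when
# given, maps page URL => array ref of the link URLs found on that page.
sub sketch_select_func {
    my ($links, $reporter, $include_func, $index) = @_;

    # Mode 1: no index, so walk every link in the database.
    unless (defined $index) {
        return sub {
            for my $url (sort keys %$links) {
                $reporter->examine($links->{$url}) if $include_func->($url);
            }
        };
    }

    # Mode 2: walk the index page by page. The include function is
    # applied to the page URL; links on accepted pages are reported.
    return sub {
        for my $page (sort keys %$index) {
            next unless $include_func->($page);
            for my $url (@{ $index->{$page} }) {
                next unless exists $links->{$url};
                $reporter->examine($links->{$url});
            }
        }
    };
}

my %links = (
    'http://example.org/ok'     => 'ok link',
    'http://example.org/broken' => 'broken link',
);
my %index = (
    'http://example.com/mine/a.html'  => ['http://example.org/ok'],
    'http://example.com/other/b.html' => ['http://example.org/broken'],
);
my $reporter = Sketch::Reporter->new;
my $select   = sketch_select_func(
    \%links, $reporter, sub { $_[0] =~ m{/mine/} }, \%index,
);
$select->();
print "$_\n" for @{ $reporter->{seen} };
```

Returning a closure keeps the database, reporter and include function bound together, so the caller only has to invoke the generated function.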

gen_include_exclude (@exclude, @include)

This function generates a function which returns false if any of the regexps in the exclude listref match, and otherwise returns false unless one of the regexps in the include listref matches.

If the first list is empty then all links matching the include list will be accepted.

If the second list is empty, then all links not matching the exclude list will be accepted.

The function generated can be used by generate_select_func (see above).
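The accept/reject rules above can be sketched like this. It is a minimal illustration under stated assumptions: sketch_include_exclude is a hypothetical name, it takes two array references of regexp strings, and the patterns and URLs are invented. The real gen_include_exclude may differ in how it receives its lists.

```perl
use strict;
use warnings;

# Hypothetical sketch: build an accept/reject closure from two array
# references of regexp strings (exclude list first, include list second).
sub sketch_include_exclude {
    my ($exclude, $include) = @_;
    my @exclude_re = map { qr/$_/ } @$exclude;
    my @include_re = map { qr/$_/ } @$include;

    return sub {
        my ($url) = @_;
        # Any exclude match rejects the URL outright.
        for my $re (@exclude_re) { return 0 if $url =~ $re }
        # An empty include list accepts everything not excluded.
        return 1 unless @include_re;
        # Otherwise at least one include pattern must match.
        for my $re (@include_re) { return 1 if $url =~ $re }
        return 0;
    };
}

my $accept = sketch_include_exclude(
    [ 'private' ],
    [ '^http://example\.com/' ],
);
for my $url ('http://example.com/index.html',
             'http://example.com/private/x.html',
             'http://elsewhere.org/') {
    print $url, ': ', ($accept->($url) ? 'accept' : 'reject'), "\n";
}
```

Compiling the patterns once with qr// when the closure is built avoids recompiling them for every URL tested.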