NAME

MyCPAN::Indexer - Index a Perl distribution

SYNOPSIS

use MyCPAN::Indexer;

DESCRIPTION

get_indexer()

A stand-in for run_components later on.

run( DISTS )

Takes a list of distributions and indexes them.

examine_dist

Given a distribution, unpack it, look at it, and report the findings. It does everything except the looking right now, so it merely croaks. Most of this needs to move out of run and into this method.

examine_dist_steps

Return a list of 3-element anonymous arrays that tell examine_dist what to do. The elements of each anonymous array are:

1) the method to call (must be in indexing class or its parent classes)
2) a text description of the method
3) if a failure in that step should stop the exam: true or false
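
As a rough illustration of how a step table like this can drive an exam, here is a minimal sketch. The class My::Sketch::Indexer, its stub methods, and the log array are hypothetical and are not taken from MyCPAN::Indexer; only the three-element layout comes from the description above.

```perl
use strict;
use warnings;

package My::Sketch::Indexer;   # hypothetical class, for illustration only

sub new { my $class = shift; return bless { log => [] }, $class }

# Each step: [ method name, description, stop the exam on failure? ]
sub examine_dist_steps {
    return (
        [ 'unpack_dist',  'Unpack the dist',  1 ],
        [ 'find_tests',   'Find the tests',   0 ],
        [ 'find_modules', 'Find the modules', 1 ],
        [ 'report',       'Write the report', 0 ],
    );
}

# Stub steps with canned results.
sub unpack_dist  { return 1 }
sub find_tests   { return 0 }   # fails, but is not fatal
sub find_modules { return 0 }   # fails, and is fatal
sub report       { return 1 }   # not reached in this run

# Walk the table in order and stop when a fatal step returns false.
sub examine_dist {
    my( $self ) = @_;
    foreach my $step ( $self->examine_dist_steps ) {
        my( $method, $description, $fatal ) = @$step;
        my $ok = $self->can( $method ) ? $self->$method() : 0;
        push @{ $self->{log} }, sprintf '%s: %s', $description, $ok ? 'ok' : 'failed';
        return 0 if ! $ok && $fatal;
    }
    return 1;
}

package main;

my $indexer = My::Sketch::Indexer->new;
my $result  = $indexer->examine_dist;
print "$_\n" for @{ $indexer->{log} };
print "exam completed: ", ( $result ? 'yes' : 'no' ), "\n";
```

Keeping the steps in a table lets a subclass reorder, drop, or add steps without touching the loop that runs them.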

clear_run_info

Clear anything recorded about the run.

setup_run_info( DISTPATH )

Given a distribution path, record various data about it, such as its size, mtime, and so on.

Sets these items in dist_info:

dist_file
dist_size
dist_basename
dist_author

set_run_info( KEY, VALUE )

Set something to record about the run. This should only be information specific to the run. See set_dist_info to record dist info.

run_info( KEY )

Fetch some run info.

clear_dist_info

Clear anything recorded about the distribution.

setup_dist_info( DISTPATH )

Given a distribution path, record various data about it, such as its size, mtime, and so on.

Sets these items in dist_info:

dist_file
dist_size
dist_basename
dist_author

check_dist_size

Some indexers might want to stop if the dist size is 0 (or some other value). In particular, you can't unpack zero byte dists, so if you are expecting to look at the dist files, a 0 sized dist is a problem.
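
A size check of this kind might look like the sketch below. The helper name dist_size_ok and the configurable minimum are assumptions for illustration; the module's own method may decide differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true when the file has at least $min_bytes bytes.
# The default minimum of 1 rejects zero-byte dists.
sub dist_size_ok {
    my( $path, $min_bytes ) = @_;
    $min_bytes = 1 unless defined $min_bytes;
    my $size = ( -s $path ) || 0;   # missing and empty files both count as 0
    return $size >= $min_bytes ? 1 : 0;
}

# Demo with an empty temporary file.
my( $fh, $empty ) = tempfile( UNLINK => 1 );
close $fh;
print dist_size_ok( $empty ) ? "index it\n" : "skip it\n";
```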

set_dist_info( KEY, VALUE )

Set something to record about the distribution. This should only be information specific to the distribution. See set_run_info to record run info.

dist_info( KEY )

Fetch some distribution info.
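
The split between run info and dist info can be pictured with the sketch below. The class My::Sketch::Info and its hash layout are hypothetical; only the method names come from this documentation.

```perl
use strict;
use warnings;

package My::Sketch::Info;   # hypothetical class, not MyCPAN::Indexer itself

sub new {
    my $class = shift;
    return bless { run_info => {}, dist_info => {} }, $class;
}

# Facts about the whole run live in one hash ...
sub set_run_info { my( $self, $key, $value ) = @_; $self->{run_info}{$key} = $value }
sub run_info     { my( $self, $key ) = @_; return $self->{run_info}{$key} }

# ... and facts about the current distribution live in another, so
# they can be cleared between dists without losing the run data.
sub set_dist_info   { my( $self, $key, $value ) = @_; $self->{dist_info}{$key} = $value }
sub dist_info       { my( $self, $key ) = @_; return $self->{dist_info}{$key} }
sub clear_dist_info { my( $self ) = @_; $self->{dist_info} = {}; return 1 }

package main;

my $info = My::Sketch::Info->new;
$info->set_run_info(  'pid',       $$ );
$info->set_dist_info( 'dist_file', 'Foo-1.0.tar.gz' );
$info->clear_dist_info;

printf "pid kept: %s, dist_file kept: %s\n",
    ( defined $info->run_info( 'pid' )        ? 'yes' : 'no' ),
    ( defined $info->dist_info( 'dist_file' ) ? 'yes' : 'no' );
```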

unpack_dist( DISTPATH )

Given a distribution path, this determines the archive type, unpacks it into a temporary directory, and records what it did.

Sets these items in dist_info:

dist_archive_type
dist_extract_path

Sets these items in run_info, when appropriate:

unpack_dist_archive_zip_error
extraction_error

This method returns false if any of these steps fail:

  • The distribution file is not there

  • The distribution file does not uncompress

  • The archive does not unpack

  • The archive unpacks, but there are no files in the extraction directory
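
The failure points listed above can be sketched for the tar case as follows. This is an illustration only: the helper unpack_tarball is hypothetical, it uses Archive::Tar rather than whatever the module uses internally, and it does not handle zip files or record anything in dist_info or run_info.

```perl
use strict;
use warnings;
use Archive::Tar;
use Cwd qw(getcwd);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper for tar archives only. Returns the extraction
# directory, or nothing when any step fails.
sub unpack_tarball {
    my( $dist_path ) = @_;
    return unless -e $dist_path;              # the file is not there

    my $tar = Archive::Tar->new;
    return unless $tar->read( $dist_path );   # it does not uncompress or parse

    my $dir  = tempdir( CLEANUP => 1 );
    my $orig = getcwd();
    chdir $dir or return;
    my @extracted = $tar->extract;            # extracts into the current directory
    chdir $orig or die "Could not return to $orig: $!";
    return unless @extracted;                 # the archive does not unpack

    opendir my $dh, $dir or return;
    my @entries = grep { ! /\A\.\.?\z/ } readdir $dh;
    closedir $dh;
    return unless @entries;                   # nothing landed in the directory

    return $dir;
}

# Demo: build a tiny archive, then unpack it.
my $work    = tempdir( CLEANUP => 1 );
my $archive = File::Spec->catfile( $work, 'Foo-1.0.tar' );
my $maker   = Archive::Tar->new;
$maker->add_data( 'Foo-1.0/lib/Foo.pm', "package Foo;\n1;\n" );
$maker->write( $archive );

my $extract_path = unpack_tarball( $archive );
print defined $extract_path ? "unpacked into $extract_path\n" : "unpack failed\n";
```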

get_unpack_dir

Get a directory where you can unpack the archive.

Sets these items in dist_info:

unpack_dir

find_dist_dir

Looks at dist_info's unpack_dir and guesses where the module distribution is. This accounts for odd archiving people may have used, like putting all the good stuff in a subdirectory.

Sets these items in dist_info: dist_dir
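
One way to make such a guess is sketched below. The heuristic (accept a directory that holds Makefile.PL or Build.PL, otherwise descend into a lone subdirectory) and the helper name guess_dist_dir are assumptions for illustration, not the module's actual logic.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical heuristic: a directory with a build file is the dist
# directory; otherwise descend when there is exactly one subdirectory.
sub guess_dist_dir {
    my( $dir ) = @_;
    for my $depth ( 1 .. 3 ) {   # look at most three levels deep
        return $dir
            if grep { -e File::Spec->catfile( $dir, $_ ) } qw( Makefile.PL Build.PL );

        opendir my $dh, $dir or return;
        my @subdirs = grep { -d $_ }
                      map  { File::Spec->catdir( $dir, $_ ) }
                      grep { ! /\A\.\.?\z/ } readdir $dh;
        closedir $dh;

        return unless @subdirs == 1;   # ambiguous or empty: give up
        $dir = $subdirs[0];
    }
    return;
}

my $guess = guess_dist_dir( File::Spec->curdir );
print defined $guess ? "dist dir: $guess\n" : "no guess\n";
```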

get_file_list

Returns as an array reference the list of files in MANIFEST.

Sets these items in dist_info: manifest
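
Reading the file list from a MANIFEST can be sketched with the core ExtUtils::Manifest module. The helper name manifest_files is hypothetical, and the module may read the file differently.

```perl
use strict;
use warnings;
use ExtUtils::Manifest qw(maniread);

# Hypothetical helper: sorted file names from a MANIFEST, or an empty
# list when the file is missing.
sub manifest_files {
    my( $manifest ) = @_;
    return [] unless -e $manifest;
    my $files = maniread( $manifest );   # hash ref: file name => comment
    return [ sort keys %$files ];
}

my $list = manifest_files( 'MANIFEST' );
print scalar( @$list ), " files listed\n";
```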

# Without a MANIFEST (or MANIFEST.SKIP), record an empty list and stop.
unless( -e 'MANIFEST' or -e 'MANIFEST.SKIP' ) {
	$logger->error( "No MANIFEST or MANIFEST.SKIP" );
	$_[0]->set_dist_info( 'manifest', [] );

	return;
	}

get_file_info( FILE )

Collect various meta-information about a file and store it in a hash. Returns the hash reference.

get_blib_file_list

Returns as an array reference the list of files in blib. You need to call something like run_build_file first.

Sets these items in dist_info: blib

look_in_lib

Look in the lib/ directory for .pm files.

look_in_cwd

Look for .pm files in the current working directory (and not in sub-directories). This is more common in older Perl modules.

look_in_cwd_and_lib

This is instantly deprecated. It's glue until I can figure out a better solution.

look_in_meta_yml_provides

As an almost-last-ditch effort, decide to believe META.yml if it has a provides entry. There's no reason to trust that the module author has told the truth since he is only interested in advertising the parts he wants you to use.

look_for_pm

This is a last ditch effort to find modules by looking everywhere, starting in the current working directory.
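
A look-everywhere search can be sketched with the core File::Find module. The helper name find_pm_files is hypothetical, and the module's own search may filter or order the results differently.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: every .pm file at or below $top, sorted by path.
sub find_pm_files {
    my( $top ) = @_;
    my @found;
    find(
        {
            no_chdir => 1,   # $_ holds the full path
            wanted   => sub { push @found, $_ if -f $_ && /\.pm\z/ },
        },
        $top,
    );
    return [ sort @found ];
}

# Usage, from an unpacked distribution directory:
#   my $files = find_pm_files( '.' );
```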

parse_meta_files

Parses the META.yml and returns the YAML object.

Sets these items in dist_info: META.yml
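
Parsing META.yml and reading its provides entry can be sketched with the core CPAN::Meta::YAML module. This is an assumption for illustration: the module may use a different YAML parser, and the helper name provides_from_meta is hypothetical.

```perl
use strict;
use warnings;
use CPAN::Meta::YAML;

# Hypothetical helper: module name => file, taken from the provides
# entry of a META.yml string. Returns an empty hash ref when the YAML
# does not parse or has no usable provides entry.
sub provides_from_meta {
    my( $yaml_text ) = @_;
    my $yaml = eval { CPAN::Meta::YAML->read_string( $yaml_text ) } or return +{};
    my $meta = $yaml->[0];
    return +{} unless ref $meta eq 'HASH' && ref $meta->{provides} eq 'HASH';

    my $provides = $meta->{provides};
    my %files = map  { $_ => $provides->{$_}{file} }
                grep { ref $provides->{$_} eq 'HASH' }
                keys %$provides;
    return \%files;
}

my $sample = join "\n",
    '---',
    'name: Foo',
    'provides:',
    '  Foo:',
    '    file: lib/Foo.pm',
    "    version: '1.0'",
    '';
my $files = provides_from_meta( $sample );
print "$_ is in $files->{$_}\n" for sort keys %$files;
```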

find_module_techniques

Returns a list of 2-element anonymous arrays that list method names and string descriptions of the ways that find_modules should look for module files.

If you don't like the techniques, such as run_build_file, you can override this method in a subclass and return a different set of techniques.

find_modules

Find the module files. First, look in blib/. If there are no files in blib/, look in lib/. If there are still none, look in the current working directory.
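
The fall-through order described here, together with a subclass that swaps in its own techniques, can be sketched as follows. The classes, the look_in_blib name, and the canned results are hypothetical; look_in_lib and look_in_cwd only borrow the names documented above.

```perl
use strict;
use warnings;

package My::Sketch::Finder;   # hypothetical class, for illustration only

sub new { my $class = shift; return bless {}, $class }

# Each technique: [ method name, description ]
sub find_module_techniques {
    return (
        [ 'look_in_blib', 'Look in blib/' ],
        [ 'look_in_lib',  'Look in lib/'  ],
        [ 'look_in_cwd',  'Look in the current working directory' ],
    );
}

# Stubs with canned results.
sub look_in_blib { return [] }
sub look_in_lib  { return [ 'lib/Foo.pm' ] }
sub look_in_cwd  { return [ 'Foo.pm' ] }

# Try each technique in order and stop at the first one that finds files.
sub find_modules {
    my( $self ) = @_;
    foreach my $technique ( $self->find_module_techniques ) {
        my( $method, $description ) = @$technique;
        my $files = $self->$method();
        return $files if ref $files eq 'ARRAY' && @$files;
    }
    return [];
}

package My::Sketch::Finder::CwdOnly;
use parent -norequire, 'My::Sketch::Finder';

# A subclass returns its own list of techniques.
sub find_module_techniques {
    return ( [ 'look_in_cwd', 'Look in the current working directory' ] );
}

package main;

print "default:  @{ My::Sketch::Finder->new->find_modules }\n";
print "cwd only: @{ My::Sketch::Finder::CwdOnly->new->find_modules }\n";
```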

find_tests

Find the test files. Look for test.pl or .t files under t/.

run_build_file

This method is one-stop shopping for calls to choose_build_file, setup_build, and run_build.

choose_build_file

Guess what the build file for the distribution is, using Distribution::Guess::BuildSystem.

Sets these items in dist_info:

build_file         - the build file to use
build_system_guess - the Distribution::Guess::BuildSystem object

setup_build

Runs the build setup file (Build.PL, Makefile.PL) to prepare for the build. You need to run choose_build_file first.

Sets these items in dist_info:

build_file_output

run_build

Run the build file (Build, Makefile). Run setup_build first.

Sets these items in dist_info:

build_output

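# Why is this here and how is it different from what I just did?

# Take the first build driver that exists; $runner is undef if neither does.
my( $runner ) = grep { -e } qw( ./Build Makefile );
$logger->debug( "runner is [" . ( defined $runner ? $runner : 'none' ) . "]" );

$_[0]->run_something( $runner, 'build_modules_output' ) if $runner;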

make_meta_file

Run the build file (Build, Makefile) to create the META.yml file. Run setup_build first.

Sets these items in dist_info: build_meta_output make_meta_file_output

run_something( COMMAND, KEY )

Run the shell command and record the output in the dist_info for KEY. This merges the outputs into stdout and closes stdin by redirecting /dev/null into COMMAND.
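
The redirection described here can be sketched as below. The helper run_and_capture, the subshell wrapping, and the plain hash standing in for dist_info are assumptions for illustration, and the sketch expects a POSIX-style shell.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: run a shell command with stderr merged into
# stdout and stdin attached to the null device.
sub run_and_capture {
    my( $command ) = @_;
    my $devnull = File::Spec->devnull;
    my $output  = `( $command ) 2>&1 < $devnull`;
    return $output;
}

my %dist_info;   # stand-in for the object's dist_info store
$dist_info{build_output} = run_and_capture( 'echo to-stdout; echo to-stderr 1>&2' );
print $dist_info{build_output};
```

Closing stdin this way keeps a build script that prompts for input from hanging the indexer while it waits for an answer.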

run_build_target( TARGET )

Run the build target TARGET and record the output in dist_info. As with run_something, this merges the outputs into stdout and closes stdin by redirecting /dev/null into the command.

run_perl_program( PROGRAM, KEY )

Run the Perl program PROGRAM and record the output in the dist_info for KEY. As with run_something, this merges the outputs into stdout and closes stdin by redirecting /dev/null into the command.

get_module_info_tasks

Returns a list of anonymous arrays that tell get_module_info what to do. Each anonymous array holds:

0. method to call
1. description of technique

The default list includes extract_module_namespaces, extract_module_version, and extract_module_dependencies. If you don't like that list, you can prune or expand it in a subclass.

get_module_info( FILE )

Collect meta information and package information about a module file. It starts by calling get_file_info, then adds more to the hash, including the version and package information.

get_test_info( FILE )

Collect meta information and package information about a test file. It starts by calling get_file_info, then adds more to the hash, including the version and package information.

count_lines( FILE )

Counts the lines in a file and categorizes them as code, comment, documentation, or blank.

This returns a hash:

{
total         => ...,
code          => ...,
comment       => ...,
documentation => ...,
blank         => ...,
}
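
A simplified classifier that produces a hash of this shape is sketched below. The rules are assumptions for illustration: it takes source text rather than a file name, treats everything from a line starting with "=word" through "=cut" as documentation, and treats a line whose first non-space character is "#" as a comment. The helper name count_lines_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical classifier over a string of Perl source text.
sub count_lines_sketch {
    my( $source ) = @_;
    my %count  = map { $_ => 0 } qw( total code comment documentation blank );
    my $in_pod = 0;

    foreach my $line ( split /\n/, $source ) {
        $count{total}++;
        if( $in_pod ) {
            # Blank lines inside POD count as documentation.
            $count{documentation}++;
            $in_pod = 0 if $line =~ /\A=cut\b/;
        }
        elsif( $line =~ /\A=[a-zA-Z]/ ) { $in_pod = 1; $count{documentation}++ }
        elsif( $line =~ /\A\s*\z/ )     { $count{blank}++ }
        elsif( $line =~ /\A\s*#/ )      { $count{comment}++ }
        else                            { $count{code}++ }
    }

    return \%count;
}

my $counts = count_lines_sketch( "# a comment\nmy \$x = 1;\n\n=pod\n\nText\n\n=cut\n" );
print "$_: $counts->{$_}\n" for qw( total code comment documentation blank );
```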

file_magic( FILE )

Guesses and returns the MIME type for the file, using File::MMagic if it's available. If that module is not available, it returns nothing.
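
Loading File::MMagic only when it is installed can be sketched as below. The helper name mime_type_or_nothing is hypothetical; checktype_filename is the File::MMagic call used here, which may not be the one the module itself uses.

```perl
use strict;
use warnings;

# Hypothetical helper: a MIME type string, or nothing when
# File::MMagic is not installed.
sub mime_type_or_nothing {
    my( $file ) = @_;
    return unless eval { require File::MMagic; 1 };
    return File::MMagic->new->checktype_filename( $file );
}

my $type = mime_type_or_nothing( $0 );
print defined $type ? "type: $type\n" : "File::MMagic is not available\n";
```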

Utility functions

These functions aren't related to examining a distribution directly.

cleanup

Removes the unpack_dir. You probably don't need this if File::Temp cleans up its own files.

report_dist_info

Write a nice report. This isn't anything useful yet. From your program, take the object and dump it in some way.

get_caller_info

This method is mostly for the $logger->trace method in Log4perl. It figures out which information to report in the log message, accounting for all the levels of magic in between.

get_md5_of_file_contents

getppid

Get the parent process ID. This is a method because I have to do special things for Windows. For Windows, just return -1 for now.
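
The platform check described here can be sketched as a plain function. The name parent_pid is hypothetical, and the sketch tests only for MSWin32.

```perl
use strict;
use warnings;

# Hypothetical helper: -1 on Windows, the built-in getppid elsewhere.
sub parent_pid {
    return -1 if $^O eq 'MSWin32';
    return getppid();
}

print "parent pid: ", parent_pid(), "\n";
```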

TO DO

Count the lines in the files. I think this is currently broken.
Code stats? Lines of code, lines of pod, lines of comments

SOURCE AVAILABILITY

This code is in GitHub:

git://github.com/briandfoy/mycpan-indexer.git

AUTHOR

brian d foy, <bdfoy@cpan.org>

COPYRIGHT AND LICENSE

Copyright © 2008-2018, brian d foy <bdfoy@cpan.org>. All rights reserved.

You may redistribute this under the terms of the Artistic License 2.0.