App::lcpan::Manual::Internals - App::lcpan internals
version 1.065.000
Indexing is done in several steps. The last step (parsing release files) is done in at least 3 passes. We can skip one or more of these passes to save time, if we don't need the information that the passes gather.
First, we parse authors/01mailrc.txt.gz and insert the data into the author table. Some DarkPANs, like those produced by OrePAN, have authors/00whois.xml instead.
Then we parse modules/02packages.details.txt.gz, which is the main meat of the CPAN index. This file links package (module) names to release tarballs. A snippet from the file:
 ...
 Log::ger                0.037  P/PE/PERLANCAR/Log-ger-0.037.tar.gz
 Log::ger::App           0.014  P/PE/PERLANCAR/Log-ger-App-0.014.tar.gz
 Log::ger::DBI::Query    0.001  P/PE/PERLANCAR/Log-ger-DBI-Query-0.001.tar.gz
 Log::ger::Filter        0.037  P/PE/PERLANCAR/Log-ger-0.037.tar.gz
 Log::ger::Filter::Code  0.037  P/PE/PERLANCAR/Log-ger-0.037.tar.gz
 ...
We insert these records into the file table, so that each release file gets a numeric file ID, and into the module table, so that each module gets a numeric module ID as well as a link to its file ID.
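The step above can be sketched roughly as follows. This is a minimal illustration in Python, not lcpan's actual code: the table layouts and the `index_packages` helper are assumptions made for the example, and the sketch expects data lines only (the header block at the top of the real file is assumed to have been stripped already).

```python
import sqlite3

# Data lines copied from the snippet above (header block assumed already removed).
SAMPLE = """\
Log::ger                0.037  P/PE/PERLANCAR/Log-ger-0.037.tar.gz
Log::ger::App           0.014  P/PE/PERLANCAR/Log-ger-App-0.014.tar.gz
Log::ger::Filter        0.037  P/PE/PERLANCAR/Log-ger-0.037.tar.gz
"""

def index_packages(conn, text):
    """Insert package lines into hypothetical `file` and `module` tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS module "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE, version TEXT, "
        "file_id INTEGER REFERENCES file(id))"
    )
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "package version path" line
        name, version, path = fields
        # Several modules can share one release file, so insert it only once.
        conn.execute("INSERT OR IGNORE INTO file (name) VALUES (?)", (path,))
        (file_id,) = conn.execute(
            "SELECT id FROM file WHERE name = ?", (path,)
        ).fetchone()
        conn.execute(
            "INSERT INTO module (name, version, file_id) VALUES (?, ?, ?)",
            (name, version, file_id),
        )

conn = sqlite3.connect(":memory:")
index_packages(conn, SAMPLE)
```

Note that Log::ger and Log::ger::Filter end up pointing at the same file ID, because both are shipped in the same release tarball.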
At this point, we haven't determined distribution names yet, because that requires information from the META.{json,yml} files inside the release files.
Then we start to examine the release files. This is done in several passes, and you have the option to skip some of them. This step is split into multiple passes because pass 2 detects links to scripts in POD, which requires that all known scripts have already been collected (in pass 1). Also, some passes are more high-level, experimental, or optional.
First, we list the contents of each release archive and store the results in the content table. This allows us to check whether a distribution has a distribution metadata file (META.yml or META.json), whether it contains scripts, and so on.
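As an illustration of this listing step, here is a small Python sketch that lists the members of a .tar.gz archive and checks for a metadata file. The helper names and the in-memory sample archive are made up for the example, and the rule "metadata sits directly under the top-level directory" is an assumption, not something taken from lcpan itself.

```python
import io
import tarfile

def list_release_contents(fileobj):
    """Return the paths of regular files in a .tar.gz release archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]

def has_dist_metadata(paths):
    """True if META.json or META.yml sits directly under the top-level dir."""
    for path in paths:
        parts = path.split("/")
        if len(parts) == 2 and parts[1] in ("META.json", "META.yml"):
            return True
    return False

def make_sample_archive():
    """Build a tiny in-memory archive (hypothetical contents) for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in (
            "Foo-Bar-0.001/META.json",
            "Foo-Bar-0.001/lib/Foo/Bar.pm",
            "Foo-Bar-0.001/script/foo",
        ):
            data = b"{}"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

paths = list_release_contents(make_sample_archive())
print(paths, has_dist_metadata(paths))
```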
We populate the script table heuristically, by including content whose name looks like a script, e.g.:

 script/foo
 bin/whatever
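A name-based heuristic of this kind might look like the sketch below. The exact rule lcpan applies is not spelled out here, so the pattern (a file directly inside a top-level bin/ or script/ directory, with paths relative to the distribution root) is only an assumption drawn from the two examples above.

```python
import re

# Assumed rule: a file directly inside a top-level bin/ or script/ directory.
SCRIPT_RE = re.compile(r"^(?:bin|script)/[^/]+$")

def looks_like_script(path):
    """Guess from the path alone whether a file is a script."""
    return SCRIPT_RE.match(path) is not None

def find_scripts(paths):
    """Keep only the paths that look like scripts, preserving order."""
    return [p for p in paths if looks_like_script(p)]

print(find_scripts(["script/foo", "bin/whatever", "lib/Foo.pm", "META.json"]))
```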
We then extract the distribution metadata file (either META.json or META.yml) and store the information it contains in the database. This includes the distribution name (written to the dist table) and the dependency information (written to the dep table).
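To make the shape of this data concrete, the following Python sketch pulls a distribution name and a flat list of dependencies out of a META.json document. It relies on the `name` and `prereqs` keys (phase, then relationship, then module to version) of the CPAN::Meta::Spec version 2 format; the sample document and the `dist_and_deps` helper are invented for illustration, and the row layout is not necessarily what lcpan stores.

```python
import json

# Illustrative META.json content; the values are made up for this example.
SAMPLE_META = """
{
  "name": "Log-ger-App",
  "version": "0.014",
  "prereqs": {
    "runtime": {"requires": {"Log::ger": "0.037", "perl": "5.010001"}},
    "test": {"requires": {"Test::More": "0"}}
  }
}
"""

def dist_and_deps(meta_json):
    """Return (dist name, sorted list of (phase, rel, module, version))."""
    meta = json.loads(meta_json)
    deps = []
    for phase, relationships in meta.get("prereqs", {}).items():
        for rel, modules in relationships.items():
            for module, version in modules.items():
                deps.append((phase, rel, module, str(version)))
    return meta["name"], sorted(deps)

dist_name, deps = dist_and_deps(SAMPLE_META)
print(dist_name)
for row in deps:
    print(row)
```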
At the end of this first pass, we have a pretty useful database already. One of the main uses of lcpan is to provide dependency information. You can skip the other passes if you want.
In the second pass, we extract modules and script files inside each release file into a temporary directory, then parse their POD. This pass usually takes several times the amount of time it takes to complete the first pass. At the time of this writing (2020-04-19) on my computer, the first pass takes about 14 minutes and the second pass takes 72 minutes. A big release file that contains thousands of (mostly autogenerated) module files (yes, they exist; see Paws for example) can take 25 minutes on its own. You might want to skip those files if you do not expect to ever need to deal with the module/distribution; see the lcpan update documentation. For example, in lcpan.conf you can put:
 skip_index_file_patterns = ^Paws-\d
 skip_index_file_patterns = ^Google-Ads-GoogleAds-Client-\d
 skip_index_file_patterns = ^Google-Ads-AdWords-Client-\d
 skip_index_file_patterns = ^eBay-API-\d
 skip_index_file_patterns = ^Microsoft-AdCenter-\d
 skip_index_file_patterns = ^VMOMI-\d
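The effect of such patterns can be sketched as below. This is only an illustration of regex-based skipping: that the patterns are tested against the release file's base name (rather than its full path) is an assumption suggested by the leading `^`, and `should_skip` as well as the sample paths are hypothetical.

```python
import re

# Patterns copied from the configuration example above.
SKIP_PATTERNS = [
    r"^Paws-\d",
    r"^Google-Ads-GoogleAds-Client-\d",
    r"^Google-Ads-AdWords-Client-\d",
    r"^eBay-API-\d",
    r"^Microsoft-AdCenter-\d",
    r"^VMOMI-\d",
]

def should_skip(release_path, patterns=SKIP_PATTERNS):
    """True if the release file's base name matches any skip pattern.

    Assumption: matching is done on the base name, not the full path.
    """
    basename = release_path.rsplit("/", 1)[-1]
    return any(re.search(p, basename) for p in patterns)

# Hypothetical release paths, for demonstration only.
print(should_skip("X/XX/XXAUTHOR/Paws-0.42.tar.gz"))
print(should_skip("P/PE/PERLANCAR/Log-ger-0.037.tar.gz"))
```

The trailing `\d` in each pattern keeps a rule such as `^Paws-\d` from also skipping unrelated distributions whose names merely start with the same prefix.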
By parsing POD, we get module/script abstracts (stored in the module table) and mentions (i.e. a POD that links to another POD, stored in the mention table). The mentions information is mainly useful for knowing how closely related one module is to another (see the lcpan related-mods subcommand).
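As a rough picture of what collecting mentions involves, the Python sketch below extracts link targets from POD `L<...>` codes with a regular expression. This is a simplification and not how lcpan does it: a real POD parser also handles nested formatting codes and multi-bracket delimiters, which this regex ignores, and the `extract_mentions` helper is hypothetical.

```python
import re

# Matches simple L<...> codes without nested angle brackets.
LINK_RE = re.compile(r"L<([^<>]+)>")
# A URL scheme such as "https:"; the lookahead keeps "Foo::Bar" from matching.
URL_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:(?!:)")

def extract_mentions(pod):
    """Return the distinct page names linked from a POD text, in order."""
    mentions = []
    for body in LINK_RE.findall(pod):
        target = body.rsplit("|", 1)[-1]        # drop optional "text|" part
        if URL_RE.match(target):
            continue                            # external URL, not a POD page
        name = target.split("/", 1)[0].strip()  # drop optional "/section"
        if name and name not in mentions:       # empty name: same-page link
            mentions.append(name)
    return mentions

SAMPLE_POD = (
    "See L<Log::ger>, L<the app|Log::ger::App/SYNOPSIS>, "
    "L<https://metacpan.org> and L</DESCRIPTION>."
)
print(extract_mentions(SAMPLE_POD))
```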
In the third pass, we try to extract subroutine names from modules. This requires a source code lexer (lcpan uses Compiler::Lexer). On my computer, this pass takes another 19 minutes. At the time of this writing, this pass is experimental and not enabled by default.
perlancar <perlancar@cpan.org>
This software is copyright (c) 2021, 2020 by perlancar@cpan.org.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.