NAME

CPAN::Search::Lite::Populate - create and populate database tables

DESCRIPTION

This module is responsible for creating the tables (if setup is passed as an option) and then for inserting, updating, or deleting (as appropriate) the relevant information from the indices of CPAN::Search::Lite::Info and CPAN::Search::Lite::PPM and the state information from CPAN::Search::Lite::State. It does this through the insert, update, and delete methods associated with each table.

Note that the tables are created only when the setup argument is passed to the new method when creating the CPAN::Search::Lite::Index object; in that case, any existing tables are dropped.
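
Whether a row is inserted, updated, or deleted can be pictured as a comparison between what the CPAN indices currently report and what the database already holds. The following is a minimal sketch under that assumption only, not this module's actual code. The helper plan_changes and its name-to-version hashes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compare the versions reported by the current CPAN
# indices ($index) with those already stored in the database ($db).
# Both arguments map a name (e.g. a distribution name) to its version.
sub plan_changes {
    my ($index, $db) = @_;
    my (@insert, @update, @delete);
    for my $name (sort keys %$index) {
        if (!exists $db->{$name}) {
            push @insert, $name;                      # new on CPAN: insert
        }
        elsif (($db->{$name} // '') ne ($index->{$name} // '')) {
            push @update, $name;                      # version changed: update
        }
    }
    for my $name (sort keys %$db) {
        push @delete, $name unless exists $index->{$name};   # gone from CPAN: delete
    }
    return { insert => \@insert, update => \@update, delete => \@delete };
}

my $plan = plan_changes(
    { 'Foo-Bar' => '1.01', 'Baz' => '0.02' },         # current index
    { 'Foo-Bar' => '1.00', 'Old-Dist' => '0.10' },    # database contents
);
print "$_: @{ $plan->{$_} }\n" for qw(insert update delete);
```

Versions are compared as plain strings here, which is enough to detect a change; it does not attempt to decide which version is newer.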

TABLES

The tables used are described below.

mods

This table contains module information, and is created as

  mod_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  dist_id SMALLINT UNSIGNED NOT NULL
  mod_name VARCHAR(100) NOT NULL
  mod_abs TINYTEXT
  doc bool
  mod_vers VARCHAR(10)
  dslip CHAR(5)
  chapterid TINYINT(2) UNSIGNED
  PRIMARY KEY (mod_id)
  FULLTEXT (mod_abs)
  KEY (dist_id)
  KEY (mod_name(100))

  • mod_id

    This is the primary (unique) key of the table.

  • dist_id

    This key corresponds to the id of the associated distribution in the dists table.

  • mod_name

    This is the module's name.

  • mod_abs

    This is a description, if available, of the module.

  • doc

    This value, if true, signifies that documentation for the module exists, and is located, eg, in dist_name/Foo/Bar.pm for a module Foo::Bar in the dist_name distribution.

  • src

    This value, if true, signifies that the source code for the module exists, and is located, eg, in dist_name/Foo/Bar.pm for a module Foo::Bar in the dist_name distribution.

  • mod_vers

    This value, if present, gives the version of the module.

  • dslip

    This is a 5-character string expressing the dslip (development, support, language, interface, public license) information.

  • chapterid

    This number corresponds to the chapter id of the module, if present.
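
Two of the columns above lend themselves to a short illustration: the file location implied by doc and src, and the five positions of the dslip string. The sketch below follows the Foo::Bar example given for those columns. The helper names mod_path and split_dslip are hypothetical. The meaning of each individual dslip letter is defined by PAUSE and is not decoded here.

```perl
use strict;
use warnings;

# Build the relative path described for the doc and src columns:
# module Foo::Bar in distribution dist_name -> dist_name/Foo/Bar.pm
sub mod_path {
    my ($dist_name, $mod_name) = @_;
    return join('/', $dist_name, split(/::/, $mod_name)) . '.pm';
}

# Split a 5-character dslip string into its five named positions.
# Returns undef (in scalar context) if the string is missing or malformed.
sub split_dslip {
    my ($dslip) = @_;
    return unless defined $dslip && length($dslip) == 5;
    my @names = qw(development support language interface license);
    my %fields;
    @fields{@names} = split //, $dslip;
    return \%fields;
}

print mod_path('My-Distname', 'Foo::Bar'), "\n";   # My-Distname/Foo/Bar.pm
```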

dists

This table contains distribution information, and is created as

  dist_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  stamp TIMESTAMP(8)
  auth_id SMALLINT UNSIGNED NOT NULL
  dist_name VARCHAR(90) NOT NULL
  dist_file VARCHAR(110) NOT NULL
  dist_vers VARCHAR(20)
  dist_abs TINYTEXT
  size MEDIUMINT UNSIGNED NOT NULL
  birth DATE NOT NULL
  readme bool
  changes bool
  meta bool
  install bool
  PRIMARY KEY (dist_id)
  FULLTEXT (dist_abs)
  KEY (auth_id)
  KEY (dist_name(90))

  • dist_id

    This is the primary (unique) key of the table.

  • stamp

    This is a timestamp indicating when the entry was either inserted or last updated.

  • auth_id

    This corresponds to the auth_id, in the auths table, of the distribution's CPAN author.

  • dist_name

    This corresponds to the distribution name (eg, for My-Distname-0.22.tar.gz, dist_name will be My-Distname).

  • dist_file

    This corresponds to the CPAN file name.

  • dist_vers

    This is the version of the CPAN file (eg, for My-Distname-0.22.tar.gz, dist_vers will be 0.22).

  • dist_abs

    This is a description of the distribution. If not directly supplied, the description for, eg, Foo::Bar, if present, will be used for the Foo-Bar distribution.

  • size

    This corresponds to the size of the distribution, in bytes.

  • birth

    This corresponds to the last modified time of the distribution, in the form YYYY/MM/DD.

  • readme

    This value, if true, indicates that a README file for the distribution is available.

  • changes

    This value, if true, indicates that a Changes file for the distribution is available.

  • meta

    This value, if true, indicates that a META.yml file for the distribution is available.

  • install

    This value, if true, indicates that an INSTALL file for the distribution is available.
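
The relationship between dist_file, dist_name, and dist_vers, and the dist_abs fallback, can be sketched as follows. This is a simplified illustration assuming file names of the form Name-Version plus a common archive suffix; it is not this module's parser, and the helper names parse_dist_file and dist_abstract are hypothetical. For real CPAN file names, the CPAN::DistnameInfo module handles many more edge cases.

```perl
use strict;
use warnings;

# Simplified split of a CPAN file name into dist_name and dist_vers,
# e.g. My-Distname-0.22.tar.gz -> ('My-Distname', '0.22').
sub parse_dist_file {
    my ($file) = @_;
    (my $base = $file) =~ s{\A.*/}{};                    # drop any directory part
    $base =~ s{\.(?:tar\.gz|tgz|tar\.bz2|zip)\z}{};      # drop the archive suffix
    return ($1, $2) if $base =~ /\A(.+)-(v?\d[\w.]*)\z/; # name-version
    return ($base, undef);                               # no recognisable version
}

# dist_abs fallback: if no abstract was supplied for Foo-Bar,
# use the abstract of the module Foo::Bar, if there is one.
sub dist_abstract {
    my ($dist_name, $dist_abs, $mod_abs) = @_;   # $mod_abs: mod_name => abstract
    return $dist_abs if defined $dist_abs && length $dist_abs;
    (my $mod_name = $dist_name) =~ s/-/::/g;
    return $mod_abs->{$mod_name};
}

my ($name, $vers) = parse_dist_file('My-Distname-0.22.tar.gz');
print "$name $vers\n";   # My-Distname 0.22
```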

auths

This table contains CPAN author information, and is created as

  auth_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  cpanid VARCHAR(20) NOT NULL
  fullname VARCHAR(40) NOT NULL
  email TINYTEXT
  PRIMARY KEY (auth_id)
  FULLTEXT (fullname)
  KEY (cpanid(20))

  • auth_id

    This is the primary (unique) key of the table.

  • cpanid

    This gives the CPAN author id.

  • fullname

    This is the full name of the author.

  • email

    This is the supplied email address of the author.

chaps

This table contains chapter information associated with distributions. When registering a module, PAUSE allows one to associate a chapter id with it (see the mods table). This information is used here to associate chapters (and subchapters) with distributions in the following manner. Suppose a distribution Quantum-Theory contains a module Beta::Decay with chapter id 55, and another module Laser with chapter id 87. The Quantum-Theory distribution will then have two entries in this table: one with a chapterid of 55 and a subchapter of Beta, and one with a chapterid of 87 and a subchapter of Laser.

The table is created as follows.

  chap_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  chapterid TINYINT UNSIGNED NOT NULL
  dist_id SMALLINT UNSIGNED NOT NULL
  subchapter TINYTEXT
  PRIMARY KEY (chap_id)
  KEY (dist_id)

  • chap_id

    This is the primary (unique) key of the table.

  • chapterid

    This number corresponds to the chapter id.

  • dist_id

    This is the id corresponding to the distribution in the dists table.

  • subchapter

    This is the subchapter.
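
The Quantum-Theory example above can be expressed as a small sketch that derives the chaps entries for one distribution from its modules, taking the subchapter to be the first component of the module name. Collapsing modules that share both a chapter id and a first component into a single entry is an assumption, and the helper chap_entries is hypothetical.

```perl
use strict;
use warnings;

# Derive the (chapterid, subchapter) pairs for one distribution from its
# modules; the subchapter is the first component of the module name.
sub chap_entries {
    my (%chapter_of) = @_;            # mod_name => chapterid (undef if none)
    my (%seen, @entries);
    for my $mod (sort keys %chapter_of) {
        my $chapterid = $chapter_of{$mod};
        next unless defined $chapterid;               # module has no chapter
        my ($subchapter) = split /::/, $mod;          # Beta::Decay -> Beta
        next if $seen{"$chapterid:$subchapter"}++;    # one entry per pair
        push @entries, [ $chapterid, $subchapter ];
    }
    return @entries;
}

my @rows = chap_entries('Beta::Decay' => 55, 'Laser' => 87);
printf "chapterid %d, subchapter %s\n", @$_ for @rows;
```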

reqs

This table lists the prerequisites of the distribution, as found in the META.yml file (if supplied - note that only relatively recent versions of ExtUtils::MakeMaker or Module::Build generate this file when making a distribution). The table is created as

  req_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  dist_id SMALLINT UNSIGNED NOT NULL
  mod_id SMALLINT UNSIGNED NOT NULL
  req_vers VARCHAR(10)
  PRIMARY KEY (req_id)
  KEY (dist_id)

  • req_id

    This is the primary (unique) key of the table.

  • dist_id

    This corresponds to the id of the distribution in the dists table.

  • mod_id

    This corresponds to the id of the prerequisite module in the mods table.

  • req_vers

    This is the version of the prerequisite module, if specified.
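
As a sketch of how the prerequisites of one distribution could become reqs rows, the following assumes the requires section of META.yml has already been parsed into a hash of module name to version, and that module names are looked up in the mods table to obtain mod_id. Skipping prerequisites with no mods entry (such as perl itself) is an assumption of this sketch, and the helper req_rows is hypothetical.

```perl
use strict;
use warnings;

# Turn the "requires" section of a distribution's META.yml (already parsed
# into a hash) into rows for the reqs table.
sub req_rows {
    my ($dist_id, $requires, $mod_id_of) = @_;   # $mod_id_of: mod_name => mod_id
    my @rows;
    for my $mod_name (sort keys %$requires) {
        my $mod_id = $mod_id_of->{$mod_name};
        next unless defined $mod_id;             # not in the mods table: skipped
        push @rows, {
            dist_id  => $dist_id,
            mod_id   => $mod_id,
            req_vers => $requires->{$mod_name},
        };
    }
    return \@rows;
}
```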

ppms

This table contains information on Win32 ppm packages available in the repositories specified in $repositories of CPAN::Search::Lite::Util. The table is created as

  ppm_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
  dist_id SMALLINT UNSIGNED NOT NULL
  rep_id TINYINT(2) UNSIGNED NOT NULL
  ppm_vers VARCHAR(20)
  PRIMARY KEY (ppm_id)
  KEY (dist_id)

  • ppm_id

    This is the primary (unique) key of the table.

  • dist_id

    This is the id of the distribution appearing in the dists table.

  • rep_id

    This is the id of the repository appearing in the $repositories data structure.

  • ppm_vers

    This is the version of the ppm package found.

reps

This table contains information on the Win32 ppm repositories specified in $repositories of CPAN::Search::Lite::Util. The table is created as

  rep_id SMALLINT UNSIGNED NOT NULL
  abs TINYTEXT
  browse TINYTEXT
  perl VARCHAR(10)
  alias VARCHAR(20)
  KEY (rep_id)

  • rep_id

    This is the primary (unique) key of the table, and corresponds to the rep_id of the ppms table.

  • abs

    This is a description of the repository.

  • browse

    This is a URL where one can browse the repository.

  • perl

    This specifies the perl version the repository corresponds to.

  • alias

    This specifies a short alias for the repository.
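
The ppms and reps tables are linked through rep_id. The sketch below resolves that link in plain Perl data structures, listing for one distribution which repositories offer a ppm package and at what version. The helper ppm_availability and the alias repA are hypothetical; in practice this would be a join in the database.

```perl
use strict;
use warnings;

# For one distribution, list the ppm version available in each repository,
# resolving ppms.rep_id against the reps table.
sub ppm_availability {
    my ($dist_id, $ppms, $reps) = @_;   # $reps: rep_id => { alias => ..., perl => ... }
    my @found;
    for my $ppm (grep { $_->{dist_id} == $dist_id } @$ppms) {
        my $rep = $reps->{ $ppm->{rep_id} } or next;   # unknown repository: skip
        push @found, sprintf '%s (perl %s): %s',
            $rep->{alias}, $rep->{perl}, $ppm->{ppm_vers};
    }
    return @found;
}

my @avail = ppm_availability(
    3,
    [ { dist_id => 3, rep_id => 1, ppm_vers => '0.22' },
      { dist_id => 4, rep_id => 1, ppm_vers => '1.00' } ],
    { 1 => { alias => 'repA', perl => '5.8' } },
);
print "$_\n" for @avail;   # repA (perl 5.8): 0.22
```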

chapters

This contains information on the chapters. The table is created as

  chapterid SMALLINT UNSIGNED NOT NULL
  chap_link TINYTEXT
  KEY (chapterid)

  • chapterid

    This is the primary (unique) key of the table, and corresponds to the chapterid of the mods and chaps tables.

  • chap_link

    This is a description of the chapter that chapterid corresponds to (eg, File_Handle_Input_Output).

CATEGORIES

When uploading a module to PAUSE, there is an option to assign it to one of 24 broad categories. However, many modules have not been assigned such a category, for one reason or another. When populating the tables, the AI::Categorizer module is used to guess a possible category for each module that hasn't been assigned one, using a training set drawn from the modules that have been assigned a category (see AI::Categorizer for general details). If this guess is above a configurable threshold (see CPAN::Search::Lite::Index), the guess is accepted and inserted into the database, and the categories associated with the module's distribution are updated accordingly.
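
The acceptance step can be sketched independently of AI::Categorizer itself. Assuming the classifier's output has been reduced to a score per candidate category, the best-scoring category is kept only if its score exceeds the threshold. The helper accept_guess and the score hash are hypothetical and do not reflect AI::Categorizer's own interface.

```perl
use strict;
use warnings;

# Accept the best-scoring guessed category only if its score exceeds the
# configured threshold; otherwise return undef (module stays uncategorised).
sub accept_guess {
    my ($scores, $threshold) = @_;   # category => score
    my ($best) = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %$scores;
    return unless defined $best && $scores->{$best} > $threshold;
    return $best;
}
```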

SEE ALSO

CPAN::Search::Lite::Index