DocSet::Config - A superclass that handles an object's configuration and data


  use DocSet::Config ();
  my $mime = $self->ext2mime($ext);
  my $class = $self->conv_class($src_mime, $dst_mime);


  my @files = $self->files_to_copy(files_to_copy);
  my @files = $self->expand_dir();
  $self->set($key => $val);
  $self->set_dir($dir_name => $val);
  $val = $self->get($key);

  #XXX my @docsets = $self->docsets();
  #XXX my @links = $self->links();
  #XXX my @chapters = $self->src_chapters();
  my @chapters = $self->trg_chapters();

  my $sitemap = $self->sitemap();
  my $cache = $self->cache(); 
  $package = $self->path2package($path);
  my @objects = $self->stored_objects();


This object lies at the base of the DocSet class and provides configuration and internal data storage/retrieval methods.

At the end of this document the generic configuration file is explained.


META: to be completed (see SYNOPSIS meanwhile)

  • ext2mime

  • conv_class

  • read_config

  • merge_config

  • options

  • files_to_copy

  • expand_dir

  • set

  • set_dir

  • get

  • get_file

  • get_dir

  • docsets

  • links

  • src_chapters

  • trg_chapters

  • cache

  • path2package

  • object_store

  • stored_objects


Each DocSet has its own configuration file.


Currently the configuration file is a simple Perl script that is expected to declare an array @c with all the docset properties in it. Later on more configuration formats will be supported.

We use the @c array because some of the configuration attributes may be repeated, so the hash datatype is not suitable here. Otherwise this array looks exactly like a hash:

  key1 => val1,
  key2 => val2,
  keyN => valN

Of course you can declare any other Perl variables and do whatever you want, but after the config file is run, it should have @c set.

Don't forget to end the file with 1;.
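
The following is a minimal sketch of why an array is used rather than a hash. The attribute values (ids and file names) are hypothetical, and declaring @c with "our" is an assumption about how the variable is made visible to the caller. Walking @c two elements at a time preserves both repetition and order, whereas assigning the same list to a hash keeps only the last value of a repeated key:

```perl
use strict;
use warnings;

# Hypothetical docset properties; 'chapters' is deliberately repeated.
our @c = (
    id       => 'example',
    title    => 'Example DocSet',
    chapters => 'intro.pod',
    docsets  => 'sub_docset',
    chapters => 'outro.pod',
);

# Walk the array as ordered key/value pairs.
my @pairs;
for (my $i = 0; $i < @c; $i += 2) {
    push @pairs, [ $c[$i], $c[$i + 1] ];
}
my @chapters = map { $_->[1] } grep { $_->[0] eq 'chapters' } @pairs;

# The same list assigned to a hash loses the first 'chapters' entry.
my %as_hash = @c;

print "array keeps: @chapters\n";
print "hash keeps:  $as_hash{chapters}\n";

1;
```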

Declare once attributes

The following attributes must be declared at least in the top-level config.cfg file:

  • dir

         dir => {
                 # the resulting html files directory
                 dst_html   => "dst_html",
                 # the resulting ps and pdf files directory (and special
                 # set of html files used for creating the ps and pdf
                 # versions.)
                 dst_ps     => "dst_ps",
                 # the resulting split version html files directory
                 dst_split_html => "dst_split_html",
                 # location of the templates relative to the root dir
                 # (searched left to right)
                 tmpl       => [qw(tmpl/custom tmpl/std tmpl)],
                 # search path for pods, etc. must put more specific paths first!
                 search_paths => [qw(
                 # what extensions to search for
                 search_exts => [qw(pod pm html)],
  • file

         file => {
                  # the html2ps configuration file
                  html2ps_conf  => "conf/html2ps.conf",

Generally you should specify these only in the top-level config file, and specify them again in sub-level config files only if you want to override things for the sub-docset and its successors.

DocSet must attributes

The following attributes must be declared in every docset configuration:

  • id

    a unique id of the docset. The uniqueness should be preserved across any parallel docsets.

  • stitle

    the short title of the docset, used in the menu and the navigation breadcrumb. If it's not specified the title attribute is used instead.

  • title

    the title of the docset. If it's not specified the stitle attribute is used instead.

  • abstract

    a short abstract
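
The title/stitle fallback and the required attributes can be sketched as follows. This is only an illustration: check_required is a hypothetical helper, not part of DocSet::Config, and the sample attribute values are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of problems found in a docset's
# attributes, filling in the title/stitle fallback described above.
sub check_required {
    my ($attr) = @_;
    my @problems;

    for my $key (qw(id abstract)) {
        push @problems, "missing '$key'"
            unless defined $attr->{$key} && length $attr->{$key};
    }

    # title and stitle fall back to each other, so one of them suffices.
    if (defined $attr->{title} || defined $attr->{stitle}) {
        $attr->{title}  = $attr->{stitle} unless defined $attr->{title};
        $attr->{stitle} = $attr->{title}  unless defined $attr->{stitle};
    }
    else {
        push @problems, "missing both 'title' and 'stitle'";
    }
    return @problems;
}

my %attr = (id => 'guide', title => 'User Guide', abstract => 'How to use it');
my @problems = check_required(\%attr);
print @problems ? "problems: @problems\n" : "ok, stitle is '$attr{stitle}'\n";
```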

DocSet Components

Any DocSet component can be repeated as many times as needed. This lets you mix various types of nodes and still have them ordered the way you want. For example, you can have a chapter followed by a docset, then a few more chapters, and finally a link.

The value of each component can be either a single item or a reference to an array of items.
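
Code that consumes such a value has to accept both forms. A minimal sketch, where as_list is a hypothetical helper name and not a DocSet::Config method:

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a component value into a plain list,
# whether it was given as a single item or as an array reference.
sub as_list {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

my @single = as_list('foo/bar/zed.pod');
my @many   = as_list([qw(start.pod install.pod)]);

print scalar(@single), " item, ", scalar(@many), " items\n";
```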

  • docsets

    A docset can recursively include other docsets. Simply list the directories in which the other docsets (that is, their config.cfg files) can be found.

  • chapters

    Each chapter can be specified as a path to its source document.

  • links

    The docset supports hyperlinks. Each link must be declared as a hash reference with keys: id, link, title and abstract.

    If you want to link to an external resource, start the link with a URI scheme (e.g. http://). This attribute also works for local links, for example if the same generated page should be linked from more than one place, or if some non-parsed object needs to be linked to after it has been copied via the copy_glob attribute in the same or another docset.

  • sitemap

    The sitemap is a special kind of chapter rendered by calling the sitemap template, which usually traverses the caches and builds a nested tree of all documents in the docset and below it. Using this attribute in an inner docset works the same as using it in the outermost docset, except that the tree shows only the documents from the inner docset and below it. DWIM.

    The specification is exactly like that of the links attribute, but there can be only one sitemap entry per config file, so its value is a reference to a hash with the same keys as the links nodes. The example below shows how it is specified. The only thing to think about is the link entry:

      link     => 'sitemap.html',

    which says where the file will be generated relative to the directory config.cfg resides in. So normally you will just use the same entry as the one in the example that follows.

    As mentioned, the autogenerated sitemap is linked together with chapters, docsets and links, at the position where the sitemap attribute appears in the configuration file. If you want to link to the sitemap in a different way, you can always define it in the hidden container, as explained later.

  • changes

      changes => 'Changes.pod',

    The changes attribute accepts a single element, which is the source chapter for the changes file. The only difference from a hidden chapter is that its navigation object can be accessed directly from within the index templates, via:

        changes_id = doc.nav.index_node.extra.changes;
        IF changes_id;
           changes_nav = doc.nav.by_id(changes_id);

    Now changes_nav points to the changes chapter, similar to doc.nav. So for example you can retrieve a link to it as:

    or the title as:


    This element was added as an improvement over including the Changes.pod chapter (or similar) along with all the other chapters. Usually people don't want to see the changes, and a huge changes file can be an unwanted burden when the docset pdf is created. If this attribute is used, the pdf for the docset won't include this file.

This is an example:

     docsets =>  ['docs', 'cool_docset'],
     chapters => [
     docsets => [
     chapters => 'foo/bar/zed.pod',
     changes => 'Changes.pod',
     links => [
          id       => 'asf',
          link     => '',
          title    => 'The ASF Projects',
          abstract => "There are many other ASF Projects",
     sitemap => {
         id       => 'sitemap',
         link     => 'sitemap.html',
         title    => "The Site Map",
         abstract => "You can reach any document on our site from this sitemap",

Since normally books consist of parts which group chapters by a common theme, we support this feature as well. So the index can now be generated as:

  part I: Installation
  * Starting
  * Installing

  part II: Troubleshooting
  * Debugging
  * Errors
  * ASF
  * Offline Help

This happens only if the feature is used; otherwise a plain flat toc is generated. To enable it, simply splice the nodes with declarations of new groups using the group attribute:

  group => 'Installation',
  chapters => [qw(start.pod install.pod)],

  group => 'Troubleshooting',
  chapters => [qw(debug.pod errors.pod)],
  links    => [
          id       => 'asf',
          link     => '',
          title    => 'The ASF Projects',
          abstract => "There are many other ASF Projects",
  chapters => ['offline_help.pod'],
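
One way to picture how group declarations partition the nodes is the sketch below. It is an illustration only, not DocSet's actual implementation: the local @c holds a simplified, hypothetical configuration (the links node is left out), and the walk assumes that every group owns the nodes that follow it until the next group.

```perl
use strict;
use warnings;

# Hypothetical configuration, completed with illustrative values.
my @c = (
    group    => 'Installation',
    chapters => [qw(start.pod install.pod)],

    group    => 'Troubleshooting',
    chapters => [qw(debug.pod errors.pod)],
    chapters => ['offline_help.pod'],
);

# Walk the pairs in order; every 'group' starts a new part, and the
# nodes that follow belong to it until the next 'group'.
my @parts;
for (my $i = 0; $i < @c; $i += 2) {
    my ($key, $val) = @c[$i, $i + 1];
    if ($key eq 'group') {
        push @parts, { title => $val, nodes => [] };
        next;
    }
    # Nodes that appear before any group go into an untitled part.
    push @parts, { title => '', nodes => [] } unless @parts;
    push @{ $parts[-1]{nodes} }, ref($val) eq 'ARRAY' ? @$val : $val;
}

for my $part (@parts) {
    print "part: $part->{title}\n";
    print "  * $_\n" for @{ $part->{nodes} };
}
```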

Hidden Objects

docsets and chapters can be marked as hidden. This means that they will be normally processed but won't be linked from anywhere.

Since hidden objects cannot belong to any group and it doesn't matter where they are listed in the config file, you simply put one or more docsets and chapters into a special hidden attribute, which can be repeated many times, just like most of the attributes.

For example:

  chapters => [qw(start.pod install.pod)],
  hidden => {
      chapters => ['offline_help.pod'],
      docsets  => ['hidden_docset'],

The cool thing is that the hidden docsets and chapters will see all the unhidden objects, so those who know the "secret" URL will be able to navigate back to the non-hidden objects transparently.

This feature is useful, for example, for creating pages that users don't normally access. If you want to create a page used by Apache's ErrorDocument handler, mark it hidden, because it shouldn't be linked from anywhere. Once a user hits it (because a non-existing URL has been entered), the user gets a complete page with all the proper navigation widgets (menu, etc.) in it.


Sometimes you want different docsets to be run under different command line options. This is impossible to accomplish from the command line, so options that differ from the defaults can be set inside the config.cfg files. For example, suppose we have a project which includes two docsets: one to be rendered as slides and the other as handouts. Since the slides mode is off by default, all we need to do is to add:

    options => {
        slides_mode => 1,

in the config.cfg file of that docset. Now when the whole project is built without specifying the slides mode on the command line, this docset and its sub-docsets will be built using the slides mode. Sub-docsets can override their parent's setting, in our example by saying:

    options => {
        slides_mode => 0,

Note that the global (command line) options and the local (docset specific) options are merged using the OR operator, meaning that if either of the two, or both, set an option, it's set. Otherwise it's not set. It works this way because the command line options only turn options on; they don't turn them off.

Therefore, in our example, if the slides mode is turned on from the command line, the whole project will be built in the slides mode. So essentially the command line options override the local options.
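
The OR semantics can be sketched as follows. merge_options is a hypothetical helper written for illustration, not the code DocSet uses, and the verbose option is made up:

```perl
use strict;
use warnings;

# Hypothetical helper: an option is on if either the command line or
# the docset's config.cfg turns it on.
sub merge_options {
    my ($global, $local) = @_;
    my %merged;
    for my $name (keys %$global, keys %$local) {
        $merged{$name} = ($global->{$name} || $local->{$name}) ? 1 : 0;
    }
    return \%merged;
}

my $from_cli    = { slides_mode => 0, verbose => 1 };
my $from_config = { slides_mode => 1 };

my $opts = merge_options($from_cli, $from_config);
print "$_ => $opts->{$_}\n" for sort keys %$opts;
```

With these semantics a local slides_mode => 0 cannot switch off an option that was switched on from the command line.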

META: currently the merging happens only in DocSet::Source::POD, other places only check the global command line options. This can be adjusted as needed, without breaking anything. To find out the list of options see %options in bin/docset_build.

Copy unmodified

Usually the generated UI includes images and CSS files, and some files, such as files containing pure code, archives, etc., must be copied without any modification. There are two attributes to handle this:

  • copy_glob

    Accepts a reference to an array of files and directories to copy. The items of the array are run through glob(), so wildcard characters can be used to match only certain files. But be careful, since if you say:


    and there are some hidden files (and dirs) that need to be copied, they won't be copied, since * doesn't match them.

    For example:

         # non-pod/html files or dirs to be copied unmodified
         copy_glob => [

    will copy the file style.css and all the files and directories under the images/ directory into the parallel tree at the destination directory.

  • copy_skip

    While copy_glob allows specifying complete dirs with potentially many nested sub-dirs to be copied, this becomes inconvenient if we want to copy all but a few files in these directories. This is where the copy_skip rule helps. It accepts a reference to an array of regular expressions that are applied to each candidate to be copied, as suggested by the copy_glob attribute. If a regular expression matches, the file won't be copied.

    One of the useful examples would be:

         copy_skip => [
             '(?:^|\/)CVS(?:\/|$)', # skip cvs control files
             '#|~',                 # skip emacs backup files

    META: does copy_skip apply to all sub-docsets, if sub-docsets specify their own copy_glob?

    Make sure to escape / chars.
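
The filtering that copy_skip performs can be sketched as follows. This is an illustration under stated assumptions, not DocSet's implementation: the candidate paths are hypothetical and stand in for whatever copy_glob expands to, while the two patterns are the ones from the example above.

```perl
use strict;
use warnings;

# Patterns taken from the copy_skip example above.
my @copy_skip = (
    '(?:^|\/)CVS(?:\/|$)',   # skip cvs control files
    '#|~',                   # skip emacs backup files
);

# Hypothetical candidate paths, standing in for what copy_glob expands to.
my @candidates = qw(
    style.css
    images/logo.png
    images/CVS/Entries
    notes.txt~
);

# Compile each pattern once, then keep only paths that match none of them.
my @skip_re = map { qr/$_/ } @copy_skip;
my @to_copy = grep {
    my $path    = $_;
    my $skipped = grep { $path =~ $_ } @skip_re;
    !$skipped;
} @candidates;

print "$_\n" for @to_copy;
```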

Extra Features

If you want the index file to include special top and bottom sections in addition to the linked list of the docset contents, you can do:

     body => {
         top => 'index_top.html',
         bot => 'index_bot.html',

Both the top and bot sub-attributes are optional. If these source docs are, for example, in HTML, they have to be written in proper HTML so that the parser is able to extract the body. They can be POD or other formats as well. Only the bodies are taken from these files, so the title and other meta-data are ignored.


Stas Bekman <stas (at)>