
NAME

Web::Sitemap - Simple way to generate sitemap files with paging support

SYNOPSIS

        use Web::Sitemap;

        my $sm = Web::Sitemap->new(
                output_dir => '/path/for/sitemap',

                ### Options ###

                temp_dir    => '/path/to/tmp',
                loc_prefix  => 'http://my_domain.com',
                index_name  => 'sitemap',
                file_prefix => 'sitemap.',

                # default tag used to group URLs into files
                default_tag => 'my_tag',


                # add <mobile:mobile/> inside each <url>, plus the mobile namespace (Google standard)
                mobile      => 1,

                # add the images namespace (Google standard)
                images      => 1,

                # additional namespaces (scalar or array ref) for <urlset>
                namespace   => 'xmlns:some_namespace_name="..."',

                # location prefix for the sitemap files listed in the index (defaults to the loc_prefix value)
                file_loc_prefix  => 'http://my_domain.com',

                # specify data input charset
                charset => 'utf8',

                move_from_temp_action => sub {
                        my ($temp_file_name, $public_file_name) = @_;

                        # ...some action...
                        #
                        # default behavior is
                        # File::Copy::move($temp_file_name, $public_file_name);
                }

        );

        $sm->add(\@url_list);


        # When adding a new portion of URLs, you can specify a tag; URLs with the same tag are written to the same group of files

        $sm->add(\@url_list1, tag => 'articles');
        $sm->add(\@url_list2, tag => 'users');


        # If, while a file is being filled, it would exceed 50,000 URLs or 50MB in size, the file is rotated and a new one is started

        $sm->add(\@url_list3, tag => 'articles');


        # Calling finish() creates the index file, which links to all the files containing URLs

        $sm->finish;

DESCRIPTION

This module is a utility for generating indexed sitemaps.

According to sitemaps.org, each sitemap file can contain at most 50,000 URLs and be at most 50MB in size (after decompression). A site with more URLs than that must split them across several sitemap files and list those files in a sitemap index file.

Web::Sitemap generates a single sitemap index with links to multiple sitemap pages. The pages are split automatically when they reach either limit and are always gzip compressed. Files are first written as temporary files and then moved to the destination directory; this move action can be hooked into to change that behavior.
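The rotation rule can be pictured as a check performed before each URL entry is written. The sketch below is an illustration in core Perl, not Web::Sitemap's actual implementation: the needs_rotation helper and the MAX_URLS and MAX_BYTES constants are hypothetical, and the byte count is taken on the uncompressed XML because the 50MB limit applies after decompression.

```perl
use strict;
use warnings;

# Limits from sitemaps.org: at most 50,000 URLs and 50MB (52,428,800 bytes,
# uncompressed) per sitemap file.
use constant MAX_URLS  => 50_000;
use constant MAX_BYTES => 50 * 1024 * 1024;

# Hypothetical helper: returns 1 if writing one more entry of $entry_bytes
# would push the current file past either limit, 0 otherwise.
sub needs_rotation {
    my ($url_count, $byte_count, $entry_bytes) = @_;
    return 1 if $url_count + 1 > MAX_URLS;
    return 1 if $byte_count + $entry_bytes > MAX_BYTES;
    return 0;
}

# Simulate writing 120,000 entries, starting a new file whenever a limit is hit.
my ($files, $count, $bytes) = (1, 0, 0);
for my $i (1 .. 120_000) {
    my $entry = "<url><loc>http://my_domain.com/page/$i</loc></url>\n";
    my $len   = length $entry;    # ASCII only, so characters == bytes
    if (needs_rotation($count, $bytes, $len)) {
        $files++;
        ($count, $bytes) = (0, 0);
    }
    $count++;
    $bytes += $len;
}
print "files: $files\n";
```

With entries this short the URL count is the limit that triggers first, so 120,000 URLs end up in three files (50,000 + 50,000 + 20,000).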

INTERFACE

Web::Sitemap provides only an OO interface.

Methods

new

        my $sitemap = Web::Sitemap->new(output_dir => $dirname, %options);

Constructs a new Web::Sitemap object that will generate the sitemap.

Files will be put into output_dir. This argument is required.

Other optional arguments include:

  • temp_dir

    Path to a temporary directory. Must already exist and be writable. If not specified, a new temporary directory will be created using File::Temp.

  • loc_prefix

    A location prefix for all the urls in the sitemap, like 'http://my_domain.com'. Defaults to an empty string.

  • index_name

    Name of the sitemap index (basename without the extension). Defaults to 'sitemap'.

  • file_prefix

    Prefix for all sitemap files containing URLs. Defaults to 'sitemap.'.

  • default_tag

    A default tag that will be used for grouping URLs in files when they are added without an explicit tag. Defaults to 'pages'.

  • mobile

    Will add a mobile namespace to the sitemap files, and each URL will contain <mobile:mobile/>. This is a Google standard. Disabled by default.

  • images

    Adds the images namespace to the sitemap files. This is a Google standard. Disabled by default.

  • namespace

    Additional namespaces to be added to the sitemap files. This can be a string or an array reference containing strings. Empty by default.

  • file_loc_prefix

    A prefix that will be put before the file names in the sitemap index. It does not change where the files are written; it only affects the locations listed in the sitemap index. Defaults to the value of loc_prefix.

  • charset

    Encoding to be used for writing the files. Defaults to 'utf8'.

  • move_from_temp_action

    A coderef that changes how the files are handled after successful generation. It is called once for each generated file and is passed two arguments: $temporary_file_path and $destination_file_path.

    By default the files are moved using File::Copy::move.
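As an illustration of the move_from_temp_action hook, the sketch below copies each file instead of moving it, makes the copy world-readable, and then deletes the temporary file. The 0644 mode, the copy-then-unlink strategy and the example.xml.gz file name are assumptions made for the example. To keep it self-contained, the coderef is called directly on a scratch file rather than through Web::Sitemap; in real use it would be passed to the constructor as move_from_temp_action => $move_action.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical replacement for the default File::Copy::move behavior:
# copy the finished file, make it world-readable, then remove the temp file.
my $move_action = sub {
    my ($temp_file_name, $public_file_name) = @_;

    copy($temp_file_name, $public_file_name)
        or die "Cannot copy $temp_file_name to $public_file_name: $!";
    chmod 0644, $public_file_name
        or die "Cannot chmod $public_file_name: $!";
    unlink $temp_file_name
        or warn "Cannot remove $temp_file_name: $!";
};

# Stand-alone demonstration with scratch directories (removed on exit).
my $tmp_dir    = tempdir(CLEANUP => 1);
my $public_dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($tmp_dir,    'example.xml.gz');
my $dst = File::Spec->catfile($public_dir, 'example.xml.gz');

open my $fh, '>', $src or die "Cannot create $src: $!";
print {$fh} "dummy content";
close $fh or die "Cannot close $src: $!";

$move_action->($src, $dst);
```

Copying instead of moving can be useful when the temporary and output directories are on different filesystems or when the output directory needs specific permissions.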

add

        $sitemap->add(\@links, tag => $tagname);

Adds more links to the sitemap under $tagname (can be omitted; defaults to pages or to the default_tag specified in the constructor).

Links can be simple scalars (URL strings) or a hashref. See "new" in Web::Sitemap::Url for a list of possible hashref arguments.

Can be called multiple times.
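Because add can be called repeatedly, a large URL list does not have to be held in memory at once. The sketch below reads URLs line by line and hands them over in fixed-size batches. The add_in_batches helper, the batch size of 1,000 and the callback parameter are illustrative assumptions; in real use the callback would be something like sub { $sitemap->add($_[0], tag => 'articles') }.

```perl
use strict;
use warnings;

# Hypothetical helper: read URLs from a filehandle (one per line) and pass
# them to $on_batch as array references of at most $size elements.
sub add_in_batches {
    my ($fh, $size, $on_batch) = @_;
    my @batch;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;     # skip blank lines
        push @batch, $line;
        if (@batch >= $size) {
            $on_batch->([@batch]);    # pass a copy, then start a new batch
            @batch = ();
        }
    }
    $on_batch->([@batch]) if @batch;  # flush the remainder
    return;
}

# Demonstration with an in-memory filehandle holding 2,500 URLs.
my $data = join '', map { "http://my_domain.com/article/$_\n" } 1 .. 2500;
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";

my @sizes;
add_in_batches($in, 1000, sub { push @sizes, scalar @{ $_[0] } });
close $in;
print "batch sizes: @sizes\n";    # batch sizes: 1000 1000 500
```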

finish

        $sitemap->finish;

Finalizes the sitemap creation and calls the function to move temporary files to the output directory.
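Since the generated files are always gzip compressed, one quick sanity check after finish is to decompress a file and count its <loc> entries and uncompressed size. The sketch uses the core IO::Compress::Gzip and IO::Uncompress::Gunzip modules; the count_locs helper is hypothetical, and the demonstration builds its own small gzipped sitemap in memory instead of reading real Web::Sitemap output.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: decompress a gzipped sitemap (file name or scalar
# reference) and return the number of <loc> elements and the uncompressed size.
sub count_locs {
    my ($input) = @_;
    my $xml;
    gunzip($input => \$xml) or die "gunzip failed: $GunzipError";
    my $count = () = $xml =~ /<loc>/g;
    return ($count, length $xml);
}

# Build a small gzipped sitemap in memory for the demonstration.
my $sample = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
    . qq{<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n}
    . join('', map { "<url><loc>http://my_domain.com/page/$_</loc></url>\n" } 1 .. 3)
    . "</urlset>\n";
my $gz;
gzip(\$sample => \$gz) or die "gzip failed: $GzipError";

my ($locs, $bytes) = count_locs(\$gz);
print "locs: $locs, uncompressed bytes: $bytes\n";
```

Pointed at a real file in output_dir, the returned counts can be compared against the 50,000 URL and 50MB limits.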

EXAMPLES

Support for Google images format

Format 1

        $sitemap->add([{
                loc => 'http://test1.ru/',
                images => {
                        caption_format => sub {
                                my ($iterator_value) = @_;
                                return sprintf('Vasya - foto %d', $iterator_value);
                        },
                        loc_list => [
                                'http://img1.ru/',
                                'http://img2.ru'
                        ]
                }
        }]);

Format 2

        $sitemap->add([{
                loc => 'http://test11.ru/',
                images => {
                        caption_format_simple => 'Vasya - foto',
                        loc_list => ['http://img11.ru/', 'http://img21.ru']
                }
        }]);

Format 3

        $sitemap->add([{
                loc => 'http://test122.ru/',
                images => {
                        loc_list => [
                                { loc => 'http://img122.ru/', caption => 'image #1' },
                                { loc => 'http://img133.ru/', caption => 'image #2' },
                                { loc => 'http://img144.ru/', caption => 'image #3' },
                                { loc => 'http://img222.ru', caption => 'image #4' }
                        ]
                }
        }]);

AUTHOR

Mikhail N Bogdanov <mbogdanov at cpan.org>

CONTRIBUTORS

In no particular order:

Ivan Bessarabov

Bartosz Jarzyna (@brtastic)

LICENSE

This module and all the packages in this module are governed by the same license as Perl itself.