NAME
Web::Sitemap - Simple way to generate sitemap files with paging support
SYNOPSIS
use Web::Sitemap;
my $sm = Web::Sitemap->new(
output_dir => '/path/for/sitemap',
### Options ###
temp_dir => '/path/to/tmp',
loc_prefix => 'http://my_domain.com',
index_name => 'sitemap',
file_prefix => 'sitemap.',
# mark for grouping urls
default_tag => 'my_tag',
# add <mobile:mobile/> inside <url>, and appropriate namespace (Google standard)
mobile => 1,
# add appropriate namespace (Google standard)
images => 1,
# additional namespaces (scalar or array ref) for <urlset>
namespace => 'xmlns:some_namespace_name="..."',
# location prefix for files-parts of the sitemap (default is loc_prefix value)
file_loc_prefix => 'http://my_domain.com',
# specify data input charset
charset => 'utf8',
move_from_temp_action => sub {
my ($temp_file_name, $public_file_name) = @_;
# ...some action...
#
# default behavior is
# File::Copy::move($temp_file_name, $public_file_name);
}
);
$sm->add(\@url_list);
# When adding a batch of URLs, you can specify a tag that labels the file they are written to
$sm->add(\@url_list1, tag => 'articles');
$sm->add(\@url_list2, tag => 'users');
# If, while a file is being filled, the number of URLs exceeds the limit of 50 000 or the file size exceeds 50MB, the file is rotated
$sm->add(\@url_list3, tag => 'articles');
# Calling finish() creates an index file that links to the files containing the URLs
$sm->finish;
DESCRIPTION
This module is a utility for generating indexed sitemaps.
Each sitemap file can have up to 50 000 URLs or be up to 50MB in size (uncompressed) according to sitemaps.org. Any sitemap that exceeds these limits must be split into multiple files referenced from a sitemap index file.
Web::Sitemap generates a single sitemap index with links to multiple sitemap pages. The pages are automatically split when they reach the limit and are always gzip compressed. Files are first created as temporary files and then moved to the destination directory, but the move action can be hooked into to change that behavior.
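The URL-count part of the splitting rule can be illustrated with a short sketch. This is not Web::Sitemap's implementation: split_into_pages and MAX_URLS_PER_FILE are hypothetical names introduced here, and the 50MB size limit is ignored for simplicity.

```perl
use strict;
use warnings;

# Per-file URL limit taken from sitemaps.org (hypothetical constant name)
use constant MAX_URLS_PER_FILE => 50_000;

# Hypothetical helper: split a list of URLs into chunks of at most
# $limit entries. Illustration only; the size limit is not checked.
sub split_into_pages {
    my ($urls, $limit) = @_;
    $limit ||= MAX_URLS_PER_FILE;

    my @remaining = @$urls;    # work on a copy, leave the input untouched
    my @pages;
    push @pages, [ splice(@remaining, 0, $limit) ] while @remaining;

    return \@pages;
}

my @urls  = map { "/page/$_" } 1 .. 120_000;
my $pages = split_into_pages(\@urls);

# prints "120000 URLs -> 3 files"
printf "%d URLs -> %d files\n", scalar @urls, scalar @$pages;
```

With 120 000 URLs the sketch yields two full files and a third one holding the remaining 20 000 entries, all of which would then be listed in the index.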
INTERFACE
Web::Sitemap provides only an object-oriented interface.
Methods
new
my $sitemap = Web::Sitemap->new(output_dir => $dirname, %options);
Constructs a new Web::Sitemap object that will generate the sitemap.
Files will be put into output_dir. This argument is required.
Other optional arguments include:
temp_dir
Path to a temporary directory. Must already exist and be writable. If not specified, a new temporary directory will be created using File::Temp.
loc_prefix
A location prefix for all the urls in the sitemap, like 'http://my_domain.com'. Defaults to an empty string.
index_name
Name of the sitemap index (basename without the extension). Defaults to 'sitemap'.
file_prefix
Prefix for all sitemap files containing URLs. Defaults to 'sitemap.'.
default_tag
A default tag that will be used for grouping URLs in files when they are added without an explicit tag. Defaults to 'pages'.
mobile
Will add a mobile namespace to the sitemap files, and each URL will contain <mobile:mobile/>. This is a Google standard. Disabled by default.
images
Will add images namespace to the sitemap files. This is a Google standard. Disabled by default.
namespace
Additional namespaces to be added to the sitemap files. This can be a string or an array reference containing strings. Empty by default.
file_loc_prefix
A prefix that will be put before the filenames in the sitemap index. This will not cause files to be put in a different directory; it only affects the sitemap index. Defaults to the value of loc_prefix.
charset
Encoding to be used for writing the files. Defaults to 'utf8'.
move_from_temp_action
A coderef that will change how the files are handled after successful generation. Will be called once for each generated file and be passed these arguments: $temporary_file_path, $destination_file_path. By default it will move the files using File::Copy::move.
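As an illustration of move_from_temp_action, the sketch below defines a callback with the two-argument signature described above. The behavior it implements (creating the destination directory, copying, setting permissions, removing the temporary file) is a hypothetical requirement chosen for the example, not something the module does itself; $move_action is a name introduced here.

```perl
use strict;
use warnings;
use File::Basename ();
use File::Copy ();
use File::Path ();

# Hypothetical callback: make sure the destination directory exists,
# copy the file there, make it world-readable, then drop the temp file.
my $move_action = sub {
    my ($temp_file_name, $public_file_name) = @_;

    my $dir = File::Basename::dirname($public_file_name);
    File::Path::make_path($dir) unless -d $dir;

    File::Copy::copy($temp_file_name, $public_file_name)
        or die "Cannot copy $temp_file_name to $public_file_name: $!";

    chmod(0644, $public_file_name)
        or die "Cannot chmod $public_file_name: $!";

    unlink($temp_file_name)
        or die "Cannot remove $temp_file_name: $!";

    return 1;
};

# The coderef would then be passed to the constructor:
#   Web::Sitemap->new(
#       output_dir            => '/path/for/sitemap',
#       move_from_temp_action => $move_action,
#   );
```

Dying on failure is a design choice of this sketch: it makes a broken deployment visible immediately instead of leaving a partially published sitemap.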
add
$sitemap->add(\@links, tag => $tagname);
Adds more links to the sitemap under $tagname (can be omitted, in which case it defaults to 'pages' or the default_tag specified in the constructor).
Each link can be a simple scalar (a URL string) or a hashref. See "new" in Web::Sitemap::Url for a list of possible hashref arguments.
Can be called multiple times.
finish
$sitemap->finish;
Finalizes the sitemap creation and calls the function to move temporary files to the output directory.
EXAMPLES
Support for Google images format
Format 1
$sitemap->add([{
loc => 'http://test1.ru/',
images => {
caption_format => sub {
my ($iterator_value) = @_;
return sprintf('Vasya - foto %d', $iterator_value);
},
loc_list => [
'http://img1.ru/',
'http://img2.ru'
]
}
}]);
Format 2
$sitemap->add([{
loc => 'http://test11.ru/',
images => {
caption_format_simple => 'Vasya - foto',
loc_list => ['http://img11.ru/', 'http://img21.ru']
}
}]);
Format 3
$sitemap->add([{
loc => 'http://test122.ru/',
images => {
loc_list => [
{ loc => 'http://img122.ru/', caption => 'image #1' },
{ loc => 'http://img133.ru/', caption => 'image #2' },
{ loc => 'http://img144.ru/', caption => 'image #3' },
{ loc => 'http://img222.ru', caption => 'image #4' }
]
}
}]);
AUTHOR
Mikhail N Bogdanov <mbogdanov at cpan.org>
CONTRIBUTORS
In no particular order:
Ivan Bessarabov
Bartosz Jarzyna (@brtastic)
LICENSE
This module and all the packages in this module are governed by the same license as Perl itself.