The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

MediaWiki::Bot - a Wikipedia bot framework written in Perl

SYNOPSIS

use MediaWiki::Bot;

my $editor = MediaWiki::Bot->new('Account'); $editor->login('Account', 'password'); $editor->revert('Wikipedia:Sandbox', 'Reverting vandalism', '38484848');

DESCRIPTION

MediaWiki::Bot is a framework that can be used to write Wikipedia bots.

Many of the methods use the MediaWiki API (http://en.wikipedia.org/w/api.php).

AUTHOR

The MediaWiki::Bot team (Alex Rowe, Jmax, Oleg Alexandrov, Dan Collins, Mike.lifeguard) and others.

COPYING

Copyright (C) 2006, 2007 by the MediaWiki::Bot team

This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

METHODS

new($options_hashref)

Calling MediaWiki::Bot->new() will create a new MediaWiki::Bot object.

  • agent sets a custom useragent

  • assert sets a parameter for the AssertEdit extension (commonly 'bot'). Refer to http://mediawiki.org/wiki/Extension:AssertEdit.

  • operator allows the bot to send you a message when it fails an assert, and will be integrated into the default useragent (which may not be used if you set agent yourself). The message will tell you that $useragent is logged out, so use a descriptive one if you set it.

  • maxlag allows you to set the maxlag parameter (default is the recommended 5s). Please refer to the MediaWiki documentation prior to changing this from the default.

  • protocol allows you to specify 'http' or 'https' (default is 'http'). This is commonly used with the domain and path settings below.

  • host sets the domain name of the wiki to connect to.

  • path sets the path to api.php (with no trailing slash).

  • login_data is a hashref of data to pass to login(). See that section for a description.

For example:

    my $bot = MediaWiki::Bot->new(
        useragent   => 'MediaWiki::Bot 3.0.0 (User:Mike.lifeguard)',
        assert      => 'bot',
        protocol    => 'https',
        host        => 'secure.wikimedia.org',
        path        => 'wikipedia/meta/w',
        login_data  => { username => "Mike's bot account", password => "password" },
    );

For backward compatibility, you can specify up to three parameters:

    my $bot = MediaWiki::Bot->new('MediaWiki::Bot 2.3.1 (User:Mike.lifeguard)', $assert, $operator);

This deprecated form will never do auto-login or autoconfiguration.

set_wiki($host[,$path[,$protocol]])

Set what wiki to use. $host is the domain name; $path is the path before api.php (usually 'w'); $protocol is either 'http' or 'https'. For example:

    $bot->set_wiki('de.wikipedia.org', 'w');

will tell it to use http://de.wikipedia.org/w/index.php. The default settings are 'en.wikipedia.org' with a path of 'w'. You can also pass a hashref using keys with the same names as these parameters. To use the secure server:

    $bot->set_wiki(
        protocol    => 'https',
        host        => 'secure.wikimedia.org',
        path        => 'wikipedia/meta/w',
    );

For backward compatibility, you can specify up to two parameters in this deprecated form:

    $bot->set_wiki($host, $path);

login($login_hashref)

Logs the use $username in, optionally using $password. First, an attempt will be made to use cookies to log in. If this fails, an attempt will be made to use the password provided to log in, if any. If the login was successful, returns true; false otherwise.

    $bot->login(
        {
            username => $username,
            password => $password,
        }
    ) or die "Login failed";

Once logged in, attempt to do some simple auto-configuration. At present, this consists of:

  • Warning if the account doesn't have the bot flag, and isn't a sysop account.

  • Setting the use of apihighlimits if the account has that userright.

  • Setting an appropriate default assert.

You can skip this autoconfiguration by passing autoconfig = 0>

For backward compatibility, you can call this as

    $bot->login($username, $password);

This deprecated form will never do autoconfiguration.

set_highlimits($flag)

Tells MediaWiki::Bot to start/stop using APIHighLimits for certain queries.

    $bot->set_highlimits(1);

logout()

The logout procedure deletes the login tokens and other browser cookies.

    $bot->logout();

edit($options_hashref)

Puts text on a page. If provided, use a specified edit summary, mark the edit as minor, as a non-bot edit, or add an assertion. An MD5 hash is sent to guard against data corruption while in transit.

    my $text = $bot->get_text('My page');
    $text .= "\n\n* More text\n";
    $bot->edit({
        page    => 'My page',
        text    => $text,
        summary => 'Adding new content',
    });

You can also call this using the deprecated form:

    $bot->edit($page, $text, $summary, $is_minor, $assert, $markasbot);

get_history($pagename[,$limit])

Returns an array containing the history of the specified page, with $limit number of revisions. The array structure contains 'revid', 'user', 'comment', 'timestamp_date', and 'timestamp_time'.

get_text($pagename,[$revid,$section_number])

Returns an the wikitext of the specified page. If $revid is defined, it will return the text of that revision; if $section_number is defined, it will return the text of that section. A blank page will return wikitext of "" (which evaluates to false in Perl, but is defined); a nonexistent page will return undef (which also evaluates to false in Perl, but is obviously undefined). You can distinguish between blank and nonexistent by using defined():

    my $wikitext = $bot->get_text('Page title');
    print "Wikitext: $wikitext\n" if defined $wikitext;

get_id($pagename)

Returns the id of the specified page. Returns undef if page does not exist.

    my $pageid = $bot->get_id("Main Page");
    croak "Page doesn't exist\n" if !defined($pageid);

get_pages(\@pages)

Returns the text of the specified pages in a hashref. Content of undef means page does not exist. Also handles redirects or article names that use namespace aliases.

    my @pages = ('Page 1', 'Page 2', 'Page 3');
    my $thing = $bot->get_pages(\@pages);
    foreach my $page (keys %$thing) {
        my $text = $thing->{$page};
        print "$text\n" if defined($text);
    }

revert($pagename, $revid[,$summary])

Reverts the specified page to $revid, with an edit summary of $summary. A default edit summary will be used if $summary is omitted.

    $bot->revert('User:Mike.lifeguard', 1972950, 'rvv');

undo($pagename, $revid[,$summary[,$after]])

Reverts the specified $revid, with an edit summary of $summary, using the undo function. To undo all revisions from $revid up to but not including this one, set $after to another revid. If not set, just undo the one revision ($revid).

get_last($page, $user)

Returns the revid of the last revision to $page not made by $user. undef is returned if no result was found, as would be the case if the page is deleted.

    my $revid = $bot->get_last("User:Mike.lifeguard/sandbox", "Mike.lifeguard");
    print "$revid\n" if defined($revid);

update_rc([$limit])

Returns an array containing the Recent Changes to the wiki Main namespace. The array structure contains 'pagename', 'revid', 'oldid', 'timestamp_date', and 'timestamp_time'.

Returns an array containing a list of all pages linking to $page. The array structure contains 'title' and 'redirect' is defined if the title is a redirect. $filter can be one of: all (default), redirects (list only redirects), nonredirects (list only non-redirects). $ns is a namespace number to search (pass an arrayref to search in multiple namespaces). $options is a hashref as described by MediaWiki::API: Set max to limit the number of queries performed. Set hook to a subroutine reference to use a callback hook for incremental processing. Refer to the section on linksearch() for examples.

A typical query:

    my @links = $bot->what_links_here("Meta:Sandbox", undef, 1, {hook=>\&mysub});
    sub mysub{
        my ($res) = @_;
        foreach my $hash (@$res) {
            my $title = $hash->{'title'};
            my $is_redir = $hash->{'redirect'};
            print "Redirect: $title\n" if $is_redir;
            print "Page: $title\n" unless $is_redir;
        }
    }

Transclusions are no longer handled by what_links_here() - use list_transcludes() instead.

list_transclusions($page[,$filter[,$ns[,$options]]])

Returns an array containing a list of all pages transcluding $page. The array structure contains 'title' and 'redirect' is defined if the title is a redirect. $filter can be one of: all (default), redirects (list only redirects), nonredirects (list only non-redirects). $ns is a namespace number to search (pass an arrayref to search in multiple namespaces). $options is a hashref as described by MediaWiki::API: Set max to limit the number of queries performed. Set hook to a subroutine reference to use a callback hook for incremental processing. Refer to the section on linksearch() or what_links_here() for examples.

A typical query:

    $bot->list_transclusions("Template:Tlx", undef, 4, {hook => \&mysub});
    sub mysub{
        my ($res) = @_;
        foreach my $hash (@$res) {
            my $title = $hash->{'title'};
            my $is_redir = $hash->{'redirect'};
            print "Redirect: $title\n" if $is_redir;
            print "Page: $title\n" unless $is_redir;
        }
    }

get_pages_in_category($category_name)

Returns an array containing the names of all pages in the specified category. Does not go into sub-categories.

get_all_pages_in_category($category_name)

Returns an array containing the names of ALL pages in the specified category, including sub-categories.

linksearch($link[,$ns[,$protocol[,$options]]])

Runs a linksearch on the specified link and returns an array containing anonymous hashes with keys 'url' for the outbound URL, and 'title' for the page the link is on. $ns is a namespace number to search (pass an arrayref to search in multiple namespaces). You can search by $protocol (http is default). The optional $options hashref is fully documented in MediaWiki::API.

Set max in $options to get more than one query's worth of results:

    my $options = { max => 10, }; # I only want some results
    my @links = $bot->linksearch("slashdot.org", 1, undef, $options);
    foreach my $hash (@links) {
        my $url = $hash->{'url'};
        my $page = $hash->{'title'};
        print "$page: $url\n";
    }

You can also specify a callback function in $options:

    my $options = { hook => \&mysub, }; # I want to do incremental processing
    $bot->linksearch("slashdot.org", 1, undef, $options);
    sub mysub {
        my ($res) = @_;
        foreach my $hashref (@$res) {
            my $url  = $hashref->{'url'};
            my $page = $hashref->{'title'};
            print "$page: $url\n";
        }
    }

purge_page($pagename)

Purges the server cache of the specified page. Pass an array reference to purge multiple pages. Returns true on success; false on failure. If you really care, a true return value is the number of pages successfully purged. You could check that it is the same as the number you wanted to purge.- maybe some pages don't exist, or you passed invalid titles, or you aren't allowed to purge the cache:

    my @to_purge = ('Main Page', 'A', 'B', 'C', 'Very unlikely to exist');
    my $size = scalar @to_purge;

    print "all-at-once:\n";
    my $success = $bot->purge_page(\@to_purge);

    if ($success == $size) {
        print "@to_purge: OK ($success/$size)\n";
    }
    else {
        my $missed = @to_purge - $success;
        print "We couldn't purge $missed pages (list was: "
            . join(', ', @to_purge)
            . ")\n";
    }

    # OR
    print "\n\none-at-a-time:\n";
    foreach my $page (@to_purge) {
        my $ok = $bot->purge_page($page);
        print "$page: $ok\n";
    }

get_namespace_names

get_namespace_names returns a hash linking the namespace id, such as 1, to its named equivalent, such as "Talk".

Gets a list of pages which include a certain image.

is_blocked($user)

Checks if a user is currently blocked.

test_blocked($user)

Retained for backwards compatibility. Use is_blocked($user) for clarity.

test_image_exists($page)

Checks if an image exists at $page. 0 means no, 1 means yes, local, 2 means on commons, 3 means doesn't exist but there is text on the page.

get_pages_in_namespace($namespace_id,$page_limit)

Returns an array containing the names of all pages in the specified namespace. The $namespace_id must be a number, not a namespace name. Setting $page_limit is optional. If $page_limit is over 500, it will be rounded up to the next multiple of 500.

count_contributions($user)

Uses the API to count $user's contributions.

last_active($user)

Returns the last active time of $user in YYYY-MM-DDTHH:MM:SSZ

recent_edit_to_page($page)

Returns timestamp and username for most recent (top) edit to $page.

get_users($page, $limit, $revision, $direction)

Gets the most recent editors to $page, up to $limit, starting from $revision and goint in $direction.

was_blocked($user)

Returns 1 if $user has ever been blocked.

test_block_hist($user)

Retained for backwards compatibility. Use was_blocked($user) for clarity.

expandtemplates($page[, $text])

Expands templates on $page, using $text if provided, otherwise loading the page text automatically.

get_allusers($limit)

Returns an array of all users. Default limit is 500.

db_to_domain($wiki)

Converts a wiki/database name (enwiki) to the domain name (en.wikipedia.org).

    my @wikis = ("enwiki", "kowiki", "bat-smgwiki", "nonexistent");
    foreach my $wiki (@wikis) {
        my $domain = $bot->db_to_domain($wiki);
        next if !defined($domain);
        print "$wiki: $domain\n";
    }

You can pass an arrayref to do bulk lookup:

    my @wikis = ("enwiki", "kowiki", "bat-smgwiki", "nonexistent");
    my $domains = $bot->db_to_domain(\@wikis);
    foreach my $domain (@$domains) {
        next if !defined($domain);
        print "$domain\n";
    }

domain_to_db($wiki)

As you might expect, does the opposite of domain_to_db(): Converts a domain name into a database/wiki name.

ERROR HANDLING

All functions will return undef in any handled error situation. Further error data is stored in $bot->{error}->{code} and $bot->{error}->{details}.