{ items => \@items, pageno => $pageno, num_pages => $num_pages, nextlink => $nextlink, }
+{ id => $id, url => $url, }
($text, $pageurl, $listre)
Function:
($text, $scrapespec, $scrapepostpro)
($cjar, $html, $real_url, $vars, $varnamechange)
Parses out redirects done with Refresh header.
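As an illustration only, and not the module's own code, a Refresh value usually takes the form "delay; url=target". The sketch below pulls the target out of such a value. The helper name refresh_target and the example URL are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the target URL from a Refresh header value
# such as "0; url=http://example.com/next". Returns undef if no URL is given.
sub refresh_target {
    my ($value) = @_;
    return undef unless defined $value;
    # Delay in seconds, a semicolon, then url=..., optionally quoted.
    my ($url) = $value =~ /^\s*\d+\s*;\s*url\s*=\s*["']?([^"'\s]+)/i;
    return $url;
}

print refresh_target('0; URL=http://example.com/next'), "\n";
```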
Gets web content, iterating through redirects while capturing cookies.
Mail::POP3::Folder::webscrape - class that makes a website look like a POP3 mailbox
    use Mail::POP3;
    my $m = Mail::POP3::Folder::webscrape->new(
        $user_name,
        $password,
        $starturl, # where the first form is found
        $userfieldnames, # listref same order as values supplied in USER
        $otherfields, # hash fieldname => value
        $listre, # field => RE; fields: pageno, num_pages, nextlink, itemurls
        $itemre, # hash extractfield => RE to get it from "page"
        $itempostpro, # extractfield => sub returns pairS of field/value
        $itemurl2id, # sub taking URL, returns unique, persistent item ID
        $itemformat, # takes item hash, returns email message
        $messagesize,
    );
This class makes a website look like a POP3 mailbox in accordance with the requirements of a Mail::POP3 server. It is entirely API-compatible with Mail::POP3::Folder::mbox.
Every virtual e-mail is padded to at least the number of octets given in the last parameter to new ($messagesize; 2000 is recommended). Longer messages should be truncated to that length, but the class currently does not do so.
new
$user_name
The username is interpreted as a ":"-separated string, also "URL-encoded" such that spaces are encoded as "+" characters. The values supplied will be for variables named in the $userfieldnames parameter.
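A minimal sketch of that decoding, not the module's actual code: split the username on ":", turn "+" back into spaces, and pair the values with the field names in order. The helper decode_username and the field names q and city are hypothetical. Decoding %XX escapes is an assumption that the description above does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a POP3 username into CGI field values.
sub decode_username {
    my ($user_name, $userfieldnames) = @_;
    # Split first, so that an encoded ":" (%3A) inside a value survives.
    my @values = split /:/, $user_name, -1;
    for (@values) {
        tr/+/ /;                              # "+" stands for a space
        s/%([0-9A-Fa-f]{2})/chr hex $1/ge;    # assumption: %XX escapes too
    }
    my %fields;
    @fields{@$userfieldnames} = @values;      # same order as the names
    return \%fields;
}

my $fields = decode_username('perl+books:London', [qw(q city)]);
print "$_ => $fields->{$_}\n" for sort keys %$fields;
```

A user would then log in with a POP3 USER value such as perl+books:London.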
$password
The password is ignored.
$starturl
The webpage that contains the initial search form.
$userfieldnames
A reference to a list of the names of CGI variables whose values are supplied by the POP3 user in the username.
$otherfields
Reference to a hash mapping each CGI field name to its value.
$listre
Reference to hash of fieldname mapped to regular expression for finding the relevant value on each search result page. The value is expected to be in $1. These fields must be defined: pageno, num_pages, nextlink, itemurls. The last may (obviously) match more than once.
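For illustration, a $listre for an imagined results page might look like the sketch below. The HTML and the patterns are hypothetical, and the matching loop only shows how such patterns behave. It is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical patterns for an imagined results page; each captures into $1.
my $listre = {
    pageno    => qr/Page (\d+) of \d+/,
    num_pages => qr/Page \d+ of (\d+)/,
    nextlink  => qr/<a class="next" href="([^"]+)"/,
    itemurls  => qr/<a class="item" href="([^"]+)"/,
};

my $text = <<'HTML';
<p>Page 1 of 3</p>
<a class="item" href="/item?id=10">First</a>
<a class="item" href="/item?id=11">Second</a>
<a class="next" href="/search?page=2">Next</a>
HTML

# Single-valued fields: take the first capture.
my %page;
for my $field (qw(pageno num_pages nextlink)) {
    ($page{$field}) = $text =~ $listre->{$field};
}
# itemurls may match many times, so collect every capture with /g.
my @itemurls = $text =~ /$listre->{itemurls}/g;

print "page $page{pageno} of $page{num_pages}, next: $page{nextlink}\n";
print "item: $_\n" for @itemurls;
```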
$itemre
Reference to hash of fieldname mapped to regular expression for finding the relevant value on each item's page (as linked to by an itemurl as found from the above parameter), similar to the above. Any number of fields may be sought, and a hash of the fieldname to the found value will be passed to the item-formatting function below.
$itempostpro
Reference to hash of fieldname mapped to reference to function that is called with the field name and value, and will return a list of one or more pairs of fieldname / value. Typical use might be to remove HTML from a result.
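The sketch below shows how an $itemre and an $itempostpro could fit together on one imagined item page. The patterns, field names and HTML are hypothetical. Having the returned pairs replace the raw field is an assumption about the class, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical item patterns; each captures the wanted value into $1.
my $itemre = {
    title => qr{<h1>(.*?)</h1>}s,
    price => qr{<span class="price">(.*?)</span>}s,
};

# Hypothetical post-processors: called with (fieldname, value),
# each returns a flat list of fieldname/value pairs.
my $itempostpro = {
    title => sub {
        my ($name, $value) = @_;
        $value =~ s/<[^>]+>//g;    # crude tag stripping, enough for a demo
        return ($name => $value);
    },
    price => sub {
        my ($name, $value) = @_;
        my ($currency, $amount) = $value =~ /^(\D+)([\d.]+)$/;
        return (currency => $currency, amount => $amount);    # one field becomes two
    },
};

my $page = '<h1>Red <b>bicycle</b></h1><span class="price">GBP120.00</span>';

my %item;
for my $field (sort keys %$itemre) {
    my ($value) = $page =~ $itemre->{$field};
    next unless defined $value;
    my $post = $itempostpro->{$field};
    # Assumption: the returned pairs stand in for the raw field.
    my %pairs = $post ? $post->($field, $value) : ($field => $value);
    %item = (%item, %pairs);
}
print "$_ => $item{$_}\n" for sort keys %item;
```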
$itemurl2id
Reference to function that is called with each itemurl, and will return a unique, persistent identifier for that item, compatible with an RFC 1939 message ID.
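One possible $itemurl2id, offered as a sketch only, hashes the URL with the core Digest::MD5 module. RFC 1939 requires a unique-id to be 1 to 70 characters in the range 0x21 to 0x7E, which a 32-character hex digest satisfies. The ID is persistent only if the item URL itself is stable (for example, carries no session token). That stability is an assumption about the target site.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical $itemurl2id: hash the URL into 32 hex characters.
my $itemurl2id = sub {
    my ($url) = @_;
    return md5_hex($url);
};

my $id = $itemurl2id->('http://example.com/item?id=10');
print "$id\n";
```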
$itemformat
Reference to function that is called for each item, taking two parameters: a reference to a hash of fieldname / value (as extracted by the "item RE" above), and the unique message-ID (as generated above); and will return the text of an email message describing that item.
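A minimal $itemformat might look like the following sketch. The header values, the example.com addresses and the title field are hypothetical, and the line-ending convention the class expects is not stated here, so plain "\n" is used as an assumption.

```perl
use strict;
use warnings;

# Hypothetical $itemformat: takes (\%item, $message_id), returns message text.
my $itemformat = sub {
    my ($item, $id) = @_;
    my @headers = (
        "From: scraper\@example.com",
        "Subject: $item->{title}",
        "Message-ID: <$id\@example.com>",
        "Content-Type: text/plain; charset=us-ascii",
    );
    # One "field: value" line per extracted field, in a stable order.
    my @body = map { "$_: $item->{$_}" } sort keys %$item;
    # Headers, a blank separator line, then the body.
    return join("\n", @headers, '', @body) . "\n";
};

print $itemformat->({ title => 'Red bicycle', amount => '120.00' }, 'abc123');
```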
$messagesize
The size of each message, in the style of Procrustes. This is so the class can return an accurate(ish) result for the POP3 command STAT knowing only the number of hits there have been, and not having downloaded and formatted every single item to see how large each one is - such an extra step would probably trigger timeouts.
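The arithmetic behind this can be sketched as follows. The helper pad_message, the choice of spaces as padding and the hit count are hypothetical. Only the fixed per-message size comes from the description above.

```perl
use strict;
use warnings;

my $messagesize = 2000;    # the recommended value

# Hypothetical helper: pad a formatted message up to $size octets.
# Like the class as documented, it does not truncate longer messages.
sub pad_message {
    my ($message, $size) = @_;
    my $short = $size - length $message;
    $message .= ' ' x $short if $short > 0;    # assumption: pad with spaces
    return $message;
}

# STAT can then be answered from the hit count alone.
my $hits = 37;
my $total_octets = $hits * $messagesize;
print "+OK $hits $total_octets\n";    # prints "+OK 37 74000"

# A short ASCII message comes out at exactly $messagesize characters.
print length(pad_message("Subject: test\n\nbody\n", $messagesize)), "\n";
```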
The scripts subdirectory of the distribution includes a script, webscrape, that can be used to test and develop a working configuration for this class.
No extra methods are defined.
RFC 1939, Mail::POP3::Folder::mbox.
To install Mail::POP3, copy and paste the appropriate command into your terminal.
cpanm
    cpanm Mail::POP3
CPAN shell
    perl -MCPAN -e shell
    install Mail::POP3