Mail::POP3::Folder::webscrape - class that makes a website look like a POP3 mailbox
    use Mail::POP3;
    my $m = Mail::POP3::Folder::webscrape->new(
        $user_name,
        $password,
        $starturl,       # where the first form is found
        $userfieldnames, # listref same order as values supplied in USER
        $otherfields,    # hash fieldname => value
        $listre,         # field => RE; fields: pageno, num_pages, nextlink, itemurls
        $itemre,         # hash extractfield => RE to get it from "page"
        $itempostpro,    # extractfield => sub returns pairS of field/value
        $itemurl2id,     # sub taking URL, returns unique, persistent item ID
        $itemformat,     # takes item hash, returns email message
        $messagesize,
    );
Each virtual e-mail will be at least as long as the number of octets given in the last parameter to new (2000 is recommended); shorter messages are padded to this length. Longer messages should be truncated, but the class currently does not do so.
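The padding rule above can be sketched as follows. This is only an illustration: pad_message is a hypothetical helper, not a function of this class, and the choice of trailing spaces as the padding character is an assumption. It also assumes the message is a byte string, so that length counts octets.

```perl
use strict;
use warnings;

# Hypothetical helper: pad $message with trailing spaces up to $size octets.
# Messages already longer than $size are returned unchanged (no truncation),
# mirroring the behaviour described above.
sub pad_message {
    my ($message, $size) = @_;
    my $short = $size - length $message;
    return $short > 0 ? $message . (' ' x $short) : $message;
}

my $padded = pad_message("Subject: test\r\n\r\nbody\r\n", 2000);
print length($padded), "\n";    # prints 2000
```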
The username is interpreted as a ":"-separated string that is also "URL-encoded", such that spaces are encoded as "+" characters. The values supplied are assigned, in order, to the variables named in the $userfieldnames parameter.
The password is ignored.
The webpage that contains the initial search form.
A reference to a list of the names of CGI variables whose values are supplied by the POP3 user in the username.
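As a rough sketch of how a username could be turned into CGI values, the code below splits on ":" and turns "+" back into spaces. parse_username and the field names query and location are hypothetical, chosen for illustration; only the "+" decoding mentioned above is handled, and any further percent-decoding is left out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of this class's interface.
# Splits the POP3 username on ":" and decodes "+" to a space, then pairs
# each value with the field name at the same position.
sub parse_username {
    my ($user_name, $userfieldnames) = @_;
    my @values = map { (my $v = $_) =~ tr/+/ /; $v } split /:/, $user_name, -1;
    my %fields;
    @fields{@$userfieldnames} = @values;
    return \%fields;
}

my $fields = parse_username('perl+books:london', [qw(query location)]);
print "$fields->{query} / $fields->{location}\n";    # prints "perl books / london"
```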
Reference to a hash mapping CGI field names to values.
Reference to a hash mapping field names to regular expressions, each of which finds the relevant value on a search result page. The value is expected to be in $1. These fields must be defined: pageno, num_pages, nextlink, itemurls. The last may (obviously) match more than once.
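A hash of this shape might look like the sketch below. The patterns and the sample HTML are invented for illustration, and the way they are applied here (a single match for the first three fields, a global match for itemurls) is an assumption about how such expressions would typically be used, not a statement of what the class does internally.

```perl
use strict;
use warnings;

# Illustrative patterns only; a real site needs its own. Each captures into $1.
my $listre = {
    pageno    => qr/Page (\d+) of \d+/,
    num_pages => qr/Page \d+ of (\d+)/,
    nextlink  => qr/<a href="([^"]+)"[^>]*>Next<\/a>/,
    itemurls  => qr/<a class="item" href="([^"]+)"/,
};

# Made-up search result page.
my $page = <<'HTML';
<p>Page 1 of 3</p>
<a class="item" href="/item/101">First</a>
<a class="item" href="/item/102">Second</a>
<a href="/search?page=2">Next</a>
HTML

my ($pageno)    = $page =~ $listre->{pageno};
my ($num_pages) = $page =~ $listre->{num_pages};
my ($nextlink)  = $page =~ $listre->{nextlink};
my @itemurls    = $page =~ /$listre->{itemurls}/g;    # may match many times

print "page $pageno of $num_pages, next: $nextlink, items: @itemurls\n";
```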
Reference to a hash mapping field names to regular expressions, each of which finds the relevant value on an item's page (as linked to by an itemurl found via the above parameter), similar to the above. Any number of fields may be sought, and a hash of field name to found value will be passed to the item-formatting function below.
Reference to hash of fieldname mapped to reference to function that is called with the field name and value, and will return a list of one or more pairs of fieldname / value. Typical use might be to remove HTML from a result.
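For instance, a post-processing hash could look like the following sketch. The field names description and price are hypothetical, and the tag stripping is deliberately crude; a real configuration might prefer a proper HTML parser.

```perl
use strict;
use warnings;

my $itempostpro = {
    # One pair in, one pair out: strip tags and tidy whitespace.
    description => sub {
        my ($field, $value) = @_;
        $value =~ s/<[^>]+>//g;    # crude tag removal
        $value =~ s/\s+/ /g;       # collapse runs of whitespace
        $value =~ s/^ | $//g;      # trim leading/trailing space
        return ($field => $value);
    },
    # One pair in, two pairs out: split a price into currency and amount.
    price => sub {
        my ($field, $value) = @_;
        my ($currency, $amount) = $value =~ /^(\D*)([\d.]+)/;
        return (currency => $currency, amount => $amount);
    },
};

my %desc  = $itempostpro->{description}->('description', ' <b>Red</b>  bicycle ');
my %price = $itempostpro->{price}->('price', 'USD12.50');
print "$desc{description}: $price{amount} $price{currency}\n";
```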
Reference to a function that is called with each itemurl and returns a unique, persistent identifier for that item, compatible with an RFC 1939 message ID.
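One possible way to build such an identifier is to hash the URL, as sketched below. This is an assumption about a reasonable approach, not the method the distribution uses. RFC 1939 limits a unique-id to between 1 and 70 printable ASCII characters (0x21 to 0x7E), which a 32-character hex digest satisfies; the ID stays persistent only as long as the item's URL does.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Sketch: derive the ID from the item URL. The same URL always yields the
# same 32-character lowercase hex string.
my $itemurl2id = sub {
    my ($url) = @_;
    return md5_hex($url);
};

my $id = $itemurl2id->('http://example.com/item/101');
print length($id), "\n";    # prints 32
```

Where the URL carries a stable numeric item number, extracting that number with a regular expression would be a simpler alternative.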
Reference to function that is called for each item, taking two parameters: a reference to a hash of fieldname / value (as extracted by the "item RE" above), and the unique message-ID (as generated above); and will return the text of an email message describing that item.
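A formatting function along these lines might look like the sketch below. The item fields title and description, the From address, the webscrape.invalid domain and the use of CRLF line endings are all assumptions made for the example; the text above only fixes the two parameters and the fact that an email message is returned.

```perl
use strict;
use warnings;

# Sketch of an item formatter: takes the extracted-fields hashref and the
# unique message ID, returns the text of a minimal email message.
my $itemformat = sub {
    my ($item, $message_id) = @_;
    my $title = defined $item->{title}       ? $item->{title}       : '(untitled)';
    my $desc  = defined $item->{description} ? $item->{description} : '';
    return join "\r\n",
        'From: webscrape@localhost',
        "Subject: $title",
        "Message-ID: <$message_id\@webscrape.invalid>",
        '',                         # blank line separates headers from body
        $desc,
        '';                         # ensures the message ends with CRLF
};

my $text = $itemformat->(
    { title => 'Red bicycle', description => 'Barely used.' },
    'abc123',
);
print $text;
```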
The size of each message, in the style of Procrustes. This is so the class can return an accurate(ish) result for the POP3 command STAT knowing only the number of hits there have been, and not having downloaded and formatted every single item to see how large each one is - such an extra step would probably trigger timeouts.
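The arithmetic behind this is simple enough to sketch. stat_reply is a hypothetical helper, and whether the class computes its reply exactly this way is an assumption; the "+OK count octets" reply format itself is from RFC 1939.

```perl
use strict;
use warnings;

# Hypothetical helper: with every message assumed to be exactly $size octets,
# the STAT totals follow from the hit count alone.
sub stat_reply {
    my ($count, $size) = @_;
    return sprintf '+OK %d %d', $count, $count * $size;
}

print stat_reply(37, 2000), "\n";    # prints "+OK 37 74000"
```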
webscrape, supplied in the scripts subdirectory of the distribution, can be used to test and develop a working configuration for this class.
None extra are defined.
RFC 1939, Mail::POP3::Folder::mbox.