The base of your reader.
Reader:

The Reader is the component that reads (parses) the information (pages). It holds all the logic used to crawl a web page: it knows all the rules needed to transform the content into objects.

After the Reader does its job, it passes the information to the Writer, which then writes it to a file, sends it over a network, etc.
     ____________                     __________                     ___________
    |  Internet  | <<=============== |  Reader  |                   |  Writer   |
    |____________| ===============>> |__________| ===============>> |___________|

    The Reader requests information        The Writer:
    and parses it, then sends it           - saves
    to the Writer.                         - sends email
                                           - saves stats
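The flow above can be sketched roughly as follows. This is an illustrative Perl sketch only, not the module's actual classes: the package names, the parse method, and the writer methods shown here are hypothetical stand-ins for whatever your Reader and Writer actually define.

    # Hypothetical sketch of the Reader -> Writer hand-off.
    package My::Reader;
    use strict;
    use warnings;

    sub parse_page {
        my ( $self, $html ) = @_;
        # The Reader applies its rules and turns raw content into an object.
        my $item = { title => 'extracted title', price => '9.99' };
        # Once parsing is done, the result is handed to the Writer,
        # which decides where it goes (file, network, database, ...).
        $self->writer->save( $item );
    }

    1;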
*** will be renamed to request_storage or something like that.

Holds values that are passed along between page navigations.

i.e.: I am collecting data for an object, and some of it is on page #1 while some more is on pages #2 and #3. I can use passed_key_values to pass keys and values along to my next page.
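A rough sketch of the idea, assuming passed_key_values behaves as a hash reference and that parsing methods receive the reader as $self (the method names and URL here are illustrative; check the module's actual signatures):

    # Hypothetical sketch: collecting fields for one object across pages.
    sub parse_page_1 {
        my ( $self ) = @_;
        # Store a value found on page #1 ...
        $self->passed_key_values->{title} = 'Some title from page 1';
        # ... then queue page #2; the stored values travel along with it.
        $self->append( page_2 => 'http://example.com/page2' );
    }

    sub parse_page_2 {
        my ( $self ) = @_;
        # On page #2 the value saved earlier is still available.
        my $title = $self->passed_key_values->{title};
    }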
Holds the current session headers.
shortcut for $self->robot->queue->append
shortcut for $self->robot->queue->prepend
shortcut for $self->robot->parser->tree
shortcut for $self->robot->parser->xml
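A hedged example of these shortcuts used together inside a parsing method. It assumes tree returns an HTML::TreeBuilder::XPath-style object (as the name suggests) and that append/prepend take a parse-method name plus a URL; verify both against the module before relying on them.

    # Hypothetical sketch using the reader shortcuts.
    sub parse_list_page {
        my ( $self ) = @_;
        # $self->tree is a shortcut for $self->robot->parser->tree.
        for my $href ( $self->tree->findnodes( '//a[@class="item"]/@href' ) ) {
            # $self->append is a shortcut for $self->robot->queue->append:
            # push each detail page onto the end of the crawl queue.
            $self->append( detail => $href->getValue );
        }
        # $self->prepend puts an urgent URL at the front of the queue instead.
        $self->prepend( detail => 'http://example.com/priority' );
    }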
To install HTML::Robot::Scrapper, copy and paste the appropriate command into your terminal.
cpanm
cpanm HTML::Robot::Scrapper
CPAN shell
perl -MCPAN -e shell
install HTML::Robot::Scrapper
For more information on module installation, please visit the detailed CPAN module installation guide.