Web::Query - Yet another scraping library like jQuery
    use Web::Query;

    wq('http://google.com/search?q=foobar')
        ->find('h2')
        ->each(sub {
            my $i = shift;
            printf("%d) %s\n", $i+1, $_->text);
        });
Web::Query is yet another scraping framework, with a jQuery-like interface.
Yes, I know about ingy's pQuery, but it is only alpha quality and it doesn't work. Web::Query is built on top of the CPAN modules HTML::TreeBuilder::XPath, LWP::UserAgent, and HTML::Selector::XPath.
Because Web::Query uses HTML::Selector::XPath, it supports only the CSS3 selectors that module supports. Web::Query doesn't support jQuery's extended queries (yet?).
THIS LIBRARY IS UNDER DEVELOPMENT. ANY API MAY CHANGE WITHOUT NOTICE.
wq($stuff)
This is a shortcut for Web::Query->new($stuff). This function is exported by default.
Web::Query->new($stuff, \%options)
Creates a new instance of Web::Query. The instance can be made from a URL (http, https, or file scheme) given as a string or as a URI object, from a string of HTML, or from an HTML::Element instance.
This method throws an exception if $stuff is of an unknown type.
When $stuff is a URL and the response is not successful, this method returns undef.
Currently, the only valid option is indent, which is used as the indentation string when the object is printed.
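A minimal sketch of the input types new() accepts, assuming Web::Query and HTML::Element (from the HTML::Tree distribution) are installed; the markup and the indent value are only illustrative. Unknown input is trapped with eval because new() dies on it.

```perl
use strict;
use warnings;
use HTML::Element;
use Web::Query;

# A string that looks like markup is parsed as HTML.
my $from_html = Web::Query->new('<ul><li>one</li><li>two</li></ul>');

# An existing HTML::Element instance is wrapped as-is.
my $para = HTML::Element->new('p');
$para->push_content('hello');
my $from_elem = Web::Query->new($para);

# Options go in a hash reference; 'indent' is used when the object is printed.
my $indented = Web::Query->new('<div><p>hi</p></div>', { indent => '  ' });
print $indented->as_html, "\n";

# Input that is neither markup, a URL, nor an element makes new() die.
my $ok = eval { Web::Query->new('plain words, no markup'); 1 };
print "rejected: $@" unless $ok;
```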
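Web::Query->new_from_element($element)
Creates a new instance of Web::Query from an HTML::Element instance.
Web::Query->new_from_html($html)
Creates a new instance of Web::Query from a string of HTML.
Web::Query->new_from_url($url)
Creates a new instance of Web::Query by fetching $url.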
If the response is not successful (that is, its status code does not match /^20[0-9]$/), this method returns undef.
The last response is stored in $Web::Query::RESPONSE.
Here is the recommended idiom:

    my $url = 'http://example.com/';
    my $q   = Web::Query->new_from_url($url)
        or die "Cannot get a resource from $url: " . Web::Query->last_response()->status_line;
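A sketch that exercises new_from_url() and $Web::Query::RESPONSE without network access by pointing a file:// URL at a temporary file; it assumes File::Temp and URI::file (shipped with the URI distribution) are available. The second call names a file that does not exist, so the response is unsuccessful and new_from_url() returns undef.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use URI::file;
use Web::Query;

# Write a small page to a temporary file so the example runs offline.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} '<html><body><ul><li>one</li><li>two</li></ul></body></html>';
close $fh;

# file:// is one of the URL schemes new_from_url() accepts.
my $url = URI::file->new_abs($path)->as_string;
my $q = Web::Query->new_from_url($url)
    or die "Cannot get a resource from $url: " . $Web::Query::RESPONSE->status_line;
printf "%d items (status %s)\n", $q->find('li')->size, $Web::Query::RESPONSE->code;

# A missing file gives an unsuccessful response, so undef comes back.
my $missing = Web::Query->new_from_url(URI::file->new_abs("$path.missing")->as_string);
print 'no document: ', $Web::Query::RESPONSE->status_line, "\n"
    unless defined $missing;
```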
Web::Query->new_from_file($file_name)
Creates a new instance of Web::Query from the named file.
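A short sketch of new_from_file(), assuming File::Temp is available for the throwaway input file; the headings are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Web::Query;

# Throwaway input file for the example.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} '<div><h2>First</h2><h2>Second</h2></div>';
close $fh;

my $q = Web::Query->new_from_file($path);
my @titles = $q->find('h2')->text;    # list context: one string per <h2>
print "$_\n" for @titles;
```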
$q->html() / $q->html($html)
Gets or sets the inner HTML.
$q->as_html()
Returns the elements associated with the object as strings. If called in scalar context, returns only the string representation of the first element.
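A sketch contrasting html() (inner HTML, get and set) with as_html() (the elements themselves); the markup is illustrative, and the checks use patterns rather than exact strings.

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<div><p>Hello, <b>world</b></p></div>');
my $p = $q->find('p');

# Getter: inner HTML of the first matched element.
my $inner = $p->html;

# Setter: replaces the inner HTML of every matched element.
$p->html('<i>replaced</i>');

# as_html(): the matched elements themselves; scalar context gives the first.
my $outer = $q->as_html;
print "$inner\n$outer\n";
```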
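$q->text() / $q->text($text)
Gets or sets the inner text.
$q->attr($name) / $q->attr($name, $value)
Gets or sets the value of an attribute on the matched elements.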
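A sketch of the text() and attr() getters and setters on a made-up pair of links, assuming the getters follow the same convention as as_html(): the first element's value in scalar context, one value per element in list context.

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<div><a href="/one">One</a><a href="/two">Two</a></div>');
my $links = $q->find('a');

my $first_text = $links->text;           # scalar context: first element only
my @all_text   = $links->text;           # list context: one entry per element
my $href       = $links->attr('href');   # first element's href

# Setters apply to every matched element.
$links->attr(target => '_blank');
$links->text('Link');

print join(', ', $first_text, "@all_text", $href), "\n";
```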
$q->find($selector)
Finds the nodes matching $selector within $q. $selector is a CSS3 selector.
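A sketch of find() with a few CSS3 selectors (type, class, descendant) on illustrative markup; find() returns a new Web::Query object, so calls can be chained.

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<div><ul class="menu"><li class="on">Home</li><li>About</li></ul>'
         . '<ul><li>Other</li></ul></div>');

my $all    = $q->find('li');                    # type selector
my $active = $q->find('ul.menu li.on');         # class + descendant selectors
my $menu   = $q->find('ul.menu')->find('li');   # chained finds

printf "%d items, active: %s, menu has %d\n",
    $all->size, $active->text, $menu->size;
```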
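$q->each(sub { my ($i, $elem) = @_; ... })
Calls the callback once for each matched element. $i is a zero-based counter, $elem is the current element (as a Web::Query object), and $_ is localized to $elem.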
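The each() callback protocol, sketched on an illustrative two-item list:

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<ol><li>alpha</li><li>beta</li></ol>');

my @lines;
$q->find('li')->each(sub {
    my ($i, $elem) = @_;    # $i counts from 0; $elem wraps the current element
    # $_ is localized to the same object as $elem.
    push @lines, sprintf('%d) %s', $i + 1, $_->text);
});
print "$_\n" for @lines;
```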
$q->map(sub { my ($i, $elem) = @_; ... })
Creates a new array (returned as an array reference) with the results of calling the provided function on every element.
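A sketch of map(), assuming (as in current releases) that it returns an array reference holding the callback's return values in document order:

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<ul><li>one</li><li>two</li><li>three</li></ul>');

# Assumed: map() returns an array reference of the callback's results.
my $lengths = $q->find('li')->map(sub {
    my ($i, $elem) = @_;
    return length $elem->text;
});
print "@$lengths\n";
```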
$q->filter(sub { my ($i, $elem) = @_; ... })
Reduces the set of matched elements to those for which the callback returns true.
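A sketch of filter() that keeps only the items whose text starts with "a", assuming the callback receives the same ($i, $elem) arguments as each():

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<ul><li>apple</li><li>banana</li><li>avocado</li></ul>');

# Keep only the items whose text starts with "a".
my $a_words = $q->find('li')->filter(sub {
    my ($i, $elem) = @_;
    return $elem->text =~ /^a/;
});
print join(', ', $a_words->text), "\n";
```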
$q->end()
Returns to the previous context (the object the current one was derived from), like jQuery's end().
$q->size()
Returns the number of DOM elements matched by the Web::Query object.
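A sketch of size() and end(): end() steps back to the object that find() was called on, so a chain can continue from there. The markup is illustrative.

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<div><p>a</p><p>b</p><span>c</span></div>');

my $paras = $q->find('p');
printf "%d paragraphs\n", $paras->size;

# end() goes back to $q, so the chain can search again from there.
my $spans = $q->find('p')->end->find('span');
printf "%d span(s)\n", $spans->size;
```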
$q->parent()
Returns the parent nodes of the elements in $q.
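A sketch of parent() on illustrative markup, reading an attribute of the parent element:

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<div><ul id="menu"><li>Home</li></ul></div>');

# The parent of the <li> is the <ul>.
my $list = $q->find('li')->parent;
print $list->attr('id'), "\n";
```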
$q->first()
Returns a new Web::Query object constructed from the first matching element.
$q->last()
Returns a new Web::Query object constructed from the last matching element.
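A sketch of first() and last(); both return new Web::Query objects, so the usual accessors work on the result. The list items are made up.

```perl
use strict;
use warnings;
use Web::Query;

my $q = wq('<ol><li>first</li><li>middle</li><li>last</li></ol>');
my $items = $q->find('li');

my $head = $items->first;    # new object holding only the first <li>
my $tail = $items->last;     # new object holding only the last <li>
printf "%s ... %s\n", $head->text, $tail->text;
```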
$q->remove()
Deletes the elements associated with the object from the DOM.

    # remove all <blink> tags from the document
    $q->find('blink')->remove;
$q->replace_with($replacement)
Replaces the elements of the object with the provided replacement. The replacement can be a string, a Web::Query object, or an anonymous function. The anonymous function is passed the index of the current node and the node itself (which is also localized as $_).
    my $q = wq( '<p><b>Abra</b><i>cada</i><u>bra</u></p>' );

    # Each result comment assumes the call is made on the original $q.
    $q->find('b')->replace_with('<a>Ocus</a>');
    # <p><a>Ocus</a><i>cada</i><u>bra</u></p>

    $q->find('u')->replace_with($q->find('b'));
    # <p><i>cada</i><b>Abra</b></p>

    $q->find('i')->replace_with(sub {
        my $name = $_->text;
        return "<$name></$name>";
    });
    # <p><b>Abra</b><cada></cada><u>bra</u></p>
You can specify your own LWP::UserAgent instance by assigning it to $Web::Query::UserAgent:
$Web::Query::UserAgent = LWP::UserAgent->new( agent => 'Mozilla/5.0' );
Incompatible change: new_from_url() no longer throws an exception on a bad response from the HTTP server; it returns undef instead.
Tokuhiro Matsuno <tokuhirom AAJKLFJEF@ GMAIL COM>
See also: pQuery
Copyright (C) Tokuhiro Matsuno
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install Web::Query, copy and paste the appropriate command into your terminal.
cpanm
cpanm Web::Query
CPAN shell (start the shell, then type the install command at the cpan> prompt)
perl -MCPAN -e shell
install Web::Query
For more information on module installation, please visit the detailed CPAN module installation guide.