Web::PageMeta - get page open-graph / meta data
    use strict;
    use warnings;
    use feature qw(say);
    use Web::PageMeta;

    my $page = Web::PageMeta->new(url => "https://www.apa.at/");
    say $page->title;
    say $page->image;
Fetch previews and images asynchronously:
    use strict;
    use warnings;
    use feature qw(say);
    use Future;
    use Web::PageMeta;

    my @urls = qw(
        https://www.apa.at/
        http://www.diepresse.at/
        https://metacpan.org/
        https://github.com/
    );

    my @page_views = map { Web::PageMeta->new( url => $_ ) } @urls;

    # start all image downloads and wait until every one has finished
    Future->wait_all( map { $_->fetch_image_data_ft } @page_views )->get;

    foreach my $pv (@page_views) {
        say 'title> ' . $pv->title;
        say 'img_size> ' . length($pv->image_data);
    }

    # alternatively, instead of Future->wait_all(), run at most 3 at a time
    use Future::Utils qw( fmap_void );
    fmap_void(
        sub { return $_[0]->fetch_image_data_ft },
        foreach    => [@page_views],
        concurrent => 3,
    )->get;
Get web page meta data, including (but not limited to) Open Graph data. Can be used in both normal (blocking) and async code.
For any HTTP status code other than 200 during data downloads, an HTTP::Exception is thrown.
Constructor; only "url" is required.
HTTP URL to fetch data from.
In addition to the AnyEvent::HTTP timeout, the elapsed time is also checked while the data are being downloaded, and the request dies when it goes over the limit. Default: 5 minutes.
Dies when the document or image size is greater than this limit. Default: 100MB.
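Both download limits are enforced while data are still arriving, not only after the transfer ends. A minimal sketch of such a check, assuming a streaming callback that receives body chunks (for example something shaped like AnyEvent::HTTP's on_body hook). The helper make_limited_collector, its parameters, and reading "100MB" as 100 MiB are hypothetical; this is not Web::PageMeta's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Web::PageMeta): returns a per-chunk callback
# that appends data to a buffer and dies as soon as the elapsed time or the
# accumulated size goes over its limit, plus an accessor for the buffer.
# The clock is injectable so the limits can be tested without waiting.
sub make_limited_collector {
    my (%args) = @_;
    my $timeout  = $args{timeout}  // 5 * 60;              # seconds (5 minutes)
    my $max_size = $args{max_size} // 100 * 1024 * 1024;   # bytes; "100MB" read as MiB (assumption)
    my $clock    = $args{clock}    // sub { time };
    my $started  = $clock->();
    my $body     = '';

    my $on_chunk = sub {
        my ($chunk) = @_;
        $body .= $chunk;
        die "download size over limit ($max_size bytes)\n"
            if length($body) > $max_size;
        die "download time over limit ($timeout s)\n"
            if $clock->() - $started > $timeout;
        return 1;    # true: keep receiving data
    };
    return ($on_chunk, sub { $body });
}
```

Checking after every chunk means an oversized or very slow download is aborted early instead of being buffered completely first.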
User-Agent header to use for HTTP requests. Defaults to the User-Agent of Chrome 89.0.4389.90.
HashRef with extra HTTP request headers.
Accepts an optional HTTP::Cookies-compatible object, which must provide a get_cookies() method. If set, HTTP cookie headers are sent with each request.
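The User-Agent, extra-header and cookie options all end up as HTTP request headers. A minimal sketch of how they could be combined, assuming the jar's get_cookies($url) returns a HashRef of cookie names to values (as HTTP::Cookies does when called without cookie names). build_request_headers and its argument names are hypothetical, and the User-Agent string is a shortened placeholder rather than the module's real default.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper (not Web::PageMeta's API): combine a User-Agent, an
# optional HashRef of extra headers and an optional cookie jar into one
# HashRef of request headers.
sub build_request_headers {
    my (%args) = @_;
    my %headers = (
        # shortened placeholder; the module's default is a full Chrome 89 UA
        'User-Agent' => $args{user_agent} // 'Mozilla/5.0 Chrome/89.0.4389.90',
        # extra headers come last, so they override an identically named key
        %{ $args{extra_headers} // {} },
    );

    if (my $jar = $args{cookie_jar}) {
        # duck typing: any object with a get_cookies() method is accepted
        die "cookie jar must provide get_cookies()\n"
            unless blessed($jar) && $jar->can('get_cookies');
        # assumption: get_cookies($url) returns a HashRef of name => value,
        # as HTTP::Cookies does when called without cookie names
        my $cookies = $jar->get_cookies($args{url}) // {};
        if (%$cookies) {
            $headers{Cookie} = join '; ',
                map { "$_=$cookies->{$_}" } sort keys %$cookies;
        }
    }
    return \%headers;
}
```

Header names are case-insensitive in HTTP, so a real implementation would also need to handle an extra header such as "user-agent" that differs only in case from the default.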
Returns the title of the page.
Returns the description of the page.
Returns the image location of the page.
Returns the binary data of the image referenced by the "image" link.
Throws a 404 exception if there is no "image" link.
Returns a hash ref with all Open Graph data.
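Open Graph data live in <meta property="og:..." content="..."> tags in the page head (see https://ogp.me/). A simplified illustration of collecting them into a hash ref: regexes are used only to keep the sketch dependency-free, whereas a robust implementation should use an HTML parser (the module accepts Web::Scraper::LibXML scrapers). The name extract_og and the key naming without the "og:" prefix are assumptions, not necessarily what the module returns.

```perl
use strict;
use warnings;

# Hypothetical, simplified sketch: collect every
# <meta property="og:NAME" content="VALUE"> pair into a HashRef keyed by NAME.
# Regex-based for brevity; real-world HTML calls for a proper parser.
sub extract_og {
    my ($html) = @_;
    my %og;
    while ($html =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;
        my ($prop)    = $attrs =~ /(?:^|\s)property\s*=\s*["']og:([^"']+)["']/i;
        my ($content) = $attrs =~ /(?:^|\s)content\s*=\s*["']([^"']*)["']/i;
        $og{$prop} = $content if defined $prop && defined $content;
    }
    return \%og;
}
```

Meta tags without an og: property (such as name="description") are ignored by this sketch.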
Web::Scraper::LibXML object used to fetch the image, title or description from a location other than the default one.
    use Web::Scraper::LibXML;
    use Web::PageMeta;

    # take the image URL from the data-src attribute of the first slider element
    my $escraper = scraper {
        process_first '.slider .camera_wrap div', 'image' => '@data-src';
    };
    my $wmeta = Web::PageMeta->new(
        url           => 'https://www.meon.eu/',
        extra_scraper => $escraper,
    );
Returns an array ref with the page [$body, $headers]. Can be useful for post-processing or for special or additional data extraction.
Only the text/html content type is accepted for fetching.
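Two response rules apply to downloads: any status other than 200 is an error, and page fetches accept only text/html. A minimal sketch of both checks; check_page_response is a hypothetical name, and it dies with a plain string to stay dependency-free, whereas for non-200 statuses Web::PageMeta throws an HTTP::Exception.

```perl
use strict;
use warnings;

# Hypothetical helper (not Web::PageMeta's API): reject any status other than
# 200 and any content type whose media type is not text/html.
sub check_page_response {
    my ($status, $content_type) = @_;
    die "HTTP status $status\n" unless $status == 200;
    # media type is case-insensitive and may carry parameters (e.g. charset)
    my ($media_type) = ($content_type // '') =~ m{^\s*([^;\s]+)};
    die 'unsupported content type: ' . ($content_type // '(none)') . "\n"
        unless defined $media_type && lc($media_type) eq 'text/html';
    return 1;
}
```

Comparing only the media type means headers like "text/html; charset=UTF-8" pass, while images or JSON are refused.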
Returns a Future object for fetching the page meta data. See "ASYNC USE". When done, the "page_meta" hash is returned.
Returns a Future object for fetching the image data. See "ASYNC USE". When done, the "image_data" scalar is returned.
Returns a Future object for fetching the page content and headers. See "ASYNC USE". When done, the "page_body_hdr" array ref is returned.
To run multiple page meta data or image HTTP requests in parallel, or to use the module in async programs, use "fetch_page_meta_ft" and "fetch_image_data_ft", which return Future objects. See "SYNOPSIS" or t/02_async.t for sample use.
https://ogp.me/
Jozef Kutej, <jkutej at cpan.org>
<jkutej at cpan.org>
Copyright 2021 jkutej@cpan.org
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.
To install Web::PageMeta, copy and paste the appropriate command into your terminal.
cpanm
cpanm Web::PageMeta
CPAN shell
perl -MCPAN -e shell
install Web::PageMeta
For more information on module installation, please visit the detailed CPAN module installation guide.