22 Jun 2021 08:28:14 UTC
- Distribution: Web-PageMeta
- Module version: 0.07
- License: perl_5
- Perl: v5.22.0
- Dependencies
- AnyEvent::HTTP
- Encode
- Future
- Future::AsyncAwait
- Future::HTTP::AnyEvent
- HTML::TreeBuilder::LibXML
- HTTP::Exception
- List::Util
- Log::Any
- Moose
- MooseX::StrictConstructor
- MooseX::Types::URI
- Time::HiRes
- URI
- URI::QueryParam
- Web::Scraper
- Web::Scraper::LibXML
- namespace::autoclean
- and possibly others
NAME
Web::PageMeta - get page open-graph / meta data
SYNOPSIS
    use feature 'say';
    use Web::PageMeta;

    my $page = Web::PageMeta->new(url => "https://www.apa.at/");
    say $page->title;
    say $page->image;
async fetch previews and images:
    use feature 'say';
    use Web::PageMeta;

    my @urls = qw(
        https://www.apa.at/
        http://www.diepresse.at/
        https://metacpan.org/
        https://github.com/
    );
    my @page_views = map { Web::PageMeta->new( url => $_ ) } @urls;
    Future->wait_all( map { $_->fetch_image_data_ft } @page_views )->get;
    foreach my $pv (@page_views) {
        say 'title> ' . $pv->title;
        say 'img_size> ' . length( $pv->image_data );
    }

    # alternatively, instead of Future->wait_all()
    use Future::Utils qw( fmap_void );
    fmap_void(
        sub { return $_[0]->fetch_image_data_ft },
        foreach    => [@page_views],
        concurrent => 3
    )->get;
DESCRIPTION
Get web page meta data, open-graph and otherwise. Can be used in both synchronous and async code.
An HTTP::Exception is thrown for any HTTP status code other than 200 during data downloads.
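A minimal sketch of handling such a failure, assuming the thrown object behaves like any other HTTP::Exception and offers `code` and `status_message`. The helper name `describe_failure` is hypothetical, and the exception is thrown by hand so the example runs offline.

```perl
use strict;
use warnings;
use feature 'say';
use HTTP::Exception;

# Hypothetical helper: run a code ref and turn an HTTP::Exception into text.
sub describe_failure {
    my ($code_ref) = @_;
    my $ok = eval { $code_ref->(); 1 };
    return 'ok' if $ok;
    if ( my $e = HTTP::Exception->caught ) {
        return sprintf( 'HTTP %d: %s', $e->code, $e->status_message );
    }
    die $@;    # not an HTTP error, re-throw unchanged
}

# Offline demonstration with an exception thrown by hand.
say describe_failure( sub { HTTP::Exception->throw(404) } );
```

With a real page the same helper could wrap an accessor call, for example `describe_failure( sub { $page->title } )`.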
ACCESSORS
new
Constructor, only "url" is required.
url
HTTP url to fetch data from.
timeout
In addition to the AnyEvent::HTTP timeout, the elapsed time is also checked while the data are being downloaded, and the download dies once it is over the limit. Default is 5 minutes.
max_size
Will die when the document or image size is greater than this limit. Default 100MB.
user_agent
User-Agent header to use for http requests. Default is one from Chrome 89.0.4389.90.
extra_headers
HashRef with extra http request headers.
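The constructor options described above (timeout, max_size, user_agent and extra_headers) could be combined as in the following sketch. The units are assumptions not stated in this document: seconds for "timeout" and bytes for "max_size". The agent string and header value are made up for illustration.

```perl
use strict;
use warnings;
use Web::PageMeta;

my $page = Web::PageMeta->new(
    url           => 'https://www.apa.at/',
    timeout       => 30,                    # assumption: seconds
    max_size      => 5 * 1024 * 1024,       # assumption: bytes (5MB)
    user_agent    => 'MyPreviewBot/1.0',    # hypothetical agent string
    extra_headers => { 'Accept-Language' => 'en' },
);
```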
cookie_jar
Accepts an optional HTTP::Cookies compatible object that must provide a get_cookies() method. If set, HTTP cookie headers are sent with each request.
title
Returns title of the page.
description
Returns description of the page.
image
Returns image location of the page.
image_data
Returns image binary data of "image" link.
Throws a 404 exception if there is no "image" link.
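Since "image_data" is binary, it is usually written out without any encoding layer. The sketch below shows one way to do that; the helper name `write_binary` and the file name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: write raw bytes to a file, returning the byte count.
sub write_binary {
    my ( $path, $bytes ) = @_;
    open( my $fh, '>:raw', $path ) or die "cannot open $path: $!";
    print {$fh} $bytes;
    close($fh) or die "cannot close $path: $!";
    return length($bytes);
}

# With a Web::PageMeta object in $page this might be used as:
#   write_binary( 'preview.img', $page->image_data );
```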
page_meta
Returns hash ref with all open-graph data.
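Which keys the hash ref contains depends on the page, so a simple way to inspect it is to list whatever is there. A minimal sketch, assuming the values are plain scalars; the helper name `format_meta` and the sample data are made up.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: render a meta hash ref as sorted "key: value" lines.
sub format_meta {
    my ($meta) = @_;
    return map { sprintf( '%s: %s', $_, $meta->{$_} // '' ) } sort keys %$meta;
}

# Demonstration with made-up data; real keys depend on the fetched page.
say for format_meta( { title => 'Example', image => 'https://example.com/a.png' } );

# With a Web::PageMeta object in $page this might be used as:
#   say for format_meta( $page->page_meta );
```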
extra_scraper
Web::Scraper::LibXML object to fetch image, title or description from a location other than the default one.
    use Web::Scraper::LibXML;
    use Web::PageMeta;

    my $escraper = scraper {
        process_first '.slider .camera_wrap div', 'image' => '@data-src';
    };
    my $wmeta = Web::PageMeta->new(
        url           => 'https://www.meon.eu/',
        extra_scraper => $escraper,
    );
page_body_hdr
Returns array ref with page [$body,$headers]. Can be useful for post-processing or special/additional data extractions.
Only the text/html content-type is accepted for fetching.
fetch_page_meta_ft
Returns a future object for fetching page meta data. See "ASYNC USE". On done, the "page_meta" hash is returned.
fetch_image_data_ft
Returns a future object for fetching image data. See "ASYNC USE". On done, the "image_data" scalar is returned.
fetch_page_body_hdr_ft
Returns a future object for fetching page content and headers. See "ASYNC USE". On done, the "page_body_hdr" array ref is returned.
ASYNC USE
To run multiple page meta data or image HTTP requests in parallel, or to use the module in async programs, call "fetch_page_meta_ft" and "fetch_image_data_ft", which return Future objects. See "SYNOPSIS" or t/02_async.t for sample use.
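When several requests run in parallel, some may fail while others succeed. The sketch below shows one way to report on each finished future separately. The helper name `summarize` is hypothetical, and the futures are built by hand with Future->done and Future->fail so the example runs offline.

```perl
use strict;
use warnings;
use feature 'say';
use Future;

# Hypothetical helper: describe the outcome of one finished future.
sub summarize {
    my ( $url, $f ) = @_;
    if ( $f->is_done ) {
        my ($meta) = $f->get;
        return sprintf( '%s: %d meta keys', $url, scalar keys %$meta );
    }
    if ( $f->is_failed ) {
        return sprintf( '%s: failed (%s)', $url, scalar $f->failure );
    }
    return "$url: cancelled";
}

# Offline demonstration with hand-made futures.
say summarize( 'https://example.com/', Future->done( { a => 1, b => 2 } ) );
say summarize( 'https://example.org/', Future->fail('timeout') );

# With real pages this might look like:
#   my @pages = map { Web::PageMeta->new( url => $_ ) } @urls;
#   my @done  = Future->wait_all( map { $_->fetch_page_meta_ft } @pages )->get;
#   say summarize( $urls[$_], $done[$_] ) for 0 .. $#urls;
```

Future->wait_all waits for every future regardless of outcome, which is why each one is inspected individually afterwards.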
SEE ALSO
AUTHOR
Jozef Kutej, <jkutej at cpan.org>
LICENSE AND COPYRIGHT
Copyright 2021 jkutej@cpan.org
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.