PDF::OCR2::Page
    use PDF::OCR2;

    my $path = './file.pdf';
    my $pdf  = PDF::OCR2->new($path);
    my $page = $pdf->page(1);

    $page->abs_pdf;
    $page->abs_images;
    $page->abs_images_count;
    $page->text;
    $page->text_length;
Extracts the text of a single-page PDF document: the text embedded in the document itself and, if the page contains images, the text recognized in those images via Tesseract OCR.
Mostly meant to be used by PDF::OCR2.
If you pass an abs_path argument to the constructor and the file is not on disk, the constructor returns undef.
The constructor's argument is a hashref that must contain abs_pdf, the path to the PDF file. Alternatively, the argument may be the absolute path to the file itself. If abs_pdf is not provided, or the file does not exist on disk, an exception is thrown.
    my $p = PDF::OCR2::Page->new('./file.pdf');
    # or
    my $p = PDF::OCR2::Page->new({ abs_pdf => './file.pdf' });
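The two calling conventions described above (a plain path or a hashref with abs_pdf) can be sketched as follows. This is only an illustration, not the module's actual source: the class name My::PageSketch is hypothetical, and the sketch follows the exception behaviour stated above rather than returning undef.

```perl
package My::PageSketch;   # hypothetical stand-in, NOT PDF::OCR2::Page itself
use strict;
use warnings;
use Carp qw(croak);

sub new {
    my ($class, $arg) = @_;

    # Accept either a plain path or a hashref carrying abs_pdf.
    my $self = ref($arg) eq 'HASH' ? { %$arg } : { abs_pdf => $arg };

    defined $self->{abs_pdf}
        or croak('missing abs_pdf argument');
    -f $self->{abs_pdf}
        or croak("not on disk: $self->{abs_pdf}");

    return bless $self, $class;
}

# Read-only accessor, enough for this sketch.
sub abs_pdf { $_[0]{abs_pdf} }

package main;

# A missing file raises an exception, which eval traps.
my $page = eval { My::PageSketch->new('./no-such-file.pdf') };
print "constructor failed: $@" unless $page;
```

Wrapping the call in eval, as in the last lines, is one way for a caller to handle a missing file without aborting the whole program.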
abs_pdf is a Perl set/get method. Its argument is the path to a PDF representing one (1) page. The file must be on disk.
abs_images returns the images found in the page: a list in list context, an array ref otherwise. It uses PDF::GetImages, which is slow.
    my $imgs = $p->abs_images;
    my @imgs = $p->abs_images;
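Returning a list in list context and an array ref otherwise is usually done with Perl's built-in wantarray. A minimal sketch of that pattern follows; the sub name images_sketch and the image paths are made-up placeholders, and nothing here is taken from the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper; the paths are placeholders, not real extraction output.
sub images_sketch {
    my @found = ('/tmp/page-000.png', '/tmp/page-001.png');

    # wantarray is true when the caller asked for a list.
    return wantarray ? @found : \@found;
}

my @list = images_sketch();   # list context: the paths themselves
my $aref = images_sketch();   # scalar context: one array reference

printf "%d images, first is %s\n", scalar @list, $aref->[0];
```

Since the text above notes that image extraction is slow, it is probably worth calling the method once and keeping the result rather than calling it in both contexts.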
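abs_images_count takes no argument. Returns the number of images found in the page.
text takes no argument. Returns all text: the text embedded in the page plus the text recognized in its images. Returns an empty string if there is none.
text_length takes no argument. Returns the length of the text as a number. Returns 0 if there is none.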
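The three read-only methods above combine naturally into a short report. The sketch below assumes only that the object responds to abs_images_count, text and text_length as described; the helper page_summary and the Stub::Page class are hypothetical and exist so the example runs without Tesseract or a real PDF.

```perl
use strict;
use warnings;

# Hypothetical helper: works on any object offering the three methods above.
sub page_summary {
    my ($page) = @_;
    my $text = $page->text;
    return sprintf '%d image(s), %d character(s)%s',
        $page->abs_images_count,
        $page->text_length,
        length($text) ? '' : ' (no text found)';
}

# Stand-in object so the sketch runs without the real module.
package Stub::Page {
    sub new              { bless { text => $_[1] }, $_[0] }
    sub text             { $_[0]{text} }
    sub text_length      { length $_[0]{text} }
    sub abs_images_count { 0 }
}

print page_summary(Stub::Page->new('')), "\n";
```

With a real page object, the same call would be page_summary($pdf->page(1)), using the objects from the synopsis.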
Defaults shown.
Evaluate the PDF with PDF::API2 for correctness, etc.
$PDF::OCR2::CHECK_PDF = 0;
Do not clean up trash on DESTROY.
$PDF::OCR2::Page::NO_TRASH_CLEANUP = 0;
Debug on
$PDF::OCR2::Page::DEBUG = 0;
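Because these settings are package variables, one way to change them for a single stretch of code is Perl's local, which restores the previous value automatically. The sketch below assumes that a true value turns debugging on; the helper with_debug is hypothetical, and the default is assigned by hand so the example runs without the module loaded.

```perl
use strict;
use warnings;

# Default as listed above, set here so the sketch runs without the module.
$PDF::OCR2::Page::DEBUG = 0;

# Hypothetical helper: run a code ref with debugging switched on.
sub with_debug {
    my ($code) = @_;

    # local restores the old value when this sub returns or dies.
    local $PDF::OCR2::Page::DEBUG = 1;
    return $code->();
}

my $inside = with_debug(sub { $PDF::OCR2::Page::DEBUG });
print "inside: $inside, after: $PDF::OCR2::Page::DEBUG\n";
```

The same pattern would apply to $PDF::OCR2::CHECK_PDF and $PDF::OCR2::Page::NO_TRASH_CLEANUP, although a flag that is read during DESTROY needs to stay set for as long as the object lives.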
PDF::OCR2 - parent package.
Leo Charre leocharre at cpan dot org
Copyright (c) 2008 Leo Charre. All rights reserved.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, i.e., under the terms of the "Artistic License" or the "GNU General Public License".
This package is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
See the "GNU General Public License" for more details.