Catmandu::Importer::PDF - Catmandu importer to extract data from a single PDF file
    # From the command line

    # Export pdf information, and text
    $ catmandu convert PDF --file input.pdf to YAML

    # In a script
    use Catmandu::Sane;
    use Catmandu::Importer::PDF;

    my $importer = Catmandu::Importer::PDF->new( file => "/tmp/input.pdf" );

    $importer->each(sub {
        my $pdf = $_[0];
        # ..
    });
    document:
      author: ~
      creation_date: 1207274644
      creator: PDFplus
      keywords: ~
      metadata: ~
      modification_date: 1421574847
      producer: "Nobody at all"
      subject: ~
      title: "Hello there"
      version: PDF-1.6
    pages:
      - label: Cover Page
        height: 878
        width: 595
        text: "Hello world"
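As an illustration of how a record with the shape shown above might be consumed, here is a minimal sketch. It does not call the importer: the `$pdf` hash is a hand-written copy of the sample output, and the helpers `epoch_to_utc` and `page_summaries` are hypothetical names introduced here. It also assumes that `creation_date` and `modification_date` are seconds since the Unix epoch, which the sample values suggest but the documentation does not state.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hand-written stand-in for a record, copied from the sample output.
# In real use this would be the hash passed to the ->each callback.
my $pdf = {
    document => {
        author            => undef,
        creation_date     => 1207274644,
        creator           => 'PDFplus',
        modification_date => 1421574847,
        producer          => 'Nobody at all',
        title             => 'Hello there',
        version           => 'PDF-1.6',
    },
    pages => [
        { label => 'Cover Page', height => 878, width => 595, text => 'Hello world' },
    ],
};

# Hypothetical helper: format an epoch timestamp as ISO 8601 in UTC.
# Assumes the value is seconds since the epoch; returns undef if missing.
sub epoch_to_utc {
    my ($epoch) = @_;
    return unless defined $epoch;
    return strftime('%Y-%m-%dT%H:%M:%SZ', gmtime($epoch));
}

# Hypothetical helper: build a one-line summary for each page.
sub page_summaries {
    my ($record) = @_;
    my @lines;
    for my $page (@{ $record->{pages} // [] }) {
        push @lines, sprintf('%s (%sx%s): %s',
            $page->{label} // '', $page->{width}, $page->{height},
            $page->{text} // '');
    }
    return @lines;
}

my $doc     = $pdf->{document};
my $created = epoch_to_utc($doc->{creation_date});

printf "title:   %s\n", $doc->{title} // '(none)';
printf "created: %s\n", $created // '(unknown)';
print "$_\n" for page_summaries($pdf);
```

Fields rendered as `~` in the YAML output are undefined values in Perl, so the sketch guards every optional field with `//` before printing.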
To install this package, the following system packages must be installed first.
CentOS: requires CentOS 7 at minimum. CentOS 6 only provides poppler-glib 0.12, which is older than the required 0.16.
* perl-devel
* make
* gcc
* gcc-c++
* libyaml-devel
* libyaml
* poppler-glib ( >= 0.16 )
* poppler-glib-devel ( >= 0.16 )
* gobject-introspection-devel
Requires Ubuntu 14 at minimum.
* libpoppler-glib8
* libpoppler-glib-dev
* gobject-introspection
* libgirepository1.0-dev
* Catmandu::Importer::PDF returns one record, containing both document information and page text
* Catmandu::Importer::PDFPages returns multiple records, one for each page
* Catmandu::Importer::PDFInfo returns one record, containing document information
* Due to a bug in older versions of Poppler (bug #94173), the creation_date and modification_date can be returned in local time instead of UTC.
* Some versions of Poppler add form feeds and newlines to a text line, while others don't.
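Because the extracted text can differ between Poppler versions, it may help to normalize it before comparing or indexing. The following is only a sketch of one possible approach: `normalize_page_text` is a hypothetical helper, not part of this module, and the exact whitespace that a given Poppler version emits is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: make page text comparable across Poppler versions
# by removing form feeds and trailing newlines.
sub normalize_page_text {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/\r\n?/\n/g;     # unify line endings
    $text =~ s/\f//g;          # drop form feeds
    $text =~ s/[ \t]+$//mg;    # trim trailing blanks on each line
    $text =~ s/\n+\z//;        # drop trailing newlines
    return $text;
}

# Assumed variants of the same page text, as different versions might return it.
my @samples = ("Hello world", "Hello world\n", "Hello world\n\f");

for my $raw (@samples) {
    my $clean = normalize_page_text($raw);
    printf "%s\n", $clean eq 'Hello world' ? 'same' : 'different';
}
```

Dropping form feeds discards any page-break markers in the text, so this is only appropriate when the text is already split per page.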
Nicolas Franck <nicolas.franck at ugent.be>
Catmandu::Importer::PDFInfo, Catmandu::Importer::PDFPages, Catmandu, Catmandu::Importer, Poppler
To install Catmandu::Importer::PDF, copy and paste the appropriate command into your terminal.
cpanm
cpanm Catmandu::Importer::PDF
CPAN shell
perl -MCPAN -e shell
install Catmandu::Importer::PDF