NAME

Catmandu::Importer::PDF - Catmandu importer that extracts data from a single PDF file

SYNOPSIS

    # From the command line:
    # export the PDF document information and page text as YAML
    $ catmandu convert PDF --file input.pdf to YAML

    # In a script
    use Catmandu::Sane;
    use Catmandu::Importer::PDF;

    my $importer = Catmandu::Importer::PDF->new( file => "/tmp/input.pdf" );

    $importer->each(sub {
        # $pdf is a hash reference with the keys "document" and "pages"
        my $pdf = $_[0];
        #..
    });

EXAMPLE OUTPUT IN YAML

    document:
      author: ~
      creation_date: 1207274644
      creator: PDFplus
      keywords: ~
      metadata: ~
      modification_date: 1421574847
      producer: "Nobody at all"
      subject: ~
      title: "Hello there"
      version: PDF-1.6
    pages:
    - label: Cover Page
      height: 878
      width: 595
      text: "Hello world"
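
The record shown above can be walked like any nested Perl data structure. The following is a minimal sketch, not part of the module: the `$pdf` hash is a hand-written copy of the example output, `iso_utc` is a hypothetical helper, and it assumes that `creation_date` and `modification_date` are Unix epoch seconds, as the example values suggest.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical record copied from the example output; a real one would be
# the argument passed to the callback of $importer->each.
my $pdf = {
    document => {
        title             => "Hello there",
        creation_date     => 1207274644,
        modification_date => 1421574847,
        version           => "PDF-1.6",
    },
    pages => [
        { label => "Cover Page", height => 878, width => 595, text => "Hello world" },
    ],
};

# Hypothetical helper: format epoch seconds as an ISO 8601 UTC timestamp.
# Assumes the date fields hold Unix epoch seconds.
sub iso_utc {
    my ($epoch) = @_;
    return strftime("%Y-%m-%dT%H:%M:%SZ", gmtime($epoch));
}

my $doc = $pdf->{document};
printf "%s (%s), created %s\n",
    $doc->{title}, $doc->{version}, iso_utc($doc->{creation_date});

# Every entry in "pages" carries its label, its dimensions and its text.
for my $page (@{ $pdf->{pages} }) {
    printf "%s: %dx%d, %d characters of text\n",
        $page->{label}, $page->{width}, $page->{height}, length($page->{text});
}
```

Fields that are `~` in the YAML arrive as `undef` in Perl, so check them with `defined` before printing.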

INSTALLATION

Installing this package requires the following system packages:

CentOS

Requires CentOS 7 or later. CentOS 6 only ships poppler-glib 0.12, which is older than the required 0.16.

* perl-devel

* make

* gcc

* gcc-c++

* libyaml-devel

* libyaml

* poppler-glib ( >= 0.16 )

* poppler-glib-devel ( >= 0.16 )

* gobject-introspection-devel

Ubuntu

Requires Ubuntu 14.04 or later.

* libpoppler-glib8

* libpoppler-glib-dev

* gobject-introspection

* libgirepository1.0-dev

NOTES

* Catmandu::Importer::PDF returns a single record containing both the document information and the page text

* Catmandu::Importer::PDFPages returns multiple records, one per page

* Catmandu::Importer::PDFInfo returns a single record containing the document information

KNOWN ISSUES

* Due to a bug in older versions of poppler-glib (bug #94173), creation_date and modification_date may be returned in local time instead of UTC. This module attempts to correct for that.

* Some versions of Poppler add form feeds and newlines to a text line, while others don't.
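
If page text must compare equal across Poppler versions, one option is to normalize it after import. The following is a minimal sketch under that assumption: `normalize_page_text` is a hypothetical helper, not something this module provides, and it only removes form feeds and trailing line breaks.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of this module: strip the form feeds and
# trailing newlines that some Poppler versions append to page text.
sub normalize_page_text {
    my ($text) = @_;
    return $text unless defined $text;   # leave missing text untouched
    $text =~ s/\f//g;                    # remove all form feed characters
    $text =~ s/[\r\n]+\z//;              # remove trailing line breaks
    return $text;
}

# Two spellings of the same line, as different Poppler versions might return it.
for my $raw ("Hello world", "Hello world\n\f") {
    printf "[%s]\n", normalize_page_text($raw);
}
```

Line breaks inside the text are kept, since they may carry layout information; only the trailing ones are dropped.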

AUTHORS

Nicolas Franck <nicolas.franck at ugent.be>

SEE ALSO

Catmandu::Importer::PDFInfo, Catmandu::Importer::PDFPages, Catmandu, Catmandu::Importer, Poppler