NAME
CAM::PDF::PageText - Extract text from PDF page tree
SYNOPSIS
    use CAM::PDF;
    use CAM::PDF::PageText;
    my $pdf = CAM::PDF->new($filename);
    my $pageone_tree = $pdf->getPageContentTree(1);
    print CAM::PDF::PageText->render($pageone_tree);
DESCRIPTION
This module attempts to extract sequential text from a PDF page. This is not a robust process, because PDF text is laid out graphically and may appear in the content stream in arbitrary order. The module uses a few heuristics to guess which pieces of text belong next to each other, but it is easily fooled by, for example, subscripts, non-horizontal text, font changes, and form fields.
All those disclaimers aside, it is useful for a quick dump of text from a simple PDF file.
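For a quick dump of a whole document rather than a single page, the per-page calls from the SYNOPSIS can be wrapped in a loop. The following is a minimal sketch, not part of this module: extract_all_text is a hypothetical helper name, and the sketch assumes that CAM::PDF's numPages() reports the page count and that getPageContentTree() may return undef for a page without content.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by CAM::PDF): render every page of
# an already-opened PDF object and join the per-page strings.
sub extract_all_text {
    my ($pdf) = @_;
    my @pages;
    for my $pagenum (1 .. $pdf->numPages()) {
        my $tree = $pdf->getPageContentTree($pagenum);
        next if !defined $tree;    # assumption: pages without content yield undef
        push @pages, CAM::PDF::PageText->render($tree);
    }
    return join "\n", @pages;
}

# Command-line use: perl dump_text.pl file.pdf
if (@ARGV) {
    require CAM::PDF;
    require CAM::PDF::PageText;
    my $pdf = CAM::PDF->new($ARGV[0])
        or die "Cannot open '$ARGV[0]'\n";
    print extract_all_text($pdf);
}
```

The script name dump_text.pl is arbitrary. The caveats above still apply: pages are simply joined with a newline, and the order of text within each page is whatever the heuristics produce.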
LICENSE
Same as CAM::PDF
FUNCTIONS
- $pkg->render($pagetree)
- $pkg->render($pagetree, $verbose)
Turn a page content tree into a string. This is a class method that should be called like:
    CAM::PDF::PageText->render($pagetree);
AUTHOR
See CAM::PDF