PDF::OCR - DEPRECATED get ocr and images out of a pdf file

Lets you get text out of pages in PDF documents. The process does not change your original PDF in any way. Note that this only gets text out of images inside the PDF file; it does not check for genuine text inside the file, if any. For t...

LEOCHARRE/PDF-OCR-1.11 - 20 Apr 2009 13:01:05 GMT

PDF::OCR::Thorough - DEPRECATED extract text from pdf document resorting to ocr as needed

Unlike PDF::OCR, which assumes each page in the PDF document is a page scan, this script is more "thorough". How it works: 1) original.pdf is copied to tmp.pdf; 2) tmp.pdf is split into page1.pdf, page2.pdf, etc.; 3) for each pageX.pdf, first we try r...
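
A minimal Python sketch of the general "resort to OCR as needed" idea, applied per page after the split in step 2. Step 3 is truncated above, so this does not claim to reproduce the module's exact logic. `extract_text` and `ocr_page` are hypothetical callables standing in for a real text-layer extractor and a real OCR engine, and the `min_chars` threshold is likewise an assumption.

```python
from typing import Callable, Iterable, List


def thorough_text(
    pages: Iterable[str],
    extract_text: Callable[[str], str],
    ocr_page: Callable[[str], str],
    min_chars: int = 1,
) -> List[str]:
    """Return one text string per single-page PDF, falling back to OCR per page.

    extract_text and ocr_page are hypothetical stand-ins: the first reads any
    genuine text layer from a page file, the second OCRs the page as an image.
    """
    results = []
    for page in pages:
        text = extract_text(page).strip()
        # Little or no real text on this page: treat it as a scan and OCR it.
        if len(text) < min_chars:
            text = ocr_page(page).strip()
        results.append(text)
    return results


if __name__ == "__main__":
    # Fake pages: page1.pdf has a text layer, page2.pdf is a bare scan.
    text_layer = {"page1.pdf": "Invoice 42", "page2.pdf": ""}
    ocr_output = {"page1.pdf": "(unused)", "page2.pdf": "Signed, J. Doe"}
    print(thorough_text(["page1.pdf", "page2.pdf"], text_layer.get, ocr_output.get))
    # prints ['Invoice 42', 'Signed, J. Doe']
```

Deciding per page rather than per document is what lets mixed files (some born-digital pages, some scans) come out complete.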

PDF::OCR::Thorough::Cached - DEPRECATED save ocr to text file for easy retrieval

This is just like PDF::OCR::Thorough, except that the text is saved to a text file, so subsequent retrievals are fast. It inherits all the methods of PDF::OCR::Thorough. $PDF::OCR::Thorough::Cached::ABS_CACHE_DIR: directory that will be the cache. The ...
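
The caching idea can be sketched in Python as follows. `cache_dir` stands in for `$PDF::OCR::Thorough::Cached::ABS_CACHE_DIR`, and `compute_text` is a hypothetical stand-in for the slow extraction/OCR pass. Naming each cache file after a hash of the source's absolute path is an assumption made for this sketch; the description above does not say how the module names its cache files.

```python
import hashlib
import os
from typing import Callable


def cached_text(pdf_path: str, cache_dir: str, compute_text: Callable[[str], str]) -> str:
    """Return the text for pdf_path, computing it at most once per cache_dir.

    compute_text is a hypothetical stand-in for the expensive OCR step.
    """
    abs_path = os.path.abspath(pdf_path)
    # One text file per source document, named by a hash of its absolute path.
    key = hashlib.sha1(abs_path.encode("utf-8")).hexdigest()
    cache_file = os.path.join(cache_dir, key + ".txt")

    if os.path.exists(cache_file):
        with open(cache_file, encoding="utf-8") as fh:
            return fh.read()  # fast path: a plain file read, no OCR

    text = compute_text(abs_path)
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_file, "w", encoding="utf-8") as fh:
        fh.write(text)
    return text


if __name__ == "__main__":
    import tempfile

    calls = []

    def slow_ocr(path: str) -> str:
        calls.append(path)
        return "text of " + os.path.basename(path)

    with tempfile.TemporaryDirectory() as cache:
        first = cached_text("scan.pdf", cache, slow_ocr)
        second = cached_text("scan.pdf", cache, slow_ocr)
        print(first, "|", second, "|", len(calls))
        # prints text of scan.pdf | text of scan.pdf | 1
```

Keying on the path alone means an edited PDF at the same location keeps returning stale text; hashing the file contents (or its modification time) would avoid that at the cost of reading the file on every lookup.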

pdf2ocr - get text content of the images within a pdf document

The argument is a PDF file. This script assumes that each page in the PDF is one 8.5x11 page, i.e. ONE image; that's what the calculations are set up for...

ocr - read an image file and turn it into text

This is just an interface that makes it quick and easy to get OCR output from an image file. Whatever image you provide, ImageMagick convert is called to turn it into the format tesseract expects...
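
As a rough sketch of that two-step pipeline, the Python below builds the two command lines and runs them. It assumes ImageMagick's `convert` and the `tesseract` CLI are on PATH and that TIFF is the target format (older tesseract releases expected TIFF input); the `page.tif`/`page` file names are hypothetical. `tesseract IMAGE OUTBASE` writes its result to `OUTBASE.txt`.

```python
import os
import subprocess
import tempfile
from typing import List


def ocr_commands(image_path: str, workdir: str) -> List[List[str]]:
    """Build the convert + tesseract argv lists for one image (no execution)."""
    tif = os.path.join(workdir, "page.tif")
    out_base = os.path.join(workdir, "page")  # tesseract appends ".txt"
    return [
        ["convert", image_path, tif],   # any input format -> TIFF
        ["tesseract", tif, out_base],   # OCR the TIFF into page.txt
    ]


def ocr_image(image_path: str) -> str:
    """Run both commands and return the recognised text.

    Requires ImageMagick and tesseract to be installed; raises
    subprocess.CalledProcessError if either command fails.
    """
    with tempfile.TemporaryDirectory() as workdir:
        for argv in ocr_commands(image_path, workdir):
            subprocess.run(argv, check=True, capture_output=True)
        with open(os.path.join(workdir, "page.txt"), encoding="utf-8") as fh:
            return fh.read()
```

Passing argv lists instead of a shell string avoids quoting problems with file names that contain spaces. On ImageMagick 7 the preferred entry point is `magick` rather than `convert`.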

pdfgetext - get text from pdf and resort to ocr as needed

Get all text out of a PDF, even from images. This is basically a CLI interface to PDF::OCR::Thorough...

pdf2ocrturntotext

This feeds the cache for the files provided as arguments...
