Apache::Tika - A Perl interface to the Apache Tika API
    use Apache::Tika;
    use LWP::UserAgent;

    my $tika = Apache::Tika->new();

    # Extract metadata and text from a PDF file
    open my $fh, '<:raw', '/local/file.pdf' or die "Cannot open file: $!";
    my $pdf = do { local $/; <$fh> };
    close $fh;
    my $meta = $tika->meta($pdf);
    my $text = $tika->tika($pdf);

    # Extract text from a website
    my $response = LWP::UserAgent->new->get('http://some.web.site');
    my $web_text = $tika->tika(
        $response->decoded_content('charset' => 'none'),
        $response->headers->header('content-type')
    );
This module provides support for the Apache Tika API.
The constructor, new(), creates an Apache::Tika object. You can specify the following options:
Apache Tika server URL (defaults to http://localhost:9998)
Custom user agent
The following API methods are available. For more information about method responses, see http://wiki.apache.org/tika/TikaJAXRS
The $bytes parameter is always required and must contain the data to send to the server. The $contentType parameter is optional, but if you know the content type of $bytes (e.g. "text/html; charset=iso-8") you can pass it to improve the Tika response.
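As a rough illustration of preparing the two arguments described above, the following sketch reads a file as raw bytes and builds a content-type string. Both helpers, read_bytes and content_type, are hypothetical names introduced here for illustration and are not part of Apache::Tika.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Apache::Tika): read a file as raw bytes,
# suitable for use as the $bytes argument.
sub read_bytes {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    return $bytes;
}

# Hypothetical helper: build a value for the optional $contentType argument.
# The charset is appended only when one is given.
sub content_type {
    my ($type, $charset) = @_;
    return defined $charset ? "$type; charset=$charset" : $type;
}
```

Assuming a $tika object created as in the synopsis, the results could then be passed as $tika->tika(read_bytes('/local/page.html'), content_type('text/html', 'utf-8')). Opening the file with the ':raw' layer keeps Perl from decoding or translating the data before it is sent.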
Apache Tika
To install Apache::Tika, copy and paste the appropriate command into your terminal.
cpanm
cpanm Apache::Tika
CPAN shell
perl -MCPAN -e shell
install Apache::Tika
For more information on module installation, please visit the detailed CPAN module installation guide.