Text::Mining - Perl Tools for Text Mining Research
This document describes Text::Mining version 0.0.8
To run the shell:
use Text::Mining;
my $tm = Text::Mining->new();
$tm->shell();
To use the objects:
use Text::Mining;
my $tm = Text::Mining->new();
my $corpus = $tm->get_corpus({ corpus_name => 'Test' });
my $document = $corpus->add_document({ file_path => 'data/file42.txt' });
my $parser = Text::Mining::Parser->new({ parser => 'Text', algorithm => 'Base' });
Text::Mining manages multiple corpora, each with an unlimited number of documents and annotations, and calculates representations of the documents using a variety of algorithms.
The primary design considerations are token provenance, in the face of ever-changing analysis protocols, and pipeline automation for corpus recalculations.
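To make the idea of token provenance concrete, the sketch below tags each token with the document, position, parser, and algorithm that produced it, so that results from an older protocol can be told apart from a recalculation. It is an illustration in plain Perl only: the tokenize_with_provenance function and all of the hash keys are assumptions made for this example, not part of Text::Mining. Only the 'Text' and 'Base' values are borrowed from the synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper: split text on whitespace and record where each
# token came from. Not part of Text::Mining.
sub tokenize_with_provenance {
    my ($arg_ref) = @_;
    my @tokens;
    my $position = 0;
    for my $word ( split ' ', $arg_ref->{text} ) {
        push @tokens, {
            token       => $word,
            document_id => $arg_ref->{document_id},   # assumed identifier
            position    => $position++,               # 0-based token index
            parser      => $arg_ref->{parser},
            algorithm   => $arg_ref->{algorithm},
        };
    }
    return \@tokens;
}

my $tokens = tokenize_with_provenance({
    text        => 'mining text with perl',
    document_id => 42,
    parser      => 'Text',
    algorithm   => 'Base',
});

# Each token can be traced back to the protocol that produced it.
for my $t ( @{$tokens} ) {
    print join( "\t", @{$t}{qw(position token parser algorithm)} ), "\n";
}
```

Keeping the parser and algorithm names on every token is one simple way to let a recalculation with a different protocol coexist with earlier results; the module itself may store this information differently.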
The command line interface is self-describing via the "help" command. Copy the "kodos" script from the package's "scripts" directory to a directory in your PATH. Check the permissions and adjust them as necessary. To start the shell, enter "kodos" at the prompt.
shell
$tm->shell();
Uses Term::Shell plus a few enhancements to provide a live environment for developing flexible and repeatable text mining protocols and for managing multi-release projects encompassing multiple corpora.
version
print $tm->version(), "\n";
Reports the version of Text::Mining.
create_corpus
get_corpus
# Look up a corpus by ID ...
my $corpus = $tm->get_corpus({ corpus_id => 1 });
# ... or by name
$corpus = $tm->get_corpus({ corpus_name => 'Test' });
Retrieves a corpus object from the database.
delete_corpus
$corpus->delete();
Deletes a corpus from the database, along with all of its related documents.
get_root_dir
print $tm->get_root_dir(), "\n";
Reports the root directory from the configuration file.
get_root_url
print $tm->get_root_url(), "\n";
Reports the root URL of the webserver from the configuration file.
get_data_dir
print $tm->get_data_dir(), "\n";
Reports the main data directory from the configuration file.
get_submitted_document
print $tm->get_submitted_document(), "\n";
Reports the
count_submitted_waiting
print $tm->count_submitted_waiting(), "\n";
Reports the number of documents waiting to be included for a given corpus.
count_submitted_complete
print $tm->count_submitted_complete(), "\n";
Reports the number of documents ...
get_all_corpuses
my $corpuses = $tm->get_all_corpuses();
Returns the corpora as a DBI table.
get_corpus_id
print $corpus->get_corpus_id(), "\n";
Reports the corpus_id of the current corpus.
Text::Mining requires a set of configuration files stored at "~/.corpus":
shellrc
Currently holds pwd and current_corpus. Loaded when you start the shell. These settings are saved in real time with _updated_config().
shell_history
Holds the last 1,000 commands. Reloaded when you start the shell. Saved in postcmd().
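The 1,000-command cap can be pictured as a simple trim applied before the history is written out. The following is a minimal sketch in plain Perl, not the module's actual code; trim_history and $MAX_HISTORY are hypothetical names introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical cap, matching the 1,000-command limit described above.
my $MAX_HISTORY = 1000;

# Return a list holding at most $MAX_HISTORY of the most recent commands.
sub trim_history {
    my @commands = @_;
    my $excess   = @commands - $MAX_HISTORY;
    splice @commands, 0, $excess if $excess > 0;    # drop the oldest entries
    return @commands;
}

# Example: 1,005 commands go in, the newest 1,000 come out.
my @history = trim_history( map { "command $_" } 1 .. 1005 );
print scalar(@history), " entries, oldest kept: $history[0]\n";
```

With 1,005 commands recorded, the five oldest are dropped and "command 6" becomes the first entry kept.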
Test::More
version
Class::Std
Class::Std::Utils
YAML
Carp
LWP::Simple
Time::HiRes
DBIx::MySperqlOO
File::Spec
None reported.
No bugs have been reported.
Please report any bugs or feature requests to bug-text-mining@rt.cpan.org, or through the web interface at http://rt.cpan.org.
Roger A Hall <rogerhall@cpan.org> Michael Bauer <mbkodos@gmail.com>
Copyright (c) 2009, the Authors. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENSE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.