
NAME

RandomJungle::Jungle - Consolidated interface for Random Jungle input and output data

VERSION

Version 0.05

SYNOPSIS

RandomJungle::Jungle provides a simplified interface to access Random Jungle input and output data. See RandomJungle::Tree for methods relating to the classification trees produced by Random Jungle, and RandomJungle::File::DB for lower-level methods that are wrapped by this module.

        use RandomJungle::Jungle;

        my $rj = RandomJungle::Jungle->new( db_file => $file ) || die $RandomJungle::Jungle::ERROR;
        $rj->store( xml_file => $file, oob_file => $file, raw_file => $file ) || die $rj->err_str;
        my $href = $rj->summary_data(); # for loaded data

        my $href = $rj->get_filenames; # filenames specified in store()
        my $href = $rj->get_rj_input_params; # input params that were used when RJ was run
        my $aref = $rj->get_variable_labels; # (expected:  SEX PHENOTYPE var1 ...)
        my $aref = $rj->get_sample_labels; # from the IID column of the RAW file

        # Returns data for the specified sample, where $label is the IID from the RAW file
        my $href = $rj->get_sample_data_by_label( label => $label ) || warn $rj->err_str;

        my $aref = $rj->get_tree_ids;
        my $tree = $rj->get_tree_by_id( $id ) || warn $rj->err_str; # RJ::Tree object

        # Returns hash of arefs containing lists of tree IDs, by OOB state for the sample
        my $href = $rj->get_oob_for_sample( $label ) || warn $rj->err_str;

        # Returns the OOB state for a given sample and tree ID
        my $val = $rj->get_oob_state( sample => $label, tree_id => $id );
        warn $rj->err_str unless defined $val; # 0 ("in bag") is a valid value

        # Returns a hash of arefs containing lists of sample labels, by OOB for the tree
        my $href = $rj->get_oob_for_tree( $tree_id ) || warn $rj->err_str;

        # Error handling
        $rj->set_err( 'Something went boom' );
        my $msg = $rj->err_str;
        my $trace = $rj->err_trace;

METHODS

new()

Creates and returns a new RandomJungle::Jungle object:

        my $rj = RandomJungle::Jungle->new( db_file => $file ) || die $RandomJungle::Jungle::ERROR;

The 'db_file' parameter is required. On failure, returns undef and sets $RandomJungle::Jungle::ERROR.
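Because new() has no object to hold an error message when it fails, the message lives in a package variable. A minimal sketch of that convention, using a hypothetical class My::Jungle (not the module's actual code) and assuming only that 'db_file' is required:

```perl
use strict;
use warnings;

package My::Jungle;   # hypothetical stand-in for RandomJungle::Jungle

our $ERROR = '';      # package-level error, readable when new() fails

sub new {
    my ( $class, %args ) = @_;
    if ( !defined $args{db_file} ) {
        $ERROR = "'db_file' parameter is required";
        return;       # undef in scalar context
    }
    return bless { db_file => $args{db_file} }, $class;
}

package main;

# Failure: there is no object, so the caller reads the package variable
my $bad = My::Jungle->new();
print "new() failed: $My::Jungle::ERROR\n" unless $bad;

# Success: the same '|| die' idiom used in the documentation
my $ok = My::Jungle->new( db_file => 'rj.db' ) || die $My::Jungle::ERROR;
```

Once an object exists, errors are reported through err_str() on that object instead.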

store()

This method loads data into the RandomJungle::File::DB database. All parameters are optional, so files can be loaded in a single call or in multiple calls. Only one file of each type is stored at a time; a later call to this method for a given file type overwrites the previously loaded data for that type.

        $rj->store( xml_file => $file, oob_file => $file, raw_file => $file ) || die $rj->err_str;

Returns true on success. Sets err_str and returns false if an error occurred.

get_filenames()

Returns a hash reference containing the names of the files specified in new() and store():

        my $href = $rj->get_filenames;

Keys in the href are db, xml, oob, and raw.

get_rj_input_params()

Returns a href of the input parameters used when Random Jungle was run:

        my $href = $rj->get_rj_input_params; # $href->{$param_name} = $param_value;

get_variable_labels()

Returns a reference to an array that contains the variable labels from the RAW file:

        my $aref = $rj->get_variable_labels; # (expected:  SEX PHENOTYPE var1 ...)

get_sample_labels()

Returns a reference to an array that contains the sample labels from the IID column of the RAW file:

        my $aref = $rj->get_sample_labels;

get_sample_data_by_label()

Returns a hash ref containing data for the sample specified by label => $label, where label is the IID from the RAW file. Sets err_str and returns undef if label is not specified or is invalid.

        my $href = $rj->get_sample_data_by_label( label => $label ) || warn $rj->err_str;

$href has the following structure:

        SEX                 => $val,
        PHENOTYPE           => $val,
        orig_data           => $line,  (the original line from the RAW file, unsplit and unspliced)
        index               => $i,     (index in the aref from get_sample_labels(); can be used to index into the OOB matrix)
        classification_data => $aref,  (can be passed to RandomJungle::Tree->classify_data)
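The index field ties a sample to its row of the OOB matrix. The sketch below uses invented in-memory data shaped like the documented return values, and assumes (this is not stated above) that the matrix has one row per sample in get_sample_labels() order and one column per tree in get_tree_ids() order, with 0 meaning "in bag" and 1 meaning "out of bag":

```perl
use strict;
use warnings;

# Invented data shaped like the documented return values
my @sample_labels = qw(S1 S2 S3);    # as from get_sample_labels()
my @tree_ids      = ( 1, 2, 3, 4 );  # as from get_tree_ids()

# ASSUMPTION: rows follow @sample_labels, columns follow @tree_ids;
# 0 = in bag, 1 = out of bag
my @oob_matrix = (
    [ 0, 1, 0, 0 ],
    [ 1, 0, 0, 1 ],
    [ 0, 0, 1, 0 ],
);

# Shaped like the href from get_sample_data_by_label( label => 'S2' )
my $sample = {
    SEX                 => 1,
    PHENOTYPE           => 2,
    orig_data           => 'FAM2 S2 0 0 1 2 0 1 2',   # invented RAW line
    index               => 1,
    classification_data => [ 0, 1, 2 ],
};

# The index field selects this sample's row of the OOB matrix
my $row = $oob_matrix[ $sample->{index} ];
print "sample $sample_labels[ $sample->{index} ]\n";
for my $col ( 0 .. $#tree_ids ) {
    printf "tree %d: %s\n", $tree_ids[$col], $row->[$col] ? 'OOB' : 'in bag';
}
```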

get_tree_ids()

Returns an array ref of tree IDs (sorted numerically):

        my $aref = $rj->get_tree_ids;
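Numeric ordering matters because Perl's default sort compares strings, which would place tree 10 before tree 2. A short illustration with invented IDs:

```perl
use strict;
use warnings;

my @ids = ( 10, 2, 33, 1 );                  # invented tree IDs
my @string_order  = sort @ids;               # default: string comparison
my @numeric_order = sort { $a <=> $b } @ids; # numeric comparison

print "string:  @string_order\n";   # 1 10 2 33
print "numeric: @numeric_order\n";  # 1 2 10 33
```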

get_tree_by_id()

Returns a RandomJungle::Tree object for the specified tree.

        my $tree = $rj->get_tree_by_id( $id ) || warn $rj->err_str;

Sets err_str and returns undef if tree ID is undef or invalid, or if an internal error occurred.

get_oob_for_sample()

Returns lists of tree IDs, by OOB state, for the specified sample label.

        my $href = $rj->get_oob_for_sample( $label ) || warn $rj->err_str;

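The href contains the following keys, each of which points to an array reference containing tree IDs:

        sample_used_to_construct_trees     => [],  (trees for which the sample is in bag, OOB state 0)
        sample_not_used_to_construct_trees => [],  (trees for which the sample is out of bag, OOB state 1)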

Sets err_str and returns undef if the specified sample cannot be found (invalid label) or on error.
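Conceptually, the two lists partition a sample's row of OOB states by value. This is a sketch with a hypothetical helper oob_for_sample(), not the module's implementation; it assumes that position $i in the row holds the sample's state for the $i-th tree ID, using the 0/1 convention described for get_oob_state():

```perl
use strict;
use warnings;

# Hypothetical helper: partition tree IDs by one sample's OOB row.
# ASSUMPTION: $row->[$i] is the sample's state for $tree_ids->[$i],
# with 0 = in bag (used to build the tree) and 1 = out of bag.
sub oob_for_sample {
    my ( $row, $tree_ids ) = @_;
    my %result = (
        sample_used_to_construct_trees     => [],
        sample_not_used_to_construct_trees => [],
    );
    for my $i ( 0 .. $#$tree_ids ) {
        my $key = $row->[$i]
            ? 'sample_not_used_to_construct_trees'
            : 'sample_used_to_construct_trees';
        push @{ $result{$key} }, $tree_ids->[$i];
    }
    return \%result;
}

my $href = oob_for_sample( [ 1, 0, 0, 1 ], [ 1, 2, 3, 4 ] );
print "in bag: @{ $href->{sample_used_to_construct_trees} }\n";
print "OOB:    @{ $href->{sample_not_used_to_construct_trees} }\n";
```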

get_oob_state()

Returns the OOB state for a given sample label and tree ID:

        my $val = $rj->get_oob_state( sample => $label, tree_id => $id );
        warn $rj->err_str unless defined $val; # 0 ("in bag") is a valid value

Expected values are 0 (the sample is "in bag" for the tree) or 1 (the sample is "out of bag" for the tree).

Sets err_str and returns undef if sample or tree_id is not defined, or if the sample label is invalid. Because 0 is a valid return value, test the result with defined() rather than for truth.
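The difference between a truth test and a definedness test is easy to miss. This sketch uses a hypothetical stub get_state() that mimics the documented contract (0 or 1 for a known sample, undef for an unknown one); it is not the module's method:

```perl
use strict;
use warnings;

# Hypothetical stub mimicking the documented contract
my %state = ( S1 => 0, S2 => 1 );
sub get_state { my ($label) = @_; return $state{$label} }

# Truth test: 0 ("in bag") falls through to the fallback as if it were an error
my $naive = get_state('S1') || -1;

# Definedness test: 0 is kept, and only undef signals failure
my $checked = get_state('S1');
my $missing = get_state('S9');

printf "naive:   %s\n", $naive;
printf "checked: %s\n", defined $checked ? $checked : 'error';
printf "missing: %s\n", defined $missing ? $missing : 'error';
```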

get_oob_for_tree()

Returns lists of sample labels, by OOB state, for the specified tree ID.

        my $href = $rj->get_oob_for_tree( $tree_id ) || warn $rj->err_str;

The href contains the following keys, each of which points to an array reference containing sample labels:

        in_bag_samples => [],
        oob_samples    => [],

Sets err_str and returns undef if the specified tree ID cannot be found (invalid) or on error.
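This is the column-wise counterpart of get_oob_for_sample(). A sketch with invented data and a hypothetical helper oob_for_tree(), assuming one row per sample and one column per tree, where the column for a tree ID is its position in get_tree_ids():

```perl
use strict;
use warnings;

# ASSUMPTION: rows follow @sample_labels, columns follow the tree ID order;
# 0 = in bag, 1 = out of bag. Data are invented.
my @sample_labels = qw(S1 S2 S3);
my @oob_matrix    = ( [ 0, 1 ], [ 1, 1 ], [ 0, 0 ] );

# Hypothetical helper: split sample labels by their state in one tree column
sub oob_for_tree {
    my ( $matrix, $labels, $col ) = @_;
    my %result = ( in_bag_samples => [], oob_samples => [] );
    for my $i ( 0 .. $#$labels ) {
        my $key = $matrix->[$i][$col] ? 'oob_samples' : 'in_bag_samples';
        push @{ $result{$key} }, $labels->[$i];
    }
    return \%result;
}

my $href = oob_for_tree( \@oob_matrix, \@sample_labels, 0 );
print "in bag: @{ $href->{in_bag_samples} }\n";
print "OOB:    @{ $href->{oob_samples} }\n";
```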

summary_data()

Returns an href containing a summary of the data that is loaded into the db:

        my $href = $rj->summary_data();

$href contains the output of other methods in this class, and it has the following structure:

        filenames       => get_filenames(),
        rj_params       => get_rj_input_params(),
        variable_labels => { ... },  (built from get_variable_labels(); see below)
        sample_labels   => { ... },  (built from get_sample_labels(); see below)
        tree_ids        => { ... },  (built from get_tree_ids(); see below)

The keys variable_labels, sample_labels, and tree_ids all point to hrefs. Each href has the following structure:

        all_labels => $aref, (for variable_labels and sample_labels)
        all_ids    => $aref, (for tree_ids only)
        first => $val, (the first element of the all* aref)
        last  => $val, (the last element of the all* aref)
        count => $val, (the size of the all* aref)
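Each of these summary hrefs can be derived from a single array ref. A sketch using a hypothetical helper summarize() (not part of the module) with invented labels and IDs:

```perl
use strict;
use warnings;

# Hypothetical helper: build an href shaped like the summary_data() entries
sub summarize {
    my ( $key, $aref ) = @_;    # $key is 'all_labels' or 'all_ids'
    return {
        $key  => $aref,
        first => $aref->[0],
        last  => $aref->[-1],
        count => scalar @$aref,
    };
}

my $vars  = summarize( all_labels => [qw(SEX PHENOTYPE var1 var2)] );
my $trees = summarize( all_ids    => [ 1, 2, 3 ] );
print "$vars->{first} .. $vars->{last} ($vars->{count} labels)\n";
print "$trees->{first} .. $trees->{last} ($trees->{count} trees)\n";
```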

set_err()

Sets the error message (provided as a parameter) and creates a stack trace:

        $rj->set_err( 'Something went boom' );

err_str()

Returns the last error message that was set:

        my $msg = $rj->err_str;

err_trace()

Returns a backtrace for the last error that was encountered:

        my $trace = $rj->err_trace;
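One common way to pair an error message with a backtrace in Perl is Carp::longmess, which returns the message followed by the call stack. The sketch below assumes that approach; the class My::ErrHolder and the function load_data() are hypothetical, and this is not necessarily how RandomJungle::Jungle implements set_err():

```perl
use strict;
use warnings;

package My::ErrHolder;    # hypothetical class, not the module's code
use Carp ();

sub new { return bless( { err_str => '', err_trace => '' }, shift ) }

# Store the message and capture a backtrace at the point of failure
sub set_err {
    my ( $self, $msg ) = @_;
    $self->{err_str}   = $msg;
    $self->{err_trace} = Carp::longmess($msg);
    return;
}

sub err_str   { return $_[0]{err_str} }
sub err_trace { return $_[0]{err_trace} }

package main;

# Hypothetical caller that fails and records why
sub load_data {
    my ($obj) = @_;
    $obj->set_err('Something went boom');
    return;
}

my $obj = My::ErrHolder->new;
load_data($obj) or print 'Error: ', $obj->err_str, "\n", $obj->err_trace, "\n";
```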

SEE ALSO

RandomJungle::Jungle, RandomJungle::Tree, RandomJungle::Tree::Node, RandomJungle::XML, RandomJungle::OOB, RandomJungle::RAW, RandomJungle::File::DB, RandomJungle::Classification_DB

AUTHOR

Robert R. Freimuth

COPYRIGHT

Copyright (c) 2011 Mayo Foundation for Medical Education and Research. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.