Robert Freimuth

NAME

RandomJungle::Tree - A Random Jungle classification tree

VERSION

Version 0.04

SYNOPSIS

RandomJungle::Tree represents a classification tree from Random Jungle. This class uses RandomJungle::Tree::Node to represent the nodes in the tree.

        use RandomJungle::Tree;

        my $tree = RandomJungle::Tree->new( %params ) || die $RandomJungle::Tree::ERROR;

        my $tree_id = $tree->id;

        # Returns the variables used in the tree
        my $aref = $tree->get_variables; # aref of indices
        my $href = $tree->get_variables( variable_labels => 1 ); # label => index

        # Classifies $data using this tree and returns either the predicted phenotype
        # or a RandomJungle::Tree::Node object for the terminal node
        my $predicted_pheno = $tree->classify_data( $data );
        my $node_obj = $tree->classify_data( $data, as_node => 1 );
        my $predicted_pheno = $tree->classify_data( $data, skip_validation => 1 );

        my $node_obj = $tree->get_node_by_vector_index( $vi ) || warn $tree->err_str;

        my $vi = $tree->get_root_node;
        my $node_obj = $tree->get_root_node( as_node => 1 );

        my $aref = $tree->get_all_nodes; # aref of vector indices
        my $aref = $tree->get_all_nodes( as_node => 1 ); # aref of node objects

        my $aref = $tree->get_terminal_nodes; # aref of vector indices
        my $aref = $tree->get_terminal_nodes( as_node => 1 ); # aref of node objects

        # Sets err_str and returns undef on error (invalid index) or if called with index 0 (no parent)
        my $vi_of_parent = $tree->get_parent_of_vector_index( $vi );
        my $node_obj = $tree->get_parent_of_vector_index( $vi, as_node => 1 );

        # Returns an aref containing vector indices of all nodes in the path to the specified
        # vector index, beginning at the root of the tree and ending at the specified vector index.
        my $aref = $tree->get_path_to_vector_index( $vi ) || warn $tree->err_str;

        my $depth = $tree->get_depth_of_vector_index( $vi );

        # $href contains the max depth of the tree and a list of all vector indices at that depth
        my $href = $tree->max_node_depth;

        # Error handling
        $tree->set_err( 'Something went boom' );
        my $msg = $tree->err_str;
        my $trace = $tree->err_trace;

METHODS

new()

Creates and returns a new RandomJungle::Tree object:

        my $tree = RandomJungle::Tree->new( %params ) || die $RandomJungle::Tree::ERROR;

Required keys in %params:

        id           => $tree_id (from the XML file)
        var_id_str   => $str (from the XML file)
        values_str   => $str (from the XML file)
        branches_str => $str (from the XML file)

Optional keys in %params:

        variable_labels => $aref (variables from the RAW file, excluding headers)

The required components of %params are returned from RandomJungle::File::XML->get_tree_data(). The aref for variable_labels can be obtained from RandomJungle::Jungle->get_variable_labels().

Sets $ERROR and returns undef on failure.

id()

Returns the tree ID:

        my $tree_id = $tree->id;

get_variables()

Returns the variables used in the tree. By default, returns an aref of variable indices (see RAW file). If 'variable_labels => 1' is specified, returns a href of the form { $label => $index }; this requires that variable_labels was passed to new(). If variable_labels was not passed to new(), sets err_str and returns undef.

        my $aref = $tree->get_variables; # variable indices
        my $href = $tree->get_variables( variable_labels => 1 ); # $href->{$label} = $index
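
The two return shapes can be illustrated with a standalone sketch. It does not use RandomJungle::Tree; the sub name variables_sketch and its arguments are hypothetical, and it assumes that each variable index is a zero-based position into the list of labels.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RandomJungle.
# $var_ids: variable indices used at the split nodes of a tree
# $labels:  variable labels in RAW column order (assumed zero-based), or undef
sub variables_sketch {
    my ( $var_ids, $labels, %params ) = @_;

    # Report each variable once, in ascending index order
    my %seen;
    my @unique = sort { $a <=> $b } grep { !$seen{$_}++ } @$var_ids;

    # Default: aref of indices
    return \@unique unless $params{variable_labels};

    # Labels were requested but never supplied: signal failure with undef
    return unless defined $labels;

    # href of label => index
    return { map { $labels->[$_] => $_ } @unique };
}
```

Calling variables_sketch( [ 2, 0, 2 ], [ qw( rs1 rs2 rs3 ) ], variable_labels => 1 ) yields a href with the keys rs1 and rs3.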

classify_data()

Classifies $data using this tree. Returns the terminal value (predicted phenotype) by default. If 'as_node => 1' is specified, returns a RandomJungle::Tree::Node object that represents the terminal node reached during classification. If 'skip_validation => 1' is specified, the data validation step is skipped. This improves performance, but if invalid data is present the classification will fail and undef will be returned, so use skip_validation with caution.

        my $predicted_pheno = $tree->classify_data( $data );
        my $node_obj = $tree->classify_data( $data, as_node => 1 );
        my $predicted_pheno = $tree->classify_data( $data, skip_validation => 1 );

$data must be an arrayref containing the data values to be classified. The order of the columns must be the same as that which was used to construct the tree (see RAW file). Note: $data must not include header values (for FID, IID, PAT, and MAT).

$data can be obtained from RandomJungle::Jungle->get_sample_data_by_label().

Sets err_str and returns undef if an error occurs (e.g., $data contains a value that is not 0, 1, or 2).
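
To make the validation and traversal steps concrete, here is a minimal standalone sketch of threshold-based classification. It is not the module's implementation. The toy arrays, the sub name classify_sketch, the convention that a branches entry of '0,0' marks a terminal node, and the rule that a genotype less than or equal to the node's value goes left are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical toy tree stored as three parallel arrays, indexed by vector index.
my @var_ids  = ( 2, 0, 0 );                 # column of $data tested at each node
my @values   = ( 1, 1, 2 );                 # threshold, or phenotype at a terminal node
my @branches = ( '1,2', '0,0', '0,0' );     # 'left,right'; '0,0' assumed terminal

sub classify_sketch {
    my ( $data ) = @_;

    # Validation: every genotype must be 0, 1, or 2
    for my $value ( @$data ) {
        return unless defined $value && $value =~ /\A[012]\z/;
    }

    my $vi = 0;    # start at the root node
    while ( 1 ) {
        my ( $left, $right ) = split /,/, $branches[$vi];

        # Terminal node: its value is the predicted phenotype
        return $values[$vi] if $left == 0 && $right == 0;

        # Assumed split rule: genotype <= threshold goes left, otherwise right
        $vi = $data->[ $var_ids[$vi] ] <= $values[$vi] ? $left : $right;
    }
}
```

In this sketch, skipping the validation loop would save one pass over $data, at the cost of undefined behavior during traversal if a value is out of range.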

get_node_by_vector_index()

Returns a RandomJungle::Tree::Node object for a given vector index (from the varID/values/branches arrays in the XML file).

        my $node_obj = $tree->get_node_by_vector_index( $vi );

Sets err_str and returns undef on error (invalid index).

get_root_node()

Returns the root node in the tree (vector index 0). The vector index is returned by default. If called with 'as_node => 1', a RandomJungle::Tree::Node object is returned instead.

        my $vi = $tree->get_root_node;
        my $node_obj = $tree->get_root_node( as_node => 1 );

get_all_nodes()

Returns an aref of all nodes in the tree. Vector indices are returned by default. If called with 'as_node => 1', RandomJungle::Tree::Node objects are returned instead.

        my $aref = $tree->get_all_nodes;
        my $aref = $tree->get_all_nodes( as_node => 1 );

get_terminal_nodes()

Returns an aref of all terminal nodes in the tree. Vector indices are returned by default. If called with 'as_node => 1', RandomJungle::Tree::Node objects are returned instead.

        my $aref = $tree->get_terminal_nodes;
        my $aref = $tree->get_terminal_nodes( as_node => 1 );

get_parent_of_vector_index()

Returns the parent of the node with the specified vector index. The vector index of the parent node is returned by default. If called with 'as_node => 1', a RandomJungle::Tree::Node object is returned instead.

        my $vi_of_parent = $tree->get_parent_of_vector_index( $vi );
        my $node_obj = $tree->get_parent_of_vector_index( $vi, as_node => 1 );

Sets err_str and returns undef on error (invalid index) or if called with index 0 (index 0 is the root node, which has no parent).
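
One way to picture the parent lookup is a child-to-parent map built from the parsed branches. The sketch below is standalone and hypothetical (build_parent_map and parent_of_sketch are not RandomJungle methods), and it assumes that a branches entry of '0,0' marks a terminal node.

```perl
use strict;
use warnings;

# Hypothetical parsed branches: element $vi holds the 'left,right' child indices.
my @branches = ( '1,2', '3,4', '0,0', '0,0', '0,0' );

# Build a child => parent lookup in one pass over the branches.
sub build_parent_map {
    my ( $branches ) = @_;
    my %parent_of;
    for my $vi ( 0 .. $#$branches ) {
        my ( $left, $right ) = split /,/, $branches->[$vi];
        next if $left == 0 && $right == 0;    # assumed terminal node, no children
        $parent_of{$left}  = $vi;
        $parent_of{$right} = $vi;
    }
    return \%parent_of;
}

# Returns undef for the root (index 0) and for indices outside the tree.
sub parent_of_sketch {
    my ( $branches, $vi ) = @_;
    return if $vi !~ /\A\d+\z/ || $vi == 0 || $vi > $#$branches;
    return build_parent_map( $branches )->{$vi};
}
```

Because the root and an invalid index both produce undef here, a caller that needs to tell them apart would have to check for index 0 before calling.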

get_path_to_vector_index()

Returns an aref containing the vector indices of all nodes in the path to the specified vector index, beginning at the root of the tree and ending at the specified vector index.

        my $aref = $tree->get_path_to_vector_index( $vi );

Sets err_str and returns undef on error (invalid vector index).

get_depth_of_vector_index()

Returns the depth of the node with the specified vector index, where the root node has a depth of 1, its child nodes have a depth of 2, and so on.

        my $depth = $tree->get_depth_of_vector_index( $vi );

Sets err_str and returns undef on error (invalid vector index).
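
The relationship between a node's path and its depth can be sketched as follows. This is a standalone illustration under stated assumptions: the %parent_of map and both sub names are hypothetical, and the module may compute these values differently.

```perl
use strict;
use warnings;

# Hypothetical child => parent lookup for a five-node toy tree rooted at index 0.
my %parent_of = ( 1 => 0, 2 => 0, 3 => 1, 4 => 1 );

# Walk parent links from $vi up to the root, then reverse for root-first order.
sub path_to_sketch {
    my ( $vi ) = @_;
    return unless $vi == 0 || exists $parent_of{$vi};    # invalid index
    my @path = ( $vi );
    while ( $vi != 0 ) {
        $vi = $parent_of{$vi};
        push @path, $vi;
    }
    return [ reverse @path ];
}

# Depth is the number of nodes on the path, so the root has a depth of 1.
sub depth_of_sketch {
    my ( $vi ) = @_;
    my $path = path_to_sketch( $vi ) or return;
    return scalar @$path;
}
```

For vector index 4 in this toy tree, the path is 0, 1, 4 and the depth is 3.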

max_node_depth()

Returns a hash reference that contains the max depth of the tree and a list of all vector indices at that depth.

        my $href = $tree->max_node_depth;

$href has the following structure:

        depth          => $max_depth,
        vector_indices => $aref_of_vi,

where $aref_of_vi is an array reference that contains all vector indices at the max depth.
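
As a rough illustration of the returned structure, the sketch below derives it from a table of per-node depths. The %depth_of table and the sub name max_depth_sketch are hypothetical and exist only for this example.

```perl
use strict;
use warnings;

# Hypothetical depths keyed by vector index (the root has a depth of 1).
my %depth_of = ( 0 => 1, 1 => 2, 2 => 2, 3 => 3, 4 => 3 );

sub max_depth_sketch {
    my ( $depth_of ) = @_;

    # Find the largest depth in the table
    my $max = 0;
    for my $depth ( values %$depth_of ) {
        $max = $depth if $depth > $max;
    }

    # Collect every vector index at that depth, in ascending order
    my @deepest = sort { $a <=> $b }
                  grep { $depth_of->{$_} == $max } keys %$depth_of;

    return { depth => $max, vector_indices => \@deepest };
}
```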

set_err()

Sets the error message (provided as a parameter) and creates a stack trace:

        $tree->set_err( 'Something went boom' );

err_str()

Returns the last error message that was set:

        my $msg = $tree->err_str;

err_trace()

Returns a backtrace for the last error that was encountered:

        my $trace = $tree->err_trace;
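
The three error methods follow a common Perl pattern: store a message together with a backtrace on the object, then expose both through accessors. The following is a minimal sketch of that pattern, assuming Carp::longmess is used to build the trace. The ErrSketch class is hypothetical, and the module's own implementation may differ.

```perl
package ErrSketch;    # hypothetical class, not part of RandomJungle
use strict;
use warnings;
use Carp ();

sub new {
    my ( $class ) = @_;
    my $self = { err_str => undef, err_trace => undef };
    return bless $self, $class;
}

# Record the message and capture a backtrace at the point of failure
sub set_err {
    my ( $self, $msg ) = @_;
    $self->{err_str}   = $msg;
    $self->{err_trace} = Carp::longmess( $msg );
    return;
}

sub err_str   { my ( $self ) = @_; return $self->{err_str} }
sub err_trace { my ( $self ) = @_; return $self->{err_trace} }

package main;
```

With this pattern a method can report failure by calling set_err and returning undef, which is what allows call sites to be written as $obj->method || warn $obj->err_str.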

INTERNAL METHODS

_parse_var_id_string()

Parses the 'varID' string from the XML file and returns an array of variable indices.

        my @var_ids = $tree->_parse_var_id_string();

Note: @var_ids are indices (column numbers) of the variables within the RAW file, not variable labels.

The varID string is a required parameter of new().

_parse_branches_string()

Parses the 'branches' string from the XML file and returns an array of branch elements. Each element is a string of the format 'left,right', where left and right are the vector indices of the current node's child nodes.

        my @branches = $tree->_parse_branches_string();

The branches string is a required parameter of new().

_parse_values_string()

Parses the 'values' string from the XML file and returns an array of values which are used as thresholds for classifying genotype data.

        my @values = $tree->_parse_values_string();

The values string is a required parameter of new().

FUTURE IDEAS

$retval = create cytoscape file ( $out_filename )

Add caching for node depth and path to node (if used a lot)

SEE ALSO

RandomJungle::Jungle, RandomJungle::Tree, RandomJungle::Tree::Node, RandomJungle::File::XML, RandomJungle::OOB, RandomJungle::RAW, RandomJungle::DB, RandomJungle::Classification_DB

AUTHOR

Robert R. Freimuth

COPYRIGHT

Copyright (c) 2011 Mayo Foundation for Medical Education and Research. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.