NAME

RandomJungle::File::RAW - Low level access to the data in the RandomJungle RAW input file

VERSION

Version 0.03

SYNOPSIS

RandomJungle::File::RAW provides access to the data contained within the RAW file used as input for RandomJungle. This module was developed to support files in ped format only. See RandomJungle::Jungle for higher-level methods.

        use RandomJungle::File::RAW;

        my $raw = RandomJungle::File::RAW->new( filename => $rawfile ) || die $RandomJungle::File::RAW::ERROR;
        $raw->parse || die $raw->err_str;

        my $file = $raw->get_filename; # returns the filename of the RAW file
        my $aref = $raw->get_header_labels; # FID, IID, PAT, MAT
        my $aref = $raw->get_variable_labels; # SEX, PHENOTYPE, rs... (not incl FID, IID, PAT, MAT)
        my $aref = $raw->get_sample_labels; # from the IID column in the RAW file (ordered by line in the file)
        my $href = $raw->get_sample_data; # all sample data records (convenience method for RJ::File::DB)

        # Retrieve information by sample ($iid, from get_sample_labels)
        # These methods set err_str and return undef on error ($iid not specified or invalid)
        my $val = $raw->get_phenotype_for_sample( $iid ); # $iid is from get_sample_labels()
        my $aref = $raw->get_data_for_sample( $iid ); # variable data, suitable for classification (split, spliced)
        my $line = $raw->get_data_for_sample( $iid, orig => 1 ); # original line (unsplit, unspliced) from the RAW file

        my $href = $raw->get_data; # for debugging only; returns raw data structs

        # Error handling
        $raw->set_err( 'Something went boom' );
        my $msg = $raw->err_str;
        my $trace = $raw->err_trace;

METHODS

new()

Creates and returns a new RandomJungle::File::RAW object:

        my $raw = RandomJungle::File::RAW->new( filename => $rawfile );

The 'filename' parameter is required. On failure, sets $RandomJungle::File::RAW::ERROR and returns undef.

parse()

Parses the RAW file specified in new():

        my $retval = $raw->parse;

Returns a true value on success. Sets err_str and returns undef on failure.

get_filename()

Returns the name of the RAW file specified in new():

        my $file = $raw->get_filename;

get_variable_labels()

Returns an array ref containing the labels for the variables in the input file. The first four columns of a ped-formatted file (FID, IID, PAT, MAT) are not considered variables and are therefore excluded from the results. The array will likely contain SEX and PHENOTYPE, followed by a list of SNP IDs.

        my $aref = $raw->get_variable_labels; # SEX, PHENOTYPE, rs...

get_header_labels()

Returns an array ref containing the header labels from the input file, corresponding to the first four columns of a ped-formatted file (FID, IID, PAT, MAT):

        my $aref = $raw->get_header_labels; # FID, IID, PAT, MAT
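
The division between header labels and variable labels can be illustrated by splitting a header line by hand. This is only a sketch: it assumes a whitespace-delimited header line, the SNP IDs are invented, and it is not the module's own parsing code.

```perl
use strict;
use warnings;

# Hypothetical header line from a ped-format RAW file (SNP IDs are made up).
my $header = "FID IID PAT MAT SEX PHENOTYPE rs0001 rs0002 rs0003";

# Assumption: columns are separated by whitespace.
my @columns = split ' ', $header;

# The first four columns are header labels; everything after them is a variable.
my @header_labels   = splice( @columns, 0, 4 );
my @variable_labels = @columns;

print "header labels:   @header_labels\n";
print "variable labels: @variable_labels\n";
```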

get_sample_labels()

Returns an array ref containing a list of sample labels, ordered according to line number in the input file. The labels are taken from the IID column.

        my $aref = $raw->get_sample_labels;

get_phenotype_for_sample()

Returns the phenotype value for a given sample (specified using the sample label from the IID column):

        my $val = $raw->get_phenotype_for_sample( $iid ); # see get_sample_labels()

Sets err_str and returns undef on error ($iid not specified or invalid).

get_sample_data()

Returns a hash ref containing data for each sample in the input file. This is a convenience method for RandomJungle::File::DB and probably should not be called directly outside of that module, as the interface and return structure are not guaranteed to be stable.

        my $href = $raw->get_sample_data; # convenience method for RandomJungle::File::DB

get_data_for_sample()

This method retrieves data for a given sample, specified using the sample label from the IID column. If called with only a single parameter (the sample label), an array ref will be returned that contains the sample's variable data, suitable for classification by a RandomJungle::Tree object:

        my $aref = $raw->get_data_for_sample( $iid ); # variable data, suitable for classification

If called with 'orig => 1', the original line from the input file (unsplit, unspliced) will be returned:

        my $line = $raw->get_data_for_sample( $iid, orig => 1 ); # original line from the RAW file

Sets err_str and returns undef on error ($iid not specified or invalid).
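
The difference between the two return forms ("split, spliced" versus the original line) can be sketched in plain Perl. The sample line below is invented, whitespace delimiters are assumed, and the code illustrates the idea rather than reproducing the module's internals.

```perl
use strict;
use warnings;

# Hypothetical sample line from a RAW file; all values are invented.
my $orig_line = "FAM1 S001 0 0 1 2 0 1 2";

# "Split": break the line into columns (assumes whitespace delimiters).
my @fields = split ' ', $orig_line;

# "Spliced": remove the four header columns (FID, IID, PAT, MAT).
my ( $fid, $iid, $pat, $mat ) = splice( @fields, 0, 4 );

# What remains is the variable data: SEX, PHENOTYPE, then genotypes.
my $variable_data = \@fields;

print "sample:   $iid\n";
print "original: $orig_line\n";
print "data:     @$variable_data\n";
```

In this sketch, $variable_data plays the role of the array ref returned by the single-parameter call, and $orig_line plays the role of the value returned with 'orig => 1'.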

get_data()

Returns the data structures contained in $self:

        my $href = $raw->get_data;

This method is for debugging only and should not be used in production code.

set_err()

Sets the error message (provided as a parameter) and creates a stack trace:

        $raw->set_err( 'Something went boom' );

err_str()

Returns the last error message that was set:

        my $msg = $raw->err_str;

err_trace()

Returns a backtrace for the last error that was encountered:

        my $trace = $raw->err_trace;
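
For readers unfamiliar with this style of error handling, the following sketch shows one way a set_err / err_str / err_trace trio could be built. The class My::ErrDemo is hypothetical, and the use of Carp::longmess to capture the trace is an assumption made for illustration; the module may create its stack trace differently.

```perl
use strict;
use warnings;

package My::ErrDemo;    # hypothetical class, not part of RandomJungle

use Carp ();

sub new { return bless { err_str => '', err_trace => '' }, shift }

# Store the message together with a backtrace captured at the time of the call.
sub set_err
{
    my ( $self, $msg ) = @_;
    $self->{err_str}   = $msg;
    $self->{err_trace} = Carp::longmess( $msg );    # assumption: Carp-based trace
    return;
}

sub err_str   { return $_[0]->{err_str} }
sub err_trace { return $_[0]->{err_trace} }

package main;

my $obj = My::ErrDemo->new;
$obj->set_err( 'Something went boom' );

print $obj->err_str, "\n";
print $obj->err_trace;
```

Keeping the message and the trace separate lets callers show a short message to users while still logging the full trace for debugging.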

SEE ALSO

RandomJungle::Jungle, RandomJungle::Tree, RandomJungle::Tree::Node, RandomJungle::XML, RandomJungle::OOB, RandomJungle::RAW, RandomJungle::DB, RandomJungle::Classification_DB

AUTHOR

Robert R. Freimuth

COPYRIGHT

Copyright (c) 2011 Mayo Foundation for Medical Education and Research. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.