NAME

Bio::ToolBox::db_helper::useq

DESCRIPTION

This module supports the use of useq files in the Bio::ToolBox distribution. Useq files are zip archives representing either intervals or scores, and may be used similarly to either bigWig or bigBed files. More information about useq files may be found at http://useq.sourceforge.net/useqArchiveFormat.html. Useq files use the extension .useq.

Scores from useq files may be collected using this module, either as a single score from an interval or as a hash of scores associated with positions across an interval.

Scores may be restricted to strand by specifying the desired strandedness. For example, to collect transcription data over a gene, pass the strandedness value 'sense'. If the strand of the region database object (representing the gene) matches the strand of the useq feature, then the data for that useq feature is collected.
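
The strand test described above can be sketched in plain Perl. This is only an illustration of the idea, not the module's own code: strand_matches() is a hypothetical helper, and the handling of unstranded (0) regions or features is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ToolBox: decide whether a feature
# should be collected for a region under a given strandedness.
# Strands use the BioPerl convention: -1, 0, or 1.
sub strand_matches {
    my ( $region_strand, $feature_strand, $strandedness ) = @_;
    return 1 if $strandedness eq 'all';

    # Assumption: an unstranded region or feature (0) is never filtered out.
    return 1 if $region_strand == 0 or $feature_strand == 0;

    if ( $strandedness eq 'sense' ) {
        return $region_strand == $feature_strand ? 1 : 0;
    }
    if ( $strandedness eq 'antisense' ) {
        return $region_strand != $feature_strand ? 1 : 0;
    }
    die "unknown strandedness '$strandedness'\n";
}

# A reverse-strand gene tested against features on either strand.
for my $feature_strand ( 1, -1 ) {
    printf "feature %2d: sense=%d antisense=%d all=%d\n",
        $feature_strand,
        strand_matches( -1, $feature_strand, 'sense' ),
        strand_matches( -1, $feature_strand, 'antisense' ),
        strand_matches( -1, $feature_strand, 'all' );
}
```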

USAGE

The module requires the Bio::DB::USeq package to be installed.

Load the module at the beginning of your program.

        use Bio::ToolBox::db_helper::useq;

It will automatically export the names of its subroutines.

collect_useq_scores()

This subroutine will collect only the data values from a binary useq file for the specified database region. The positional information of the scores is not retained, and the values are best further processed through some statistical method (mean, median, etc.).

The subroutine is passed seven or more arguments in the following order:

1. The chromosome or seq_id
2. The start position of the segment to collect
3. The stop or end position of the segment to collect
4. The strand of the segment to collect

Strand values should be in BioPerl standard values, i.e. -1, 0, or 1.

5. The strandedness of the data to collect

A scalar value representing the desired strandedness of the data to be collected. Acceptable values include "sense", "antisense", or "all". Only those scores which match the indicated strandedness are collected.

6. The value type of data to collect

Acceptable values include score, count, pcount, and length.

   score returns the feature scores.

   count returns the number of features that overlap the
   search region.

   pcount, or precise count, returns the count of features
   that fall entirely within the region.

   length returns the lengths of all overlapping features.

7. Paths to one or more USeq files
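
The difference between the value types, in particular count versus pcount, can be illustrated with a small plain-Perl sketch. The features and coordinates below are made up, the inclusive-coordinate length calculation is an assumption, and the way the module itself represents these values may differ.

```perl
use strict;
use warnings;

# Hypothetical features as [start, end, score]; coordinates are inclusive.
my @features = ( [ 90, 110, 2.5 ], [ 120, 150, 1.0 ], [ 180, 220, 4.0 ] );
my ( $start, $stop ) = ( 100, 200 );

# Features that overlap the region at all (basis for score, count, length).
my @overlapping = grep { $_->[0] <= $stop and $_->[1] >= $start } @features;

# Features that fall entirely within the region (basis for pcount).
my @contained = grep { $_->[0] >= $start and $_->[1] <= $stop } @features;

my %values = (
    score  => [ map { $_->[2] } @overlapping ],
    count  => scalar @overlapping,
    pcount => scalar @contained,
    length => [ map { $_->[1] - $_->[0] + 1 } @overlapping ],
);

printf "count=%d pcount=%d\n", $values{count}, $values{pcount};
```

Here all three features overlap the region, but only the middle one lies wholly inside it, so count and pcount differ.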

The subroutine returns an array of the defined dataset values found within the region of interest.
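
A call might look like the following sketch, which assumes the positional argument order listed above. The file name example.useq and the coordinates are placeholders, and mean() is a local helper rather than part of the module. The helper is loaded at run time only so that the script still runs where Bio::ToolBox or the example file is unavailable; a real program would simply use the module as shown earlier.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Placeholder inputs: chromosome, start, stop, strand, strandedness,
# value type, and the path to a useq file.
my $file = 'example.useq';
my @args = ( 'chr1', 1000, 2000, 1, 'sense', 'score', $file );

# Mean of a list of values, or undef when no scores were found.
sub mean {
    my @values = @_;
    return @values ? sum(@values) / @values : undef;
}

# Load the helper at run time so the script degrades gracefully.
if ( -e $file and eval { require Bio::ToolBox::db_helper::useq; 1 } ) {
    my @scores = Bio::ToolBox::db_helper::useq::collect_useq_scores(@args);
    my $mean   = mean(@scores);
    printf "%d scores, mean %s\n", scalar(@scores),
        defined $mean ? $mean : 'NA';
}
else {
    print "skipping: $file or Bio::ToolBox::db_helper::useq not available\n";
}
```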

collect_useq_position_scores()

This subroutine will collect the score values from a binary useq file for the specified database region keyed by position.

The subroutine is passed the same arguments as collect_useq_scores().

The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. The feature midpoint is used as the key position. When multiple features are found at the same position, a simple mean (for score or length data methods) or sum (for count methods) is returned.
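
The midpoint keying and the combining of values at shared positions can be sketched in plain Perl as follows. This illustrates the behaviour described above and is not the module's code: position_values() is a hypothetical function, the features are made up, and rounding the midpoint down to an integer is an assumption.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical features as [start, end, score].
my @features = ( [ 100, 120, 2.0 ], [ 105, 115, 4.0 ], [ 200, 210, 1.0 ] );

# Group values by feature midpoint, then combine values sharing a position.
sub position_values {
    my ( $method, @feats ) = @_;
    my %by_pos;
    for my $f (@feats) {
        my ( $start, $end, $score ) = @$f;

        # Assumption: the midpoint is rounded down to an integer position.
        my $mid = int( ( $start + $end ) / 2 );
        my $value =
            $method eq 'score'  ? $score
          : $method eq 'length' ? $end - $start + 1
          :                       1;    # count methods
        push @{ $by_pos{$mid} }, $value;
    }

    # Sum for count methods, simple mean for score or length.
    my %result;
    for my $pos ( keys %by_pos ) {
        my @v = @{ $by_pos{$pos} };
        $result{$pos} = $method =~ /count/ ? sum(@v) : sum(@v) / @v;
    }
    return %result;
}

my %scores = position_values( 'score', @features );
my %counts = position_values( 'count', @features );
printf "%d\t%s\t%d\n", $_, $scores{$_}, $counts{$_}
    for sort { $a <=> $b } keys %scores;
```

The first two features share the midpoint 110, so their scores are averaged while their counts are summed.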

open_useq_db()

This subroutine will open a useq database connection. Pass the local path to a useq file (.useq extension). It will return the opened Bio::DB::USeq database object.

AUTHOR

 Timothy J. Parnell, PhD
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.