
NAME

Bio::ToolBox::db_helper::seqfasta

DESCRIPTION

This module supports opening Bio::DB::SeqFeature::Store and Bio::DB::Fasta BioPerl database adaptors. It also supports collecting feature scores from Bio::DB::SeqFeature::Store databases. Unsupported BioPerl-style database adaptors that support generic methods may also be used, although success may vary.

Opening databases

For Fasta databases, either a single fasta file or a directory of fasta files may be provided.

For SeqFeature Store databases, the connection parameters are stored in a configuration file, .biotoolbox.cfg. Multiple database containers are supported, including MySQL, SQLite, and in-memory.

Collecting scores

Scores from seqfeature objects stored in the database may be retrieved. The scores may be collected as is, or they may be associated with genomic positions (indexed scores). Scores may be restricted by strand by specifying the desired strandedness. For example, to collect transcription data over a gene, pass the strandedness value 'sense'. The data is then collected only if the strand of the region (the database object representing the gene) matches the strand of the data feature.
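
The strand test described above can be pictured with a small sketch. The helper name strand_matches is hypothetical and is not part of this module. The sketch also assumes that 'antisense' means opposite strands and that unstranded regions or features (strand 0) always pass, which the module itself may handle differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::ToolBox: decide whether a data
# feature should be kept for a region, given the requested strandedness.
# Strands are -1, 0, or 1, following the BioPerl convention.
sub strand_matches {
    my ($region_strand, $feature_strand, $strandedness) = @_;

    # 'none' or 'no' disables strand filtering entirely
    return 1 if $strandedness eq 'none' or $strandedness eq 'no';

    # Assumption: an unstranded region or feature is never excluded
    return 1 if $region_strand == 0 or $feature_strand == 0;

    # 'sense' keeps same-strand data, 'antisense' keeps opposite-strand data
    return $region_strand == $feature_strand ? 1 : 0
        if $strandedness eq 'sense';
    return $region_strand != $feature_strand ? 1 : 0
        if $strandedness eq 'antisense';

    die "unknown strandedness '$strandedness'\n";
}

# A forward-strand gene with forward-strand data, requesting 'sense'
print strand_matches(1, 1, 'sense') ? "collect\n" : "skip\n";
```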

Legacy wig file support uses GFF SeqFeature databases to store the file paths of the binary wiggle (.wib) files. If the seqfeature objects returned from the database include the wigfile attribute, then these objects are forwarded on to the Bio::ToolBox::db_helper::wiggle adaptor for appropriate score collection.

USAGE

The module requires the BioPerl adaptors Bio::DB::SeqFeature::Store and Bio::DB::Fasta.

Load the module at the beginning of your program.

        use Bio::ToolBox::db_helper::seqfasta;

It automatically exports the names of its subroutines.

collect_store_scores

This subroutine will collect only the score values from database features for the specified database region. The positional information of the scores is not retained, and the values may be further processed through some statistical method (mean, median, etc.).

The subroutine is passed eight or more arguments in the following order:

    1) The opened database object. A database name or file path is 
       not accepted.
    2) The chromosome name
    3) The start position of the segment to collect from
    4) The stop or end position of the segment to collect from
    5) The strand of the original feature (or region), -1, 0, or 1.
    6) A scalar value representing the desired strandedness of the data 
       to be collected. Only those scores which match the indicated 
       strandedness are collected. Acceptable values include 
        "sense", 
        "antisense", 
        "none" or "no".
    7) The type of data collected. Acceptable values include 
       'score' (returns the score), 
       'count' (the number of defined positions with scores), or 
       'length' (the wig step is used here).  
    8) One or more feature types or primary_tags to perform the 
       database search. If nothing is provided, then usually everything 
       in the database is returned!

The subroutine returns an array of the defined dataset values found within the region of interest.
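
As a rough illustration of the further processing mentioned above, the following sketch summarizes a list of scores with a mean and a median. The call shown in the comment follows the documented argument order, but the chromosome name 'chrI', the coordinates, and the feature type 'microarray_oligo' are made-up placeholders, and a literal list stands in for the returned values so the example runs without a database.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# In a real program the values would come from the module, for example:
#   my @scores = collect_store_scores(
#       $db, 'chrI', 1000, 2000, 1, 'sense', 'score', 'microarray_oligo');
# where $db is an already opened database object.
# A literal list stands in for that return value here.
my @scores = (2.5, 4.0, 1.5, 6.0);

# Arithmetic mean; returns nothing for an empty list
sub mean {
    my @v = @_;
    return unless @v;
    return sum(@v) / scalar(@v);
}

# Median; averages the two middle values for an even count
sub median {
    my @v = sort { $a <=> $b } @_;
    return unless @v;
    my $mid = int(scalar(@v) / 2);
    return scalar(@v) % 2 ? $v[$mid] : ($v[$mid - 1] + $v[$mid]) / 2;
}

printf "mean %.2f, median %.2f\n", mean(@scores), median(@scores);
```

Since the subroutine returns only defined values, an empty list is a possible result for a region without data, so the caller should check for that case before computing a statistic.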

collect_store_position_scores

This subroutine will collect the score values from features in the database for the specified region, keyed by position.

The subroutine is passed the same arguments as collect_store_scores().

The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. Note that only one value is returned per position, regardless of the number of dataset features passed.
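
Because a Perl hash has no inherent order, the positions usually need to be sorted before use. The sketch below shows one way to walk such a hash in genomic order. The hash name and its contents are invented stand-ins for the returned data, not output from a real database.

```perl
use strict;
use warnings;

# Literal stand-in for the returned hash of position => score
my %pos2score = (1010 => 1.2, 1000 => 0.8, 1020 => 2.4);

# Sort the positions numerically and pair each with its score
my @ordered = map { [ $_, $pos2score{$_} ] }
              sort { $a <=> $b } keys %pos2score;

# Print one tab-delimited position and score per line
for my $pair (@ordered) {
    printf "%d\t%.1f\n", @$pair;
}
```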

AUTHOR

 Timothy J. Parnell, PhD
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.