NAME

Bio::ToolBox::db_helper::bigwig

DESCRIPTION

This module supports the use of bigWig files in the biotoolbox scripts. It collects dataset scores from a binary bigWig file (.bw), or from a directory of bigWig files, known as a BigWigSet. BigWig files may be local or remote.

In either case, the file is read using the Bio::DB::BigWig module, and the values are extracted from the region of interest.

Stranded data collection is not supported with individual bigWig files. However, because a BigWigSet database supports metadata attributes for each included bigWig, it can be used to collect stranded data. To do so, the metadata for each bigWig must include the strand attribute.

For loading bigWig files into a Bio::DB database, see the biotoolbox Perl script 'big_file2gff3.pl'. This will prepare either a GFF3 file for loading into a Bio::DB::SeqFeature::Store database, or a Bio::DB::BigWigSet database.

When a single score is requested for a region, a low-level statistical summary method is used to reduce data collection time significantly. A ten-fold or greater improvement over simple point-by-point collection has been observed, depending on the size of the region requested.

USAGE

The module requires Lincoln Stein's Bio::DB::BigWig to be installed.

Load the module at the beginning of your program.

        use Bio::ToolBox::db_helper::bigwig;

It will automatically export the names of the subroutines described below.

collect_bigwig_score

This subroutine will collect a single value from a binary bigWig file. It uses the low-level summary method to collect the statistical information and is therefore significantly faster than the other methods, which rely upon parsing individual data points across the region.

The subroutine is passed five or more arguments in the following order:

    1) The chromosome or seq_id
    2) The start position of the segment to collect 
    3) The stop or end position of the segment to collect 
    4) The method of collecting the data. Acceptable values include 
       mean, min, max, sum, count, and stddev. 
    5) One or more paths to bigWig files from which to collect the data

The subroutine returns a single score for the region. When nothing is found, it returns 0 for the sum and count methods, or a null '.' value for the other methods.
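The argument order above can be sketched as follows. This is a minimal illustration, not a definitive usage: the region coordinates and the file path are placeholders, and the call assumes the positional interface documented in this section (other releases of the module may differ). The module is loaded at run time so that the script degrades gracefully when Bio::DB::BigWig is not installed.

```perl
use strict;
use warnings;

# Placeholder region and file; substitute real values.
my ( $chromo, $start, $stop ) = ( 'chr1', 1001, 2000 );
my $method = 'mean';
my @files  = ('/path/to/example.bw');    # hypothetical local path
my $status = 'skipped';

# Load the helper at run time; the eval traps a missing module
# or a missing Bio::DB::BigWig dependency.
my $loaded = eval { require Bio::ToolBox::db_helper::bigwig; 1 };

if ( $loaded and -e $files[0] ) {
    # Documented order: seq_id, start, stop, method, file path(s)
    my $score = Bio::ToolBox::db_helper::bigwig::collect_bigwig_score(
        $chromo, $start, $stop, $method, @files );
    print "$method over $chromo:$start-$stop = $score\n";
    $status = 'collected';
}
else {
    print "skipped: module or bigWig file not available\n";
}
```

The subroutine is called by its fully qualified name here only to keep the sketch independent of the export list; after a plain `use` statement the short name should work as well.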

collect_bigwigset_score

Similar to collect_bigwig_score() but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.

The subroutine is passed eight or more arguments in the following order:

    1) The opened BigWigSet database object
    2) The chromosome or seq_id
    3) The start position of the segment to collect from
    4) The stop or end position of the segment to collect from
    5) The strand of the segment to collect from
    6) A scalar value representing the desired strandedness of the data 
       to be collected. Acceptable values include "sense", "antisense", 
       or "all". Only those scores which match the indicated 
       strandedness are collected.
    7) The method of collecting the data. Acceptable values include 
       mean, min, max, sum, count, and stddev. 
    8) One or more database feature types for the data 

collect_bigwig_scores

This subroutine will collect only the score values from a binary BigWig file for the specified database region. The positional information of the scores is not retained, and the values are best further processed through some statistical method (mean, median, etc.).

The subroutine is passed four or more arguments in the following order:

    1) The chromosome or seq_id
    2) The start position of the segment to collect 
    3) The stop or end position of the segment to collect 
    4) One or more paths to bigWig files from which to collect the data

The subroutine returns an array of the defined dataset values found within the region of interest.
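Because the positions are discarded, the returned array is typically reduced to a summary statistic. The sketch below shows one way to do that in plain Perl. The helper names `mean_score` and `median_score` are hypothetical and not part of this module, and the score values are made up to stand in for the array that collect_bigwig_scores() would return.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of a list of scores; returns nothing for an empty list.
sub mean_score {
    my @scores = @_;
    return unless @scores;
    return sum(@scores) / scalar(@scores);
}

# Median of a list of scores; returns nothing for an empty list.
sub median_score {
    my @sorted = sort { $a <=> $b } @_;
    return unless @sorted;
    my $mid = int( scalar(@sorted) / 2 );
    return scalar(@sorted) % 2
        ? $sorted[$mid]
        : ( $sorted[ $mid - 1 ] + $sorted[$mid] ) / 2;
}

# Made-up values standing in for the array returned by
#   collect_bigwig_scores($chromo, $start, $stop, @files)
my @scores = ( 1.5, 0.5, 4, 2 );

my $mean   = mean_score(@scores);
my $median = median_score(@scores);
printf "mean = %.2f, median = %.2f\n", $mean, $median;
```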

collect_bigwigset_scores

Similar to collect_bigwig_scores() but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.

The subroutine is passed seven or more arguments in the following order:

    1) The opened BigWigSet database object
    2) The chromosome or seq_id
    3) The start position of the segment to collect from
    4) The stop or end position of the segment to collect from
    5) The strand of the segment to collect from
    6) A scalar value representing the desired strandedness of the data 
       to be collected. Acceptable values include "sense", "antisense", 
       or "all". Only those scores which match the indicated 
       strandedness are collected.
    7) One or more database feature types for the data 

collect_bigwig_position_scores

This subroutine will collect the score values from a binary BigWig file for the specified database region keyed by position.

The subroutine is passed four or more arguments in the following order:

    1) The chromosome or seq_id
    2) The start position of the segment to collect 
    3) The stop or end position of the segment to collect 
    4) One or more paths to bigWig files from which to collect the data

The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. Note that only one value is returned per position, regardless of the number of dataset features passed. Usually this isn't a problem as only one dataset is examined at a time.
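A position-keyed hash is convenient for building a profile across the region. The following sketch converts absolute positions into offsets relative to the region start and walks them in order. The data are invented to stand in for the hash returned by collect_bigwig_position_scores(), and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Made-up data standing in for the hash returned by
#   collect_bigwig_position_scores($chromo, $start, $stop, @files)
my %pos2score = ( 1010 => 2.5, 1001 => 0.7, 1005 => 3.1 );

# Region start used in the (hypothetical) request
my $start = 1001;

# Build an ordered profile of [offset, score] pairs.
# Hash keys are unordered, so sort numerically by position first.
my @profile;
for my $pos ( sort { $a <=> $b } keys %pos2score ) {
    push @profile, [ $pos - $start, $pos2score{$pos} ];
}

# Print one tab-delimited line per scored position
printf "%d\t%s\n", @{$_} for @profile;
```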

collect_bigwigset_position_scores

Similar to collect_bigwig_position_scores() but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.

The subroutine is passed seven or more arguments in the following order:

    1) The opened BigWigSet database object
    2) The chromosome or seq_id
    3) The start position of the segment to collect from
    4) The stop or end position of the segment to collect from
    5) The strand of the segment to collect from
    6) A scalar value representing the desired strandedness of the data 
       to be collected. Acceptable values include "sense", "antisense", 
       or "all". Only those scores which match the indicated 
       strandedness are collected.
    7) One or more database feature types for the data 

open_bigwigset_db()

This subroutine will open a BigWig database connection. Pass either the local path to a bigWig file (.bw extension) or the URL of a remote bigWig file. It will return the opened database object.

open_bigwigset_db()

This subroutine will open a BigWigSet database connection using a directory of BigWig files and one metadata index file, as described in Bio::DB::BigWigSet. Essentially, this treats a directory of BigWig files as a single database with each BigWig file representing a different feature with unique attributes (type, source, strand, etc).

Pass the subroutine a scalar value representing the local path to the directory. It presumes a feature_type of 'region', as expected by the other Bio::ToolBox::db_helper subroutines and modules. It will return the opened database object.
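Opening a BigWigSet and collecting a stranded score from it might look like the sketch below. Several details are assumptions rather than documented facts: the directory path and the feature type `example_type` are placeholders, and the strand is given as 1 in the BioPerl convention, which this section does not specify. The arguments to collect_bigwigset_score() follow the eight-item order listed earlier in this document.

```perl
use strict;
use warnings;

my $dir    = '/path/to/bigwigset';    # hypothetical directory of bigWig files
my $status = 'skipped';

# Load the helper at run time; the eval traps a missing module
# or a missing Bio::DB::BigWig dependency.
my $loaded = eval { require Bio::ToolBox::db_helper::bigwig; 1 };

if ( $loaded and -d $dir ) {
    my $db = Bio::ToolBox::db_helper::bigwig::open_bigwigset_db($dir);
    if ($db) {
        # Documented order: db, seq_id, start, stop, strand,
        # strandedness, method, feature type(s)
        my $score = Bio::ToolBox::db_helper::bigwig::collect_bigwigset_score(
            $db, 'chr1', 1001, 2000,
            1,                 # strand of the segment (assumed 1 / -1 style)
            'sense',           # collect only same-strand data
            'mean',
            'example_type',    # placeholder feature type
        );
        print "sense mean score = $score\n";
        $status = 'collected';
    }
}

print "status: $status\n";
```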

AUTHOR

 Timothy J. Parnell, PhD
 Howard Hughes Medical Institute
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.