NAME

Bio::ToolBox::db_helper::bigwig

DESCRIPTION

This module provides support for binary BigWig files to the Bio::ToolBox package. It also supports a directory of one or more bigWig files as a combined database, known as a BigWigSet.

USAGE

The module requires Bio::DB::BigWig to be installed, which in turn requires the UCSC Kent C library to be installed.

In general, this module should not be used directly. Use the methods available in Bio::ToolBox::db_helper or Bio::ToolBox::Data.

All subroutines are exported by default.

Available subroutines

collect_bigwig_score()

This subroutine will collect a single value from a binary bigWig file. It uses the low-level summary method to collect the statistical information and is therefore significantly faster than the other methods, which rely upon parsing individual data points across the region.

The subroutine is passed a parameter array reference. See below for details.

The subroutine returns either a valid score or a null value.

collect_bigwigset_score()

Similar to collect_bigwig_score() but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.

The subroutine is passed a parameter array reference. See below for details.

collect_bigwig_scores()

This subroutine will collect only the score values from a binary BigWig file for the specified database region. The positional information of the scores is not retained.

The subroutine is passed a parameter array reference. See below for details.

The subroutine returns an array or array reference of the requested dataset values found within the region of interest.
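A return of "an array or array reference" is commonly implemented in Perl with wantarray, which lets the caller's context choose the form. Whether this module does exactly that is an assumption; the sketch below uses a stand-in subroutine, example_scores(), with invented values, purely to show how the two calling styles would look.

```perl
use strict;
use warnings;

# Stand-in for a collection subroutine; the name and scores are invented.
sub example_scores {
    my @scores = (1.0, 4.0, 2.5);

    # List context gets the values, scalar context gets a reference
    return wantarray ? @scores : \@scores;
}

my @list = example_scores();    # list context: a plain array
my $ref  = example_scores();    # scalar context: an array reference

printf "%d scores as a list, %d via reference\n",
    scalar @list, scalar @$ref;
```

Taking the reference avoids copying the values, which may matter for large regions.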

collect_bigwigset_scores()

Similar to collect_bigwig_scores() but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.

The subroutine is passed a parameter array reference. See below for details.

collect_bigwig_position_scores()

This subroutine will collect the score values from a binary BigWig file for the specified database region keyed by position.

The subroutine is passed a parameter array reference. See below for details.

The subroutine returns a hash of the defined dataset values found within the region of interest keyed by position. Note that only one value is returned per position, regardless of the number of dataset features passed. Usually this isn't a problem as only one dataset is examined at a time.
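Because Perl hashes are unordered, a position-keyed result usually needs a numeric sort before it can be walked along the region. The sketch below uses a hand-made hash, %pos2score, with invented values standing in for the returned data; no BigWig file is read.

```perl
use strict;
use warnings;

# Stand-in for a returned position => score hash; these values are invented.
my %pos2score = (1050 => 2.5, 1000 => 1.0, 1025 => 4.0);

# Sort the keys numerically to visit positions in genomic order
my @positions = sort { $a <=> $b } keys %pos2score;

for my $pos (@positions) {
    printf "%d\t%.1f\n", $pos, $pos2score{$pos};
}
```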

collect_bigwigset_position_scores()

Similar to collect_bigwig_position_scores() but using a BigWigSet database of BigWig files. Unlike individual BigWig files, BigWigSet features support stranded data collection if a strand attribute is defined in the metadata file.

The subroutine is passed a parameter array reference. See below for details.

open_bigwig_db()

This subroutine will open a BigWig database connection. Pass either the local path to a bigWig file (.bw extension) or the URL of a remote bigWig file. It will return the opened database object.

open_bigwigset_db()

This subroutine will open a BigWigSet database connection using a directory of BigWig files and one metadata index file, as described in Bio::DB::BigWigSet. Essentially, this treats a directory of BigWig files as a single database with each BigWig file representing a different feature with unique attributes (type, source, strand, etc).

Pass the subroutine a scalar value representing the local path to the directory. It presumes a feature_type of 'region', as expected by the other Bio::ToolBox::db_helper subroutines and modules. It will return the opened database object.

Data Collection Parameters Reference

The data collection subroutines are passed an array reference of parameters. The recommended method for data collection is to use the get_segment_score() method from Bio::ToolBox::db_helper.

The parameters array reference includes these items:

1. The chromosome or seq_id
2. The start position of the segment to collect
3. The stop or end position of the segment to collect
4. The strand of the segment to collect

Should be standard BioPerl representation: -1, 0, or 1.

5. The strandedness of the data to collect

A scalar value representing the desired strandedness of the data to be collected. Acceptable values include "sense", "antisense", or "all". Only those scores which match the indicated strandedness are collected.

6. The method for combining scores.

Acceptable values include mean, min, max, stddev, sum, and count. Used when collecting a single value over a genomic segment. The pcount and ncount methods are technically supported, but are treated the same as count.

7. A database object.

Pass the opened Bio::DB::BigWigSet database object when working with BigWigSets.

8 and higher. BigWig file names or BigWigSet database types.

Opened BigWig objects are cached. Both local and remote BigWig files are supported.
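The positional layout listed above can be sketched in plain Perl. The helper make_score_params() below is hypothetical and not part of Bio::ToolBox; it only assembles and sanity-checks the array reference in the documented order. The file name example.bw is invented, and passing undef as the database object for a plain bigWig file is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ToolBox): builds the positional
# parameter array reference in the order documented above.
sub make_score_params {
    my (%arg) = @_;

    my %ok_strand       = map { $_ => 1 } (-1, 0, 1);
    my %ok_strandedness = map { $_ => 1 } qw(sense antisense all);
    my %ok_method       = map { $_ => 1 }
        qw(mean min max stddev sum count pcount ncount);

    die "strand must be -1, 0, or 1\n"
        unless exists $ok_strand{ $arg{strand} };
    die "strandedness must be sense, antisense, or all\n"
        unless exists $ok_strandedness{ $arg{strandedness} };
    die "unsupported method '$arg{method}'\n"
        unless exists $ok_method{ $arg{method} };

    return [
        $arg{chromo},          # 1. chromosome or seq_id
        $arg{start},           # 2. start position
        $arg{stop},            # 3. stop or end position
        $arg{strand},          # 4. strand: -1, 0, or 1
        $arg{strandedness},    # 5. sense, antisense, or all
        $arg{method},          # 6. method for combining scores
        $arg{db},              # 7. database object (BigWigSet), if any
        @{ $arg{datasets} },   # 8 and higher: file names or types
    ];
}

my $params = make_score_params(
    chromo       => 'chr1',
    start        => 1000,
    stop         => 2000,
    strand       => 1,
    strandedness => 'all',
    method       => 'mean',
    db           => undef,             # assumption: unused for a plain bigWig file
    datasets     => ['example.bw'],    # hypothetical file name
);

printf "%d parameters, first dataset: %s\n",
    scalar @$params, $params->[7];
```

The resulting $params is the kind of array reference the collection subroutines expect. In practice, the get_segment_score() method from Bio::ToolBox::db_helper remains the recommended entry point.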

AUTHOR

 Timothy J. Parnell, PhD
 Howard Hughes Medical Institute
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.