Bio::DB::Big::File - Object representing a UCSC big file
This object encapsulates the querying and logic associated with working with a big file. Both BigWig and BigBed are supported through the same object. When a routine is run on the wrong type of file, that routine will throw an exception informing you of the error.
These methods are all called on the package name.
Returns a boolean indicating whether the file at the given path (local or remote) is a BigBed.
Returns a boolean indicating whether the file at the given path (local or remote) is a BigWig.
Opens a BigWig from a given path (local or remote) and returns a Bio::DB::Big::File object.
Opens a BigBed from a given path (local or remote) and returns a Bio::DB::Big::File object.
These methods are available across all big files.
Returns 0 if the file is a BigWig and 1 if the file is a BigBed.
Returns true if the file is a BigWig
Returns true if the file is a BigBed
Returns a hash to the big file's header. Elements available are
Ensure that these are used according to the file type.
    my $chroms_hash = $bf->chroms();
    foreach my $chrom (keys %{$chroms_hash}) {
      my $h = $chroms_hash->{$chrom};
      # newline added so each chromosome prints on its own line
      printf("%s - %d\n", $h->{name}, $h->{length});
    }
Returns a hash of chromosomes keyed by the chromosome name. Each value is a hash with the keys name and length.
Returns the length of the specified chromosome.
Returns a boolean indicating whether the given chromosome was found in the big file, i.e. whether data was recorded for it.
The following methods are available only for BigWig files.
Used to calculate statistics from a BigWig file across a range specified in 0-based, half-open coordinates. Chromosome, start and end are required parameters. Bins defaults to 1 and type defaults to mean. Consult the later documentation on the available summary statistics you can request.
    my $stats = $bf->get_stats('chr1', 0, 100, 5, "max", 0);
    foreach my $v (@{$stats}) {
      printf("%f\n", $v);
    }
The full parameter is used to force libBigWig to use the true underlying values held in the BigWig file for a small speed penalty. If this is set to false (as is done by default) then the library will use the pre-computed summary statistic zoom levels to calculate your request from. More information is available from https://github.com/dpryan79/pyBigWig#a-note-on-statistics-and-zoom-levels.
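The range arguments above are 0-based and half-open, whereas many annotation formats and genome browsers report 1-based, inclusive positions. The following is a minimal sketch of the conversion; the helper name to_zero_based_half_open is hypothetical and not part of Bio::DB::Big.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Big): convert a 1-based,
# inclusive range into the 0-based, half-open form used by the range methods.
sub to_zero_based_half_open {
    my ($start, $end) = @_;
    die "start must be >= 1\n"    if $start < 1;
    die "end must be >= start\n"  if $end < $start;
    # Only the start shifts; the end is unchanged because it becomes exclusive.
    return ($start - 1, $end);
}

# Bases 1..100 of a chromosome become the range [0, 100).
my ($start, $end) = to_zero_based_half_open(1, 100);
printf("%d-%d covers %d bases\n", $start, $end, $end - $start);
```

With half-open coordinates the length of a range is simply end minus start, which is why the end value needs no adjustment.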
Calculates all available statistics and returns them in a single array reference of hashes.
    my $stats = $bf->get_all_stats('chr1', 0, 100, 5, 0);
    foreach my $v (@{$stats}) {
      printf("mean -- %f | min -- %f | max -- %f", $v->{mean}, $v->{min}, $v->{max});
      if(exists $v->{cov}) {
        printf(" | cov -- %f", $v->{cov});
      }
      print "\n";
    }
Each hash is keyed by the following elements.
If a statistic was not available, its key is absent from the hash. This lets you distinguish a missing value from one that was set explicitly, e.g. quickly telling 0 apart from no value at all.
Be aware that this code currently executes a separate stats call for each type of statistic, so this method takes roughly five times as long as a single get_stats() call for one type. This performance may change if libBigWig supports this kind of operation.
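Because a missing statistic is signalled by an absent key rather than by undef or 0, exists() is the right check. A small sketch using hand-written sample data in place of a real get_all_stats() result; the helper name format_bin and the 'NA' placeholder are illustrative choices only.

```perl
use strict;
use warnings;

# Hypothetical helper: format one statistics hash, reporting any statistic
# whose key is absent as 'NA'.
sub format_bin {
    my ($bin) = @_;
    my @parts;
    foreach my $key (qw(mean min max cov)) {
        # exists() distinguishes "no key" from a key holding 0
        my $text = exists $bin->{$key} ? sprintf('%.2f', $bin->{$key}) : 'NA';
        push @parts, "$key=$text";
    }
    return join(' | ', @parts);
}

# Hand-written sample data standing in for a get_all_stats() result
my $stats = [
    { mean => 0,   min => 0, max => 0, cov => 0 },
    { mean => 1.5, min => 1, max => 2 },            # cov not available
];
print format_bin($_), "\n" for @{$stats};
```

A plain truth test such as `if ($v->{cov})` would treat a genuine 0 as missing, which is exactly the confusion the absent-key convention is meant to avoid.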
See also: get_stats()
Used to retrieve the original values for each base across a range specified in 0-based, half-open coordinates. Chromosome, start and end are required parameters. The returned array will contain an element for each base in the given range. Those without a value in the underlying BigWig file will be returned as undefined. You must check for these values when iterating the list.
    my $values = $bf->get_values('chr1', 0, 100);
    foreach my $v (@{$values}) {
      if(! defined $v) {
        print "X\n";
      }
      else {
        printf("%f\n", $v);
      }
    }
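The same defined check matters when aggregating per-base values, since an undef silently counts as 0 in arithmetic (with a warning). A minimal sketch over a hand-written list shaped like a get_values() result; summarise_values is a hypothetical helper, not a library method.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a per-base value list, skipping positions
# that have no data (undef) instead of treating them as 0.
sub summarise_values {
    my ($values) = @_;
    my @defined = grep { defined } @{$values};
    my $sum = 0;
    $sum += $_ for @defined;
    return {
        covered => scalar(@defined),
        missing => scalar(@{$values}) - scalar(@defined),
        # mean is undef when no base carried a value
        mean    => (@defined ? $sum / @defined : undef),
    };
}

# Sample data: five bases, two of them without a value
my $summary = summarise_values([1, undef, 3, 0, undef]);
printf("%d covered, %d missing, mean %.2f\n",
    $summary->{covered}, $summary->{missing}, $summary->{mean});
```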
Used to retrieve the intervals that overlap a range specified in 0-based, half-open coordinates. Chromosome, start and end are required parameters. The returned array will contain a hash for each interval with the keys start (0-based), end (half-open) and value (a double).
    my $intervals = $bf->get_intervals('chr1', 0, 100);
    foreach my $i (@{$intervals}) {
      printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value});
    }
An iterator version of the get_intervals() code allowing you to walk through an entire BigWig file of data without loading all of it into memory.
See also: get_intervals()
    my $iter = $bf->get_intervals_iterator('chr1', 0, 100);
    while(my $intervals = $iter->next()) {
      foreach my $i (@{$intervals}) {
        printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value});
      }
    }
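The batch-at-a-time loop above suits running aggregates, since only one batch is held in memory at once. The sketch below totals the bases covered by intervals. MockIterator is a hypothetical stand-in so the code runs without a BigWig file; it only assumes, as the loop above implies, that next() returns an array reference per batch and a false value when exhausted.

```perl
use strict;
use warnings;

# Sum the number of bases covered by intervals, one batch at a time.
# $iter is any object whose next() returns an array reference of interval
# hashes, or a false value once exhausted.
sub covered_bases {
    my ($iter) = @_;
    my $total = 0;
    while (my $batch = $iter->next()) {
        # half-open coordinates: interval length is simply end - start
        $total += $_->{end} - $_->{start} for @{$batch};
    }
    return $total;
}

# Hypothetical stand-in iterator, not part of Bio::DB::Big
package MockIterator {
    sub new  { my ($class, @batches) = @_; return bless { batches => [@batches] }, $class; }
    sub next { my ($self) = @_; return shift @{ $self->{batches} }; }
}

my $iter = MockIterator->new(
    [ { start => 0,  end => 10, value => 1.0 }, { start => 10, end => 25, value => 2.0 } ],
    [ { start => 40, end => 50, value => 0.5 } ],
);
printf("%d bases covered\n", covered_bases($iter));
```

With a real file, the object returned by get_intervals_iterator() would be passed to covered_bases() in place of the mock.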
Used to retrieve the intervals that overlap a range specified in 0-based, half-open coordinates. Chromosome, start and end are required parameters. The $use_string parameter controls whether the call returns just the bounds of each bed record or also returns the tab-separated Bed line along with the element. If you are using bed for most things apart from overlap calls then you want to set this to true.
The returned array will contain a hash for each entry with the keys start (0 based), end (half-open) and string (a string). If strings were not requested then the string key will be absent from the hash.
    my $entries = $bf->get_entries('chr1', 0, 100, 1);
    foreach my $e (@{$entries}) {
      printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
    }
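Since the string value is tab-separated, it can be split into fields, and because the key is absent when strings were not requested, it is worth guarding with exists(). A minimal sketch over hand-written sample entries; entry_fields is a hypothetical helper, and the column contents shown are illustrative rather than a statement of which Bed columns the library places in the string.

```perl
use strict;
use warnings;

# Hypothetical helper: split the tab-separated string of one entry into its
# fields. Returns an empty list reference when no string was requested.
sub entry_fields {
    my ($entry) = @_;
    return [] unless exists $entry->{string};
    # the -1 limit keeps trailing empty fields
    return [ split /\t/, $entry->{string}, -1 ];
}

# Hand-written sample entries; the column contents are illustrative only
my $entries = [
    { start => 0,  end => 50, string => "geneA\t960\t+" },
    { start => 60, end => 90 },    # strings were not requested
];
foreach my $e (@{$entries}) {
    my $fields = entry_fields($e);
    printf("%d - %d: %d field(s)\n", $e->{start}, $e->{end}, scalar @{$fields});
}
```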
Iterator version of get_entries() allowing you to walk through an entire BigBed file of entries without loading all of it into memory.
See also: get_entries()
    my $iter = $bf->get_entries_iterator('chr1', 0, 100, 1);
    while(my $entries = $iter->next()) {
      foreach my $e (@{$entries}) {
        printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
      }
    }
Returns the AutoSQL held alongside a BigBed file. Will return undef if no AutoSQL was used in the file.
Returns a Bio::DB::Big::AutoSQL object representing the retrieved AutoSQL string. Will return undef if there is no AutoSQL associated with the big file. Can throw an exception if the AutoSQL string does not parse correctly.
The following strings can be used when calculating statistics over a BigWig file.
Copyright [2015-2017] EMBL-European Bioinformatics Institute
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
To install Bio::DB::Big, copy and paste the appropriate command into your terminal.
cpanm:
    cpanm Bio::DB::Big
CPAN shell:
    perl -MCPAN -e shell
    install Bio::DB::Big