Bio::Search::BlastUtils - Utility functions for Bio::Search BLAST objects
# This module is just a collection of subroutines, not an object.
The BlastUtils.pm module is a collection of subroutines used primarily by Bio::Search::Hit::BlastHit objects for some of their additional functionality, such as HSP tiling. Currently, BlastUtils is just a collection of subroutines, not an object, and it is tightly coupled to Bio::Search::Hit::BlastHit. A goal for the future is to generalize it to work through the Bio::Search interfaces, so that it can work with any objects that implement them.
Steve Chervitz <firstname.lastname@example.org>
Usage     : tile_hsps( $sbjct );
          : This is called automatically by Bio::Search::Hit::BlastHit
          : during object construction or
          : as needed by methods that rely on having tiled data.
Purpose   : Collect statistics about the aligned sequences in a set of HSPs.
          : Calculates the following data across all HSPs:
          :    -- total alignment length
          :    -- total identical residues
          :    -- total conserved residues
Returns   : n/a
Argument  : A Bio::Search::Hit::BlastHit object
Throws    : n/a
Comments  :
          : This method is *strongly* coupled to Bio::Search::Hit::BlastHit
          : (it accesses BlastHit data members directly).
          : TODO: Re-write this to the Bio::Search::Hit::HitI interface.
          :
          : This method sums data across all HSPs in the Sbjct object
          : more carefully than simple addition. Only HSPs that are in the
          : same strand and frame are tiled. Simply summing the data from
          : all HSPs in the same strand and frame will overestimate the
          : actual length of the alignment if different HSPs overlap
          : (which is often the case).
          :
          : The strategy is to tile the HSPs and sum over the
          : contigs, collecting data separately from overlapping and
          : non-overlapping regions of each HSP. To facilitate this, the
          : HSP.pm object now permits extraction of data from sub-sections
          : of an HSP.
          :
          : Additional useful information is collected from the results
          : of the tiling. Sub-sequences in different HSPs may overlap
          : significantly. In this case, it is impossible to create a
          : single unambiguous alignment by concatenating the HSPs. The
          : ambiguity may indicate the presence of multiple, similar
          : domains in one or both of the aligned sequences. This
          : ambiguity is recorded using the ambiguous_aln() method.
          :
          : This method does not attempt to discern biologically
          : significant vs. insignificant overlaps. The allowable amount of
          : overlap can be set with the overlap() method or with the -OVERLAP
          : parameter used when constructing the Blast & Sbjct objects.
          :
          : For a given hit, both the query and the sbjct sequences are
          : tiled independently.
          :
          :    -- If only query sequence HSPs overlap,
          :       this may suggest multiple domains in the sbjct.
          :    -- If only sbjct sequence HSPs overlap,
          :       this may suggest multiple domains in the query.
          :    -- If both query & sbjct sequence HSPs overlap,
          :       this suggests multiple domains in both.
          :    -- If neither query nor sbjct sequence HSPs overlap,
          :       this suggests either no multiple domains in either
          :       sequence OR that both sequences have the same
          :       distribution of multiple similar domains.
          :
          : This method can handle the special case in which multiple
          : HSPs overlap exactly.
          :
          : Efficiency concerns:
          :  Speed will be an issue for sequences with numerous HSPs.
          :
Bugs      : Currently, tile_hsps() does not properly account for
          : the number of non-tiled but overlapping HSPs, which becomes a
          : problem as overlap() grows. Large values of overlap() may thus
          : lead to incorrect statistics for some hits. For best results,
          : keep overlap() below 5 (DEFAULT IS 2). For more about this, see
          : the "HSP Tiling and Ambiguous Alignments" section in
          : L<Bio::Search::Hit::BlastHit>.
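To illustrate why tiling matters, here is a minimal Perl sketch (not the module's implementation) that merges the [start, end] ranges of HSPs from a single strand and frame into contigs and sums the contig lengths. The helper name tile_ranges() and its $max_overlap parameter are hypothetical. The sketch only measures alignment length (no identical or conserved residue counts) and, unlike tile_hsps(), it folds every overlapping HSP into a contig while merely flagging overlaps larger than $max_overlap as ambiguous.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Search::BlastUtils): tile
# [start, end] ranges (1-based, inclusive) taken from HSPs in a single
# strand/frame into non-overlapping contigs.
sub tile_ranges {
    my ($max_overlap, @ranges) = @_;
    my @contigs;
    my $ambiguous = 0;
    for my $r (sort { $a->[0] <=> $b->[0] } @ranges) {
        my ($s, $e) = @$r;
        if (@contigs && $s <= $contigs[-1][1]) {
            my $last = $contigs[-1];
            # Number of residues this range shares with the current contig.
            my $shared = ($e < $last->[1] ? $e : $last->[1]) - $s + 1;
            $ambiguous = 1 if $shared > $max_overlap;
            # Extend the contig if the range reaches past its end.
            $last->[1] = $e if $e > $last->[1];
        }
        else {
            # No overlap: start a new contig (copy, so input is untouched).
            push @contigs, [$s, $e];
        }
    }
    # Total aligned length counts each residue once.
    my $total = 0;
    $total += $_->[1] - $_->[0] + 1 for @contigs;
    return ($total, $ambiguous, \@contigs);
}
```

For the ranges 1-100, 99-150 and 300-320, naive summing gives 173 residues, while the tiled total is 171 because residues 99-100 are counted only once. Their 2-residue overlap does not exceed a $max_overlap of 2, so no ambiguity is flagged.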
Usage     : n/a; called automatically during object construction.
Purpose   : Builds HSP contigs for a given BLAST hit.
          : Utility method called by tile_hsps()
Returns   :
Argument  :
Throws    : Exceptions propagated from Bio::Search::HSP::BlastHSP::matches()
          : for invalid sub-sequence ranges.
Status    : Experimental
Comments  : This method does not currently support gapped alignments.
          : Also, it does not keep track of the number of HSPs that
          : overlap within the amount specified by overlap().
          : This will lead to significant tracking errors for large
          : overlap values.
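A central step in building contigs is separating a new HSP into the part that overlaps an existing contig and the parts that extend beyond it. The sketch below shows only that interval arithmetic; split_against_contig() is a hypothetical helper on 1-based inclusive coordinates, not the module's code. Presumably, sub-ranges like these are what get passed to the HSP object for residue counts, which is where invalid ranges would raise the exceptions listed under Throws.

```perl
use strict;
use warnings;

# Hypothetical helper: split HSP range [$hs, $he] against contig
# [$cs, $ce] (1-based, inclusive). Each returned part is an arrayref
# [start, end], or undef when that part is empty.
sub split_against_contig {
    my ($cs, $ce, $hs, $he) = @_;
    my %part;
    # Portion of the HSP lying left of the contig (extends the contig start).
    $part{left} = $hs < $cs ? [$hs, ($he < $cs - 1 ? $he : $cs - 1)] : undef;
    # Portion of the HSP lying right of the contig (extends the contig end).
    $part{right} = $he > $ce ? [($hs > $ce + 1 ? $hs : $ce + 1), $he] : undef;
    # Portion shared with the contig: the intersection of the two ranges.
    my ($os, $oe) = ($hs > $cs ? $hs : $cs, $he < $ce ? $he : $ce);
    $part{overlap} = $os <= $oe ? [$os, $oe] : undef;
    return \%part;
}
```

For a contig spanning 100-200 and an HSP spanning 150-260, this yields an overlap of 150-200, a right extension of 201-260, and no left extension.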
Usage     : &get_exponent( number );
Purpose   : Determines the power of 10 exponent of an integer, float,
          : or scientific notation number.
Example   : &get_exponent("4.0e-206");
          : &get_exponent("0.00032");
          : &get_exponent("10.");
          : &get_exponent("1000.0");
          : &get_exponent("e+83");
Argument  : Float, Integer, or scientific notation number
Returns   : Integer representing the exponent part of the number (+ or -).
          : If argument == 0 (zero), return value is "-999".
Comments  : Exponents are rounded up (less negative) if the mantissa is >= 5.
          : Exponents are rounded down (more negative) if the mantissa is <= -5.
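A minimal Perl sketch of the documented rules follows. exponent_of() is a hypothetical name, and normalizing plain numbers with sprintf('%e') is an assumption of this sketch rather than how get_exponent() works internally, so its results may differ from the module's for some inputs.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the rules documented for
# get_exponent(); this is not the module's implementation.
sub exponent_of {
    my ($data) = @_;
    my ($mant, $exp);
    if ($data =~ /[eE]/) {
        # Scientific notation: split into mantissa and exponent.
        ($mant, $exp) = split /[eE]/, $data, 2;
        # A bare exponent such as "e+83" is treated as mantissa 1.
        $mant = 1 if $mant eq '';
    }
    else {
        # Documented sentinel for zero.
        return -999 if $data == 0;
        # Let sprintf normalize plain numbers to d.dddddde+XX form.
        ($mant, $exp) = split /e/, sprintf('%e', $data);
    }
    $exp += 0;    # numify: "+83" -> 83, "-04" -> -4
    # Documented rounding: mantissa >= 5 rounds up, <= -5 rounds down.
    $exp++ if $mant >= 5;
    $exp-- if $mant <= -5;
    return $exp;
}
```

Using sprintf('%e') instead of a floating-point logarithm avoids results such as log(1000)/log(10) evaluating to slightly less than 3, which would truncate to the wrong exponent.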
Usage     : @cnums = collapse_nums( @numbers );
Purpose   : Collapses a list of numbers into a set of ranges of consecutive terms:
          : Useful for condensing long lists of consecutive numbers.
          :  EXPANDED:
          :     1 2 3 4 5 6 10 12 13 14 15 17 18 20 21 22 24 26 30 31 32
          :  COLLAPSED:
          :     1-6 10 12-15 17 18 20-22 24 26 30-32
Argument  : List of numbers sorted numerically.
Returns   : List of numbers mixed with ranges of numbers (see above).
Throws    : n/a
See Also  : Bio::Search::Hit::BlastHit::seq_inds()
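Judging from the example, a run of only two consecutive numbers (17 18) is left as two separate numbers, and only runs of three or more become ranges. A minimal sketch of that behavior follows; collapse_runs() is a hypothetical name, not the module's code, and like collapse_nums() it expects numerically sorted input.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented collapse_nums() example:
# runs of 3+ consecutive integers become "start-end"; shorter runs are
# emitted as individual numbers. Input must be sorted numerically.
sub collapse_runs {
    my @nums = @_;
    my @out;
    my $i = 0;
    while ($i < @nums) {
        my $j = $i;
        # Extend the run while the next number is exactly one greater.
        $j++ while $j + 1 < @nums && $nums[$j + 1] == $nums[$j] + 1;
        if ($j - $i >= 2) {
            push @out, "$nums[$i]-$nums[$j]";    # run of 3 or more
        }
        else {
            push @out, @nums[$i .. $j];          # 1 or 2 singletons
        }
        $i = $j + 1;
    }
    return @out;
}

print join(' ', collapse_runs(1 .. 6, 10, 12 .. 15, 17, 18,
                              20 .. 22, 24, 26, 30 .. 32)), "\n";
```

The print statement reproduces the COLLAPSED line of the example.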
Usage     : $boolean = &strip_blast_html( string_ref );
          : This method is exported.
Purpose   : Removes HTML formatting from a supplied string.
          : Attempts to restore the Blast report to enable
          : parsing by Bio::SearchIO::blast.pm
Returns   : Boolean: true if string was stripped, false if not.
Argument  : string_ref = reference to a string containing the whole Blast
          :              report, including its HTML formatting.
Throws    : Croaks if the argument is not a scalar reference.
Comments  : Based on code originally written by Alex Dong Li
          : (email@example.com).
          : This method does some Blast-specific stripping
          : (adds back a '>' character in front of each HSP
          : alignment listing).
          :
          : THIS METHOD IS VERY SENSITIVE TO BLAST FORMATTING CHANGES!
          :
          : Removal of the HTML tags and accurate reconstitution of the
          : non-HTML-formatted report is highly dependent on the structure
          : of the HTML-formatted version. For example, it assumes that the
          : first line of each alignment section (HSP listing) starts with a
          : <a name=..> anchor tag. This permits the reconstruction of the
          : original report, in which these lines begin with a ">".
          : This is required for parsing.
          :
          : If the structure of the Blast report itself is not intended to
          : be a standard, the structure of the HTML-formatted version
          : is even less so. Therefore, the use of this method to
          : reconstitute parsable Blast reports from HTML-format versions
          : should be considered a temporary solution.
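A greatly simplified sketch of the approach follows, assuming the <a name=..> anchor layout described in the comments. strip_html_sketch() is hypothetical and decodes only a few entities, so it is no substitute for strip_blast_html() on real reports.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical, simplified stand-in for strip_blast_html(): modifies the
# referenced string in place and returns 1 if anything changed, else 0.
sub strip_html_sketch {
    my ($ref) = @_;
    croak "Argument must be a scalar reference" unless ref $ref eq 'SCALAR';
    my $before = $$ref;
    # Restore the '>' that begins each HSP alignment listing, assuming
    # such lines start with an <a name=...> anchor (optionally closed).
    $$ref =~ s/^<a\s+name\s*=[^>]*>(?:<\/a>)?/>/gim;
    # Drop all remaining tags.
    $$ref =~ s/<[^>]+>//g;
    # Decode a few common entities; &amp; goes last so that "&amp;gt;"
    # becomes "&gt;" rather than being decoded twice.
    $$ref =~ s/&gt;/>/g;
    $$ref =~ s/&lt;/</g;
    $$ref =~ s/&amp;/&/g;
    return $$ref ne $before ? 1 : 0;
}
```

Calling it a second time on an already stripped string returns false, matching the documented Boolean return value.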