
NAME

Bio::SearchIO::psiblast - Parser for traditional BLAST and PSI-BLAST reports

SYNOPSIS

    use Bio::SearchIO;

    my $in = Bio::SearchIO->new( -format  => 'psiblast',
                                 -file    => 'report.blastp' );

    while ( my $blast = $in->next_result() ) {
        foreach my $hit ( $blast->hits ) {
            print "Hit: $hit\n";
        }
    }

    # BLAST hit filtering function. Every hit of each BLAST report must satisfy
    # these criteria to be retained. If a hit fails this test, it is ignored.
    # If all hits of a report fail, the report will be considered hitless.
    # We can distinguish this from the case where the report itself contained
    # no hits by calling $blast->no_hits_found().

    my $filt_func = sub{ my $hit=shift; 
                         $hit->frac_identical('query') >= 0.5 
                             && $hit->frac_aligned_query >= 0.50
                         };

    # Not supplying a -file or -fh parameter means read from STDIN

    my $in2 = Bio::SearchIO->new( -format  => 'psiblast',
                                  -hit_filter => $filt_func
                                 );

DESCRIPTION

This module parses BLAST and PSI-BLAST reports and acts as a factory for objects that encapsulate BLAST results: Bio::Search::Result::BlastResult, Bio::Search::Hit::BlastHit, Bio::Search::HSP::BlastHSP.

This module does not parse XML-formatted BLAST reports. See Bio::SearchIO::blastxml if you need to do that.

To use this module, the only module you need to load is Bio::SearchIO.pm. SearchIO knows how to load this module when you supply a -format => 'psiblast' parameter to its new() function. For more information about the SearchIO system, see the documentation in Bio::SearchIO.pm.

PSI-BLAST Support

In addition to BLAST1 and BLAST2 reports, this module can also handle PSI-BLAST reports. When accessing the set of Hits in a result, hits from different iterations are lumped together but can be distinguished by interrogating Bio::Search::Hit::BlastHit::iteration and Bio::Search::Hit::BlastHit::found_again.

If you want to collect hits only from a certain iteration during parsing, supply a function using the -HIT_FILTER parameter.
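
As a minimal sketch of such a filter, the following keeps only hits reported in iteration 2. It assumes, as in the SYNOPSIS, that the filter returns true for hits to retain, and that iteration() returns the iteration number of a hit. MockHit and make_iteration_filter are hypothetical names introduced here so the snippet runs without BioPerl or a report file; they are not part of the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Search::Hit::BlastHit, exposing only the
# methods this sketch needs.
package MockHit {
    sub new       { my ($class, %args) = @_; return bless {%args}, $class }
    sub name      { $_[0]{name} }
    sub iteration { $_[0]{iteration} }
}

# Build a filter that is true only for hits from the wanted iteration.
# (Hypothetical helper, not provided by Bio::SearchIO.)
sub make_iteration_filter {
    my ($wanted) = @_;
    return sub {
        my $hit = shift;
        return $hit->iteration == $wanted;
    };
}

my $filter = make_iteration_filter(2);

my @hits = (
    MockHit->new( name => 'hitA', iteration => 1 ),
    MockHit->new( name => 'hitB', iteration => 2 ),
    MockHit->new( name => 'hitC', iteration => 2 ),
);

# Apply the filter the way a parser would: keep hits for which it is true.
my @kept = map { $_->name } grep { $filter->($_) } @hits;
print "Kept: @kept\n";    # prints "Kept: hitB hitC"
```

With a real report, the same code reference would be passed as -hit_filter => make_iteration_filter(2) in the call to Bio::SearchIO->new().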

EXAMPLES

To get a feel for how to use this, have a look at the scripts in the examples/searchio and examples/searchio/writer directories of the Bioperl distribution as well as the test script t/SearchIO.t.

SEE ALSO

For more documentation about working with Blast result objects that are produced by this parser, see Bio::Search::Result::BlastResult, Bio::Search::Hit::BlastHit, Bio::Search::HSP::BlastHSP.

FEEDBACK

Mailing Lists

User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.

  bioperl-l@bioperl.org              - General discussion
  http://bioperl.org/MailList.shtml  - About the mailing lists

Reporting Bugs

Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via email or the web:

  bioperl-bugs@bioperl.org
  http://bioperl.org/bioperl-bugs/

AUTHOR

Steve Chervitz <sac@bioperl.org>

See the FEEDBACK section for where to send bug reports and comments.

ACKNOWLEDGEMENTS

I would like to acknowledge my colleagues at Affymetrix for useful feedback.

COPYRIGHT

Copyright (c) 2001 Steve Chervitz. All Rights Reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

DISCLAIMER

This software is provided "as is" without warranty of any kind.

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

new

 Usage     : Bio::SearchIO::psiblast->new( %named_parameters )
 Purpose   : Parse traditional BLAST or PSI-BLAST data from a file or input stream.
           : Handles Blast1, Blast2, NCBI and WU Blast reports.
           : Populates Bio::Search:: objects with data extracted from the report.
           : (The exact type of Bio::Search objects depends on the type of
           : Bio::Factory::ResultFactory and Bio::Factory::HitFactory you hook up
           : to the SearchIO object.)
 Returns   : Bio::SearchIO::psiblast object.
 Argument  : Named parameters:  (PARAMETER TAGS CAN BE UPPER OR LOWER CASE).
           : These are in addition to those specified by Bio::SearchIO::new() (see).
           : -SIGNIF     => number (float or scientific notation number to be used
           :                         as a P- or Expect value cutoff; default = 999.)
           : -SCORE     => number (integer or scientific notation number to be used
           :                         as a score value cutoff; default = 0.)
           : -HIT_FILTER  => func_ref (reference to a function to be used for
           :                          filtering out hits based on arbitrary criteria.
           :                          This function should take a
           :                          Bio::Search::Hit::BlastHit.pm object as its first
           :                          argument and return a boolean value,
           :                          true if the hit should be retained).
           :                          Sample filter function:
           :                          -HIT_FILTER => sub { my $hit = shift;
           :                                               $hit->gaps == 0; },
           :                          Historical note: This parameter was formerly
           :                          called -FILT_FUNC in the older
           :                          Bio::Tools::Blast::parse method. Using
           :                          -FILT_FUNC will still work for backward
           :                          compatibility.
           : -CHECK_ALL_HITS => boolean (check all hits for significance against
           :                             significance criteria.  Default = false.
           :                             If false, stops processing hits after the first
           :                             non-significant hit or the first hit that fails
           :                             the hit_filter call. This speeds parsing,
           :                             taking advantage of the fact that the hits
           :                             are processed in the order they are ranked.)
           : -MIN_LEN     => integer (to be used as a minimum query sequence length;
           :                          sequences below this length will not be processed.
           :                          Default = no minimum length).
           : -STATS       => boolean (collect key statistical parameters for the report: 
           :                          matrix, filters, etc. default = false). 
           :                          This requires extra parsing
           :                          so if you aren't interested in this info, don't
           :                          set this parameter. Note that the unparsed 
           :                          parameter section of a Blast report is always
           :                          accessible via $blast->raw_statistics().
           : -BEST        => boolean (only process the best hit of each report;
           :                          default = false).
           : -OVERLAP     => integer (the amount of overlap to permit between
           :                          adjacent HSPs when tiling HSPs. A reasonable value is 2.
           :                          Default = $Bio::SearchIO::psiblast::MAX_HSP_OVERLAP)
           : -HOLD_RAW_DATA => boolean (store the raw alignment sections for each hit.
           :                            Used with the -SHALLOW_PARSE option).
           : -SHALLOW_PARSE => boolean (only minimal parsing; does not parse HSPs.
           :                            Hit data is limited to what can be obtained
           :                            from the description line.
           :                            Replaces the older NO_ALIGNS option.)
           :            
           :
 Comments  : Do NOT remove the HTML from an HTML-formatted Blast report by using the
           : "Save As" option of a web browser to save it as text. This renders the
           : report unparsable.
 Throws    : An exception will be thrown if a BLAST report contains a FATAL: error.
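
The interaction between -SIGNIF, -SCORE, and -CHECK_ALL_HITS described above can be illustrated with a small stand-alone sketch. This is not the module's implementation: screen_hits and the hash-based hit records are hypothetical, and the code only models the documented behavior, in which hits arrive in rank order and, when -CHECK_ALL_HITS is false, screening stops at the first hit that fails.

```perl
use strict;
use warnings;

# Hypothetical model of the documented screening rules. Plain hashes stand
# in for hits, listed in the order they are ranked in the report.
sub screen_hits {
    my (%args) = @_;
    my $max_signif = $args{signif} // 999;    # documented default cutoff
    my $min_score  = $args{score}  // 0;      # documented default cutoff
    my $check_all  = $args{check_all_hits};   # false by default
    my @kept;

    for my $hit ( @{ $args{hits} } ) {
        my $ok = $hit->{signif} <= $max_signif
              && $hit->{score}  >= $min_score;
        if ($ok) {
            push @kept, $hit->{name};
        }
        elsif ( !$check_all ) {
            last;    # ranked order: stop at the first failing hit
        }
    }
    return @kept;
}

# Made-up values; hit2 fails the score cutoff used below.
my @hits = (
    { name => 'hit1', signif => 1e-50, score => 400 },
    { name => 'hit2', signif => 1e-12, score => 45 },
    { name => 'hit3', signif => 1e-10, score => 150 },
);

my @fast = screen_hits( hits => \@hits, score => 50 );
my @full = screen_hits( hits => \@hits, score => 50, check_all_hits => 1 );

print "early stop: @fast\n";    # prints "early stop: hit1"
print "check all:  @full\n";    # prints "check all:  hit1 hit3"
```

The early-stop mode is faster but can miss a later hit that would have passed, which is why the sketch returns hit3 only when check_all_hits is set.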

signif

Synonym for max_significance()

max_significance

 Usage     : $obj->max_significance();
 Purpose   : Gets the P or Expect value used as significance screening cutoff.
             This is the value of the -signif parameter supplied to new().
             Hits with P or E-value above this are skipped.
 Returns   : Scientific notation number with this format: 1.0e-05.
 Argument  : n/a
 Comments  : Screening of significant hits uses the data provided on the
           : description line. For NCBI BLAST1 and WU-BLAST, this data
           : is a P-value. For NCBI BLAST2 it is an Expect value.

min_score

 Usage     : $obj->min_score();
 Purpose   : Gets the Blast score used as screening cutoff.
             This is the value of the -score parameter supplied to new().
             Hits with scores below this are skipped.
 Returns   : Integer or scientific notation number.
 Argument  : n/a
 Comments  : Screening of significant hits uses the data provided on the
           : description line. 

min_length

 Usage     : $obj->min_length();
 Purpose   : Gets the query sequence length used as a screening criterion.
             This is the value of the -min_len parameter supplied to new().
             Query sequences below this length are not processed.
 Returns   : Integer
 Argument  : n/a

See Also : signif()

highest_signif

 Usage     : $value = $obj->highest_signif();
 Purpose   : Gets the largest significance (P- or E-value) observed in
             the report.
           : For NCBI BLAST1 and WU-BLAST, this is a P-value. 
           : For NCBI BLAST2 it is an Expect value.
 Returns   : Float or sci notation number
 Argument  : n/a

lowest_signif

 Usage     : $value = $obj->lowest_signif();
 Purpose   : Gets the smallest significance (P- or E-value) observed in
             the report.
           : For NCBI BLAST1 and WU-BLAST, this is a P-value. 
           : For NCBI BLAST2 it is an Expect value.
 Returns   : Float or sci notation number
 Argument  : n/a

highest_score

 Usage     : $value = $obj->highest_score();
 Purpose   : Gets the largest BLAST score observed in the report.
 Returns   : Integer or sci notation number
 Argument  : n/a

lowest_score

 Usage     : $value = $obj->lowest_score();
 Purpose   : Gets the smallest BLAST score observed in the report.
 Returns   : Integer or sci notation number
 Argument  : n/a

best_hit_only

 Usage     : print "only getting best hit.\n" if $obj->best_hit_only();
 Purpose   : Set/Get the indicator for whether or not to process only
           : the best BlastHit.
 Returns   : Boolean (1 | 0)
 Argument  : n/a