NAME

SeqDiff - A tool to find the differences between two Seq objects.

SYNOPSIS

    # use the package
    use SeqDiff;

    # get some SeqI objects from somewhere (GenBank/RefSeq/...)
    my $old_seq;        # a Bio::SeqI implementing object
    my $new_seq;        # a Bio::SeqI implementing object

    # get a new instance
    my $seqdiff = SeqDiff->new(
        -old => $old_seq,
        -new => $new_seq,
    );

    # match the features
    $seqdiff->match_features();

    # loop through the pairs of matching features and compare
    while ( my $diff = $seqdiff->next() ) {
        next unless ref $diff;    # a plain true value means the pair is identical
        # do something with $diff
    }

    # get whatever features were 'lost' or 'gained'
    my @lost   = $seqdiff->get_lost_features();
    my @gained = $seqdiff->get_gained_features();

DESCRIPTION

The SeqDiff tool compares two Bio::Seq objects. It first looks through both objects and matches their features using the criteria in _feature_pair_matches(). It then recursively compares each pair of matched features and returns the comparison.

Originally the package calculated the differences for all the features up front and held them in memory. This caused problems for Seq objects that have large numbers of features. The SeqDiff object now has a method called next() that should be used to iterate through the comparisons one pair at a time.

This package was developed specifically for comparing the file histories of GenBank/RefSeq files: what changed from one version to the next?

CONSTRUCTORS

SeqDiff->new()

The new() method constructs a new SeqDiff object. The returned object can be used to retrieve the differences between the two SeqI objects given to it.

-old

A Bio::SeqI implementing object. This is considered to be a representation of the data that existed earlier in time.

-new

Another Bio::SeqI implementing object. This is the data that is more recent relative to the other object.

-include_all

This boolean flag tells SeqDiff to return the entire comparison, not just the differences between the two features. It will return a hash consisting of the keys:

    'old'           # the feature from the "old" obj
    'new'           # the feature from the "new" obj
    'comparison'    # the complete comparison

-verbose

This boolean flag prints progress messages about what is going on. Pretty much useless in normal operation.

OBJECT METHODS

See below for more detailed summaries. The main methods are:

$seqdiff->match_features()

Match the two objects' features to each other. ("Line 'em up.")

$seqdiff->next()

Return the result of the comparison between the next two matching
features from the stream, or a false value if none remain. ("Knock 'em down.")

$seqdiff->get_lost_features()

Returns an array of the features that were not matched from the
"old" seq object (i.e., they were 'lost' from older to newer).

$seqdiff->get_gained_features()

Returns an array of the features that were not matched from the
"new" seq object (i.e., they were 'gained' from older to newer).

AUTHOR

Lance Ferguson <lancer92385@neo.tamu.edu>

Daniel Renfro <bluecurio@gmail.com>

APPENDIX

The rest of the documentation details each of the object methods. Internal methods are usually preceded by an underscore "_".

new

Title   : new
Usage   : $seqdiff = SeqDiff->new( %options );
Function: Returns a new instance of this class.
Returns : An object
Args    : Named parameters:
           -old         => SeqI object of the older data
           -new         => SeqI object of the newer data
           -include_all => return the entire comparison, not just the differences
           -verbose     => print (possibly) helpful messages
           

old_seq

Title   : old_seq
Usage   : $seqdiff->old_seq( $seq );
Function: If a parameter is given to this method it will set the "old" Seq object.
          This is purely convention (based on what is called "lost" or "gained.")
          If a parameter is not given, it will return the object that is currently set
          to the "old" object.
Returns : a SeqI implementing object.
Args    : new value (optional)

new_seq

Title   : new_seq
Usage   : $seqdiff->new_seq( $seq );
Function: If a parameter is given to this method it will set the "new" Seq object.
          This is purely convention (based on what is called "lost" or "gained.")
          If a parameter is not given, it will return the object that is currently set
          to the "new" object.
Returns : a SeqI implementing object.
Args    : new value (optional)

match_features

Title   : match_features
Usage   : $seqdiff->match_features();
Function: First loops through the features and determines which ones are available to match, 
          based on the criteria set forth in SeqDiff::_feature_pair_matches(). These features get
          grouped into three categories:
            1. matched   - features that matched
            2. lost      - features in the "old" object that are not in the "new"
            3. gained    - features in the "new" object that are not in the "old"
          Then the method compares each set of matching features using the method
          SeqDiff::_compare_features().       
Returns : nothing
Args    : none (Uses member variables.)
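
The three categories above can be illustrated with a small standalone sketch. It is not SeqDiff's implementation: features are plain hashes here, and pair_matches() is a hypothetical stand-in for _feature_pair_matches() that simply compares an assumed 'id' field.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SeqDiff::_feature_pair_matches(): two features
# match when their 'id' fields are equal. The real criteria live in SeqDiff.
sub pair_matches {
    my ($old, $new) = @_;
    return $old->{id} eq $new->{id};
}

# Partition two feature lists into matched pairs, lost, and gained.
sub partition_features {
    my ($old_feats, $new_feats) = @_;
    my (@matched, @lost);
    my @unclaimed = @$new_feats;    # "new" features not yet paired

    OLD: for my $old (@$old_feats) {
        for my $i (0 .. $#unclaimed) {
            next unless pair_matches($old, $unclaimed[$i]);
            # splice removes the partner from @unclaimed and returns it
            push @matched, [ $old, splice(@unclaimed, $i, 1) ];
            next OLD;
        }
        push @lost, $old;           # no partner in the "new" list
    }
    return { matched => \@matched, lost => \@lost, gained => \@unclaimed };
}

my $result = partition_features(
    [ { id => 'geneA' }, { id => 'geneB' } ],
    [ { id => 'geneB' }, { id => 'geneC' } ],
);
printf "matched=%d lost=%d gained=%d\n",
    scalar @{ $result->{matched} },
    scalar @{ $result->{lost} },
    scalar @{ $result->{gained} };
```

In this example geneB is matched, geneA is lost, and geneC is gained.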

get_lost_features

Title   : get_lost_features
Usage   : $seqdiff->get_lost_features();
Function: Returns an array of the features from the "old" seq object
          that failed to match, based on the matching criteria.
Returns : an array
Args    : none

get_gained_features

Title   : get_gained_features
Usage   : $seqdiff->get_gained_features();
Function: Returns an array of the features from the "new" seq object
          that failed to match, based on the matching criteria.
Returns : an array
Args    : none

next

Title   : next
Usage   : $seqdiff->next();
Function: Calculates the difference between the next two matching 
          features from the stream and returns it.
Returns : A hash of the differences, true if there are no differences,
          or false if there is nothing else to compare
Args    : none

primary_tag_whitelist

Title   : primary_tag_whitelist
Usage   : $seqdiff->primary_tag_whitelist( @list );
Function: Sets or gets the array of whitelisted primary_tags to use for
          matching the features in _feature_pair_matches.
          
          Currently unused.
          
Returns : an array
Args    : an array or nothing

dbxref_prefix_whitelist

Title   : dbxref_prefix_whitelist
Usage   : $seqdiff->dbxref_prefix_whitelist( @list );
Function: Sets or gets the array of whitelisted database cross-
          references to use for matching the features in 
          _feature_pair_matches.           
Returns : an array
Args    : an array or nothing

BioPerl_object_handler

 Title   : BioPerl_object_handler
 Usage   : $seqdiff->BioPerl_object_handler( %list );
 Function: Sets or gets the mapping of object-types to callbacks for
           specific types of (BioPerl) objects. This method simply 
           registers callbacks for a class. _compare_properties 
           uses this hash to look for code to run when it encounters
           an object as a property of a feature. See _compare_properties.
           
           Example:
           
            my %callbacks = (
                'Bio::PrimarySeqI' => \&_my_Bio_PrimarySeqI_handler,
                'Bio::LocationI'   => \&_my_Bio_LocationI_handler,
            );
            $seqdiff->BioPerl_object_handler( %callbacks );
  
 Returns : a hash
 Args    : a hash or nothing
 

INTERNAL METHODS

The methods are listed here for understanding the internals of the package. Most of the time these methods should not be called directly. Use at your own risk.

_feature_pair_matches

Title   : _feature_pair_matches
Usage   : $seqdiff->_feature_pair_matches( $fA, $fB );
Function: This method contains the criteria to match features on. It can be 
          overridden to provide specific criteria.
Returns : boolean
Args    : two SeqFeatureI implementing objects, the older one first.
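
Overriding the criteria might look like the sketch below. The subclass name My::SeqDiff and the matching rule (same primary tag, start, and end) are illustrative assumptions, not the criteria SeqDiff ships with. The -norequire flag only keeps the sketch loadable on its own; with SeqDiff installed, a plain "use parent 'SeqDiff'" would be the usual form.

```perl
package My::SeqDiff;    # hypothetical subclass name
use strict;
use warnings;

# -norequire: assumes SeqDiff has already been loaded elsewhere.
use parent -norequire, 'SeqDiff';

# Illustrative criteria only: two features match when they share a
# primary tag and the same start and end coordinates.
sub _feature_pair_matches {
    my ($self, $old_feat, $new_feat) = @_;
    return 0 unless $old_feat->primary_tag eq $new_feat->primary_tag;
    return 0 unless $old_feat->start == $new_feat->start;
    return 0 unless $old_feat->end == $new_feat->end;
    return 1;
}

1;
```

Because match_features() is documented as using _feature_pair_matches() for its criteria, an instance of such a subclass should pick up the new rule without further changes.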

_compare_features

 Title   : _compare_features
 Usage   : $seqdiff->_compare_features( $feature_A, $feature_B );
 Function: Typically run by SeqDiff->match_features() and not called directly,
           this method will compare two objects. This is the heart of the SeqDiff 
           package.            
 Returns : This method returns one of two things:
             1. A reference to a hash with three keys: 'lost', 'gained', and
                'common'. These hold the properties that were lost, gained,
                or shared by both objects. The hash recurses exhaustively;
                Data::Dumper or YAML is a convenient way to inspect it.
             2. False: the two objects are exactly the same and no
                difference could be found.
 Args    : Two objects, in the order (old, new).
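
To give a feel for a recursive 'lost' / 'gained' / 'common' result, here is a minimal sketch that diffs two nested plain hashes. It is not the code in SeqDiff: diff_hashes() is a hypothetical helper, the nesting layout is an assumption, and reference types other than hashes are not inspected.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper, not part of SeqDiff: recursively compare two hash
# references. Returns a hash reference with 'lost', 'gained', and 'common'
# keys, or a false value when no difference is found.
sub diff_hashes {
    my ($old, $new) = @_;
    my %part = ( lost => {}, gained => {}, common => {} );

    for my $key (keys %$old) {
        if (!exists $new->{$key}) {
            $part{lost}{$key} = $old->{$key};        # only in "old"
            next;
        }
        my ($o, $n) = ($old->{$key}, $new->{$key});
        if (ref $o eq 'HASH' && ref $n eq 'HASH') {
            my $sub = diff_hashes($o, $n);
            if ($sub) {
                # Keep only the non-empty parts of the nested result.
                for my $name (keys %part) {
                    next unless %{ $sub->{$name} };
                    $part{$name}{$key} = $sub->{$name};
                }
            }
            else {
                $part{common}{$key} = $o;            # identical sub-hashes
            }
        }
        elsif (!ref $o && !ref $n && $o eq $n) {
            $part{common}{$key} = $o;                # identical scalars
        }
        else {
            # Differing values; other reference types are not inspected
            # and are always reported as changed in this sketch.
            $part{lost}{$key}   = $o;
            $part{gained}{$key} = $n;
        }
    }
    for my $key (grep { !exists $old->{$_} } keys %$new) {
        $part{gained}{$key} = $new->{$key};          # only in "new"
    }

    return unless %{ $part{lost} } || %{ $part{gained} };
    return \%part;
}

my $diff = diff_hashes(
    { primary_tag => 'gene', location => { start => 10, end => 50 }, note => 'putative' },
    { primary_tag => 'gene', location => { start => 10, end => 60 }, product => 'kinase' },
);
$Data::Dumper::Sortkeys = 1;
print Dumper($diff);
```

Here 'note' and the old 'end' coordinate land under 'lost', 'product' and the new 'end' under 'gained', and the unchanged 'primary_tag' and 'start' under 'common'.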
 

_compare_properties

Title   : _compare_properties
Usage   : $seqdiff->_compare_properties( $fA, $fB );
Function: Compares the internals of the features. Essentially a general 
          object-diffing method. Has code for attaching callbacks for 
          specific types of BioPerl objects (see BioPerl_object_handler.)
Returns : boolean
Args    : two SeqFeatureI implementing objects, the older one first.

_get_differences

Title   : _get_differences
Usage   : $seqdiff->_get_differences();
Function: Returns whatever is currently in the $_differences property.

          Currently unused.

Returns : a hash
Args    : none

_specific_bp_obj_handler

Title   : _specific_bp_obj_handler
Usage   : $seqdiff->_specific_bp_obj_handler( $class, $oA, $oB );
Function: Internal method that looks through the registered callbacks based
          on the class given. It first looks for any callbacks that match exactly
          to the classname, then checks inheritance in a depth-first manner.
          
          Totally untested! Sounded like a good idea at the time.
          
Returns : ??
Args    : string, obj, obj
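
The lookup order described above (exact class name first, then ancestors depth-first) can be sketched as follows. This is an illustration under assumptions rather than the method's actual code: find_handler() and the My::* classes are hypothetical, and the sketch relies on the core mro module to produce the depth-first ancestor list.

```perl
use strict;
use warnings;
use mro;

# Hypothetical class hierarchy standing in for BioPerl classes.
{
    package My::LocationI;
    sub describe { 'a location' }
}
{
    package My::Location::Simple;
    our @ISA = ('My::LocationI');
}

# Find a callback for $class: the class itself is tried first, then each
# ancestor in depth-first order.
sub find_handler {
    my ($callbacks, $class) = @_;
    # get_linear_isa returns $class followed by its ancestors; 'dfs'
    # requests the depth-first linearization explicitly.
    for my $candidate (@{ mro::get_linear_isa($class, 'dfs') }) {
        return $callbacks->{$candidate} if exists $callbacks->{$candidate};
    }
    return;    # no callback registered for this class or its ancestors
}

my %callbacks = (
    'My::LocationI' => sub { 'generic location handler' },
);

# No exact entry for My::Location::Simple, so the parent's handler is used.
my $handler = find_handler(\%callbacks, 'My::Location::Simple');
print $handler->(), "\n";
```

Registering an entry for 'My::Location::Simple' itself would take precedence, since the class is always the first candidate in the list.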