Bio::Tools::Prepeat - Finding repeats in protein sequences
    use Bio::Tools::Prepeat;
    my $p = Bio::Tools::Prepeat->new( wd => './working_directory' );
    $p->feed(@seq);
    $p->buildidx();
    my $result = $p->query();
This module locates repeats in protein sequences. Usage is as follows: feed the sequences, build the index files, and perform queries. Each query returns a reference to the repeat data.
my $p = Bio::Tools::Prepeat->new( wd => './working_directory' );
Constructor. You need to specify the name of a directory for storing index files and other information.
Use feed to pass protein sequences into the object. Note that the module does not validate the characters in your input data.
It resets the object. Sequences are freed from memory, and you may need to call 'loadidx' to load previously built index files before you perform another query.
It builds a bigram index for the sequences. The bigram index is used to pick out possible repeat candidates.
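To illustrate the idea of a bigram index, here is a minimal sketch that maps every two-residue substring to the sequences and positions where it occurs. It is not the module's actual implementation or on-disk format; the helper name build_bigram_index and the in-memory hash layout are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::Prepeat: map each bigram
# (two adjacent residues) to every place where it occurs.
sub build_bigram_index {
    my @seqs = @_;
    my %index;    # bigram => [ [seq_id, position], ... ]
    for my $id (0 .. $#seqs) {
        my $seq = $seqs[$id];
        # The last bigram starts at length - 2; short sequences yield nothing.
        for my $pos (0 .. length($seq) - 2) {
            my $bigram = substr($seq, $pos, 2);
            push @{ $index{$bigram} }, [ $id, $pos ];
        }
    }
    return \%index;
}

my $index = build_bigram_index('MKVLAAKV', 'GKVT');

# A bigram that occurs more than once marks positions where a longer
# repeat could start, so only those positions need closer inspection.
for my $bigram (sort keys %$index) {
    my @hits = @{ $index->{$bigram} };
    next if @hits < 2;
    print "$bigram: ", join(', ', map { "seq $_->[0] pos $_->[1]" } @hits), "\n";
}
```

Any repeat of length two or more must begin with a bigram that occurs at least twice, which is why such an index can serve as a cheap filter before longer substrings are compared.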
It loads previously built bigram index files.
When queried for length 10, it returns a reference to the repeat sequences of length 10, together with the ids of the sequences they belong to and their positions within those sequences.
You can also give it a range of lengths. For a range of 4 to 10, it returns a reference to the repeat sequences of length 4 through 10, together with the ids of the sequences they belong to and their positions within those sequences.
You may also import random_sequence and use it as a plain function:
    use Bio::Tools::Prepeat qw(random_sequence);
    print random_sequence(100000);
It generates a random protein sequence, which you may use for testing.
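If you want test data without loading the module, a random protein sequence can be sketched as below. This is a stand-in and not the module's random_sequence: the function name random_protein, the uniform draw over the 20 standard amino acids, and the alphabet itself are assumptions for illustration.

```perl
use strict;
use warnings;

# One-letter codes of the 20 standard amino acids (assumed alphabet).
my @AMINO_ACIDS = split //, 'ACDEFGHIKLMNPQRSTVWY';

# Hypothetical stand-in, NOT the module's random_sequence: draw each
# residue independently and uniformly from @AMINO_ACIDS.
sub random_protein {
    my ($length) = @_;
    return join '',
        map { $AMINO_ACIDS[ int( rand( scalar @AMINO_ACIDS ) ) ] } 1 .. $length;
}

my $seq = random_protein(60);
print "$seq\n";
```

Uniformly random sequences contain few long repeats by chance, so for a meaningful test you may want to splice a known repeat into the generated string.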
It is written entirely in Perl for now. Parts of the code will be translated into XS for better performance in future versions.
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.