fasta-shuffle-notryptic.pl - shuffle each sequence, without any original tryptic peptide
Reads an input FASTA file and produces a shuffled databank: each sequence is shuffled while avoiding the production of known tryptic (cleaved) peptides.
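The idea can be sketched as follows. This is only an illustration under stated assumptions, not the script's actual code: it assumes trypsin cleaves after K or R except before P, and the helper names (tryptic_peptides, shuffle_avoiding) and the min_len and max_tries parameters are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Cleave after K or R unless the next residue is P.
# (A common trypsin rule; assumed here, not taken from the script.)
sub tryptic_peptides {
    my ($seq) = @_;
    return grep { length } split /(?<=[KR])(?!P)/, $seq;
}

# Shuffle $seq, then reshuffle any tryptic peptide that is found in the
# hash of known peptides. Peptides shorter than min_len are left alone.
sub shuffle_avoiding {
    my ($seq, $known, %opt) = @_;
    my $min_len   = $opt{min_len}   // 4;     # hypothetical default
    my $max_tries = $opt{max_tries} // 100;   # hypothetical default

    my $shuffled = join '', shuffle split //, $seq;
    for (1 .. $max_tries) {
        my $clash = 0;
        my @pepts = tryptic_peptides($shuffled);
        for my $p (@pepts) {                  # $p aliases the array element
            next unless length($p) >= $min_len && $known->{$p};
            $p = join '', shuffle split //, $p;
            $clash = 1;
        }
        # Re-join and digest again on the next pass, because reshuffling
        # a peptide may move its cleavage sites.
        $shuffled = join '', @pepts;
        last unless $clash;
    }
    return $shuffled;   # may still clash if max_tries was exhausted
}
```

A typical call would build the hash of known peptides from the original sequences, then call shuffle_avoiding($seq, \%known) for each entry.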
#shuffle each sequence
fasta-shuffle-notryptic.pl --in=/tmp/uniprot_sprot.fasta
#to limit memory usage, one can use CRC codes (--crcsize)
./fasta-shuffle-notryptic.pl --ac-prefix=DECOY_ --in=/home/alex/tmp/a.fasta --out=/tmp/a.fasta --crcsize=33 -v --norandom
An input FASTA file (--in); it is uncompressed if its name ends with gz
An output .fasta file (--out) [default is stdout]
Set a key (--ac-prefix) to be prepended to the AC in the randomized bank. By default, it depends on the chosen method.
Set the size of the peptides to be reshuffled if they already exist
Building a hash of known cleaved peptides can be quite demanding on memory (uniprot_trembl => ~4GB). One solution is therefore to build an array of bits that records, for each CRC code, whether or not a peptide with that code was found.
The argument passed to --crcsize is the number of bits used for the CRC coding: 33 means 2^33 bits of memory => 2^30 bytes => 1GB
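The bit-array idea can be sketched like this. It is a minimal illustration, not the script's implementation: it assumes a plain CRC-32 (crc32 from the core Compress::Zlib module) truncated to its low bits, so it is limited to 32 bits, whereas the script accepts larger sizes such as 33. The helper names new_filter, crc_index, mark_seen and maybe_seen are hypothetical.

```perl
use strict;
use warnings;
use Compress::Zlib;   # core module; provides crc32()

# Allocate one flag bit per possible CRC code: 2**$nbits bits,
# that is 2**($nbits - 3) bytes.
sub new_filter {
    my ($nbits) = @_;
    die "this sketch supports at most 32 bits\n" if $nbits > 32;
    my $bits = '';
    vec($bits, (1 << $nbits) - 1, 1) = 0;   # writing the last bit extends the string
    return { nbits => $nbits, bits => $bits };
}

# Map a peptide to a bit position: keep the low $nbits bits of its CRC-32.
sub crc_index {
    my ($f, $pept) = @_;
    return crc32($pept) & ((1 << $f->{nbits}) - 1);
}

# Record that a peptide was found in the original bank.
sub mark_seen {
    my ($f, $pept) = @_;
    vec($f->{bits}, crc_index($f, $pept), 1) = 1;
    return;
}

# True if some peptide with the same CRC code was recorded.
sub maybe_seen {
    my ($f, $pept) = @_;
    return vec($f->{bits}, crc_index($f, $pept), 1);
}
```

Two different peptides can share a CRC code, so maybe_seen can report a peptide that was never recorded; the cost is an unnecessary reshuffle. A recorded peptide is never missed. Fewer bits save memory but make such collisions more frequent.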
With --norandom, the random generator seed is set to 0, so two runs on the same data will produce the same result
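The effect of a fixed seed can be illustrated with a short sketch. It assumes the shuffle draws from Perl's built-in random generator, as List::Util::shuffle does by default; the helper name seeded_shuffle is hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Reseed Perl's generator, then shuffle the residues of $seq.
sub seeded_shuffle {
    my ($seed, $seq) = @_;
    srand($seed);
    return join '', shuffle split //, $seq;
}

# Same seed and same input give the same shuffled sequence.
my $first  = seeded_shuffle(0, 'MKTAYIAKQRQISFVK');
my $second = seeded_shuffle(0, 'MKTAYIAKQRQISFVK');
print $first eq $second ? "identical\n" : "different\n";
```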
Do not display the terminal progress bar (if possible)
Setting the environment variable DO_NOT_DELETE_TEMP=1 will keep the temporary file after the script exits
Copyright (C) 2004-2006 Geneva Bioinformatics www.genebio.com
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Alexandre Masselot, www.genebio.com