
NAME

mascot2pept.pl - Extraction of reliable peptide/spectrum matches from Mascot .dat files

SYNOPSIS

mascot2pept.pl [options] .dat files

OPTIONS

Run mascot2pept.pl -h to list the available options.

DESCRIPTION

The script parses one or more Mascot .dat files, extracts reliable peptide/spectrum matches, and outputs them in the .peptSpectra.xml format. The input .dat files may be gzip-compressed.
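Transparent handling of compressed input can be illustrated with a short Python sketch. This is not the script's Perl code: the function name `open_dat` is hypothetical, and detecting compression from the two-byte gzip magic number (rather than from a .gz extension) is an assumption about how such input might be recognized.

```python
import gzip


def open_dat(path):
    """Open a Mascot .dat file as text, whether plain or gzip-compressed.

    Hypothetical helper: compression is detected from the gzip magic
    number (0x1f 0x8b) at the start of the file, not from its extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        # gzip.open in "rt" mode decompresses and decodes to text.
        return gzip.open(path, "rt")
    return open(path, "r")
```

Checking the content instead of the file name means a compressed file that lacks the .gz suffix is still read correctly.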

Peptide assignments are selected by applying several thresholds to the identifications found in the .dat file(s):

minimum ion score (Mascot peptide score)
minimum protein score
minimum number of distinct peptides per protein
minimum peptide save ion score
minimum peptide sequence length
minimum ion score to read a peptide from the .dat file (simple pre-filtering)

To be selected, a peptide must have an ion score larger than the minimum ion score, it must match a protein whose score is larger than the minimum protein score, and that protein must be matched by at least the minimum number of distinct peptides with sufficient score. If fewer than the minimum number of distinct peptides are found for a protein, its peptides whose ion score is higher than the minimum save ion score are nonetheless selected.
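The selection rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's Perl implementation: the `Match` record, its fields, and all parameter names are hypothetical; the minimum length is assumed to apply to every peptide; and rescued peptides are assumed not to need the protein score threshold, since the description does not say whether they do.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    spectrum: str     # spectrum identifier (hypothetical field)
    peptide: str      # peptide sequence
    ion_score: float  # Mascot ion score
    protein: str      # accession of the protein the peptide matches


def select_matches(matches, protein_scores, min_ion_score, min_protein_score,
                   min_distinct_peptides, min_save_ion_score, min_length):
    """Sketch of the threshold-based selection (hypothetical names)."""
    # Count distinct peptide sequences with sufficient score per protein.
    distinct = defaultdict(set)
    for m in matches:
        if m.ion_score > min_ion_score and len(m.peptide) >= min_length:
            distinct[m.protein].add(m.peptide)

    selected = []
    for m in matches:
        if len(m.peptide) < min_length:
            continue
        enough = len(distinct[m.protein]) >= min_distinct_peptides
        if (enough and m.ion_score > min_ion_score
                and protein_scores.get(m.protein, 0.0) > min_protein_score):
            selected.append(m)
        elif not enough and m.ion_score > min_save_ion_score:
            # Protein has too few distinct peptides: keep high scorers anyway.
            selected.append(m)
    return selected
```

The save ion score is meant to be stricter than the minimum ion score, so only very confident peptides survive when their protein lacks supporting evidence from other peptides.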

While the file is parsed, each spectrum is associated with the peptide that gives its best match, so alternative interpretations of a spectrum are discarded in favor of the best one. Moreover, peptides scoring below the basic score (typically 5), which is the pre-filtering threshold listed above, are not read at all.
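Keeping only the best interpretation per spectrum, together with the pre-filtering on the basic score, amounts to a single pass over the hits. The Python sketch below assumes a hypothetical input of `(spectrum_id, peptide, ion_score)` tuples; the real .dat layout is more complex and is not modeled here.

```python
def best_per_spectrum(hits, basic_score=5.0):
    """Keep, for each spectrum, only its best-scoring peptide.

    `hits` is an iterable of (spectrum_id, peptide, ion_score) tuples
    (hypothetical layout). Hits below `basic_score` are skipped, mimicking
    the pre-filtering applied while reading the file.
    """
    best = {}
    for spectrum, peptide, score in hits:
        if score < basic_score:
            continue  # too weak to be read at all
        if spectrum not in best or score > best[spectrum][1]:
            best[spectrum] = (peptide, score)
    return best


hits = [("q1", "AAAK", 12.0), ("q1", "CCCK", 30.5),
        ("q2", "DDDK", 3.0), ("q3", "EEEK", 7.0)]
print(best_per_spectrum(hits))
```

Here q1 keeps only CCCK, and q2 disappears entirely because its only hit is below the basic score.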

The exported peptides can be restricted to an imposed charge state. All peptides still participate in the selection (in particular in the criterion on the number of distinct peptides per protein), but only those with the imposed charge are written to the .peptSpectra.xml output.
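The order of operations matters: counting happens over all charge states, and the charge restriction applies only to the output. A minimal Python sketch, assuming matches are dicts with hypothetical `protein`, `peptide` and `charge` keys and reducing the selection to the distinct-peptide criterion alone:

```python
from collections import defaultdict


def select_then_filter_charge(matches, min_distinct_peptides, charge=None):
    """Apply the distinct-peptide criterion, then the imposed charge.

    Hypothetical sketch: only the distinct-peptide rule is modeled.
    """
    # Step 1: count distinct peptides per protein over ALL charge states.
    distinct = defaultdict(set)
    for m in matches:
        distinct[m["protein"]].add(m["peptide"])
    kept = [m for m in matches
            if len(distinct[m["protein"]]) >= min_distinct_peptides]
    # Step 2: the imposed charge only restricts what is exported.
    if charge is not None:
        kept = [m for m in kept if m["charge"] == charge]
    return kept
```

For example, with two distinct peptides of protein P1 at charges 2 and 3 and a minimum of two, exporting only charge 2 still keeps the charge-2 peptide, because the charge-3 peptide counted toward P1 during selection.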

It is also possible to provide a FASTA file containing the protein sequences known to be present in the analyzed sample. In this case, an additional condition for a peptide to be selected is that it occurs in one of the given sequences. This option is useful when analyzing mixtures of purified proteins, for quality control or other purposes: it allows relaxed thresholds to be used to increase sensitivity while maintaining high confidence in the selected peptide/spectrum matches.
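The extra condition imposed by a FASTA file of known proteins can be pictured as a substring test against the parsed sequences. The Python sketch below is an assumption, not the script's documented behavior: `read_fasta` and `in_known_proteins` are hypothetical helpers, and matching is taken to be an exact, case-insensitive substring search.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of sequences (headers ignored)."""
    seqs, current = [], []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                seqs.append("".join(current))
            current = []
        elif line:
            current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs


def in_known_proteins(peptide, sequences):
    """True if the peptide occurs verbatim in any of the known sequences."""
    pep = peptide.upper()
    return any(pep in seq for seq in sequences)


fasta = ">sp|P1\nMKPEPT\nIDEKR\n>sp|P2\nAAAK\n"
seqs = read_fasta(fasta.splitlines())
print(in_known_proteins("PEPTIDEK", seqs))
```

Sequence lines are joined before searching, so a peptide spanning a line break in the FASTA file is still found.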

EXAMPLE

./mascot2pept.pl example.dat > test.peptSpectra.xml

AUTHOR

Jacques Colinge