NAME

idr2pept.pl - Extraction of reliable peptide/spectrum matches from Phenyx .idr.xml files

SYNOPSIS

idr2pept.pl [options] idr.xml files

OPTIONS

Run idr2pept.pl -h to list the available options.

DESCRIPTION

The script parses one or several Phenyx .idr.xml files to extract reliable peptide/spectrum matches and outputs them in the .peptSpectra.xml format. The .idr.xml file(s) can be compressed (gzipped) files.
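As an illustration of accepting both plain and gzipped input, here is a minimal Python sketch. It is not the script's own Perl code; the helper name open_idr and the detection by magic bytes are assumptions made for this example.

```python
import gzip


def open_idr(path):
    """Open an .idr.xml file as text, whether it is gzipped or not.

    Hypothetical helper: detection relies on the two gzip magic bytes
    rather than on the file extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # every gzip stream starts with these bytes
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

Checking the content rather than the name means a compressed file is still read correctly when it lacks a .gz suffix.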

The selection of the peptide assignments is performed based on several thresholds applied to identifications found in the idr.xml file(s):

maximum peptide p-value
minimum peptide score
minimum peptide z-score
minimum protein score
minimum number of distinct peptides per protein
minimum peptide save z-score

To be selected, a peptide must have a p-value smaller than the maximum peptide p-value, and a score and z-score larger than the minimum peptide score and z-score, respectively. In addition, at least the minimum number of distinct peptides satisfying these criteria must match a given protein entry in the database. If fewer than the minimum number of distinct peptides are found for a protein, those with a z-score higher than the minimum save z-score are nonetheless selected.
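The selection rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the script's actual implementation: the dictionary keys and the function name are hypothetical, the save z-score rule is assumed to apply only to peptides that already passed the per-peptide thresholds, and the minimum protein score is left out because the text does not detail how it is applied.

```python
from collections import defaultdict


def select_matches(matches, max_pvalue, min_score, min_zscore,
                   min_distinct, save_zscore):
    """Return the matches kept under the thresholds described above.

    Each match is a dict with the hypothetical keys 'protein',
    'peptide', 'pvalue', 'score' and 'zscore'.
    """
    # Step 1: per-peptide thresholds, grouped by protein entry.
    by_protein = defaultdict(list)
    for m in matches:
        if (m["pvalue"] < max_pvalue and m["score"] > min_score
                and m["zscore"] > min_zscore):
            by_protein[m["protein"]].append(m)

    # Step 2: per-protein count of distinct peptides, with the
    # save z-score as a fallback for proteins that fall short.
    selected = []
    for passing in by_protein.values():
        distinct = {m["peptide"] for m in passing}
        if len(distinct) >= min_distinct:
            selected.extend(passing)
        else:
            # Assumption: only peptides that passed step 1 can be saved.
            selected.extend(m for m in passing if m["zscore"] > save_zscore)
    return selected
```

With a minimum of two distinct peptides and a save z-score of 10, a protein supported by a single peptide is kept only if that peptide's z-score exceeds 10.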

During the parsing of the file, each spectrum is associated with the peptide that gives the best match. That is, alternative interpretations of a spectrum are discarded in favor of the best one.
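Keeping one interpretation per spectrum amounts to a single pass over the matches. The Python sketch below is illustrative only: the text does not say which quantity defines the best match, so ranking by z-score is an assumption, as are the dictionary keys and the function name.

```python
def best_match_per_spectrum(matches):
    """Map each spectrum identifier to its single best match.

    Each match is a dict with the hypothetical keys 'spectrum',
    'peptide' and 'zscore'. Assumption: "best" means highest z-score.
    """
    best = {}
    for m in matches:
        current = best.get(m["spectrum"])
        if current is None or m["zscore"] > current["zscore"]:
            best[m["spectrum"]] = m  # replaces any weaker interpretation
    return best
```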

It is possible to restrict the exported peptides to an imposed charge state. All the peptides participate in the selection (criterion on the number of distinct peptides per protein), but only the ones having the imposed charge are printed in the .peptSpectra.xml output.
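The order of operations matters here: the charge filter is applied after the selection, not before it. A minimal Python sketch, in which the function name, the 'charge' key and the select callable are all hypothetical:

```python
def matches_to_print(matches, select, imposed_charge=None):
    """Apply the selection to all matches, then filter by charge.

    'select' is any callable implementing the threshold-based
    selection; 'imposed_charge' is None when no charge is imposed.
    """
    selected = select(matches)  # the selection sees every charge state
    if imposed_charge is None:
        return selected
    # Only the output is restricted to the imposed charge.
    return [m for m in selected if m["charge"] == imposed_charge]
```

Filtering first would lower the number of distinct peptides per protein and could reject proteins that are well supported by peptides of other charge states.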

It is also possible to give a fasta file containing a list of protein sequences that are known to be in the analyzed sample. In this case, an additional condition for a peptide to be selected is that it appears in one of the given sequences. This option is useful when analyzing mixtures of purified proteins for quality control or any other purpose. It makes it possible to work with relaxed thresholds to increase sensitivity while maintaining high confidence in the selected peptide/spectrum matches.
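The additional condition can be pictured as a substring test against the known sequences. The Python sketch below is a simplified illustration, not the script's code: both function names are hypothetical, and exact substring matching is an assumption.

```python
def read_fasta(lines):
    """Parse fasta-formatted lines into a {name: sequence} dict."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # first word of the header
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def in_known_proteins(peptide, sequences):
    """True if the peptide occurs in at least one known sequence."""
    return any(peptide in seq for seq in sequences.values())
```

A peptide that passes the score thresholds but is absent from every given sequence would be rejected by this test.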

Finally, a list of database names can be provided to the script if the original search .idr.xml files contained results found in several databases.

EXAMPLE

./idr2pept.pl example.idr.xml > test.peptSpectra.xml

AUTHOR

Jacques Colinge