MS::Protein - A class representing protein species for proteomic analysis
    use v5.10;    # enables say
    use MS::Protein;
    use BioX::Seq::Stream;

    my $p   = BioX::Seq::Stream->new('some_proteome.fasta');
    my $seq = $p->next_seq;
    my $pro = MS::Protein->new($seq);

    say "pI:",         $pro->isoelectric_point;       # or $pro->pI;
    say "MW:",         $pro->molecular_weight;        # or $pro->mw;
    say "hydropathy:", $pro->gravy;
    say "AI:",         $pro->aliphatic_index;         # or $pro->ai;
    say "EC:",         $pro->extinction_coefficient;  # or $pro->ec;
    my $z = $pro->charge_at_pH( 7.0 );

    my $atoms = $pro->n_atoms;
    say "Atom counts:";
    for (keys %$atoms) {
        say join "\t", $_, $atoms->{$_};
    }

    my $res = $pro->n_residues;
    say "Residue counts:";
    for (keys %$res) {
        say join "\t", $_, $res->{$_};
    }

    use MS::CV qw/:MS/; # use enzyme constants
    my @peptides = $pro->digest(
        enzymes => [ MS_TRYPSIN, ],
        missed  => 1,
        min_len => 6,
    );

    ## All methods can also be used as functions, e.g.
    my $pi     = pI( 'AAPLSYAMK' );
    my $charge = charge_at_pH( 'AAPLSYAMK', 7.0 );
MS::Protein is a class representing protein species for use in proteomics analysis. It inherits from the MS::Peptide class. It is intended to hold methods more likely to be useful for complete protein sequences, but this distinction is purely semantic: the methods contained here can sometimes be usefully applied to partial peptide sequences as well. At some point these methods may be moved into the MS::Peptide class, leaving this class as a simple stub for MS::Peptide, but any such change will be backward-compatible.
All methods of the class can also be used as functions on simple scalar strings. This can improve performance in some situations where a large number of protein (or peptide) sequences are processed. The only method/function that produces a different output when called as a method vs function is digest(), as detailed in its documentation below.
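The dual method/function calling convention described above can be illustrated with a common Perl idiom: inspect the first argument and treat it as an object if it is a reference, or as a plain sequence string otherwise. This is only a sketch under assumptions; `My::Seq`, `seq` and `n_res` are hypothetical names and do not reflect how MS::Protein is implemented internally.

```perl
use strict;
use warnings;

{
    package My::Seq;    # hypothetical stand-in class, not part of MS

    sub new { my ($class, $seq) = @_; return bless { seq => $seq }, $class }
    sub seq { return $_[0]{seq} }

    # Callable as $obj->n_res or as My::Seq::n_res('ACDEF')
    sub n_res {
        my ($arg) = @_;
        # A reference means we were called as a method on an object;
        # anything else is treated as a bare sequence string
        my $seq = ref $arg ? $arg->seq : $arg;
        return length $seq;
    }
}

my $obj = My::Seq->new('ACDEF');
print $obj->n_res, "\n";               # method call, prints 5
print My::Seq::n_res('ACDEF'), "\n";   # function call, prints 5
```

Skipping object construction in this way is what makes the function form cheaper when many sequences are processed in a loop.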
All methods of the MS::Peptide class, including the constructor, are shared. Methods specific to MS::Protein are:
    use MS::CV qw/:MS/;

    my @peptides = $pro->digest(
        enzymes => [ MS_TRYPSIN, ],
        missed  => 1,
        min_len => 6,
    );
Performs an in silico hydrolytic cleavage on a protein sequence based on the supplied parameters. When called as a method, returns an array of MS::Peptide objects representing digested peptides. When called as a function, returns an array of strings representing digested peptides. Available options include:
enzymes - a reference to an array of CV terms representing cleavage enzymes. See details below on finding valid IDs to use. Required.
missed - the number of allowable missed cleavages. All possible valid peptides satisfying this criterion will be reported. Default: 0.
min_len - the minimum length of peptide to be returned. Default: 1.
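To make the interaction of the missed and min_len options concrete, here is a minimal standalone sketch of an in silico digest. It is not the MS::Protein implementation: `digest_sketch` is a hypothetical helper, and the hard-coded trypsin-like rule (cut after K or R unless followed by P) is an assumption standing in for the patterns the module takes from the PSI-MS CV.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating missed-cleavage enumeration
sub digest_sketch {
    my ($seq, %args) = @_;
    my $re      = $args{regex}   // qr/(?<=[KR])(?!P)/;  # assumed trypsin-like rule
    my $missed  = $args{missed}  // 0;
    my $min_len = $args{min_len} // 1;

    # Split into fully cleaved fragments at every zero-width cut site
    my @frags = split $re, $seq;

    my @peptides;
    for my $i (0 .. $#frags) {
        # Join each fragment with up to $missed following fragments
        for my $j ($i .. $i + $missed) {
            last if $j > $#frags;
            my $pep = join '', @frags[$i .. $j];
            push @peptides, $pep if length($pep) >= $min_len;
        }
    }
    return @peptides;
}

my @peps = digest_sketch('AAKRPGGKLLR', missed => 1, min_len => 2);
print join(',', @peps), "\n";   # prints AAK,AAKRPGGK,RPGGK,RPGGKLLR,LLR
```

Note that the R before P is not a cut site under this rule, and that each peptide spanning one uncleaved site is reported alongside the fully cleaved fragments.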
The method requires that cleavage enzymes be specified by their PSI-MS CV terms, because the regex patterns used for cleavage are also extracted from the PSI-MS CV. The easiest way to do this is to use the constants exported by MS::CV. A full list of available terms can be printed using:
    use MS::CV;
    MS::CV::print_tree('MS');
Then look for the terms under the 'cleavage agent name' parent term. A (possibly out-of-date) list of available constants:
MS_TRYPSIN (Trypsin)
MS_TRYPSIN_P (Trypsin/P)
MS_ASP_N (Asp-N)
MS_ARG_C (Arg-C)
MS_LYS_C (Lys-C)
MS_LYS_C_P (Lys-C/P)
MS_LEUKOCYTE_ELASTASE (leukocyte elastase)
MS_GLUTAMYL_ENDOPEPTIDASE (glutamyl endopeptidase)
MS_CNBR (CNBr)
MS_PROLINE_ENDOPEPTIDASE (proline endopeptidase)
MS_2_IODOBENZOATE (2-iodobenzoate)
MS_V8_DE (V8-DE)
MS_FORMIC_ACID (Formic_acid)
MS_CHYMOTRYPSIN (Chymotrypsin)
MS_ASP_N_AMBIC (Asp-N_ambic)
MS_PEPSINA (PepsinA)
MS_V8_E (V8-E)
MS_TRYPCHYMO (TrypChymo)
    my $pi = $pro->isoelectric_point;
    my $pi = pI( 'ACDEF' );
Returns the isoelectric point of the protein (the pH at which the net charge is expected to be zero). The pKa values used are based on those of the ProMoST webserver (https://dx.doi.org/10.1007%2F978-1-60327-834-8_21).
    my $mw = $pro->molecular_weight;
    my $mw = $pro->mw('mono');       # monoisotopic mass
    my $mw = $pro->mw('average');    # average mass
    my $mw = mw( 'ACDEF', 'mono' );
Returns the neutral molecular weight of the protein. Takes an optional argument specifying the type of mass to use (mono for monoisotopic or average for average mass).
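The difference between the two mass types can be sketched as follows: sum the per-residue masses and add one water for the free termini. This is an illustration only; `mw_sketch` is a hypothetical helper, the mass table holds commonly tabulated values (average masses rounded to two decimals) that may differ slightly from those the module uses, and the mass type is required here because the module's default is not stated above.

```perl
use strict;
use warnings;

# Residue masses in Da as [monoisotopic, average]; approximate tabulated values
my %RES = (
    G => [ 57.02146,  57.05], A => [ 71.03711,  71.08],
    S => [ 87.03203,  87.08], P => [ 97.05276,  97.12],
    V => [ 99.06841,  99.13], T => [101.04768, 101.10],
    C => [103.00919, 103.14], L => [113.08406, 113.16],
    I => [113.08406, 113.16], N => [114.04293, 114.10],
    D => [115.02694, 115.09], Q => [128.05858, 128.13],
    K => [128.09496, 128.17], E => [129.04259, 129.12],
    M => [131.04049, 131.19], H => [137.05891, 137.14],
    F => [147.06841, 147.18], R => [156.10111, 156.19],
    Y => [163.06333, 163.18], W => [186.07931, 186.21],
);
my %WATER = (mono => 18.01056, average => 18.02);

sub mw_sketch {
    my ($seq, $type) = @_;
    die "type must be 'mono' or 'average'"
        unless defined $type && exists $WATER{$type};
    my $idx = $type eq 'mono' ? 0 : 1;

    # Neutral mass = sum of residue masses + one water (H on N-term, OH on C-term)
    my $sum = $WATER{$type};
    for my $aa (split //, $seq) {
        die "unknown residue $aa" unless exists $RES{$aa};
        $sum += $RES{$aa}[$idx];
    }
    return $sum;
}

printf "%.4f\n", mw_sketch('GG', 'mono');      # prints 132.0535
printf "%.2f\n", mw_sketch('GG', 'average');   # prints 132.12
```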
    my $ai = $pro->aliphatic_index;
    my $ai = $pro->ai;
    my $ai = ai( 'ACDEF' );
Returns the aliphatic index of the protein (the relative volume taken up by aliphatic side chains).
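For reference, the aliphatic index is conventionally computed with Ikai's formula, AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where each X is a mole percentage. The sketch below assumes that conventional definition; `ai_sketch` is a hypothetical helper and has not been checked against the module's own output.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the conventional (Ikai) aliphatic index
sub ai_sketch {
    my ($seq) = @_;
    my $len = length $seq;
    return 0 unless $len;    # arbitrary choice for an empty sequence

    my %n;
    $n{$_}++ for split //, $seq;

    # Mole percent of a given residue
    my $x = sub { 100 * ($n{ $_[0] } // 0) / $len };

    return $x->('A') + 2.9 * $x->('V') + 3.9 * ($x->('I') + $x->('L'));
}

printf "%.1f\n", ai_sketch('AVIL');   # prints 292.5
printf "%.1f\n", ai_sketch('GGGG');   # prints 0.0
```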
    my $ec = $pro->extinction_coefficient;
    my $ec = $pro->ec;
    my $ec = ec( 'ACDEF' );
Returns the extinction coefficient of the protein.
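The description above does not state the wavelength or the formula used. A widely used estimate for the molar extinction coefficient at 280 nm (in M^-1 cm^-1) is 5500 per Trp, 1490 per Tyr and 125 per cystine (disulfide bond). The sketch below assumes that estimate; `ec_sketch` and its second argument are hypothetical and may not match what the module actually computes.

```perl
use strict;
use warnings;

# Hypothetical helper: estimated molar extinction coefficient at 280 nm
sub ec_sketch {
    my ($seq, $n_cystines) = @_;
    $n_cystines //= 0;    # assumed: no disulfide bonds unless told otherwise

    # tr/// with an empty replacement list counts characters without modifying $seq
    my $n_trp = ($seq =~ tr/W//);
    my $n_tyr = ($seq =~ tr/Y//);

    return 5500 * $n_trp + 1490 * $n_tyr + 125 * $n_cystines;
}

print ec_sketch('AWYYC'), "\n";       # prints 8480
print ec_sketch('AWYYCC', 1), "\n";   # prints 8605
```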
    my $gravy = $pro->gravy;
    my $gravy = gravy( 'ACDEF' );
Returns the GRAVY (grand average of hydropathy) of a protein. Calculated based on the values of Kyte and Doolittle (https://doi.org/10.1016/0022-2836(82)90515-0).
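As a worked illustration, GRAVY is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length. The standalone sketch below uses the published scale; `gravy_sketch` is a hypothetical helper, and its handling of empty or non-standard input is an arbitrary choice rather than the module's behaviour.

```perl
use strict;
use warnings;

# Kyte-Doolittle hydropathy values for the 20 standard residues
my %KD = (
    A =>  1.8, R => -4.5, N => -3.5, D => -3.5, C =>  2.5,
    Q => -3.5, E => -3.5, G => -0.4, H => -3.2, I =>  4.5,
    L =>  3.8, K => -3.9, M =>  1.9, F =>  2.8, P => -1.6,
    S => -0.8, T => -0.7, W => -0.9, Y => -1.3, V =>  4.2,
);

# Hypothetical helper: mean hydropathy over the whole sequence
sub gravy_sketch {
    my ($seq) = @_;
    my $len = length $seq;
    return 0 unless $len;    # arbitrary choice for an empty sequence

    my $sum = 0;
    for my $aa (split //, $seq) {
        die "no hydropathy value for residue $aa" unless exists $KD{$aa};
        $sum += $KD{$aa};
    }
    return $sum / $len;
}

printf "%.2f\n", gravy_sketch('ACDEF');   # prints 0.02
```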
    my $z = $pro->charge_at_pH( 7.0 );
    my $z = charge_at_pH( 'ACDEF', 7.0 );
Returns the expected net charge of the protein at the given pH. The pKa values used are based on those of the ProMoST webserver (https://dx.doi.org/10.1007%2F978-1-60327-834-8_21).
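The relationship between net charge and isoelectric point can be sketched with the Henderson-Hasselbalch equation: each ionizable group contributes a fractional charge depending on pH and its pKa, and the pI is the pH where the sum crosses zero. The pKa values below are generic illustrative values, not the ProMoST-derived values the module uses, and `charge_sketch` and `pI_sketch` are hypothetical helpers, so results will differ from the module's.

```perl
use strict;
use warnings;

# Illustrative pKa values only; NOT the ProMoST-derived values used by the module
my %PKA_POS = (Nterm => 8.6, K => 10.8, R => 12.5, H => 6.5);
my %PKA_NEG = (Cterm => 3.6, D => 3.9, E => 4.1, C => 8.5, Y => 10.1);

sub charge_sketch {
    my ($seq, $pH) = @_;
    my %n = (Nterm => 1, Cterm => 1);    # one free terminus of each kind
    $n{$_}++ for split //, $seq;

    my $z = 0;
    # Basic groups: +1 times the protonated fraction
    $z += ($n{$_} // 0) / (1 + 10 ** ($pH - $PKA_POS{$_})) for keys %PKA_POS;
    # Acidic groups: -1 times the deprotonated fraction
    $z -= ($n{$_} // 0) / (1 + 10 ** ($PKA_NEG{$_} - $pH)) for keys %PKA_NEG;
    return $z;
}

sub pI_sketch {
    my ($seq) = @_;
    my ($lo, $hi) = (0, 14);
    # Net charge decreases monotonically with pH, so bisection converges
    while ($hi - $lo > 1e-4) {
        my $mid = ($lo + $hi) / 2;
        if (charge_sketch($seq, $mid) > 0) { $lo = $mid } else { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

printf "%.2f\n", pI_sketch('G');   # prints 6.10 (midpoint of the two terminal pKa values)
```

Bisection is used here only because it is simple and robust; any root-finding method would do.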
    my $n_res   = $pro->n_residues;
    my $n_res   = n_residues( 'ACDEF' );
    my $n_atoms = $pro->n_atoms;
    my $n_atoms = n_atoms( 'ACDEF' );
Returns a hash reference where the keys are atom or residue names, respectively, and the values are the counts of those units in the protein.
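The shape of the returned data can be illustrated with a standalone sketch. Residue counts are a simple tally; atom counts sum the elemental composition of each residue plus one water for the termini. The helper names are hypothetical, and the composition table is deliberately partial (four residues only) to keep the example short.

```perl
use strict;
use warnings;

# Residue compositions (amino acid minus water); partial table for illustration
my %FORMULA = (
    G => { C => 2, H => 3, N => 1, O => 1 },
    A => { C => 3, H => 5, N => 1, O => 1 },
    S => { C => 3, H => 5, N => 1, O => 2 },
    C => { C => 3, H => 5, N => 1, O => 1, S => 1 },
);

sub n_residues_sketch {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, $seq;
    return \%count;
}

sub n_atoms_sketch {
    my ($seq) = @_;
    my %atoms = (H => 2, O => 1);    # terminal water
    for my $aa (split //, $seq) {
        my $f = $FORMULA{$aa} or die "no formula for residue $aa";
        $atoms{$_} += $f->{$_} for keys %$f;
    }
    return \%atoms;
}

my $res = n_residues_sketch('GAS');
print join("\t", $_, $res->{$_}), "\n" for sort keys %$res;

my $atoms = n_atoms_sketch('GAS');    # Gly-Ala-Ser is C8 H15 N3 O5
print join("\t", $_, $atoms->{$_}), "\n" for sort keys %$atoms;
```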
The API is in alpha stage and is not guaranteed to be stable.
Please report bugs or feature requests through the issue tracker at https://github.com/jvolkening/p5-MS/issues.
InSilicoSpectro
Jeremy Volkening <jdv@base2bio.com>
Copyright 2015-2019 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.