MS::Protein - A class representing protein species for proteomic analysis


    use MS::Protein;
    use BioX::Seq::Stream;

    my $p = BioX::Seq::Stream->new('some_proteome.fasta');
    my $seq = $p->next_seq;

    my $pro = MS::Protein->new($seq);

    say "pI:", $pro->isoelectric_point; # or $pro->pI;
    say "MW:", $pro->molecular_weight;  # or $pro->mw;
    say "hydropathy:", $pro->gravy;
    say "AI:", $pro->aliphatic_index;   # or $pro->ai;
    say "EC:", $pro->extinction_coefficient;  # or $pro->ec;

    my $z = $pro->charge_at_pH( 7.0 );

    my $atoms = $pro->n_atoms;
    say "Atom counts:";
    for (keys %$atoms) {
        say join "\t", $_, $atoms->{$_};
    }

    my $res = $pro->n_residues;
    say "Residue counts:";
    for (keys %$res) {
        say join "\t", $_, $res->{$_};
    }

    use MS::CV qw/:MS/; # use enzyme constants

    my @peptides = $pro->digest(
        enzymes => [MS_TRYPSIN],
        missed  => 1,
        min_len => 6,
    );

    ## All methods can also be used as functions, e.g.

    my $pi = pI( 'AAPLSYAMK' );
    my $z  = charge_at_pH( 'AAPLSYAMK', 7.0 );


MS::Protein is a class representing protein species for use in proteomics analysis. It inherits from the MS::Peptide class. It is intended to hold methods more likely to be useful for complete protein sequences, but this distinction is entirely semantic. There may be times when the methods contained here can be usefully applied to partial peptide sequences as well. At some point these methods may be moved into the MS::Peptide class and this class may become a simple stub for MS::Peptide, but the change will be backward-compatible.

All methods of the class can also be used as functions on simple scalar strings. This can improve performance in some situations where a large number of protein (or peptide) sequences are processed. The only method/function that produces a different output when called as a method vs function is digest(), as detailed in its documentation below.
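One common way to support both calling styles in Perl is to check whether the first argument is an object or a plain string. The following is a minimal sketch of that pattern only. The package My::Seq, its seq field and the seq_length sub are hypothetical names invented for illustration, and nothing here is taken from the MS::Protein source.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; NOT the real MS::Protein internals.
package My::Seq;

sub new {
    my ($class, $seq) = @_;
    return bless { seq => $seq }, $class;
}

# Callable as $obj->seq_length or as My::Seq::seq_length('ACDEF'):
# the first argument is either a blessed object or a plain string.
sub seq_length {
    my ($self_or_str) = @_;
    my $seq = ref $self_or_str ? $self_or_str->{seq} : $self_or_str;
    return length $seq;
}

package main;

my $as_method   = My::Seq->new('ACDEF')->seq_length;
my $as_function = My::Seq::seq_length('ACDEF');
print "$as_method $as_function\n";
```

Skipping object construction in the function form is the kind of saving that can matter when many sequences are processed.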


All methods of the MS::Peptide class, including the constructor, are shared. Methods specific to MS::Protein are:


digest

    use MS::CV qw/:MS/;
    my @peptides = $pro->digest(
        enzymes => [MS_TRYPSIN],
        missed  => 1,
        min_len => 6,
    );

Performs an in silico hydrolytic cleavage on a protein sequence based on the supplied parameters. When called as a method, returns an array of MS::Peptide objects representing digested peptides. When called as a function, returns an array of strings representing digested peptides. Available options include:

  • enzymes — a reference to an array of CV terms representing cleavage enzymes. See details below on finding valid IDs to use. Required.

  • missed — the number of allowable missed cleavages. All possible valid peptides satisfying this criterion will be reported. Default: 0.

  • min_len — the minimum length of peptide to be returned. Default: 1.
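To illustrate how the missed and min_len options interact, here is a minimal pure-Perl sketch of a tryptic digest. It is not the module's implementation: the sub name digest_sketch is hypothetical, and the cleavage rule is hard-coded (cut after K or R unless followed by P) rather than taken from the psi-ms CV.

```perl
use strict;
use warnings;

# Illustrative digest; the trypsin rule below is an assumption of this sketch.
sub digest_sketch {
    my ($seq, %opt) = @_;
    my $missed  = $opt{missed}  // 0;
    my $min_len = $opt{min_len} // 1;

    # Split at zero-width positions after K or R that are not followed by P.
    my @frags = split /(?<=[KR])(?!P)/, $seq;

    my @peptides;
    for my $i (0 .. $#frags) {
        # Join each fragment with up to $missed following fragments.
        for my $j ($i .. $i + $missed) {
            last if $j > $#frags;
            my $pep = join '', @frags[$i .. $j];
            push @peptides, $pep if length($pep) >= $min_len;
        }
    }
    return @peptides;
}

my @peps = digest_sketch('AAKPLSRYAMKGG', missed => 1, min_len => 3);
print join(',', @peps), "\n";   # prints AAKPLSR,AAKPLSRYAMK,YAMK,YAMKGG
```

In this example the K followed by P is not cleaved, and the C-terminal fragment GG is dropped on its own because it is shorter than min_len, although it still appears as part of the missed-cleavage peptide YAMKGG.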

Enzyme IDs

The method requires that cleavage enzymes be specified by their psi-ms CV terms, because the regex patterns used are also extracted from the psi-ms CV. The easiest way to do this is to use the constants exported by MS::CV. The full list of available constants can be exported using:

    use MS::CV;

The relevant terms are those under the 'cleavage agent name' parent term. A (possibly out-of-date) list of available constants:

  • MS_TRYPSIN (Trypsin)

  • MS_TRYPSIN_P (Trypsin/P)

  • MS_ASP_N (Asp-N)

  • MS_ARG_C (Arg-C)

  • MS_LYS_C (Lys-C)

  • MS_LYS_C_P (Lys-C/P)

  • MS_LEUKOCYTE_ELASTASE (leukocyte elastase)

  • MS_GLUTAMYL_ENDOPEPTIDASE (glutamyl endopeptidase)

  • MS_CNBR (CNBr)

  • MS_PROLINE_ENDOPEPTIDASE (proline endopeptidase)

  • MS_2_IODOBENZOATE (2-iodobenzoate)

  • MS_V8_DE (V8-DE)

  • MS_FORMIC_ACID (Formic_acid)

  • MS_CHYMOTRYPSIN (Chymotrypsin)

  • MS_ASP_N_AMBIC (Asp-N_ambic)

  • MS_PEPSINA (PepsinA)

  • MS_V8_E (V8-E)

  • MS_TRYPCHYMO (TrypChymo)

isoelectric_point, pI

    my $pi = $pro->isoelectric_point;
    my $pi = pI( 'ACDEF' );

Returns the isoelectric point of the protein (the pH at which the net charge is expected to be zero). The pKa values used are based on those of the ProMoST webserver.

molecular_weight, mw

    my $mw = $pro->molecular_weight;
    my $mw = $pro->mw('mono');     # monoisotopic mass
    my $mw = $pro->mw('average');  # average mass
    my $mw = mw( 'ACDEF', 'mono' );

Returns the neutral molecular weight of the protein. Takes an optional argument specifying the type of mass to use (mono for monoisotopic or average for average mass).
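The difference between the two mass types can be shown by summing atomic masses over the elemental composition of the sequence. The sketch below is an illustration under stated assumptions, not the module's code: mw_sketch is a hypothetical name, the atomic masses are standard rounded values, and the default of 'mono' is chosen for the sketch only, since the default used by MS::Protein is not stated here.

```perl
use strict;
use warnings;

# Elemental composition of each amino acid residue, in C, H, N, O, S order.
my %COMP = (
    G => [2, 3, 1, 1, 0],  A => [3, 5, 1, 1, 0],  S => [3, 5, 1, 2, 0],
    P => [5, 7, 1, 1, 0],  V => [5, 9, 1, 1, 0],  T => [4, 7, 1, 2, 0],
    C => [3, 5, 1, 1, 1],  L => [6, 11, 1, 1, 0], I => [6, 11, 1, 1, 0],
    N => [4, 6, 2, 2, 0],  D => [4, 5, 1, 3, 0],  Q => [5, 8, 2, 2, 0],
    K => [6, 12, 2, 1, 0], E => [5, 7, 1, 3, 0],  M => [5, 9, 1, 1, 1],
    H => [6, 7, 3, 1, 0],  F => [9, 9, 1, 1, 0],  R => [6, 12, 4, 1, 0],
    Y => [9, 9, 1, 2, 0],  W => [11, 10, 2, 1, 0],
);

# Atomic masses in the same C, H, N, O, S order.
my %MASS = (
    mono    => [12.0,   1.00782503, 14.0030740, 15.9949146, 31.9720707],
    average => [12.011, 1.008,      14.007,     15.999,     32.06],
);

sub mw_sketch {
    my ($seq, $type) = @_;
    $type //= 'mono';    # default chosen for this sketch only
    my $m = $MASS{$type} or die "unknown mass type '$type'\n";

    # Start with one water (H2O) to account for the free termini.
    my @atoms = (0, 2, 0, 1, 0);
    for my $aa (split //, uc $seq) {
        my $c = $COMP{$aa} or die "unknown residue '$aa'\n";
        $atoms[$_] += $c->[$_] for 0 .. 4;
    }

    my $mw = 0;
    $mw += $atoms[$_] * $m->[$_] for 0 .. 4;
    return $mw;
}

printf "mono:    %.4f\n", mw_sketch('ACDEF', 'mono');
printf "average: %.2f\n", mw_sketch('ACDEF', 'average');
```

The intermediate @atoms array is also an example of the kind of per-element tally that an atom count represents.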

aliphatic_index, ai

    my $ai = $pro->aliphatic_index;
    my $ai = $pro->ai;
    my $ai = ai( 'ACDEF' );

Returns the aliphatic index of the protein (the relative volume taken up by aliphatic side chains).
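As an illustration, the aliphatic index is commonly computed with the formula of Ikai (1980): X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below assumes that definition. Whether MS::Protein uses exactly these coefficients is an assumption, and aliphatic_index_sketch is a hypothetical name.

```perl
use strict;
use warnings;

# Aliphatic index after Ikai (1980); an illustrative sketch only.
sub aliphatic_index_sketch {
    my ($seq) = @_;
    my $len = length $seq or return 0;   # avoid dividing by zero

    # Count only the four aliphatic residues that enter the formula.
    my %n = map { $_ => 0 } qw/A V I L/;
    $n{$_}++ for grep { exists $n{$_} } split //, uc $seq;

    # Mole percent of a given residue.
    my $pct = sub { 100 * $n{ $_[0] } / $len };

    return $pct->('A')
         + 2.9 * $pct->('V')
         + 3.9 * ( $pct->('I') + $pct->('L') );
}

printf "%.1f\n", aliphatic_index_sketch('AVIL');   # prints 292.5
```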

extinction_coefficient, ec

    my $ec = $pro->extinction_coefficient;
    my $ec = $pro->ec;
    my $ec = ec( 'ACDEF' );

Returns the extinction coefficient of the protein.
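For orientation, a widely used estimate of the molar extinction coefficient at 280 nm sums 5500 per Trp, 1490 per Tyr and 125 per cystine (disulfide-bonded Cys pair). The sketch below assumes that scheme and assumes that all cysteines are paired unless told otherwise. Neither assumption is confirmed for MS::Protein, and extinction_sketch and its second argument are hypothetical.

```perl
use strict;
use warnings;

# Estimated extinction coefficient at 280 nm (M^-1 cm^-1); sketch only.
sub extinction_sketch {
    my ($seq, $reduced) = @_;

    # tr/// without a replacement list simply counts matching characters.
    my $n_w = ($seq =~ tr/W//);
    my $n_y = ($seq =~ tr/Y//);
    my $n_c = ($seq =~ tr/C//);

    # Assumption: every pair of Cys forms one cystine unless $reduced is true.
    my $n_cystine = $reduced ? 0 : int($n_c / 2);

    return 5500 * $n_w + 1490 * $n_y + 125 * $n_cystine;
}

print extinction_sketch('ACDEFWYC'),    "\n";   # prints 7115
print extinction_sketch('ACDEFWYC', 1), "\n";   # prints 6990
```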


gravy

    my $gravy = $pro->gravy;
    my $gravy = gravy( 'ACDEF' );

Returns the GRAVY (grand average of hydropathy) of a protein. Calculated based on the values of Kyte and Doolittle.
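The calculation is the sum of per-residue hydropathy values divided by the sequence length. A minimal sketch using the published Kyte-Doolittle scale follows. The sub name gravy_sketch is hypothetical, and dying on unknown residues is a choice made for this sketch, not documented module behavior.

```perl
use strict;
use warnings;

# Kyte-Doolittle hydropathy values.
my %KD = (
    A =>  1.8, R => -4.5, N => -3.5, D => -3.5, C =>  2.5,
    Q => -3.5, E => -3.5, G => -0.4, H => -3.2, I =>  4.5,
    L =>  3.8, K => -3.9, M =>  1.9, F =>  2.8, P => -1.6,
    S => -0.8, T => -0.7, W => -0.9, Y => -1.3, V =>  4.2,
);

sub gravy_sketch {
    my ($seq) = @_;
    my @res = split //, uc $seq;
    return 0 unless @res;

    my $sum = 0;
    for my $aa (@res) {
        die "no hydropathy value for '$aa'\n" unless exists $KD{$aa};
        $sum += $KD{$aa};
    }
    return $sum / @res;    # mean hydropathy per residue
}

printf "%.2f\n", gravy_sketch('ACDEF');   # prints 0.02
```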


charge_at_pH

    my $z = $pro->charge_at_pH( 7.0 );
    my $z = charge_at_pH( 'ACDEF', 7.0 );

Returns the expected net charge of the protein at the given pH. The pKa values used are based on those of the ProMoST webserver.
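The general approach to such a calculation is to apply the Henderson-Hasselbalch equation to each ionizable group and sum the partial charges. The isoelectric point can then be found by searching for the pH at which that sum is zero. The sketch below shows this under loud assumptions: the pKa values are illustrative textbook-style numbers chosen for the example, not the ProMoST values, and charge_sketch and pI_sketch are hypothetical names.

```perl
use strict;
use warnings;

# Illustrative pKa values only; these are NOT the ProMoST values.
my %PKA_POS = (Nterm => 8.6, K => 10.8, R => 12.5, H => 6.5);
my %PKA_NEG = (Cterm => 3.6, D => 3.9, E => 4.1, C => 8.5, Y => 10.1);

sub charge_sketch {
    my ($seq, $pH) = @_;

    # One free N-terminus and C-terminus plus per-residue counts.
    my %n = (Nterm => 1, Cterm => 1);
    $n{$_}++ for split //, uc $seq;

    my $z = 0;
    # Basic groups carry +1 when protonated.
    for my $g (keys %PKA_POS) {
        $z += ($n{$g} // 0) / (1 + 10 ** ($pH - $PKA_POS{$g}));
    }
    # Acidic groups carry -1 when deprotonated.
    for my $g (keys %PKA_NEG) {
        $z -= ($n{$g} // 0) / (1 + 10 ** ($PKA_NEG{$g} - $pH));
    }
    return $z;
}

sub pI_sketch {
    my ($seq) = @_;
    my ($lo, $hi) = (0, 14);
    # Net charge falls monotonically with pH, so bisection converges.
    while ($hi - $lo > 1e-4) {
        my $mid = ($lo + $hi) / 2;
        if (charge_sketch($seq, $mid) > 0) { $lo = $mid }
        else                               { $hi = $mid }
    }
    return ($lo + $hi) / 2;
}

printf "charge at pH 7.0: %.2f\n", charge_sketch('ACDEF', 7.0);
printf "estimated pI:     %.2f\n", pI_sketch('ACDEF');
```

Results from this sketch will differ from the module's output wherever the pKa sets differ.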



n_residues, n_atoms

    my $n_res   = $pro->n_residues;
    my $n_res   = n_residues( 'ACDEF' );
    my $n_atoms = $pro->n_atoms;
    my $n_atoms = n_atoms( 'ACDEF' );

Returns a hash reference where the keys are atom or residue names, respectively, and the values are the counts of those units in the protein.
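The shape of the returned structure can be illustrated with a small residue-counting sketch. The sub count_residues_sketch is a hypothetical stand-in and not the module's code. Because Perl hash keys come back in no particular order, the example sorts them to get stable output.

```perl
use strict;
use warnings;

# Build a hash reference of residue => count; illustrative only.
sub count_residues_sketch {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;
    return \%count;
}

my $res = count_residues_sketch('AAPLSYAMK');

# Sort the keys so that the report is reproducible between runs.
for my $aa (sort keys %$res) {
    print join("\t", $aa, $res->{$aa}), "\n";
}
```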


The API is in alpha stage and is not guaranteed to be stable.

Please report bugs or feature requests through the project's issue tracker.



Jeremy Volkening <>


Copyright 2015-2019 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <>.