The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

PepXML::Parser - A Perl parser for the pepXML file format.

VERSION

Version 0.01

SYNOPSIS

Quick summary of what the module does.

    my $parser = PepXML::Parser->new();

    my $pepxml = $parser->parse("sample.pepxml");
    
    my %msms = $pepxml->get_msms_pipeline_analysis();
    
    my @proteins = $pepxml->get_proteins();
    
    my @peptides = $pepxml->get_unique_peptides();
    
    ...

DESCRIPTION

pepXML is an open data format developed at the SPC/Institute for Systems biology for the storage, exchange, and processing of peptide sequence assignments of MS/MS scans. pepXML is intended to provide a common data output format for many different MS/MS search engines and subsequent peptide-level analyses. Several search engines already have native support for outputting pepXML and converters are available to transform output files to pepXML.

Data Structure & Access

Once the file is parsed, a deeply nested data strcuture is organized in memory, with all the information stored inside the the top level class called PepXML::PepXMLFile. Use the public methods described bellow in order to acces the data.

    PepXML::PepXMLFile  {
        Parents       Moose::Object
        public methods (17) : get_enzymes, get_hits, get_modifications, get_msms_pipeline_analysis, get_parameters, get_peptides, get_proteins, 
        get_run_summary, get_search_summary, get_unique_peptides, get_unique_proteins, meta, msms_pipeline_analysis, msms_run_summary, 
        sample_enzyme, search_hit, search_summary
        private methods (0)
        internals: {
            msms_pipeline_analysis   PepXML::MsmsPipelineAnalysis,
            msms_run_summary         PepXML::RunSummary,
            sample_enzyme            [
                [0] PepXML::Enzyme
            ],
            search_hit               [
                [0]  PepXML::SearchHit,
                [1]  PepXML::SearchHit,
                [2]  PepXML::SearchHit,
                [3]  PepXML::SearchHit,
                [4]  PepXML::SearchHit,
                [5]  PepXML::SearchHit,
                ...
            ],
            search_summary           PepXML::SearchSummary
        }

Public Methods

get_msms_pipeline_analysis

Return type: hash.

    my %msms = $pepxml->get_msms_pipeline_analysis();

Data access:

    {
    date                   "2015-04-09T18:54:11",
    summary_xml            "/Adult_Adrenalgland_Gel_Elite_49_f01.pep.xml",
    xmlns                  "http://regis-web.systemsbiology.net/pepXML",
    xmlns_schemaLocation   "http://sashimi.sourceforge.net/schema_revision/pepXML/pepXML_v117.xsd",
    xmlns_xsi              "http://www.w3.org/2001/XMLSchema-instance"
    }
    
    

get_enzymes

Return tupe: array of PepXML::Enzyme objects.

    my @enzymes = $pepxml->get_enzymes();

Data access:

    [0] PepXML::Enzyme  {
        Parents       Moose::Object
        public methods (5) : cut, meta, name, no_cut, sense
        private methods (0)
        internals: {
            cut      "KR",
            name     "Trypsin",
            no_cut   "P",
            sense    "C"
        }
    }

get_run_summary()

Return type: PepXML::RunSummary Object.

    my $summary = $pepxml->get_run_summary();
    

Data access:

    PepXML::RunSummary  {
        Parents       Moose::Object
        public methods (6) : base_name, meta, msManufacturer, msModel, raw_data, raw_data_type
        private methods (0)
        internals: {
            base_name        "Sample",
            msManufacturer   "Thermo Scientific",
            msModel          "LTQ Orbitrap Elite",
            raw_data         ".mzXML",
            raw_data_type    "raw"
        }
    }
    
    

get_search_summary()

Return Type: PepXML::SearchSummary Object.

PepXML::SearchSummary is a complex object, some internal methods are accessors to other objects, like the aminoacid_modification for example.

    my $search_summary = $pepxml->get_search_summary();

Data access:

    PepXML::SearchSummary  {
        Parents       Moose::Object
        public methods (11) : aminoacid_modification, base_name, enzymatic_search_constraint, fragment_mass_type, meta, parameter, precursor_mass_type, search_database, search_engine, search_engine_version, search_id
        private methods (0)
        internals: {
            aminoacid_modification        [
                [0] PepXML::AAModification,
                [1] PepXML::AAModification
            ],
            base_name                     "/Adult_Adrenalgland_Gel_Elite_49_f01",
            enzymatic_search_constraint   PepXML::EnzSearchConstraint,
            fragment_mass_type            "monoisotopic",
            parameter                     [
                [0]  PepXML::Parameter,
                [1]  PepXML::Parameter,
                [2]  PepXML::Parameter,
                [3]  PepXML::Parameter,
                [4]  PepXML::Parameter,
                [5]  PepXML::Parameter,
                ...
            ],
            precursor_mass_type           "monoisotopic",
            search_database               PepXML::SearchDatabase,
            search_engine                 "Comet",
            search_engine_version         "2015.01 rev. 1",
            search_id                     1
        }
    }
       
        

get_modifications()

Return Type: array of PepXML::AAModification objects.

    my @mods = $pepxml->get_modifications();

Data access:

    [0] PepXML::AAModification  {
        Parents       Moose::Object
        public methods (6) : aminoacid, mass, massdiff, meta, symbol, variable
        private methods (0)
        internals: {
            aminoacid   "M",
            mass        147.035385,
            massdiff    15.994900,
            symbol      "*",
            variable    "Y"
        }
    },
    [1] PepXML::AAModification  {
        Parents       Moose::Object
        public methods (6) : aminoacid, mass, massdiff, meta, symbol, variable
        private methods (0)
        internals: {
            aminoacid   "C",
            mass        160.030649,
            massdiff    57.021464,
            symbol      "",
            variable    "N"
        }
    }

get_parameters()

Return Type: array of PepXML::Parameter objects.

    my @params = $pepxml->get_parameters();
    

Data access:

    [0] PepXML::Parameter  {
        Parents       Moose::Object
        public methods (3) : meta, name, value
        private methods (0)
        internals: {
            name    "# comet_version ",
            value   2015.01
        }
    },
    [1] PepXML::Parameter  {
        Parents       Moose::Object
        public methods (3) : meta, name, value
        private methods (0)
        internals: {
            name    "activation_method",
            value   "ALL"
        }
    },
    ...
    
    

get_db_info()

Return type: PepXML::SearchDatabase object.

    my $db = $pepxml->get_db_info;

Data access:

    PepXML::SearchDatabase  {
        Parents       Moose::Object
        public methods (3) : local_path, meta, type
        private methods (0)
        internals: {
            local_path   "Ens78plusREV_plusPeps.fa",
            type         "AA"
        }
    }
    
    

get_hits()

Return type: array of PepXML::SearchHit objects.

    my @hits = $pepxml->get_hits();

Data access:

    [0] PepXML::SearchHit  {
        Parents       Moose::Object
        public methods (22) : assumed_charge, calc_neutral_pep_mass, end_scan, hit_rank, index, massdiff, meta, num_matched_ions, num_matched_peptides, num_missed_cleavages, num_tol_term, num_tot_proteins, peptide, peptide_next_aa, peptide_prev_aa, precursor_neutral_mass, protein, retention_time_sec, search_score, spectrum, start_scan, tot_num_ions
        private methods (0)
        internals: {
            assumed_charge           3,
            calc_neutral_pep_mass    1118.485333,
            end_scan                 517,
            hit_rank                 5,
            index                    9,
            massdiff                 0.005685,
            num_matched_ions         12,
            num_matched_peptides     3916,
            num_missed_cleavages     0,
            num_tol_term             2,
            num_tot_proteins         2,
            peptide                  "DSGHPGHAEGR",
            peptide_next_aa          "E",
            peptide_prev_aa          "R",
            precursor_neutral_mass   1118.491019,
            protein                  "ENSP00000374387",
            retention_time_sec       572.8,
            search_score             {
                deltacn       0.009,
                deltacnstar   0.000,
                expect        2.14E+01,
                sprank        46,
                spscore       172.1,
                xcorr         0.961
            },
            spectrum                 "Adult_Adrenalgland_Gel_Elite_49_f01.00517.00517.3",
            start_scan               517,
            tot_num_ions             40
        }
    }

get_proteins()

Return type: array

    my @proteins = $pepxml->get_proteins();

get_unique_proteins()

Return type: array

    my @proteins = $pepxml->get_unique_proteins();

get_peptides()

Return type: array

    my @peptides = $pepxml->get_peptides();

get_unique_peptides()

Return type: array

    my @peptides = $pepxml->get_unique_peptides();

AUTHOR

Felipe da Veiga Leprevost, <leprevost at cpan.org>

BUGS

Please report any bugs or feature requests to bug-pepxml-parser at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=PepXML-Parser. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc PepXML::Parser

You can also look for information at:

ACKNOWLEDGEMENTS

LICENSE AND COPYRIGHT

Copyright 2015 Felipe da Veiga Leprevost.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.