NAME

MS::Reader::MzIdentML - A simple but complete mzIdentML parser

SYNOPSIS

    use MS::Reader::MzIdentML;

    my $idents = MS::Reader::MzIdentML->new('idents.mzIdentML');

    # spectrum/peptide-level results
    while (my $result = $idents->next_spectrum_result) {

        # result is an MS::Reader::MzIdentML::SpectrumIdentificationResult
        # object

    }

    # protein-level results
    while (my $grp = $idents->next_protein_group) {

        # result is an MS::Reader::MzIdentML::ProteinAmbiguityGroup
        # object

    }

    # multi-analysis file
    my $n = $idents->n_ident_lists;
    for (0..$n-1) {
        $idents->goto_ident_list($_);
        while (my $result = $idents->next_spectrum_result) {

            # result is an MS::Reader::MzIdentML::SpectrumIdentificationResult
            # object

        }
    }

DESCRIPTION

MS::Reader::MzIdentML is a parser for the HUPO PSI standard mzIdentML format for mass spectrometry search results. It aims to provide complete access to the data contents while not being overburdened by detailed class infrastructure. Convenience methods are provided for accessing commonly used data. Users who want to extract data not accessible through the available methods should examine the data structure of the parsed object. The dump() method of MS::Reader::XML, from which this class inherits, provides an easy method of doing so.

Currently this module is only semi-complete. The parsing routines are functional, but there is a lack of direct access to much of the data, requiring traversal of the underlying data structure. Hopefully this situation will improve in the future.

INHERITANCE

MS::Reader::MzIdentML is a subclass of MS::Reader::XML, which in turn inherits from MS::Reader, and so inherits the methods of both parent classes. Please see the documentation for those classes for details of available methods not described below.

METHODS

new

    my $idents = MS::Reader::MzIdentML->new( $fn,
        use_cache => 0,
        paranoid  => 0,
    );

Takes an input filename (required) and optional argument hash and returns an MS::Reader::MzIdentML object. This constructor is inherited directly from MS::Reader. Available options include:

  • use_cache — cache fetched records in memory for repeat access (default: FALSE)

  • paranoid — when loading index from disk, recalculates MD5 checksum each time to make sure raw file hasn't changed. This adds (typically) a few seconds to load times. By default, only file size and mtime are checked.

next_spectrum_result

    while (my $r = $idents->next_spectrum_result) {
        # do something
    }

Returns an MS::Reader::MzIdentML::SpectrumIdentificationResult object representing the next spectrum query in the file, or undef if the end of records has been reached. Typically used to iterate over each search query in the run.

fetch_spectrum_result

    my $r = $idents->fetch_spectrum_result($idx);

Takes a single argument (zero-based result index) and returns an MS::Reader::MzIdentML::SpectrumIdentificationResult object representing the result at that index. Throws an exception if the index is out of range.
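
Because an out-of-range index raises an exception, callers that probe indices may want to trap it. The following is a minimal sketch, not part of the module: try_fetch_result is a hypothetical helper name, and it only assumes that the reader object provides fetch_spectrum_result as described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by MS::Reader::MzIdentML): returns the
# result at $idx, or nothing if the reader throws (e.g. index out of range).
sub try_fetch_result {
    my ($reader, $idx) = @_;

    # eval traps the exception; on failure it returns undef and sets $@
    my $result = eval { $reader->fetch_spectrum_result($idx) };
    unless (defined $result) {
        warn "Could not fetch result $idx: $@" if $@;
        return;
    }
    return $result;
}

# Usage sketch: runs only when an mzIdentML filename is given on the
# command line.
if (@ARGV) {
    require MS::Reader::MzIdentML;
    my $idents = MS::Reader::MzIdentML->new($ARGV[0]);
    my $r = try_fetch_result($idents, 0);
    print defined $r ? "fetched result 0\n" : "no result at index 0\n";
}
```

The helper deliberately leaves the decision about how to react to a missing result to the caller rather than dying itself.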

next_protein_group

    while (my $g = $idents->next_protein_group) {
        # do something
    }

Returns an MS::Reader::MzIdentML::ProteinAmbiguityGroup object representing the next protein group result in the file, or undef if the end of records has been reached. Typically used to iterate over each protein group in the run.

fetch_protein_group

    my $g = $idents->fetch_protein_group($idx);

Takes a single argument (zero-based result index) and returns an MS::Reader::MzIdentML::ProteinAmbiguityGroup object representing the protein group at that index. Throws an exception if the index is out of range.

goto_ident_list

    $idents->goto_ident_list($idx);

Takes a single argument (zero-based list index) and sets the current spectrum result list to that index (for subsequent calls to next_spectrum_result).

n_ident_lists

    my $n = $idents->n_ident_lists;

Returns the number of spectrum identification lists in the file.
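
As an illustration of how n_ident_lists and goto_ident_list combine with next_spectrum_result, the sketch below counts the spectrum results in each identification list. count_results_per_list is a hypothetical helper, not part of the module, and it assumes that goto_ident_list positions the iterator at the start of the selected list.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by MS::Reader::MzIdentML): returns one
# count per spectrum identification list, in list order.
sub count_results_per_list {
    my ($reader) = @_;
    my @counts;
    for my $i (0 .. $reader->n_ident_lists() - 1) {
        # Assumption: this starts iteration at the beginning of list $i
        $reader->goto_ident_list($i);
        my $n = 0;
        $n++ while $reader->next_spectrum_result;
        push @counts, $n;
    }
    return @counts;
}

# Usage sketch: runs only when an mzIdentML filename is given on the
# command line.
if (@ARGV) {
    require MS::Reader::MzIdentML;
    my $idents = MS::Reader::MzIdentML->new($ARGV[0]);
    my @counts = count_results_per_list($idents);
    print "list $_: $counts[$_] results\n" for 0 .. $#counts;
}
```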

fetch_dbsequence_by_id

    my $seq = $idents->fetch_dbsequence_by_id( $seq_id );

Given a DBSequence element ID, returns the corresponding MS::Reader::MzIdentML::DBSequence object.

fetch_peptide_by_id

    my $pep = $idents->fetch_peptide_by_id( $pep_id );

Given a Peptide element ID, returns the corresponding MS::Reader::MzIdentML::Peptide object.

fetch_peptideevidence_by_id

    my $pe = $idents->fetch_peptideevidence_by_id( $pe_id );

Given a PeptideEvidence element ID, returns the corresponding MS::Reader::MzIdentML::PeptideEvidence object.

raw_file

    my $fn = $idents->raw_file($id);

Takes a single argument (ID of raw source) and returns the path on disk to the raw file (as recorded in the mzIdentML).

CAVEATS AND BUGS

The API is in alpha stage and is not guaranteed to be stable.

Please report bugs or feature requests through the issue tracker at https://github.com/jvolkening/p5-MS/issues.

AUTHOR

Jeremy Volkening <jdv@base2bio.com>

COPYRIGHT AND LICENSE

Copyright 2015-2016 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.