MS::Reader::MzIdentML - A simple but complete mzIdentML parser
```perl
use MS::Reader::MzIdentML;

my $idents = MS::Reader::MzIdentML->new('idents.mzIdentML');

# spectrum/peptide-level results
while (my $result = $idents->next_spectrum_result) {
    # $result is an MS::Reader::MzIdentML::SpectrumIdentificationResult
    # object
}

# protein-level results
while (my $grp = $idents->next_protein_group) {
    # $grp is an MS::Reader::MzIdentML::ProteinAmbiguityGroup
    # object
}

# multi-analysis file
my $n = $idents->n_ident_lists;
for (0..$n-1) {
    $idents->goto_ident_list($_);
    while (my $result = $idents->next_spectrum_result) {
        # $result is an MS::Reader::MzIdentML::SpectrumIdentificationResult
        # object
    }
}
```
MS::Reader::MzIdentML is a parser for the HUPO PSI standard mzIdentML format for mass spectrometry search results. It aims to provide complete access to the data contents while not being overburdened by detailed class infrastructure. Convenience methods are provided for accessing commonly used data. Users who want to extract data not accessible through the available methods should examine the data structure of the parsed object. The dump() method of MS::Reader::XML, from which this class inherits, provides an easy method of doing so.
Currently this module is only semi-complete. The parsing routines are functional, but much of the data is not yet exposed through accessor methods and must be reached by traversing the underlying data structure. Hopefully this situation will improve in the future.
MS::Reader::MzIdentML is a subclass of MS::Reader::XML, which in turn inherits from MS::Reader, and it inherits the methods of both parent classes. Please see the documentation for those classes for details of available methods not described below.
```perl
my $idents = MS::Reader::MzIdentML->new(
    $fn,
    use_cache => 0,
    paranoid  => 0,
);
```
Takes an input filename (required) and an optional hash of arguments, and returns an MS::Reader::MzIdentML object. This constructor is inherited directly from MS::Reader. Available options include:
use_cache: cache fetched records in memory for repeat access (default: FALSE)
paranoid: when loading the index from disk, recalculate the MD5 checksum each time to make sure the raw file hasn't changed. This typically adds a few seconds to load times. By default (FALSE), only the file size and mtime are checked.
```perl
while (my $r = $idents->next_spectrum_result) {
    # do something
}
```
Returns an MS::Reader::MzIdentML::SpectrumIdentificationResult object representing the next spectrum query in the file, or undef if the end of records has been reached. Typically used to iterate over each search query in the run.
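The iterator pattern above can be wrapped in a small helper when the same per-result processing is needed in several places. This is a minimal sketch: `for_each_spectrum_result` is a hypothetical helper, not part of MS::Reader::MzIdentML, and it relies only on `next_spectrum_result` returning undef once the current list is exhausted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::MzIdentML).
# Applies $callback to every remaining result in the current spectrum
# identification list and returns how many results were visited.
# The callback receives the result object and its zero-based position
# among the results visited by this call.
sub for_each_spectrum_result {
    my ($idents, $callback) = @_;
    my $n = 0;
    while (my $r = $idents->next_spectrum_result) {
        $callback->($r, $n);
        ++$n;
    }
    return $n;
}
```

Usage: `my $count = for_each_spectrum_result($idents, sub { my ($r, $i) = @_; ... });`. Note that this drains the iterator, so a second call on the same list will visit nothing unless the position is reset (for example with goto_ident_list).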
my $r = $idents->fetch_spectrum_result($idx);
Takes a single argument (zero-based result index) and returns an MS::Reader::MzIdentML::SpectrumIdentificationResult object representing the result at that index. Throws an exception if the index is out of range.
```perl
while (my $g = $idents->next_protein_group) {
    # do something
}
```
Returns an MS::Reader::MzIdentML::ProteinAmbiguityGroup object representing the next protein group result in the file, or undef if the end of records has been reached. Typically used to iterate over each protein group in the run.
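For small result files it can be convenient to read every protein group into an array at once. The sketch below uses only `next_protein_group`; `all_protein_groups` is a hypothetical helper name, not a method of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::MzIdentML).
# Reads all remaining protein ambiguity groups into a list.
# For large result sets, processing each group inside the while loop
# keeps memory use bounded instead.
sub all_protein_groups {
    my ($idents) = @_;
    my @groups;
    while (my $g = $idents->next_protein_group) {
        push @groups, $g;
    }
    return @groups;
}
```

Usage: `my @groups = all_protein_groups($idents); printf "%d groups\n", scalar @groups;`.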
my $g = $idents->fetch_protein_group($idx);
Takes a single argument (zero-based result index) and returns an MS::Reader::MzIdentML::ProteinAmbiguityGroup object representing the protein group at that index. Throws an exception if the index is out of range.
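Because both fetch_spectrum_result and fetch_protein_group throw an exception for an out-of-range index, a caller that would rather receive undef can trap the exception with `eval`. This is a sketch: `try_fetch` is a hypothetical helper, and the fetch method is passed by name so that one wrapper serves both methods.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::MzIdentML).
# Calls the named fetch method (e.g. 'fetch_spectrum_result' or
# 'fetch_protein_group') with $idx. If the call dies, returns undef and
# leaves the error message in $@ for the caller to inspect.
# Caveat: any other error raised during the fetch is also turned into
# undef, not just an out-of-range index.
sub try_fetch {
    my ($idents, $method, $idx) = @_;
    my $obj = eval { $idents->$method($idx) };
    return $obj;
}
```

Usage: `my $g = try_fetch($idents, 'fetch_protein_group', 10) or warn "no group 10: $@";`.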
$idents->goto_ident_list($idx);
Takes a single argument (zero-based list index) and sets the current spectrum result list to that index (for subsequent calls to next_spectrum_result).
my $n = $idents->n_ident_lists;
Returns the number of spectrum identification lists in the file.
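n_ident_lists and goto_ident_list combine to visit every list in a multi-analysis file, as in the SYNOPSIS. The sketch below counts the results in each list. It assumes, as the SYNOPSIS implies, that goto_ident_list positions next_spectrum_result at the start of the selected list; `results_per_list` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::MzIdentML).
# Returns a list of result counts, one per spectrum identification
# list, indexed by zero-based list number.
# Assumes goto_ident_list() resets iteration to the start of that list.
sub results_per_list {
    my ($idents) = @_;
    my @counts;
    for my $i (0 .. $idents->n_ident_lists() - 1) {
        $idents->goto_ident_list($i);
        my $n = 0;
        ++$n while $idents->next_spectrum_result;
        $counts[$i] = $n;
    }
    return @counts;
}
```

Usage: `my @counts = results_per_list($idents); print "list $_: $counts[$_]\n" for 0..$#counts;`.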
my $seq = $idents->fetch_dbsequence_by_id( $seq_id );
Given a DBSequence element ID, returns the corresponding MS::Reader::MzIdentML::DBSequence object.
my $pep = $idents->fetch_peptide_by_id( $pep_id );
Given a Peptide element ID, returns the corresponding MS::Reader::MzIdentML::Peptide object.
my $pe = $idents->fetch_peptideevidence_by_id( $pe_id );
Given a PeptideEvidence element ID, returns the corresponding MS::Reader::MzIdentML::PeptideEvidence object.
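The *_by_id methods resolve element IDs referenced elsewhere in the file. The sketch below resolves a batch of Peptide IDs with fetch_peptide_by_id. The behavior for an unknown ID is not documented here, so this hedged version treats both an undef return and an exception as "not found"; `resolve_peptides` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::MzIdentML).
# Resolves Peptide element IDs to objects. Returns two refs:
#   - a hash ref of ID => MS::Reader::MzIdentML::Peptide object
#   - an array ref of IDs that could not be resolved
# An ID counts as unresolved if the lookup returns undef or dies,
# since the documented behavior for unknown IDs is unspecified.
sub resolve_peptides {
    my ($idents, @ids) = @_;
    my (%found, @missing);
    for my $id (@ids) {
        my $pep = eval { $idents->fetch_peptide_by_id($id) };
        if (defined $pep) { $found{$id} = $pep }
        else              { push @missing, $id }
    }
    return (\%found, \@missing);
}
```

The same shape works for fetch_dbsequence_by_id and fetch_peptideevidence_by_id by swapping the method call.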
my $fn = $idents->raw_file($id);
Takes a single argument (the ID of a raw data source) and returns the path on disk to the raw file, as recorded in the mzIdentML file.
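Since raw_file reports the path as recorded in the file, that path may refer to the machine where the search was run rather than the local one. The sketch below checks which recorded paths are absent locally. The caller must supply the raw source IDs (no method for listing them is assumed), and `missing_raw_files` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::MzIdentML).
# Given raw source IDs, returns those whose recorded path is undefined
# or does not exist on the local filesystem.
sub missing_raw_files {
    my ($idents, @source_ids) = @_;
    return grep {
        my $path = $idents->raw_file($_);
        !defined $path || !-e $path;
    } @source_ids;
}
```

Usage: `warn "raw file not found for $_\n" for missing_raw_files($idents, @ids);`.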
The API is in alpha stage and is not guaranteed to be stable.
Please report bugs or feature requests through the issue tracker at https://github.com/jvolkening/p5-MS/issues.
Jeremy Volkening <jdv@base2bio.com>
Copyright 2015-2016 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.