MS::Reader::PepXML - A simple but complete pepXML parser


    use MS::Reader::PepXML;

    my $search = MS::Reader::PepXML->new('search.pep.xml');

    # for single search files

    while (my $result = $search->next_result) {
        # $result is an MS::Reader::PepXML::Result object
    }

    # for multi-search files

    my $n = $search->n_lists;

    for (0..$n-1) {
        $search->goto_list($_);
        while (my $result = $search->next_result) {
            # $result is an MS::Reader::PepXML::Result object
        }
    }


MS::Reader::PepXML is a parser for the pepXML file format for spectral search results. It aims to provide complete access to the data contents while not being overburdened by detailed class infrastructure. Convenience methods are provided for accessing commonly used data. Users who want to extract data not accessible through the available methods should examine the data structure of the parsed object. The dump() method of MS::Reader::XML, from which this class inherits, provides an easy way of doing so.


MS::Reader::PepXML is a subclass of MS::Reader::XML, which in turn inherits from MS::Reader, and it inherits the methods of both parent classes. Please see the documentation for those classes for details of available methods not described below.



    my $search = MS::Reader::PepXML->new( $fn,
        use_cache => 0,
        paranoid  => 0,
    );

Takes an input filename (required) and optional argument hash and returns an MS::Reader::PepXML object. This constructor is inherited directly from MS::Reader. Available options include:

  • use_cache — cache fetched records in memory for repeat access (default: FALSE)

  • paranoid — when loading index from disk, recalculates MD5 checksum each time to make sure raw file hasn't changed. This adds (typically) a few seconds to load times. By default, only file size and mtime are checked.


    while (my $s = $search->next_result) {
        # do something with $s
    }

Returns an MS::Reader::PepXML::Result object representing the next result (pepXML element <spectrum_query>) in the current result list, or undef if the end of records has been reached. In a multi-list file (i.e. one with multiple <msms_run_summary> elements), you must call goto_list() for each list and then iterate over that list's records.
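The multi-list iteration described above could be wrapped in a small helper along the following lines. This is only a sketch: for_each_result and its callback argument are hypothetical names, not part of the module, and the code assumes $search is an already-constructed MS::Reader::PepXML object on which goto_list() restarts iteration at the first record of the given list.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::PepXML): visit every result
# in every result list of an already-opened search object.
sub for_each_result {
    my ($search, $callback) = @_;

    my $n_lists = $search->n_lists;
    my $visited = 0;

    for my $list_idx (0 .. $n_lists - 1) {
        # Move the record pointer to the first result of this list
        $search->goto_list($list_idx);

        # next_result returns undef at the end of the current list
        while (my $result = $search->next_result) {
            $callback->($list_idx, $result);
            ++$visited;
        }
    }

    # Number of results passed to the callback
    return $visited;
}
```

A caller might pass a callback such as sub { my ($list_idx, $result) = @_; ... } to tally or print results per run. For a single-list file the outer loop simply runs once.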


    my $s = $search->fetch_result($idx);

Takes a single argument (zero-based record index) and returns an MS::Reader::PepXML::Result object representing the record at that index. Throws an exception if the index is out of range.
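Because fetch_result() throws on an out-of-range index, callers that cannot guarantee a valid index may want to trap the exception. The following is a minimal sketch using Perl's built-in eval block; try_fetch_result is a hypothetical wrapper name, not something the module provides, and the format of the error message is not specified by this documentation.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of MS::Reader::PepXML): return the result at
# $idx, or nothing if fetch_result() throws (e.g. index out of range).
sub try_fetch_result {
    my ($search, $idx) = @_;

    # eval returns undef if the enclosed call dies; the error is left in $@
    my $result = eval { $search->fetch_result($idx) };

    if (!defined $result) {
        warn "could not fetch record $idx: " . ($@ || "unknown error\n");
        return;
    }

    return $result;
}
```

The wrapper trades the exception for an undefined return value, so callers should check the result with defined before using it.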


    my $n = $search->result_count;

Returns the number of result records in the current result list (not the same as the number of results in the file if it contains multiple runs/lists).
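Since result_count only covers the current list, a file-wide total has to be accumulated list by list. One possible sketch is shown here; total_result_count is a hypothetical helper name, and the code assumes that calling goto_list() also makes that list the "current" one as far as result_count is concerned.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::Reader::PepXML): sum result_count
# over every result list in the file.
# Assumption: goto_list() selects the list that result_count reports on.
sub total_result_count {
    my ($search) = @_;

    my $total = 0;
    for my $list_idx (0 .. $search->n_lists() - 1) {
        $search->goto_list($list_idx);
        $total += $search->result_count;
    }

    return $total;
}
```

Note that this leaves the record pointer at the start of the last list, so a caller that was in the middle of iterating would need to call goto_list() again afterwards.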


    my $n = $search->n_lists;

Returns the number of result lists (pepXML <msms_run_summary> elements) in the file. If this number is greater than one, individual lists can be iterated over using goto_list() and next_result().



    $search->goto_list($idx);

Takes a single argument (zero-based list index) and sets the record pointer to the first result from that list.



Takes an optional argument (zero-based list index) and returns the raw file path associated with that list. If no index is provided, index 0 is assumed.


The API is in alpha stage and is not guaranteed to be stable.

Please report bugs or feature requests through the issue tracker at



Jeremy Volkening <>


Copyright 2015-2016 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <>.