
NAME

MSMSOutput - An object implementing common display/output methods for masses

SYNOPSIS

use MSMSOutput;

DESCRIPTION

The MSMSOutput Perl object is intended to support common display and output methods for masses obtained by mass spectrometry-related computations.

It is released under the LGPL license (see source code).

ATTRIBUTES

spectrum

A reference to a hash such as computed by MassCalculator::getFragmentMasses or an object of class MSMSTheoSpectrum.

expSpectrum

A reference to an experimental spectrum such as required by MassCalculator::matchClosest or an object of class ExpSpectrum. When this parameter is specified, the constructor assumes that the spectrum hash contains data about the match with this experimental spectrum.

massIndex

The mass index in the experimental peak vectors, default 0. If the expSpectrum parameter is an ExpSpectrum object, this index is read from the object directly.

intensityIndex

The intensity index in the experimental peak vectors, default 1. If the expSpectrum parameter is an ExpSpectrum object, this index is read from the object directly.

tol

Relative mass error tolerance; this parameter is optional. When not specified, the matched masses found by the match algorithm are all preserved. When specified, the new tolerance is applied.

This parameter is mainly useful for matches obtained via matchSpectrumClosest, which does not apply any mass tolerance.

minTol

Absolute mass error, default value 0.2 Da. This parameter is used only if the tol parameter is specified, see above.
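
As a rough illustration of how tol and minTol might interact, the sketch below filters a matched pair with a window that is the larger of a relative tolerance and the absolute floor. It assumes that tol is expressed in ppm and that the larger of the two windows applies; neither point is stated in this description. The helper name withinTolerance is hypothetical and not part of MSMSOutput.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MSMSOutput.
# Assumption: $tol is in ppm and the effective window is the larger of
# the relative window and the absolute floor $minTol (in Da).
sub withinTolerance {
    my ($theoMass, $expMass, $tol, $minTol) = @_;
    $minTol = 0.2 unless defined $minTol;      # documented default
    return 1 unless defined $tol;              # no tol: keep every match
    my $window = $theoMass * $tol / 1.0e6;     # relative window in Da
    $window = $minTol if $window < $minTol;    # apply the absolute floor
    return abs($expMass - $theoMass) <= $window ? 1 : 0;
}

# At 1000 Da, 100 ppm is 0.1 Da, so the 0.2 Da floor applies here.
print withinTolerance(1000.0, 1000.15, 100, 0.2), "\n";
print withinTolerance(1000.0, 1000.25, 100, 0.2), "\n";
```

Under these assumptions, leaving tol undefined keeps every matched mass, which mirrors the documented behavior when tol is not specified.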

intSel

This parameter controls how the peak intensities are normalized, see function normalizeIntensities.

The intSel parameter is used only if expSpectrum was set.

prec

The number of digits after the decimal point for the masses. Default precision is 3 digits.

modifLvl

Controls how the modifications are highlighted in the vector splitPept defined below, see also function annotatePept.

cmp

This parameter is a reference to a comparison function used for sorting fragment names. If cmp is not set, the function cmpFragTypes is used instead.

METHODS

new(%h|$MSMSOutput)

Constructor. %h is a hash of attribute=>value pairs and $MSMSOutput is an InSilicoSpectro::InSilico::MSMSOutput object, from which the attributes are copied.

To prepare for actual output - through specialized methods - the constructor builds a dedicated data structure. In case users want to create new methods via inheritance or code modification, we describe hereafter this data structure:

  my $table = new InSilicoSpectro::InSilico::MSMSOutput(...);

  $table->{peptideMass} is the precursor peptide mass.
  $table->{peptide} is the precursor peptide sequence.
  $table->{modif} is the precursor peptide modification string.
  $table->{splitPept} is a reference to a vector of the same length
                      as the peptide sequence that contains each
                      amino acid with annotated modifications (see
                      parameter modifLvl above).
  $table->{intSel} is the value of the intSel parameter.

  $table->{mass}{term} contains the terminal fragments.
  $table->{mass}{intern} contains the internal fragments.

  $table->{mass}{term}[i][0] contains the name of the ith fragment type.
  $table->{mass}{term}[i][j] contains the mass of the jth fragment of type i.

  $table->{mass}{intern}[i][0] contains the name of the ith fragment type
  $table->{mass}{intern}[i][j,j+1] contains a description of the internal
                                   fragment followed by its mass, j>0.

  $table->{match} has the same structure as $table->{mass} but it
                  contains the matched experimental masses. How the
                  masses are matched depends on the match function
                  that was called.

  $table->{intens} has the same structure as $table->{match} but it
                   contains the normalized intensities of the matched 
                   experimental peaks.

See also the code of the method tabSepSpectrum for a simple example of how this data structure can be used.
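
To make the layout above more concrete, here is a minimal sketch that walks a hand-built structure of the same shape. The hash is a mock with placeholder masses, not the output of the constructor; representing absent fragments as undef is an assumption, and the helper names termLines and internPairs are hypothetical.

```perl
use strict;
use warnings;

# Mock structure with the documented layout; the masses are illustrative
# placeholders, not computed values.
my $table = {
    peptide   => 'ACK',
    splitPept => ['A', 'C', 'K'],
    mass      => {
        term => [
            ['b', 72.044, 175.054, undef],
            ['y', undef, 250.122, 147.113],
        ],
        intern => [
            ['immo', 'A', 44.050, 'C', 76.022, 'K', 101.108],
        ],
    },
};

# Terminal fragments: cell 0 is the type name, cells 1..n are masses.
sub termLines {
    my ($t) = @_;
    my @lines;
    foreach my $row (@{ $t->{mass}{term} }) {
        my ($name, @masses) = @$row;
        push @lines, join("\t", $name,
            map { defined $_ ? sprintf('%.3f', $_) : '' } @masses);
    }
    return @lines;
}

# Internal fragments: after the type name, cells come in
# (description, mass) pairs.
sub internPairs {
    my ($t) = @_;
    my @pairs;
    foreach my $row (@{ $t->{mass}{intern} }) {
        my ($name, @cells) = @$row;
        for (my $j = 0; $j < $#cells; $j += 2) {
            push @pairs, [ "$name($cells[$j])", $cells[$j + 1] ];
        }
    }
    return @pairs;
}

print "$_\n" for termLines($table);
printf "%s\t%.3f\n", @$_ for internPairs($table);
```

The same traversal would apply to $table->{match} and $table->{intens}, since they are documented to share this structure.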

tabSepSpectrum($nColIntern)

This method returns a string containing a tab-separated tabular representation of the theoretical spectrum. Matched masses, if present, are ignored.

It is usually more appropriate to instantiate the object with modifLvl set to 1 (or 0) before calling this method. For this reason, the output table also includes a string giving the peptide modifications as they would be obtained with modifLvl set to 2. The peptide mass is included as well.

The string computed by tabSepSpectrum is suitable for loading into a spreadsheet or for use as an intermediate format for a custom output format. For the latter reason, we try to make it simple to parse: in particular, a 'TERMINAL' tag is added at the beginning of the N-/C-terminal fragment masses and an 'INTERNAL' tag at the beginning of the internal ones. Moreover, if match data are available, the matched theoretical masses are followed by the matched experimental masses and intensities in parentheses, which should be easy to read and to parse with elementary regular expressions.

The only parameter is:

$nColIntern

Number of groups of 3 columns in the second table for the internal fragments. Default is 2.

Example:

  my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
  print $msms->tabSepSpectrum();

latexSpectrum($nColIntern)

This method returns a string containing a simple LaTeX table that represents the tabular structure generated by tabSpectrum. The table should be fairly easy to edit afterwards to meet specific style requirements. Matched masses, if present, are ignored.

Internal fragments (only immonium ions for the time being) are output in a separate table since their number is different from the peptide length.

The only parameter is:

$nColIntern

Number of groups of 3 columns in the second table for the internal fragments. Default is 2.

Example:

  my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
  print $msms->latexSpectrum(3);

htmlTerm(%h)

This method returns a string containing the lines of an HTML table representing a tabular structure such as generated by tabSpectrum; only the N-/C-terminal fragments are considered, see the sister function htmlIntern for the internal fragments.

Since this method is likely to be used for generating HTML pages automatically, we give the user some flexibility to change the appearance of the output table (manual editing is not an option). Moreover, the <table> tag is not included in the returned string, so that you can choose the table styles you want.

The named parameters are:

colLineFunc

A reference to a function aimed at changing the line colors in the table to make it more readable. This package exports two functions for this purpose: chooseColorLineNum and chooseColorFrag (see their respective descriptions).

You can define your own function if you need another logic. Such a function has four parameters: fragment type for the current line, fragment type of the previous line, a reference to color 1 and another to color 2 to exchange them.

The default function is chooseColorFrag.
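
As an illustration of the four-parameter interface described above, the following sketch defines a custom function that swaps the two colors whenever the ion series letter changes. The name chooseColorSeries is hypothetical, and the sketch assumes that the caller paints the current line with the first color after the call; check the module source before relying on that.

```perl
use strict;
use warnings;

# Hypothetical custom colLineFunc: swap the two colors whenever the ion
# series (a, b, y, ...) changes, ignoring charge and losses.
# Assumption: the caller paints the current line with $$col1 after the call.
sub chooseColorSeries {
    my ($fragType, $prevFragType, $col1, $col2) = @_;
    return unless defined $prevFragType && length $prevFragType;
    my ($series)     = $fragType     =~ /^([a-z]+)/;
    my ($prevSeries) = $prevFragType =~ /^([a-z]+)/;
    if (($series // '') ne ($prevSeries // '')) {
        ($$col1, $$col2) = ($$col2, $$col1);   # exchange the two colors
    }
    return;
}

# Stand-alone demonstration of the swapping logic.
my ($c1, $c2) = ('#DDFFFF', '#EEEEEE');
my $prev = '';
my @colors;
foreach my $frag ('b', 'b++', 'b-H2O', 'y', 'y++') {
    chooseColorSeries($frag, $prev, \$c1, \$c2);
    push @colors, $c1;
    $prev = $frag;
}
print join(' ', @colors), "\n";
```

Such a function would then be passed to htmlTerm as colLineFunc=>\&chooseColorSeries.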

css

If css is defined, CSS classes are used instead of old-fashioned inline color and font specifications. See function htmlCSS.

lineCol1, lineCol2

The two colors used for the lines, default colors are '#DDFFFF' and '#EEEEEE'.

boldTitle

Peptide sequence in bold if set to any value.

bgTitle

Background color for the peptide sequence, default '#CCFFCC'.

boldFrag

Fragment names in bold if set to any value.

bgFrag

Background color for the fragment names, default '#FFFFBB'.

Example:

  my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
  print "<html><head></head><body><table border=0 cellspacing=5>\n";
  print "\n",$msms->htmlTerm(boldTitle=>1, bgFrag=>'#FFFFBB', bgTitle=>'#99CCFF',
                             colLineFunc=>\&chooseColorFrag);
  print "</table></body></html>\n";

htmlIntern(%h)

This method returns a string containing the lines of an HTML table representing a tabular structure such as generated by tabSpectrum; only internal fragments are considered, see the sister function htmlTerm for the N-/C-terminal fragments.

Since this method is likely to be used for generating HTML pages automatically, we give the user some flexibility to change the appearance of the output table (manual editing is not an option). Moreover, the <table> tag is not included in the returned string, so that you can choose the table styles you want.

The named parameters are:

css

If css is defined, CSS classes are used instead of old-fashioned inline color and font specifications. See function htmlCSS.

bgIntern

The color used for the lines, default '#EEEEEE'.

boldTitle

Column titles in bold if set to any value.

bgTitle

Background color for the column titles, default '#CCFFCC'.

boldFrag

Fragment names in bold if set to any value.

bgFrag

Background color for the fragment names, default '#FFFFBB'.

nColIntern

Number of groups of 3 columns in the second table for the internal fragments. Default is 2.

Example:

  my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
  print "<table border=0 cellspacing=5>\n";
  print "\n",$msms->htmlIntern(boldTitle=>1);
  print "</table>\n";

plotSpectrumMatch(%h)

This method generates images that represent matches between theoretical and experimental spectra. Such images are intended to be used in user interfaces, typically web interfaces. To fit rather diverse requirements, a large number of parameters can be set to change the colors and appearance of the plots.

The named parameters are:

fname

The file name of the generated image.

fhandle

An open file handle for writing the generated image. It has priority over parameter fname and the file handle will be set in binmode.

format

The graphic file format. If not specified, the function will return the image object for further processing (see GD documentation). The supported file formats are the ones of GD.

fontChoice

The size of the graphics is controlled via the choice of the font. The fontChoice parameter is a string 'class:size', where class selects the type of font and size its size.

The GD native fonts are selected by setting class equal to 'default'. The size for the 'default' class must be one of 'Tiny', 'Small', 'MediumBold', 'Large', or 'Giant'. Default font is 'default:Large'.

Alternatively, it is possible to give, as the class, the name of a file containing the definition of a TrueType font (absolute path); size is then the point size.

inCellBorder

Number of pixels between lines and text, default 1.

style

Two styles are supported for the match graphics: 'circle' and 'square'. Default is 'circle' except when modifLvl was 2 in tabSpectrum, where it is 'square'.

plotIntern

If this parameter is set to any value, and at least one internal fragment mass exists, the internal fragments are represented in the graphics.

nColIntern

Number of columns used to display internal fragments, default is 2.

colorScale

This parameter defines a list of intensity thresholds and the corresponding colors used when highlighting the table cells to indicate fragment matches. Thresholds must be in increasing order of intensity.

colorScale is a reference to a vector of values, each threshold is associated with 8 values in the following order:

threshold value
red intensity (cell color)
green intensity (cell color)
blue intensity (cell color)
legend text
red intensity (legend text color)
green intensity (legend text color)
blue intensity (legend text color)

These eight values are repeated for each threshold, and the number of thresholds is not limited. The threshold values must be adapted to the intensity normalization (see function tabSpectrum).

By default, plotSpectrumMatch generates a color scale that adapts to the normalization and contains 5 bins: blue (less intense), red, orange, yellow, green (most intense).
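
The flat layout of colorScale can be sketched as follows. The thresholds, colors and legend texts are made up for the example, and the lookup helper findBin is hypothetical: it assumes that an intensity falls in the first bin whose threshold it does not exceed, which may differ from what plotSpectrumMatch actually does.

```perl
use strict;
use warnings;

# Illustrative three-bin scale; thresholds and colors are made up.
# Each threshold contributes 8 consecutive values.
my @colorScale = (
    # threshold, cell R, G, B, legend text, text R, G, B
    0.2, 0,   0,   255, '<20%',   255, 255, 255,
    0.6, 255, 165, 0,   '20-60%', 0,   0,   0,
    1.0, 0,   255, 0,   '>60%',   0,   0,   0,
);

# Hypothetical lookup: return the first bin whose threshold is not
# exceeded by the intensity (an assumption about how bins are selected).
sub findBin {
    my ($scale, $intensity) = @_;
    die "colorScale length must be a multiple of 8\n" if @$scale % 8;
    for (my $i = 0; $i < @$scale; $i += 8) {
        my ($thr, $red, $green, $blue, $text) = @{$scale}[$i .. $i + 4];
        return { rgb => [$red, $green, $blue], text => $text }
            if $intensity <= $thr;
    }
    return undef;    # above the last threshold
}

my $bin = findBin(\@colorScale, 0.45);
print "$bin->{text}: @{ $bin->{rgb} }\n";
```

A vector built this way would be passed by reference, as in colorScale=>\@colorScale.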

legend

When this parameter is set to 'right', a legend is added at the right of the graphics. When it is set to 'bottom', a legend is added under the graphics.

The legend consists of the color scale and, for each intensity bin, a count of the matched peaks versus the number of experimental peaks. This count gives an indication of the quality of the match. Note that it is not uncommon for an experimental peak to match several theoretical masses; the count, which considers each mass once, may therefore differ slightly from what is read from the graphics. The two present different points of view: that of the theoretical masses and that of the experimental masses.

changeColModifAA

Except when tabSpectrum was called with modifLvl equal to 2, plotSpectrumMatch displays one character per amino acid only, i.e. the asterisk indicating the presence of a modification is suppressed. When changeColModifAA is set to any value, plotSpectrumMatch displays the modified amino acids in another color. If it is not set, the modified amino acids are overlined.

modifAAColor

A reference to a vector of three values (R, G, B) used to define the color for modified amino acids, default blue.

bgColor

A reference to a vector of three values (R, G, B) used to define the graphics background color, default white.

textColor

A reference to a vector of three values (R, G, B) used to define the text color, default black.

lineColor

A reference to a vector of three values (R, G, B) used to define the line color, default black.

Example:

  my $msms = new InSilicoSpectro::InSilico::MSMSOutput(spectrum=>\%spectrum, prec=>2, modifLvl=>1,
                               expSpectrum=>\@peaks, intSel=>'order', tol=>$tol, minTol=>$minTol);
  $msms->plotSpectrumMatch(fname=>$peptide, format=>'png', fontChoice=>'default:Large',
                           changeColModifAA=>1, legend=>'bottom');

FUNCTIONS

cmpFragTypes

This function can be used as the comparison function when sorting fragment type names. Fragment type names are assumed to follow these rules:

internal fragments

They are named after their generic name; only immonium ions are supported so far, and they are named 'immo'.

N-/C-terminal fragments

They must comply with the pattern

  ion&charge - loss1 -loss2 - ...

For instance, singly charged b ions are simply named 'b' and their doubly and triply charged counterparts are named 'b++' and 'b+++'. This is the ion&charge part of the pattern above.

The losses may occur once or several times; multiple losses are indicated in parentheses preceded by the multiplicity. Examples are:

  b-H2O
  b-3(H2O)
  b++-H2O-NH3
  b++-3(H2O)-NH3
  y-H2O-2(H3PO4)-NH3

The order on fragment type names is defined as follows: (1) immonium ions always come after N-/C-terminal fragments; (2) N-/C-terminal fragment types are compared through a sequence of comparisons that continues as long as the compared values are equal. The first comparison is on the ion type (a, b, y, ...), followed by a comparison on the charge. If ion types and charges are equal, the losses are compared. The fragment that has fewer loss types is considered smaller. If the two fragment types have the same number of loss types, the losses are sorted lexicographically and the first ones are compared by name; if the names are the same, the comparison is on the multiplicity; if the multiplicities are the same, the second losses are compared, and so on.

Asterisks that are used for signaling multiple possible losses are ignored in the comparisons.

Since this function is defined in package MSMSOutput and it is used in other packages with function sort (and predefined variables $a and $b), we had to use prototypes ($$). Therefore it can no longer be exported by the package MSMSOutput and you have to call it via MSMSOutput::cmpFragTypes.

Example:

  foreach (sort MSMSOutput::cmpFragTypes ('y','b','y++','a','b-NH3','b-2(NH3)',
                                          'b++-10(NH3)','b-H2O-NH3','immo(Y)',
                                          'b++','y-NH3*','y-H2O*','z')) {
    print $_,"\n";
  }
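
The naming pattern can be made more tangible with a small parser. This is only a sketch written for illustration, not the code used by cmpFragTypes; the function name parseFragType is hypothetical. It splits a name into ion type, charge and a hash of losses with their multiplicities, which are the quantities the comparison rules above refer to.

```perl
use strict;
use warnings;

# Hypothetical parser for the naming pattern; not the module's own code.
# Returns (ion, charge, { loss => multiplicity, ... }).
sub parseFragType {
    my ($name) = @_;
    $name =~ s/\*//g;                       # asterisks are ignored
    my ($head, @lossParts) = split /-/, $name;
    my ($ion, $plus) = $head =~ /^([^+]+)(\+*)$/
        or die "Cannot parse '$name'\n";
    my $charge = length($plus) || 1;        # 'b' is singly charged
    my %loss;
    foreach my $part (@lossParts) {
        if ($part =~ /^(\d+)\((.+)\)$/) {   # e.g. 3(H2O)
            $loss{$2} = $1;
        } else {                            # e.g. H2O
            $loss{$part} = 1;
        }
    }
    return ($ion, $charge, \%loss);
}

my ($ion, $charge, $loss) = parseFragType('y-H2O-2(H3PO4)-NH3');
print "$ion charge=$charge ",
    join(',', map { "$_ x" . $loss->{$_} } sort keys %$loss), "\n";
```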

annotatePept($pept, $modif, $modifLvl)

Returns a vector whose cells contain each amino acid of the peptide sequence annotated with its modifications, if any.

This function is exported for allowing users to prepare peptide sequences for display purposes. The parameters are:

$pept

The peptide sequence.

$modif

The modification string or the modification vector.

$modifLvl

Controls how the modifications are highlighted in the returned vector.

If not set or set to 0, this parameter causes the modified amino acids not to be indicated. If set to 1, the modified amino acids are marked by an asterisk. If set to 2, the modified amino acids are followed by the name of the modification between curly brackets.

Example:

  print join('', annotatePept('ACCTK', '::Cys_CAM:Cys_CAM:::', 2)), "\n";
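
The effect of the three modifLvl values can be sketched with a stand-alone re-implementation. It is written for illustration only and is not the module's code; the name annotateSketch is hypothetical. It assumes that the modification string is colon-separated with one field for the N-terminus, one per residue and one for the C-terminus (consistent with the seven fields in the example above), and it ignores terminal modifications.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only.
# Assumptions: $modif is a colon-separated string with one field for the
# N-terminus, one per residue and one for the C-terminus; terminal
# modifications are ignored here.
sub annotateSketch {
    my ($pept, $modif, $modifLvl) = @_;
    $modifLvl = 0 unless defined $modifLvl;
    my @aa  = split //, $pept;
    my @mod = split /:/, $modif, -1;        # -1 keeps trailing empty fields
    my @out;
    for my $i (0 .. $#aa) {
        my $m    = $mod[$i + 1];            # field 0 is the N-terminus
        my $cell = $aa[$i];
        if (defined $m && length $m) {
            $cell .= '*'             if $modifLvl == 1;
            $cell .= '{' . $m . '}'  if $modifLvl == 2;
        }
        push @out, $cell;
    }
    return @out;
}

print join('', annotateSketch('ACCTK', '::Cys_CAM:Cys_CAM:::', $_)), "\n"
    for 0 .. 2;
```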

normalizeIntensities($intSel, $expSpectrum, $normInt, [$massIndex, [$intensityIndex]])

Normalizes experimental peak intensities. The parameters are:

$intSel

This parameter controls how the peak intensities are normalized. Default choice is 'order' for relative order; other possible choices are 'relative' for relative intensity, 'original' for no normalization, and 'log' for logarithmic transform.

$expSpectrum

The experimental spectrum.

$normInt

A reference to a hash that will contain the normalized intensities (keys are the original intensities).

$massIndex

The mass index in the experimental peak vectors, default 0. If the $expSpectrum parameter is an ExpSpectrum object, this index is read from the object directly.

$intensityIndex

The intensity index in the experimental peak vectors, default 1. If the $expSpectrum parameter is an ExpSpectrum object, this index is read from the object directly.
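
To give an idea of what the four intSel choices could mean in practice, here is a stand-alone sketch that fills a hash keyed by the original intensities. The formulas are assumptions made for the example (rank divided by the number of peaks for 'order', intensity divided by the maximum for 'relative', natural logarithm for 'log'); the module may define them differently. The name normalizeSketch is hypothetical, and only plain vectors of peaks are handled, not ExpSpectrum objects.

```perl
use strict;
use warnings;

# Hypothetical sketch; the formulas for each mode are assumptions, not
# the module's definitions. $massIndex is accepted for symmetry with the
# documented signature but is not needed here.
sub normalizeSketch {
    my ($intSel, $peaks, $normInt, $massIndex, $intensityIndex) = @_;
    $intSel         = 'order' unless defined $intSel;
    $intensityIndex = 1       unless defined $intensityIndex;
    my @int = map { $_->[$intensityIndex] } @$peaks;
    return unless @int;
    if ($intSel eq 'original') {
        $normInt->{$_} = $_ for @int;
    } elsif ($intSel eq 'log') {
        $normInt->{$_} = log($_) for grep { $_ > 0 } @int;
    } elsif ($intSel eq 'relative') {
        my ($max) = sort { $b <=> $a } @int;
        $normInt->{$_} = $max > 0 ? $_ / $max : 0 for @int;
    } elsif ($intSel eq 'order') {
        my @sorted = sort { $a <=> $b } @int;
        my $n = scalar @sorted;
        # rank / number of peaks, so the most intense peak gets 1
        $normInt->{ $sorted[$_] } = ($_ + 1) / $n for 0 .. $#sorted;
    } else {
        die "Unknown intSel '$intSel'\n";
    }
    return;
}

my @peaks = ([500.3, 10], [620.4, 40], [733.5, 20], [801.2, 80]);
my %normInt;
normalizeSketch('order', \@peaks, \%normInt);
print "$_ => $normInt{$_}\n" for sort { $a <=> $b } keys %normInt;
```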

htmlCSS(%h)

This function returns a string that can be used for defining a CSS style sheet, which is then used by the tables created by functions htmlTerm and htmlIntern. To give you more flexibility, we do not include the <style> tags in the string, so that you can add the styles returned by htmlCSS where you like.

Alternatively, you can choose not to use this function and define totally different styles!

The named parameters are:

lineCol1, lineCol2

The two colors used for the lines, default colors are '#DDFFFF' and '#EEEEEE'.

boldTitle

Peptide sequence in bold if set to any value.

bgTitle

Background color for the peptide sequence, default '#CCFFCC'.

boldFrag

Fragment names in bold if set to any value.

bgFrag

Background color for the fragment names, default '#FFFFBB'.

bgIntern

The color used for the lines in the internal fragments table, default '#EEEEEE'.

Example:

  my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
  print "<html>\n<head>\n<style type=\"text/css\">\n";
  print InSilicoSpectro::InSilico::MSMSOutput::htmlCSS(boldTitle=>1);
  print "</style>\n</head>\n<body><table border=0 cellspacing=5>\n";
  print "\n",$msms->htmlTerm(css=>1);
  print "</table><br><br><table border=0 cellspacing=5>\n";
  print "\n",$msms->htmlIntern(css=>1);
  print "</table></body></html>\n";

chooseColorLineNum

Function for HTML output that alternates the line colors for every line.

chooseColorFrag

Function for HTML output that changes the line color when the type of fragment changes; b-H2O and b-2(H2O) are considered the same type by this function.

plotLegendOnly(%h)

This function plots the color scale only and should be used if you do not want to display it for each match plot. Note that the legend generated by plotSpectrumMatch contains extra information that is specific to the match, i.e. the count of matched peaks per intensity bin. This information is not reported if you decide to save space and only display the color scale once.

The named parameters are (see plotSpectrumMatch for detailed explanations):

fname

The file name of the generated image.

fhandle

An open file handle for writing the generated image. It has priority over parameter fname and the file handle will be set in binmode.

format

The graphic file format.

fontChoice

The size of the graphics is controlled via the choice of the font.

inCellBorder

Number of pixels between lines and text, default 1.

colorScale

This parameter defines a list of intensity thresholds and the corresponding colors used when highlighting the table cells to indicate fragment matches.

lineColor

A reference to a vector of three values (R, G, B) used to define the line color, default black.

intSel

If no user-defined color scale is provided, a default color scale is used instead. To adjust this scale properly to the intensity normalization method, it is important to indicate via the intSel parameter which normalization is used. Possible values are listed in function normalizeIntensities.

EXAMPLES

See programs starting with testMSMSOut in folder InSilicoSpectro/InSilico/test/.

AUTHORS

Jacques Colinge, Upper Austria University of Applied Science at Hagenberg