NAME

Bio::Draw::FeatureStack - BioPerl module to generate GD images of stacked gene models

SYNOPSIS

  use Bio::DB::SeqFeature::Store;
  use Bio::Draw::FeatureStack;
 
  # load GFF3-compliant features from GFF file 
  # features could be obtained from/with any other source/methods as well...
  #---
  my @features;
  my $store = Bio::DB::SeqFeature::Store->new
  (
    -adaptor => 'memory',
    -dsn => 'my_gff_file.gff3' 
  );                            
  push(@features, $store->features(-name => 'gene1', -aliases => 1));
  push(@features, $store->features(-name => 'gene2', -aliases => 1));

  # create FeatureStack, passing features as array-ref
  #---
  my $feature_stack = Bio::Draw::FeatureStack->new
  (
    -features => \@features,    # array-ref of features to be rendered
    -glyph => 'gene',           # features will be rendered using this BioPerl glyph
    -flip_minus => 1,           # flip features on reverse strand (default is on)
    -ignore_utr => 1,           # do not show UTRs (default is off)
    -panel_params => {          # Bio::Graphics::Panel parameters
      -width => 1024,          
      -pad_left => 80,
      -pad_right => 20,
      -grid => 1
    },
    -glyph_params => {          # glyph-specific parameters (Bio::Graphics::Glyph::gene in this case)
      -utr_color   => 'white',
      -label_position => 'left',
      -label_transcripts => 1,
      -description => 1
    }
  );

  # output SVG, including HTML image map
  #---
  my ($svg, $map) = $feature_stack->svg(-image_map => 1);
        
  # output PNG
  #---
  my $png = $feature_stack->png;
        

DESCRIPTION

FeatureStack creates GD images of vertically stacked gene models to facilitate visual comparison of gene structures. The compared genes can be clusters of orthologous genes, gene family members, or any other genes of interest. FeatureStack takes an array of BioPerl feature objects as input, projects them onto a common coordinate space, flips features from the negative strand (optional), left-aligns them by start coordinates (optional), sets a fixed intron size (optional), removes unwanted transcripts (optional), and then draws the transformed features with a user-specified glyph. Internally, this transformation is achieved by cloning all input features into Bio::Graphics::Feature objects before the features are rendered by the specified glyph. Output images can be generated in SVG (scalable vector graphics) or PNG (raster image) format.
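
The projection step can be illustrated without BioPerl. The following is a minimal sketch, not FeatureStack's actual implementation: the subroutine name to_common_space and the plain hash layout are assumptions made for illustration only. It mirrors the exons of a minus-strand model so that the 5' end lies on the left and shifts the model so that it starts at coordinate 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FeatureStack): project one gene model
# onto a common coordinate space that starts at 1.
# $model = { start => INT, end => INT, strand => +1|-1, exons => [[s, e], ...] }
sub to_common_space {
    my ($model, $flip_minus) = @_;
    my @exons;
    for my $exon (@{ $model->{exons} }) {
        my ($s, $e) = @$exon;
        if ($flip_minus && $model->{strand} < 0) {
            # mirror around the feature end so that the 5' end is on the left
            push @exons, [ $model->{end} - $e + 1, $model->{end} - $s + 1 ];
        }
        else {
            # keep the orientation and shift the feature start to 1
            push @exons, [ $s - $model->{start} + 1, $e - $model->{start} + 1 ];
        }
    }
    # return exons ordered from left to right
    return [ sort { $a->[0] <=> $b->[0] } @exons ];
}

# Made-up minus-strand gene with two exons
my $minus_gene = {
    start => 1000, end => 1999, strand => -1,
    exons => [ [1000, 1099], [1500, 1999] ],
};
my $projected = to_common_space($minus_gene, 1);
print join(', ', map { "$_->[0]-$_->[1]" } @$projected), "\n";
```

With flipping enabled, the 500 bp exon that sits at the right end of the genomic interval ends up at 1-500 and the 100 bp exon at 901-1000; with flipping disabled the same model is projected to 1-100 and 501-1000.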

FeatureStack was designed to give the user maximum control over the rendering process. The user can control how FeatureStack behaves through the FeatureStack parameters described below, and can also provide both panel- and glyph-specific parameters to fine-tune all aspects of the rendered image.

Although FeatureStack can be used with any glyph, it is particularly useful in combination with the Bio::Graphics::Glyph::decorated_gene glyph. This glyph is currently not distributed with BioPerl, but should be installed together with FeatureStack. Bio::Graphics::Glyph::decorated_gene can also be obtained and used independently of FeatureStack via CPAN. The decorated_gene glyph can highlight protein motifs such as signal peptides, transmembrane domains, or protein domains on top of gene models, which greatly facilitates the comparison of gene structures. Please refer to the documentation of Bio::Graphics::Glyph::decorated_gene for more details. If protein decorations are associated with gene features in the input data, FeatureStack can also automatically align gene models by a user-defined decoration type, so that, for example, gene models are aligned by a particularly well conserved protein motif.

FeatureStack requires GFF3-compliant features. That is, features provided to FeatureStack need to have either a two-tier 'mRNA'->'CDS' or a three-tier 'gene'->'mRNA'->'CDS' structure. Here is an example gene structure in GFF3 format that is compatible with FeatureStack:

   MAL10  test  gene  1596486  1597604  .  +  .  ID=PF10_0392;Name=PF10_0392
   MAL10  test  mRNA  1596486  1597604  .  +  .  ID=rna_PF10_0392-1;Name=PF10_0392-1;Parent=PF10_0392
   MAL10  test  CDS   1596486  1596554  .  +  .  ID=cds_PF10_0392-1;Parent=rna_PF10_0392-1
   MAL10  test  CDS   1596747  1597604  .  +  .  ID=cds_PF10_0392-2;Parent=rna_PF10_0392-1
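
Whether a GFF3 file follows one of these tier structures can be checked before loading it. The following is a rough sketch under stated assumptions: the helper names parse_attributes and tier_structure are hypothetical and not part of FeatureStack or BioPerl, columns are split on whitespace to match the example above (real GFF3 is tab-delimited), and features with several comma-separated parents are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: parse the attribute column (column 9) of a GFF3 line
# into a hash reference, e.g. { ID => '...', Parent => '...' }.
sub parse_attributes {
    my ($column9) = @_;
    my %attr;
    for my $pair (split /;/, $column9) {
        my ($key, $value) = split /=/, $pair, 2;
        $attr{$key} = $value if defined $key && defined $value;
    }
    return \%attr;
}

# Hypothetical helper: report which tier structure a GFF3 string uses.
# Returns 'gene->mRNA->CDS', 'mRNA->CDS', both joined by a comma, or
# undef if there is no CDS or if a CDS lacks an mRNA parent.
sub tier_structure {
    my ($gff3) = @_;
    my (%type_of, %parent_of, @cds_ids);
    for my $line (split /\n/, $gff3) {
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my @col = split ' ', $line, 9;     # whitespace split, for this example only
        next unless @col == 9;
        $col[8] =~ s/\s+$//;               # drop trailing whitespace
        my $attr = parse_attributes($col[8]);
        my $id   = $attr->{ID};
        next unless defined $id;
        $type_of{$id}   = $col[2];
        $parent_of{$id} = $attr->{Parent};
        push @cds_ids, $id if $col[2] eq 'CDS';
    }
    return undef unless @cds_ids;
    my %found;
    for my $cds (@cds_ids) {
        my $mrna = $parent_of{$cds};
        return undef unless defined $mrna && ($type_of{$mrna} || '') eq 'mRNA';
        my $gene = $parent_of{$mrna};
        my $three_tier = defined $gene && ($type_of{$gene} || '') eq 'gene';
        $found{ $three_tier ? 'gene->mRNA->CDS' : 'mRNA->CDS' } = 1;
    }
    return join ',', sort keys %found;
}

my $example = <<'GFF';
MAL10  test  gene  1596486  1597604  .  +  .  ID=PF10_0392;Name=PF10_0392
MAL10  test  mRNA  1596486  1597604  .  +  .  ID=rna_PF10_0392-1;Name=PF10_0392-1;Parent=PF10_0392
MAL10  test  CDS   1596486  1596554  .  +  .  ID=cds_PF10_0392-1;Parent=rna_PF10_0392-1
MAL10  test  CDS   1596747  1597604  .  +  .  ID=cds_PF10_0392-2;Parent=rna_PF10_0392-1
GFF
print tier_structure($example), "\n";
```

For the example gene above the sketch reports the three-tier structure gene->mRNA->CDS.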

FeatureStack can display multiple transcripts (isoforms) per gene if the specified glyph supports this as well (for example the 'gene' or the 'decorated_gene' glyph).

In addition to drawing a set of gene models on top of each other, FeatureStack can intermingle gene models with alternative tracks that display additional features associated with these genes. This can be used, for example, to display regulatory elements or sequence variants (SNPs, indels) alongside gene models. There is currently no restriction on how these alternative features are displayed, and any BioPerl glyph can be used for this purpose. In the input data, alternative features must be specified one level below the gene or transcript feature that is passed to FeatureStack. Here is an example GFF that shows how a regulatory motif (associated with the gene) and a SNP (associated with a transcript) can be specified:

   CHR_I  test  gene      5100769  5101677  .  +  .  ID=Gene:Y110A7A.20;Name=ift-20
   CHR_I  test  promoter  5100709  5100722  .  +  .  ID=Promoter:Y110A7A.20;Note=GTCTCTATAGCAAC;Parent=Gene:Y110A7A.20
   CHR_I  test  mRNA      5100769  5101677  .  +  .  ID=Transcript:Y110A7A.20;Parent=Gene:Y110A7A.20
   CHR_I  test  SNP       5100888  5100888  .  +  .  ID=SNP123456;Parent=Transcript:Y110A7A.20;Note=C>T
   CHR_I  test  CDS       5100769  5101423  .  +  .  ID=CDS:Y110A7A.20:1;Parent=Transcript:Y110A7A.20
   CHR_I  test  CDS       5101468  5101677  .  +  .  ID=CDS:Y110A7A.20:2;Parent=Transcript:Y110A7A.20
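
For alternative features, FeatureStack records the distance to the start of the associated main feature in a tag named 'start_dist' (see -alt_feature_type under OPTIONS). The sketch below only illustrates the idea for the plus-strand example above; it assumes the distance is simply the alternative feature's start minus the main feature's start, which may differ from what FeatureStack does internally (for example on the minus strand or when UTRs are ignored). The subroutine name start_dist is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: signed distance in bp from the start of the main
# feature to the start of an alternative feature. Assumes plus-strand
# features; negative values mean "upstream of the main feature".
sub start_dist {
    my ($main_start, $alt_start) = @_;
    return $alt_start - $main_start;
}

# Start coordinates taken from the example GFF above
my %main_start = (
    'Gene:Y110A7A.20'       => 5100769,
    'Transcript:Y110A7A.20' => 5100769,
);
my @alt_features = (
    { id => 'Promoter:Y110A7A.20', start => 5100709, parent => 'Gene:Y110A7A.20' },
    { id => 'SNP123456',           start => 5100888, parent => 'Transcript:Y110A7A.20' },
);

for my $f (@alt_features) {
    printf "%s\t%+d\n", $f->{id}, start_dist($main_start{ $f->{parent} }, $f->{start});
}
```

Under this assumption the promoter motif lies 60 bp upstream of the gene start (-60) and the SNP 119 bp downstream of the transcript start (+119).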

OPTIONS

  Option          Description                                              Default
  ------          -----------                                              -------

  -features                                                                none
  
                  Array reference (mandatory). BioPerl features to be 
                  displayed. Currently, features can be either of type 
                  'mRNA' or 'gene'. 
                  
  -glyph                                                                   'generic'

                  String (optional). Name of glyph to be used to render 
                  features. The glyph specified here should be suitable 
                  for rendering the provided features (e.g., use 
                  'processed_transcript' glyph for features of type 'mRNA' 
                  and 'gene' glyph for features of type 'gene'). The 
                  'decorated_gene' or 'decorated_transcript' glyph 
                  can also be used for highlighting protein features on 
                  top of gene models (see description above). 
                  
                  If no glyph is specified, the 'generic' glyph will 
                  be used.
                  
  -glyph_params                                                            none

                  Hash reference (optional). Glyph-specific parameters. 
                  Will be passed unmodified to the glyph. Parameters 
                  can include callback functions for fine-grained control 
                  of the rendering process. Please refer to the
                  documentation of the glyph for a description of which
                  glyph parameters are available. 

  -panel_params                                                            none

                  Hash reference (optional). Panel parameters. Will be 
                  passed unmodified to the Bio::Graphics::Panel instance 
                  that is internally created by FeatureStack.  

                  Typical parameters here include -width, -pad_left, 
                  -pad_right, or -grid (see Bio::Graphics::Panel for
                  more information).

  -ignore_utr                                                              false
  
                  Boolean (optional). If true, gene models will be drawn
                  without untranslated regions (UTRs).
                  
  -flip_minus                                                              true
  
                  Boolean (optional). By default, features on the negative
                  (reverse) strand are drawn flipped, such that the 
                  5' end of features is always on the left side. This 
                  behaviour can be turned off by setting this parameter to
                  0 (false).

  -intron_size                                                             undef
  
                  Integer (optional). Intron size in base-pairs. If specified, 
                  introns of gene models will be transformed to have 
                  this specified size. This is useful when comparing gene 
                  models of vastly different sizes due to very large
                  introns (for example, when comparing protist genes with human 
                  genes). By default, gene models are drawn to scale with
                  original intron sizes. This parameter does not affect
                  the length of exons, which are always drawn to scale.
                  
  -feature_offsets                                                         undef
  
                  Hash reference or string (optional). This parameter allows 
                  you to control the horizontal alignment of features. By
                  default, all features are left-aligned by their start
                  coordinate.  
                  
                  If a hash reference is specified here, it is assumed that
                  keys correspond to feature IDs and values to offsets in bp. 
                  This way the alignment of individual features can be 
                  manually fine-controlled. 
                  
                  If 'start_codon' is specified, features will be aligned
                  by their smallest CDS coordinate, assuming that this
                  will be the translation start site.
                  
                  Any other value here will be interpreted as the name of
                  a protein decoration. In this case, FeatureStack will
                  attempt to use Bio::Graphics::Glyph::decorated_transcript
                  to map this protein decoration to nucleotide space and 
                  will then left-align the feature by this mapped 
                  coordinate. This way, features can for example be 
                  automatically aligned by their most conserved protein 
                  domain. If no protein decoration with this name is found
                  for a feature, then this feature will not be aligned.
                  Please refer to the documentation of the 
                  decorated_transcript glyph to see how protein decorations
                  can be specified for transcripts.

  -transcripts_to_skip                                                     none

                  Array reference (optional). Contains transcript IDs not to
                  be included in the output image. This parameter can be used
                  if a gene feature passed to FeatureStack has multiple 
                  isoforms but only a subset of these isoforms should appear
                  in the output.
  
  -alt_feature_type                                                        none

                  String (optional). Type and source of alternative features 
                  (e.g., 'SNP:mpileup') to be drawn alongside gene models. 
                  FeatureStack looks for features of this type/source one
                  level below the specified gene/transcript feature. If found, 
                  alternative features are drawn in a separate track above 
                  the gene track. The appearance of alternative features 
                  can be controlled using the -alt_glyph and -alt_glyph_params 
                  parameters.
                  
                  FeatureStack will automatically compute the distance of
                  alternative features (in bp) to the associated main feature's 
                  start coordinate and add this distance as a feature tag
                  (tag name 'start_dist'). This tag can later be read 
                  by the glyph that displays alternative features. 
                  This can e.g. be useful for labeling regulatory features 
                  with their distance from the transcription start site 
                  (UTRs visible) or from the translation start site 
                  (UTRs ignored).
                  
  -alt_glyph                                                               none
   
                  String (optional). Name of glyph to be used to draw 
                  alternative features specified with -alt_feature_type.

  -alt_glyph_params                                                        none

                  Hash reference (optional). Glyph-specific parameters for 
                  glyph specified with -alt_glyph. Parameters will be passed 
                  unmodified to the glyph. Parameters can include callback 
                  functions for fine-grained control of the rendering process. 

  -ruler                                                                   true

                  Boolean (optional). If true, a ruler indicating distances
                  in base-pairs will be drawn on top of the image. The ruler
                  will automatically adjust to feature offsets; that is,
                  the origin of the ruler will be placed at the
                  point where features are aligned, showing negative 
                  coordinates left of this point and positive coordinates 
                  right of this point. 

  -span                                                                    [auto]

                  Integer (optional). Span of the output image in bp. By 
                  default, the span is the length of the longest feature. 
                  If one wants to generate an image that shows only the 
                  5' portion of features (for example to visualize only 
                  the first exon of genes and their associated promoters), 
                  one can set a smaller, fixed value here, effectively 
                  clipping the right part of the image at this coordinate.

  -separator                                                               false

                  Boolean (optional). If true, a horizontal line is drawn
                  between gene models. This can be useful when alternative
                  tracks are visible, to show which alternative track
                  belongs to which gene model track.
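
The effect of -intron_size on coordinates can be illustrated with a few lines of plain Perl. This is a minimal sketch of the idea, not the code FeatureStack uses: the subroutine name fix_intron_size and the array-of-pairs layout are assumptions for illustration. Exons keep their original length, while every intron is replaced by a gap of the requested size.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite exon coordinates so that every intron has
# the same fixed length. Exon lengths are left unchanged.
# $exons = [[start, end], ...], sorted by start and non-overlapping.
sub fix_intron_size {
    my ($exons, $intron_size) = @_;
    my @out;
    my $shift = 0;    # cumulative number of bp removed to the left of the current exon
    my $prev_end;     # original end coordinate of the previous exon
    for my $exon (@$exons) {
        my ($s, $e) = @$exon;
        if (defined $prev_end) {
            my $intron_len = $s - $prev_end - 1;
            # negative when an intron is shorter than the requested size,
            # in which case the intron is enlarged instead of shrunk
            $shift += $intron_len - $intron_size;
        }
        push @out, [ $s - $shift, $e - $shift ];
        $prev_end = $e;
    }
    return \@out;
}

# Made-up gene model with two large introns (4900 bp and 14800 bp)
my $exons = [ [1, 100], [5001, 5200], [20001, 20300] ];
my $fixed = fix_intron_size($exons, 50);
print join(', ', map { "$_->[0]-$_->[1]" } @$fixed), "\n";
```

For the made-up model above, the exons move to 1-100, 151-350 and 401-700, so both introns span exactly 50 bp and the drawn model shrinks from 20300 bp to 700 bp.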
                  

EXPORT

None by default.

BUGS

Please report all errors.

SEE ALSO

Bio::Graphics::Panel, Bio::Graphics::Glyph, Bio::Graphics::Glyph::gene, Bio::Graphics::Glyph::processed_transcript, Bio::Graphics::Glyph::decorated_gene, Bio::Graphics::Glyph::decorated_transcript, Bio::DB::SeqFeature::Store

It is recommended to study the test cases shipped with this module for additional information on how to use it.

AUTHOR

Christian Frech <frech.christian@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2012 by Christian Frech

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.