Ewan Birney


gadfly_to_gff.pl - Massage Gadfly's GFF format into a form suitable for Bio::DB::GFF


   perl gadfly_to_gff.pl /path/to/gadfly/release/files > gadfly.gff


This script converts the FlyBase/Gadfly GFF files located at ftp://ftp.fruitfly.org/pub/genomic/gadfly/ into the variant of the GFF format that Bio::DB::GFF expects. This lets you view the Drosophila annotations with the Generic Genome Browser (http://www.gmod.org).

To use this script, download the Gadfly GFF distribution archive that is organized by GenBank accession unit (e.g. "RELEASE2GFF.tar.gz"). Unpacking it yields a directory named after the release, e.g. RELEASE2.

Give that directory as the argument to this script, and capture the script's output to a file:

  % gadfly_to_gff.pl ./RELEASE2 > gadfly.gff

The gadfly.gff file can then be loaded into a Bio::DB::GFF database using the following command:

  % bulk_load_gff.pl -d <databasename> gadfly.gff

The resulting database will have the following feature types (represented as "method:source"):

  Component:arm              A chromosome arm
  Component:scaffold         A chromosome scaffold (accession #)
  Component:gap              A gap in the assembly
  clone:clonelocator         A BAC clone
  gene:gadfly                A gene accession number
  transcript:gadfly          A transcript accession number
  translation:gadfly         A translation
  codon:gadfly               Significance unknown
  exon:gadfly                An exon
  symbol:gadfly              A classical gene symbol
  similarity:blastn          A BLASTN hit
  similarity:blastx          A BLASTX hit
  similarity:sim4            EST->genome using SIM4
  similarity:groupest        EST->genome using GROUPEST
  similarity:repeatmasker    A repeat
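
Before loading, it can be useful to check which of these feature types actually appear in the converted file. The following is a minimal sketch, not part of this script: it assumes gadfly.gff is tab-separated GFF with the source in column 2 and the method in column 3, and the function name gff_type_counts is hypothetical.

```shell
# Tally "method:source" feature types in a GFF file (illustrative helper).
# Assumption: tab-separated GFF, source in column 2, method in column 3.
gff_type_counts() {
  awk -F'\t' '
    /^#/ || NF < 3 { next }            # skip comment lines and short lines
    { count[$3 ":" $2]++ }             # build the key as method:source
    END { for (t in count) printf "%d\t%s\n", count[t], t }
  ' "$1" | sort -k1,1nr -k2,2          # most frequent type first
}
```

Running "gff_type_counts gadfly.gff" prints one line per type with its count, which can be compared against the list above.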


Lincoln Stein <lstein@cshl.org>