gadfly_to_gff.pl - Massage Gadfly's GFF format into a form suitable for Bio::DB::GFF
perl gadfly_to_gff.pl /path/to/gadfly/release/files > gadfly.gff
This script massages the Flybase/Gadfly GFF files located at ftp://ftp.fruitfly.org/pub/genomic/gadfly/ into the "correct" version of the GFF format for use with Bio::DB::GFF. The converted file lets you view the Drosophila annotations with the Generic Genome Browser (http://www.gmod.org).
To use this script, get the Gadfly GFF distribution archive that is organized by GenBank accession unit (e.g. "RELEASE2GFF.tar.gz"). Unpacking it yields a directory named after the release, e.g. RELEASE2.
Give that directory as the argument to this script, and capture the script's output to a file:
% gadfly_to_gff.pl ./RELEASE2 > gadfly.gff
The gadfly.gff file can then be loaded into a Bio::DB::GFF database using the following command:
% bulk_load_gff.pl -d <databasename> gadfly.gff
The resulting database will have the following feature types (represented as "method:source"):
  Component:arm              A chromosome arm
  Component:scaffold         A chromosome scaffold (accession #)
  Component:gap              A gap in the assembly
  clone:clonelocator         A BAC clone
  gene:gadfly                A gene accession number
  transcript:gadfly          A transcript accession number
  translation:gadfly         A translation
  codon:gadfly               Significance unknown
  exon:gadfly                An exon
  symbol:gadfly              A classical gene symbol
  similarity:blastn          A BLASTN hit
  similarity:blastx          A BLASTX hit
  similarity:sim4            EST->genome using SIM4
  similarity:groupest        EST->genome using GROUPEST
  similarity:repeatmasker    A repeat
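As a quick sanity check before loading, it can help to tally which "method:source" pairs actually occur in the converted file. The following is a minimal sketch, not part of gadfly_to_gff.pl. It assumes the output is standard tab-delimited GFF with the source in column 2 and the method in column 3. The function name count_feature_types is hypothetical.

```shell
#!/bin/sh
# Tally "method:source" pairs in a GFF file (illustrative helper, not part
# of the gadfly_to_gff.pl distribution).
# Assumption: tab-delimited GFF with column 2 = source, column 3 = method.
count_feature_types() {
    # Skip '#' comment lines and lines with fewer than three fields,
    # then count each method:source pair and print "pair<TAB>count".
    awk -F '\t' '
        /^#/ || NF < 3 { next }
        { count[$3 ":" $2]++ }
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' "$1" | LC_ALL=C sort
}
```

Once the function is defined in the current shell, running count_feature_types gadfly.gff prints one line per feature type with its count, which can be compared against the list above.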
Lincoln Stein <lstein@cshl.org>