
SYNOPSIS

    buildGFF3FromEnsembl.pl [-h|--f] [--output <output_file>] [--est] <genome> 
    The mandatory argument is the name of a species whose genome is available in Ensembl.
    For example:
             'Homo sapiens' for Human,
             'Pan troglodytes' for Chimpanzee,
             'Mus musculus' for Mouse,
             'Macaca mulatta' for Macaque,
             'Pongo pygmaeus' for Orangutan,
              etc. (see http://www.ensembl.org/info/about/species.html)
    --output: the file to which the GFF3 output is written (STDOUT by default)
    --est: build the GFF3 through the Ensembl API from the OtherFeatures DB instead of the Core DB (the default)
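    The command line above could be parsed with Getopt::Long, which is listed under
    REQUIRES. The following is a minimal sketch under stated assumptions, not the
    script's actual source. The sub name parse_args and the db_group field are
    hypothetical, and 'core' / 'otherfeatures' are assumed names for the two Ensembl
    database groups selected by the presence or absence of --est.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical sketch: parse the options and the mandatory <genome> argument.
sub parse_args {
    my @argv = @_;
    my %opt = (output => undef, est => 0, help => 0);

    GetOptionsFromArray(
        \@argv,
        'help|h|fullhelp' => \$opt{help},    # -h, --help, --fullhelp
        'output=s'        => \$opt{output},  # undef means write to STDOUT
        'est'             => \$opt{est},     # flag: use the OtherFeatures DB
    ) or die "Invalid options\n";

    # Exactly one species name is required unless help was requested.
    die "Missing mandatory <genome> argument\n"
        unless $opt{help} || @argv == 1;
    $opt{genome} = $argv[0];

    # Assumed group names: Core DB by default, OtherFeatures DB with --est.
    $opt{db_group} = $opt{est} ? 'otherfeatures' : 'core';
    return \%opt;
}
```

    For example, parse_args('--output', 'human.gff3', '--est', 'Homo sapiens')
    returns genome 'Homo sapiens', output 'human.gff3' and db_group 'otherfeatures'.
    Getopt::Long accepts unambiguous abbreviations by default, so in this sketch
    --f is read as --fullhelp.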

OPTIONS

    -h, --help, --fullhelp
    --output=<output_file>
    --est

    Writes a GFF3 file to <output_file> (or STDOUT) with the following columns:
      column 1: <seqname> 
                The name of the sequence. Commonly, this is the chromosome ID or
                contig ID. Note that the coordinates used must be unique within
                each sequence name in all GFF3 files for an annotation set.

      column 2: <source>
                A unique label indicating the source of the annotations
                (Ensembl).
      column 3: <feature>
                exon, cds, five, three, gene or mRNA
      column 4: <start>
                Start coordinate of the feature (1-based) relative to the
                beginning of the sequence named in <seqname>.
      column 5: <end>
                End coordinate of the feature (1-based, inclusive) relative to
                the beginning of the sequence named in <seqname>.
      column 6: <score>
                Not used; always ".".
      column 7: <strand>
                Strand of the feature relative to the genome: + or -.
      column 8: <frame>
                Not used; always ".".
      column 9: <attributes>
                A list of key=value pairs separated by semicolons (";").
                The following standard GFF3 attributes are used (other
                attributes are optional):
                  -ID=value                      A globally unique identifier for the feature.
                  -Parent=value1,...,valueN      A list of identifier(s) for the parent(s) of the feature.
                  -Name=value                    The HGNC name of the gene

                This script defines the following additional attributes:

                  -transcripts_nb=value          The number of transcripts contained in the gene
                  -exons_nb=value                The number of exons contained in the transcript/gene
                  -exon_rank=value               The rank of the exon contained in the gene
                  -type=prefix:value             The class of the mRNA, where "prefix" is a
                                                 top-level category (protein_coding,
                                                 small_ncRNA, lincRNA, other_lncRNA, other_noncodingRNA)
                                                 and "value" is the biotype defined by Ensembl.
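    To make the column layout concrete, here is a minimal Perl sketch of how one
    record could be serialised. It is not the script's own code: the helper name
    gff3_line, its named arguments and the example IDs and coordinates are
    hypothetical. It writes "." for score and frame as described above and joins
    multiple Parent values with commas.

```perl
use strict;
use warnings;

# Hypothetical helper: serialise one GFF3 record as nine tab-separated columns.
# Score (column 6) and frame (column 8) are always written as ".".
sub gff3_line {
    my (%f) = @_;
    my $attr = $f{attr} || {};

    # Emit the standard attributes first (ID, Parent, Name), then the
    # script-specific ones in alphabetical order so the output is stable.
    my %standard = (ID => 1, Parent => 1, Name => 1);
    my @keys  = grep { exists $attr->{$_} } qw(ID Parent Name);
    my @extra = grep { !$standard{$_} } keys %$attr;
    push @keys, sort @extra;

    my @pairs;
    for my $key (@keys) {
        my $value = $attr->{$key};
        # Multiple values (e.g. several parents) are comma-separated.
        $value = join ',', @$value if ref($value) eq 'ARRAY';
        push @pairs, "$key=$value";
    }

    return join "\t",
        $f{seqname}, $f{source}, $f{feature},
        $f{start},   $f{end},    '.',
        $f{strand},  '.',        join(';', @pairs);
}

print gff3_line(
    seqname => '1', source => 'Ensembl', feature => 'exon',
    start   => 1000, end => 1200, strand => '+',
    attr    => { ID => 'exon_example_1', Parent => ['transcript_example'], exon_rank => 1 },
), "\n";
```

    The example call prints the columns 1, Ensembl, exon, 1000, 1200, ".", +, "."
    separated by tabs, followed by ID=exon_example_1;Parent=transcript_example;exon_rank=1.
    GFF3 also requires reserved characters such as ";" and "=" inside attribute
    values to be percent-encoded; this sketch does not do that.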
 

REQUIRES

    Perl 5
    Bio::EnsEMBL
    Getopt::Long
    Pod::Usage
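    One way to confirm that these prerequisites are installed before running the
    script is a small Perl check like the one below. It is a sketch: missing_modules
    is a hypothetical helper, and Bio::EnsEMBL::Registry is assumed here as a
    representative module of the Ensembl Perl API.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        # Convert Foo::Bar to Foo/Bar.pm so require() needs no string eval.
        (my $file = "$module.pm") =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(Bio::EnsEMBL::Registry Getopt::Long Pod::Usage));
print @missing ? "Missing: @missing\n" : "All prerequisites found\n";
```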
    

AUTHOR

    Nicolas PHILIPPE <nicolas.philippe@inserm.fr>