gff3_to_ucsc_table.pl
A script to convert a GFF3 file to a UCSC-style refFlat table.
gff3_to_ucsc_table.pl [--options...] <filename>

Options:
  --in <filename> [gff3 gtf]
  --out <filename>
  --alias
  --gz
  --verbose
  --version
  --help
The command line flags and descriptions:
--in <filename>
    Specify the input GFF3 or GTF file. The file may be compressed with gzip.
--out <filename>
    Specify the output filename. By default, the input file base name is used with '.refFlat' appended.
--alias
    Specify that any additional aliases, including the primary_ID, should be appended to the gene name. They are concatenated using the pipe "|" symbol.
--gz
    Specify whether (or not) the output file should be compressed with gzip. The default is to mimic the status of the input file.
--verbose
    Specify that extra information be printed as the GFF3 file is parsed.
--version
    Print the version number.
--help
    Display this POD documentation.
This program converts a GFF3 annotation file to a UCSC-style gene table in the refFlat format. The table includes transcription and translation start and stop positions, as well as exon start and stop positions, but does not include coding exon frames. See the documentation at http://genome.ucsc.edu/goldenPath/gbdDescriptionsOld.html#RefFlat for more information.
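To make the target format concrete, here is a minimal Python sketch of how one refFlat row could be assembled from a transcript's exon and CDS coordinates. It is not the script's own code. The function name refflat_line and its arguments are hypothetical. The column order follows the UCSC refFlat schema (geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), and the sketch converts 1-based inclusive GFF3 coordinates to 0-based half-open UCSC coordinates. Setting cdsStart and cdsEnd to txEnd for non-coding transcripts is an assumption based on common UCSC practice, not a documented behavior of this script.

```python
def refflat_line(gene, transcript, chrom, strand, exons, cds=None):
    """Build one tab-delimited refFlat row (hypothetical helper).

    exons: list of (start, end) tuples in 1-based, inclusive GFF3 coordinates.
    cds:   optional list of (start, end) tuples for the CDS segments.
    """
    exons = sorted(exons)
    # GFF3 is 1-based inclusive; UCSC tables are 0-based half-open,
    # so every start shifts down by one and every end stays the same.
    tx_start = exons[0][0] - 1
    tx_end = max(end for _, end in exons)
    if cds:
        cds_start = min(start for start, _ in cds) - 1
        cds_end = max(end for _, end in cds)
    else:
        # Assumption: a non-coding transcript gets an empty CDS at txEnd.
        cds_start = cds_end = tx_end
    # UCSC writes exon coordinates as comma-terminated lists.
    exon_starts = "".join(f"{start - 1}," for start, _ in exons)
    exon_ends = "".join(f"{end}," for _, end in exons)
    fields = [gene, transcript, chrom, strand, tx_start, tx_end,
              cds_start, cds_end, len(exons), exon_starts, exon_ends]
    return "\t".join(str(field) for field in fields)


if __name__ == "__main__":
    # Two-exon coding transcript with made-up names and coordinates.
    print(refflat_line("YFG1", "YFG1.1", "chr1", "+",
                       [(101, 200), (301, 400)],
                       cds=[(151, 200), (301, 350)]))
```

Coding exon frames are deliberately absent from the row, matching the refFlat format described above.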
The program assumes the input GFF3 file includes standard parent->child relationships using primary IDs and primary tags, including gene, mRNA, exon, CDS, and UTRs. Non-standard genes, including non-coding RNAs, are also processed. Chromosomes, contigs, and embedded sequences are ignored. Non-pertinent features are safely ignored but reported. Most pragmas are ignored, except for close feature pragmas (###), which aid in processing very large files. Multiple parentage and shared features, for example exons common to multiple alternative transcripts, are properly handled. See the documentation for the GFF3 file format at http://www.sequenceontology.org/resources/gff3.html for more information.
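The parent->child linkage and multiple parentage described above can be sketched in Python as follows. This is an illustration of the GFF3 conventions only, not the parser the script uses. The helper names parse_attributes and link_children are hypothetical. The sketch relies on two rules from the GFF3 specification: column 9 holds semicolon-separated tag=value pairs, and a tag such as Parent may carry several comma-separated values. All comment and pragma lines are simply skipped here; a fuller parser could treat the ### pragma as a signal that pending features are complete.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column into {tag: [values]}."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # Values are comma-separated and may be URL-escaped.
            attrs[key] = [unquote(item) for item in value.split(",")]
    return attrs


def link_children(gff3_lines):
    """Map each parent ID to the (type, start, end) features that name it."""
    children = defaultdict(list)
    for line in gff3_lines:
        # Skip blank lines, comments, and pragmas (including ###).
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        feature = (cols[2], int(cols[3]), int(cols[4]))
        # A shared exon lists several parents and is attached to each.
        for parent in attrs.get("Parent", []):
            children[parent].append(feature)
    return children


if __name__ == "__main__":
    # Made-up records: one exon shared by transcripts t1 and t2.
    lines = [
        "##gff-version 3\n",
        "chr1\t.\tgene\t101\t400\t.\t+\t.\tID=g1\n",
        "chr1\t.\tmRNA\t101\t400\t.\t+\t.\tID=t1;Parent=g1\n",
        "chr1\t.\tmRNA\t101\t400\t.\t+\t.\tID=t2;Parent=g1\n",
        "chr1\t.\texon\t101\t200\t.\t+\t.\tID=e1;Parent=t1,t2\n",
        "###\n",
    ]
    for parent, kids in link_children(lines).items():
        print(parent, kids)
```

Attaching a shared exon to every transcript that names it means each transcript can later be written out as its own complete table row.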
Previous versions of this script attempted to export in the UCSC genePredExt table format, often with inaccurate results. Users who need this format should investigate the gff3ToGenePred program available at http://hgdownload.cse.ucsc.edu/admin/exe/.
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.