BioX::Wrapper::Gemini - A simple wrapper around the Python Gemini library for annotating VCF files.
gemini_wrapper.pl --indir /path/to/vcfs --outdir /location/we/can/write/to > commands.in
For more involved usage, please see BioX::Wrapper::Gemini::Example.
BioX::Wrapper::Gemini is written using Moose and can be extended in all the usual fashions.
    use BioX::Wrapper::Gemini;

    after 'db_load' => sub {
        my $self = shift;
        # Run some commands
        # SCIENCE!
    };
A wrapper around Gemini for processing files.
Read more about Gemini here: http://gemini.readthedocs.org/en/latest/
The workflow described is taken straight from the documentation written by the author of Gemini.
For more customization, please see the attributes sections of the docs.
Moose Attributes
VCF files can be given individually as well.
    # Option is an ArrayRef and can be given as either
    --vcfs 1.vcf,2.vcf,3.vcf
    # or
    --vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf

Don't mix the two methods.
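To illustrate the two accepted forms of --vcfs, here is a minimal sketch using the core Getopt::Long module. The helper name parse_vcfs is hypothetical and is not part of BioX::Wrapper::Gemini; how the module itself parses the option is not shown here, so treat this only as an illustration of the two input styles producing the same list.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (an assumption, not the module's own code):
# collect --vcfs values given either comma-separated or as repeated options.
sub parse_vcfs {
    my @args = @_;
    my @vcfs;
    GetOptionsFromArray( \@args, 'vcfs=s@' => \@vcfs )
        or die "Could not parse options\n";

    # Split any comma-separated value into individual paths.
    return map { split( /,/, $_ ) } @vcfs;
}

my @from_commas   = parse_vcfs( '--vcfs', '1.vcf,2.vcf,3.vcf' );
my @from_repeated = parse_vcfs( '--vcfs', '1.vcf', '--vcfs', '2.vcf', '--vcfs', '3.vcf' );

print "@from_commas\n";
print "@from_repeated\n";
```

Both calls should yield the same three file names.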
If these VCFs are uncompressed, they will be compressed in place. Please make sure that either this location has read/write access, or create a symbolic link to someplace that does. Every time you leave genomics data uncompressed a kitten dies!
Vcfs that are uncompressed
Supply a path to a reference genome
The default assumes there is an environment variable $REFGENOME.
Base directory of snpeff
The default assumes there is an environment variable $SNPEFF that points to the base directory of the snpEff installation.
Options to run snpeff with
Default is -c \$SNPEFF/snpEff.config -formatEff -classic GRCh37.75
If all vcf files are being loaded into the gemini db with the same pedigree file, simply change the --db_load_opts to correspond to your file.
If each vcf file has its own pedigree, make sure the pedigree file matches the basename of the vcf.
Basenames are captured like so:
    my @gzipbase    = map { basename($_, ".vcf.gz") } @gzipped;
    my @notgzipbase = map { basename($_, ".vcf") } @notgzipped;
With the extension being .vcf.gz/.vcf
Invoke this with --ped
Exact specifications should be found here:
http://gemini.readthedocs.org/en/latest/content/preprocessing.html#describing-samples-with-a-ped-file
If you use the --ped option and your pedigree files are not in the same directory as --indir, you must specify this directory.
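The basename matching described above can be sketched as follows. This is only an illustration: the helper name ped_for_vcf and the .ped file extension are assumptions made for the example, not something this documentation specifies.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper (an assumption, not the module's own code):
# derive the pedigree file expected for a given VCF.
sub ped_for_vcf {
    my ( $vcf, $ped_dir ) = @_;

    # Strip either extension so that sample.vcf.gz and sample.vcf
    # both give the basename "sample".
    my $base = basename( $vcf, '.vcf.gz', '.vcf' );

    # The ".ped" extension is assumed here for illustration.
    return File::Spec->catfile( $ped_dir, "$base.ped" );
}

print ped_for_vcf( '/data/vcfs/trio1.vcf.gz', '/data/peds' ), "\n";
print ped_for_vcf( '/data/vcfs/trio2.vcf',    '/data/peds' ), "\n";
```

On a Unix-like system the two VCFs above map to trio1.ped and trio2.ped inside the pedigree directory.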
Options for loading VCF file into gemini sqlite db
Default is -t snpEff
This used to be --skip_cadd -t snpeff, but by popular demand is now just -t snpEff
Subroutines
Check to make sure either an indir or vcfs are supplied
Use File::Find::Rule to find the vcfs
Make sure they are all gzipped first. If there are any .vcf$ files without a corresponding .vcf.gz$, bgzip those
Run bgzip command on files found in find_vcfs
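The check for uncompressed files can be sketched like this. The helper needs_bgzip is hypothetical and works on a plain list of file names; the module itself finds files with File::Find::Rule, which is left out here so the example only depends on core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's own code):
# return every .vcf file that has no matching .vcf.gz in the same list.
sub needs_bgzip {
    my @files = @_;

    # Index the files that are already compressed.
    my %gzipped = map { $_ => 1 } grep { /\.vcf\.gz$/ } @files;

    # Keep uncompressed files without a compressed counterpart.
    return grep { /\.vcf$/ && !$gzipped{"$_.gz"} } @files;
}

my @todo = needs_bgzip(qw(a.vcf a.vcf.gz b.vcf c.vcf.gz));

# Print one bgzip command per file that still needs compressing.
print "bgzip $_\n" for @todo;
```

Here only b.vcf is reported, because a.vcf already has a.vcf.gz next to it and c.vcf.gz is already compressed.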
Normalize VCFs using vt and annotate them using snpEff
Load DB into gemini
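As a rough sketch of what a generated load command might look like, the example below assembles a gemini load call from a VCF, an optional pedigree file, and the load options. The helper name, the argument names, and the database file name are assumptions for illustration; the exact command the module writes out may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's own code):
# build a "gemini load" command line as a string.
sub gemini_load_command {
    my (%args) = @_;

    my @cmd = ( 'gemini', 'load', '-v', $args{vcf} );

    # Add the pedigree file only when one was supplied.
    push @cmd, '-p', $args{ped} if defined $args{ped};

    # Fall back to the documented default load options.
    my $opts = defined $args{db_load_opts} ? $args{db_load_opts} : '-t snpEff';
    push @cmd, split( ' ', $opts );

    push @cmd, $args{db};
    return join ' ', @cmd;
}

print gemini_load_command( vcf => 'trio1.vcf.gz', db => 'trio1.db' ), "\n";
print gemini_load_command(
    vcf => 'trio1.vcf.gz',
    ped => 'trio1.ped',
    db  => 'trio1.db',
), "\n";
```

Printing commands instead of running them matches the synopsis, where the wrapper's output is redirected to a file of commands.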
Subroutine that starts everything off
Jillian Rowe <jillian.e.rowe@gmail.com>
This module was originally developed at and for Weill Cornell Medical College in Qatar within the ITS Advanced Computing Team. With approval from WCMC-Q, this information was generalized and put on GitHub, for which the authors would like to express their gratitude.
Copyright 2015- Weill Cornell Medical College in Qatar
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
To install BioX::Wrapper::Gemini, copy and paste the appropriate command into your terminal.
cpanm
cpanm BioX::Wrapper::Gemini
CPAN shell
    perl -MCPAN -e shell
    install BioX::Wrapper::Gemini
For more information on module installation, please visit the detailed CPAN module installation guide.