NAME

BioX::Wrapper::Gemini - A simple wrapper around the Python Gemini library for annotating VCF files.

SYNOPSIS

Basic Usage

  gemini_wrapper.pl --indir /path/to/vcfs --outdir /location/we/can/write/to > commands.in

Customized workflow

For more involved usage, see BioX::Wrapper::Gemini::Example.

Using the API

BioX::Wrapper::Gemini is written using Moose and can be extended in all the usual ways, for example with method modifiers in a subclass.

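  package My::Gemini;    # hypothetical subclass name

  use Moose;
  extends 'BioX::Wrapper::Gemini';

  # Run extra steps after db_load has finished
  after 'db_load' => sub {
      my $self = shift;
      # Run some commands
      # SCIENCE!
  };

  1;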

Description

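A wrapper around Gemini for preparing VCF files and loading them into a Gemini database: the files are compressed with bgzip, normalized with vt, annotated with snpEff, and then loaded with Gemini.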

Read more about Gemini here: http://gemini.readthedocs.org/en/latest/

The workflow is taken directly from the Gemini documentation, written by Gemini's author.

For further customization, see the Attributes section below.

Attributes

Moose Attributes

vcfs

As an alternative to --indir, VCF files can be given individually.

    # Option is an ArrayRef and can be given as either

    --vcfs 1.vcf,2.vcf,3.vcf

    # or

    --vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf

Do not mix the two forms.

If these VCFs are uncompressed, they will be compressed in place. Make sure you have read/write access to their location, or create symbolic links to them in a directory where you do.

Every time you leave genomics data uncompressed, a kitten dies!
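The ArrayRef behaviour of --vcfs can be illustrated with a small sketch. The option parsing inside BioX::Wrapper::Gemini is not shown here, so this uses core Getopt::Long as an assumption; parse_vcfs is a hypothetical helper, not part of the module. It accepts both documented forms and flattens comma-separated values into one list (it would also tolerate a mix, though mixing is discouraged above).

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: collect every --vcfs value, then split any
# comma-separated values into individual file names.
sub parse_vcfs {
    my @args = @_;
    my @vcfs;
    GetOptionsFromArray(\@args, 'vcfs=s@' => \@vcfs)
        or die "Could not parse options\n";
    return map { split /,/, $_ } @vcfs;
}

my @a = parse_vcfs('--vcfs', '1.vcf,2.vcf,3.vcf');
my @b = parse_vcfs('--vcfs', '1.vcf', '--vcfs', '2.vcf', '--vcfs', '3.vcf');
print "@a\n@b\n";    # both lines print: 1.vcf 2.vcf 3.vcf
```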

uncomvcfs

VCF files that are uncompressed.

ref

Supply a path to a reference genome.

The default assumes the environment variable $REFGENOME is set.

snpeff

Base directory of snpEff.

The default assumes the environment variable $SNPEFF is set to the base directory of the snpEff installation.
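Both ref and snpeff fall back to an environment variable. A minimal sketch of that lookup, assuming an explicitly supplied value should win over the environment; default_from_env is a hypothetical helper and the paths are examples only, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: prefer an explicit value, otherwise fall back to an
# environment variable, and fail loudly if neither is available.
sub default_from_env {
    my ($value, $env_var) = @_;
    return $value          if defined $value;
    return $ENV{$env_var}  if defined $ENV{$env_var};
    die "Supply a value or set \$$env_var\n";
}

$ENV{REFGENOME} = '/data/ref/human_g1k_v37.fasta';    # example path only
my $ref    = default_from_env(undef, 'REFGENOME');    # taken from $REFGENOME
my $snpeff = default_from_env('/opt/snpEff', 'SNPEFF'); # explicit value wins
print "$ref\n$snpeff\n";
```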

snpeff_opt

Options passed to snpEff.

Default is -c \$SNPEFF/snpEff.config -formatEff -classic GRCh37.75
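The backslash in \$SNPEFF suggests the default is written as a double-quoted Perl string; that is an assumption about the source. Under that assumption, the sketch below shows the effect: Perl emits a literal $SNPEFF, which the shell then expands when the generated commands are run.

```perl
use strict;
use warnings;

# In a double-quoted Perl string, \$ produces a literal dollar sign, so
# $SNPEFF is not interpolated by Perl (and strict does not complain); it is
# left in place for the shell to expand later.
my $snpeff_opt = "-c \$SNPEFF/snpEff.config -formatEff -classic GRCh37.75";
print "$snpeff_opt\n";
# prints: -c $SNPEFF/snpEff.config -formatEff -classic GRCh37.75
```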

ped

If all VCF files are loaded into the Gemini database with the same pedigree file, simply modify --db_load_opts to include that file (gemini load accepts a pedigree file via -p).

If each VCF file has its own pedigree, name each pedigree file after the basename of its VCF.

Basenames are captured like so:

    my @gzipbase = map {  basename($_, ".vcf.gz") }  @gzipped ;
    my @notgzipbase = map {  basename($_, ".vcf") }  @notgzipped ;

That is, the .vcf.gz or .vcf extension is stripped.
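A self-contained run of the two basename lines above, with File::Basename imported and made-up example paths, shows which names the pedigree files need to match.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my @gzipped    = ('/data/vcfs/sampleA.vcf.gz', '/data/vcfs/sampleB.vcf.gz');
my @notgzipped = ('/data/vcfs/sampleC.vcf');

# basename() drops the directory and, when given, the literal suffix
my @gzipbase    = map { basename($_, ".vcf.gz") } @gzipped;
my @notgzipbase = map { basename($_, ".vcf") } @notgzipped;

print "@gzipbase @notgzipbase\n";    # prints: sampleA sampleB sampleC
```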

Enable this behavior with --ped.

The PED file format is described in the Gemini documentation:

http://gemini.readthedocs.org/en/latest/content/preprocessing.html#describing-samples-with-a-ped-file

ped_dir

When using --ped, you must specify this if your pedigree files are not in the --indir directory.
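How ped_dir and --indir might combine can be sketched as follows. This assumes pedigree files are named <basename>.ped, which the text above does not state, and ped_path is a hypothetical helper rather than the module's code.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: locate the pedigree file for a VCF basename.
# ASSUMPTION: pedigree files are named <basename>.ped.
sub ped_path {
    my (%args) = @_;
    # Fall back to --indir when --ped_dir is not given
    my $dir = defined $args{ped_dir} ? $args{ped_dir} : $args{indir};
    return File::Spec->catfile($dir, "$args{base}.ped");
}

print ped_path(base => 'sampleA', indir => '/data/vcfs'), "\n";
print ped_path(base => 'sampleA', indir => '/data/vcfs', ped_dir => '/data/peds'), "\n";
```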

db_load_opts

Options for loading VCF files into the Gemini SQLite database.

Default is -t snpEff

This used to be --skip_cadd -t snpeff; by popular demand it is now just -t snpEff.
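To show where db_load_opts ends up, here is a sketch that assembles a gemini load command line as a string, the way a command-emitting wrapper might write it to commands.in. gemini_load_cmd and the file names are hypothetical, and the exact command the module emits may differ; -v, -p and the trailing database path follow the gemini load usage.

```perl
use strict;
use warnings;

# Hypothetical helper: build a `gemini load` command line.
sub gemini_load_cmd {
    my (%args) = @_;
    my $opts = defined $args{db_load_opts} ? $args{db_load_opts} : '-t snpEff';
    my @cmd  = ('gemini', 'load', '-v', $args{vcf}, $opts);
    push @cmd, '-p', $args{ped} if defined $args{ped};    # optional pedigree
    push @cmd, $args{db};                                  # database goes last
    return join ' ', @cmd;
}

print gemini_load_cmd(vcf => 'sampleA.vcf.gz', db => 'sampleA.db'), "\n";
# prints: gemini load -v sampleA.vcf.gz -t snpEff sampleA.db
```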

Subroutines

check_files

Checks that either --indir or --vcfs is supplied.
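A stand-alone version of that check might look like the sketch below. check_inputs is a hypothetical function working on a plain hash; the real check_files is a Moose method and may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical stand-alone check: require at least one input source.
sub check_inputs {
    my (%opt) = @_;
    my $has_indir = defined $opt{indir} && length $opt{indir};
    my $has_vcfs  = $opt{vcfs} && @{ $opt{vcfs} };    # non-empty ArrayRef
    die "Supply either --indir or --vcfs\n" unless $has_indir || $has_vcfs;
    return 1;
}

check_inputs(indir => '/data/vcfs');    # passes
check_inputs(vcfs  => ['1.vcf']);       # passes
```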

find_vcfs

Uses File::Find::Rule to find the VCFs.

Makes sure they are all compressed first: any .vcf file without a corresponding .vcf.gz file is compressed with bgzip.
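The "which files still need bgzip" rule can be sketched with core File::Find instead of File::Find::Rule, so the example runs without extra CPAN modules. This is a substitute technique, not the module's code, and vcfs_needing_bgzip is a hypothetical name.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Sketch using core File::Find (the module itself uses File::Find::Rule).
# Returns *.vcf files that have no matching *.vcf.gz next to them.
sub vcfs_needing_bgzip {
    my ($indir) = @_;
    my (%gz, @plain);
    find(sub {
        return unless -f $_;
        if    (/\.vcf\.gz$/) { $gz{$File::Find::name} = 1 }
        elsif (/\.vcf$/)     { push @plain, $File::Find::name }
    }, $indir);
    my @need = grep { !$gz{"$_.gz"} } @plain;
    return sort @need;
}
```

Given a directory holding a.vcf, a.vcf.gz and b.vcf, only b.vcf is returned, because a.vcf already has a compressed counterpart.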

bgzip

Runs bgzip on the uncompressed files found by find_vcfs.
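Since gemini_wrapper.pl writes commands to standard output rather than running them, the bgzip step can be pictured as one command per file. bgzip_cmds is a hypothetical helper and the exact output format of the module is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper: one bgzip command per uncompressed VCF.
# By default bgzip replaces file.vcf with file.vcf.gz in place, which is
# why the directory holding the VCFs must be writable.
sub bgzip_cmds {
    my @vcfs = @_;
    return map { "bgzip $_" } @vcfs;
}

print "$_\n" for bgzip_cmds('/data/vcfs/b.vcf');
# prints: bgzip /data/vcfs/b.vcf
```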

norml

Normalizes VCFs using vt and annotates them using snpEff.
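A sketch of the kind of pipeline this step builds, modeled on the decompose, normalize and annotate steps in the Gemini preprocessing documentation. norml_cmd is hypothetical, the $SNPEFF/snpEff.jar location is an assumption about the snpEff layout, and the command the module actually emits may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: decompose and normalize with vt, annotate with
# snpEff (reading stdin), then recompress with bgzip.
sub norml_cmd {
    my (%arg) = @_;
    return join ' | ',
        "zcat $arg{vcf}",
        "vt decompose -s -",
        "vt normalize -r $arg{ref} -",
        "java -jar $arg{snpeff}/snpEff.jar $arg{snpeff_opt}",
        "bgzip -c > $arg{out}";
}

# Single quotes keep $REFGENOME and $SNPEFF literal for the shell
print norml_cmd(
    vcf        => 'sampleA.vcf.gz',
    ref        => '$REFGENOME',
    snpeff     => '$SNPEFF',
    snpeff_opt => '-c $SNPEFF/snpEff.config -formatEff -classic GRCh37.75',
    out        => 'sampleA.norm.snpeff.vcf.gz',
), "\n";
```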

db_load

Loads the VCFs into a Gemini database.

run

Entry point that starts the whole workflow.

AUTHOR

Jillian Rowe <jillian.e.rowe@gmail.com>

ACKNOWLEDGEMENTS

This module was originally developed at and for Weill Cornell Medical College in Qatar within the ITS Advanced Computing Team. With approval from WCMC-Q, this information was generalized and put on GitHub, for which the authors would like to express their gratitude.

COPYRIGHT

Copyright 2015- Weill Cornell Medical College in Qatar

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO