NAME

GVF::Parser - A parser for Genome Variation Format files.

VERSION

Version 0.01

DESCRIPTION

Takes a given GVF file and creates a SQLite3 database that is accessed through DBIx::Class. Sections of pragma and feature data can also be retrieved directly via the methods provided.

This module is not intended for complex analysis of GVF data. It provides a way to parse out the desired data, either on its own or in coordination with a user pipeline.

GVF::Parser partitions GVF files into pragma and feature data, and the feature data is further split into features and attributes. Pragma data is stored in the object and can be requested using the provided methods. Attribute information is saved in a SQLite data file and can be accessed using the attributeRequest method, or more directly via DBIx::Class.

This parser looks at feature line data in the following way:

                                 features                                                                attributes
|--------------------------------------------------------------------------------------||---------------------------------------|
chr16   samtools        SNV     49291141        49291141        .       +       .        ID=ID_1;Variant_seq=A,G;Reference_seq=G;

featureRequest calls access the first eight columns of a feature line, and attributeRequest calls cover the ninth (attribute) column.
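
The split shown above can be sketched in plain Perl, independent of GVF::Parser. This is only an illustration: the helper name split_feature_line is hypothetical, and the column names other than seqid, type, start and end are assumed from GFF3 naming rather than taken from this module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GVF::Parser: split one feature line
# into its leading feature columns and its trailing attribute column.
sub split_feature_line {
    my ($line) = @_;
    chomp $line;

    # Columns are tab separated; the limit of 9 keeps the attribute
    # column in one piece.
    my @cols          = split /\t/, $line, 9;
    my $attribute_col = pop @cols;

    # Column names are an assumption based on GFF3 conventions.
    my %feature;
    @feature{qw(seqid source type start end score strand phase)} = @cols;

    # Attributes are semicolon-separated tag=value pairs.
    my %attribute;
    for my $pair ( grep { length } split /;/, $attribute_col ) {
        my ( $tag, $value ) = split /=/, $pair, 2;
        $attribute{$tag} = $value;
    }

    return ( \%feature, \%attribute );
}

my $line = join "\t", qw(chr16 samtools SNV 49291141 49291141 . + .),
    'ID=ID_1;Variant_seq=A,G;Reference_seq=G;';

my ( $feature, $attribute ) = split_feature_line($line);
print "$feature->{seqid} $feature->{type} $attribute->{Variant_seq}\n";
```

The sketch assumes a well-formed line with all nine columns present; a real parser would also need to skip pragma and comment lines.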

SYNOPSIS

        use GVF::Parser;

        # Add unsupported attributes to the database. Currently five extra tags are allowed

        # Example:
        my $unsupported = {
            add_attribute1 => 'hgmd_disease',
            add_attribute2 => 'hgmd_location',
        };

        my $obj = GVF::Parser->new(
            file           => $gvf,          # required
            file_modifier  => $unsupported,  # pass the unsupported tags to GVF::Parser
        );

        # pragmas are stored in the object
        # features are used to build the sqlite database

        $obj->pragmas;
        $obj->features;

        #---------------------------------------------------------
                
        # Example one
        # DBIx::Class approach.
        
        # connection to db via DBIx::Class object
        my $dbix       = $obj->get_dbixclass;
        
        # use DBIx::Class as standard from this point.
        my $features   = $dbix->resultset('Features');
        my $attributes = $dbix->resultset('Attributes');
        
        # create a hash of all the feature items wanted
        # using feature table primary key
        my %feats;
        while (my $f = $features->next){
            $feats{ $f->id } = {
                type  => $f->type,
                start => $f->start,
                end   => $f->end,
            };
        }

        # use attribute resultset to access desired parts of file
        # using attributes foreign_key to maintain relationship with features
        while (my $a = $attributes->next ){
            if ( $feats{ $a->features_id } ){
                
                my $varInfo = $obj->tidyVariantEffect( $a->varianteffect);
        
                foreach my $i ( @{$varInfo} ){
                    if ( $i->{sequence_variant} eq 'frameshift_variant' ) {
                        print $i->{'feature_type'}, "\t";
                        print $i->{'feature_id'}, "\t";
                        print $feats{ $a->features_id }->{'start'}, "\t";
                        print $feats{ $a->features_id }->{'type'}, "\t";
                        print $a->referenceseq, "\t";
                        print $a->variantseq, "\n";
                    }
                }
            }
        }

        #------------------------------------------------------------------------------
        # Example two.
        # accessing data in parts
        
        # Example of using request methods.
        my $feats   = $obj->featureRequest('seqid');
        my $atts    = $obj->attributeRequest('Variant_effect');
        my @regions = $obj->sequenceRegions;
        
        # pragma can be requested with list or individually.
        my @wantList  = qw/ multi-individual population data-source /;
        my $foundList = $obj->pragmaRequest(\@wantList);
        my $foundMore = $obj->pragmaRequest('data-source');
        my $foundprag = $obj->getAllPragmas;
        my @foundIndv = $obj->pragmaRequest('data-source');
        
        #------------------------------------------------------------------------------

SUBROUTINES/METHODS

pragmas

    Title    : pragmas
    Usage    : $obj->pragmas
    Function : Build pragma data into the object.
    Returns  : None.

 Pragma data is stored in the object and requested via pragmaRequest or getAllPragmas.

features

    Title    : features
    Usage    : $obj->features
    Function : Builds a SQLite3 database of feature values.
    Returns  : None

 This populates a sqlite3 database, creating a features table and an attributes table, parts of which can be accessed via featureRequest or attributeRequest.

getAllPragmas

    Title    : getAllPragmas
    Usage    : $obj->getAllPragmas
    Function : Retrieves a hash of all pragmas and values in a given file.
    Returns  : Hash (or hash reference) of "pragma => value".

 Simple pragma values are returned as simple key/value pairs, e.g. gvf_version => '1.06', and structured pragmas are returned as a hash of hashes, e.g. data_source => { 'Type' => 'SNV' }.
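
 The difference between simple and structured pragmas can be sketched as follows. This is a minimal illustration, not GVF::Parser's implementation: parse_pragmas is a hypothetical helper, the sample pragma values are made up, and keys are kept exactly as they appear in the file (no assumption is made about how the module normalises names).

```perl
use strict;
use warnings;

# Hypothetical helper: turn pragma lines into a hash of simple values
# and nested hashes.
sub parse_pragmas {
    my (@lines) = @_;
    my %pragma;

    for my $line (@lines) {
        # Pragma lines start with "##name", optionally followed by a value.
        next unless $line =~ /^##(\S+)\s*(.*?)\s*$/;
        my ( $name, $value ) = ( $1, $2 );

        if ( $value =~ /=/ ) {
            # Structured pragma: semicolon-separated tag=value pairs.
            my %tags;
            for my $pair ( grep { /=/ } split /;/, $value ) {
                my ( $tag, $tag_value ) = split /=/, $pair, 2;
                $tags{$tag} = $tag_value;
            }
            $pragma{$name} = \%tags;
        }
        else {
            # Simple pragma: keep the value as a plain string.
            $pragma{$name} = $value;
        }
    }
    return \%pragma;
}

my $pragmas = parse_pragmas(
    '##gvf-version 1.06',
    '##data-source Source=Solexa;Type=SNV',
);
print "$pragmas->{'gvf-version'} $pragmas->{'data-source'}{Type}\n";
```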

pragmaRequest

    Title    : pragmaRequest
    Usage    : $wanted = $obj->pragmaRequest($request) or
               $wanted = $obj->pragmaRequest($arrayref)
    Function : Capture requested pragma term
    Returns  : Array (or array reference) of the requested pragma terms in their original form.  Structured pragmas are not further broken down.

 This method allows you to request a single specific pragma term, or a list of terms passed as an array reference.  All are returned in their original form.
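
 One common way to accept either a single term or an array reference, and to return a list or a reference depending on calling context, is sketched below. The function pragma_request and the sample data are hypothetical; whether GVF::Parser uses wantarray internally is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only.
sub pragma_request {
    my ( $pragmas, $request ) = @_;

    # Accept a single term or an array reference of terms.
    my @wanted = ref($request) eq 'ARRAY' ? @{$request} : ($request);

    # Keep only the terms that are present; values are returned untouched.
    my @found = map { $pragmas->{$_} } grep { exists $pragmas->{$_} } @wanted;

    # List context gets the list, scalar context gets a reference.
    return wantarray ? @found : \@found;
}

my %pragmas = (
    'gvf-version' => '1.06',
    'data-source' => { Source => 'Solexa', Type => 'SNV' },
);

my @list  = pragma_request( \%pragmas, [qw(gvf-version data-source)] );
my $found = pragma_request( \%pragmas, 'gvf-version' );
print scalar(@list), " terms; scalar-context result: $found->[0]\n";
```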
 

sequenceRegions

    Title    : sequenceRegions
    Usage    : $regions = $obj->sequenceRegions
    Function : Capture all sequence regions from a GVF file.
    Returns  : Arrayref of all sequence regions.

featureRequest

    Title    : featureRequest
    Usage    : $features = $obj->featureRequest('seqid');
    Function : Capture requested feature types.
    Returns  : Array (or array reference) of the requested feature.

attributeRequest

    Title    : attributeRequest
    Usage    : $attributes = $obj->attributeRequest('reference_seq');
    Function : Capture requested attribute type.
    Returns  : Array (or array reference) of the requested attribute.
  

tidyVariantEffect

    Title    : tidyVariantEffect
    Usage    : $effect = $obj->tidyVariantEffect( "variant_effect line" );
    Function : Takes an individual Variant_effect from a feature line and returns
               it as a hashref with one key per space-delimited field.
    Returns  : Array of hashrefs (or arrayref of hashrefs) of variant effects.
    Args     : Individual Variant_effect line.

 This method is only designed to work with an individual Variant_effect from a feature line.  Its use is not needed when using attributeRequest, as that method performs the tidying for you.

 Example of method return:
 
    {
        feature_id         "CM990001",
        feature_type       "mRNA",
        index              0,
        sequence_variant   "coding_sequence_variant"
    },

 The hashref's keys will always be as the example shows.  Please see "Example one" (the DBIx::Class approach) in the SYNOPSIS for an individual usage example.
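
 The kind of tidying described here can be sketched in plain Perl. This is not the module's own code: tidy_variant_effect is a hypothetical stand-in, the sample value is made up, and the sketch assumes that multiple effects are comma separated and that the fields appear in the order sequence_variant, index, feature_type, feature_id.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not GVF::Parser's own code.
sub tidy_variant_effect {
    my ($value) = @_;
    my @effects;

    # Assumption: multiple effects are comma separated.
    for my $effect ( split /,/, $value ) {

        # The limit of 4 keeps any additional feature IDs together
        # in the last field.
        my ( $variant, $index, $type, $id ) = split ' ', $effect, 4;

        push @effects, {
            sequence_variant => $variant,
            index            => $index,
            feature_type     => $type,
            feature_id       => $id,
        };
    }
    return \@effects;
}

my $effects = tidy_variant_effect(
    'coding_sequence_variant 0 mRNA CM990001,frameshift_variant 1 mRNA CM990002'
);
print "$_->{sequence_variant}\t$_->{feature_id}\n" for @{$effects};
```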

get_dbixclass

    Title    : get_dbixclass
    Usage    : $obj->get_dbixclass
    Function : Handle used to connect to DBIx::Class
    Returns  : DBIx::Class object

 When assigning a resultset, the sqlite3 column names for features are the first eight columns of a feature line, and the attribute columns are the allowed GVF attribute names, lowercased with no underscores, e.g. referencecodon.
 Also allowed are the five "added_attribute1" columns, which can be added at object construction, and feature_id, which is the foreign key to the features table.
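
 The naming rule for attribute columns (lowercase, underscores removed) can be written as a one-line transformation. The helper column_name_for is hypothetical and only illustrates the rule stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a column name from a GVF attribute tag by
# lowercasing it and deleting underscores.
sub column_name_for {
    my ($tag) = @_;
    ( my $column = lc $tag ) =~ tr/_//d;
    return $column;
}

# prints referencecodon, variantseq and varianteffect, one per line
print column_name_for($_), "\n" for qw(Reference_codon Variant_seq Variant_effect);
```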

AUTHOR

Please contact me with any issues, method ideas/improvements or documentation matters.

Shawn Rynearson, <shawn.rynerson at gmail.com>

BUGS

Please report any bugs or feature requests to bug-gvf-parser at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=GVF-Parser. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

    perldoc GVF::Parser

ACKNOWLEDGEMENTS

This module would not be complete without acknowledging all the help I've had from the SO community; special thanks to Barry Moore for ideas and guidance.

LICENSE AND COPYRIGHT

Copyright 2012 Shawn Rynearson.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.