NAME

CracTools::Interval::Query - Store and query genomic intervals.

VERSION

version 1.251

SYNOPSIS

  my $interval_query = CracTools::Interval::Query->new();

  $interval_query->addInterval("chr1",1,12,1,"geneA");
  $interval_query->addInterval("chr2",5,14,1,"geneB");

  my @results = @{$interval_query->fetchByRegion("chr1",12,15,1)};

  foreach my $gene (@results) {
    print STDERR "Found overlapping gene $gene\n";
  }

DESCRIPTION

This module stores and queries genomic intervals associated with values. It is based on the interval tree data structure provided by Set::IntervalTree.

CracTools::Interval::Query query methods all return an array reference containing the scalars associated with the retrieved intervals. They can also return an ArrayRef of the intervals (start, end) themselves; see "_processReturnValues" for more information.

All CracTools::Interval::Query methods can be called without the strand argument (or with undef). In that case, only the forward strand is considered.

This class can easily be extended by overriding the "_processReturnValue" hook method.
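
As a minimal sketch of that hook, the subclass below upper-cases every stored value before it is returned. The package name My::UpperCaseQuery is hypothetical, and the sketch assumes that the constructor blesses into the calling class and that the hook receives the stored value as its only argument after the object, as described under "_processReturnValue".

```perl
use strict;
use warnings;

package My::UpperCaseQuery;    # hypothetical subclass name
use parent 'CracTools::Interval::Query';

# Hook called on each value held by a matching interval
# (assumed to be invoked as a method, with the value as argument).
sub _processReturnValue {
    my ($self, $value) = @_;
    return uc $value;
}

package main;

my $query = My::UpperCaseQuery->new();
$query->addInterval("chr1", 10, 20, 1, "genea");

# Position 15 lies well inside the stored interval.
my @values = @{ $query->fetchByLocation("chr1", 15, 1) };
print "$_\n" for @values;
```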

SEE ALSO

You may want to check CracTools::Interval::Query::File, an implementation of CracTools::Interval::Query that retrieves intervals directly from standard files (BED, SAM, GTF, GFF) and returns the lines associated with the queried intervals.

METHODS

new

  Example     : my $intervalQuery = CracTools::Interval::Query->new();
  Description : Create a new CracTools::Interval::Query object
  ReturnType  : CracTools::Interval::Query
  Exceptions  : none

addInterval

  Arg [1] : String              - Chromosome
  Arg [2] : Integer             - Start position
  Arg [3] : Integer             - End position
  Arg [4] : (Optional) Integer  - Strand
  Arg [5] : Scalar              - The value to be held by this interval. It can
                                  be anything: an Integer, a String, a hash
                                  reference, an array reference, ...

  Example     : $interval_query->addInterval("chr1",12,30,-1,"geneA")
  Description : Add a new genomic interval, with an associated value to the interval_query.
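
Because the value may be any scalar, a reference can carry several attributes at once. The sketch below stores a hash reference and reads it back; the field names (name, biotype) are illustrative assumptions, not part of the module.

```perl
use strict;
use warnings;
use CracTools::Interval::Query;

my $interval_query = CracTools::Interval::Query->new();

# Store a hash reference on the reverse strand (-1).
# The keys "name" and "biotype" are illustrative only.
$interval_query->addInterval("chr1", 12, 30, -1,
    { name => "geneA", biotype => "protein_coding" });

# Query the same strand; each returned value is the stored hash reference.
for my $gene (@{ $interval_query->fetchByRegion("chr1", 15, 25, -1) }) {
    print "$gene->{name} ($gene->{biotype})\n";
}
```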

fetchByRegion

  Arg [1] : String              - Chromosome
  Arg [2] : Integer             - Start position
  Arg [3] : Integer             - End position
  Arg [4] : (Optional) Integer  - Strand
  Arg [5] : (Optional) Boolean  - Windowed query, only return intervals which
                                  are completely contained in the queried region.

  Example     : my @values = @{$intervalQuery->fetchByRegion('1',298345,309209,1)};
  Description : Retrieves the values of the intervals that overlap the region
                (or are completely contained in it for a windowed query).
  ReturnType  : ArrayRef of Scalar
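
The difference between a default and a windowed query can be sketched as follows. The coordinates are chosen so that the result does not depend on how interval boundaries are treated, which this documentation does not specify; the value strings are illustrative.

```perl
use strict;
use warnings;
use CracTools::Interval::Query;

my $interval_query = CracTools::Interval::Query->new();
$interval_query->addInterval("chr1", 120, 150, 1, "inside");      # fully inside 100-200
$interval_query->addInterval("chr1", 180, 300, 1, "straddling");  # extends past 200

# Default query: every interval overlapping the region.
my @overlapping = sort @{ $interval_query->fetchByRegion("chr1", 100, 200, 1) };

# Windowed query (fifth argument true): only fully contained intervals.
my @contained = sort @{ $interval_query->fetchByRegion("chr1", 100, 200, 1, 1) };

print "overlapping: @overlapping\n";
print "contained:   @contained\n";
```

The results are sorted only to make the printed order stable; the order in which the module returns values is not documented here.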

fetchByLocation

  Arg [1] : String              - Chromosome
  Arg [2] : Integer             - Position
  Arg [3] : (Optional) Integer  - Strand

  Example     : my @values = @{$intervalQuery->fetchByLocation('1',298345,1)};
  Description : Retrieves the values of the intervals that overlap the given location.
  ReturnType  : ArrayRef of Scalar

fetchNearestDown

  Arg [1] : String              - Chromosome
  Arg [2] : Integer             - Position
  Arg [3] : (Optional) Integer  - Strand

  Example     : my $value = $interval_query->fetchNearestDown('1',298345,1);
  Description : Searches for the closest downstream interval that does not contain the query
                position and returns the value associated with this interval.
  ReturnType  : Scalar

fetchNearestUp

  Arg [1] : String             - Chromosome
  Arg [2] : Integer            - Position
  Arg [3] : (Optional) Integer - Strand

  Example     : my $value = $interval_query->fetchNearestUp('1',298345,1);
  Description : Searches for the closest upstream interval that does not contain the query
                position and returns the value associated with this interval.
  ReturnType  : Scalar

fetchAllNearestDown

  Arg [1] : String             - Chromosome
  Arg [2] : Integer            - Position
  Arg [3] : (Optional) Integer - Strand

  Example     : my @values = @{$interval_query->fetchAllNearestDown('1',298345,1)};
  Description : Searches for all the closest downstream intervals that do not contain the query
                position and returns the values associated with these intervals.
  ReturnType  : ArrayRef of Scalar

fetchAllNearestUp

  Arg [1] : String             - Chromosome
  Arg [2] : Integer            - Position
  Arg [3] : (Optional) Integer - Strand

  Example     : my @values = @{$interval_query->fetchAllNearestUp('1',298345,1)};
  Description : Searches for all the closest upstream intervals that do not contain the query
                position and returns the values associated with these intervals.
  ReturnType  : ArrayRef of Scalar

PRIVATE METHODS

_getIntervalTree

  Arg [1] : String             - Chromosome
  Arg [2] : (Optional) Integer - Strand

  Description : Returns the Set::IntervalTree object for the given chromosome and strand (default strand: 1)
  ReturnType  : Set::IntervalTree

_addIntervalTree

  Arg [1] : String             - Chromosome
  Arg [2] : (Optional) Integer - Strand
  Arg [3] : Set::IntervalTree  - Interval tree

  Description : Adds a Set::IntervalTree object for a specific ("chr","strand") pair.
                Strand is set to 1 if none (or undef) is provided.

_getIntervalTreeKey

  Arg [1] : String             - Chromosome
  Arg [2] : (Optional) Integer - Strand

  Description : Static method that returns a unique key for the ("chr","strand") pair passed as arguments.
                Strand is set to 1 if none (or undef) is provided.
  ReturnType  : String
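
A key of this kind could be built as in the sketch below. The function name interval_tree_key and the '@' separator are assumptions made for illustration; the documentation only states that the key is unique per pair and that the strand defaults to 1.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one key per (chromosome, strand) pair.
sub interval_tree_key {
    my ($chr, $strand) = @_;
    $strand = 1 unless defined $strand;    # forward strand by default
    return join '@', $chr, $strand;        # the '@' separator is an assumption
}

print interval_tree_key("chr1"),     "\n";   # no strand: same key as strand 1
print interval_tree_key("chr1", 1),  "\n";
print interval_tree_key("chr1", -1), "\n";
```

With such a key, intervals added without a strand and intervals added with strand 1 end up in the same tree, which matches the default described above.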

_processReturnValues

  Arg [1] : ArrayRef - Values returned by Set::IntervalTree

  Example     : # Either get only the values held by the retrieved intervals
                my @values = @{$interval_query->_processReturnValues($interval_results)};
                # Or also get the intervals themselves
                my ($intervals,$values) = $interval_query->_processReturnValues($interval_results);
  Description : Calls the _processReturnValue() method on each value of the array ref passed as parameter.
  ReturnType  : Array(ArrayRef({start => .., end => ..}),ArrayRef(Scalar))
                (
                  [ { start => 12, end => 20 }, ... ],
                  [ "geneA", ...]
                )
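
The two calling styles in the example above are compatible with a sub that simply returns a two-element list: in list context both references come back, while in scalar context Perl's comma operator yields only the last one. The standalone sketch below illustrates this with a hypothetical function, process_return_values, and made-up input; it is an assumption about one way to obtain this behaviour, not the module's confirmed implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method returning (intervals, values).
sub process_return_values {
    my ($hits) = @_;
    my (@intervals, @values);
    for my $hit (@$hits) {
        push @intervals, { start => $hit->{start}, end => $hit->{end} };
        push @values, $hit->{value};
    }
    # List context: both references are returned.
    # Scalar context: the comma operator yields only the last one (the values).
    return (\@intervals, \@values);
}

my $hits = [ { start => 12, end => 20, value => "geneA" } ];

my ($intervals, $values) = process_return_values($hits);    # list context
my @only_values = @{ process_return_values($hits) };         # scalar context

print "$only_values[0] spans $intervals->[0]{start}-$intervals->[0]{end}\n";
```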

_processReturnValue

  Arg [1] : Scalar - Value held by an interval

  Description : This method processes the value held by each interval that
                matches a query before returning it.  It is designed to be
                overridden by subclasses.
  ReturnType  : Scalar (ArrayRef,HashRef,String,Integer...)

AUTHORS

  • Nicolas PHILIPPE <nphilippe.research@gmail.com>

  • Jérôme AUDOUX <jaudoux@cpan.org>

  • Sacha BEAUMEUNIER <sacha.beaumeunier@gmail.com>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2017 by IRMB/INSERM (Institute for Regenerative Medicine and Biotherapy / Institut National de la Santé et de la Recherche Médicale) and AxLR/SATT (Languedoc Roussillon / Societe d'Acceleration de Transfert de Technologie).

This is free software, licensed under:

  The GNU Affero General Public License, Version 3, November 2007