NAME

CracTools::GenomeMask - A bit vector mask over the whole genome

VERSION

version 1.251

SYNOPSIS

  my $genome_mask = CracTools::GenomeMask->new( genome => { "chr1" => 100000, "chr2" => 20000 } );

  $genome_mask->setRegion("chr1",200,250);

  $genome_mask->getNbBitsSetInRegion("chr1",190,220);

DESCRIPTION

This module defines a BitVector mask over a whole genome and provides methods to query this mask. It can read sequence names and lengths from various sources (SAM headers, a CRAC index, or user input).

SEE ALSO

See CracTools::BitVector, the underlying data structure of CracTools::GenomeMask.

TODO

The GenomeMask should be able to handle double-stranded DNA (as an option).

METHODS

new

There are multiple ways to create a genome mask:

One can specify an argument called genome that is a hashref where keys are chromosome names and values are chromosome lengths:

  my $genome_mask = CracTools::GenomeMask->new( genome => { seq_name => length,
                                                            seq_name => length,
                                                            ...} );

One can specify an argument called crac_index_conf that is the configuration file of a CRAC index:

  my $genome_mask = CracTools::GenomeMask->new(crac_index_conf => "file.conf");

One can specify a CracTools::SAMReader object in order to read chromosome names and lengths from the header:

  my $genome_mask = CracTools::GenomeMask->new(sam_reader => CracTools::SAMReader->new("file.sam"));

getBitvector

  Arg [1] : String - Chromosome

  Description : Return the CracTools::BitVector associated with the reference name given as argument.
                If no bit vector exists for this reference, a warning is reported.
  ReturnType  : CracTools::BitVector

getChrLength

  Arg [1] : String - Chromosome

  Description : Return the length of the chromosome
  ReturnType  : Integer

setPos

  Arg [1] : String - Chromosome
  Arg [2] : Integer - Position

  Description : Set the bit at this genomic location

setRegion

  Arg [1] : String - Chromosome
  Arg [2] : Integer - Position start
  Arg [3] : Integer - Position end

  Example     : $genome_mask->setRegion($chr,$start,$end)
  Description : Set all bits to 1 for this region

getPos

  Arg [1] : String - Chromosome
  Arg [2] : Integer - Position

  Description : Return true if the bit is set at this genomic location
  ReturnType  : Boolean

getPosSetInRegion

  Arg [1] : String - Chromosome
  Arg [2] : Integer - Position start
  Arg [3] : Integer - Position end

  Example     : my @pos_set = @{$genome_mask->getPosSetInRegion($chr,$start,$end)};
  Description : Return all the positions of the bits set in this genomic
                region
  ReturnType  : Array(Integer)

getNbBitsSetInRegion

  Arg [1] : String - Chromosome
  Arg [2] : Integer - Position start
  Arg [3] : Integer - Position end

  Description : Return the number of bits set in this genomic region
  ReturnType  : Integer
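
The region methods above can be pictured with a small pure-Perl sketch. It is not the CracTools implementation: the sub names (set_region, pos_set_in_region, nb_bits_set_in_region) are hypothetical, the mask is a plain bit string handled with Perl's built-in vec, and both region bounds are assumed to be inclusive, which this documentation does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a genome mask: one Perl bit string per sequence.
# This is NOT the CracTools implementation, only an illustration of the idea.
my %length = ( chr1 => 1000, chr2 => 200 );
my %mask   = map { $_ => "\0" x int( ( $length{$_} + 7 ) / 8 ) } keys %length;

# Set every bit in [start, end]; both bounds are treated as inclusive here
# (an assumption of this sketch).
sub set_region {
    my ( $chr, $start, $end ) = @_;
    vec( $mask{$chr}, $_, 1 ) = 1 for $start .. $end;
}

# List the set positions in [start, end], in increasing order.
sub pos_set_in_region {
    my ( $chr, $start, $end ) = @_;
    return [ grep { vec( $mask{$chr}, $_, 1 ) } $start .. $end ];
}

# Count the set bits in [start, end].
sub nb_bits_set_in_region {
    my ( $chr, $start, $end ) = @_;
    return scalar @{ pos_set_in_region( $chr, $start, $end ) };
}

set_region( "chr1", 200, 250 );
my $count = nb_bits_set_in_region( "chr1", 190, 220 );
print "$count\n";
```

With inclusive bounds, the query over 190..220 overlaps the set region on positions 200..220, so the sketch counts 21 bits. Scanning bit by bit keeps the example short; a real bit vector would normally count whole blocks at a time.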

rank

  Arg [1] : String - Chromosome
  Arg [2] : Integer - Position

  Description : Return the number of bits set up to this genomic
                position, as if the genome were one linear sequence.
  ReturnType  : Integer

select

  Arg [1] : Integer - Nth bit set

  Example     : my ($chr,$pos) = $genome_mask->select(12);
  Description : Return an array with the (chr,pos) of the Nth bit set
  ReturnType  : Array(String,Integer)
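
The idea of rank and select over a "linear" genome can be sketched in pure Perl as follows. This is an illustration under stated assumptions, not the module's code: the sequence order (@order), the 0-based inclusive positions, the 1-based N for select, and the sub names linear_rank and linear_select are all hypothetical choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical toy genome: sequence order and 0-based inclusive positions are
# assumptions of this sketch, not documented behaviour of CracTools::GenomeMask.
my @order  = ( "chr1", "chr2" );
my %length = ( chr1 => 10, chr2 => 5 );
my %mask   = map { $_ => "\0" x 2 } @order;    # 2 bytes hold up to 16 bits

vec( $mask{chr1}, $_, 1 ) = 1 for 2, 7;
vec( $mask{chr2}, $_, 1 ) = 1 for 0, 3;

# Number of bits set in one sequence over positions 0 .. $last.
sub count_upto {
    my ( $chr, $last ) = @_;
    return scalar grep { vec( $mask{$chr}, $_, 1 ) } 0 .. $last;
}

# Bits set in every earlier sequence, plus bits set in $chr up to $pos.
sub linear_rank {
    my ( $chr, $pos ) = @_;
    my $rank = 0;
    for my $name (@order) {
        return $rank + count_upto( $name, $pos ) if $name eq $chr;
        $rank += count_upto( $name, $length{$name} - 1 );
    }
    die "unknown sequence: $chr";
}

# (sequence, position) of the Nth set bit, counting from 1; empty list if none.
sub linear_select {
    my ($n) = @_;
    for my $name (@order) {
        for my $pos ( 0 .. $length{$name} - 1 ) {
            next unless vec( $mask{$name}, $pos, 1 );
            return ( $name, $pos ) if --$n == 0;
        }
    }
    return;
}

my ( $chr, $pos ) = linear_select(3);
print "rank(chr2,0) = ", linear_rank( "chr2", 0 ), "\n";
print "select(3) = ($chr,$pos)\n";
```

In this toy genome the two bits set on chr1 come first, so the bit at position 0 of chr2 has rank 3 and is also the answer to select(3): under these conventions the two operations are inverses of each other.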

AUTHORS

  • Nicolas PHILIPPE <nphilippe.research@gmail.com>

  • Jérôme AUDOUX <jaudoux@cpan.org>

  • Sacha BEAUMEUNIER <sacha.beaumeunier@gmail.com>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2017 by IRMB/INSERM (Institute for Regenerative Medicine and Biotherapy / Institut National de la Santé et de la Recherche Médicale) and AxLR/SATT (Languedoc Roussillon / Societe d'Acceleration de Transfert de Technologie).

This is free software, licensed under:

  The GNU Affero General Public License, Version 3, November 2007