CracTools::Utils - A set of useful functions
version 1.251
    # Reverse complementing a sequence
    my $seq = reverseComplement("ATGC");

    # Reading a FASTQ file
    my $it = seqFileIterator('file.fastq','fastq');
    while (my $entry = $it->()) {
      print "Sequence name   : $entry->{name}
             Sequence        : $entry->{seq}
             Sequence quality: $entry->{qual}","\n";
    }

    # Reading paired-end files easier
    my $it = pairedEndSeqFileIterator($reads1,$reads2,$format);
    while (my $entry = $it->()) {
      print "Read_1 : $entry->{read1}->{seq}
             Read_2 : $entry->{read2}->{seq}";
    }

    # Parsing a GFF file
    my $it = gffFileIterator($file);
    while (my $annot = $it->()) {
      print "chr   : $annot->{chr}
             start : $annot->{start}
             end   : $annot->{end}";
    }
Bio::Lite is a set of subroutines that aims to answer the same kinds of questions as the BioPerl distribution in a FAST and SIMPLE way.
Bio::Lite does not use complex data structures or objects, which would slow down execution.
All methods can be imported with a single "use Bio::Lite".
Bio::Lite is a lightweight single module with NO DEPENDENCIES.
Reverse complement the (nucleotide) sequence given as argument.
Example:
my $seq_revcomp = reverseComplement($seq);
reverseComplement is more than 100x faster than Bio-Perl revcom_as_string()
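The idea behind a fast reverse complement can be sketched in plain Perl. The helper name `reverse_complement_sketch` is hypothetical and this is not the module's own code; it only assumes an unambiguous A/C/G/T alphabet.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's implementation): reverse the string,
# then swap each base for its complement. Case is preserved; characters
# outside ACGT/acgt are left untouched.
sub reverse_complement_sketch {
    my ($seq) = @_;
    my $revcomp = reverse $seq;            # scalar context: string reversal
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;     # complement each base in place
    return $revcomp;
}

print reverse_complement_sketch("ATGC"), "\n";   # prints GCAT
```

Both steps are single built-in operations, so no per-base loop or object is needed.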
Arg [1] : String - a string with values separated by commas.
Example : $reverse = reverse_tab('2,1,1,1,0,0,1');
Description : Reverse the order of the values in the string given as argument. For example, reverse_tab('1,2,0,1') returns '1,0,2,1'.
ReturnType : String
Exceptions : none
Return true if version number v1 is greater than v2.
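The behaviour described for reverse_tab amounts to a split, a reverse and a join. A minimal sketch, using the hypothetical name `reverse_tab_sketch` so it is not mistaken for the module's code:

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the documented behaviour:
# split on commas, reverse the list, join back with commas.
sub reverse_tab_sketch {
    my ($string) = @_;
    return join ',', reverse split /,/, $string;
}

print reverse_tab_sketch('1,2,0,1'), "\n";   # prints 1,0,2,1
```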
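The documentation does not say how versions are compared, so the following is only one plausible reading: dotted version strings compared numerically, component by component. The function name `version_greater_sketch` and the handling of missing components (treated as 0) are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sketch: true if $v1 is strictly greater than $v2.
# Assumes purely numeric, dot-separated components such as "1.25.1".
sub version_greater_sketch {
    my ($v1, $v2) = @_;
    my @a = split /\./, $v1;
    my @b = split /\./, $v2;
    while (@a || @b) {
        my $x = shift(@a) // 0;   # missing components count as 0 (assumption)
        my $y = shift(@b) // 0;
        return 1 if $x > $y;
        return 0 if $x < $y;
    }
    return 0;                     # equal versions are not "greater"
}

print version_greater_sketch('1.10', '1.9') ? "greater\n" : "not greater\n";   # prints greater
```

Comparing components as numbers avoids the string-comparison trap where '1.9' would sort after '1.10'.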
Convert strand from '+/-' standard to '1/-1' standard and the opposite.
    say "Forward a: ",convertStrand('+');
    say "Forward b: ",convertStrand(1);
    say "Reverse a: ",convertStrand('-');
    say "Reverse b: ",convertStrand(-1);

will print

    Forward a: 1
    Forward b: +
    Reverse a: -1
    Reverse b: -
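A two-way conversion like this can be written as a single lookup table. The sketch below reproduces the documented input/output pairs; the name `convert_strand_sketch` and the table are illustrative, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical lookup table covering both directions of the conversion.
my %STRAND_MAP = (
    '+' => 1,
    '-' => -1,
    1   => '+',
    -1  => '-',
);

# Returns the converted strand, or undef for an unknown value.
sub convert_strand_sketch {
    my ($strand) = @_;
    return $STRAND_MAP{$strand};
}

print "Forward a: ", convert_strand_sketch('+'), "\n";   # prints Forward a: 1
print "Reverse b: ", convert_strand_sketch(-1), "\n";    # prints Reverse b: -
```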
Remove the "chr" prefix from a given string
say "reference name: ",removeChrPrefix("chr1");
reference name: 1
Add the "chr" prefix to the given string
Encode a (0-based) list of increasing positions to a string using the Base64 encoding scheme : ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

    my $encoded_list = CracTools::Utils::encodePosListToBase64(1,3,5,8,12,32);
    my @decoded_list = CracTools::Utils::decodePosListInBase64($encoded_list);
Decode position list encoded by encodePosListToBase64.
These are tools for reading (bio) files.
Open FASTA or FASTQ files (which can be gzipped). seqFileIterator detects the format automatically from the file extension, but you can force it with a second parameter giving the format: 'fasta' or 'fastq'.

    my $it = seqFileIterator('file.fastq','fastq');
    while (my $entry = $it->()) {
      print "Sequence name   : $entry->{name}
             Sequence        : $entry->{seq}
             Sequence quality: $entry->{qual}","\n";
    }
Return: HashRef
    {
      name => 'sequence_identifier',
      seq  => 'sequence_value',
      qual => 'sequence_quality', # only defined for FASTQ files
    }

seqFileIterator is more than 50x faster than Bio-Perl Bio::SeqIO for FASTQ files.
seqFileIterator is 4x faster than Bio-Perl Bio::SeqIO for FASTA files.
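The iterator pattern used here (a closure that returns one hashref per call and undef at end of file) can be sketched as follows. This is not the module's implementation: `fastq_iterator_sketch` is a hypothetical name, it takes an already opened filehandle, and it assumes strict four-line FASTQ records.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns a closure yielding one FASTQ record per call.
# Assumes each record is exactly four lines (no wrapped sequence lines).
sub fastq_iterator_sketch {
    my ($fh) = @_;
    return sub {
        my $header = <$fh>;
        return unless defined $header;   # end of file
        my $seq  = <$fh>;
        my $plus = <$fh>;                # '+' separator line, ignored
        my $qual = <$fh>;
        return unless defined $qual;     # truncated record
        chomp($header, $seq, $qual);
        $header =~ s/^@//;               # drop the leading '@'
        return { name => $header, seq => $seq, qual => $qual };
    };
}

# Demonstration on an in-memory file.
my $fastq = "\@read1\nACGT\n+\nIIII\n\@read2\nTTGA\n+\nFFFF\n";
open(my $fh, '<', \$fastq) or die "Cannot open in-memory file: $!";
my $it = fastq_iterator_sketch($fh);
while (my $entry = $it->()) {
    print join("\t", $entry->{name}, $entry->{seq}, $entry->{qual}), "\n";
}
close $fh;
```

Because the closure keeps the filehandle and returns plain hashrefs, the caller needs no object and only one record is held in memory at a time.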
Open Paired-End Sequence files using seqFileIterator()
Paired-end files are generated by next-generation sequencing technologies (like Illumina), where two reads are sequenced from the same DNA fragment and saved in separate files.

    my $it = pairedEndSeqFileIterator($reads1,$reads2,$format);
    while (my $entry = $it->()) {
      print "Read_1 : $entry->{read1}->{seq}
             Read_2 : $entry->{read2}->{seq}";
    }

    {
      read1 => 'see seqFileIterator() return',
      read2 => 'see seqFileIterator() return'
    }
pairedEndSeqFileIterator has no equivalent in Bio-Perl
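Reading two files in lockstep can be expressed as a closure wrapping two single-end iterators. In this sketch `paired_iterator_sketch` and `array_iterator` are hypothetical names; the array-backed iterators only stand in for real sequence-file iterators, and stopping as soon as either side is exhausted is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: combine two iterators so each call returns one pair.
sub paired_iterator_sketch {
    my ($it1, $it2) = @_;
    return sub {
        my $read1 = $it1->();
        my $read2 = $it2->();
        # Stop when either file is exhausted (assumption).
        return unless defined($read1) && defined($read2);
        return { read1 => $read1, read2 => $read2 };
    };
}

# Stand-in single-end iterator backed by a list, for demonstration only.
sub array_iterator {
    my @items = @_;
    return sub { return shift @items };
}

my $it = paired_iterator_sketch(
    array_iterator({ name => 'frag1/1', seq => 'ACGT' }, { name => 'frag2/1', seq => 'GGCC' }),
    array_iterator({ name => 'frag1/2', seq => 'TTAA' }, { name => 'frag2/2', seq => 'CATG' }),
);
while (my $entry = $it->()) {
    print "Read_1 : $entry->{read1}->{seq} Read_2 : $entry->{read2}->{seq}\n";
}
```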
CracTools::Utils::writeSeq($filehandle,$format,$seq_name,$seq,$seq_qual)
Write the sequence in the output stream with the specified format.
Manage the BED file format.

    my $it = bedFileIterator($file);
    while (my $annot = $it->()) {
      print "chr   : $annot->{chr}
             start : $annot->{start}
             end   : $annot->{end}";
    }
Return a hashref with the annotation parsed:
    {
      chr         => 'field_1',
      start       => 'field_2',
      end         => 'field_3',
      name        => 'field_4',
      score       => 'field_5',
      strand      => 'field_6',
      thick_start => 'field_7',
      thick_end   => 'field_8',
      rgb         => 'field_9',
      blocks      => [
        { 'size'      => 'block size',
          'start'     => 'block start',
          'end'       => 'block start + block_size',
          'ref_start' => 'block start on the reference',
          'ref_end'   => 'block end on the reference' },
        ...
      ],
      seek_pos    => 'Seek position of this line in the file',
    }
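How a BED line could map onto the structure above is sketched below. This is an illustration, not the module's parser: `parse_bed_line_sketch` is a hypothetical name, the block sizes and starts are read from columns 11 and 12 as in the standard BED12 layout, and `ref_start`/`ref_end` are assumed to be the block coordinates shifted by the feature start. The `seek_pos` key is left out.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one tab-separated BED line into a hashref.
sub parse_bed_line_sketch {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    my %annot = (chr => $f[0], start => $f[1], end => $f[2], blocks => []);

    # Columns 4 to 9 are optional in BED.
    my @optional = qw(name score strand thick_start thick_end rgb);
    for my $i (0 .. $#optional) {
        $annot{ $optional[$i] } = $f[$i + 3] if defined $f[$i + 3];
    }

    # Columns 11 and 12: comma-separated block sizes and relative starts.
    if (defined $f[11]) {
        my @sizes  = split /,/, $f[10];
        my @starts = split /,/, $f[11];
        for my $i (0 .. $#sizes) {
            push @{ $annot{blocks} }, {
                size      => $sizes[$i],
                start     => $starts[$i],
                end       => $starts[$i] + $sizes[$i],
                ref_start => $f[1] + $starts[$i],               # assumption
                ref_end   => $f[1] + $starts[$i] + $sizes[$i],  # assumption
            };
        }
    }
    return \%annot;
}

my $bed = parse_bed_line_sketch(
    "chr1\t1000\t5000\tgeneA\t0\t+\t1000\t5000\t0\t2\t100,200,\t0,3800,\n");
printf "%s has %d blocks, last block ends at %d\n",
    $bed->{name}, scalar @{ $bed->{blocks} }, $bed->{blocks}[-1]{ref_end};
```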
Manage the GFF3 and GTF2 file formats.

    my $it = gffFileIterator($file,'type');
    while (my $annot = $it->()) {
      print "chr   : $annot->{chr}
             start : $annot->{start}
             end   : $annot->{end}";
    }

    {
      chr        => 'field_1',
      source     => 'field_2',
      feature    => 'field_3',
      start      => 'field_4',
      end        => 'field_5',
      score      => 'field_6',
      strand     => 'field_7',
      frame      => 'field_8',
      attributes => { 'attribute_id' => 'attribute_value', ...},
      seek_pos   => 'Seek position of this line in the file',
    }
gffFileIterator is 5x faster than Bio-Perl Bio::Tools::GFF
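To make the returned structure concrete, here is a minimal sketch of parsing one GFF3 line, where the ninth column holds `key=value` pairs separated by semicolons. The name `parse_gff3_line_sketch` is hypothetical, GTF-style attributes (`key "value";`) are not handled, and `seek_pos` is omitted.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one tab-separated GFF3 line into a hashref.
sub parse_gff3_line_sketch {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;

    # Column 9: "key=value" pairs separated by ';' (GFF3 convention).
    my %attributes;
    for my $pair (split /;/, $f[8]) {
        my ($key, $value) = split /=/, $pair, 2;
        $attributes{$key} = $value if defined $value;
    }

    return {
        chr        => $f[0],
        source     => $f[1],
        feature    => $f[2],
        start      => $f[3],
        end        => $f[4],
        score      => $f[5],
        strand     => $f[6],
        frame      => $f[7],
        attributes => \%attributes,
    };
}

my $annot = parse_gff3_line_sketch(
    "chr1\texample\tgene\t100\t200\t.\t+\t.\tID=gene1;Name=abc\n");
print join("\t", $annot->{feature}, $annot->{start}, $annot->{attributes}{ID}), "\n";
```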
Manage the VCF file format.

    {
      chr    => $chr,
      pos    => $pos,
      id     => $id,
      ref    => $ref,
      alt    => [ alt1, alt2, ...],
      qual   => $qual,
      filter => $filter,
      info   => { AS => value, DP => value, ... },
    }
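The structure above follows the first eight VCF columns. A minimal sketch of building it from one data line is shown below; `parse_vcf_line_sketch` is a hypothetical name, and storing value-less INFO flags as 1 is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one tab-separated VCF data line into a hashref.
sub parse_vcf_line_sketch {
    my ($line) = @_;
    chomp $line;
    my ($chr, $pos, $id, $ref, $alt, $qual, $filter, $info) = split /\t/, $line;

    # INFO column: ';'-separated entries, either "KEY=value" or a bare flag.
    my %info;
    for my $item (split /;/, $info) {
        my ($key, $value) = split /=/, $item, 2;
        $info{$key} = defined $value ? $value : 1;   # bare flags stored as 1 (assumption)
    }

    return {
        chr    => $chr,
        pos    => $pos,
        id     => $id,
        ref    => $ref,
        alt    => [ split /,/, $alt ],   # several alternative alleles are comma-separated
        qual   => $qual,
        filter => $filter,
        info   => \%info,
    };
}

my $variant = parse_vcf_line_sketch("chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=10;AS=3;DB\n");
print join("\t", $variant->{chr}, $variant->{pos}, scalar @{ $variant->{alt} }, $variant->{info}{DP}), "\n";
```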
Return a hashref with the chimera parsed:
    {
      sample            => $sample,
      chim_key          => $chim_key,
      name              => $name,
      chr1              => $chr1,
      pos1              => $pos1,
      strand1           => $strand1,
      chr2              => $chr2,
      pos2              => $pos2,
      strand2           => $strand2,
      chim_value        => $chim_value,
      spanning_junction => $spanning_junction,
      spanning_PE       => $spanning_PE,
      class             => $class,
      comments          => { comment_id => 'comment_value', ... },
      extended_fields   => { extended_field_id => 'extended_field_value', ... },
    }
BE AWARE this method is only available if the samtools binary is available.
Return an iterator over a BAM file using a samtools view pipe.
A region can be passed as a parameter to restrict the results. In this case the BAM file must be indexed.

    my $fh = bamFileIterator("file.bam","17:43,971,748-44,105,700");
    while (my $line = <$fh>) {
      my $parsed_line = CracTools::SAMReader::SAMline->new($line);
      # do some stuff
    }
SEE ALSO CracTools::SAMReader::SAMline if you need to parse SAMlines easily
Return a sequence from a given region in an indexed FASTA file.

    my $fasta_seq = getSeqFromIndexedRef("file.fa","chr2",29012,10);
    my $seq = getSeqFromIndexedRef("file.fa","chr2",29012,10,'raw');
Given a CIGAR chain (see SAM specification), return a parsed version as an Array ref of cigar elements represented as { nb => 10, op => 'M' }.
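A CIGAR string is a run of length/operation pairs, so one global regular expression is enough to split it. The sketch below returns the `{ nb => ..., op => ... }` elements described above; `parse_cigar_sketch` is a hypothetical name, and the operation set `MIDNSHP=X` is the one listed in the SAM specification.

```perl
use strict;
use warnings;

# Hypothetical sketch: split a CIGAR string into { nb, op } elements.
sub parse_cigar_sketch {
    my ($cigar) = @_;
    my @elements;
    # Each element is a number followed by one SAM operation character.
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        push @elements, { nb => $1, op => $2 };
    }
    return \@elements;
}

my $parsed = parse_cigar_sketch("10M2I38M");
print join(' ', map { $_->{nb} . $_->{op} } @$parsed), "\n";   # prints 10M 2I 38M
```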
Generic method to parse files.
Return a file handle for the file given as argument. Display an error if the file cannot be opened, and manage gzipped files (based on the .gz file extension).

    my $fh = getReadingFileHandle('file.txt.gz');
    while (<$fh>) {
      print $_;
    }
    close $fh;

    my $fh = getWritingFileHandle('file.txt.gz');
    print $fh "Hello world\n";
    close $fh;
getLineFromSeekPos($filehandle,$seek_pos);
Return the chomped line found at the given seek position.
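The `seek_pos` values returned by the file iterators above can be used to jump straight back to a line. A minimal sketch of that lookup, using only Perl built-ins, is shown here; `get_line_from_seek_pos_sketch` is a hypothetical name and the in-memory file is for demonstration only.

```perl
use strict;
use warnings;

# Hypothetical sketch: jump to a byte offset and return the line found there.
sub get_line_from_seek_pos_sketch {
    my ($fh, $seek_pos) = @_;
    seek($fh, $seek_pos, 0) or die "Cannot seek: $!";   # 0 = from start of file
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

my $content = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$content) or die "Cannot open in-memory file: $!";
print get_line_from_seek_pos_sketch($fh, 11), "\n";   # prints second line
close $fh;
```

Storing an offset per record (for example with `tell` before each read) keeps an index small while still allowing random access to the full line later.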
Nicolas PHILIPPE <nphilippe.research@gmail.com>
Jérôme AUDOUX <jaudoux@cpan.org>
Sacha BEAUMEUNIER <sacha.beaumeunier@gmail.com>
This software is Copyright (c) 2017 by IRMB/INSERM (Institute for Regenerative Medecine and Biotherapy / Institut National de la Santé et de la Recherche Médicale) and AxLR/SATT (Lanquedoc Roussilon / Societe d'Acceleration de Transfert de Technologie).
This is free software, licensed under:
The GNU Affero General Public License, Version 3, November 2007