Bio::FASTASequence - Parse sequence information in FASTA format.
version 0.07
use Bio::FASTASequence;

my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $seq = Bio::FASTASequence->new($fasta);
This Perl module is a simple utility for everyday bioinformatics work. It extracts several pieces of information from a given FASTA sequence, such as:
accession number
description
sequence itself
length of sequence
crc64 checksum (as it is used by SWISS-PROT)
seq2xml
my $accession = $seq->getAccessionNr();
returns the accession number of the FASTA sequence
my $description = $seq->getDescription();
returns the description from the first line of the FASTA record (without the accession number)
my $sequence = $seq->getSequence();
returns the sequence
my $crc64_checksum = $seq->getCrc64();
returns the crc64 checksum of the sequence. This checksum corresponds to the crc64 checksum used by SWISS-PROT
$seq->addDBRef(DB, REFERENCE_AC);
DB is the name of the referenced database
REFERENCE_AC is the accession number in the referenced database
$seq->seq2file(FILENAME);
FILENAME is the path of the file in which the sequence is stored.
my $indexes = $seq->allIndexesOf(EXPR);
returns a reference to an array that contains all indexes of EXPR in the sequence
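To illustrate what such a lookup involves, here is a minimal sketch in core Perl. The helper name all_indexes_of is hypothetical and this is not the module's actual implementation; it assumes a literal substring search, zero-based positions, and that overlapping matches should be reported.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::FASTASequence: collect every
# zero-based position at which $needle occurs in $sequence.
sub all_indexes_of {
    my ($sequence, $needle) = @_;
    my @positions;
    # An empty or missing search string yields an empty result.
    return \@positions if !defined $needle || $needle eq '';

    my $pos = index($sequence, $needle);
    while ($pos != -1) {
        push @positions, $pos;
        # Resume one character further so overlapping matches are found.
        $pos = index($sequence, $needle, $pos + 1);
    }
    return \@positions;
}

my $hits = all_indexes_of('QVTLTCTFSGMCV', 'C');
# Shift each index by one to display positions counted from 1.
print join(', ', map { $_ + 1 } @$hits), "\n";
```

Returning an array reference mirrors the documented return type, so callers dereference it with @$hits or @{ ... } before counting or iterating.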
my $length = $seq->getSequenceLength();
returns the length of the sequence
my $hashref = $seq->getDBRefs();
returns a hash reference. The hash contains all database references, for example {'SWISS-PROT' => 'P01815'}
my $fasta_sequence = $seq->getFASTA();
returns the sequence in FASTA-format
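As a rough picture of what FASTA output looks like, the sketch below writes a header line followed by the sequence wrapped into fixed-width lines. The helper format_fasta and its default width of 60 characters are assumptions chosen to match the example record; the module's own output may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::FASTASequence: build a FASTA record
# from a header (without the leading '>') and a raw sequence string.
sub format_fasta {
    my ($header, $sequence, $width) = @_;
    $width ||= 60;    # assumed default line width

    my @lines;
    for (my $i = 0; $i < length $sequence; $i += $width) {
        # substr returns a shorter final chunk when fewer characters remain.
        push @lines, substr($sequence, $i, $width);
    }
    return join("\n", ">$header", @lines) . "\n";
}

print format_fasta('example_id short test record', 'ABCDEFG', 3);
```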
use Bio::FASTASequence;

my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
~;
my $seq = Bio::FASTASequence->new($fasta);

print 'The sequence of ' . $seq->getAccessionNr() . ' is ' . $seq->getSequence(), "\n";

# allIndexesOf returns an array reference, so dereference it to count the hits.
my $cysteines = $seq->allIndexesOf('C');
print 'This sequence contains ' . scalar(@$cysteines) . ' cysteine residues at the following positions: ';
# Shift each index by one for display.
print join(', ', map { $_ + 1 } @$cysteines), "\n";
Bio::FASTASequence is a Perl module that parses information out of a FASTA sequence.
This module can parse the following formats:
The structure of the hash for the example is:
$VAR1 = {
  'seq_length' => 120,
  'accession_nr' => 'P01815',
  'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
  'crc64' => '158A8B29AE7EEB98',
  'dbrefs' => {},
  'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
}
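To show how fields like these can be derived from a record, here is a minimal sketch in core Perl. The function parse_fasta_record is hypothetical and is not the module's parser; it assumes a header of the form >db|ACCESSION|ENTRY_NAME description, as in the example record, and it leaves out the crc64 and dbrefs entries.

```perl
use strict;
use warnings;

# Hypothetical parser, not part of Bio::FASTASequence.
# Assumed header layout: >db|ACCESSION|ENTRY_NAME free-text description
sub parse_fasta_record {
    my ($fasta) = @_;
    my ($header, @seq_lines) = split /\r?\n/, $fasta;

    # Both captures stay undefined when the header does not fit the layout.
    my ($accession, $description) =
        $header =~ /^>[^|\s]+\|([^|\s]+)\|\S*\s+(.*)$/;

    # Join the sequence lines and drop any stray whitespace.
    my $text = join '', @seq_lines;
    $text =~ s/\s+//g;

    return {
        accession_nr => $accession,
        description  => $description,
        text         => $text,
        seq_length   => length $text,
    };
}

my $record = parse_fasta_record(join "\n",
    '>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).',
    'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY',
    'YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
    '');
printf "%s (%d residues)\n", $record->{accession_nr}, $record->{seq_length};
```

Headers in other layouts would need their own patterns; this sketch handles only the one layout stated above.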
If you miss something, please contact me.
There are no known bugs. If you experience any problems, please contact me.
http://modules.renee-baecker.de # not available yet - this site is under construction
The crc64 routine is based on the SWISS::CRC64 module.
More FASTA-Description lines are accepted.
Renee Baecker <reneeb@cpan.org>
This software is Copyright (c) 2014 by Renee Baecker.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)
To install Bio::FASTASequence, copy and paste the appropriate command into your terminal.
cpanm:
cpanm Bio::FASTASequence
CPAN shell:
perl -MCPAN -e shell
install Bio::FASTASequence