NAME

Bio::FASTASequence - Parse sequence information in FASTA format.

VERSION

version 0.07

SYNOPSIS

  use Bio::FASTASequence;
  my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
  ~;
  my $seq = Bio::FASTASequence->new($fasta);

DESCRIPTION

This Perl module is a simple utility that simplifies common bioinformatics tasks. It extracts the following information from a given FASTA sequence:

  • accession number

  • description

  • sequence itself

  • length of sequence

  • CRC64 checksum (as used by SWISS-PROT)

  • seq2xml
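The split between header, sequence and length can be sketched in plain Perl. This is a minimal sketch, not the module's code: the helper name `split_fasta` is hypothetical, the first line is assumed to be the header, and all whitespace in the remaining lines (including the indentation seen in the EXAMPLE below) is assumed to be insignificant.

```perl
use strict;
use warnings;

# Hypothetical helper: split one FASTA entry into header, sequence and length.
sub split_fasta {
    my ($fasta) = @_;
    $fasta =~ s/^\s+//;                          # drop leading blank lines/indent
    my ($header, @lines) = split /\r?\n/, $fasta;
    $header =~ s/^>//;                           # header without the '>' marker
    my $seq = join '', @lines;                   # concatenate the sequence lines
    $seq =~ s/\s+//g;                            # remove indentation and blanks
    return ($header, $seq, length $seq);
}
```

For the SYNOPSIS entry this yields a 120-residue sequence, the same value shown as `seq_length` in the structure dump further down.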

METHODS

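new

        my $seq = Bio::FASTASequence->new($fasta);

parses the FASTA-formatted string $fasta and returns a new Bio::FASTASequence object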

getAccessionNr

        my $accession = $seq->getAccessionNr();

returns the accession number of the FASTA sequence

getDescription

        my $description = $seq->getDescription();

returns the description from the first (header) line of the FASTA entry, without the accession number

getSequence

        my $sequence = $seq->getSequence();

returns the sequence

getCrc64

        my $crc64_checksum = $seq->getCrc64();

returns the CRC64 checksum of the sequence. This checksum matches the CRC64 checksum used by SWISS-PROT.
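SWISS-PROT's CRC64 is commonly described as the CRC-64-ISO polynomial (x^64 + x^4 + x^3 + x + 1) processed bit-reflected, starting from 0 and with no final XOR. The sketch below is a table-driven version of that calculation. It assumes a Perl built with 64-bit integers, and the name `crc64` is hypothetical; it is not the code of this module or of SWISS::CRC64.

```perl
use strict;
use warnings;
use Config;

# Assumption: native integers are 64 bits wide.
die "this sketch needs a perl with 64-bit integers\n" unless $Config{ivsize} >= 8;

# Reflected CRC-64-ISO polynomial, 0xD800000000000000 (built by shifting to
# avoid a non-portable hex literal).
my $POLY = 0xD8 << 56;

# Precompute the 256-entry lookup table for the reflected algorithm.
my @TABLE = map {
    my $crc = $_;
    for (1 .. 8) {
        $crc = ($crc & 1) ? (($crc >> 1) ^ $POLY) : ($crc >> 1);
    }
    $crc;
} 0 .. 255;

# Hypothetical helper: CRC64 of a sequence string as 16 uppercase hex digits.
sub crc64 {
    my ($seq) = @_;
    my $crc = 0;                                 # initial value 0
    for my $byte (unpack 'C*', $seq) {
        $crc = $TABLE[($crc ^ $byte) & 0xFF] ^ ($crc >> 8);
    }
    return sprintf '%016X', $crc;                # no final XOR
}
```

The two 32-bit halves used by older implementations are only needed on perls without 64-bit integers; with them, one integer holds the whole register.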

addDBRef

        $seq->addDBRef(DB, REFERENCE_AC);

DB is the name of the referenced database

REFERENCE_AC is the accession number in the referenced database

seq2file

        $seq->seq2file(FILENAME);

FILENAME is the path of the file in which the sequence is stored.
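Writing a sequence to a file safely in Perl uses a lexical filehandle, three-argument `open`, and a checked `close`. The sketch below assumes the stored content is the FASTA text; the helper name `write_fasta_file` is hypothetical and this is not how `seq2file` is necessarily implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: write a FASTA string to $filename, overwriting it.
sub write_fasta_file {
    my ($filename, $fasta) = @_;
    open my $fh, '>', $filename or die "Cannot open '$filename': $!";
    print {$fh} $fasta;
    close $fh or die "Cannot close '$filename': $!";   # catches write errors
    return 1;
}
```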

allIndexesOf

        my $indexes = $seq->allIndexesOf(EXPR);

returns a reference to an array containing all (0-based) positions at which EXPR occurs in the sequence
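Collecting every position of a substring can be done by calling Perl's built-in `index` repeatedly, restarting one character after each hit. This is a minimal sketch, assuming EXPR is a literal substring and that overlapping matches should all be reported; the module may treat EXPR differently, and the name `all_indexes_of` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 0-based positions of every (possibly overlapping)
# occurrence of the literal substring $expr in $seq, as an array reference.
sub all_indexes_of {
    my ($seq, $expr) = @_;
    return [] if $expr eq '';                    # avoid an endless loop
    my @positions;
    my $pos = index($seq, $expr);
    while ($pos >= 0) {
        push @positions, $pos;
        $pos = index($seq, $expr, $pos + 1);     # continue after this hit
    }
    return \@positions;
}
```

Adding 1 to each entry gives 1-based residue numbers, as the EXAMPLE below does.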

getSequenceLength

        my $length = $seq->getSequenceLength();

returns the length of the sequence

getDBRefs

        my $hashref = $seq->getDBRefs();

returns a hash reference that maps each referenced database to its accession number, e.g. {'SWISS-PROT' => 'P01815'}.
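The pairing of addDBRef and getDBRefs maps naturally onto a hash keyed by database name. The class below is a hypothetical stand-in that mirrors the two documented method names; it is not Bio::FASTASequence's code, and it assumes one accession per database (adding the same database again overwrites the earlier value).

```perl
package My::DBRefs;    # hypothetical stand-in, not Bio::FASTASequence
use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { dbrefs => {} }, $class;
}

# Store REFERENCE_AC under DB (assumption: one accession per database).
sub addDBRef {
    my ($self, $db, $ac) = @_;
    $self->{dbrefs}{$db} = $ac;
    return $self;
}

# Return the hash reference of all stored database references.
sub getDBRefs { return $_[0]{dbrefs} }

package main;
```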

getFASTA

        my $fasta_sequence = $seq->getFASTA();

returns the sequence in FASTA-format
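Producing FASTA text amounts to a `>` header line followed by the sequence wrapped at a fixed width. The sketch below assumes a default width of 60 residues per line, as in the example entry in this document; the helper name `to_fasta` and its parameters are hypothetical, and the header getFASTA actually emits may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: format a header and sequence as FASTA text.
sub to_fasta {
    my ($header, $seq, $width) = @_;
    $width ||= 60;                               # assumed default line width
    my $out = ">$header\n";
    for (my $i = 0; $i < length $seq; $i += $width) {
        $out .= substr($seq, $i, $width) . "\n";
    }
    return $out;
}
```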

EXAMPLE

        use Bio::FASTASequence;
        my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
        QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
        YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
        ~;

        my $seq = Bio::FASTASequence->new($fasta);

        print 'The sequence of ' . $seq->getAccessionNr() . ' is ' . $seq->getSequence(), "\n";

        # allIndexesOf() returns a reference to an array of 0-based positions
        my $cysteines = $seq->allIndexesOf('C');
        print 'This sequence contains ' . scalar(@$cysteines) . ' cysteines at the following positions: ';
        print join(', ', map { $_ + 1 } @$cysteines), "\n";

ABSTRACT

  Bio::FASTASequence is a Perl module to parse information out of a FASTA sequence.

ADDITIONAL INFORMATION

accepted formats

This module can parse FASTA header lines in the following formats:

        >P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
        >IPI:IPI00166553|REFSEQ_XP:XP_290586|ENSEMBL:ENSP00000331094|TREMBL:Q8N3H0 T Hypothetical protein
        >sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
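Each of these header styles can be recognised with one regular expression. The sketch below is hypothetical (the name `parse_header` is invented) and only the `sp|` rule is checked against this document: it reproduces the accession and description shown in the structure dump that follows. The rules for the plain and IPI styles (skip the entry name, keep everything after the first whitespace) are assumptions, not the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: return (accession, description) for one header line,
# or an empty list if the line matches none of the three styles.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    # >sp|P01815|HV2B_HUMAN Ig heavy chain ...   (entry name is skipped)
    return ($1, $2) if $line =~ /^>sp\|([^|]+)\|\S+\s+(.*)$/;
    # >IPI:IPI00166553|REFSEQ_XP:...|TREMBL:Q8N3H0 T Hypothetical protein
    return ($1, $2) if $line =~ /^>IPI:([^|\s]+)\S*\s+(.*)$/;
    # >P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
    return ($1, $2) if $line =~ /^>(\S+)\s+\S+\s+(.*)$/;
    return;
}
```

The order of the tests matters: the plain rule is tried last because it would also match the other two styles.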

structure

The structure of the hash for the example is:

        $VAR1 = {
                 'seq_length' => 120,
                 'accession_nr' => 'P01815',
                 'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
                 'crc64' => '158A8B29AE7EEB98',
                 'dbrefs' => {},
                 'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
               }

If you find something missing, please contact me.

BUGS

There are no known bugs. If you experience any problems, please contact me.

SEE ALSO

http://modules.renee-baecker.de (not available yet; this site is under construction)

The CRC64 routine is based on the SWISS::CRC64 module.

MODIFICATIONS

More FASTA description line formats are accepted.

AUTHOR

Renee Baecker <reneeb@cpan.org>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2014 by Renee Baecker.

This is free software, licensed under:

  The Artistic License 2.0 (GPL Compatible)