NAME

BioUtil::Seq - Utilities for sequence

Robust modules such as BioPerl provide comprehensive solutions, but they are not easy to install on some platforms, and for simple scripting tasks a lightweight module may be a better choice. So I reinvented some wheels and added some useful utilities to this module, hoping it would be helpful.

VERSION

Version 2014.1213

EXPORT

    FastaReader
    read_sequence_from_fasta_file 
    write_sequence_to_fasta_file 
    format_seq

    validate_sequence 
    complement
    revcom 
    base_content 
    degenerate_seq_to_regexp
    degenerate_seq_match_sites
    dna2peptide 
    codon2aa 
    generate_random_seqence

    shuffle_sequences 
    rename_fasta_header 
    clean_fasta_header 

SYNOPSIS

  use BioUtil::Seq;

SUBROUTINES/METHODS

FastaReader

FastaReader is a FASTA file parser implemented as a closure. FastaReader returns an anonymous subroutine; each time it is called, it returns one FASTA record as a reference to an array containing the header and the sequence.

FastaReader can also read from STDIN when the file name is "STDIN".

A boolean argument is optional. If set to "true", carriage return ("\r") and newline ("\n") characters in the sequence will not be trimmed.

Example:

   # do not trim the spaces and \n
   # $not_trim = 1;
   # my $next_seq = FastaReader("test.fa", $not_trim);
   
   # read from STDIN
   # my $next_seq = FastaReader('STDIN');
   
   # read from file
   my $next_seq = FastaReader("test.fa");

   while ( my $fa = $next_seq->() ) {
       my ( $header, $seq ) = @$fa;

       print ">$header\n$seq\n";
   }
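The closure idea described above can be sketched without the module. This is a minimal illustration, not BioUtil::Seq's own code: the name make_fasta_reader is hypothetical, it always trims line endings, and it reads from an in-memory filehandle so that it runs without a file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioUtil::Seq): returns a closure that
# yields one [header, sequence] record per call and undef at end of input.
sub make_fasta_reader {
    my ($fh) = @_;
    my $pending;     # header of the next record, seen while finishing the previous one
    my $done = 0;
    return sub {
        return if $done;
        my ( $header, $seq ) = ( $pending, '' );
        undef $pending;
        while ( my $line = <$fh> ) {
            $line =~ s/[\r\n]+\z//;    # trim line endings
            if ( $line =~ /^>(.*)/ ) {
                if ( defined $header ) {
                    $pending = $1;     # remember the next record's header
                    return [ $header, $seq ];
                }
                $header = $1;
            }
            elsif ( defined $header ) {
                $seq .= $line;
            }
        }
        $done = 1;
        return defined $header ? [ $header, $seq ] : undef;
    };
}

# Read from an in-memory filehandle so the sketch needs no file on disk.
my $fasta = ">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory handle: $!";

my $next_seq = make_fasta_reader($fh);
my @records;
while ( my $fa = $next_seq->() ) {
    push @records, $fa;
    my ( $header, $seq ) = @$fa;
    print ">$header\n$seq\n";
}
```

The state (filehandle, pending header) lives in the closure, so the caller only needs the while loop.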

read_sequence_from_fasta_file

Read all sequences from a FASTA file into a hash reference keyed by header.

Example:

    my $seqs = read_sequence_from_fasta_file($file);
    for my $header (keys %$seqs) {
        my $seq = $$seqs{$header};
        print ">$header\n$seq\n";
    }

write_sequence_to_fasta_file

Write sequences, given as a hash reference mapping headers to sequences, to a FASTA file.

Example:

    my $seq = {"seq1" => "acgagaggag"};
    write_sequence_to_fasta_file($seq, "seq.fa");

format_seq

Format a sequence as readable text.

Example:

    my $seq = {"seq1" => "acgagaggag"};
    write_sequence_to_fasta_file($seq, "seq.fa");
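As an illustration of what formatting a sequence into readable text can mean, the sketch below wraps a sequence into fixed-width lines. The name wrap_seq, its argument order, and the default width of 60 are assumptions of this sketch; the signature of format_seq itself is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioUtil::Seq's format_seq): break a sequence
# into lines of at most $width characters, each ending in a newline.
sub wrap_seq {
    my ( $seq, $width ) = @_;
    $width = 60 unless defined $width;    # assumed default
    die "width must be a positive integer\n"
        unless $width =~ /\A[1-9]\d*\z/;
    ( my $out = $seq ) =~ s/(.{1,$width})/$1\n/g;
    return $out;
}

print wrap_seq( "acgagaggag", 4 );
```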

validate_sequence

Validate a sequence.

Legal symbols:

    DNA: ACGTRYSWKMBDHV
    RNA: ACGURYSWKMBDHV
    Protein: ACDEFGHIKLMNPQRSTVWY
    gap and space: - *.

Example:

    if (validate_sequence($seq)) {
        # do something
    }
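A check of this kind can be sketched with a single character class built from the symbol lists above. The name looks_like_sequence is hypothetical, and matching case-insensitively is an assumption of this sketch rather than documented behaviour of validate_sequence.

```perl
use strict;
use warnings;

# Union of the legal symbols listed above: DNA, RNA, protein, gap and space.
my $legal = 'ACGTRYSWKMBDHV' . 'ACGURYSWKMBDHV' . 'ACDEFGHIKLMNPQRSTVWY' . '-*. ';

# \Q...\E escapes '-', '*', '.' and the space inside the character class.
my $legal_re = qr/\A[\Q$legal\E]+\z/i;

# Hypothetical helper (not BioUtil::Seq's validate_sequence).
sub looks_like_sequence {
    my ($seq) = @_;
    return $seq =~ $legal_re ? 1 : 0;
}

for my $seq ( 'ACGT-ACGU', 'acgt', 'ACG7' ) {
    printf "%-10s %s\n", $seq, looks_like_sequence($seq) ? 'valid' : 'invalid';
}
```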

complement

Return the complement of a sequence.

IUPAC nucleotide code: ACGTURYSWKMBDHVN

http://droog.gs.washington.edu/parc/images/iupac.html

    Code    Base          Complement
    A       A             T
    C       C             G
    G       G             C
    T/U     T             A

    R       A/G           Y
    Y       C/T           R
    S       C/G           S
    W       A/T           W
    K       G/T           M
    M       A/C           K

    B       C/G/T         V
    D       A/G/T         H
    H       A/C/T         D
    V       A/C/G         B

    X/N     A/C/G/T       X
    .       not A/C/G/T
    . or -  gap

Example:

    my $comp = complement($seq);

revcom

Return the reverse complement of a sequence.

Example:

    my $revcom = revcom($seq);
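Complementing and reverse complementing with the IUPAC table can be sketched with Perl's tr operator. The names complement_sketch and revcom_sketch are hypothetical; this is not the module's source. Symbols outside the translation list (N, X, gaps) are left unchanged here, and U is complemented to A.

```perl
use strict;
use warnings;

# Hypothetical helper: complement IUPAC nucleotide codes, preserving case.
sub complement_sketch {
    my ($seq) = @_;
    ( my $comp = $seq ) =~
        tr/ACGTURYSWKMBDHVacgturyswkmbdhv/TGCAAYRSWMKVHDBtgcaayrswmkvhdb/;
    return $comp;
}

# Hypothetical helper: reverse complement = complement, then reverse.
sub revcom_sketch {
    my ($seq) = @_;
    return scalar reverse complement_sketch($seq);
}

print complement_sketch('ACGTRYKM'), "\n";
print revcom_sketch('AACG'),         "\n";
```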

base_content

Example:

    my $gc_content = base_content('gc', $seq);
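One way to compute such a value is to count the matching characters and divide by the sequence length. The sketch below assumes the result is a fraction between 0 and 1 and that matching is case-insensitive; whether base_content itself returns a fraction, a percentage, or a count is not stated here. The name base_fraction is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of characters in $seq that belong to $bases.
sub base_fraction {
    my ( $bases, $seq ) = @_;
    return 0 unless length($seq);
    # Count all case-insensitive matches of any character in $bases.
    my $count = () = $seq =~ /[\Q$bases\E]/gi;
    return $count / length($seq);
}

printf "GC fraction: %.2f\n", base_fraction( 'gc', 'ACGTGGCC' );
```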

degenerate_seq_to_regexp

Translate a degenerate sequence to a regular expression.
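The translation can be sketched by mapping each IUPAC code to a character class. The name degenerate_to_regexp_sketch is hypothetical, and returning a compiled case-insensitive regexp (rather than a pattern string) is an assumption of this sketch.

```perl
use strict;
use warnings;

# IUPAC code => regular expression fragment.
my %IUPAC = (
    A => 'A',      C => 'C',      G => 'G',     T => 'T', U => 'T',
    R => '[AG]',   Y => '[CT]',   S => '[CG]',  W => '[AT]',
    K => '[GT]',   M => '[AC]',
    B => '[CGT]',  D => '[AGT]',  H => '[ACT]', V => '[ACG]',
    N => '[ACGT]', X => '[ACGT]',
);

# Hypothetical helper: build a case-insensitive regexp from a degenerate sequence.
sub degenerate_to_regexp_sketch {
    my ($seq) = @_;
    my $pattern = '';
    for my $code ( split //, uc $seq ) {
        die "unknown IUPAC code '$code'\n" unless exists $IUPAC{$code};
        $pattern .= $IUPAC{$code};
    }
    return qr/$pattern/i;
}

my $re = degenerate_to_regexp_sketch('GAWTYC');
print "match\n" if 'ccGAATTCgg' =~ $re;
```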

degenerate_seq_match_sites

Find all sites matching a degenerate subsequence.
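Finding every site, including overlapping ones, can be sketched with a lookahead and a global match. The name match_sites_sketch, the use of an already compiled regexp as input, and the 1-based [start, end, matched text] return format are all assumptions of this sketch, not the documented output of degenerate_seq_match_sites.

```perl
use strict;
use warnings;

# Hypothetical helper: return an array reference of [start, end, match]
# triples (1-based positions) for every site where $re matches in $seq.
sub match_sites_sketch {
    my ( $re, $seq ) = @_;
    my @sites;
    # The lookahead consumes nothing, so overlapping sites are all reported.
    while ( $seq =~ /(?=($re))/g ) {
        my $start = $-[1];    # 0-based offset of the captured match
        push @sites, [ $start + 1, $start + length($1), $1 ];
    }
    return \@sites;
}

my $sites = match_sites_sketch( qr/A[AT]A/i, 'TAAAATAC' );
print join( "\t", @$_ ), "\n" for @$sites;
```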

dna2peptide

Translate a DNA sequence into a peptide.

codon2aa

Translate a DNA 3-character codon to an amino acid
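Codon and sequence translation can be sketched with a lookup table for the standard genetic code. The names codon_to_aa_sketch and translate_sketch are hypothetical. Returning 'X' for an unrecognised codon, translating only the first reading frame, and ignoring a trailing partial codon are assumptions of this sketch.

```perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G order.
my @bases = qw(T C A G);
my @aas   = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %CODON;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $CODON{"$b1$b2$b3"} = shift @aas;
        }
    }
}

# Hypothetical helper: one codon to one amino acid ('*' marks a stop codon).
sub codon_to_aa_sketch {
    my ($codon) = @_;
    ( my $key = uc $codon ) =~ tr/U/T/;    # accept RNA codons too
    return exists $CODON{$key} ? $CODON{$key} : 'X';
}

# Hypothetical helper: translate frame 1, ignoring a trailing partial codon.
sub translate_sketch {
    my ($dna) = @_;
    my $pep = '';
    for ( my $i = 0 ; $i + 3 <= length($dna) ; $i += 3 ) {
        $pep .= codon_to_aa_sketch( substr( $dna, $i, 3 ) );
    }
    return $pep;
}

print translate_sketch('ATGGCCTAA'), "\n";
```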

generate_random_seqence

Example:

    my @alphabet = qw/a c g t/;
    my $seq = generate_random_seqence( \@alphabet, 50 );
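A generator with the same calling convention can be sketched in a few lines. The name random_seq_sketch is hypothetical, and drawing each symbol uniformly with Perl's built-in rand is an assumption about the behaviour, not a statement of how the module does it.

```perl
use strict;
use warnings;

# Hypothetical helper: draw $length symbols uniformly from @$alphabet.
sub random_seq_sketch {
    my ( $alphabet, $length ) = @_;
    my $seq = '';
    $seq .= $alphabet->[ int rand @$alphabet ] for 1 .. $length;
    return $seq;
}

my @alphabet = qw/a c g t/;
my $seq = random_seq_sketch( \@alphabet, 50 );
print "$seq\n";
```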

shuffle_sequences

Example:

    shuffle_sequences($file, "$file.shuf.fa");
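The shuffling step itself can be sketched in memory with List::Util. The name shuffle_records_sketch and the [header, sequence] record layout are assumptions of this sketch; shuffle_sequences works on files, and its internals are not shown here.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: return the records in random order, leaving the input intact.
sub shuffle_records_sketch {
    my ($records) = @_;
    return [ shuffle @$records ];
}

my @records  = ( [ seq1 => 'ACGT' ], [ seq2 => 'GGCC' ], [ seq3 => 'TTAA' ] );
my $shuffled = shuffle_records_sketch( \@records );
print ">$_->[0]\n$_->[1]\n" for @$shuffled;
```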

rename_fasta_header

Rename FASTA headers using a regular expression.

Example:

    # delete some symbols
    my $n = rename_fasta_header('[^a-z\d\s\-\_\(\)\[\]\|]', '', $file, "$file.rename.fa");
    print "$n records renamed\n";
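The renaming step can be sketched on an in-memory FASTA string. The name rename_headers_sketch is hypothetical. Applying the substitution globally and case-sensitively, and counting only the headers that actually changed, are assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: apply s/$regexp/$replacement/g to every header line.
# Returns the rewritten FASTA text and the number of headers that changed.
sub rename_headers_sketch {
    my ( $regexp, $replacement, $fasta ) = @_;
    my $n = 0;
    my @out;
    for my $line ( split /\n/, $fasta ) {
        if ( $line =~ /^>/ ) {
            my $head = substr $line, 1;
            ( my $new = $head ) =~ s/$regexp/$replacement/g;
            $n++ if $new ne $head;
            $line = ">$new";
        }
        push @out, $line;
    }
    return ( join( "\n", @out ) . "\n", $n );
}

my $fasta = ">seq#1 (a)\nACGT\n>seq2\nGG\n";
my ( $renamed, $n ) =
    rename_headers_sketch( '[^a-z\d\s\-\_\(\)\[\]\|]', '', $fasta );
print $renamed;
print "$n records renamed\n";
```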

clean_fasta_header

Replace the given symbols with a replacement string, because some symbols in FASTA headers can cause unexpected results.

Example:

    my $file = "test.fa";
    my $n = clean_fasta_header($file, "$file.rename.fa");
    # replace any symbol in (\/:*?"<>|) with '', i.e. deleting.
    # my $n = clean_fasta_header($file, "$file.rename.fa", '',  '\/:*?"<>|');
    print "$n records renamed\n";