
NAME

Bio::BPWrapper::SeqManipulations - Functions for bioseq

SYNOPSIS

    use Bio::BPWrapper::SeqManipulations;
    # Set options hash ...
    initialize(\%opts);
    write_out(\%opts);

SUBROUTINES

initialize()

Sets up most of the actions to be performed on the input sequences.

Call this right after setting up an options hash.

Sets package variables: $in, $in_format, $filename, $out_format, and $out.

write_out()

Writes out the sequence file.

Call this after calling initialize(\%opts) and processing those options.

retrieve_seqs()

Retrieves a sequence from GenBank using the provided accession number. A wrapper for Bio::DB::GenBank->get_Seq_by_acc().

remove_gaps()

Remove gaps from each sequence.
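A minimal pure-Perl sketch of gap removal on a raw sequence string, for illustration only; it is not the module's code. It assumes '-' and '.' are the gap characters and uses a hypothetical helper, strip_gaps().

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::BPWrapper): delete the gap
# characters '-' and '.' from a sequence string, returning a new string.
sub strip_gaps {
    my ($seq) = @_;
    (my $ungapped = $seq) =~ tr/-.//d;
    return $ungapped;
}

print strip_gaps("AC-GT..A-"), "\n";    # prints ACGTA
```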

Print all sequence lengths. Wraps Bio::Seq->length.


make_revcom()

Reverse complement. Wraps Bio::Seq->revcom().
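As an illustration of what a reverse complement does, here is a minimal pure-Perl sketch on a DNA string. The helper revcom_string() is hypothetical, and this sketch only handles A, C, G, T and N; the module itself relies on Bio::Seq->revcom().

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the string, then complement each base.
# Only A, C, G, T and N (either case) are handled in this sketch.
sub revcom_string {
    my ($dna) = @_;
    my $rc = reverse $dna;    # scalar context reverses the characters
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

print revcom_string("ATGCGT"), "\n";    # prints ACGCAT
```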

Select a substring of the first sequence. Wraps Bio::Seq->subseq().

reading_frame_ops()

Translate in 1, 3, or 6 frames, depending on the options set via initialize(\%opts). Wraps Bio::Seq->translate(), Bio::SeqUtils->translate_3frames(), and Bio::SeqUtils->translate_6frames().

restrict_digest()

Predict the fragments produced by digestion with the restriction enzyme specified in $opts{restrict}, set via initialize(\%opts).

An input file with a single sequence is expected. Wraps Bio::Restriction::Analysis->cut().
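The idea behind a single-enzyme digest can be sketched in pure Perl. This is a simplified illustration, not Bio::Restriction::Analysis: the helper digest_fragments() is hypothetical, the sequence is treated as linear, and only the top strand is scanned, which is adequate for palindromic sites such as EcoRI (G^AATTC).

```perl
use strict;
use warnings;

# Hypothetical helper: $site is the recognition sequence and $cut_offset
# the 0-based cut position within it (1 for EcoRI, G^AATTC).
# The sequence is treated as linear; only the top strand is searched.
sub digest_fragments {
    my ($seq, $site, $cut_offset) = @_;
    my @cuts;
    my $pos = index($seq, $site);
    while ($pos >= 0) {
        push @cuts, $pos + $cut_offset;
        $pos = index($seq, $site, $pos + 1);
    }
    my (@fragments, $start);
    $start = 0;
    for my $cut (@cuts) {
        push @fragments, substr($seq, $start, $cut - $start);
        $start = $cut;
    }
    push @fragments, substr($seq, $start);    # fragment after the last cut
    return @fragments;
}

print join(" ", digest_fragments("AAGAATTCCCGAATTCTT", "GAATTC", 1)), "\n";
# prints AAG AATTCCCG AATTCTT
```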

anonymize()

Replace sequence IDs with serial IDs n characters long, where n is given in $opts{'anonymize'}, set via initialize(\%opts). For example, if $opts{'anonymize'} is 5, the first ID will be S0001. The leading 'S' counts toward the length of the serial ID.

A sed script file is produced with a .sed suffix that may be used with sed's '-f' argument. If the filename is '-', the sed file is named STDOUT.sed instead. A message containing the sed filename is written to STDERR.
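A rough sketch of how serial IDs and matching sed commands could be generated. The helper serial_ids() is hypothetical, and the direction and exact text of the sed lines are assumptions; check the .sed file bioseq actually writes. IDs containing '/' or regex metacharacters would need escaping, which this sketch omits.

```perl
use strict;
use warnings;

# Hypothetical helper: build serial IDs of total length $len (a leading 'S'
# plus a zero-padded counter) and sed commands mapping each serial ID back
# to its original ID. Assumed format; bioseq's real .sed file may differ.
sub serial_ids {
    my ($len, @ids) = @_;
    my (%serial_of, @sed_lines);
    my $n = 0;
    for my $id (@ids) {
        my $serial = sprintf("S%0*d", $len - 1, ++$n);
        $serial_of{$id} = $serial;
        push @sed_lines, "s/$serial/$id/";
    }
    return (\%serial_of, \@sed_lines);
}

my ($map, $sed) = serial_ids(5, qw(seqA seqB));
print "$_\n" for @$sed;    # s/S0001/seqA/ then s/S0002/seqB/
```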

shred_seq()

Break the input into individual sequences, writing a FASTA file for each one.

count_codons()

Count codons for coding sequences (e.g., a genome file consisting of CDS sequences). Wraps Bio::Tools::SeqStats->count_codons().
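To show what counting codons means for an in-frame CDS, here is a minimal pure-Perl sketch. The helper codon_counts() is hypothetical and is not Bio::Tools::SeqStats->count_codons(); it reads non-overlapping triplets from the first base and ignores a trailing partial codon.

```perl
use strict;
use warnings;

# Hypothetical helper: tally non-overlapping triplets starting at the
# first base; any trailing one- or two-base remainder is ignored.
sub codon_counts {
    my ($cds) = @_;
    my %count;
    $count{ uc $1 }++ while $cds =~ /(...)/g;
    return \%count;
}

my $counts = codon_counts("ATGAAAAAATAA");
print "$_\t$counts->{$_}\n" for sort keys %$counts;
# AAA 2, ATG 1, TAA 1
```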

Print gene sequences in FASTA format from a GenBank file of a bacterial genome. This won't work for a eukaryotic GenBank file.

count_leading_gaps()

Count and print the number of leading gaps in each sequence.
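Counting leading gaps amounts to measuring the run of gap characters at the start of each sequence. A minimal sketch, assuming '-' is the only gap character; leading_gaps() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the run of '-' at the start of a sequence.
sub leading_gaps {
    my ($seq) = @_;
    $seq =~ /^(-*)/;    # always matches, possibly capturing an empty string
    return length $1;
}

print leading_gaps("---AC-G"), "\n";    # prints 3
```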

hydroB()

Return the mean Kyte-Doolittle hydropathicity for protein sequences. Wraps Bio::Tools::SeqStats->hydropathicity().
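The mean Kyte-Doolittle value (often called GRAVY) is the average of per-residue hydropathy scores. A minimal pure-Perl sketch using the published Kyte-Doolittle scale; mean_hydropathy() is hypothetical, and skipping unknown residues is an assumption that may differ from Bio::Tools::SeqStats->hydropathicity().

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Kyte-Doolittle hydropathy scale.
my %KD = (
    A => 1.8,  R => -4.5, N => -3.5, D => -3.5, C => 2.5,
    Q => -3.5, E => -3.5, G => -0.4, H => -3.2, I => 4.5,
    L => 3.8,  K => -3.9, M => 1.9,  F => 2.8,  P => -1.6,
    S => -0.8, T => -0.7, W => -0.9, Y => -1.3, V => 4.2,
);

# Hypothetical helper: average score over residues found in %KD.
# Other symbols (X, '*', gaps) are skipped; returns undef if none remain.
sub mean_hydropathy {
    my ($protein) = @_;
    my @values = map { $KD{$_} } grep { exists $KD{$_} } split //, uc $protein;
    return @values ? sum(@values) / @values : undef;
}

printf "%.2f\n", mean_hydropathy("ILVK");    # prints 2.15
```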

linearize()

Linearize FASTA, print one sequence per line.
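Linearizing means joining the wrapped sequence lines of each FASTA record. A minimal sketch, assuming the output keeps each header on its own line followed by the sequence on a single line; the exact output layout of bioseq may differ, and linearize_fasta() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes FASTA lines (without newlines) and returns
# header lines, each followed by its sequence joined into one line.
# Lines before the first header are ignored.
sub linearize_fasta {
    my @lines = @_;
    my (@out, $seq);
    for my $line (@lines) {
        if ($line =~ /^>/) {
            push @out, $seq if defined $seq;    # flush the previous record
            push @out, $line;
            $seq = '';
        } elsif (defined $seq) {
            $seq .= $line =~ s/\s+//gr;         # drop stray whitespace
        }
    }
    push @out, $seq if defined $seq;
    return @out;
}

print "$_\n" for linearize_fasta(">s1", "ACGT", "TTGA", ">s2", "GG");
# >s1, ACGTTTGA, >s2, GG (one per line)
```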

reloop_at()

Re-circularize a bacterial genome by starting at the position given in $opts{"reloop"}, set via initialize(\%opts).

For example, for the sequence "ABCDE", bioseq -R'2' .. would generate "BCDEA".
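The re-looping in this example is a rotation of the sequence string. A minimal sketch with a hypothetical reloop() helper, using a 1-based start position as in the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: rotate $seq so that it starts at 1-based $pos.
sub reloop {
    my ($seq, $pos) = @_;
    die "position out of range\n" if $pos < 1 || $pos > length $seq;
    return substr($seq, $pos - 1) . substr($seq, 0, $pos - 1);
}

print reloop("ABCDE", 2), "\n";    # prints BCDEA
```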

remove_stop()

Remove stop codons.

EXTENDING THIS MODULE

We encourage BioPerl developers to add command-line interfaces to their BioPerl methods here.

Here is how to extend. We'll use option --count-codons as an example.

  • Create a new method like one of the above. For example, see count_codons.

  • Document your method in pod using =head2. For example:

        =head2 count_codons()
    
        Count codons for coding sequences (e.g., a genome file consisting of
        CDS sequences). Wraps
        L<Bio::Tools::SeqStats-E<gt>count_codons()|https://metacpan.org/pod/Bio::Tools::SeqStats#count_codons>.
    
        =cut

    See count_codons() for how this gets rendered.

  • Add the method to the @EXPORT list in SeqManipulations.pm.

  • Add the option to %opt_dispatch, which maps the option used in bioseq to the subroutine that gets called here. For example:

        "avpid" => \&print_avp_id,

  • Add the option to the bioseq script. See the code that starts:

        GetOptions(
        ...
        "count-codons|C",
        ...

    This option has the short name C and takes no additional argument.

  • Write a test for the option. See the file t/10test-bioseq.t and Testing.

  • Share back. Create a pull request to the GitHub repository and contact Weigang Qiu, City University of New York, Hunter College (mailto:weigang@genectr.hunter.cuny.edu).
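The dispatch step in the list above can be sketched as follows. Everything here is illustrative: the stub handlers and the example-option entry are hypothetical, and the real %opt_dispatch in SeqManipulations.pm is not reproduced. Getopt::Long's GetOptionsFromArray() is used so the sketch can parse a sample command line without touching @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

my @calls;    # records which handlers ran, for illustration only

# Hypothetical stand-ins for real subroutines such as count_codons().
sub count_codons_stub   { push @calls, 'count_codons' }
sub example_option_stub { push @calls, 'example_option' }

# Option name (as stored by Getopt::Long) => handler code ref.
my %opt_dispatch = (
    'count-codons'   => \&count_codons_stub,
    'example-option' => \&example_option_stub,
);

# Run the handler for every parsed option that has an entry in the table.
sub dispatch_options {
    my ($opts) = @_;
    for my $option (sort keys %$opts) {
        $opt_dispatch{$option}->() if exists $opt_dispatch{$option};
    }
}

# Parse a sample command line using the "count-codons|C" spec shown above.
my %opts;
GetOptionsFromArray(['-C'], \%opts, 'count-codons|C', 'example-option')
    or die "could not parse options\n";
dispatch_options(\%opts);
print "@calls\n";    # prints count_codons
```

Keeping the option-to-subroutine mapping in one hash means adding a new option only touches the table, not the control flow.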

SEE ALSO

CONTRIBUTORS

  • Yözen Hernández <yzhernand at gmail dot com>

  • Girish Ramrattan <gramratt at gmail dot com>

  • Levy Vargas <levy dot vargas at gmail dot com>

  • Weigang Qiu (Maintainer)

  • Rocky Bernstein