Ewan Birney

NAME

Bioperl - Coordinated OOP-Perl Modules for Biology

SYNOPSIS

Read on...

DESCRIPTION

Bioperl contains a number of Perl objects which are useful in biology. Examples include Sequence objects, Alignment objects and database searching objects. These objects not only do what they are advertised to do in the documentation, but they also interact - Alignment objects are made from the Sequence objects, Sequence objects have access to Annotation and SeqFeature objects and databases, Blast objects can be converted to Alignment objects, and so on. This means that the objects provide a coordinated and extensible framework to do computational biology.
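As a rough illustration of how these objects hand results to one another, the sketch below builds a Sequence object and derives two further Sequence objects from it. It assumes Bioperl is installed and that Bio::Seq offers new(), revcom(), translate(), seq(), id() and length(); the identifier and the sequence string are made up for the example.

```perl
use strict;
use warnings;
use Bio::Seq;

# A made-up 12-base coding sequence; the id is hypothetical.
my $seq = Bio::Seq->new(
    -id  => 'example1',
    -seq => 'ATGGCCAAGTAA',
);

# Both calls return new Sequence objects rather than plain strings,
# so the results can be passed on to other Bioperl objects.
my $revcom  = $seq->revcom;
my $protein = $seq->translate;

print "id:      ", $seq->id,      "\n";
print "length:  ", $seq->length,  "\n";
print "revcom:  ", $revcom->seq,  "\n";
print "protein: ", $protein->seq, "\n";
```

Because revcom() and translate() return objects, the same calls can be chained or fed to the Alignment and database objects described above.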

Bioperl development is focused on the Perl modules or objects themselves. There are scripts provided in the scripts/ and examples/ directories, but scripts are not the focus of the Bioperl developers. Of course, as the objects do most of the hard work for you, all you have to do is combine a number of objects together sensibly to make useful scripts.

The intent of the Bioperl development effort is to make reusable tools that aid people in creating their own sites or job-specific applications.

The bioperl.org website at http://bioperl.org also attempts to maintain links and archives of standalone bio-related Perl tools that are not affiliated with or related to the core Bioperl effort. Check the site for useful code ideas and contribute your own if possible.

DOCUMENTATION

We have a cookbook tutorial in bptutorial.pl which has embedded documentation. Start there if learning-by-example suits you best, or examine the Bioperl online course at http://www.pasteur.fr/recherche/unites/sis/formation/bioperl. Make sure to check the documentation in the modules as well - there are almost 200 modules in Bioperl, and counting, and there's detail in the modules' documentation that will not appear in the general documentation.

INSTALLATION

The Bioperl modules are distributed as a tar file that expands into a standard perl CPAN distribution. Detailed installation directions can be found in the distribution INSTALL file.

The Bioperl modules can now interact with local flat file and relational databases. To learn how to set this up, look at the biodatabases.pod documentation ('perldoc biodatabases.pod' should work once Bioperl has been installed).

The bioperl-db, bioperl-gui, corba-server, and corba-client packages are installed separately from Bioperl. Please refer to their respective documentation for more information.

GETTING STARTED

A good place to start is by reading and running the cookbook script, bptutorial.pl.

The distribution scripts/ directory has fully working, industrial-strength scripts for use with Bioperl. These are documented, and the command 'perldoc scriptname' will work. This area only started in the 0.05 distribution, so not that many scripts have been written yet - you are more than welcome to contribute!

The example scripts in the distribution examples/ directory and its subdirectories give you an idea of how to use some of the modules and driver code.

If you have installed Bioperl in the standard way, as detailed in the README in the distribution, these examples should work by just running them. If you have not installed it in a standard way, you will have to change the 'use lib' line in each script to point to your installation (see INSTALL for details).
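A minimal sketch of such a 'use lib' line follows. The path is hypothetical and stands in for wherever you unpacked the distribution, that is, the directory that contains the Bio/ directory.

```perl
use strict;
use warnings;

# Hypothetical location of an unpacked Bioperl distribution.
# Replace it with the directory that contains Bio/ on your system.
use lib '/home/you/src/bioperl';

# 'use lib' runs at compile time and puts the directory at the front
# of @INC, so a later 'use Bio::Seq;' would be looked up there first.
print "First module search directory: $INC[0]\n";
```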

Examples/ Directory

There are many scripts included in the distribution. Here are brief descriptions of the scripts in the examples/ directory:

examples/aligntutorial.pl - examples using EMBOSS, pSW, Clustalw, TCoffee, and Blast to align sequences

examples/biblio.pl - a script that shows how to query bibliographic databases, such as Medline, using ids, keywords, and other fields. See Bio::Biblio for details

examples/biblio_soap.pl - connect to and test a SOAP server using a Bio::Biblio object

examples/blast/*pl - a set of scripts showing how to use Blast.pm. Please see Bio::Tools::Blast for more information

examples/change_gene.pl - a script showing how to use LiveSeq::Mutator and LiveSeq::Mutation. Please see Bio::LiveSeq::Mutator and Bio::LiveSeq::Mutation for more information

examples/clustalw.pl - a demonstration of the various uses of Alignment::Clustalw. See Bio::Tools::Run::Alignment::Clustalw for more

examples/getGenBank.pl - retrieving Genbank entries over the Web using DB::GenBank. See Bio::DB::GenBank for more information

examples/gsequence - create a Protein Sequence Control Panel GUI with Gtk

examples/hitdisplay.pl - create a GUI for displaying Blast results using Tk::HitDisplay. Please see Bio::Tk::HitDisplay for more information

examples/psw.pl - example code for using the XS extensions for a protein Smith-Waterman comparison

examples/remote_blast.pl - this script executes remote Blast using RemoteBlast. See Bio::Tools::Run::RemoteBlast for more information

examples/restriction.pl - example code for using the RestrictionEnzyme module. See Bio::Tools::RestrictionEnzyme for more information

examples/rev_and_trans.pl - examples using Bio::Seq.pm for reversing and translating sequences. See Bio::Seq for more information

examples/root_object/* - example code for using Object.pm. Please see Bio::Root::Object for more information

examples/run_genscan.pl - run GENSCAN on multiple sequences and create output sequence files using Tools::Genscan. Please see Bio::Tools::Genscan for more information

examples/searchio/* - a number of scripts illustrating the use of Bio::SearchIO for parsing Blast and PSI-Blast results. See Bio::SearchIO for more information.

examples/seq/* - example code for working with multiple sequence files, including formatting and filtering based on length or description or ID

examples/seq_pattern.pl - a script that shows how to use sequences as regular expressions using Tools::SeqPattern. Please see Bio::Tools::SeqPattern for more information

examples/simplealign.pl - a script that demonstrates some uses of AlignIO. Please see Bio::AlignIO for more information

examples/standaloneblast.pl - a demonstration of some of the uses of StandAloneBlast.pm. See Bio::Tools::StandAloneBlast for details

examples/state-machine.pl - a demonstration of how to create a state machine using StateMachine::AbstractStateMachine. Please see Bio::Tools::StateMachine::AbstractStateMachine for more information

examples/structure/struct_example* - scripts that show how to examine details of the 3D structure of a protein by parsing a PDB file. See Bio::Structure::IO for more information.

examples/test-genscan.pl - script for testing and demonstrating Genscan.pm

examples/exceptions/test*pl - scripts that demonstrate how to throw and catch Error.pm objects.

examples/root_object/vector/vector.pl - script to test Bio::Root::Vector.pm

examples/root_object/* - scripts that demonstrate uses of Bio::Root modules.

examples/use_registry.pl - script that shows how to use Bio::DB::Registry, part of Bioperl's integration with OBDA, the Open Bio Database Access registry scheme. See Bio::DB::Registry for more information.
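To give a flavour of what the sequence-file examples do, here is a small sketch that filters FASTA records by length. It is not taken from the examples/ directory. It assumes Bioperl is installed, that Bio::SeqIO accepts the -fh and -format arguments, and that Perl can open an in-memory filehandle; the records and the length cutoff are made up.

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Two made-up FASTA records held in memory so that the sketch needs no files.
my $fasta = <<'END';
>short_one first record
ATGC
>long_one second record
ATGCATGCATGCATGC
END

open my $in_fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $in  = Bio::SeqIO->new(-fh => $in_fh,   -format => 'fasta');
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');

my $min_length = 10;    # arbitrary cutoff chosen for the illustration
my @kept;

while (my $seq = $in->next_seq) {
    next if $seq->length < $min_length;    # skip records that are too short
    push @kept, $seq->display_id;
    $out->write_seq($seq);                 # write the surviving record as FASTA
}
```

Replacing the in-memory handle with -file => 'name.fa' would read from disk instead, and changing the output -format would reformat the records while filtering.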

Scripts/ Directory

Here are brief descriptions of the scripts in the scripts/ directory:

scripts/align_on_codons.pl - aligns nucleotide sequences based on codons in a specified reading frame

scripts/Bio-DB-GFF/* - scripts that reformat sequence to GFF and load GFF format files into an indexed database - see Bio::DB::GFF for more information

scripts/bioperl.pl - a Bioperl shell!

scripts/blast_fetch_local.pl - parse a Blast results file for ids and extract pertinent sequences from a local, indexed database using Tools::BPlite and Index::Fasta. See Bio::Tools::BPlite and Bio::Index::Fasta for more information

scripts/blast_fetch.pl - parse a Blast result and fetch sequences from Genbank or Genpept over the network using Tools::Blast and Bio::DB*. See Bio::Tools::Blast, Bio::DB::GenBank, and Bio::DB::GenPept

scripts/bpfetch.pl - fetch sequences from local indexed database or over the network and reformat using Bio::Index* and Bio::DB*

scripts/bpindex.pl - indexes local databases, partners with bpfetch.pl

scripts/contributed/revcom_dir.pl - return reverse complement sequences of all sequences in the current directory and save them in the same directory, using the same names with extension changed from "seq" to "rev"

scripts/contributed/expression_analyis/* - a set of scripts for analysis of expression data: discriminative gene selection, leave-one-out cross-validation, relevance network of gene expression

scripts/das/das_server - sets up a minimal DAS annotation server, requires Apache::DBI and Bio::DB::GFF. See Bio::DB::GFF for details

scripts/DB/dbfetch - creates a Web page to query a local SRS server and fetch sequences

scripts/est_tissue_query.pl - fetch EST sequences from local files or Genbank filtered by tissue using Bio::DB* or Bio::Index*

scripts/DB/flanks.pl - fetch a sequence, find the sequences flanking a variant or SNP in the sequence given its position

scripts/gb_to_gff.pl - extracts top-level sequence features from Genbank-formatted sequence files using Tools::GFF. See Bio::Tools::GFF

scripts/generate_random_seq.pl - writes random RNA, DNA, or protein sequence of given length

scripts/get_seqs.pl - fetches and formats sequences from GenBank, EMBL, or SwissProt over the network using Bio::DB*

scripts/gff2ps.pl - takes an input file in GFF format and draws its genes and features as Postscript using Tools::GFF. See Bio::Tools::GFF

scripts/rfetch.pl - a script that uses Bio::DB::Registry to retrieve sequences from EMBL, reformat them, and print them. See Bio::DB::Registry

scripts/make_mrna_protein.pl - translate a cDNA or ORF to protein using Bio::Seq's translate() method

scripts/make_primers.pl - design PCR primers given a sequence and the positions of the start and stop codons in the sequence's ORF

scripts/prosite2perl.pl - convert Prosite motifs to Perl regular expressions

scripts/render_sequence.pl - this script fetches a sequence from a remote database, extracts its features (CDS, gene, tRNA), and creates a graphic representation of the sequence in PNG or GIF format. See Bio::DB::BioFetch and Bio::Graphics

scripts/seqstats/aacomp.pl - calculate amino acid composition of a protein using Tools::CodonTable and Tools::IUPAC. See Bio::Tools::IUPAC and Bio::Tools::CodonTable for more information

scripts/seqstats/chaos_plot.pl - produce a PNG or JPEG chaos plot given a DNA sequence using GD.pm

scripts/seqstats/gccalc.pl - calculate %GC given a DNA sequence using Tools::SeqStats. See Bio::Tools::SeqStats for more information

scripts/seqstats/oligo_count.pl - calculates oligomer frequencies given an oligomer length and a sequence

scripts/structure/nmrpdb_parse.pl - extracts individual conformers from an NMR-derived PDB file

scripts/subsequence.cgi - CGI script to fetch a sequence from Genbank and extract a subsequence using DB::GenBank. See Bio::DB::GenBank

scripts/tree/paup2phylip.pl - convert a PAUP tree block to Phylip format
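Several of the scripts above wrap fairly small ideas. As one illustration, the sketch below shows how a motif written in Prosite pattern syntax could be turned into a Perl regular expression. It is not the code of scripts/prosite2perl.pl. The subroutine name and the motif are made up, and only the common syntax elements are handled (x, [..], {..}, repeat counts, and the < and > anchors at the ends of a pattern).

```perl
use strict;
use warnings;

# Hypothetical helper: convert a Prosite-style pattern to a Perl regex string.
sub prosite_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/\.$//;                        # drop the terminating period
    $pattern =~ s/-//g;                         # remove element separators
    $pattern =~ s/\{([A-Z]+)\}/[^$1]/g;         # {P}   -> [^P]  (excluded residues)
    $pattern =~ s/\((\d+(?:,\d+)?)\)/{$1}/g;    # (2,4) -> {2,4} (repeat counts)
    $pattern =~ tr/x/./;                        # x     -> .     (any residue)
    $pattern =~ s/^</^/;                        # <     -> ^     (N-terminal anchor)
    $pattern =~ s/>$/\$/;                       # >     -> $     (C-terminal anchor)
    return $pattern;
}

# A made-up motif and a made-up peptide, for illustration only.
my $regex   = prosite_to_regex('C-x(2,4)-C-{P}-[LIVM]-x(3)-H.');
my $peptide = 'MKCAACGLAAAHT';

print "regex: $regex\n";
print "match at offset $-[0]\n" if $peptide =~ /$regex/;
```

The exclusion sets are converted before the repeat counts so that the braces introduced for repeats are not mistaken for Prosite exclusions.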

GETTING INVOLVED

Bioperl is a completely open community of developers. We are not funded and we don't have a mission statement. We encourage collaborative code, in particular in Perl. You can help us in many different ways, from a simple statement about how you have used Bioperl to do something interesting, to contributing a whole new object hierarchy. See http://bioperl.org for more information. Here are some ways of helping us:

Asking questions and telling us you used it

We are very interested to hear about your experience of using Bioperl. Did it install cleanly? Did you understand the documentation? Could you get the objects to do what you wanted them to do? If Bioperl was useless we want to know why, and if it was great - that too. Post a message to bioperl-l@bioperl.org, the Bioperl mailing list, where all the developers are.

Only by getting people's feedback do we know whether we are providing anything useful.

Writing a script that uses it

By writing a good script that uses Bioperl you both show that Bioperl is useful and probably save someone else from writing it. If you contribute it to the 'script central' at http://bioperl.org then other people can view and use it. Don't be nervous if you've never done this sort of work; advice is freely given and all are welcome!

Find bugs!

We know that there are bugs in there. If you find something which you are pretty sure is a problem, post a note to bioperl-bugs@bioperl.org and we will get on it as soon as possible. You can also access the bug system through the web pages.

Suggest new functionality

You can suggest areas where the objects are not ideally written and could be done better. The best way is to find the main developer of the module (each module was written principally by one person, except for Seq.pm). Talk to him or her and suggest changes.

Make your own objects

If you can make a useful object we will happily include it in the core. You will probably want to read a lot of the documentation in Bio::Root, talk to people on the Bioperl mailing list, bioperl-l@bioperl.org, and read biodesign.pod. biodesign.pod documents the conventions and ideas used in Bioperl; it's definitely worth a read if you would like to be a Bioperl developer.

ACKNOWLEDGEMENTS

Bioperl owes its early organizational support to its association with the award-winning VSNS-BCD BioComputing Courses; some students of the 1996 course (Chris Dagdigian, Richard Resnick, Lew Gramer, Alessandro Guffanti, and others) have contributed code and commentary. Georg Fuellen, the VSNS-BCD chief organizer, was one of the early driving forces behind Bioperl. Steven Brenner, who was an early adopter of Perl for bioinformatics, provided some of the early work on Bioperl. Lincoln Stein has long provided guidance and code.

Bioperl was then taken up by people developing code at the large genome centres, in particular Steve Chervitz at Stanford, Ian Korf at the Genome Sequencing Centre (St. Louis) and Ewan Birney at the Sanger Centre (Cambridge UK). All of the C code XS extensions were provided by Ewan Birney. Bioperl is used in anger at these sites, indicating both that it is useful and that it works.

Jason Stajich and Hilmar Lapp joined Bioperl for the drive towards a 0.7 release over 2000 and the first part of 2001, which includes a revised feature location model, richer feature objects (in particular genes) and more and better tools. Peter Schattner and Lorenz Pollak contributed serious chunks of code: the AlignIO and bptutorial scripts and the BPLite port to Bioperl, respectively. At this time Bioperl was being used in absolute earnest by the Ensembl group, which shook out a number of problems in the code base. Additional compatibility with the Sequence Workbench (Bioperl-gui, Mark Wilkinson and David Block) and Biocorba (Jason Stajich, Brad Chapman and Alan Robinson) and finally Game-XML (Brad Marshall) provided more interoperability.

Current server hardware for bioperl.org (and other open-bio.org hosted projects) was provided by Compaq Computer Corporation. The donation was facilitated by both the Pharmaceutical Sales and High Performance Technical Computing (HPTC) groups.

The Bioperl servers reside in Cambridge, Massachusetts USA with colocation facilities and Internet bandwidth donated by Genetics Institute. In particular Dr. Steven Howes, Kenny Grant & Rich DiNunno have made significant efforts on our behalf.

COPYRIGHT

 Copyright (c) 1996-2000 Georg Fuellen, Richard Resnick, Steven E. Brenner,
 Chris Dagdigian, Steve Chervitz, Ewan Birney, James Gilbert, Elia Stupka, 
 and others. All Rights Reserved. This module is free software; 
 you can redistribute it and/or modify it under the same terms as Perl itself.