NAME

bioperl backend - how to customise bioperl for your site

SYNOPSIS

Not really appropiate for a synopsis. Read on

DESCRIPTION

This document is designed to let you customise bioperl on your site. Bioperl can work with a number of database formats (at the moment, simple fasta flat file formats and EMBL/Swissprot .dat format), allowing users to retrieve sequences from these databases. In addition another layer, above flat file indexing is provided, allowing sites to retrieve sequences from GenBank via the web or via flat file indexing, or - if you have the time to do so, you can write your own interface to an in-house RDB. Using DBI this should be quite simple.

Two scripts are provided to get you started with the bioperl backend:

bpfetch: Fetches sequences from a Database
bpindex: Builds indexes for flat files databases which are easily accessible by bpfetch

The core of the backend system is found in following modules

Bio::DB::*

generic access to databases, whether flat file, web or rdb. At the moment, this provides random access retrieval, on the basis of ids or accession numbers, but does not provide the ability to loop over the entire database, nor does it provide any complex querying ability.

Bio::DB::BioSeqI is the abstract interface (hence the I) for the databases. Bio::DB::GenBank and Bio::DB::GenPept are concrete implementations for network access to the GenBank and GenPept databases held at NCBI, using http as a protocol.

Bio::Index::*

flat file indexing system, for read-only, flat file distributions. These provide for specific instances generic type access, but the underlying machinery can be customised for any number of different flat file systems.

The Index modules EMBL and Fasta, as they are designed as Sequence databases conform to the Bio::DB::BioSeqI interface, meaning they can be used whereever the Bio::DB::BioSeqI is expected.

Bio::SeqIO::*

conversion systems for Bio::Seq objects, either to or from sequence streams. The move of things into SeqIO prevents the Bio::Seq object bloating up with format code, and the SeqIO system has the benefit of being very easy to extend to new formats.

SETTING UP BIOPERL INDICES

If you want to use the bioperl indexing of fasta and embl/swissprot .dat files then the bpfetch and bpindex scripts are great ways to start off (and also reading the scripts shows you how to use the bioperl indexing stuff). bpfetch and bpindex coordinate by the use of two environment variables

  BIOPERL_INDEX - directory where the indices are kept

  BIOPERL_INDEX_TYPE - type of DBM file to use for the index

The basic way of indexing a database, once BIOPERL_INDEX has been set up, is to go

  bpindex <index-name> <filenames as full path>

eg, for Fasta files

  bpindex est /nfs/somewhere/fastafiles/est*.fa

Or, for embl/swissprot files

  bpindex -fmt=EMBL swiss /nfs/somewhere/swiss/swissprot.dat

To retrieve sequences from the index go

  bpfetch <index-name>:<id>

eg,

  bpfetch est:AA01234

  bpfetch swiss:VAV_HUMAN

bpfetch has other options to connect to genbank across the network.

CHECKLIST

   make a directory called /nfs/datadisk/bioperlindex/

   setenv BIOPERL_INDEX (or export in Bash) in the system login
   script to /nfs/datadisk/bioperlindex/

   go bpindex swissprot /nfs/datadisk/swiss/swissprot.dat
   etc

   You are ready to use bpfetch

To install Bio::Seq, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Bio::Seq

CPAN shell

perl -MCPAN -e shell
install Bio::Seq

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)

NAME

SYNOPSIS

DESCRIPTION

SETTING UP BIOPERL INDICES

CHECKLIST

Module Install Instructions