GenBank HOWTO

This is a quick synopsis of the steps needed to initialize a GBrowse database from a GenBank record. As an illustration, we will use the RefSeq record for M. bovis, accession NC_002945.

Using the GBrowse in-memory database

1. Convert from GenBank format into GFF format

First download the GenBank record (here saved as a file named NC_002945). You can then create a GFF version of it with the bp_genbank2gff3.pl script, which is part of BioPerl:

   bp_genbank2gff3.pl NC_002945

This command will create a file called NC_002945.gff.
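If you do not have the GenBank file yet, one way to fetch it is NCBI's E-utilities EFetch service. The sketch below assumes curl is installed; fetch_genbank is a hypothetical helper name, and it saves the record under the bare accession name so that the conversion command shown above finds it.

```shell
# Hypothetical helper: download the complete GenBank flat file for one
# nucleotide accession from NCBI E-utilities. rettype=gbwithparts asks for
# the features together with the full sequence. The file is saved under the
# accession name (no extension).
fetch_genbank() {
  local acc="$1"
  # -f: fail on HTTP errors; -sS: no progress meter, but still report errors
  curl -sS -f -o "$acc" \
    "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${acc}&rettype=gbwithparts&retmode=text"
}
```

Usage: "fetch_genbank NC_002945 && bp_genbank2gff3.pl NC_002945".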

The converted file is in GFF3 format, which can carry the DNA sequence along with the feature data. This means that you do not need a separate FASTA file for the sequence.
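To confirm that a converted file really contains its sequence, you could check for the GFF3 version pragma on the first line and for the "##FASTA" directive that, in GFF3, introduces embedded sequence data. A minimal sketch; check_gff3 is a hypothetical helper name:

```shell
# Hypothetical helper: rough sanity check that a file is GFF3 with the DNA
# embedded. The first line of a GFF3 file is a "##gff-version 3" pragma, and
# embedded sequence follows a "##FASTA" line near the end of the file.
check_gff3() {
  local f="$1"
  head -n 1 "$f" | grep -q '^##gff-version 3' || { echo "$f: no GFF3 header" >&2; return 1; }
  grep -q '^##FASTA' "$f" || { echo "$f: no embedded sequence" >&2; return 1; }
  echo "$f: GFF3 with embedded sequence"
}
```

Usage: "check_gff3 NC_002945.gff".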

2. Install the GFF file into the databases directory

Copy this file into your in-memory GFF databases directory, as described in the tutorial. These examples assume that the directory is /usr/local/apache/htdocs/gbrowse/databases and give M. bovis its own subdirectory:

  mkdir /usr/local/apache/htdocs/gbrowse/databases/mbovis
  chmod o+rx /usr/local/apache/htdocs/gbrowse/databases/mbovis
  cp NC_002945.gff /usr/local/apache/htdocs/gbrowse/databases/mbovis

3. Set up the configuration file

Use the configuration file 08.genbank.conf as your starting template. This is located in contrib/conf_files:

  cp contrib/conf_files/08.genbank.conf /usr/local/apache/conf/gbrowse.conf/mb.conf

4. Edit the configuration file as appropriate

You will need to change the [GENERAL] section to use the in-memory adaptor and to point to the location of the M. bovis GFF file:

 [GENERAL]
 description   = Mycobacterium bovis In-Memory
 db_adaptor    = Bio::DB::GFF
 db_args       = -adaptor memory
                 -dir /usr/local/apache/htdocs/gbrowse/databases/mbovis

You might also want to change the "examples" option to list the accession number of the whole genome, along with a few choice gene names and search terms:

  examples = NC_002945 Mb1800 galT glucose

That is all there is to it. However, because this is a fairly large chunk of DNA (> 4 Mbp), the in-memory adaptor uses a considerable amount of memory, and performance will be sluggish unless you have a fast machine with plenty of RAM. You may therefore prefer to load the data into a MySQL, PostgreSQL or Oracle database. The following instructions describe how to do this.

Using a relational database

We will assume that you are using a MySQL database.

1. Create the database

Create the database using mysqladmin:

  mysqladmin create mbovis

As described in the tutorial, give your own MySQL account write permission on the database, and give the web server's account (e.g. "nobody") SELECT permission.
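In MySQL, that permission step might look like the following sketch. grant_sql is a hypothetical helper, the loading account name is a placeholder, and both accounts are assumed to connect from localhost. On MySQL 8.0 and later, GRANT no longer creates accounts implicitly, so run CREATE USER first if an account does not exist yet.

```shell
# Hypothetical helper: print the GRANT statements for the permission step.
# The loading account gets full access to the database; the web server
# account gets read-only (SELECT) access.
grant_sql() {
  local db="$1" loader="$2" webuser="$3"
  printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'localhost';\n" "$db" "$loader"
  printf "GRANT SELECT ON %s.* TO '%s'@'localhost';\n" "$db" "$webuser"
}
```

Usage, piping the statements to the MySQL client as an administrative user: "grant_sql mbovis "$USER" nobody | mysql -u root -p".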

2. Load the GFF3 into the database

You can load the GFF3 file into your MySQL database using the bp_bulk_load_gff.pl script from BioPerl:

 bp_bulk_load_gff.pl -d mbovis NC_002945.gff
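After loading, a quick check that the tables exist and contain features can save debugging time in the browser. This sketch assumes the Bio::DB::GFF MySQL schema, which stores features in a table named fdata, and that your MySQL account can read the database; check_load is a hypothetical helper name.

```shell
# Hypothetical helper: list the tables created by the loader, then print the
# number of feature rows. Assumes the Bio::DB::GFF schema's "fdata" table.
check_load() {
  local db="$1"
  mysqlshow "$db"
  # -N: no column names, -B: batch (tab-separated) output, -e: run one query
  mysql -N -B -e 'SELECT COUNT(*) FROM fdata' "$db"
}
```

Usage: "check_load mbovis". A count of zero, or an error about a missing table, suggests that the load did not succeed.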

3. Set up the configuration file

Use the configuration file 08.genbank.conf as your starting template. This is located in contrib/conf_files:

  cp contrib/conf_files/08.genbank.conf /usr/local/apache/conf/gbrowse.conf/mb.conf

4. Edit the configuration file as appropriate

You will need to change the [GENERAL] section to use the appropriate database adaptor:

 [GENERAL]
 description   = Mycobacterium bovis Database
 db_adaptor    = Bio::DB::GFF
 db_args       = -adaptor dbi::mysql
                 -dsn     dbi:mysql:database=mbovis;host=localhost
                 -user    nobody
                 -passwd  ""

You might also want to change the "examples" option to list the accession number of the whole genome, along with a few choice gene names and search terms:

  examples = NC_002945 Mb1800 galT glucose

That should be it!

NOTE

You can load as many accessions into the database as you like. Each one will appear as a "chromosome" named after the accession number of the entry.
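For the MySQL route, the following sketch converts several downloaded GenBank files and hands all of the resulting GFF files to bp_bulk_load_gff.pl in a single run. The helper name load_accessions is hypothetical, and the GenBank files are assumed to be named after their accessions, as in step 1 of the in-memory instructions. A single run is used because bp_bulk_load_gff.pl is not an incremental loader. For the in-memory setup, copying each .gff file into the same databases directory should have the same effect.

```shell
# Hypothetical helper: convert each GenBank file named on the command line
# (files named after their accessions, e.g. NC_002945) to GFF3, then load
# every resulting .gff file into the MySQL database in one bulk-load run.
# Requires bash (uses an array).
load_accessions() {
  local db="$1"; shift
  local gffs=()
  local acc
  for acc in "$@"; do
    bp_genbank2gff3.pl "$acc" || return 1
    gffs+=("$acc.gff")
  done
  bp_bulk_load_gff.pl -d "$db" "${gffs[@]}"
}
```

Usage, loading M. bovis together with M. tuberculosis H37Rv (NC_000962): "load_accessions mbovis NC_002945 NC_000962".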