Ewan Birney

NAME

Bio::DB::GFF::Adaptor::dbi::mysqlopt -- Optimized Bio::DB::GFF adaptor for mysql

SYNOPSIS

See Bio::DB::GFF

DESCRIPTION

This adaptor is similar to Bio::DB::GFF::Adaptor::mysqlopt, except that it implements several optimizations:

1. Binning

It uses a hierarchical binning scheme to dramatically accelerate feature queries that use positional information.

2. DNA fetching

Because mysql is slow when fetching substrings out of large text BLOBs, this adaptor uses Bio::DB::Fasta to fetch DNA segments rapidly out of FASTA files.

3. An ACEDB interface

Features can be linked to ACEDB objects, allowing this module to be used as a replacement for the Ace::Sequence module.

The schema is identical to Bio::DB::GFF::Adaptor::dbi, except for the fdata table:

    fid            feature ID (integer)
    fref           reference sequence name (string)
    fstart         start position relative to reference (integer)
    fstop          stop position relative to reference (integer)
    fbin           bin containing this feature (float)
    ftypeid        feature type ID (integer)
    fscore         feature score (float); may be null
    fstrand        strand; one of "+" or "-"; may be null
    fphase         phase; one of 0, 1 or 2; may be null
    gid            group ID (integer)
    ftarget_start  for similarity features, the target start position (integer)
    ftarget_stop   for similarity features, the target stop position (integer)

The only difference is the "fbin" field, which indicates the bin (interval) in which the feature is contained. This module uses a hierarchical set of bins, the smallest of which is 1 kb and the largest 100 megabases.
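As an illustration of how a feature might be assigned to a bin, here is a minimal sketch. It assumes that bin widths grow by a factor of ten per tier and that a bin is labelled "tier.index" so that it fits the float fbin column. Both the function name bin_for() and that encoding are assumptions made for this example, not a statement of the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper: return a bin label for a feature spanning
# [$start, $stop]. Starting at the smallest bin width, widen the bin
# tenfold until start and stop fall into the same bin, or until the
# largest width ($maxbin) is reached.
sub bin_for {
    my ($start, $stop, $minbin, $maxbin) = @_;
    my $tier = $minbin;
    while ($tier < $maxbin) {
        last if int($start / $tier) == int($stop / $tier);
        $tier *= 10;
    }
    # Assumed encoding: bin width before the point, bin index after it.
    return sprintf('%d.%06d', $tier, int($start / $tier));
}

# A short feature fits in a 1 kb bin; one that straddles a 1 kb
# boundary is promoted to the 10 kb tier.
print bin_for(1_500, 1_800, 1_000, 100_000_000), "\n";   # 1000.000001
print bin_for(  900, 1_100, 1_000, 100_000_000), "\n";   # 10000.000000
```

Small features end up in small bins and large features in large ones, so a positional query only has to inspect a handful of bins per tier.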

In the call to initialize() you can set the following options:

  -minbin        minimum value to use for binning

  -maxbin        maximum value to use for binning

  -straight_join_limit
                 size of range over which it is faster to force mysql to use the range for indexing

-minbin and -maxbin indicate the minimum and maximum sizes of features, and are important for range query optimization. They are set at reasonable values -- in particular, the maximum bin size is set to 100 megabases. Do not change them unless you know what you are doing.
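To see why these two values matter for range queries, consider the following sketch. It assumes tenfold tiers between the minimum and maximum bin sizes, and lists, for each tier, the span of bin indexes that a query range overlaps. The function bin_ranges() is hypothetical and exists only for this example; the SQL the adaptor actually generates may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: for every tier from $minbin up to $maxbin,
# return [tier, first_bin, last_bin], the bins a query on
# [$start, $stop] would have to inspect at that tier.
sub bin_ranges {
    my ($start, $stop, $minbin, $maxbin) = @_;
    my @ranges;
    for (my $tier = $minbin; $tier <= $maxbin; $tier *= 10) {
        push @ranges, [ $tier, int($start / $tier), int($stop / $tier) ];
    }
    return @ranges;
}

# Each tier contributes one clause to the search, so the spread
# between the minimum and maximum sizes sets the number of clauses.
for my $r (bin_ranges(1_500, 25_000, 1_000, 100_000)) {
    printf "tier %d: bins %d..%d\n", @$r;
}
```

Under these assumptions, a minimum that is too small or a maximum that is too large adds tiers that hold few features but still have to be searched.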

new

 Title   : new
 Usage   : $db = Bio::DB::GFF->new(@args)
 Function: create a new adaptor
 Returns : a Bio::DB::GFF object
 Args    : see below
 Status  : Public

The new constructor is identical to the "dbi" adaptor's new() method, except that the prefix "dbi:mysql" is added to the database DSN identifier automatically if it is not there already.

  Argument       Description
  --------       -----------

  -dsn           the DBI data source, e.g. "dbi:mysql:ens0040" or "ens0040"

  -fasta         path to a directory containing FASTA files for this database
                    (e.g. "/usr/local/share/fasta")

  -acedb         an acedb URL to use when converting features into ACEDB
                    objects (e.g. sace://localhost:2005)

  -user          username for authentication

  -pass          the password for authentication

  -minbin        minimum value to use for binning

  -maxbin        maximum value to use for binning

The path indicated by -fasta must be writable by the current process. This is needed in order to build an index of the fasta files.

-minbin and -maxbin indicate the minimum and maximum sizes of features, and are important for range query optimization. They are set at reasonable values -- in particular, the maximum bin size is set to 100 megabases. Do not change them unless you know what you are doing.
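A constructor call using these arguments might look like the sketch below. The helper normalize_dsn() is hypothetical and only mirrors the DSN prefix rule described above. The user name, password and the GFF_DEMO_CONNECT environment variable are placeholders invented for this example. The -adaptor argument is the usual Bio::DB::GFF way of selecting an adaptor.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented rule: prepend
# "dbi:mysql:" unless the DSN already starts with a "dbi:" prefix.
sub normalize_dsn {
    my ($dsn) = @_;
    return $dsn =~ /^dbi:/i ? $dsn : "dbi:mysql:$dsn";
}

# Placeholder values; substitute your own database and credentials.
my @args = (
    -adaptor => 'dbi::mysqlopt',
    -dsn     => normalize_dsn('ens0040'),
    -fasta   => '/usr/local/share/fasta',   # must be writable (index files)
    -user    => 'me',
    -pass    => 'secret',
);

# Connecting needs BioPerl and a reachable MySQL server, so the call
# is attempted only when the (hypothetical) GFF_DEMO_CONNECT variable
# is set in the environment.
if ($ENV{GFF_DEMO_CONNECT}) {
    require Bio::DB::GFF;
    my $db = Bio::DB::GFF->new(@args);
    print "connected to ", normalize_dsn('ens0040'), "\n" if $db;
}
```

Passing either "ens0040" or "dbi:mysql:ens0040" as -dsn should lead to the same connection, since the short form is expanded before use.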

freshen_ace

 Title   : freshen_ace
 Usage   : $flag = $db->freshen_ace;
 Function: Refresh internal acedb handle
 Returns : flag if correctly freshened
 Args    : none
 Status  : Public

ACeDB has an annoying way of timing out, leaving dangling database handles. This method will invoke the ACeDB reopen() method, which causes dangling handles to be refreshed. It has no effect if you are not using ACeDB to create ACeDB objects.