CDMI Package

Sapling Database Access Methods

Introduction

The CDMI database represents an instance of the KBase Central Data Model. This object has minimal capabilities: most of its power comes from the ERDB base class.

The fields in this object are as follows.

loadDirectory: Name of the directory containing the key load files.
tuning: Reference to a hash of tuning parameters.

Configuration and Construction

The database is governed by tuning parameters in an XML configuration file. The file name should be CdmiConfig.xml in the load directory. The tuning parameters affect the way the data is loaded. They are specified as attributes of the TuningParameters element, as follows.

maxLocationLength: The maximum number of base pairs allowed in a single location. IsLocatedIn records are split into sections based on this length. To find all the features in a particular neighborhood, you can therefore look for locations within the maximum location length of that neighborhood. Even a huge operon that contains tens of thousands of base pairs will still be found.
maxSequenceLength: The maximum number of base pairs allowed in a single DNA sequence. DNA sequences are broken into segments to prevent excessively large genomes from clogging memory during sequence resolution.
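
The splitting described for maxLocationLength can be sketched as follows. This is a minimal illustration, not the loader's actual code: split_location is a hypothetical helper, and the 4,000 bp limit is an assumed value chosen only for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the CDMI API): split a region that starts
# at $begin and spans $length base pairs into pieces of at most $maxLen.
sub split_location {
    my ($begin, $length, $maxLen) = @_;
    die "maxLen must be positive\n" if $maxLen < 1;
    my @segments;
    while ($length > 0) {
        my $segLen = $length < $maxLen ? $length : $maxLen;
        push @segments, [$begin, $segLen];
        $begin  += $segLen;
        $length -= $segLen;
    }
    return @segments;
}

# A 10,000 bp region with an assumed limit of 4,000 bp per segment.
my @segments = split_location(1, 10_000, 4_000);
print join(", ", map { "$_->[0]+$_->[1]" } @segments), "\n";
# prints "1+4000, 4001+4000, 8001+2000"
```

Because no segment is longer than the limit, any segment that covers a given position must begin within that limit of the position, which is what keeps a neighborhood search bounded.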

Loading

Unlike a normal ERDB database, the CDMI is loaded in sections, usually one genome at a time, rather than in a massive full-database load. The standard load support is therefore not present.

Tuning Parameter Defaults

Each tuning parameter must have a default value, in case it is not present in the XML configuration file. The defaults are specified in a constant hash reference called TUNING_DEFAULTS.
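
One way such defaults could be combined with values read from the configuration file is sketched below. The numeric values are placeholders (the real defaults are not stated here), and merge_tuning is a hypothetical helper rather than a CDMI method.

```perl
use strict;
use warnings;

# Placeholder values for illustration only; the real defaults may differ.
use constant TUNING_DEFAULTS => {
    maxLocationLength => 4000,
    maxSequenceLength => 1000000,
};

# Hypothetical helper: overlay values read from the configuration file
# (a hash reference, possibly undef) onto the defaults.
sub merge_tuning {
    my ($fromConfig) = @_;
    my %tuning = (%{ TUNING_DEFAULTS() }, %{ $fromConfig || {} });
    return \%tuning;
}

# Only maxLocationLength is overridden; maxSequenceLength keeps its default.
my $tuning = merge_tuning({ maxLocationLength => 2500 });
print "$_ = $tuning->{$_}\n" for sort keys %$tuning;
```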

new

my $cdmi = CDMI->new(%options);

Construct a new CDMI object. The following options are supported.

loadDirectory: Data directory to be used by the loaders. The default is /var/kbase/cdm.
DBD: XML database definition file. The default is taken from the CDMIDBD environment variable, or KSaplingDBD.xml in the load directory if the environment variable is not set.
dbName: Name of the database to use. The default is kbase_sapling.
sock: Socket for accessing the database. The default is the system default.
userData: Name and password used to log on to the database, separated by a slash. The default is a user name of seed and no password.
dbhost: Database host name. The default is localhost.
port: MySQL port number to use (MySQL only). The default is 3306.
dbms: Database management system to use (e.g. postgres). The default is mysql.
uuid: Data::UUID object for generating annotation IDs. Will not exist unless it's needed.
develop: If TRUE, then the development database will be used. The development database is located on a different server with a different DBD. This option overrides dbhost, externalDBD, dbname, and DBD.
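
The defaults listed above, and the slash-separated userData format, can be illustrated with a small stand-alone sketch. resolve_options is a hypothetical helper written for this example; it is not the CDMI constructor and does not connect to a database.

```perl
use strict;
use warnings;

# Defaults as documented in the option list above.
my %DEFAULTS = (
    loadDirectory => '/var/kbase/cdm',
    dbName        => 'kbase_sapling',
    userData      => 'seed/',
    dbhost        => 'localhost',
    port          => 3306,
    dbms          => 'mysql',
);

# Hypothetical helper: fill in defaults, then split userData into a user
# name and a password at the first slash.
sub resolve_options {
    my (%options) = @_;
    my %resolved = (%DEFAULTS, %options);
    my ($user, $pass) = split m{/}, $resolved{userData}, 2;
    $resolved{user}     = $user;
    $resolved{password} = defined $pass ? $pass : '';
    return \%resolved;
}

my $opts = resolve_options(dbhost => 'db.example.org', userData => 'reader/secret');
print "user=$opts->{user} host=$opts->{dbhost} port=$opts->{port}\n";
```

The host name and credentials in the usage lines are made up for the example.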

new_for_script

my $cdmi = CDMI->new_for_script(%options);

Construct a new CDMI object for a command-line script. This method calls "GetOptions" in Getopt::Long to parse the command-line options, passing the incoming options along as option specifications. If the command-line parse fails, an undefined value will be returned rather than a CDMI object.

The following command-line options (all of which are optional) will also be processed by this method and used to construct the CDMI object.

loadDirectory: Data directory to be used by the loaders.
DBD: XML database definition file.
dbName: Name of the database to use.
sock: Socket for accessing the database.
userData: Name and password used to log on to the database, separated by a slash.
dbhost: Database host name.
port: MySQL port number to use (MySQL only).
dbms: Database management system to use (e.g. postgres, default mysql).
develop: If specified, then the development database will be used. This database is located on a different server with a different DBD. The develop option overrides dbhost, dbname and DBD, and forces use of an external DBD.
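
The parsing step can be approximated with the core Getopt::Long module, as in the sketch below. parse_cdmi_options is a hypothetical stand-in: the option types (string, integer, flag) are assumptions, and the real method also accepts caller-supplied options and constructs the CDMI object.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical stand-in for the option-parsing step only.
sub parse_cdmi_options {
    my (@args) = @_;
    my %options;
    my $ok = GetOptionsFromArray(
        \@args, \%options,
        'loadDirectory=s', 'DBD=s', 'dbName=s', 'sock=s', 'userData=s',
        'dbhost=s', 'port=i', 'dbms=s', 'develop',
    );
    # Mirror the documented behavior: undef when the parse fails.
    return $ok ? \%options : undef;
}

my $options = parse_cdmi_options('--dbName=test_db', '--port=3307', '--develop');
if (defined $options) {
    print join(", ", map { "$_=$options->{$_}" } sort keys %$options), "\n";
}
```

A caller would normally check the result for undef before using it, since a failed parse yields no object.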

Public Methods

ComputeTaxonID

my $taxID = $cdmi->ComputeTaxonID($scientificName);

Compute the best-match taxonomy ID for a genome with the specified scientific name. An attempt will be made to match to the strain and then the genus and species. If no match is found, an undefined value will be returned.

scientificName: Scientific name of the genome whose taxonomy ID is desired.
RETURN: Returns the ID of the best taxonomic grouping at which to attach the named genome, or undef if no such grouping can be found.
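
The fallback order described above (full name first, then genus and species) can be sketched with an in-memory table. The lookup table and compute_taxon_id are hypothetical; the real method consults the database and may match names differently.

```perl
use strict;
use warnings;

# Hypothetical name-to-taxon table standing in for the database.
my %TAXON_BY_NAME = (
    'Escherichia coli K-12' => 83333,
    'Escherichia coli'      => 562,
);

# Sketch only: try the full name, then fall back to genus and species.
sub compute_taxon_id {
    my ($scientificName) = @_;
    return $TAXON_BY_NAME{$scientificName}
        if exists $TAXON_BY_NAME{$scientificName};
    my ($genus, $species) = split ' ', $scientificName;
    return undef unless defined $species;
    return $TAXON_BY_NAME{"$genus $species"};
}

for my $name ('Escherichia coli K-12', 'Escherichia coli O157:H7', 'Bacillus subtilis') {
    my $id = compute_taxon_id($name);
    print "$name => ", (defined $id ? $id : 'undef'), "\n";
}
```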

GetLocations

my @locs = $cdmi->GetLocations($fid);

Return the locations of the DNA for the specified feature.

fid: ID of the feature whose location is desired.
RETURN: Returns a list of BasicLocation objects for the locations containing the feature's DNA.

GenesInRegion

my @pegs = $cdmi->GenesInRegion($location);

Return a list of the IDs for the features that overlap the specified region on a contig.

location: Location of interest, either in the form of a location string (e.g. 360108.3:NZ_AANK01000002_264528_264007) or a BasicLocation object.
RETURN: Returns a list of feature IDs. The features in the list will be all those that overlap or occur inside the location of interest.
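
To make the location string format concrete, the sketch below parses the example string and tests two regions for overlap. It assumes the form contig_begin_end, with begin greater than end indicating the minus strand; parse_location and overlaps are hypothetical helpers, not the BasicLocation implementation.

```perl
use strict;
use warnings;

# Assumed format: <contig>_<begin>_<end>; begin > end means the minus strand.
sub parse_location {
    my ($string) = @_;
    my ($contig, $beg, $end) = $string =~ /^(.+)_(\d+)_(\d+)$/
        or return;
    my ($left, $right) = $beg <= $end ? ($beg, $end) : ($end, $beg);
    return {
        contig => $contig,
        dir    => ($beg <= $end ? '+' : '-'),
        left   => $left,
        right  => $right,
        length => $right - $left + 1,
    };
}

# Two regions overlap if they share a contig and their intervals intersect.
sub overlaps {
    my ($loc1, $loc2) = @_;
    return $loc1->{contig} eq $loc2->{contig}
        && $loc1->{left} <= $loc2->{right}
        && $loc2->{left} <= $loc1->{right};
}

my $loc = parse_location('360108.3:NZ_AANK01000002_264528_264007');
print "$loc->{contig} $loc->{dir} $loc->{left}..$loc->{right} ($loc->{length} bp)\n";
```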

ComputeDNA

my $dna = $cdmi->ComputeDNA($contig, $beg, $dir, $length);

Return the DNA sequence for the specified location.

contig: The ID of the contig containing the desired DNA.
beg: Location of the first desired base pair.
dir: + for the plus strand and - for the minus strand.
length: Number of base pairs.
RETURN: Returns a string containing the desired DNA. The DNA comes back in pure lower-case.
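
The meaning of the parameters can be shown against an in-memory sequence. This sketch assumes 1-based positions and that, on the minus strand, beg is the rightmost base and the result is the reverse complement; compute_dna is a hypothetical helper that takes the sequence itself rather than a contig ID.

```perl
use strict;
use warnings;

# Sketch under the assumptions above; not the database-backed method.
sub compute_dna {
    my ($sequence, $beg, $dir, $length) = @_;
    # On the minus strand the region extends leftward from $beg.
    my $left = $dir eq '+' ? $beg : $beg - $length + 1;
    my $dna = lc substr($sequence, $left - 1, $length);
    if ($dir eq '-') {
        $dna = reverse $dna;
        $dna =~ tr/acgt/tgca/;
    }
    return $dna;
}

my $contigDna = 'ACGTTGCAAT';
print compute_dna($contigDna, 2, '+', 3), "\n";   # prints "cgt"
print compute_dna($contigDna, 5, '-', 3), "\n";   # prints "aac"
```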

Taxonomy

my @taxonomy = $cdmi->Taxonomy($genomeID, $format);

Return the full taxonomy of the specified genome, starting from the domain downward.

genomeID: ID of the genome whose taxonomy is desired.
format (optional): Format of the taxonomy. names will return primary names, numbers will return taxonomy numbers, and both will return taxonomy number followed by primary name. The default is names.
RETURN: Returns a list of taxonomy entries in the requested format, starting from the domain and moving down to the node where the genome is attached.
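
The three format values can be illustrated with a fixed lineage. The lineage data and format_taxonomy are hypothetical, and the single space used to join number and name in the both format is an assumption.

```perl
use strict;
use warnings;

# Illustrative lineage, ordered from the domain downward, as [number, name].
my @LINEAGE = (
    [2,    'Bacteria'],
    [1224, 'Proteobacteria'],
    [562,  'Escherichia coli'],
);

# Sketch only: render a lineage as names (the default), numbers, or both.
sub format_taxonomy {
    my ($lineage, $format) = @_;
    $format = 'names' unless defined $format;
    return map {
          $format eq 'numbers' ? $_->[0]
        : $format eq 'both'    ? "$_->[0] $_->[1]"
        :                        $_->[1]
    } @$lineage;
}

print join('; ', format_taxonomy(\@LINEAGE)), "\n";
print join('; ', format_taxonomy(\@LINEAGE, 'numbers')), "\n";
print join('; ', format_taxonomy(\@LINEAGE, 'both')), "\n";
```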

ComputeNewAnnotationID

my $annotationID = $cdmi->ComputeNewAnnotationID($fid, $timeStamp);

Return a valid annotation ID for the specified feature and time stamp. The ID is formed from the feature ID and a complemented version of the time stamp, followed by a UUID. The complemented time stamp causes the annotations to sort in reverse chronological order, and the feature ID causes annotations for the same feature to cluster together. This provides for efficient retrieval, though the keys are gigantic.

fid: ID of the target feature for the annotation.
timeStamp: Time at which the annotation occurred.
RETURN: Returns a unique ID to give to the annotation.
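
The effect of the complemented time stamp on sort order can be seen in a small sketch. The colon separator, the ten-digit complement against 9999999999, and the feature ID are assumptions made for illustration; the suffix argument stands in for the UUID so that the example needs no non-core modules.

```perl
use strict;
use warnings;

# Sketch only: build an ID of the form <fid>:<complemented time>:<suffix>.
sub compute_annotation_id {
    my ($fid, $timeStamp, $suffix) = @_;
    my $complement = sprintf '%010d', 9_999_999_999 - $timeStamp;
    return join ':', $fid, $complement, $suffix;
}

my $fid   = 'kb|g.0.peg.1';    # made-up feature ID
my $older = compute_annotation_id($fid, 1_300_000_000, 'a');
my $newer = compute_annotation_id($fid, 1_400_000_000, 'b');

# A plain string sort lists the newer annotation first.
print "$_\n" for sort ($older, $newer);
```

Zero-padding the complement to a fixed width is what lets a plain string comparison agree with the numeric order.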

TuningParameter

my $parm = $cdmi->TuningParameter($parmName);

Return the value of the specified tuning parameter. Tuning parameters are read from the XML configuration file.

parmName: Name of the parameter whose value is desired.
RETURN: Returns the parameter value.

ReadConfigFile

my $xmlObject = $cdmi->ReadConfigFile();

Return the hash structure created from reading the configuration file, or an undefined value if the file is not found.

Virtual Methods

PreferredName

my $name = $cdmi->PreferredName();

Return the variable name to use for this database when generating code.

LoadDirectory

my $dirName = $cdmi->LoadDirectory();

Return the name of the directory in which load files are kept. The default is the FIG temporary directory, which is a really bad choice, but it's always there.

UseInternalDBD

my $flag = $cdmi->UseInternalDBD();

Return TRUE if this database should be allowed to use an internal DBD. The internal DBD is stored in the _metadata table, which is created when the database is loaded. The Sapling uses an internal DBD.
