Sapling Database Access Methods
The CDMI database represents an instance of the KBase Central Data Model. This object has minimal capabilities: most of its power comes from the ERDB base class.
The fields in this object are as follows.
Name of the directory containing the key load files.
Reference to a hash of tuning parameters.
The database is governed by tuning parameters in an XML configuration file. The file should be named CdmiConfig.xml and reside in the load directory. The tuning parameters affect the way the data is loaded. They are specified as attributes of the TuningParameters element, as follows.
The maximum number of base pairs allowed in a single location. IsLocatedIn records are split into sections based on this length. When you are looking for all the features in a particular neighborhood, you can therefore look for locations within the maximum location distance from the neighborhood; even a huge operon that contains tens of thousands of base pairs will still be found.
The maximum number of base pairs allowed in a single DNA sequence. DNA sequences are broken into segments to prevent excessively large genomes from clogging memory during sequence resolution.
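The splitting described for both parameters can be pictured with a minimal sketch. The helper name split_into_sections and the sample numbers are hypothetical and are not part of the CDMI module; the sketch only shows how a forward-strand region might be cut into sections no longer than a given maximum.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CDMI): split the region that starts at
# $begin and covers $length base pairs into sections of at most $max_len
# base pairs each. Only the forward strand is considered here.
sub split_into_sections {
    my ($begin, $length, $max_len) = @_;
    die "max_len must be positive\n" if $max_len <= 0;
    my @sections;
    my $offset = 0;
    while ($offset < $length) {
        # Take the remaining length, capped at the maximum section size.
        my $len = $length - $offset;
        $len = $max_len if $len > $max_len;
        push @sections, [ $begin + $offset, $len ];
        $offset += $len;
    }
    return @sections;
}

# A 10,500 bp region with a 4,000 bp maximum yields three sections.
for my $section (split_into_sections(1, 10_500, 4_000)) {
    my ($start, $len) = @$section;
    print "start=$start length=$len\n";
}
```

Because no section exceeds the maximum, a search that widens the neighborhood by that maximum on each side cannot miss a section that overlaps it.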
Unlike a normal ERDB database, the CDMI is loaded in sections, usually one genome at a time, rather than in a massive full-database load. The standard load support is therefore not present.
Each tuning parameter must have a default value, in case it is not present in the XML configuration file. The defaults are specified in a constant hash reference called TUNING_DEFAULTS.
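As a rough illustration of how defaults and configured values might be combined, the sketch below overlays values read from the configuration file on a constant hash reference. The parameter names, the default values, and the helper effective_tuning are all assumptions made for the example; the source does not list them.

```perl
use strict;
use warnings;

# Assumed parameter names and default values, for illustration only.
use constant TUNING_DEFAULTS => {
    maxLocationLength => 4000,
    maxSequenceLength => 1_000_000,
};

# Hypothetical helper: overlay values from the configuration file (a hash
# reference, possibly undefined) on top of the defaults.
sub effective_tuning {
    my ($from_config) = @_;
    return { %{ TUNING_DEFAULTS() }, %{ $from_config || {} } };
}

my $tuning = effective_tuning({ maxLocationLength => 2000 });
for my $name (sort keys %$tuning) {
    print "$name=$tuning->{$name}\n";
}
```

Listing the defaults first in the merged hash means any value present in the configuration file wins, while a missing value falls back to its default.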
my $cdmi = CDMI->new(%options);
Construct a new CDMI object. The following options are supported.
Data directory to be used by the loaders. The default is /var/kbase/cdm.
XML database definition file. The default is taken from the CDMIDBD environment variable, or KSaplingDBD.xml in the load directory if the environment variable is not set.
Name of the database to use. The default is kbase_sapling.
Socket for accessing the database. The default is the system default.
Name and password used to log on to the database, separated by a slash. The default is a user name of seed and no password.
Database host name. The default is localhost.
MySQL port number to use (MySQL only). The default is 3306.
Database management system to use (e.g. postgres). The default is mysql.
Data::UUID object for generating annotation IDs. Will not exist unless it's needed.
If TRUE, then the development database will be used. The development database is located on a different server with a different DBD. This option overrides dbhost, externalDBD, dbname, and DBD.
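The defaults listed above can be summarized in a small sketch. It does not call the real constructor: resolve_options is a hypothetical helper, and apart from dbhost and dbname the option key names are placeholders, since the text does not spell them out. It also shows one way the "name/password" string might be split.

```perl
use strict;
use warnings;

# Default values as described in the option list. Only dbhost and dbname
# are key names taken from the text; the other keys are placeholders.
my %DEFAULTS = (
    loadDirectory => '/var/kbase/cdm',
    dbname        => 'kbase_sapling',
    userData      => 'seed/',
    dbhost        => 'localhost',
    port          => 3306,
    dbms          => 'mysql',
);

# Hypothetical helper: apply the defaults, then split the user data into
# a name and a (possibly empty) password at the first slash.
sub resolve_options {
    my (%options) = @_;
    my %resolved = (%DEFAULTS, %options);
    my ($user, $pass) = split m{/}, $resolved{userData}, 2;
    $resolved{user}     = $user;
    $resolved{password} = defined $pass ? $pass : '';
    return %resolved;
}

my %opts = resolve_options(dbhost => 'db.example.org', userData => 'reader/secret');
print "$opts{user}\@$opts{dbhost}:$opts{port}/$opts{dbname}\n";
```

Any option the caller omits keeps its default, so a call with no arguments describes a local MySQL database named kbase_sapling accessed as user seed with no password.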
my $cdmi = CDMI->new_for_script(%options);
Construct a new CDMI object for a command-line script. This method calls GetOptions from Getopt::Long to parse the command line, passing the incoming options to it. The following command-line options (all of which are optional) will also be processed by this method and used to construct the CDMI object.
If the command-line parse fails, an undefined value will be returned rather than a CDMI object.
Data directory to be used by the loaders.
XML database definition file.
Name of the database to use.
Socket for accessing the database.
Name and password used to log on to the database, separated by a slash.
Database host name.
MySQL port number to use (MySQL only).
Database management system to use (e.g. postgres, default mysql).
If specified, then the development database will be used. This database is located on a different server with a different DBD. The develop option overrides dbhost, dbname and DBD, and forces use of an external DBD.
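The parse-or-return-undef behavior can be sketched with Getopt::Long directly. This is not the module's implementation: parse_script_options is a hypothetical helper, the option spellings are illustrative, and GetOptionsFromArray is used instead of GetOptions so the example can be run against an explicit argument list rather than @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse an argument list and return a hash reference
# of option values, or undef if the parse fails. Option names are
# illustrative; %extra stands in for caller-supplied option specifications.
sub parse_script_options {
    my ($argv, %extra) = @_;
    my %opts = (dbhost => 'localhost', port => 3306, dbms => 'mysql');
    my $ok = GetOptionsFromArray(
        $argv,
        'dbhost=s' => \$opts{dbhost},
        'port=i'   => \$opts{port},
        'dbms=s'   => \$opts{dbms},
        'develop'  => \$opts{develop},
        %extra,
    );
    return $ok ? \%opts : undef;
}

my $parsed = parse_script_options([ '--dbhost=db.example.org', '--develop' ]);
print defined $parsed ? "host=$parsed->{dbhost}\n" : "parse failed\n";
```

A caller would check the result for definedness before using it, just as a script should check the value returned by new_for_script.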
my $taxID = $cdmi->ComputeTaxonID($scientificName);
Compute the best-match taxonomy ID for a genome with the specified scientific name. An attempt will be made to match to the strain and then the genus and species. If no match is found, an undefined value will be returned.
Scientific name of the genome whose taxonomy ID is desired.
Returns the ID of the best taxonomic grouping at which to attach the named genome, or undef if no such grouping can be found.
my @locs = $cdmi->GetLocations($fid);
Return the locations of the DNA for the specified feature.
ID of the feature whose location is desired.
Returns a list of BasicLocation objects for the locations containing the feature's DNA.
my @pegs = $cdmi->GenesInRegion($location);
Return a list of the IDs for the features that overlap the specified region on a contig.
Location of interest, either in the form of a location string (e.g. 360108.3:NZ_AANK01000002_264528_264007) or a BasicLocation object.
Returns a list of feature IDs. The features in the list will be all those that overlap or occur inside the location of interest.
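To make the overlap criterion concrete, here is a minimal sketch that parses a location string and tests two regions for overlap. It assumes the string has the form contig_begin_end, with begin greater than end indicating the minus strand, as the example above suggests; parse_location and overlaps are hypothetical helpers and do not use the BasicLocation class.

```perl
use strict;
use warnings;

# Hypothetical parser: assumes "contig_begin_end". The contig ID may itself
# contain underscores, so the two numbers are taken from the end.
sub parse_location {
    my ($string) = @_;
    my ($contig, $begin, $end) = $string =~ /^(.+)_(\d+)_(\d+)$/
        or return;
    my ($left, $right) = $begin <= $end ? ($begin, $end) : ($end, $begin);
    return {
        contig => $contig,
        left   => $left,
        right  => $right,
        dir    => $begin <= $end ? '+' : '-',
    };
}

# Two regions overlap if they are on the same contig and neither one ends
# before the other begins. Strand is ignored.
sub overlaps {
    my ($x, $y) = @_;
    return $x->{contig} eq $y->{contig}
        && $x->{left} <= $y->{right}
        && $y->{left} <= $x->{right};
}

my $region  = parse_location('360108.3:NZ_AANK01000002_264528_264007');
my $feature = parse_location('360108.3:NZ_AANK01000002_264000_264100');
print overlaps($region, $feature) ? "overlap\n" : "no overlap\n";
```

A feature lying entirely inside the region satisfies the same test, which matches the "overlap or occur inside" wording above.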
my $dna = $cdmi->ComputeDNA($contig, $beg, $dir, $length);
Return the DNA sequence for the specified location.
The ID of the contig containing the desired DNA.
Location of the first desired base pair.
+ for the plus strand and - for the minus strand.
Number of base pairs.
Returns a string containing the desired DNA. The DNA comes back in pure lower-case.
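The parameters can be illustrated with a sketch that works on an in-memory string instead of the database. It rests on two assumptions that the text does not confirm: positions are 1-based, and on the minus strand the first base pair is the rightmost one, with the result returned as the reverse complement. The helper extract_dna is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract $length base pairs from $contig_seq starting
# at 1-based position $beg. For the minus strand the region is assumed to
# extend leftward from $beg and is returned as the reverse complement.
sub extract_dna {
    my ($contig_seq, $beg, $dir, $length) = @_;
    my $left = $dir eq '+' ? $beg : $beg - $length + 1;
    my $dna  = lc substr($contig_seq, $left - 1, $length);
    if ($dir eq '-') {
        $dna = reverse $dna;
        $dna =~ tr/acgt/tgca/;
    }
    return $dna;
}

my $contig_seq = 'AACCGGTT';
print extract_dna($contig_seq, 2, '+', 3), "\n";   # acc
print extract_dna($contig_seq, 4, '-', 3), "\n";   # ggt
```

Both calls cover positions 2 through 4 of the sample sequence, so the second result is the reverse complement of the first.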
my @taxonomy = $cdmi->Taxonomy($genomeID, $format);
Return the full taxonomy of the specified genome, starting from the domain downward.
ID of the genome whose taxonomy is desired.
Format of the taxonomy. names will return primary names, numbers will return taxonomy numbers, and both will return taxonomy number followed by primary name. The default is names.
Returns a list of taxonomy names, starting from the domain and moving down to the node where the genome is attached.
my $annotationID = $cdmi->ComputeNewAnnotationID($fid, $timeStamp);
Return a valid annotation ID for the specified feature and time stamp. The ID is formed from the feature ID and a complemented version of the time stamp followed by a UUID. The complemented time stamp causes the annotations to be presented in reverse chronological order, and the feature ID causes annotations for the same feature to cluster together. This provides for efficient retrieval, though the keys are gigantic.
ID of the target feature for the annotation.
Time at which the annotation occurred.
Returns a unique ID to give to the annotation.
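The effect of the complemented time stamp on sort order can be seen in a small sketch. The ID layout, the colon separator, the ten-digit complement, and the helper sketch_annotation_id are assumptions for illustration; the real method's format is not given here, and a plain string stands in for the UUID. The feature ID is a made-up placeholder.

```perl
use strict;
use warnings;

# Assumed upper bound used to complement a ten-digit epoch time stamp.
use constant MAX_STAMP => 9_999_999_999;

# Hypothetical helper: join the feature ID, the zero-padded complemented
# time stamp, and a caller-supplied unique suffix.
sub sketch_annotation_id {
    my ($fid, $time_stamp, $unique) = @_;
    my $complement = sprintf '%010d', MAX_STAMP() - $time_stamp;
    return join ':', $fid, $complement, $unique;
}

my $fid = 'feature.1';   # placeholder feature ID
my @ids = (
    sketch_annotation_id($fid, 1_300_000_000, 'uuid-a'),
    sketch_annotation_id($fid, 1_400_000_000, 'uuid-b'),
);

# An ordinary ascending string sort lists the newer annotation first.
print "$_\n" for sort @ids;
```

Because a later time stamp produces a smaller complement, an index ordered by this key returns the most recent annotations for a feature first without a descending sort.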
my $parm = $cdmi->TuningParameter($parmName);
Return the value of the specified tuning parameter. Tuning parameters are read from the XML configuration file.
Name of the parameter whose value is desired.
Returns the parameter value.
my $xmlObject = $cdmi->ReadConfigFile();
Return the hash structure created from reading the configuration file, or an undefined value if the file is not found.
my $name = $cdmi->PreferredName();
Return the variable name to use for this database when generating code.
my $dirName = $cdmi->LoadDirectory();
Return the name of the directory in which load files are kept. The default is the FIG temporary directory, which is a really bad choice, but it's always there.
my $flag = $cdmi->UseInternalDBD();
Return TRUE if this database should be allowed to use an internal DBD. The internal DBD is stored in the _metadata table, which is created when the database is loaded. The Sapling uses an internal DBD.