12 Sep 2011 15:47:19 UTC
- Distribution: Bio-BLAST
- Module version: 0.4
- Dependencies
- Bio::PrimarySeq
- Bio::PrimarySeqI
- Bio::Seq::LargePrimarySeq
- Carp
- Class::Accessor::Fast
- File::Basename
- File::Copy
- File::Path
- File::Slurp
- File::Spec::Functions
- IO::Pipe
- IPC::Cmd
- IPC::System::Simple
- List::MoreUtils
- List::Util
- Memoize
- POSIX
- namespace::clean
NAME
Bio::BLAST::Database - work with formatted BLAST databases
SYNOPSIS
    use Bio::BLAST::Database;

    # open an existing bdb for reading
    my $fs = Bio::BLAST::Database->open(
        full_file_basename => '/path/to/my_bdb',
    );
    # will read from /path/to/my_bdb.nin, /path/to/my_bdb.nsq, etc
    my @filenames = $fs->list_files;

    # reopen it for writing
    $fs = Bio::BLAST::Database->open(
        full_file_basename => '/path/to/my_bdb',
        write => 1,
    );

    # replace it with a different set of sequences
    $fs->format_from_file( seqfile => 'myseqs.seq' );

    # can also get some metadata about it
    print "db's title is ".$fs->title;
    print "db was last formatted on ".localtime( $fs->format_time );
    print "db file modification was ".localtime( $fs->file_modtime );
DESCRIPTION
Each object of this class represents an NCBI-formatted sequence database on disk, which is a set of files, the exact structure of which varies a bit with the type and size of the sequence set.
This is mostly an object-oriented wrapper for using NCBI's fastacmd and formatdb tools.
ATTRIBUTES
full_file_basename
Full path to the blast database file basename. This is the entire path to the BLAST database files, except for the final suffixes (.nin, .nsq, etc).

    my $basename = $db->full_file_basename;
    # returns '/data/shared/blast/databases/genbank/nr'
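To illustrate what "everything except the final suffix" means, here is a minimal sketch that derives a basename from the path of a single database file. The helper name ffbn_from_file is hypothetical and is not part of Bio::BLAST::Database. It assumes only the core suffixes (n or p followed by in, sq, or hr) and an optional two-digit volume number, so a database whose own name ends in two digits would be misread.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::BLAST::Database: derive a full
# file basename from the path of one database file.
sub ffbn_from_file {
    my ($path) = @_;
    my $ffbn = $path;

    # Strip an optional two-digit volume number (e.g. ".00") followed by
    # one of the core suffixes: n or p, then in, sq, or hr.
    return unless $ffbn =~ s/(?:\.\d{2})?\.[np](?:in|sq|hr)\z//;
    return $ffbn;
}

# Both calls yield the same basename.
print ffbn_from_file('/data/shared/blast/databases/genbank/nr.pin'),    "\n";
print ffbn_from_file('/data/shared/blast/databases/genbank/nr.00.pin'), "\n";
```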
create_dirs
true/false flag for whether to create any necessary dirs at format time
write
true/false flag for whether to overwrite any files that are in the way when formatting
title
title of this blast database, if set
indexed_seqs
return whether this blast database is indexed
type
accessor for the type of this blast database ('nucleotide' or 'protein'). Must be set in new(), but open() looks at the existing files and sets it for you
METHODS
open
  Usage: my $fs = Bio::BLAST::Database->open({
             full_file_basename => $ffbn,
             write              => 1,
             create_dirs        => 1,
         });
  Desc : open a BlastDB with the given ffbn.
  Args : hashref of params as:
         {
           full_file_basename => full path plus basename of files in
                                 this blastdb,
           type        => 'nucleotide' or 'protein',
           write       => default false, set true to write any files
                          in the way,
           create_dirs => default false, set true to create any
                          necessary directories if formatted
         }
  Ret  : Bio::BLAST::Database object
  Side Effects: none if no files are present at the given ffbn.
                otherwise, dies if files are present and write is not
                specified, or if dir does not exist and create_dirs
                was not specified
to_fasta
  Usage: my $fasta_fh = $bdb->to_fasta;
  Desc : get the contents of this blast database in FASTA format
  Ret  : an IO::Pipe filehandle
  Args : none
  Side Effects: runs 'fastacmd' in a forked process, cleaning up its
                output, and passing it to you
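Since to_fasta hands back a filehandle rather than parsed records, the caller reads it line by line. The sketch below shows one way to do that. The read_fasta helper is hypothetical, and an in-memory filehandle stands in for the IO::Pipe handle so the example runs without a formatted database or fastacmd installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::BLAST::Database: collect FASTA
# records from any readable filehandle.
sub read_fasta {
    my ($fh) = @_;
    my ( @records, $current );
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        if ( $line =~ /^>(\S+)\s*(.*)/ ) {
            # A definition line starts a new record.
            $current = { id => $1, desc => $2, seq => '' };
            push @records, $current;
        }
        elsif ($current) {
            # Sequence lines are appended to the current record.
            $current->{seq} .= $line;
        }
    }
    return @records;
}

# Stand-in for: my $fh = $bdb->to_fasta;
my $text = ">seq1 first sequence\nACGT\nACGT\n>seq2\nTTTT\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

for my $rec ( read_fasta($fh) ) {
    printf "%s\t%d residues\n", $rec->{id}, length $rec->{seq};
}
```

For large databases it may be preferable to process each record as it is completed instead of collecting them all in memory.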
format_from_file
  Usage: $db->format_from_file(seqfile => 'mysequences.seq');
  Desc : format this blast database from the given source file,
         into its proper place on disk, overwriting the files
         already present
  Ret  : nothing meaningful
  Args : hash-style list as:
           seqfile      => filename containing sequences,
           title        => (optional) title for this blast database,
           indexed_seqs => (optional) if true, formats the database
                           with indexing (and sets indexed_seqs in
                           this obj)
  Side Effects: runs 'formatdb' to format the given sequences,
                dies on failure
file_modtime
  Desc : get the earliest unix modification time of the database files
  Args : none
  Ret  : unix modification time of the database files
format_time
  Usage: my $time = $db->format_time;
  Desc : get the format time of these db files
  Ret  : the value time() would have returned when this database
         was last formatted, or undef if that could not be
         determined (like if the files aren't there)
  Args : none
  Side Effects: runs 'fastacmd' to extract the formatting time from
                the database files

  NOTE: This function assumes that the computer that last formatted
        this database had the same time zone set as the computer we
        are running on. Also, the time returned by this function is
        rounded down to the minute, because fastacmd does not print
        the format time in seconds.
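Because the format time is rounded down to the minute, comparing it directly against a file modification time in seconds can report a freshly formatted database as out of date. One possible workaround, sketched here with hypothetical helpers (floor_to_minute and looks_stale are not part of the module), is to round the other timestamp down in the same way before comparing. The trade-off is that a source file changed within the same minute as the last format will not be flagged.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Bio::BLAST::Database.

# Round an epoch time down to the start of its minute.
sub floor_to_minute {
    my ($t) = @_;
    return $t - ( $t % 60 );
}

# True if the source file appears newer than the last format.
# $format_time is what $db->format_time returned (may be undef);
# $source_mtime is the source file's modification time in epoch seconds.
sub looks_stale {
    my ( $format_time, $source_mtime ) = @_;
    return 1 unless defined $format_time;    # never formatted, or unknown
    return floor_to_minute($source_mtime) > $format_time ? 1 : 0;
}

# Source modified 10 s into a minute, database formatted later in that
# same minute: format_time reads as the start of the minute.
my $format_time  = 1_315_842_420;            # a multiple of 60
my $source_mtime = $format_time + 10;
print looks_stale( $format_time, $source_mtime ) ? "stale\n" : "up to date\n";
```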
check_format_permissions
  Usage: my $err = $bdb->check_format_permissions;
         die "cannot format: $err\n" if $err;
  Desc : check directory existence and file permissions to see if a
         format_from_file() is likely to succeed. This is useful,
         for example, when you have a script that downloads some
         remote database and you'd like to check first whether you
         even have permissions to format before you take the time
         to download something.
  Args : (optional) alternate full file basename to write blast DB to
         e.g. '/tmp/mytempdir/tester_blast_db'
  Ret  : nothing if everything looks good,
         otherwise a string error message summarizing the reason
         for failure
  Side Effects: reads from filesystem, may stat some files
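Note the return convention: nothing means success, and a true value is the error message, so the result is tested with "if", not "or die". The sketch below imitates that convention with a much simpler check. The function check_target_dir is hypothetical and only tests the target directory; it is not how the module itself performs the check.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempdir);

# Hypothetical stand-in, not the module's implementation: returns nothing
# if the directory for the given full file basename looks usable,
# otherwise a string describing the problem.
sub check_target_dir {
    my ($ffbn) = @_;
    my $dir = dirname($ffbn);
    return "directory $dir does not exist"  unless -d $dir;
    return "directory $dir is not writable" unless -w $dir;
    return;
}

my $tmp = tempdir( CLEANUP => 1 );
for my $ffbn ( "$tmp/tester_blast_db", "$tmp/no_such_dir/tester_blast_db" ) {
    if ( my $err = check_target_dir($ffbn) ) {
        print "cannot format: $err\n";
    }
    else {
        print "ok to format at $ffbn\n";
    }
}
```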
is_split
  Usage: print "that thing is split, yo" if $db->is_split;
  Desc : determine whether this database is in multiple parts
  Ret  : true if this database has been split into multiple files
         by formatdb (e.g. nr.00.pin, nr.01.pin, etc.)
  Args : none
  Side Effects: looks in filesystem
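The file naming pattern in the example (nr.00.pin, nr.01.pin) can be recognized from file names alone. As a rough illustration, and not a description of how is_split is implemented, the hypothetical helper below counts names that carry a two-digit volume number before one of the core suffixes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::BLAST::Database: count the file
# names that look like volumes of a split database, e.g. "nr.00.pin".
sub looks_split {
    my (@files) = @_;
    return scalar grep { /\.\d{2}\.[np](?:in|sq|hr)\z/ } @files;
}

# With a real database object this could be fed from $db->list_files.
print "split\n"     if looks_split( 'nr.00.pin', 'nr.00.psq', 'nr.01.pin' );
print "not split\n" unless looks_split( 'my_bdb.nin', 'my_bdb.nsq' );
```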
files_are_complete
  Usage: print "complete!" if $db->files_are_complete;
  Desc : tell whether this blast db has a complete set of files on disk
  Ret  : true if the set of files on disk looks complete,
         false if not
  Args : (optional) true value if the files should only be
         considered complete if the sequences are indexed for
         retrieval
  Side Effects: lists files on disk
list_files
  Usage: my @files = $db->list_files;
  Desc : get the list of files that belong to this blast database
  Ret  : list of full paths to all files belonging to this blast
         database
  Args : none
  Side Effects: looks in the filesystem
sequences_count
  Desc : get the number of sequences in this blast database
  Args : none
  Ret  : number of distinct sequences in this blast database, or
         undef if it could not be determined due to some error
  Side Effects: runs 'fastacmd' to get stats on the blast database file
get_sequence
  Usage: my $seq = $fs->get_sequence('LE_HBa0001A02');
  Desc : get a particular sequence from this db
  Args : sequence name to retrieve
  Ret  : Bio::PrimarySeqI-implementing object, or nothing if not found
  Side Effects: dies on error
BASE CLASS(ES)
AUTHOR
Robert Buels <rmb32@cornell.edu>
COPYRIGHT AND LICENSE
This software is copyright (c) 2011 by Robert Buels.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.