
NAME

Bio::ToolBox::Data::core - Common functions for the Bio::ToolBox::Data family

DESCRIPTION

Common methods for metadata and manipulation in a Bio::ToolBox::Data data table and Bio::ToolBox::Data::Stream file stream. This module should not be used directly. See the respective modules for more information.

METHODS REFERENCE

For quick reference only. Please see Bio::ToolBox::Data for implementation.

new

Generates a new object. This serves as a common base for Bio::ToolBox::Data and Bio::ToolBox::Data::Stream.

verify

Verify the integrity of the Data object. Checks multiple things, including metadata, table integrity (consistent number of rows and columns), and special file format structure.
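
The table-integrity part of this check (every row having the same number of columns) can be pictured with a minimal pure-Perl sketch, assuming the table is stored as an array of row array references with the column headers in row 0. `table_is_consistent` is a hypothetical helper; it does not reproduce the metadata or file-format checks.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every row has the same number of
# columns as the header row (row 0), otherwise 0.
sub table_is_consistent {
    my ($table) = @_;
    return 1 unless @$table;    # an empty table is trivially consistent
    my $ncol = scalar @{ $table->[0] };
    for my $row (@$table) {
        return 0 if scalar(@$row) != $ncol;
    }
    return 1;
}

my $good = [ [qw(Chromo Start Stop)], [qw(chr1 100 200)] ];
my $bad  = [ [qw(Chromo Start Stop)], [qw(chr1 100)] ];
print table_is_consistent($good), "\n";    # prints 1
print table_is_consistent($bad),  "\n";    # prints 0
```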

open_database

This is a wrapper method that tries to do the right thing by passing on to either the "open_meta_database" or the "open_new_database" method. It is essentially a legacy method for "open_meta_database".

open_meta_database

Open the database that is listed in the metadata. Returns the database connection. Pass a true value to force a new database connection to be opened, rather than returning a cached connection object (useful when forking).
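
The caching behavior described here, where a stored connection is reused unless a true value forces a new one, can be sketched in pure Perl. The names `open_connection` and `get_connection` are hypothetical, not Bio::ToolBox functions, and the stand-in opener returns a plain hash reference instead of a real database handle.

```perl
use strict;
use warnings;

my %connection_cache;
my $opened = 0;

# Hypothetical stand-in for a real database open call; it returns a new
# hash reference so the caching behavior can be observed.
sub open_connection {
    my ($name) = @_;
    $opened++;
    return { name => $name, serial => $opened };
}

# Return the cached connection for $name, or open a fresh one when none is
# cached yet or when $force is true (e.g. in a child process after fork).
sub get_connection {
    my ($name, $force) = @_;
    if ($force or not exists $connection_cache{$name}) {
        $connection_cache{$name} = open_connection($name);
    }
    return $connection_cache{$name};
}

my $first  = get_connection('annotation.db');
my $again  = get_connection('annotation.db');       # same cached object
my $forced = get_connection('annotation.db', 1);    # newly opened object
```

Forcing a fresh connection matters after forking because a handle opened in the parent is generally not safe to share with child processes.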

open_new_database

Convenience method for opening a second or new database that is not specified in the metadata, useful for data collection. This is a shortcut to "open_db_connection" in Bio::ToolBox::db_helper. Pass the database name.

verify_dataset

Verifies the existence of a dataset or data file before collecting data from it. Multiple datasets may be verified. This is a convenience wrapper around "verify_or_request_feature_types" in Bio::ToolBox::db_helper. Pass the name of the dataset to verify.

delete_column

Delete one or more columns in a data table. Pass a list of the indices to delete.

reorder_column

Reorder the columns in a data table. Allows for skipping (deleting) and duplicating columns. Pass a list of the new index order.
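
The column semantics of delete_column and reorder_column (leaving an index out drops that column, listing an index twice duplicates it) can be illustrated with a pure-Perl sketch, assuming a table stored as an array of row array references. `reorder_columns` and `delete_columns` are hypothetical helpers that return a new table rather than modifying a Data object.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a new table whose columns follow @order.
# Indices left out of @order are dropped; repeated indices are duplicated.
sub reorder_columns {
    my ($table, @order) = @_;
    return [ map { [ @{$_}[@order] ] } @$table ];
}

# Deleting columns is the special case of keeping every other index in
# its original order.
sub delete_columns {
    my ($table, @delete) = @_;
    my %drop = map { $_ => 1 } @delete;
    my @keep = grep { not $drop{$_} } 0 .. $#{ $table->[0] };
    return reorder_columns($table, @keep);
}

my $table = [
    [qw(Chromo Start Stop Name)],
    [qw(chr1 100 200 geneA)],
];
my $reordered = reorder_columns($table, 3, 0, 1, 3);
print join(',', @{ $reordered->[0] }), "\n";    # prints Name,Chromo,Start,Name
my $trimmed = delete_columns($table, 1, 2);
print join(',', @{ $trimmed->[1] }), "\n";      # prints chr1,geneA
```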

feature

Returns or sets the feature name string listed in the metadata.

feature_type

Returns "named", "coordinate", or "unknown" based on what kind of feature is present in the data table.
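
How a table might be classified from its column names can be sketched in pure Perl. The rule below is an assumption for illustration only (chromosome plus start columns mean "coordinate"; otherwise a name or ID column means "named"), and `guess_feature_type` is a hypothetical helper, not the module's own logic.

```perl
use strict;
use warnings;

# Hypothetical classification rule (an assumption, not the module's code):
# chromosome plus start columns mean "coordinate"; otherwise a name or
# ID column means "named"; anything else is "unknown".
sub guess_feature_type {
    my @names = @_;
    my %has;
    for my $name (@names) {
        (my $n = lc $name) =~ s/^#//;    # ignore case and a leading "#"
        $has{$n} = 1;
    }
    my $has_chromo = $has{chromo} || $has{chr} || $has{chromosome};
    my $has_start  = $has{start}  || $has{position};
    return 'coordinate' if $has_chromo and $has_start;
    return 'named'      if $has{name} or $has{primary_id};
    return 'unknown';
}

print guess_feature_type('#Chr', 'Start', 'Stop', 'Score'), "\n";    # prints coordinate
print guess_feature_type('Primary_ID', 'Name', 'Score'), "\n";       # prints named
print guess_feature_type('Score', 'Value'), "\n";                    # prints unknown
```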

program

Returns or sets the program string in the metadata.

database

Returns or sets the name of the database in the metadata.

bam_adapter

Returns or sets the short name of the BAM adapter being used: "sam" or "hts".

big_adapter

Returns or sets the short name of the bigWig and bigBed adapter being used: "ucsc" or "big".

format

Returns a text string describing the format of the file contents, such as gff3, gtf, bed, genePred, narrowPeak, etc.

gff

Returns or sets the GFF version value in the metadata.

bed

Returns or sets the number of BED columns in the metadata.

ucsc

Returns or sets the number of columns in a UCSC-type file format, including genePred and refFlat.

vcf

Returns or sets the VCF version value in the metadata.

number_columns

Returns the number of columns in the data table.

number_rows

Returns the number of rows in the data table.

last_column

Returns the array index of the last column in the data table.

last_row

Returns the array index of the last row in the data table.

filename

Returns the complete filename listed in the metadata.

basename

Returns the base name of the filename listed in the metadata.

path

Returns the path portion of the filename listed in the metadata.

extension

Returns the recognized extension of the filename listed in the metadata.
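
The relationship between the filename, base name, path, and extension can be sketched with the core File::Basename module. The extension pattern below is a hypothetical stand-in for the module's own list of recognized extensions, and Bio::ToolBox may format the path portion differently (for example, with or without the trailing slash that `fileparse` keeps).

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical list of "recognized" extensions, each optionally gzipped.
# This list is only for illustration.
my $ext_pattern = qr/\.(?:bed|narrowPeak|gff3?|gtf|vcf|txt)(?:\.gz)?/;

my $filename = '/data/peaks.narrowPeak.gz';
my ($basename, $path, $extension) = fileparse($filename, $ext_pattern);
print "$basename\n";     # prints peaks
print "$path\n";         # prints /data/
print "$extension\n";    # prints .narrowPeak.gz
```

A name that matches none of the patterns, such as `reads.bam` here, keeps its full name as the base name and gets an empty extension.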

comments

Returns an array of comment lines present in the metadata.

add_comment

Adds a string to the list of comments to be included in the metadata.

delete_comment

Deletes the indicated array index from the metadata comments array.

vcf_headers

Partially parses VCF metadata header lines into a hash structure.

rewrite_vcf_headers

Rewrites the VCF headers back into the metadata comments array.
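
Partially parsing VCF header lines into a hash, then writing them back, can be sketched in pure Perl. The helpers `parse_vcf_headers` and `rewrite_vcf_headers` and the hash layout (structured lines keyed by header type and `ID`, simple lines keyed by header type) are assumptions for illustration, not necessarily the structure Bio::ToolBox uses.

```perl
use strict;
use warnings;

# Hypothetical parser: structured lines such as ##INFO=<ID=DP,...> are
# stored as $h{INFO}{DP} = original line; simple lines such as
# ##fileformat=VCFv4.2 are stored as $h{fileformat} = 'VCFv4.2'.
# Repeated simple keys keep only the last value (a partial parse).
sub parse_vcf_headers {
    my @lines = @_;
    my %headers;
    for my $line (@lines) {
        if ($line =~ /^##(\w+)=<ID=([^,>]+)/) {
            $headers{$1}{$2} = $line;
        }
        elsif ($line =~ /^##(\w+)=(.*)$/) {
            $headers{$1} = $2;
        }
    }
    return \%headers;
}

# Hypothetical inverse: ##fileformat first (required by the VCF spec),
# then the remaining keys in sorted order for a stable result.
sub rewrite_vcf_headers {
    my ($headers) = @_;
    my @lines;
    push @lines, "##fileformat=$headers->{fileformat}"
        if exists $headers->{fileformat};
    my @keys = grep { $_ ne 'fileformat' } keys %$headers;
    for my $key (sort @keys) {
        my $value = $headers->{$key};
        if (ref($value) eq 'HASH') {
            push @lines, map { $value->{$_} } sort keys %$value;
        }
        else {
            push @lines, "##$key=$value";
        }
    }
    return @lines;
}

my @original = (
    '##fileformat=VCFv4.2',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##source=exampleCaller',
);
my $headers   = parse_vcf_headers(@original);
my @rewritten = rewrite_vcf_headers($headers);
```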

list_columns

Returns an array of the column names.

name

Returns or sets the name of the column. Pass the index and, optionally, a new name.

metadata($index, $key)

Returns or sets the metadata key/value pair for a specific column. Pass the index, the key, and, optionally, a new value.

delete_metadata

Deletes the metadata key for a column. Pass the index and key.

copy_metadata

Copies the metadata values from one column to another column. Pass the source and target indices.
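
The get/set, delete, and copy operations on per-column metadata can be sketched in pure Perl, assuming the metadata is held as a hash of hashes keyed by column index. The layout and the helpers `column_metadata`, `delete_column_metadata`, and `copy_column_metadata` are hypothetical; leaving `name` and `index` untouched when copying is also an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical layout: column metadata keyed by column index, each entry
# a hash of key/value pairs.
my %metadata = (
    0 => { name => 'Chromo', index => 0 },
    1 => { name => 'Score',  index => 1, log2 => 1 },
    2 => { name => 'Score2', index => 2 },
);

# Get one key for one column, or set it first when a defined value is passed.
sub column_metadata {
    my ($meta, $index, $key, $value) = @_;
    $meta->{$index}{$key} = $value if defined $value;
    return $meta->{$index}{$key};
}

# Remove one key from one column.
sub delete_column_metadata {
    my ($meta, $index, $key) = @_;
    delete $meta->{$index}{$key};
}

# Copy every key from the source column to the target column, except
# name and index, which stay specific to the target.
sub copy_column_metadata {
    my ($meta, $from, $to) = @_;
    for my $key (keys %{ $meta->{$from} }) {
        next if $key eq 'name' or $key eq 'index';
        $meta->{$to}{$key} = $meta->{$from}{$key};
    }
}

column_metadata(\%metadata, 1, 'dataset', 'chip');
copy_column_metadata(\%metadata, 1, 2);
delete_column_metadata(\%metadata, 1, 'log2');
```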

find_column

Returns the column index for the column with the specified name. Name searches are case insensitive and can tolerate a # prefix character. The first match is returned. Pass the name to search.
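
The matching rules described here can be illustrated with a pure-Perl sketch over a plain list of column names. `find_column_index` is a hypothetical helper; comparing the names exactly after normalizing case and the `#` prefix is an assumption about how the search is done.

```perl
use strict;
use warnings;

# Hypothetical sketch: matching is case-insensitive, a leading "#" is
# ignored on both sides, and the first matching index is returned
# (undef when nothing matches).
sub find_column_index {
    my ($names, $query) = @_;
    (my $want = lc $query) =~ s/^#//;
    for my $i (0 .. $#$names) {
        (my $have = lc $names->[$i]) =~ s/^#//;
        return $i if $have eq $want;
    }
    return;
}

my @columns = ('#Chromo', 'Start', 'Stop', 'Name', 'name');
print find_column_index(\@columns, 'chromo'), "\n";    # prints 0
print find_column_index(\@columns, 'NAME'), "\n";      # prints 3 (first match)
```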

chromo_column

Returns the index of the column that best represents the chromosome column.

start_column

Returns the index of the column that best represents the start, position, or transcription start column.

stop_column
end_column

Returns the index of the column that best represents the stop or end column.

strand_column

Returns the index of the column that best represents the strand.

name_column

Returns the index of the column that best represents the name.

type_column

Returns the index of the column that best represents the type.

id_column

Returns the index of the column that represents the Primary_ID column used in databases.

score_column

Returns the index of the column that represents the Score column in certain formats, such as GFF, BED, bedGraph, etc.
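
One way to pick the column that "best represents" a given role is to check a priority-ordered list of candidate names. The sketch below is hypothetical: `best_column` and the candidate lists in `%candidates` are illustrative assumptions, not the names or priorities Bio::ToolBox actually uses.

```perl
use strict;
use warnings;

# Hypothetical candidate names, in priority order, for each kind of column.
my %candidates = (
    chromo => [qw(chromo chromosome chrom chr seq_id)],
    start  => [qw(start start0 position txstart)],
    stop   => [qw(stop end txend)],
    strand => [qw(strand)],
);

# Return the index of the column matching the highest-priority candidate
# name (compared lowercased, with any leading "#" removed).
sub best_column {
    my ($names, $kind) = @_;
    my %index_of;
    for my $i (reverse 0 .. $#$names) {
        (my $n = lc $names->[$i]) =~ s/^#//;
        $index_of{$n} = $i;    # iterating in reverse keeps the first occurrence
    }
    for my $candidate (@{ $candidates{$kind} }) {
        return $index_of{$candidate} if exists $index_of{$candidate};
    }
    return;
}

my @ucsc_like = ('#chrom', 'txStart', 'txEnd', 'name', 'strand');
my @gff_like  = ('seq_id', 'source', 'type', 'start', 'end');
```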

zero_start
interbase

Returns true (1) if the coordinate system appears to be an interbase (half-open, zero-based) coordinate system, and false (0) otherwise. This is determined from the file type (e.g. .bed) or from whether the start coordinate column is named start0. The coordinate system can also be changed explicitly by passing an appropriate value; note that this also renames the start coordinate column as appropriate.
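
The difference between the two coordinate systems is a one-base shift of the start coordinate. The pure-Perl sketch below (the helpers `to_interbase` and `from_interbase` are hypothetical) converts between 1-based, fully closed coordinates, as used by GFF, and 0-based, half-open coordinates, as used by BED.

```perl
use strict;
use warnings;

# Convert a 1-based, fully closed interval [start, end] to a 0-based,
# half-open (interbase) interval [start, end), and back. Only the start
# changes; the end coordinate is numerically the same in both systems.
sub to_interbase {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

sub from_interbase {
    my ($start0, $end) = @_;
    return ($start0 + 1, $end);
}

# The first 100 bp of a chromosome: 1-100 in GFF style, 0-100 in BED style.
my @bed = to_interbase(1, 100);
my @gff = from_interbase(@bed);
print "@bed\n";    # prints 0 100
print "@gff\n";    # prints 1 100
```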

get_seqfeature

Returns the stored SeqFeature object for a given row.

SEE ALSO

Bio::ToolBox::Data

AUTHOR

 Timothy J. Parnell, PhD
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.