data2frequency.pl
A script to convert data into a frequency distribution, useful for graphing.
data2frequency.pl --bins <integer> --size <number> <filename>
data2frequency.pl --bins <integer> --max <number> <filename>
data2frequency.pl --size <number> --max <number> <filename>
Options:
  --in <filename>
  --bins <integer>
  --size <number>
  --index <list|range>
  --min <number>
  --max <number>
  --out <filename>
  --version
  --help
The command line flags and descriptions:
--in <filename>
Specify an input file containing either a list of database features or genomic coordinates for which to collect data. The file should be a tab-delimited text file, one row per feature, with columns representing feature identifiers, attributes, coordinates, and/or data values. The first row should be column headers. Text files generated by other BioToolBox scripts are acceptable. Files may be gzip-compressed.
--bins <integer>
Specify the number of bins or partitions into which the data will be grouped. This argument is optional if --max and --size are provided.
--size <number>
Specify the size of each bin or partition. A decimal number may be provided. This argument is optional if --bins and --max are provided.
--min <number>
Optionally indicate the minimum value of the bins. When generating the list of bins, this is used as the starting value. All values less than this value will be ignored. The default is 0. A negative number may be provided using the format --min=-1.
--max <number>
Specify the maximum bin value. All values greater than this value will be ignored. This argument is optional if --bins and --size are provided.
--index <list|range>
Specify the datasets in the input data file to be converted to a distribution. The 0-based column number of each dataset should be provided. Multiple datasets may be provided as a comma-delimited list, as a consecutive range (start-stop), or a combination of both. Do not include spaces! If no datasets are provided, the program will interactively present a list of possible datasets to convert.
--out <filename>
Specify the output file name. The default is to take the input file base name and append '_frequency' to it.
--version
Print the version number.
--help
Display this help.
This program will convert one or more datasets in a data file into a frequency distribution. This may then be used to conveniently plot a histogram using a program such as 'graph_profile.pl'.
Set the distribution parameters using the --bins and --size arguments, which set the number of bins and the size of each bin, respectively. The starting value (--min) and the maximum bin value (--max) may optionally be set as well. Any two of --bins, --size, and --max are sufficient.
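The relationship between the three parameters can be illustrated with a short Python sketch. This is not the script's Perl implementation. The function name resolve_bins and its argument names are hypothetical, and it assumes the missing value is derived from max = min + bins * size, with the range dividing evenly by the bin size.

```python
def resolve_bins(bins=None, size=None, minimum=0.0, maximum=None):
    """Fill in whichever of bins, size, or maximum was not supplied.

    Assumed relationship: maximum = minimum + bins * size.
    """
    given = sum(v is not None for v in (bins, size, maximum))
    if given < 2:
        raise ValueError("provide at least two of bins, size, maximum")
    if maximum is None:
        maximum = minimum + bins * size
    elif bins is None:
        # round() guards against floating-point noise such as 0.3 / 0.1
        bins = round((maximum - minimum) / size)
    elif size is None:
        size = (maximum - minimum) / bins
    return bins, size, minimum, maximum


# Equivalent in spirit to: --bins 10 --size 0.5
print(resolve_bins(bins=10, size=0.5))
# Equivalent in spirit to: --size 2 --max 10 --min=-2
print(resolve_bins(size=2, maximum=10, minimum=-2))
```

If all three values are supplied, the sketch returns them unchanged and does not check that they agree.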
One or more datasets within the data file may be converted. These may be specified on the command line or chosen interactively from a list presented to the user.
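As an illustration of the list and range syntax accepted for dataset indices, the following Python sketch expands a specification such as 1,3-5,8 into individual 0-based column numbers. The function name parse_index is hypothetical, and treating the stop value of a range as inclusive is an assumption.

```python
def parse_index(spec):
    """Expand a string such as '1,3-5,8' into a list of column numbers."""
    if " " in spec:
        raise ValueError("index list must not contain spaces")
    columns = []
    for part in spec.split(","):
        if "-" in part:
            # A consecutive range written as start-stop (assumed inclusive)
            start, stop = (int(n) for n in part.split("-"))
            columns.extend(range(start, stop + 1))
        else:
            columns.append(int(part))
    return columns


print(parse_index("1,3-5,8"))
```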
A tab-delimited text file will be written as output. The bin values are listed in the first column, and the number of data points within each bin is listed in subsequent columns, one column for each dataset requested.
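The layout of the output table can be sketched in Python as follows. This is an illustration only, not the script's code. The function name frequency_table is hypothetical, and two details are assumptions: each bin is labeled by its upper edge, and a value equal to a bin's upper edge is counted in the next bin, except for the maximum, which is counted in the last bin.

```python
def frequency_table(datasets, bins, size, minimum=0.0):
    """Count values per bin for each dataset.

    Returns one row per bin: [bin label, count for dataset 1, ...].
    """
    maximum = minimum + bins * size
    # Assumption: label each bin by its upper edge
    labels = [minimum + (i + 1) * size for i in range(bins)]
    counts = [[0] * len(datasets) for _ in range(bins)]
    for col, values in enumerate(datasets):
        for v in values:
            if v < minimum or v > maximum:
                continue  # values outside min..max are ignored
            # Clamp so that v == maximum falls in the last bin
            i = min(int((v - minimum) / size), bins - 1)
            counts[i][col] += 1
    return [[label] + row for label, row in zip(labels, counts)]


# Two datasets, four bins of size 1 starting at 0
table = frequency_table([[0.5, 1.5, 1.7, 9.0, -1.0], [2.0, 3.9]], bins=4, size=1)
for row in table:
    print("\t".join(str(x) for x in row))
```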
Timothy J. Parnell, PhD
Howard Hughes Medical Institute
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.
To install Bio::ToolBox, copy and paste the appropriate command into your terminal.
cpanm
cpanm Bio::ToolBox
CPAN shell
perl -MCPAN -e shell
install Bio::ToolBox
For more information on module installation, please visit the detailed CPAN module installation guide.