
NAME

data2frequency.pl

A script to convert data into a frequency distribution, useful for graphing.

SYNOPSIS

data2frequency.pl --bins <integer> --size <number> <filename>

data2frequency.pl --bins <integer> --max <number> <filename>

data2frequency.pl --size <number> --max <number> <filename>

  Options:
  --in <filename>
  --bins <integer>
  --size <number>
  --index <list|range>
  --min <number>
  --max <number>
  --out <filename>
  --version
  --help

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify an input file containing either a list of database features or genomic coordinates for which to collect data. The file should be a tab-delimited text file, one row per feature, with columns representing feature identifiers, attributes, coordinates, and/or data values. The first row should be column headers. Text files generated by other BioToolBox scripts are acceptable. Files may be gzip compressed.

--bins <integer>

Specify the number of bins or partitions into which the data will be grouped. This argument is optional if --max and --size are provided.

--size <number>

Specify the size of each bin or partition. A decimal number may be provided. This argument is optional if --bins and --max are provided.

--min <number>

Optionally indicate the minimum value of the bins. When generating the list of bins, this is used as the starting value. All values less than this value will be ignored. The default is 0. A negative number may be provided using the format --min=-1.

--max <number>

Specify the maximum bin value. All values greater than this value will be ignored. This argument is optional if --bins and --size are provided.

--index <list|range>

Specify the datasets in the input data file to be converted to a distribution. The 0-based column number of the datasets should be provided. Multiple datasets may be provided as a comma-delimited list, as a consecutive list (start-stop), or a combination of both. Do not include spaces! If no datasets are provided, the program will interactively present to the user a list of possible datasets to convert.

--out <filename>

Specify the output file name. The default is to take the input file base name and append '_frequency' to it.

--version

Print the version number.

--help

Display this help.

DESCRIPTION

This program will convert one or more datasets in a data file into a frequency distribution. This may then be used to conveniently plot a histogram using a program such as 'graph_profile.pl'.

Set the distribution parameters using the --bins and --size arguments, which set the number of bins and the size of each bin, respectively. The minimum and maximum bin values may optionally be set with --min and --max.
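The options above state that any two of the number of bins, the bin size, and the maximum are enough to define the bins. The following Python sketch illustrates that arithmetic. It assumes the relationship maximum = minimum + bins * size and rounds the bin count up; the function name resolve_bins is hypothetical, and the script itself may round or validate differently.

```python
import math

def resolve_bins(bins=None, size=None, maximum=None, minimum=0):
    """Fill in whichever of bins, size, or maximum was left unset.

    Assumes maximum = minimum + bins * size (an illustration only).
    """
    given = sum(v is not None for v in (bins, size, maximum))
    if given < 2:
        raise ValueError("provide at least two of bins, size, and maximum")
    if maximum is None:
        maximum = minimum + bins * size
    elif size is None:
        size = (maximum - minimum) / bins
    elif bins is None:
        # Round up so the last bin still reaches the maximum.
        bins = math.ceil((maximum - minimum) / size)
    return bins, size, maximum
```

For example, resolve_bins(bins=10, size=0.5) corresponds to passing --bins 10 --size 0.5 and leaving --max unset.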

One or more datasets within the data file may be converted. These may be specified on the command line or chosen interactively from a list presented to the user.
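As an illustration of the --index syntax, this Python sketch expands a comma-delimited list with optional start-stop ranges into 0-based column numbers. The function name parse_index is hypothetical, and treating the stop value as inclusive is an assumption; the sketch does not reproduce the script's own parser.

```python
def parse_index(spec):
    """Expand a string such as '0,2-4' into a list of column numbers."""
    columns = []
    for part in spec.split(","):
        if "-" in part:
            start, stop = part.split("-")
            # Assumption: the range includes both start and stop.
            columns.extend(range(int(start), int(stop) + 1))
        else:
            columns.append(int(part))
    return columns
```

Under these assumptions, the value given as --index 0,2-4,7 would select columns 0, 2, 3, 4, and 7.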

A tab-delimited text file will be written as output. The bin values are listed in the first column, and the number of data points within each bin is listed in a subsequent column for each dataset requested.
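The output layout can be sketched in Python as follows. This is a minimal illustration, not the script's implementation: the function name frequency_table is hypothetical, and two details are assumptions, namely that each bin is labeled by its upper edge and that a value on a bin boundary falls into the higher bin (except the maximum, which goes in the last bin). Values below the minimum or above the maximum are ignored, as the options describe.

```python
def frequency_table(datasets, bins, size, minimum=0):
    """Return rows of [bin_value, count_for_dataset_1, count_for_dataset_2, ...]."""
    maximum = minimum + bins * size
    counts = [[0] * len(datasets) for _ in range(bins)]
    for col, values in enumerate(datasets):
        for value in values:
            if value < minimum or value > maximum:
                continue  # out-of-range values are ignored
            # Assumption: bin i covers [minimum + i*size, minimum + (i+1)*size).
            i = min(int((value - minimum) // size), bins - 1)
            counts[i][col] += 1
    # Assumption: each row is labeled with the upper edge of its bin.
    return [[minimum + (i + 1) * size] + counts[i] for i in range(bins)]
```

Each returned row corresponds to one line of the output file: the bin value followed by one count per requested dataset.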

AUTHOR

 Timothy J. Parnell, PhD
 Howard Hughes Medical Institute
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.