NAME

get_intersecting_features.pl

A script to pull out overlapping features from the database.

SYNOPSIS

get_intersecting_features.pl [--options] <filename>

  Options:
  --in <filename>
  --db <database>
  --feature <text>
  --start <integer>
  --stop <integer>
  --pos [5 | m | 3]
  --extend <integer>
  --ref [start | mid]
  --out <filename>
  --gz
  --version
  --help

OPTIONS

The command line flags and descriptions:

--in <filename>

Specify an input file containing either a list of database features or genomic coordinates for which to collect data. The file should be a tab-delimited text file, one row per feature, with columns representing feature identifiers, attributes, coordinates, and/or data values. The first row should be column headers. BED files are acceptable, as are text files generated by other BioToolBox scripts. Files may be gzip compressed.

--db <database>

Specify the name of a Bio::DB::SeqFeature::Store annotation database from which gene or feature annotation may be derived. A database is required for generating new data files with features. This option may be skipped when using coordinate information from an input file (e.g. BED file), or when using an existing input file with the database indicated in the metadata. For more information about using annotation databases, see https://code.google.com/p/biotoolbox/wiki/WorkingWithDatabases.

--feature <text>

Specify the type of the target features to search for in the database that intersect with the list of reference features. The type may be either a GFF "type" or a "type:method" string. If not specified, the database will be queried for potential GFF types and a list presented to the user to select one.

--start <integer>
--stop <integer>

Optionally specify start and stop positions that restrict the search region for target features. The positions are relative to the 5' end of the reference feature (default) or to the position specified by the "--pos" option. For example, specify "--start=-200 --stop=0" to restrict to the promoter region of genes. Both positions must be specified. Default is to take the entire region of the reference feature.

--pos [ 5 | m | 3 ]

Indicate the relative position from which to make the adjustments to the search window. Both start and stop adjustments may be made from the respective 5 prime, 3 prime, or middle position as dictated by the feature's strand value.
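The following is a minimal Python sketch of how a strand-aware search window could be derived from "--start", "--stop", and "--pos". It is an illustration only, not the script's actual code: the function name search_window, the use of 1-based inclusive coordinates, and the integer midpoint are all assumptions.

```python
def search_window(start, stop, strand, rel_start, rel_stop, pos="5"):
    """Return (window_start, window_stop) in genomic coordinates.

    Hypothetical helper: start/stop are the reference feature's
    coordinates, strand is "+" or "-", and rel_start/rel_stop are the
    values that would be given to --start and --stop.
    """
    if pos not in ("5", "m", "3"):
        raise ValueError("pos must be '5', 'm', or '3'")
    reverse = strand == "-"

    # Pick the anchor position on the reference feature.
    if pos == "m":
        anchor = (start + stop) // 2      # assumed integer midpoint
    elif (pos == "5") != reverse:
        anchor = start                    # 5' end on +, or 3' end on -
    else:
        anchor = stop                     # 3' end on +, or 5' end on -

    # On the reverse strand, "upstream" means higher coordinates,
    # so the relative offsets are mirrored around the anchor.
    if reverse:
        return anchor - rel_stop, anchor - rel_start
    return anchor + rel_start, anchor + rel_stop
```

Under these assumptions, a forward-strand gene at 1000..2000 with "--start=-200 --stop=0" gives the window 800..1000, while the same gene on the reverse strand gives 2000..2200.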

--extend <integer>

Optionally specify the number of bp to extend the reference feature's region on each side. Useful when you have small reference regions and you want to include a larger search region.
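As a rough illustration of the "--extend" behavior, the Python sketch below widens a region by the same number of bp on both sides. The function name extend_region is hypothetical, and clamping the start at position 1 is an assumption; the script's handling of chromosome boundaries is not described here.

```python
def extend_region(start, stop, extend):
    """Widen a 1-based inclusive region by `extend` bp on each side.

    Assumption: the start is clamped to 1 so it never falls off the
    beginning of the chromosome. The chromosome end is not checked.
    """
    return max(1, start - extend), stop + extend
```

For example, a 51 bp reference region at 1000..1050 with "--extend 500" would be searched as 500..1550.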

--ref [start | mid]

Indicate the reference point from which to calculate the distance between the reference and target features. The same reference point is used for both features. Valid options include "start" (or 5' end for stranded features) and "mid" (for midpoint). Default is "start".
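One way to picture the distance calculation is the Python sketch below. It is not taken from the script: the helper names are hypothetical, and the sign convention (target point minus reference point, in genomic coordinates) is an assumption.

```python
def reference_point(start, stop, strand, ref="start"):
    """Return the coordinate used for distance measurement.

    "start" means the 5' end for stranded features, so a reverse-strand
    feature uses its stop coordinate. "mid" uses the integer midpoint.
    """
    if ref == "mid":
        return (start + stop) // 2
    if ref == "start":
        return stop if strand == "-" else start
    raise ValueError("ref must be 'start' or 'mid'")


def feature_distance(reference, target, ref="start"):
    """Distance between two (start, stop, strand) features.

    Assumed convention: target point minus reference point. The same
    reference point type is applied to both features.
    """
    return reference_point(*target, ref=ref) - reference_point(*reference, ref=ref)
```

With these assumptions, a forward-strand reference at 1000..2000 and a reverse-strand target at 1500..1800 are 800 bp apart when measured from their 5' ends, and 150 bp apart when measured between midpoints.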

--out <filename>

Optionally specify a new filename. A standard tim data text file is written. The default is to rewrite the input file.

--gz

Specify whether the output file should (not) be compressed with gzip.

--version

Print the version number.

--help

Display the POD documentation.

DESCRIPTION

This program will take a list of reference features and identify target features which intersect them. The reference features may be either named features (name and type) or genomic regions (chromosome, start, stop). By default, the search region for each reference feature is the entire feature, but it may be restricted or expanded in size with the appropriate modifiers (--start, --stop, --extend). The target features are specified by feature type (--feature).

Several attributes of the found features are appended to the original input file data. First, the number of target features is reported. If more than one is found, the feature with the most overlap with the reference feature is preferentially listed. The name, type, and strand of the selected target feature are reported. Finally, the distance from the reference feature to the target feature is reported. The reference point for measuring the distance is by default the start or 5' end of each feature, or optionally the midpoint. Note that the distance measurement is relative to the coordinates after adjustment with the --start, --stop, and --extend options.
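The selection of the target with the most overlap can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script's implementation: the function names are hypothetical, targets are represented as simple (name, start, stop) tuples with 1-based inclusive coordinates, and ties are broken by keeping the first target encountered.

```python
def overlap_length(a_start, a_stop, b_start, b_stop):
    """Number of bp shared by two 1-based inclusive intervals (0 if none)."""
    return max(0, min(a_stop, b_stop) - max(a_start, b_start) + 1)


def summarize_targets(ref_start, ref_stop, targets):
    """Return (number_of_intersecting_targets, best_target_or_None).

    `targets` is a list of (name, start, stop) tuples. The best target
    is the one sharing the most bp with the reference search region.
    """
    hits = [t for t in targets
            if overlap_length(ref_start, ref_stop, t[1], t[2]) > 0]
    if not hits:
        return 0, None
    best = max(hits,
               key=lambda t: overlap_length(ref_start, ref_stop, t[1], t[2]))
    return len(hits), best
```

For a search region of 1000..2000 and targets at 900..1100, 1050..1900, and 3000..3100, two targets intersect, and the one at 1050..1900 is selected because it shares 851 bp with the region compared with 101 bp for the first.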

A standard tim data text file is written.

AUTHOR

 Timothy J. Parnell, PhD
 Howard Hughes Medical Institute
 Dept of Oncological Sciences
 Huntsman Cancer Institute
 University of Utah
 Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.