NAME

Text::Tidx - Index a delimited text file containing start-stop positions

SYNOPSIS

  use Text::Tidx;
  Text::Tidx::build("annot.txt");
  $idx = Text::Tidx->new("annot.txt");
  print $idx->query("chr1",240034);

FUNCTIONS

new(FILE)

Loads an index from a file.

query(CHR, POS [, END])

Queries a loaded index and returns an array of the text lines that match the specified chr string and integer pos. If an end is specified, all regions overlapping the range from pos to end are returned.
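As a rough illustration of the two call forms, the sketch below assumes an index for annot.txt has already been built as in the SYNOPSIS. The file name and coordinates are placeholders, not values taken from a real data set.

```perl
use strict;
use warnings;
use Text::Tidx;

# Assumes Text::Tidx::build("annot.txt") was run earlier;
# "annot.txt" is a placeholder file name.
my $idx = Text::Tidx->new("annot.txt");

# Point query: lines for regions keyed "chr1" at position 240034.
my @at_pos = $idx->query("chr1", 240034);

# Range query: lines for regions keyed "chr1" that overlap 240000..250000.
my @in_range = $idx->query("chr1", 240000, 250000);

print scalar(@at_pos), " point hit(s), ", scalar(@in_range), " range hit(s)\n";
```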

build(FILE [, option1=>value1, ...])

Builds an index. Default is to index on the first 3 columns.

The following options may be used:

sep

Field separator. Defaults to a tab.

chr

1-based index of the string key field, can be -1 for "Not applicable", default is 1

beg

1-based index of the field containing the start of the integer numeric range, default is 2

end

1-based index of the field containing the end of the integer numeric range, default is 3

skip

If an integer, the number of rows to skip. If a character, all rows beginning with that character are skipped. Default is '#', which skips comment lines (compatible with GFF, VCF, etc.).

sub_e

If nonzero, the "end" of the range is not included in the range, i.e., one is subtracted from the end positions.
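The options above might be combined as in the following sketch. It assumes a hypothetical comma-separated file, regions.csv, with one header row and the columns name, start, stop. The file, its contents, and the choice of option values are illustrative only.

```perl
use strict;
use warnings;
use Text::Tidx;

# Hypothetical input: comma-separated, one header row, columns name,start,stop.
my $file = "regions.csv";
open(my $fh, '>', $file) or die "Cannot write $file: $!";
print $fh "name,start,stop\n";
print $fh "geneA,100,200\n";
print $fh "geneB,150,400\n";
close($fh);

# Index on column 1 (key), column 2 (start), column 3 (end);
# split fields on commas and skip the single header row.
Text::Tidx::build($file, sep => ",", chr => 1, beg => 2, end => 3, skip => 1);

# Load the index and look up position 120 under the key "geneA".
my $idx = Text::Tidx->new($file);
print $idx->query("geneA", 120);
```

Passing an integer to skip, rather than the default '#', is what drops the header row in this sketch.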

DESCRIPTION

Text::Tidx allows you to index any text file using a key field and range coordinates and, later, to use that index for O(log(n)) range lookups into the file.
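To give a feel for why a sorted index permits logarithmic lookups, here is a minimal pure-Perl sketch of a binary search over intervals. It is not the module's actual implementation: the %index structure and the lookup function are hypothetical, and the sketch assumes the intervals under one key are sorted by start and do not overlap.

```perl
use strict;
use warnings;

# Hypothetical in-memory index: for each key, [start, end, line] triples
# sorted by start. Assumes intervals under one key do not overlap.
my %index = (
    chr1 => [
        [100, 199, "chr1\t100\t199\tgeneA"],
        [200, 349, "chr1\t200\t349\tgeneB"],
        [500, 800, "chr1\t500\t800\tgeneC"],
    ],
);

# Binary search for the interval containing $pos.
# Returns the stored line, or undef if nothing matches.
sub lookup {
    my ($chr, $pos) = @_;
    my $ivs = $index{$chr} or return;
    my ($lo, $hi) = (0, $#$ivs);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($beg, $end, $line) = @{ $ivs->[$mid] };
        if    ($pos < $beg) { $hi = $mid - 1 }   # search the left half
        elsif ($pos > $end) { $lo = $mid + 1 }   # search the right half
        else                { return $line }     # $beg <= $pos <= $end
    }
    return;
}

my $hit = lookup("chr1", 240);
print defined $hit ? "$hit\n" : "no match\n";
```

Each iteration halves the remaining candidates, so a lookup touches about log2(n) intervals. Handling overlapping regions needs more bookkeeping than this sketch shows.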

I wrote this because, for very large files (>100k rows) and many searches (>10k), it was significantly faster for me than loading all of the information into a database and running range queries. It was even faster than SQLite's R-tree extension or the "tabix" program, both of which do similar things and do them rather well.

Although it was designed for chromosome, start, stop indexing, it is not genome-specific and can index any delimited text file.

Indexes are loaded into RAM. If you only have a few lookups to do per Perl instance, this is expensive, and a database will be faster.

AUTHOR

Erik Aronesty, <earonesty@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2012 by Erik Aronesty

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.1 or, at your option, any later version of Perl 5 you may have available.