Jeremy Volkening
and 1 contributor


BioX::Seq::Fetch - Fetch records from indexed FASTA non-sequentially


    use BioX::Seq::Fetch;

    my $parser = BioX::Seq::Fetch->new($filename);

    my $seq = $parser->fetch_seq('seq_ABC');
    my $sub = $parser->fetch_seq('seq_XYZ', 8 => 15);


BioX::Seq::Fetch provides non-sequential access to records from indexed sequence files. Currently, only FASTA files indexed with samtools faidx (or another compatible method) are supported. If the index file is missing, the module creates a samtools-compatible one automatically.
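For reference, each line of a samtools FASTA index (.fai) holds five tab-separated fields: sequence name, length in bases, byte offset of the first base, bases per line, and bytes per line including the line terminator. These fields are enough to locate any base by arithmetic alone, which is what makes non-sequential access possible. The following is a minimal sketch of that calculation, assuming fixed-width sequence lines; `base_offset` is a hypothetical helper and not part of the BioX::Seq::Fetch API.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based base position into a byte offset
# using the OFFSET, LINEBASES and LINEWIDTH fields of a .fai entry.
sub base_offset {
    my ($entry, $pos) = @_;
    my $i = $pos - 1;    # 0-based index of the base within the sequence
    return $entry->{offset}                                   # first base
        + int($i / $entry->{linebases}) * $entry->{linewidth} # whole lines skipped
        + $i % $entry->{linebases};                           # column within line
}

# Entry for ">seq1\nACGT\nACGT\n": the 6-byte header puts the first base
# at byte 6; each line holds 4 bases in 5 bytes (4 bases + "\n").
my %entry = (offset => 6, linebases => 4, linewidth => 5);
print base_offset(\%entry, 1), "\n";    # 6
print base_offset(\%entry, 5), "\n";    # 11 (first base of the second line)
```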



    my $parser = BioX::Seq::Fetch->new(
        $filename,
        with_descriptions => 1,
    );

Create a new BioX::Seq::Fetch parser. Requires an input filename; STDIN and open filehandles are not supported, because a filename is needed to locate the corresponding index file and to ensure that seek() is supported. Takes one optional boolean argument, 'with_descriptions', which enables backtracking to find and include any sequence description present. Descriptions are normally omitted because the FASTA index stores the offset of the sequence itself, not of the defline. This option is currently experimental and may slow down sequence fetches, so it is off by default.
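The backtracking that 'with_descriptions' enables can be pictured as follows: starting from the indexed offset of the first base, step backward to the start of the preceding line and parse it as the defline. This is a minimal, hypothetical sketch of that idea, not the module's implementation. The helper `defline_before` is illustrative only, assumes Unix ("\n") line endings, and reads one byte per seek for clarity rather than speed.

```perl
use strict;
use warnings;

# Hypothetical helper: given the byte offset of a record's first base,
# walk backward to the start of its header line and parse it.
# Assumes "\n" line endings; one-byte reads keep the logic easy to follow.
sub defline_before {
    my ($fh, $seq_offset) = @_;
    my $start = 0;    # header starts at byte 0 unless a newline is found
    # $seq_offset - 1 is the newline ending the header, so begin one earlier.
    for (my $pos = $seq_offset - 2; $pos >= 0; --$pos) {
        seek($fh, $pos, 0) or die "seek failed: $!";
        read($fh, my $c, 1);
        if ($c eq "\n") { $start = $pos + 1; last }
    }
    seek($fh, $start, 0) or die "seek failed: $!";
    my $line = <$fh>;
    chomp $line;
    $line =~ s/^>//;
    my ($id, $desc) = split /\s+/, $line, 2;    # $desc is undef if absent
    return ($id, $desc);
}

my $fasta = ">seq1 first record\nACGT\n>seq2 second one\nGGCC\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory FASTA: $!";
my ($id, $desc) = defline_before($fh, 41);    # 41 = offset of "GGCC"
print "$id: $desc\n";                         # seq2: second one
```

An in-memory filehandle is used so the example runs without files; the same calls work on a real, seekable FASTA file.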



    my $seq = $parser->fetch_seq( $seq_id, $start => $end );

Returns the requested sequence as a BioX::Seq object, or undef if no matching sequence is found. Requires a valid sequence identifier and optionally accepts 1-based start and end coordinates to retrieve a subsequence (by default, the entire sequence is returned). A fatal error is thrown if the coordinates fall outside the range [1, length(sequence)].
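A coordinate-based fetch can be served from an index entry in four steps: check the coordinates, convert the first and last base positions to byte offsets, read that byte span, and strip embedded line breaks. The sketch below assumes 1-based, inclusive coordinates (matching the synopsis form `8 => 15`) and fixed-width lines. `fetch_sub` and the hash keys are hypothetical names, and unlike this sketch the real `fetch_seq` returns a BioX::Seq object rather than a plain string.

```perl
use strict;
use warnings;

# Hypothetical sketch of a 1-based, inclusive substring fetch driven by a
# .fai entry. Not the module's actual implementation.
sub fetch_sub {
    my ($fh, $e, $start, $end) = @_;
    $start //= 1;                # default: whole sequence
    $end   //= $e->{length};
    die "coordinates out of range\n"
        if $start < 1 || $end > $e->{length} || $start > $end;
    my $to_byte = sub {
        my $i = shift() - 1;     # 0-based base index
        return $e->{offset}
            + int($i / $e->{linebases}) * $e->{linewidth}
            + $i % $e->{linebases};
    };
    my $from = $to_byte->($start);
    my $span = $to_byte->($end) - $from + 1;    # bytes, including newlines
    seek($fh, $from, 0) or die "seek failed: $!";
    read($fh, my $buf, $span) == $span or die "short read\n";
    $buf =~ s/\r?\n//g;          # drop line breaks inside the span
    return $buf;
}

my $fasta = ">seq_XYZ\nACGTACGTAC\nGTACGTACGT\nAAAA\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory FASTA: $!";
my %entry = (length => 24, offset => 9, linebases => 10, linewidth => 11);
print fetch_sub($fh, \%entry, 8, 15), "\n";    # TACGTACG
```

Only the requested byte span is read, so the cost depends on the length of the subsequence rather than on the size of the file.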


    $parser->write_index( 'path/to/file.fa.fai' );

Writes a samtools-compatible index file for the underlying sequence file. Accepts one optional argument specifying the path of the file to create. The default, which should usually not be changed, is the path of the underlying sequence file with '.fai' appended.
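Index writing can be approximated in a few lines: record the byte position just after each header, take the line geometry from the first sequence line, and sum the bases. This is a simplified, hypothetical `build_fai`, not the module's own code. It assumes well-formed input (non-empty records, equal-width lines except the last one in each record, no blank lines); a production indexer should also reject malformed input.

```perl
use strict;
use warnings;

# Hypothetical sketch: compute samtools-style .fai rows
# (name, length, offset, linebases, linewidth) from a FASTA filehandle.
sub build_fai {
    my ($fh) = @_;
    my (@rows, $cur);
    while (my $line = <$fh>) {
        if ($line =~ /^>(\S+)/) {
            # tell() is now the byte offset of the record's first base.
            $cur = { name => $1, length => 0, offset => tell($fh) };
            push @rows, $cur;
            next;
        }
        next unless $cur;    # ignore anything before the first header
        (my $bases = $line) =~ s/\r?\n\z//;
        # The first sequence line fixes the line geometry for the record.
        if (!defined $cur->{linebases}) {
            $cur->{linebases} = length $bases;
            $cur->{linewidth} = length $line;
        }
        $cur->{length} += length $bases;
    }
    return map {
        join("\t", @{$_}{qw(name length offset linebases linewidth)})
    } @rows;
}

my $fasta = ">seq1 first record\nACGTACGT\nACG\n>seq2\nGGCC\n";
open(my $fh, '<', \$fasta) or die "cannot open in-memory FASTA: $!";
print "$_\n" for build_fai($fh);
# seq1  11  19  8  9
# seq2  4   38  4  5   (fields are tab-separated)
```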

This method is called automatically when a FASTA file is opened and no index file is present.


    my @seq_ids = $parser->ids;

Returns a list of sequence IDs, ordered by their occurrence in the underlying file.


    my $len = $parser->length( $seq_id );

Returns the length of the sequence given by $seq_id. May be marginally faster than fetching the sequence object and then finding the length.
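Both `ids` and `length` can in principle be answered from the index alone, without reading any sequence data. A minimal sketch, assuming the five-column .fai layout: keeping the names in an array preserves file order, while a hash gives direct length lookups. `read_fai` is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical sketch: load a .fai index, keeping IDs in file order
# (an array) plus a hash for direct length lookups.
sub read_fai {
    my ($fh) = @_;
    my (@ids, %len);
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $length) = split /\t/, $line;
        push @ids, $name;
        $len{$name} = $length;
    }
    return (\@ids, \%len);
}

my $fai = "chr2\t1500\t6\t60\t61\nchr1\t900\t1537\t60\t61\n";
open(my $fh, '<', \$fai) or die "cannot open in-memory index: $!";
my ($ids, $len) = read_fai($fh);
print join(',', @$ids), "\n";    # chr2,chr1 (file order, not sorted)
print $len->{chr1}, "\n";        # 900
```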


BioX::Seq::Fetch supports files compressed with blocked gzip (BGZIP), typically created with the bgzip utility. This allows pseudo-random access without decompressing the whole file. The Compress::BGZIP module is required for this functionality.


Jeremy Volkening <jeremy *at*>


