NAME

Bio::Gonzales::Seq::IO - fast utility functions for sequence IO

SYNOPSIS

    use Bio::Gonzales::Seq::IO qw( faslurp faspew fahash fasubseq faiterate );

DESCRIPTION

SUBROUTINES

@seqs = faslurp(@filenames)
$seqsref = faslurp(@filenames)

faslurp reads all sequences from @filenames and returns them as an array in list context or as an array reference in scalar context. The sequences are stored as FAlite2::Entry objects.

$iterator = faiterate($filename)

Creates an iterator for the FASTA file $filename. The iterator lets you loop over the sequence file without reading all of its content at once. Iterator usage:

    while(my $sequence_object = $iterator->()) {
        #do something with the sequence object
    }
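
The iterator pattern described above can be sketched in plain Perl as a closure over a filehandle. This is only an illustration and not the module's implementation: the helper name make_fasta_iterator and the plain hash records are assumptions made for this sketch, whereas faiterate itself returns sequence objects.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Gonzales::Seq::IO: returns a closure
# that yields one { id, seq } hash per call and undef once input is exhausted.
sub make_fasta_iterator {
    my ($fh) = @_;
    my $pending_id;    # id of a header line read ahead while finishing a record

    return sub {
        my ( $id, $seq );
        if ( defined $pending_id ) {
            ( $id, $seq ) = ( $pending_id, '' );
            undef $pending_id;
        }
        while ( my $line = <$fh> ) {
            chomp $line;
            next if $line eq '';
            if ( $line =~ /^>(\S*)/ ) {
                # A new header ends the current record; remember it for the next call.
                if ( defined $id ) { $pending_id = $1; last; }
                ( $id, $seq ) = ( $1, '' );
            }
            elsif ( defined $id ) {
                $seq .= $line;
            }
        }
        return defined $id ? { id => $id, seq => $seq } : undef;
    };
}

# Demo on an in-memory filehandle so the sketch runs without an input file.
my $fasta = ">seq1 first\nACGT\nACGT\n>seq2\nGGCC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory handle: $!";

my $next_seq = make_fasta_iterator($fh);
my @ids;
while ( my $entry = $next_seq->() ) {
    push @ids, "$entry->{id}:" . length( $entry->{seq} );
}
close $fh;
print join( ',', @ids ), "\n";    # seq1:8,seq2:4
```

Only one record is held in memory at a time, which is the point of iterating instead of slurping.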

$seqs = fasubseq($file, \@ids_with_locations, \%c)
$seqs = fasubseq($file, \@id_list, \%c)

    #ARRAY OF ARRAYS
    @ids_with_locations = (
        [ $id, $begin, $end, $strand ],
        ...
    );

Config options can be:

    %c = (
        keep_id => 1, # keeps the original id of the sequence
        wrap => 1, # see further down
        relaxed_range => 1, # substitute 0 or undef for $begin with '^' and for $end with '$'
    );

There are several possibilities for $begin and $end:

    GGCAAAGGA ATGATGGTGT GCAGGCTTGG CATGGGAGAC
    ^..........^                                (1,11) OR ('^', 11)
       ^.....................................^  (4,'$')
                          ^..............^      (21,35) { with wrap on: OR (-19,35) OR (-19, -5) }
                          ^..................^  (21,'$') { with wrap on: OR (-19,'$') }

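wrap: By default, negative values are limited to the sequence boundaries: a negative $begin is treated as 1 (or '^') and a negative $end is treated as '$'. With wrap on, negative values count backwards from the end of the sequence, as in the (-19, -5) example above.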
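
The coordinate rules above can be sketched as a small resolver. This is an assumption-laden illustration, not the module's code: the helper name resolve_range is hypothetical, the arithmetic for wrapped values is inferred from the (-19, -5) example on the 39-base sequence, and out-of-range values are deliberately left unhandled.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's code: maps one ($begin, $end) pair to
# 1-based inclusive coordinates on a sequence of length $len.
sub resolve_range {
    my ( $begin, $end, $len, $c ) = @_;
    $c ||= {};

    # relaxed_range: 0 or undef stands for the sequence start or end.
    if ( $c->{relaxed_range} ) {
        $begin = '^' if !$begin;
        $end   = '$' if !$end;
    }
    die "begin and end are required\n" unless defined $begin && defined $end;

    $begin = 1    if $begin eq '^';
    $end   = $len if $end eq '$';

    # Negative values: count from the end with wrap on (assumed: -1 is the
    # last base), otherwise clamp to the sequence boundaries.
    if ( $begin < 0 ) { $begin = $c->{wrap} ? $len + $begin + 1 : 1; }
    if ( $end < 0 )   { $end   = $c->{wrap} ? $len + $end + 1   : $len; }

    return ( $begin, $end );
}

# The example sequence from above, with the spacing removed.
my $seq = join '', qw(GGCAAAGGA ATGATGGTGT GCAGGCTTGG CATGGGAGAC);
my $len = length $seq;

my @wrapped = resolve_range( -19, -5, $len, { wrap => 1 } );
my @clamped = resolve_range( -19, -5, $len );
print "wrap on:  @wrapped\n";    # 21 35
print "wrap off: @clamped\n";    # 1 39
```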

$sref = fahash(@filenames)
%seqs = fahash(@filenames)

Does the same as faslurp, but returns a hash (or a hash reference in scalar context) with the sequence ids as keys and the sequence objects as values.

faspew($file, $seq1, $seq2, ...)

"Spews" out the given sequences to a file. Each $seqN argument can be a hash reference with FAlite2::Entry objects as values, an array reference of FAlite2::Entry objects, or a plain FAlite2::Entry object.

$iterator = faspew_iterate($filename)
$iterator = faspew_iterate($fh)

Creates an iterator that writes the sequences to the given $filename or $fh.

    for my $sequence_object (@sequences) {
        $iterator->($sequence_object);
    }
    #DO NOT FORGET THIS, THIS CALL WILL CLOSE THE FILEHANDLE
    $iterator->();

    #this is equal to:

    $iterator->(@sequences);
    $iterator->();
    #or
    $iterator->(\@sequences);
    $iterator->();


    #DO NOT DO THIS:

    $iterator->();

If you supply a filehandle $fh instead of a $filename, the filehandle will not be closed.
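
The calling convention above (write on every call with arguments, finish on an empty call, close only a handle the iterator opened itself) can be sketched as follows. The helper name make_fasta_writer and the plain hash records are assumptions for illustration; this is not how faspew_iterate is actually implemented.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's code: returns a closure that writes
# FASTA records and finishes when called without arguments.
sub make_fasta_writer {
    my ($target) = @_;

    my ( $fh, $owns_handle );
    if ( ref $target ) {    # caller supplied an open handle; caller closes it
        $fh = $target;
    }
    else {                  # we open the file ourselves, so we close it too
        open $fh, '>', $target or die "cannot write $target: $!";
        $owns_handle = 1;
    }

    return sub {
        if ( !@_ ) {        # an empty call signals "done"
            close $fh if $owns_handle;
            return;
        }
        # Accept plain records as well as array references of records.
        for my $entry ( map { ref($_) eq 'ARRAY' ? @$_ : $_ } @_ ) {
            print {$fh} ">$entry->{id}\n$entry->{seq}\n";
        }
        return;
    };
}

# Demo with an in-memory handle, so the handle is owned by the caller.
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";

my $write = make_fasta_writer($out);
$write->( { id => 'seq1', seq => 'ACGT' } );
$write->( [ { id => 'seq2', seq => 'GGCC' }, { id => 'seq3', seq => 'TTAA' } ] );
$write->();    # finish; $out stays open because a handle was passed in
close $out;    # closing it is the caller's job
print $buffer;
```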

ADVANCED

Change the output format:

    $Bio::Gonzales::Seq::IO::WIDTH = 60; #sequence width in fasta output

    #WIDTH only takes effect if SEQ_FORMAT is set to 'all_pretty' ('all' is default)
    $Bio::Gonzales::Seq::IO::SEQ_FORMAT = 'all_pretty';
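
To illustrate what a sequence width of 60 means for the output, the following sketch breaks a sequence string into lines of at most 60 characters. The helper name wrap_sequence is hypothetical, and that 'all_pretty' wraps lines in this way is an assumption; the module's own formatting code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a sequence into lines of at most $width characters.
sub wrap_sequence {
    my ( $seq, $width ) = @_;
    return join "\n", unpack( "(A$width)*", $seq );
}

my $wrapped = wrap_sequence( 'ACGT' x 40, 60 );    # 160 bases
my @lines   = split /\n/, $wrapped;
print scalar(@lines), " lines, last line has ", length( $lines[-1] ), " bases\n";
```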

SEE ALSO

AUTHOR

jw bargsten, <joachim.bargsten at wur.nl>