
NAME

Bio::Cigar - Parse CIGAR strings and translate coordinates to/from reference/query

SYNOPSIS

    use 5.014;
    use Bio::Cigar;
    my $cigar = Bio::Cigar->new("2M1D1M1I4M");
    say "Query length is ", $cigar->query_length;
    say "Reference length is ", $cigar->reference_length;
    
    my ($qpos, $op) = $cigar->rpos_to_qpos(3);
    say "Alignment operation at reference position 3 is $op";

DESCRIPTION

Bio::Cigar is a small library to parse CIGAR strings ("Compact Idiosyncratic Gapped Alignment Report"), such as those used in the SAM file format. CIGAR strings are a run-length encoding which minimally describes the alignment of a query sequence to an (often longer) reference sequence.

Parsing follows the SAM v1 spec for the CIGAR column.

Parsed strings are represented by an object that provides a few utility methods.

ATTRIBUTES

All attributes are read-only.

string

The CIGAR string for this object.

reference_length

The length of the reference sequence segment aligned with the query sequence described by the CIGAR string.

query_length

The length of the query sequence described by the CIGAR string.

ops

An arrayref of [length, operation] tuples describing the CIGAR string. Lengths are integers; the possible operations are listed in the table below.
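To illustrate the shape of this structure, here is a minimal, self-contained sketch that splits a CIGAR string into [length, operation] tuples using only core Perl. The helper name parse_cigar is hypothetical and is not part of Bio::Cigar; the module's own parser may validate input differently.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Cigar: split a CIGAR string
# into [length, operation] tuples.
sub parse_cigar {
    my ($string) = @_;
    my @ops;
    # Each run is an integer length followed by one operation character.
    # \G with /gc resumes where the previous match ended and keeps pos()
    # in place when a match fails.
    while ($string =~ /\G(\d+)([MIDNSHP=X])/gc) {
        push @ops, [ $1 + 0, $2 ];
    }
    # Anything left unconsumed means the string was malformed.
    die "Invalid CIGAR string: $string\n"
        unless @ops and pos($string) == length $string;
    return \@ops;
}

my $ops = parse_cigar("2M1D1M1I4M");
print join(" ", map { "$_->[0]$_->[1]" } @$ops), "\n";   # 2M 1D 1M 1I 4M
```

With the real module, the equivalent structure would come from the ops attribute rather than from a helper like this.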

CIGAR operations

The CIGAR operations are given in the following table, taken from the SAM v1 spec:

    Op  Description
    ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
    M   alignment match (can be a sequence match or mismatch)
    I   insertion to the reference
    D   deletion from the reference
    N   skipped region from the reference
    S   soft clipping (clipped sequences present in SEQ)
    H   hard clipping (clipped sequences NOT present in SEQ)
    P   padding (silent deletion from padded reference)
    =   sequence match
    X   sequence mismatch

    • H can only be present as the first and/or last operation.
    • S may only have H operations between them and the ends of the string.
    • For mRNA-to-genome alignment, an N operation represents an intron.
      For other types of alignments, the interpretation of N is not defined.
    • Sum of the lengths of the M/I/S/=/X operations shall equal the length of SEQ.
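The length rule above can be sketched in a few lines of core Perl. The query-consuming set (M/I/S/=/X) comes from the table notes; the reference-consuming set (M/D/N/=/X) follows the SAM spec and is not stated in this document. The helper name cigar_lengths is hypothetical, and whether Bio::Cigar's query_length counts soft-clipped bases the same way is an assumption.

```perl
use strict;
use warnings;

# Operations that advance along the query (SEQ) and along the reference.
# The reference set is taken from the SAM spec, not from Bio::Cigar itself.
my %consumes_query     = map { $_ => 1 } qw(M I S = X);
my %consumes_reference = map { $_ => 1 } qw(M D N = X);

# Hypothetical helper: returns (query length, reference length).
sub cigar_lengths {
    my ($cigar) = @_;
    my ($qlen, $rlen) = (0, 0);
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        $qlen += $1 if $consumes_query{$2};
        $rlen += $1 if $consumes_reference{$2};
    }
    return ($qlen, $rlen);
}

my ($qlen, $rlen) = cigar_lengths("2M1D1M1I4M");
print "query=$qlen reference=$rlen\n";   # query=8 reference=8
```

In this example the single deletion and the single insertion cancel out, so both lengths are 8; H and P runs contribute to neither total.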

CONSTRUCTOR

new

Takes a CIGAR string as the sole argument and returns a new Bio::Cigar object.

METHODS

rpos_to_qpos

Takes a reference position (origin 1, base-numbered) and returns the corresponding position (origin 1, base-numbered) on the query sequence. Indels affect how the numbering maps from reference to query.

In list context returns a tuple of [query position, operation at position]. Operation is a single-character string. See the table of CIGAR operations.

If the reference position does not map to the query sequence (as with a deletion, for example), returns undef in scalar context or [undef, operation] in list context.
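The mapping described above can be pictured as a walk over the runs, keeping a running count of bases consumed on each sequence. The following is a rough sketch of that idea, not the module's implementation; sketch_rpos_to_qpos is a hypothetical name, it always returns a list, and the sets of reference- and query-consuming operations follow the SAM spec.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rpos_to_qpos, NOT the module's code.
# Returns (query position or undef, operation) for a 1-based reference
# position, or an empty list if the position is outside the alignment.
sub sketch_rpos_to_qpos {
    my ($cigar, $target) = @_;
    return if $target < 1;
    my ($rpos, $qpos) = (0, 0);   # bases consumed so far on each sequence
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        my ($len, $op) = ($1, $2);
        my $on_ref   = $op =~ /[MDN=X]/;   # run advances the reference
        my $on_query = $op =~ /[MIS=X]/;   # run advances the query
        if ($on_ref and $target <= $rpos + $len) {
            # The target lies inside this run; it has a query position
            # only if the run also advances the query.
            return ($on_query ? $qpos + ($target - $rpos) : undef, $op);
        }
        $rpos += $len if $on_ref;
        $qpos += $len if $on_query;
    }
    return;
}

my ($qpos, $op) = sketch_rpos_to_qpos("2M1D1M1I4M", 3);
printf "rpos 3: qpos=%s op=%s\n", $qpos // 'undef', $op;   # rpos 3: qpos=undef op=D
```

Reference position 3 falls in the 1D run, so it has no query position. Position 4 falls in the following 1M run and maps to query position 3.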

qpos_to_rpos

Takes a query position (origin 1, base-numbered) and returns the corresponding position (origin 1, base-numbered) on the reference sequence. Indels affect how the numbering maps from query to reference.

In list context returns a tuple of [reference position, operation at position]. Operation is a single-character string. See the table of CIGAR operations.

If the query position does not map to the reference sequence (as with an insertion, for example), returns undef in scalar context or [undef, operation] in list context.
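The reverse direction is the mirror image of the same walk: look for the run that contains the query position, and report a reference position only when that run also advances the reference. Again, this is a sketch under assumptions rather than the module's code; sketch_qpos_to_rpos is a hypothetical name, and treating soft-clipped query bases as unmapped is an assumption about behavior the text above does not specify.

```perl
use strict;
use warnings;

# Hypothetical stand-in for qpos_to_rpos, NOT the module's code.
# Returns (reference position or undef, operation) for a 1-based query
# position, or an empty list if the position is outside the query.
sub sketch_qpos_to_rpos {
    my ($cigar, $target) = @_;
    return if $target < 1;
    my ($qpos, $rpos) = (0, 0);   # bases consumed so far on each sequence
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        my ($len, $op) = ($1, $2);
        my $on_query = $op =~ /[MIS=X]/;   # run advances the query
        my $on_ref   = $op =~ /[MDN=X]/;   # run advances the reference
        if ($on_query and $target <= $qpos + $len) {
            # Insertions (and, by assumption, soft clips) have no
            # reference position.
            return ($on_ref ? $rpos + ($target - $qpos) : undef, $op);
        }
        $qpos += $len if $on_query;
        $rpos += $len if $on_ref;
    }
    return;
}

my ($rpos, $op) = sketch_qpos_to_rpos("2M1D1M1I4M", 4);
printf "qpos 4: rpos=%s op=%s\n", $rpos // 'undef', $op;   # qpos 4: rpos=undef op=I
```

Query position 4 is the inserted base, so it has no reference position. Query position 3, just before it, maps to reference position 4 because the deletion shifts the numbering by one.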

op_at_rpos

Takes a reference position and returns the operation at that position. Simply a shortcut for calling "rpos_to_qpos" in list context and discarding the first return value.

op_at_qpos

Takes a query position and returns the operation at that position. Simply a shortcut for calling "qpos_to_rpos" in list context and discarding the first return value.

AUTHOR

Thomas Sibley <trsibley@uw.edu>

COPYRIGHT

Copyright 2014- Mullins Lab, Department of Microbiology, University of Washington.

LICENSE

This library is free software; you can redistribute it and/or modify it under the GNU General Public License, version 2.

SEE ALSO

SAMv1 spec