Text::Levenshtein::Edlib - XS edit distance and optimal alignment path calculation
use feature 'say'; use Text::Levenshtein::Edlib qw/:all/; say distance 'kitten', 'sitting'; # prints '3' say 'Distance > 2!' if !defined distance 'kitten', 'sitting', 2; # prints 'Distance > 2!' my $align = align('kitten', 'sitting'); say "Edit distance is: $align->{editDistance}"; say "Alphabet length is: $align->{alphabetLength}"; say "Start locations are: @{$align->{startLocations}}"; say "End locations are: @{$align->{endLocations}}"; say "Alignment path is: @{$align->{alignment}}"; say "Alignment path (in CIGAR format): ", to_cigar $align->{alignment}; say "Alignment path (in extended CIGAR format): ", to_cigar $align->{alignment}, EDLIB_CIGAR_EXTENDED;
Text::Levenshtein::Edlib is a wrapper around the edlib library that computes Levenshtein edit distance and optimal alignment path for a pair of strings.
It does not handle UTF-8 strings, for those Text::Levenshtein::XS can compute edit distance but not alignment path.
This module has two functions:
This is the basic interface to the library. It is compatible with the function of the same name in Text::Levenshtein::XS.
It returns the edit distance between the two given strings. If the third argument is specified, and the edit distance is greater than the value of the third argument, then the function finishes the computation early and returns undef. See below for the meaning of the optional $mode argument.
This is the full-featured interface to the library.
It returns a hashref with the following keys:
editDistance
The edit distance of the two strings.
alphabetLength
The number of different characters in the query and target together.
endLocations
Array of zero-based positions in target where optimal alignment paths end. If gap after query is penalized, gap counts as part of query (NW), otherwise not.
startLocations
Array of zero-based positions in target where optimal alignment paths start, they correspond to endLocations. If gap before query is penalized, gap counts as part of query (NW), otherwise not.
alignment
Alignment is found for first pair of start and end locations. Alignment is sequence of numbers: 0, 1, 2, 3. 0 stands for match. 1 stands for insertion to target. 2 stands for insertion to query. 3 stands for mismatch. You can use the EDLIB_EDOP_* constants instead of 0, 1, 2, and 3. Alignment aligns query to target from begining of query till end of query. If gaps are not penalized, they are not in alignment.
EDLIB_EDOP_*
The third argument, $max_distance, works similarly to the third argument of distance: if the distance is more than its value, this function returns an empty hashref. Default value is -1, which disables this optimization.
The fourth argument, $mode, chooses how Edlib should treat gaps before and after query. The options are:
EDLIB_MODE_NW
Global method - gaps are not ignored. This is the standard Levenshtein distance, and is the default if $mode is not specified.
EDLIB_MODE_SHW
Prefix method - gaps at query end are ignored. So the edit distance between AACT and AACTGGC is 0, because we can ignore the GGC at the end.
AACT
AACTGGC
GGC
EDLIB_MODE_HW
Infix method - gaps at both query start and end are ignored. So the edit distance between ACT and CGACTGAC is 0, because CG at the beginning and GAC at the end of the target are ignored.
ACT
CGACTGAC
CG
GAC
The fifth argument, $task, chooses what we want to compute. The options are:
EDLIB_TASK_PATH
All the keys described above will be computed.
EDLIB_TASK_LOC
All keys except for alignment will be computed.
EDLIB_TASK_DISTANCE
All keys except for alignment and startLocations will be computed.
The less the function computes, the faster it runs.
All constants by default. You can export the functions align, distance and to_cigar and any of the constants below. You can use the tags :constants to export every constant, and :all to export every constant, align, distance and to_cigar.
align
distance
to_cigar
:constants
:all
EDLIB_CIGAR_EXTENDED EDLIB_CIGAR_STANDARD EDLIB_EDOP_DELETE EDLIB_EDOP_INSERT EDLIB_EDOP_MATCH EDLIB_EDOP_MISMATCH EDLIB_MODE_HW EDLIB_MODE_NW EDLIB_MODE_SHW EDLIB_STATUS_ERROR EDLIB_STATUS_OK EDLIB_TASK_DISTANCE EDLIB_TASK_LOC EDLIB_TASK_PATH
https://github.com/Martinsos/edlib/, http://martinsosic.com/edlib/edlib_8h.html
Marius Gavrilescu, <marius@ieval.ro>
Copyright (C) 2017 by Marius Gavrilescu
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.22.3 or, at your option, any later version of Perl 5 you may have available.
To install Text::Levenshtein::Edlib, copy and paste the appropriate command in to your terminal.
cpanm
cpanm Text::Levenshtein::Edlib
CPAN shell
perl -MCPAN -e shell install Text::Levenshtein::Edlib
For more information on module installation, please visit the detailed CPAN module installation guide.