PDL::EditDistance - Wagner-Fischer edit distance and alignment for PDLs.
use PDL; use PDL::EditDistance; ##-- input PDLs $a = pdl([map { ord($_) } qw(G U M B O)]); $b = pdl([map { ord($_) } qw(G A M B O L)]); $a1 = pdl([0, map { ord($_) } qw(G U M B O)]); $b1 = pdl([0, map { ord($_) } qw(G A M B O L)]); ##------------------------------------------------------------- ## Levenshtein distance $dist = edit_distance_static($a,$b, 0,1,1,1); ($dist,$align) = edit_align_static($a,$b, 0,1,1,1); ##------------------------------------------------------------- ## Wagner-Fischer distance @costs = ($costMatch=0,$costInsert=1,$costDelete=1,$costSubstitute=2); $dist = edit_distance_static($a,$b, @costs); ($dist,$align) = edit_align_static($a,$b, @costs); ##------------------------------------------------------------- ## General edit distance $costsMatch = random($a->nelem+1, $b->nelem+1); $costsIns = random($a->nelem+1, $b->nelem+1); $costsDel = random($a->nelem+1, $b->nelem+1); $costsSubst = random($a->nelem+1, $b->nelem+1); @costs = ($costsMatch,$costsIns,$costDel,$costsSubst); $dist = edit_distance_full($a,$b,@costs); ($dist,$align) = edit_align_full($a,$b,@costs); ##------------------------------------------------------------- ## Alignment $op_match = align_op_match(); ##-- constant $op_del = align_op_insert1(); ##-- constant $op_ins = align_op_insert2(); ##-- constant $op_subst = align_op_substitute(); ##-- constant ($apath,$bpath,$pathlen) = edit_bestpath($align); ($ai,$bi,$ops,$pathlen) = edit_pathtrace($align); ##------------------------------------------------------------- ## Longest Common Subsequence $lcs = edit_lcs($a,$b); ($ai,$bi,$lcslen) = lcs_backtrace($a,$b,$lcs);
Signature: (a(N); [o]apdl(N+1))
Convenience method. Returns a pdl $apdl() suitable for representing $a(), which can be specified as a UTF-8 or byte-string, as an arrays of numbers, or as a PDL. $apdl(0) is always set to zero.
Signature: (PDL::Type type; int N; int M; [o]costsMatch(N+1,M+1); [o]costsIns(N+1,M+1); [o]costsDel(N+1,M+1); [o]costsSubst(N+1,M+1))
Convenience method. Ensures existence and proper dimensionality of cost matrices for inputs of length N and M.
Signature: (PDL::Type type; int N1; int M1; [o]costsMatch(N1,M1); [o]costsIns(N1,M1); [o]costsDel(N1,M1); [o]costsSubst(N1,M1))
Low-level method. Ensures existence and proper dimensionality of cost matrices for inputs of length N1-1 and M1-1.
Signature: (PDL::Type type; int N; int M; staticCostMatch(); staticCostIns(); staticCostSubst(); [o]costsMatch(N+1,M+1); [o]costsIns(N+1,M+1); [o]costsDel(N+1,M+1); [o]costsSubst(N+1,M+1))
Convenience method.
Signature: (a(N); b(M); costsMatch(N+1,M+1); costsIns(N+1,M+1); costsDel(N+1,M+1); costsSubst(N+1,M+1); [o]dist(N+1,M+1); [o]align(N+1,M+1))
Convenience method. Compute the edit distance matrix for inputs $a() and $b(), and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst(). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings.
Signature: (a1(N1); b1(M1); costsMatch(N1,M1); costsIns(N1,M1); costsDel(N1,M1); costsSubst(N1,M1); [o]dist(N1,M1))
Low-level method. Compute the edit distance matrix for input PDLs $a1() and $b1() and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst().
The first elements of $a1() and $b1() are ignored.
_edit_distance_full does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(N); b(M); costsMatch(N+1,M+1); costsIns(N+1,M+1); costsDel(N+1,N+1); costsSubst(N+1,M+1); [o]dist(N+1,M+1); [o]align(N+1,M+1))
Convenience method. Compute the edit distance and alignment matrices for inputs $a() and $b(), and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst(). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings.
Signature: (a1(N1); b1(M1); costsMatch(N1,M1); costsIns(N1,M1); costsDel(N1,M1); costsSubst(N1,M1); [o]dist(N1,M1); byte [o]align(N1,M1))
Low-level method. Compute the edit distance and alignment matrix for input PDLs $a1() and $b1() and cost matrices $costsMatch(), $costsIns(), $costsDel(), and $costsSubst().
_edit_align_full does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(N); b(M); staticCostMatch(); staticCostIns(); staticCostDel(); staticCostSubst(); [o]dist(N+1,M+1))
Convenience method. Compute the edit distance matrix for inputs $a() and $b() given a static cost schema @costs = ($staticCostMatch(), $staticCostIns(), $staticCostDel(), and $staticCostSubst()). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings. Functionally equivalent to edit_distance_full($matches,@costs,$dist), but slightly faster.
Signature: (a1(N1); b1(M1); costMatch(); costIns(); costDel(); costSubst(); [o]dist(N1,M1))
Low-level method. Compute the edit distance matrix for input PDLs $a1() and $b1() given a static cost schema @costs = ($costMatch(), $costIns(), $costDel(), $costSubst()). Functionally identitical to _edit_distance_matrix_full($matches,@costs,$dist), but slightly faster.
_edit_distance_static does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(N); b(M); staticCostMatch(); staticCostIns(); staticCostDel(); staticCostSubst(); [o]dist(N+1,M+1); [o]align(N+1,M+1))
Convenience method. Compute the edit distance and alignment matrices for inputs $a() and $b() given a static cost schema @costs = ($staticCostMatch(), $staticCostIns(), $staticCostDel(), and $staticCostSubst()). $a() and $b() may be specified as PDLs, arrays of numbers, or as strings. Functionally equivalent to edit_align_full($matches,@costs,$dist), but slightly faster.
Signature: (a1(N1); b1(M1); costMatch(); costIns(); costDel(); costSubst(); [o]dist(N1,M1); byte [o]align(N1,M1))
Low-level method. Compute the edit distance and alignment matrices for input PDLs $a1() and $b1() given a static cost schema @costs = ($costMatch(), $costIns(), $costDel(), $costSubst()). Functionally identitical to _edit_distance_matrix_full($matches,@costs,$dist), but slightly faster.
_edit_align_static does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: ([o]a())
Alignment matrix value constant for insertion operations on $a() string.
align_op_insert1 does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
align_op_insert2 does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Alignment matrix value constant for matches.
align_op_match does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Alignment matrix value constant for substitution operations.
align_op_substitute does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Alias for align_op_insert1()
Alias for align_op_insert2()
Signature: ([o]ops(4))
Alignment matrix value constants 4-element pdl (match,insert1,insert2,substitute).a
Signature: (align(N+1,M+1); [o]apath(N+M+2); [o]bpath(N+M+2); [o]pathlen())
Convenience method. Compute best path through alignment matrix $align(). Stores paths for original input strings $a() and $b() in $apath() and $bpath() respectively. Negative values in $apath() and $bpath() indicate insertion/deletion operations. On completion, $pathlen() holds the actual length of the paths.
Signature: (align(N1,M1); int [o]apath(L); int [o]bpath(L); int [o]len(); int ifinal; int jfinal)
Low-level method. Compute best path through alignment matrix $align() from final index ($ifinal,$jfinal). Stores paths for (original) input strings $a() and $b() in $apath() and $bpath() respectively. Negative values in $apath() and $bpath() indicate insertion/deletion operations. On completion, $pathlen() holds the actual length of the paths.
_edit_bestpath does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: ( align(N+1,M+1); [o]ai(L); [o]bi(L); [o]ops(L); [o]$pathlen() )
Convenience method. Compute alignment path backtrace through alignment matrix $align() from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively. Unlike edit_bestpath(), null-moves for $ai() and $bi() are not stored here as negative values. Returned pdls ($ai,$bi,$ops) are trimmed to the appropriate path length.
Signature: (align(N1,M1); int [o]ai(L); int [o]bi(L); int [o]ops(L); int [o]len(); int ifinal; int jfinal)
Low-level method. Compute alignment path backtrace through alignment matrix $align() from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively. Unlike edit_bestpath(), null-moves for $ai() and $bi() are not stored here as negative values. Returned pdls ($ai,$bi,$ops) are trimmed to the appropriate path length.
_edit_pathtrace does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(N); b(M); int [o]lcs(N+1,M+1);)
Convenience method. Compute the longest common subsequence (LCS) matrix for input PDLs $a1() and $b1(). The output matrix $lcs() contains at cell ($i+1,$j+1) the length of the LCS between $a1(0..$i) and $b1(0..$j); thus $lcs($N,$M) contains the length of the LCS between $a() and $b().
Signature: (a1(N1); b1(M1); int [o]lcs(N1,M1))
Low-level method. Compute the longest common subsequence (LCS) matrix for input PDLs $a1() and $b1(). The initial (zeroth) elements of $a1() and $b1() are ignored. The output matrix $lcs() contains at cell ($i,$j) the length of the LCS between $a1(1..$i) and $b1(1..$j); thus $lcs($N1-1,$M1-1) contains the length of the LCS between $a1() and $b1().
_edit_lcs does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(N); b(M); int lcs(N+1,M+1); int ifinal(); int jfinal(); int [o]ai(L); int [o]bi(L); int [o]len())
Convenience method. Compute longest-common-subsequence backtrace through LCS matrix $lcs() for original input strings ($a(),$b()) from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively.
Signature: (a1(N1); b1(M1); int lcs(N1,M1); int ifinal(); int jfinal(); [o]ai(L); [o]bi(L); int [o]len())
Low-level method. Compute longest-common-subsequence backtrace through LCS matrix $lcs() for initial-padded strings ($a1(),$b1()) from final index ($ifinal,$jfinal). Stores raw paths for (original) input strings $a() and $b() in $ai() and $bi() respectively.
_lcs_backtrace does not process bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Perl by Larry Wall.
PDL by Karl Glazebrook, Tuomas J. Lukka, Christian Soeller, and others.
Probably many.
Bryan Jurish <moocow@cpan.org>
Copyright (C) 2006-2015, Bryan Jurish. All rights reserved.
This package is free software, and entirely without warranty. You may redistribute it and/or modify it under the same terms as Perl itself, either Perl 5.20.2, or at your option any later version of Perl 5.
perl(1), PDL(3perl).
To install PDL::EditDistance, copy and paste the appropriate command in to your terminal.
cpanm
cpanm PDL::EditDistance
CPAN shell
perl -MCPAN -e shell install PDL::EditDistance
For more information on module installation, please visit the detailed CPAN module installation guide.