NAME
String::Fuzzy - Python-style fuzzy string matching (fuzzywuzzy port)
SYNOPSIS
# Basic ratio with normalization (default)
my
$score
= ratio(
"Hello"
,
"hello"
);
# 100 (normalized)
# Disable normalization for case-sensitive matching
my
$raw_score
= ratio(
"Hello"
,
"hello"
,
normalize
=> 0 );
# ~80
# Find best match with index
my
$best
= extract_best(
"cat"
, [
"cat"
,
"category"
,
"dog"
],
scorer
=> \
&partial_ratio
);
"Best: $best->[0], Score: $best->[1], Index: $best->[2]\n"
;
# Get all matches sorted by score
my
$all
= extract_all(
"cat"
, [
"cat"
,
"category"
,
"dog"
] );
for
(
@$all
) {
"Match: $_->[0], Score: $_->[1]\n"
; }
# Practical example: Find the best vendor match with a typo
my
@vendors
=
qw( SendGrid Mailgun SparkPost Postmark )
;
my
$input
=
"SpakPost Invoice"
;
my
$best_score
= 0;
my
$best_vendor
;
for
my
$vendor
(
@vendors
) {
my
$score
= fuzzy_substring_ratio(
$vendor
,
$input
);
if
(
$score
>
$best_score
) {
$best_score
=
$score
;
$best_vendor
=
$vendor
;
}
}
if
(
$best_score
>= 85 ) {
"Matched '$best_vendor' with score $best_score\n"
;
# SparkPost, 88.89
}
VERSION
v0.1.1
DESCRIPTION
This module provides fuzzy string matching similar to Python's fuzzywuzzy library, faithfully replicating its core functionality and behavior in a Perl context. It supports multiple strategies for comparing strings with typos, extra words, or inconsistent formatting. By default, strings are normalized (lowercased, diacritics removed, punctuation stripped), but this can be disabled with the normalize
option.
FUNCTIONS
All functions accept an optional normalize
parameter (default: 1) to toggle string normalization.
ratio($a, $b, %opts)
Computes Levenshtein similarity between two strings, returning a score from 0 to 100. Returns a float for precision.
partial_ratio($a, $b, %opts)
Slides the shorter string over the longer one to find the best fixed-length match.
Returns 100 if the shorter string is fully contained in the longer one.
fuzzy_substring_ratio($needle, $haystack, %opts)
Searches for the best fuzzy match of $needle
in $haystack
across variable-length windows. Useful for OCR noise or embedded typos.
token_sort_ratio($a, $b, %opts)
Ignores word order by sorting tokens before comparison.
token_set_ratio($a, $b, %opts)
Focuses on common word tokens, ignoring duplicates and order.
extract_best($query, \@choices, %opts)
Returns the best match as [$string, $score, $index]
. Accepts scorer
(default: \&ratio
) and limit
(default: 1) for top-N results.
extract_all($query, \@choices, %opts)
Returns all matches as [[string, score], ...]
, sorted by score descending.
Accepts scorer
(default: \&ratio
).
AUTHOR
Albert (ChatGPT) from OpenAI, with enhancements by Grok 3 from xAI.
Supported by Jacques Deguest <jack@deguest.jp>.
SEE ALSO
Text::Approx, Text::Levenshtein::XS, Text::Fuzzy, String::Approx, Text::Levenshtein::Damerau
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.