Text::OverlapFinder - Find Overlapping Words in Strings


    # this will list out the overlaps found in two strings
    # note that the overlaps are found among space separated
    # tokens, there are no partial word matches
    # ('cat' will not match 'at' or 'cats', for example)

    use Text::OverlapFinder;
    my $finder = Text::OverlapFinder->new;
    defined $finder or die "Construction of Text::OverlapFinder failed";

    my $string1 = 'aaa bbb ccc ddd eee';
    my $string2 = 'aa bbb ccc dd ee aaa';

    # $overlaps is a reference to a hash that maps each overlap found
    # to the number of times it occurs
    # $len1 and $len2 are the lengths of the strings in terms of words

    my ($overlaps, $len1, $len2) = $finder->getOverlaps($string1, $string2);
    foreach my $overlap (keys %$overlaps) {
        print "$overlap occurred $overlaps->{$overlap} times.\n";
    }
    print "length of string 1 = $len1 length of string 2 = $len2\n";


This module finds word overlaps between two strings. It finds the longest possible overlaps and keeps track of how many times each overlap occurs.
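
To make the idea of "longest possible overlap" concrete, here is a minimal pure-Perl sketch of one greedy way to compute such overlaps. It is an illustration only and is not taken from the module's source: the function name greedy_overlaps is hypothetical, and the masking strategy (matched words are blanked out so they cannot be reused) is an assumption about how overlaps might be kept from being counted twice.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Text::OverlapFinder. It repeatedly
# finds the longest run of consecutive words shared by both strings,
# counts it, masks the matched words, and stops when nothing is shared.
sub greedy_overlaps {
    my ($s1, $s2) = @_;
    my @w1 = split ' ', $s1;    # whole-word tokens, split on whitespace
    my @w2 = split ' ', $s2;
    my ($len1, $len2) = (scalar @w1, scalar @w2);
    my %overlaps;

    while (1) {
        my ($best, $bi, $bj) = (0, 0, 0);
        # @prev and @cur hold the lengths of the common run ending at
        # each pair of positions (longest-common-substring table,
        # computed over words instead of characters).
        my @prev = (0) x ($len2 + 1);
        for my $i (1 .. $len1) {
            my @cur = (0) x ($len2 + 1);
            for my $j (1 .. $len2) {
                # Masked (undef) words never match, so they break runs.
                next unless defined $w1[$i - 1] && defined $w2[$j - 1];
                next unless $w1[$i - 1] eq $w2[$j - 1];
                $cur[$j] = $prev[$j - 1] + 1;
                ($best, $bi, $bj) = ($cur[$j], $i, $j) if $cur[$j] > $best;
            }
            @prev = @cur;
        }
        last unless $best;    # no shared words remain

        my $phrase = join ' ', @w1[$bi - $best .. $bi - 1];
        $overlaps{$phrase}++;

        # Mask the matched words in both strings (assumed strategy).
        @w1[$bi - $best .. $bi - 1] = (undef) x $best;
        @w2[$bj - $best .. $bj - 1] = (undef) x $best;
    }
    return (\%overlaps, $len1, $len2);
}

my ($o, $n1, $n2) = greedy_overlaps('aaa bbb ccc ddd eee',
                                    'aa bbb ccc dd ee aaa');
print "$_ occurred $o->{$_} times.\n" for sort keys %$o;
print "length of string 1 = $n1 length of string 2 = $n2\n";
```

For the two sample strings, this sketch reports 'bbb ccc' once and 'aaa' once, with lengths 5 and 6. Because only whole tokens are compared, 'aa', 'dd' and 'ee' do not match 'aaa', 'ddd' and 'eee'.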

There is a mechanism available for a user to provide a stemming module, but no stemmer is provided by this package as yet.


 Ted Pedersen, University of Minnesota, Duluth
 tpederse at

 Siddharth Patwardhan, University of Utah
 sidd at

 Satanjeev Banerjee, Carnegie Mellon University
 banerjee at

 Jason Michelizzi 

 Ying Liu, University of Minnesota, Twin Cities
 liux0395 at

Last modified by: $Id:,v 1.4 2015/10/08 13:06:27 tpederse Exp $


Copyright (C) 2004-2010 by Jason Michelizzi, Ted Pedersen, Siddharth Patwardhan, Satanjeev Banerjee and Ying Liu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA