NAME

Image::Similar - find out how similar two images are

SYNOPSIS

    use Image::Similar 'load_image';
    use Imager;
    use FindBin '$Bin';
    my $x = Imager->new ();
    # Get image data from file
    $x->read (file => "$Bin/x.png");
    # Load image into Image::Similar
    my $xi = load_image ($x);
    my $y = Imager->new ();
    # Get image data from file
    $y->read (file => "$Bin/y.jpg");
    # Load image into Image::Similar
    my $yi = load_image ($y);
    print "The difference is ", $xi->diff ($yi), ".\n";

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents Image::Similar version 0.07 corresponding to git commit b26adc57d72672372aa48cd62a27696160b74ba1 released on Sat Jul 15 15:57:13 2017 +0900.

DESCRIPTION

This is an experimental module for comparing images. It uses a simplified form of the algorithm described in "An image signature for any kind of image" to calculate image signatures and distances between images.

The algorithm consists of converting the image into greyscale, chopping it into a grid, and then computing a signature based on relative lightness and darkness of the blocks of the grid.
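
As a rough illustration of the steps just described, the following pure-Perl sketch averages a greyscale image over a grid of blocks and records, for each block, whether its right-hand neighbour is much darker, darker, about the same, lighter, or much lighter. It is not the module's implementation: the names block_means, level and toy_signature, the 2 by 2 grid, the use of a single neighbour, and the thresholds are all assumptions made for the example.

```perl
use strict;
use warnings;

# Average the grey values of $pixels (an array reference of rows of
# 0-255 values) over an $n by $n grid of blocks. Hypothetical helper.
sub block_means
{
    my ($pixels, $n) = @_;
    my $height = scalar @$pixels;
    my $width = scalar @{$pixels->[0]};
    my @means;
    for my $by (0..$n - 1) {
        my $top = int ($by * $height / $n);
        my $bottom = int (($by + 1) * $height / $n) - 1;
        for my $bx (0..$n - 1) {
            my $left = int ($bx * $width / $n);
            my $right = int (($bx + 1) * $width / $n) - 1;
            my ($sum, $count) = (0, 0);
            for my $y ($top..$bottom) {
                for my $x ($left..$right) {
                    $sum += $pixels->[$y][$x];
                    $count++;
                }
            }
            $means[$by][$bx] = $count ? $sum / $count : 0;
        }
    }
    return \@means;
}

# Quantize a difference in mean grey level to a digit from 0 to 4.
# The thresholds 2 and 50 are arbitrary assumptions.
sub level
{
    my ($d) = @_;
    return 0 if $d < -50;
    return 1 if $d < -2;
    return 2 if $d <= 2;
    return 3 if $d <= 50;
    return 4;
}

# Compare each block with its right-hand neighbour and join the
# resulting digits into a string.
sub toy_signature
{
    my ($pixels, $n) = @_;
    my $means = block_means ($pixels, $n);
    my $sig = '';
    for my $by (0..$n - 1) {
        for my $bx (0..$n - 2) {
            $sig .= level ($means->[$by][$bx + 1] - $means->[$by][$bx]);
        }
    }
    return $sig;
}

# A 4 by 4 image with strong contrast in the top half and weak
# contrast in the bottom half.
my @pixels = (
    [0, 0, 200, 200],
    [0, 0, 200, 200],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
);
print toy_signature (\@pixels, 2), "\n";
```

The point of the sketch is that the signature depends only on relative lightness between blocks, so a uniformly brightened copy of the image would give the same string.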

The module does not contain its own image-reading facility, so images must be loaded into the module via one of the following supported Perl modules:

Imager (recommended)

All image types are supported. If you have no preference, I suggest using Imager, since it is a very well-behaved module. The conversion to greyscale is done using Imager's own routines.

GD

All image types are supported. RGB images are combined to greyscale using constants taken from the source code of Imager.

Image::Imlib2

All image types are supported. RGB images are combined to greyscale using constants taken from the source code of Imager.

Image::PNG::Libpng

This module is used for some internals of Image::Similar related to testing, so it was installed along with Image::Similar. However, Image::PNG::Libpng only handles PNG images.

All PNG image types are supported, but currently only at a bit depth of eight.

RGB images are combined to greyscale using constants taken from the source code of Imager. As of this version, there is no handling of the alpha channel (transparent pixels) and the background value is ignored.

Use "load_image" to load the image.
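
The greyscale conversion mentioned above for GD, Image::Imlib2 and Image::PNG::Libpng amounts to a weighted sum of the red, green and blue values. The sketch below shows the general idea. The weights used here (0.222, 0.707 and 0.071) are an assumption about what those constants are and should be checked against the Imager source; rgb_to_grey is a hypothetical helper, not a function of Image::Similar.

```perl
use strict;
use warnings;

# Assumed weights for red, green and blue. Check the Imager source
# before relying on these values.
my %weight = (red => 0.222, green => 0.707, blue => 0.071);

# Combine red, green and blue values in the range 0-255 into a single
# grey value in the range 0-255, rounding to the nearest integer.
sub rgb_to_grey
{
    my ($red, $green, $blue) = @_;
    my $grey = $weight{red} * $red
        + $weight{green} * $green
        + $weight{blue} * $blue;
    return int ($grey + 0.5);
}

print rgb_to_grey (255, 255, 255), "\n";
print rgb_to_grey (255, 0, 0), "\n";
```

Because the weights sum to one, white stays white and black stays black; green contributes most to the perceived lightness.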

FUNCTIONS

load_image

This loads image data from various modules into an Image::Similar object. The return value is the Image::Similar object.

Using Imager:

    use Image::Similar 'load_image';
    use Imager;
    my $img = Imager->new ();
    $img->read (file => 'my.jpg');
    my $is = load_image ($img);
    

Using Image::PNG::Libpng:

    use Image::Similar 'load_image';
    use Image::PNG::Libpng ':all';
    my $img = read_png_file ('my.png');
    my $is = load_image ($img);
    

Using GD:

    use Image::Similar 'load_image';
    use GD;
    my $gd = GD::Image->newFromPng ("t/images/chess/chess-100.png");
    my $is = load_image ($gd);

Using Image::Imlib2:

    use Image::Similar 'load_image';
    use Image::Imlib2;
    my $imlib2 = Image::Imlib2->load ("t/images/chess/chess-100.png");
    my $is = load_image ($imlib2);

METHODS

new

    my $is = Image::Similar->new (height => 10, width => 10);

Unless you want to change internals, use "load_image" instead of this.

The returned object currently contains a field, $is->{image}, on which you need to call the "set_pixel" method to set the pixels.

diff

    my $diff = $is1->diff ($is2);

This returns a floating-point number which is the difference between images $is1 and $is2. This is meant to be approximately the same value as given by "vector_euclidean_length()" in Image::Libpuzzle, but no validation has been carried out. Both $is1 and $is2 are Image::Similar objects created using "load_image".
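
To give a feel for the kind of number "diff" returns, here is a sketch of a normalized Euclidean distance between two signature-like strings of digits. This is an assumption made for illustration, not a statement of how Image::Similar or Image::Libpuzzle computes its value; sig_to_vector, vector_length, toy_distance and the sample strings are all hypothetical.

```perl
use strict;
use warnings;

# Turn a string of digits 0-4 into a list of values from -2 to 2.
sub sig_to_vector
{
    my ($sig) = @_;
    return map { $_ - 2 } split //, $sig;
}

# Euclidean length of a list of numbers.
sub vector_length
{
    my $sum = 0;
    $sum += $_ ** 2 for @_;
    return sqrt ($sum);
}

# Distance between two equal-length digit strings, scaled so that it
# lies between 0 (identical) and 1.
sub toy_distance
{
    my ($sig1, $sig2) = @_;
    if (length ($sig1) != length ($sig2)) {
        die "Signatures differ in length";
    }
    my @v1 = sig_to_vector ($sig1);
    my @v2 = sig_to_vector ($sig2);
    my @delta = map { $v1[$_] - $v2[$_] } 0..$#v1;
    my $scale = vector_length (@v1) + vector_length (@v2);
    return 0 if $scale == 0;
    return vector_length (@delta) / $scale;
}

printf "%.3f\n", toy_distance ('1333', '1333');
printf "%.3f\n", toy_distance ('1333', '1331');
```

Dividing by the sum of the two vector lengths keeps the result independent of signature length, which is what makes a fixed threshold usable across images.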

signature

    my $sig = $is->signature ();

Get the signature of the image. This is a text string consisting of digits 0-4 which identifies the image. The following example demonstrates getting the signature of two similar images.

    use FindBin '$Bin';
    use Image::Similar 'load_image';
    use Imager;
    for my $n (100, 1000) {
        my $image = "$Bin/../t/images/lenagercke/lena-$n.png";
        my $imager = Imager->new ();
        $imager->read (file => $image);
        my $is = load_image ($imager);
        print $is->signature (), "\n";
    }
    

(This example is included as show-hash.pl in the distribution.)

Its output looks like this:

    1333333311333333331333312311231111311311312233112213311123331311213111112111111323111133233213332131131111212223112213313323333313211133332323333222233133223313132131331211312223112212313323331311213131112111111123111133233333132131331111222323112111213323333313211312332331333121132133233314132130331211132311112111133123133113233333212111311123131134234334032030231312232312312131131121111133233311112111113123123134234314022030221211232311232131111121111133231333322333333121133134234323032020112221132333212312213123311211112313423313213331
    1333333311333333331333211311331111311311312233112223321122332311213121112111112323111133233323332131131111212222112212312323333313211223332333333121133133232313132231231211322223212212313223231311223131112112111123111133233333132131331111222323112111213323333112211311332331323121132133233313132130331211132311112111133123133133233333312111312123131134234334132030231312232312312131131121111133233311112111113123113134234314022030221211232311232131111121111133231323332333333121133134234323032020112221132333212312213123311211112313423313213331

sig_diff

    my $diff = $is->sig_diff ($sig);

Get the difference between $sig and the image represented by $is.

load_signature

    my $is = load_signature ($sig);

Create $is, an Image::Similar object, from the signature $sig.

TESTING AND INTERNAL METHODS

This section lists the testing and internal methods of the module, for people interested in extending or otherwise improving it. Since these are internal, private methods, they are subject to change without notice.

write_png

    $is->write_png ('test.png');

This is used in conjunction with "png_compare" in Image::PNG::Libpng (version 0.42 or later) to check that Image::Similar has correctly read in the image, by writing out Image::Similar's internal data as a PNG file.

load_image_gd

    use Image::Similar 'load_image';
    use GD;
    my $gd = GD::Image->newFromPng ("t/images/chess/chess-100.png");
    my $is = load_image ($gd);

This is the internal routine used by "load_image" to load GD images.

load_image_imlib2

This is the internal routine used by "load_image" to load Image::Imlib2 images.

load_image_imager

    my $is = load_image_imager ($imager, %options);

This is the internal routine used by "load_image" to load Imager images. It is not exported. The options are:

make_grey_png

    my $is = load_image_imager ($imager, make_grey_png => 'imager.png');

Write a greyscale PNG to the given file, for comparison with Image::Similar's internal version. See "write_png" for how to extract Image::Similar's internal version.

load_image_libpng

    my $is = load_image_libpng ($libpng);

This loads an image from the return value of "read_png_file" in Image::PNG::Libpng.

Image::Similar::Image methods

These methods work on the XS object within an Image::Similar, which is called Image::Similar::Image.

fill_grid

    fill_grid ($img);

Calculate the image's signature and store it within $img. All the pixel values should have been set with "set_pixel" before calling this. This method is called automatically by "load_image". "load_signature" overrides it with values from the signature, so this method should only be used when you create the object with "new", set the pixels yourself, and then make the signature "by hand" rather than via "load_image".

image_diff

    my $diff = image_diff ($img1, $img2);

This computes the value of "diff" from the signatures within $img1 and $img2.

set_pixel

    $img->set_pixel ($x, $y, $grey);

Set a greyscale pixel within the image. $x and $y need to be integers, and $grey needs to be an integer from 0 to 255. Typically one would first set the width and height of the image with "new", then get the Image::Similar::Image object from the Image::Similar object, then set its pixels with this method, then compute its signature with "fill_grid".

get_rows

    my $rows = $img->get_rows ();

Get the greyscale pixels from $img as an array reference $rows containing strings of bytes, one byte per pixel.
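
Rows in this format can be turned into arrays of numbers with Perl's built-in "unpack". The sketch below uses a hand-made stand-in for the return value of "get_rows", so that it runs without an image; rows_to_grey is a hypothetical helper and not part of the module.

```perl
use strict;
use warnings;

# Stand-in for the return value of get_rows: two rows of three
# pixels each, one byte per pixel.
my $rows = [
    pack ('C*', 0, 128, 255),
    pack ('C*', 10, 20, 30),
];

# Convert each row of bytes into an array reference of grey values
# from 0 to 255.
sub rows_to_grey
{
    my ($byte_rows) = @_;
    return [map { [unpack ('C*', $_)] } @$byte_rows];
}

my $grey = rows_to_grey ($rows);
for my $row (@$grey) {
    print join (' ', @$row), "\n";
}
```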

signature

    my $sig = $image->signature ();

Return the signature value which is set either by "fill_grid" or directly by "fill_from_sig".

valid_image

    if ($image->valid_image ()) {
        # do something with image data
    }

This returns a true value only if $image contains valid image data. This is to distinguish between an image which is loaded from a stored signature using "fill_from_sig" and one which is loaded from an actual image.

fill_from_sig

    my $image = Image::Similar::Image::fill_from_sig ($sig);

Fill $image using signature data.

EXAMPLES

Search many files for duplicate images

This script makes a list of all files which may be images:

    # Construct a list of all images on the accessible file systems.
    
    use File::Find;
    use FindBin '$Bin';
    
    # The list of files under construction.
    
    my @files;
    main ();
    exit;
    
    # This returns a true value if its argument is an image file.
    
    sub is_image_file
    {
        my ($file) = @_;
        if ($file =~ /\.(jpg|png|gif|jpeg)$/i) {
            return 1;
        }
        return undef;
    }
    
    sub check_file
    {
        if (is_image_file ($File::Find::name)) {
            push @files, $File::Find::name;
        }
    }
    
    sub write_files
    {
        open my $out, ">", "$Bin/image-list.txt" or die $!;
        for (@files) {
            print $out "$_\n";
        }
        close $out or die $!;
    }
    
    sub main
    {
        find ({
            wanted => \& check_file,
        }, "$Bin/..");
        write_files ();
    }

(This example is included as find-all-images.pl in the distribution.)

This script then gets all the signatures of the images and compares them looking for similar images.

    # Make signatures for all the images.
    
    use utf8;
    use FindBin '$Bin';
    use Image::Similar ':all';
    use Imager;
    
    main ();
    
    sub main
    {
        my %sigs;
        open my $in, "<", "$Bin/image-list.txt" or die $!;
        while (<$in>) {
            chomp;
            if (/^\s*$/) {
                next;
            }
            my $image = $_; 
            my $imager = Imager->new ();
            my $ok = $imager->read (file => $image); 
            if (! $ok) {
                warn "$image is not ok: ", $imager->errstr ();
                next;
            }
            my $is = load_image ($imager);
            my $sig = $is->signature ();
            if (! $sig) {
                die "No signature for $image";
            }
            if ($sigs{$sig}) {
                # Identical match.
                print "$sigs{$sig} looks identical to $image.\n";
            }
            else {
                for my $k (keys %sigs) {
                    my $diff = $is->sig_diff ($k);
                    if ($diff < 0.1) {
                        print "$sigs{$k} looks similar to $image.\n";
                    }
                }
                # Don't overwrite $sigs{$sig} if it already has a value.
                $sigs{$sig} = $image;
            }
        }
        close $in or die $!;
    }

(This example is included as make-signatures.pl in the distribution.)
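
The script above prints similar images one pair at a time. If groups of mutually similar files are wanted instead, the pairwise comparison can be combined with a simple union-find structure, as in the following sketch. It runs on made-up signature strings, and toy_diff is only a stand-in for "sig_diff"; the file names, the threshold of 0.2 and the helper names are all assumptions made for the example.

```perl
use strict;
use warnings;

# Stand-in distance: the fraction of positions at which two
# equal-length strings differ. A real script would call sig_diff.
sub toy_diff
{
    my ($s1, $s2) = @_;
    my $differ = grep { substr ($s1, $_, 1) ne substr ($s2, $_, 1) }
        0..length ($s1) - 1;
    return $differ / length ($s1);
}

# Group file names whose signatures are within $threshold of each
# other, either directly or through a chain of similar images.
# Returns an array reference of groups with more than one member.
sub group_similar
{
    my ($sigs, $threshold) = @_;
    my @names = sort keys %$sigs;
    # Each name starts out as the representative of its own group.
    my %parent = map { ($_ => $_) } @names;
    my $find = sub {
        my ($name) = @_;
        $name = $parent{$name} while $parent{$name} ne $name;
        return $name;
    };
    for my $i (0..$#names) {
        for my $j ($i + 1..$#names) {
            my ($p, $q) = @names[$i, $j];
            if (toy_diff ($sigs->{$p}, $sigs->{$q}) < $threshold) {
                # Merge the two groups.
                $parent{$find->($p)} = $find->($q);
            }
        }
    }
    my %groups;
    push @{$groups{$find->($_)}}, $_ for @names;
    my @multiple = grep { @$_ > 1 } values %groups;
    return [sort { $a->[0] cmp $b->[0] } @multiple];
}

my %sigs = (
    'a.png' => '1333333311',
    'b.jpg' => '1333333312',
    'c.gif' => '0000044444',
);
my $groups = group_similar (\%sigs, 0.2);
for my $group (@$groups) {
    print join (' ', @$group), "\n";
}
```

Like the script above, this compares every pair of signatures, so the running time grows with the square of the number of images.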

KNOWN PROBLEMS

Unimplemented parts of the original algorithm

The following parts of the original algorithm are unimplemented as of this version:

Cropping

The 5% and 95% image cropping methods described in the paper are not used.

Soft pixels

The soft pixel method is not used.

Histogram of image

There is no balancing of the greyscale of the image using a histogram; the module uses only the raw pixel values.

SEE ALSO

Other CPAN modules

Image::Libpuzzle

This uses a similar algorithm to Image::Similar, but it requires installing a third-party library called libpuzzle, as well as the gd library.

Image::Seek

This uses ImgSeek to find similar pictures in a library. It can load images via Imager, Image::Imlib2, or GD.

References

An image signature for any kind of image

"An image signature for any kind of image" by H. Chi Wong, Marshall Bern, and David Goldberg, published in Proceedings: 2002 International Conference on Image Processing, Volume 1, 22-25 September 2002.

Other

Finding Similar Images

A 2003 article by Randal Schwartz, which contains Perl source code for finding similar images.

Questions about image similarity at Stack Overflow

Contains information about more libraries.

findimagedupes

A Perl script for finding duplicate and similar images by Rob Kudla / Jonathan H N Chin.

DEPENDENCIES

Image::PNG::Libpng

This is the fallback image loading module used if no other option is installed.

"looks_like_number" in Scalar::Util

This is used to validate the parameters of "new".

"carp" in Carp

This is used to warn the user about input values.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2016-2017 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.