NAME
Image::Similar - find out how similar two images are
SYNOPSIS
use Image::Similar 'load_image';
use Imager;
use FindBin '$Bin';
my $x = Imager->new ();
# Get image data from file
$x->read (file => "$Bin/x.png");
# Load image into Image::Similar
my $xi = load_image ($x);
my $y = Imager->new ();
# Get image data from file
$y->read (file => "$Bin/y.jpg");
# Load image into Image::Similar
my $yi = load_image ($y);
print "The difference is ", $xi->diff ($yi), ".\n";
(This example is included as synopsis.pl in the distribution.)
VERSION
This documents Image::Similar version 0.07 corresponding to git commit b26adc57d72672372aa48cd62a27696160b74ba1 released on Sat Jul 15 15:57:13 2017 +0900.
DESCRIPTION
This is an experimental module for comparing images. It uses a simplified form of the algorithm described in "An image signature for any kind of image" to calculate image signatures and distances between images.
The algorithm consists of converting the image into greyscale, chopping it into a grid, and then computing a signature based on relative lightness and darkness of the blocks of the grid.
The module does not contain its own image-reading facility, so images must be loaded to the module via one of the following supported Perl modules:
- Imager (recommended)
-
All image types are supported. If you have no preference, I suggest using Imager, since it is a very well-behaved module. The conversion to greyscale is done using Imager's own routines.
- GD
-
All image types are supported. RGB images are combined to greyscale using constants taken from the source code of Imager.
- Image::Imlib2
-
All image types are supported. RGB images are combined to greyscale using constants taken from the source code of Imager.
- Image::PNG::Libpng
-
This module is used for some internals of Image::Similar related to testing, thus it was installed when you installed Image::Similar. However, Image::PNG::Libpng is only for PNG images.
Image::Similar supports all PNG image types. It currently only supports bit depths of eight.
RGB images are combined to greyscale using constants taken from the source code of Imager. As of this version, there is no handling of the alpha channel (transparent pixels) and the background value is ignored.
Use "load_image" to load the image.
FUNCTIONS
load_image
This loads image data from various modules into an Image::Similar object. The return value is the Image::Similar object.
Using Imager:
use Imager;
my $img = Imager->new ();
$img->read (file => 'my.jpg');
my $is = load_image ($img);
Using Image::PNG::Libpng:
use Image::PNG::Libpng ':all';
my $img = read_png_file ('my.png');
my $is = load_image ($img);
The return value is an Image::Similar object.
Using GD:
use Image::Similar 'load_image';
use GD;
my $gd = GD::Image->newFromPng ("t/images/chess/chess-100.png");
my $is = load_image ($gd);
Using Image::Imlib2:
use Image::Similar 'load_image';
use Image::Imlib2;
my $imlib2 = Image::Imlib2->load ("t/images/chess/chess-100.png");
my $is = load_image ($imlib2);
METHODS
new
my $is = Image::Similar->new (height => 10, width => 10);
Unless you want to change internals, use "load_image" instead of this.
The returned image currently contains a field $is->{image}
which you need to use the "set_pixel" method on to set the pixels.
diff
my $diff = $is1->diff ($is2);
This returns a floating-point number which is the difference between images $is1
and $is2
. This is meant to be approximately the same value as given by "vector_euclidean_length()" in Image::Libpuzzle, but no validation has been carried out. Both $is1
and $is2
are Image::Similar objects created using "load_image".
signature
my $sig = $is->signature ();
Get the signature of the image. This is a text string consisting of digits 0-4 which identifies the image. The following example demonstrates getting the signature of two similar images.
use FindBin '$Bin';
use Image::Similar 'load_image';
use Imager;
for my $n (100, 1000) {
my $image = "$Bin/../t/images/lenagercke/lena-$n.png";
my $imager = Imager->new ();
$imager->read (file => $image);
my $is = load_image ($imager);
print $is->signature (), "\n";
}
(This example is included as show-hash.pl in the distribution.)
Its output looks like this:
1333333311333333331333312311231111311311312233112213311123331311213111112111111323111133233213332131131111212223112213313323333313211133332323333222233133223313132131331211312223112212313323331311213131112111111123111133233333132131331111222323112111213323333313211312332331333121132133233314132130331211132311112111133123133113233333212111311123131134234334032030231312232312312131131121111133233311112111113123123134234314022030221211232311232131111121111133231333322333333121133134234323032020112221132333212312213123311211112313423313213331
1333333311333333331333211311331111311311312233112223321122332311213121112111112323111133233323332131131111212222112212312323333313211223332333333121133133232313132231231211322223212212313223231311223131112112111123111133233333132131331111222323112111213323333112211311332331323121132133233313132130331211132311112111133123133133233333312111312123131134234334132030231312232312312131131121111133233311112111113123113134234314022030221211232311232131111121111133231323332333333121133134234323032020112221132333212312213123311211112313423313213331
sig_diff
my $diff = $is->sig_diff ($sig);
Get the difference between $sig
and the image represented by $is
.
load_signature
my $is = load_signature ($sig);
Load $is
, an Image::Similar object, from $sig
.
TESTING AND INTERNAL METHODS
This section lists the testing and internal methods of the module, for people interested in extending or otherwise improving it. Since these are internal private methods, these are subject to change without notice.
write_png
$is->write_png ('test.png');
This is used in conjunction with "png_compare" in Image::PNG::Libpng (version 0.42 or later) to check that Image::Similar has correctly read in the image, by writing out Image::Similar's internal data as a PNG file.
load_image_gd
use Image::Similar 'load_image';
use GD;
my $gd = GD::Image->newFromPng ("t/images/chess/chess-100.png");
my $is = load_image ($gd);
This is the internal routine used by "load_image" to load GD images.
load_image_imlib2
This is the internal routine used by "load_image" to load Image::Imlib2 images.
load_image_imager
my $is = load_image_imager ($imager, %options);
This is the internal routine used by "load_image" to load Imager images. It is not exported. The options are
- make_grey_png
-
my $is = load_image_imager ($imager, make_grey_png => 'imager.png');
Make the greyscale PNG for comparing to Image::Similar's internal version. See "write_png" for how to extract Image::Similar's internal version.
load_image_libpng
my $is = load_image_libpng ($libpng);
This loads an image from the return value of "read_png_file" in Image::PNG::Libpng.
Image::Similar::Image methods
These methods work on the XS object within an Image::Similar, which is called Image::Similar::Image.
fill_grid
fill_grid ($img);
Calculate the image's signature and store it within $img
. All the pixel values should have been set with "set_pixel" before calling this. This method is called automatically by "load_image". "load_signature" overrides it with values from the signature, so this method should only be used when calling "new", filling the pixels by the user, and then making the signature "by hand" rather than via "load_image".
image_diff
my $diff = image_diff ($img1, $img2);
This computes the value of "diff" from the signatures within $img1
and $img2
.
set_pixel
$img->set_pixel ($x, $y, $grey);
Set a greyscale pixel within the image. $x
and $y
need to be integers, and $grey
needs to be an integer between 0-255. Typically one would first set the width and height of the image with "new", then get the Image::Similar::Image object from the Image::Similar object, then set its pixels with this method, then compute its signature with "fill_grid".
get_rows
my $rows = $img->get_rows ();
Get the greyscale pixels from $img
as an array reference $rows
containing strings of bytes, one byte per pixel.
signature
my $sig = $image->signature ();
Return the signature value which is set either by "fill_grid" or directly by "fill_from_sig".
valid_image
if ($image->valid_image ()) {
# do something with image data
}
This returns a true value only if $image
contains valid image data. This is to distinguish between an image which is loaded from a stored signature using "fill_from_sig" and one which is loaded from an actual image.
fill_from_sig
my $image = Image::Similar::Image::fill_from_sig ($sig);
Fill $image
using signature data.
EXAMPLES
Search many files for duplicate images
This script makes a list of all files which may be images:
# Construct a list of all images on the accessible file systems.
use File::Find;
use FindBin '$Bin';
# The list of files under construction.
my @files;
main ();
exit;
# This returns a true value if its argument is an image file.
sub is_image_file
{
my ($file) = @_;
if ($file =~ /\.(jpg|png|gif|jpeg)$/i) {
return 1;
}
return undef;
}
sub check_file
{
if (is_image_file ($File::Find::name)) {
push @files, $File::Find::name;
}
}
sub write_files
{
open my $out, ">", "$Bin/image-list.txt" or die $!;
for (@files) {
print $out "$_\n";
}
close $out or die $!;
}
sub main
{
find ({
wanted => \& check_file,
}, "$Bin/..");
write_files ();
}
(This example is included as find-all-images.pl in the distribution.)
This script then gets all the signatures of the images and compares them looking for similar images.
# Make signatures for all the images.
use utf8;
use FindBin '$Bin';
use Image::Similar ':all';
use Imager;
main ();
sub main
{
my %sigs;
open my $in, "<", "$Bin/image-list.txt" or die $!;
while (<$in>) {
chomp;
if (/^\s*$/) {
next;
}
my $image = $_;
my $imager = Imager->new ();
my $ok = $imager->read (file => $image);
if (! $ok) {
warn "$image is not ok: ", $imager->errstr ();
next;
}
my $is = load_image ($imager);
my $sig = $is->signature ();
if (! $sig) {
die "No signature for $image";
}
if ($sigs{$sig}) {
# Identical match.
print "$sigs{$sig} looks identical to $image.\n";
}
else {
for my $k (keys %sigs) {
my $diff = $is->sig_diff ($k);
if ($diff < 0.1) {
print "$sigs{$k} looks similar to $image.\n";
}
}
# Don't overwrite $sigs{$sig} if it already has a value.
$sigs{$sig} = $image;
}
}
close $in or die $!;
}
(This example is included as make-signatures.pl in the distribution.)
KNOWN PROBLEMS
Unimplemented parts of the original algorithm
The following parts of the original algorithm are unimplemented as of this version:
- Cropping
-
The 5% and 95% image cropping methods described in the paper are not used.
- Soft pixels
-
The soft pixel method is not used.
- Histogram of image
-
There is no balancing of the greyscale of the image using a histogram, it only uses the raw pixel values.
SEE ALSO
Other CPAN modules
- Image::Libpuzzle
-
This uses a similar algorithm to Image::Similar, but it requires installing a third-party library called libpuzzle, as well as the gd library.
- Image::Seek
-
This uses ImgSeek to find similar pictures in a library. It can load images via Imager, Image::Imlib2, or GD.
References
- An image signature for any kind of image
-
An image signature for any kind of image by H. Chi Wong, Marshall Bern, and David Goldberg, published in Proceedings: 2002 International Conference on Image Processing, Volume 1, date 22-25 September 2002.
Other
- Finding Similar Images
-
An article from Randal Schwartz from 2003. Contains Perl source code for finding similar images.
- Questions about image similarity at Stackoverflow
-
Contains information about more libraries.
- findimagedupes
-
A Perl script for finding duplicate and similar images by Rob Kudla / Jonathan H N Chin.
DEPENDENCIES
- Image::PNG::Libpng
-
This is the fallback image loading module used if no other option is installed.
- "looks_like_number" in Scalar::Util
-
This is used to validate the parameters of "new".
- "carp" in Carp
-
This is used to warn the user about input values.
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2016-2017 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.