NAME

School::Code::Compare - 'naive' metrics for code similarity

VERSION

version 0.1

SYNOPSIS

This distribution ships a script, compare-code, in the bin directory. You might want to look at that first. For documentation of the libraries it uses, keep reading.

This calculates the Levenshtein distance between two strings (typically the contents of two files), provided they meet certain criteria:

 use School::Code::Compare;

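 my $comparer = School::Code::Compare->new()
                                     ->set_max_relative_difference(2)
                                     ->set_min_char_total(20)
                                     ->set_max_relative_distance(0.8);

 # The first string is only sample input; it need not be valid Perl.
 my $comparison1 = $comparer->measure('use v5.22; say "Hi"!',
                                      'use v5.22; say "Hello";'
                                     );
 # Assumption: distance is only defined if the comparison was carried out.
 print $comparison1->{distance} if defined $comparison1->{distance};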

FUNCTIONS

set_max_char_difference

Don't even start the comparison if the difference in character count between the two inputs is higher than the set value.

set_min_char_total

Don't even start the comparison if one of the inputs is below this character count.

set_max_distance

Abort a comparison that is already running once the distance becomes higher than the set value.
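
The two length-based checks described above can be pictured with a minimal sketch. It is not the module's implementation: passes_prechecks is a hypothetical helper, and it assumes the limits are plain character counts applied to each input as described. The third limit (aborting during the distance calculation) is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of School::Code::Compare.
# Returns 1 if both cheap length checks pass, 0 if the comparison
# should be skipped before any distance is calculated.
sub passes_prechecks {
    my ($str1, $str2, $max_char_difference, $min_char_total) = @_;

    my ($length1, $length2) = (length($str1), length($str2));

    # Assumption: the minimum applies to each input on its own.
    return 0 if $length1 < $min_char_total || $length2 < $min_char_total;

    # Assumption: the difference is an absolute character count.
    return 0 if abs($length1 - $length2) > $max_char_difference;

    return 1;
}

# Strings from the synopsis: 20 and 23 characters long.
print passes_prechecks('use v5.22; say "Hi"!', 'use v5.22; say "Hello";', 5, 20)
    ? "compare\n"
    : "skip\n";
```

With these illustrative limits (5 and 20) the synopsis strings pass both checks. Lowering the first limit to 2 would skip the comparison, because the lengths differ by 3.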

measure

Compare two strings. Returns a hash reference with several pieces of information:

 # (example output from synopsis)
 {
     'delta_length' => 3,
     'length1' => 20,
     'ratio' => 79,
     'length2' => 23,
     'comment' => 'comparison done',
     'distance' => 5
 };

distance

The Levenshtein Distance. See Text::Levenshtein::XS for more information.

ratio

The ratio of the distance (in characters) to the average length of the compared strings, expressed as a percentage. A ratio of zero means the strings are identical. A ratio of 50 means that 50% of a string is different.
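
As a worked example of the definition above, here is a small sketch in core Perl. The levenshtein function is a plain dynamic-programming stand-in for Text::Levenshtein::XS, and ratio_percent is a hypothetical helper. The module's exact formula and rounding are not shown in this document, so its reported ratio may differ from this number.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Plain dynamic-programming Levenshtein distance
# (a stand-in for Text::Levenshtein::XS, which is much faster).
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));
    for my $i (1 .. length($s)) {
        my @curr = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,           # deletion
                $curr[$j - 1] + 1,       # insertion
                $prev[$j - 1] + $cost,   # substitution or match
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper: distance as a percentage of the average length.
sub ratio_percent {
    my ($s, $t) = @_;
    my $average = (length($s) + length($t)) / 2;
    return 0 if $average == 0;   # two empty strings are identical
    return 100 * levenshtein($s, $t) / $average;
}

# Strings from the synopsis: distance 5, average length 21.5.
printf "%.1f\n", ratio_percent('use v5.22; say "Hi"!', 'use v5.22; say "Hello";');
# prints 23.3
```

Dividing by the average length rather than by one of the two lengths keeps the value symmetric: swapping the arguments gives the same ratio.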

In my experience, a ratio below 30% is the point where you should start checking whether the code was copied and altered (if your concern is to find 'cheaters' in educational/school environments). This method of measurement is by no means well established. It may even be 'naive', but it seems to work out quite well. See School::Code::Compare::Judge for how the results are currently interpreted.

comment

A comment on how the comparison went.

delta_length

Difference in length (chars) of the two compared strings.

AUTHOR

Boris Däppen <bdaeppen.perl@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2019 by Boris Däppen.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.