The Perl Toolchain Summit needs more sponsors. If your company depends on Perl, please support this very important event.

NAME

Spreadsheet::Compare::Single - Module for comparing two spreadsheet datasets

SYNOPSIS

    use Spreadsheet::Compare::Single;
    my $single = Spreadsheet::Compare::Single->new(%args);
    my $result = $single->compare();

DESCRIPTION

Spreadsheet::Compare::Single analyses differences between two similar record sets according to a defined configuration set.

ATTRIBUTES

All attributes return the object on setting and the current value if called without parameter.

    $single->attr($value);
    my $value = $single->attr;

They will usually be set by Spreadsheet::Compare, after reading the values from a config file.

allow_duplicates

    possible values: 0|1
    default: 0

Try to match identical records even when a unique identity cannot be constructed. This can significantly increase compare times on large datasets.

below_limit_is_equal

    possible values: 0|1
    default: 0

Normally differences that are inside configured limits will still be counted as differences (only marked visually as low priority). Setting below_limit_is_equal to a true value will result in the record counted as equal.

convert_numbers

    possible values: 0|1
    default: 0

Convert content that is treated as a numerical value to an actual numeric value (by simply adding 0). This is e.g. handy for having numerical data in Excel output instead of strings that look like numbers. This will not affect the optional 'All' report that can be created with the report_all_data option.

decimal_separator

    possible values: <string>
    default: '.'

Decimal separator for numerical values

diff_relative

    possible values: <list of column names>
    default: []

Report the relative instead of the absolute difference if "report_diff_row" in Spreadsheet::Compare::Reporter is set to a true value.

Example (as YAML config):

    diff_relative: [2,3,4]

or

    diff_relative:
        - Price
        - Quantity

digital_grouping_symbol

    possible values: <string>
    default: ','

Digital grouping symbol for numerical values

fetch_size

    possible values: <integer>
    default: 1000

When "is_sorted" is set, "fetch_size" determines the number of records fetched into memory at a time.

fetch_limit

    possible values: <integer>
    default: 0

When "is_sorted" is set, "fetch_limit" determines the number of fetches (of size "fetch_size") before the comparison stops. This is useful during setup with large datasets where you may have columns that are different for every row and that you better add to the ignore list. Just remember to unset this value once you are done.

ignore

    possible values: <list of columns>
    default: empty list

Columns to ignore while comparing data. If "header" in Spreadsheet::Compare::Reader is set the column names have to be used. Else use the zero based column number.

ignore_strings

    possible values: 0|1
    default: 0

Only compare numerical data. This skips comparisons where both sides are not considered to be numerical values. This depends on the setting for "decimal_separator" and "digital_grouping_symbol"

is_sorted

    possible values: 0|1
    default: 0

Assume data is sorted by identity. This is needed for fetching data in smaller batches (see "fetch_size") to use less memory.

left

    possible values: <string>
    default: 'left'

Name for the input on the left side of the comparison. Used for reporting.

limit_abs

    possible values: <number or key/value pairs>
    default: 0

Single value or one entry per column for specifying absolute tolorance intervals. Differences inside the tolerance interval will be counted and reported separately from differences outside of it. The default value of 0 means no tolerance limit, the value 'none' skips the limit check with the side effect that the deviation will not be considered in statistics output (column with highest absolute deviation). The special key '__default__' can be used to set a default for all (numerical) columns, and subsequently setting a different limit on selected columns.

Example (as YAML config):

    limit_abs: 0.01

    or

    limit_abs:
        __default__: 0.01
        Price: 0.0001
        Quantity: 1
        Size: none

limit_rel

    possible values: <number or keys/values>
    default: undef

Single value or one entry per column for specifying relative tolerance intervals (decimal value, not a percentage). Differences inside the tolerance interval will be counted and reported separately from differences outside of it. The default value of 0 means no tolerance limit, the value 'none' skips the limit check with the side effect that the deviation will not be considered in statistics output (column with highest relative deviation). The special key '__default__' can be used to set a default for all (numerical) columns, and subsequently setting a different limit on selected columns.

    limit_rel: 0.01

or

    limit_rel:
        __default__: 0.1
        Price: 0.01
        Quantity: 1
        Size: none

readers

This attribute cannot be set from a config file.

    possible values: <list of exactly 2 Reader objects>
    default: []

The readers have to be two objects of Spreadsheet::Compare::Reader subclasses representing the left and the right side of the comparison.

report_all_data

Add an additional output stream containing all data from both sides of the comparison. This will slow down reporting on large datasets.

    possible values: <string>
    default: 'right'

Name for the input on the right side of the comparison. Used for reporting.

title

    possible values: <string>
    default: ''

A title for the comparison

CONSTRUCTOR

new

    my $single = Spreadsheet::Compare::Single->new(%attributes);
    my $single = Spreadsheet::Compare::Single->new(\%attributes);

or

    my $single = Spreadsheet::Compare::Single->new
        ->title('Regression Test 1')
        ->readers([$r_left, $r_right]);

Construct a new Spreadsheet::Compare::Single object. All comparison attributes can be given to the constructor or set individualy via their chainable set methods.

METHODS

compare

Run all configured tests of the run configuration. Return a hashref with counters:

    {
        add   => <number of additional records on the right>,
        diff  => <number of found differences>,
        dup   => <number of duplicate rows (maximum of left and right)>,
        left  => <number of records on the left>,
        limit => <number of record with differences below set ste limits>,
        miss  => <number of records missing on the right>,
        right => <number of records on the right>,
        same  => <number of identical records>,
    }

Before running compare the readers have to be set.

EVENTS

Spreadsheet::Compare::Single is a Mojo::EventEmitter.

The reporting events correspond to the methods that are implemented by sublasses of Spreadsheet::Compare::Reporter. Spreadsheet::Compare will subscribe to the events and call the methods.

Spreadsheet::Compare::Single emits the following events

add_stream

    $single->on(add_stream => sub ($obj, $name) {
        say "new stream $name for ", $obj->title;
    });

Reporting event. Signaling that a new reporting stream should be created and will be later referenced for reporting data lines.

The possible stream names are 'Differences', 'Missing', 'Additional', 'Duplicates' and 'All'.

after_fetch

    $single->on(after_fetch => sub ($obj) {
        say "next fetch for ", $obj->title;
    });

Emitted directly after a fetch from the readers.

counters

    require Data::Dumper;
    $single->on(counters => sub ($obj, $counters) {
        say "next fetch for ", $obj->title, ":", Dumper($counters);
    });

Emitted for every handled record. Don't rely on the numbers of calls, but on the content of the %$counters hash if you want to know how many lines where actually read from the readers. This can be used for progress reporting.

final_counters

    require Data::Dumper;
    $single->on(final_counters => sub ($obj, $counters) {
        say "next fetch for $title:", Dumper($counters);
    });

Emitted after completing a single comparison.

mark_header

    $single->on(mark_header => sub ($obj, 'Differences', $mask) {
        # mark columns
    });

Reporting event. Emitted after completing a single comparison with a mask describing which columns had differences (key:column_index, value:false/true)

write_fmt_row

    $single->on(write_fmt_row => sub ($obj, 'Differences', $record) {
        # write record to stream;
    });

Reporting event. Write a formatted record to the 'Differences' output stream The record is an Spreadsheet::Compare::Record and will contain information about the differences found (see "limit_mask" in Spreadsheet::Compare::Record).

write_header

    $single->on(write_header => sub ($obj, $stream) {
        # write header for stream;
    });

Reporting event. Write the header for an output stream.

write_row

    $single->on(write_row => sub ($obj, $stream, $record) {
        # write record to stream;
    });

Reporting event. Write a default formatted record to an output stream. The record is an Spreadsheet::Compare::Record