
NAME

Lingua::Diversity::Internals - utility subroutines for developers of classes derived from Lingua::Diversity

VERSION

This documentation refers to Lingua::Diversity::Internals version 0.02.

SYNOPSIS

    package Lingua::Diversity::MyMeasure;

    use Moose;

    extends 'Lingua::Diversity';

    use Lingua::Diversity::Internals qw(
        _validate_size
        _get_average
        _prepend_unit_with_category
    );
    
    sub measure {
        my ( $self, $array_ref ) = @_;

        _validate_size(
            'unit_array_ref'    => $array_ref,
            'min_num_items'     => 50,
            'max_num_items'     => 1000000,
        );

        # Further instructions, until at some point...
        my @numbers = 1..100;
        my (
            $average,
            $variance,
            $num_observations,
        ) = _get_average( \@numbers );
        
        # More instructions...
    }

    sub measure_per_category {
        my ( $self, $unit_array_ref, $category_array_ref ) = @_;

        _validate_size(
            'unit_array_ref'        => $unit_array_ref,
            'category_array_ref'    => $category_array_ref,
            'min_num_items'         => 50,
            'max_num_items'         => 1000000,
        );

        # Recode units to avoid homophony...
        my $recoded_unit_array_ref = _prepend_unit_with_category(
            $unit_array_ref,
            $category_array_ref,
        );

        # Further instructions, until at some point...
        my @numbers = 1..100;
        my @weights = 1..100;
        my (
            $weighted_average,
            $weighted_variance,
            $num_observations,
        ) = _get_average( \@numbers, \@weights );

        # Yet more instructions...
    }

DESCRIPTION

This module provides utility subroutines intended to facilitate the development of classes derived from Lingua::Diversity. These subroutines are marked as internal (hence the leading underscore in their names) because they are meant for developers of such derived classes, not for clients of those classes.

SUBROUTINES

_validate_size()

Check that the subroutine is called with at least the parameter 'unit_array_ref', and that this parameter holds an array ref. Check that the size of the array lies within the bounds set by 'min_num_items' and 'max_num_items'. If called from within method measure_per_category(), further check that a second array ref is provided via parameter 'category_array_ref', and that this array has the same size as the first.

NB: This subroutine is meant to be used within implementations of the methods measure() and measure_per_category(). Its use in other contexts has not been tested and is unlikely to make sense.

The subroutine requires one named parameter and accepts up to four in total.

unit_array_ref (required)

A reference to an array of text units (e.g. words).

category_array_ref

A reference to an array of categories (e.g. lemmas).

min_num_items

The minimum number of items that should be in the array(s).

max_num_items

The maximum number of items that should be in the array(s).
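The checks listed above can be sketched in plain Perl as follows. This is a hypothetical re-implementation, named validate_size_sketch() to avoid confusion with the real subroutine, and not the module's own code: detecting the calling method with caller() is an assumption, and the wording of the size-mismatch error is invented because it is not documented here. The other messages follow the DIAGNOSTICS section.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical re-implementation of the checks documented for
# _validate_size(). NOT the module's source code; detecting the calling
# method via caller() is an assumption.
sub validate_size_sketch {
    my (%param) = @_;

    croak "Missing parameter 'unit_array_ref' in call to subroutine "
        . "_validate_size()"
        unless exists $param{unit_array_ref};

    # Name of the calling sub (e.g. 'My::Measure::measure_per_category'),
    # reduced to its last component.
    ( my $method = ( caller(1) )[3] // '' ) =~ s/.*:://;
    my $per_category = $method eq 'measure_per_category';
    my $label = $per_category ? 'measure_per_category()' : 'measure()';

    croak "Method $label must be called with a reference to an array "
        . "as 1st argument"
        unless ref $param{unit_array_ref} eq 'ARRAY';
    my $n = @{ $param{unit_array_ref} };

    if ($per_category) {
        croak "Method measure_per_category() must be called with a "
            . "reference to an array as 2nd argument"
            unless ref $param{category_array_ref} eq 'ARRAY';
        # ASSUMPTION: the wording of this size-mismatch error is invented.
        croak "Method measure_per_category() was called with arrays of "
            . "different sizes"
            unless @{ $param{category_array_ref} } == $n;
    }

    croak "Method $label was called with an array containing $n item(s) "
        . "while this measure requires at least $param{min_num_items} item(s)"
        if defined $param{min_num_items} && $n < $param{min_num_items};
    croak "Method $label was called with an array containing $n item(s) "
        . "while this measure requires at most $param{max_num_items} item(s)"
        if defined $param{max_num_items} && $n > $param{max_num_items};

    return 1;
}

# Hypothetical caller, named like the method the checks depend on.
sub measure_per_category {
    my ( $units, $categories ) = @_;
    return validate_size_sketch(
        unit_array_ref     => $units,
        category_array_ref => $categories,
        min_num_items      => 2,
    );
}

# Arrays of different sizes: prints the (hypothetical) mismatch message.
eval { measure_per_category( [qw(can be can)], [qw(VERB VERB)] ) };
print "Caught: ", ( split /\n/, $@ )[0], "\n";
```

Relying on the caller's name keeps the call sites short, at the cost of making the subroutine behave differently depending on where it is called from, which is consistent with the NB above.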

_get_average()

Compute the (possibly weighted) average and variance of a list of numbers. Return the average, the variance, and the number of observations, in that order.

The subroutine requires a reference to an array of numbers as its first argument; passing an empty array throws an exception.

Optionally, a reference to an array of counts (weights) may be passed as a second argument. An exception is thrown if this array's size does not match that of the first one. Counts may be real numbers rather than integers, in which case the number of observations returned (the sum of the counts) may not be an integer.
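A minimal sketch of the computation described above, assuming that the variance is the population variance (the sum of count-weighted squared deviations divided by the total count); the documentation does not say which estimator _get_average() actually uses. The name get_average_sketch() and its error messages are hypothetical.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sketch of the documented behaviour of _get_average().
# ASSUMPTION: variance = population variance, i.e. the count-weighted sum
# of squared deviations divided by the total count.
sub get_average_sketch {
    my ( $numbers_ref, $counts_ref ) = @_;

    croak 'Empty array passed to get_average_sketch()' unless @$numbers_ref;

    # Without counts, every number is observed exactly once.
    $counts_ref //= [ (1) x @$numbers_ref ];
    croak 'Arrays passed to get_average_sketch() differ in size'
        unless @$counts_ref == @$numbers_ref;

    # Total count (number of observations) and weighted sum.
    my ( $total, $weighted_sum ) = ( 0, 0 );
    for my $i ( 0 .. $#$numbers_ref ) {
        $total        += $counts_ref->[$i];
        $weighted_sum += $counts_ref->[$i] * $numbers_ref->[$i];
    }
    my $average = $weighted_sum / $total;

    # Weighted sum of squared deviations from the average.
    my $sum_sq_dev = 0;
    for my $i ( 0 .. $#$numbers_ref ) {
        $sum_sq_dev += $counts_ref->[$i] * ( $numbers_ref->[$i] - $average )**2;
    }

    return ( $average, $sum_sq_dev / $total, $total );
}

my ( $avg, $var, $n ) = get_average_sketch( [ 1, 2, 3, 4 ] );
print "$avg $var $n\n";       # 2.5 1.25 4

my ( $wavg, $wvar, $wn ) = get_average_sketch( [ 2, 7 ], [ 1.5, 1 ] );
print "$wavg $wvar $wn\n";    # 4 6 2.5
```

With the counts 1.5 and 1, the number of observations returned is 2.5, which illustrates the non-integer case mentioned above.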

_prepend_unit_with_category()

Take a reference to an array of units and a reference to an array of categories, and return a reference to an array in which each unit is prefixed with its category. For example, from units [ qw( can be can ) ] and categories [ qw( VERB VERB NOUN ) ], return [ qw( VERBcan VERBbe NOUNcan ) ].
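The recoding amounts to a single map over the array indices. The following sketch, with the hypothetical name prepend_unit_with_category_sketch(), reproduces the example above; it is not the module's own code and, as documented for the real subroutine, it assumes that both arrays are non-empty and of identical size.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented behaviour of
# _prepend_unit_with_category(); not the module's own code.
# Assumes both arrays are non-empty and of identical size.
sub prepend_unit_with_category_sketch {
    my ( $unit_array_ref, $category_array_ref ) = @_;
    return [
        map { $category_array_ref->[$_] . $unit_array_ref->[$_] }
            0 .. $#$unit_array_ref
    ];
}

my $recoded = prepend_unit_with_category_sketch(
    [qw( can be can )],
    [qw( VERB VERB NOUN )],
);
print "@$recoded\n";    # VERBcan VERBbe NOUNcan
```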

It is recommended to use this recoded array of units instead of the original one when writing the measure_per_category() method. This makes it possible to process homophonous units that belong to distinct categories separately, such as 'can' as a verb or a noun in the above example.
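To see why the recoding matters, the following sketch counts the distinct unit types in the example before and after recoding: the two occurrences of 'can' collapse into a single type in the original array but remain distinct once prefixed with their categories. It is an illustration only and does not call Lingua::Diversity.

```perl
use strict;
use warnings;

# Illustration only: distinct types before and after recoding units
# with their categories (not code from Lingua::Diversity).
my @units      = qw( can be can );
my @categories = qw( VERB VERB NOUN );

# Recode each unit as category + unit.
my @recoded = map { $categories[$_] . $units[$_] } 0 .. $#units;

# Count distinct types with a hash.
my ( %types_before, %types_after );
$types_before{$_}++ for @units;
$types_after{$_}++  for @recoded;

printf "Types before recoding: %d\n", scalar keys %types_before;    # 2
printf "Types after recoding:  %d\n", scalar keys %types_after;     # 3
```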

NB: The subroutine assumes that two non-empty arrays of identical size are passed as arguments; this can and should be checked beforehand with subroutine _validate_size().

DIAGNOSTICS

The following error message targets developers of classes derived from Lingua::Diversity:

Missing parameter 'unit_array_ref' in call to subroutine _validate_size()

This exception is raised when subroutine _validate_size() is called without its only required argument, 'unit_array_ref' (a reference to an array).

The following error messages target clients of classes derived from Lingua::Diversity. They should be copied almost verbatim into the documentation of these classes whenever the corresponding subroutines are used.

Method [measure()/measure_per_category()] must be called with a reference to an array as 1st argument

This exception is raised when either method measure() or method measure_per_category() is called without a reference to an array as its first argument.

Method measure_per_category() must be called with a reference to an array as 2nd argument

This exception is raised when method measure_per_category() is called without a reference to an array as its second argument.

Method [measure()/measure_per_category()] was called with an array containing N item(s) while this measure requires [at least/at most] M item(s)

This exception is raised when either method measure() or method measure_per_category() is called with an argument array that is too small or too large for the size bounds set by the selected measure.
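Because the wording of these messages is meant to be documented for clients, a client can recover N and M from the size message with a regular expression. In the sketch below, fake_measure() is a hypothetical stand-in for a derived class's measure() method (requiring at least 50 items) and parse_size_error() is a hypothetical helper; neither is part of Lingua::Diversity.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a derived class's measure() method that
# requires at least 50 items; not part of Lingua::Diversity.
sub fake_measure {
    my ($array_ref) = @_;
    my $n = @$array_ref;
    die "Method measure() was called with an array containing $n item(s) "
      . "while this measure requires at least 50 item(s)\n"
      if $n < 50;
    return 'measured';
}

# Hypothetical helper: extract N, the kind of bound and M from the
# documented size message; returns undef for any other message.
sub parse_size_error {
    my ($message) = @_;
    return unless $message =~ /containing (\d+) item\(s\) while this measure requires at (least|most) (\d+) item\(s\)/;
    return { found => $1, bound => $2, required => $3 };
}

my $result = eval { fake_measure( [ (1) x 3 ] ) };
if ( !defined $result and my $info = parse_size_error($@) ) {
    # Got 3 item(s), need at least 50
    print "Got $info->{found} item(s), need at $info->{bound} $info->{required}\n";
}
```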

DEPENDENCIES

This module is part of the Lingua::Diversity distribution.

BUGS AND LIMITATIONS

There are no known bugs in this module.

Please report problems to Aris Xanthos (aris.xanthos@unil.ch).

Patches are welcome.

AUTHOR

Aris Xanthos (aris.xanthos@unil.ch)

LICENSE AND COPYRIGHT

Copyright (c) 2011 Aris Xanthos (aris.xanthos@unil.ch).

This program is released under the GPL license (see http://www.gnu.org/licenses/gpl.html).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

SEE ALSO

Lingua::Diversity