
NAME

analogize - classify data with AM from the command line

VERSION

version 3.12

SYNOPSIS

analogize --format <format> [--exemplars <file>] [--test <file>] [--project <dir>] [--print <config_info,statistical_summary,analogical_set_summary,gang_summary,gang_detailed>] [--help]

DESCRIPTION

Classify data with analogical modeling from the command line. Required arguments are format and either exemplars or project. You can use old AM::Parallel projects (a directory containing data and test files) or specify individual data and test files. By default, only the accuracy of the predicted outcomes is printed. More detail may be printed using the print option.

OPTIONS

format

specify either commas or nocommas format for exemplar and test data files (= should be used for "null" variables). See "dataset_from_file" in Algorithm::AM::DataSet for details on the two formats.
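As a rough illustration, the sketch below writes the same three exemplars in each of the two layouts. The layout shown (outcome, then variables, then a comment; variables separated by whitespace in the commas format and written as single characters in the nocommas format) is an assumption based on the Algorithm::AM::DataSet documentation, and the file names and values are made up. Check that documentation before relying on it.

```shell
# Hypothetical sample files; names and values are illustrative only.
# Assumed layout per line: outcome, variables, comment.

# commas format: fields separated by commas, variables separated by spaces.
cat > sample_commas.txt <<'EOF'
e , 3 1 2 , first
r , 0 3 2 , second
e , = 1 2 , third
EOF

# nocommas format: fields separated by whitespace, one character per variable.
cat > sample_nocommas.txt <<'EOF'
e 312 first
r 032 second
e =12 third
EOF
```

In both files the third exemplar uses = to mark its first variable as null.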

exemplars, data or train

path to the file containing the exemplar/training data

project

path to an AM::Parallel-style project (any 'outcome' file is ignored). This should be a directory containing a file named data, which holds the known exemplars, and a file named test, which holds the test exemplars. If the test file does not exist, leave-one-out testing is performed using the exemplars in the data file.

test

path to the file containing the test data. If none is specified, performs leave-one-out classification with the exemplar set.
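The leave-one-out idea can be sketched in a few lines of shell. analogize handles this itself when no test file is given; the loop below only illustrates the scheme and is not how the tool is implemented. The file names and the three exemplar lines are hypothetical.

```shell
# Conceptual sketch only; exemplars.txt is a made-up three-line exemplar file.
printf '%s\n' 'a , 1 2 , x' 'b , 1 3 , y' 'a , 2 2 , z' > exemplars.txt

total=$(($(wc -l < exemplars.txt)))   # arithmetic expansion strips any padding
rounds=0
i=1
while [ "$i" -le "$total" ]; do
    sed -n "${i}p" exemplars.txt > held_out.txt   # the one item being classified
    sed "${i}d" exemplars.txt > remaining.txt     # all other items act as exemplars
    rounds=$((rounds + 1))
    i=$((i + 1))
done
echo "$rounds rounds, $(($(wc -l < remaining.txt))) exemplars per round"
```

Each exemplar is held out exactly once, so a file with N exemplars yields N classifications, each made against the other N-1 exemplars.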

print

reports to print, separated by commas (be careful not to add spaces between report names!). For example, --print analogical_set_summary,gang_summary would print analogical sets and gang summaries.
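When the report list is assembled in a script, one way to avoid stray spaces is to join the names with commas explicitly. The following is a minimal sketch; the variable name print_arg is hypothetical.

```shell
# Report names taken from the list in this document.
set -- analogical_set_summary gang_summary

# Join the positional parameters with commas and no spaces.
print_arg=$(IFS=,; printf '%s' "$*")

echo "--print $print_arg"
```

The resulting string can then be passed to analogize as --print "$print_arg".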

Available reports are:

config_info

Describes the configuration used and gives some basic information about the data, such as cardinality.

statistical_summary

A statistical summary of the classification results, including all predicted outcomes with their scores and percentages and the total score for all outcomes. Whether the predicted class is correct, incorrect, or a tie is also included, if the test item had a known class.

analogical_set_summary

The analogical set, showing all items that contributed to the predicted outcome, along with the amount contributed by each item (score and percentage overall).

gang_summary

A summary of the gang effects on the outcome prediction.

gang_detailed

Same as gang_summary, but also includes lists of exemplars for each gang.

include_given

Allow a test item to be included in the dataset during classification. If false (default), test items will be removed from the dataset during classification.

include_nulls

Treat null variables in a test item as regular variables. If false (default), these variables will be excluded and not considered during classification.

linear

Calculate scores using occurrences (linearly) instead of using pointers (quadratically).
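To give a feel for the difference, the arithmetic below assumes the usual analogical modeling convention that a supracontext holding n exemplars contributes n occurrences under linear counting and n * n pointers under quadratic counting. Both that convention and the supracontext sizes are assumptions made for illustration; they are not taken from analogize's output.

```shell
# Hypothetical sizes of two homogeneous supracontexts.
linear=0
quadratic=0
for n in 3 2; do
    linear=$((linear + n))            # one occurrence per exemplar
    quadratic=$((quadratic + n * n))  # each exemplar points to all n exemplars
done
echo "linear=$linear quadratic=$quadratic"
```

Under this assumption, larger supracontexts weigh disproportionately more in the quadratic totals than in the linear ones.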

help or ?

print help message

EXAMPLES

This distribution comes with a sample dataset in the datasets/soybean directory. Data exemplars are in data and a single test exemplar is in test. The files are in the commas format. The following two commands are equivalent and will analyze the test exemplar and output a summary of gang effects to gang.txt:

    analogize --exemplars datasets/soybean/data --test datasets/soybean/test --format commas --print gang_summary > gang.txt

    analogize --project datasets/soybean --format commas --print gang_summary > gang.txt

The resulting files are best viewed in a text editor with word wrap turned off.

AUTHOR

Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2021 by Royal Skousen.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.