AI::Perceptron::Simple
A Newbie Friendly Module to Create, Train, Validate and Test Perceptrons / Neurons
Version 1.04
    #!/usr/bin/perl

    use AI::Perceptron::Simple qw(...);

    # create a new nerve / neuron / perceptron
    my $nerve = AI::Perceptron::Simple->new( {
        initial_value => $size_of_each_dendrite,
        learning_rate => 0.3,  # optional
        threshold     => 0.85, # optional
        attribs       => \@dendrites,
    } );

    # train
    $nerve->tame( ... );
    $nerve->exercise( ... );
    $nerve->train( $training_data_csv, $expected_column_name, $save_nerve_to );
    # or
    $nerve->train( $training_data_csv, $expected_column_name, $save_nerve_to,
        $show_progress, $identifier ); # these two parameters must go together

    # validate
    $nerve->take_lab_test( ... );
    $nerve->take_mock_exam( ... );

    # fill results to original file
    $nerve->validate( {
        stimuli_validate       => $validation_data_csv,
        predicted_column_index => 4,
    } );
    # or
    # fill results to a new file
    $nerve->validate( {
        stimuli_validate       => $validation_data_csv,
        predicted_column_index => 4,
        results_write_to       => $new_csv,
    } );

    # test - see "validate" method, same usage
    $nerve->take_real_exam( ... );
    $nerve->work_in_real_world( ... );
    $nerve->test( ... );

    # confusion matrix
    my %c_matrix = $nerve->get_confusion_matrix( {
        full_data_file          => $file_csv,
        actual_output_header    => $header_name,
        predicted_output_header => $predicted_header_name,
        more_stats              => 1, # optional
    } );

    # accessing the confusion matrix
    my @keys = qw( true_positive true_negative false_positive false_negative
                   total_entries accuracy sensitivity );
    for ( @keys ) {
        print $_, " => ", $c_matrix{ $_ }, "\n";
    }

    # output to console
    $nerve->display_confusion_matrix( \%c_matrix, {
        zero_as => "bad apples",  # cat milk green etc.
        one_as  => "good apples", # dog honey pink etc.
    } );

    # saving and loading data of perceptron locally
    # NOTE: nerve data is automatically saved after each training process
    use AI::Perceptron::Simple ":local_data";

    my $nerve_file = "apples.nerve";
    preserve( ... );
    save_perceptron( $nerve, $nerve_file );

    # load data of perceptron for use in actual program
    my $apple_nerve = revive( ... );
    # or
    my $apple_nerve = load_perceptron( $nerve_file );

    # for portability of nerve data
    use AI::Perceptron::Simple ":portable_data";

    my $yaml_nerve_file = "pearls.yaml";
    preserve_as_yaml ( ... );
    save_perceptron_yaml ( $nerve, $yaml_nerve_file );

    # load nerve data on the other computer
    my $pearl_nerve = revive_from_yaml ( ... );
    # or
    my $pearl_nerve = load_perceptron_yaml ( $yaml_nerve_file );

    # processing data
    use AI::Perceptron::Simple ":process_data";
    shuffle_stimuli ( ... );
    shuffle_data ( ORIGINAL_STIMULI, $new_file_1, $new_file_2, ... );
    shuffle_data ( $original_stimuli => $new_file_1, $new_file_2, ... );
Nothing is exported by default.
All the subroutines from the DATA PROCESSING RELATED SUBROUTINES, NERVE DATA RELATED SUBROUTINES and NERVE PORTABILITY RELATED SUBROUTINES sections can be imported through tags or by naming them explicitly.
The tags available include the following:
:process_data
:local_data
:portable_data
Most of the functionality is object-oriented (OO).
This module provides methods to build, train, validate and test a perceptron. It can also save the data of the perceptron for future use in actual AI programs.
This module also aims to help newbies grasp the concepts of perceptrons, training, validation and testing. Hence, all the methods and subroutines in this module are decoupled as much as possible so that the actual scripts can be written as simple, complete programs.
The implementation here is super basic: it only takes in the input of the dendrites and calculates the output. If the output is higher than the threshold, the final result (category) will be 1, aka the perceptron is activated. If not, the result will be 0 (not activated).
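The activation rule above can be sketched in a few lines of Perl. This is a minimal illustration, not the module's own code: the helper name activate, the dendrite names and the weights are made up, and it assumes that "higher than the threshold" means strictly greater.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical helper: adds up the weight of every dendrite whose input is 1,
# then compares that sum against the threshold.
sub activate {
    my ( $weights, $inputs, $threshold ) = @_;
    my $sum = sum0( map { $weights->{$_} } grep { $inputs->{$_} } keys %$weights );
    return ( $sum > $threshold ) ? 1 : 0;    # 1 = activated, 0 = not activated
}

# made-up dendrites and weights for illustration
my %weights = ( has_pictures => 0.4, is_fiction => 0.3, hard_cover => 0.2 );
my %row     = ( has_pictures => 1,   is_fiction => 1,   hard_cover => 0 );
print activate( \%weights, \%row, 0.5 ), "\n";    # sum of active weights is 0.7
```

Only dendrites whose input is 1 contribute to the sum, which is why the data columns used as dendrites must be binary.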
Depending on how you view or categorize the final result, the perceptron will fine-tune itself (aka train) based on the learning rate until the desired result is met. From here on, everything is mathematics and numbers that only make sense to the computer, not to humans.
Whenever the perceptron fine-tunes itself, it will increase/decrease all the dendrites that are significant (attributes labelled 1) for each input. This means that even when the perceptron successfully fine-tunes itself to suit all the data in your file in the first round, it might still get some of the data wrong in the next round of training, because tuning for later rows can shift the weights that earlier rows relied on. Therefore, the perceptron should be trained for as many rounds as possible. The more "confusion" the perceptron is able to handle correctly, the more "mature" it is. No one defines how "mature" it is except the programmer himself/herself :)
Please take note that not all subroutines/methods need to be used to make things work. All the subroutines and methods are listed here for the sake of completeness of the documentation.
Private methods/subroutines are prefixed with _ or &_ and they aren't meant to be called directly. You can if you want to. There are quite a number of them to be honest, just ignore them if you happen to see them :)
Synonyms are placed before the actual ie. technical subroutines/methods. You will see ... as the parameters if they are synonyms. Move to the next subroutine/method until you find something like \%options as the parameter or anything that isn't ... for the description.
This module can only process CSV files.
Any field (ie. column) that will be used for processing must be binary, ie. 0 or 1 only. Your dataset can contain other columns with non-binary data as long as they are not one of the dendrites.
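Since non-binary values in a dendrite column would silently distort the sums, it can help to check a file before training. The sketch below is a hypothetical pre-flight check, not part of AI::Perceptron::Simple: the name find_non_binary_cells is made up, and it assumes a plain CSV file with a header row, comma separators and no quoted fields (a real CSV parser would be safer for anything more complex).

```perl
use strict;
use warnings;

# Hypothetical check: reports every dendrite cell that is not exactly 0 or 1.
sub find_non_binary_cells {
    my ( $csv_file, @dendrites ) = @_;
    open my $fh, '<', $csv_file or die "Cannot open $csv_file: $!";
    my $header = <$fh>;
    die "$csv_file is empty\n" unless defined $header;
    $header =~ s/\r?\n\z//;
    my @headers = split /,/, $header;
    my %index_of;
    @index_of{@headers} = 0 .. $#headers;    # column name => column index

    for my $name (@dendrites) {
        die "Column '$name' not found in $csv_file\n" unless exists $index_of{$name};
    }

    my @problems;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;
        my @fields = split /,/, $line, -1;    # -1 keeps trailing empty fields
        for my $name (@dendrites) {
            my $value = $fields[ $index_of{$name} ];
            next if defined $value && $value =~ /\A[01]\z/;
            push @problems, sprintf 'line %d: %s = %s', $., $name,
                defined $value ? $value : '(missing)';
        }
    }
    close $fh;
    return @problems;    # an empty list means every dendrite column is binary
}
```

Usage would look like my @problems = find_non_binary_cells( $training_data_csv, @dendrites ); with training only going ahead when @problems is empty. Non-dendrite columns are ignored, matching the rule above.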
There are some sample datasets in the t directory. The original dataset can also be found in docs/book_list.csv. The files can also be found here.
The perceptron/neuron data is stored using the Storable module.
See Portability of Nerve Data section below for more info on some known issues.
These subroutines can be imported using the tag :process_data.
These subroutines should be called in the procedural way.
The parameters and usage are the same as shuffled_data. See the next two subroutines.
Shuffles $original_data or ORIGINAL_DATA and saves them to other files.
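One plausible reading of "shuffles and saves them to other files" is sketched below. It is an assumption-laden illustration, not the module's shuffle_data: the name shuffle_csv_rows is hypothetical, and it assumes the header row should stay first while every output file receives its own independently shuffled copy of the data rows. It uses shuffle from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical sketch: keeps the header row first, shuffles the data rows
# and writes an independently shuffled copy to every output file.
sub shuffle_csv_rows {
    my ( $original, @new_files ) = @_;
    open my $in, '<', $original or die "Cannot open $original: $!";
    my ( $header, @rows ) = <$in>;
    close $in;
    die "$original is empty\n" unless defined $header;

    # make sure the last row ends with a newline before it gets moved around
    for my $row (@rows) {
        $row .= "\n" unless $row =~ /\n\z/;
    }

    for my $file (@new_files) {
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $header, shuffle(@rows);
        close $out;
    }
    return scalar @rows;    # number of data rows that were shuffled
}
```

For example, shuffle_csv_rows( $original_data, $new_file_1, $new_file_2 ) would leave the original untouched and produce two differently ordered copies, which is handy for training several rounds on varied row orders.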
Creates a brand new perceptron and initializes the value of each attribute / dendrite aka. weight. Think of it as the thickness or plasticity of the dendrites.
For %options, the following keys are required unless stated otherwise:
initial_value: The value or thickness of ALL the dendrites when a new perceptron is created.
Generally speaking, this value is between 0 and 1. However, it all depends on your combination of numbers for the other options.
attribs: An array reference containing the names of all the attributes / dendrites. Yes, give them some names :)
learning_rate: Optional. The default is 0.05.
The learning rate of the perceptron for the fine-tuning process.
This value is usually between 0 and 1. However, it all depends on your combination of numbers for the other options.
threshold: Optional. The default is 0.5.
This is the passing rate to determine the neuron output (0 or 1).
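To make the options concrete, here is a minimal sketch of the data a new perceptron could start with. It is not the module's constructor: the sub name new_perceptron_data and the weights key are hypothetical. It only mirrors the documented behaviour: every dendrite in attribs gets the same initial_value, and learning_rate and threshold fall back to 0.05 and 0.5.

```perl
use strict;
use warnings;

# Hypothetical sketch of initialising perceptron data from %options.
sub new_perceptron_data {
    my ($options) = @_;
    for my $required (qw(initial_value attribs)) {
        die "$required is required\n" unless exists $options->{$required};
    }
    return {
        learning_rate => $options->{learning_rate} // 0.05,    # documented default
        threshold     => $options->{threshold}     // 0.5,     # documented default
        # every dendrite starts with the same thickness
        weights => { map { $_ => $options->{initial_value} } @{ $options->{attribs} } },
    };
}

my $data = new_perceptron_data( { initial_value => 0.01, attribs => [qw(has_pictures is_fiction)] } );
printf "%s => %s\n", $_, $data->{weights}{$_} for sort keys %{ $data->{weights} };
```

Using // rather than || keeps an explicit learning_rate or threshold of 0 instead of replacing it with the default.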
Obtains a hash of all the attributes of the perceptron.
If $value is given, sets the learning rate to $value. If not, then it returns the learning rate.
If $value is given, sets the threshold / passing rate to $value. If not, then it returns the passing rate.
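Both accessors follow the common Perl "combined getter/setter" pattern. The class My::Nerve below is purely hypothetical and exists only to show the pattern; it assumes "given" means a defined argument was passed.

```perl
package My::Nerve;    # hypothetical class, for illustration only
use strict;
use warnings;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# With an argument it stores the new value; without one it returns the current value.
sub learning_rate {
    my ( $self, $value ) = @_;
    if ( defined $value ) {
        $self->{learning_rate} = $value;
        return;
    }
    return $self->{learning_rate};
}

package main;

my $nerve = My::Nerve->new( learning_rate => 0.05 );
$nerve->learning_rate(0.3);
print $nerve->learning_rate, "\n";    # prints 0.3
```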
All the training methods here have the same parameters as the two forms of the actual train method, and they all do the same thing. They are also used in the same way.
Trains the perceptron.
$stimuli_train_csv is the set of data / input (in CSV format) used to train the perceptron, while $save_nerve_to_file is the filename that will be generated each time the perceptron finishes the training process. This file holds the data of the AI::Perceptron::Simple object and is used in the validate method.
$expected_output_header is the header name of the column in the CSV file with the actual category or the expected values. It is used to determine whether to tune the nerve up or down. For the sake of simplicity, this value should only be 0 or 1.
$display_stats is optional and the default is 0. When enabled, it displays more output about the tuning process, showing the following:
Whether the nerve was tuned up, tuned down or needed no tuning
The original sum of all weightage * input or dendrite_size * binary_input
The threshold of the nerve
The new sum of all weightage * input after fine-tuning the nerve
If $display_stats is specified ie. set to 1, then you MUST specify the $identifier. $identifier is the column / header name that is used to identify a specific row of data in $stimuli_train_csv.
Calculates and returns the sum(weightage*input) for each individual row of data. Actually, it just adds up all the existing weights, since the input is always 1 for now :)
%stimuli_hash is the actual data to be used for training. It might contain useless columns.
This will get all the available dendrites using the get_attributes method and then use all the keys ie. headers to access the corresponding values.
This subroutine should be called in the procedural way for now.
Fine tunes the nerve. This will directly alter the attributes values in $self according to the attributes / dendrites specified in new.
The %stimuli_hash here is the same as the one in the _calculate_output method.
%stimuli_hash will be used to determine which dendrite in $self needs to be fine-tuned. As long as the value of any key in %stimuli_hash returns true (1) then that dendrite in $self will be tuned.
Tuning up or down depends on $tune_up_or_down, as specified by the train method. The following constants can be used for $tune_up_or_down:
Value is 1
Value is 0
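The tuning step, together with the decision of which direction to tune, can be sketched as follows. This is an illustration under assumptions, not the module's code: the constant names TUNE_UP and TUNE_DOWN, the helpers tune and train_one_round, and the layout of $nerve are hypothetical; only the values 1 and 0 come from the list above. It also assumes the usual perceptron rule: tune up when the nerve outputs 0 but 1 was expected, tune down in the opposite case.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical names for the two tuning directions (values as documented).
use constant { TUNE_UP => 1, TUNE_DOWN => 0 };

# Every dendrite whose input in the row is true (1) moves by the learning rate;
# the other dendrites are left alone.
sub tune {
    my ( $nerve, $row, $tune_up_or_down ) = @_;
    my $step = $tune_up_or_down == TUNE_UP ? $nerve->{learning_rate} : -$nerve->{learning_rate};
    for my $dendrite ( keys %{ $nerve->{weights} } ) {
        $nerve->{weights}{$dendrite} += $step if $row->{$dendrite};
    }
    return;
}

# One pass over the data: compute each output and tune only when it is wrong.
sub train_one_round {
    my ( $nerve, $rows, $expected_header ) = @_;
    my $tuned = 0;
    for my $row (@$rows) {
        my $sum = sum0( map { $row->{$_} ? $nerve->{weights}{$_} : 0 } keys %{ $nerve->{weights} } );
        my $output = $sum > $nerve->{threshold} ? 1 : 0;
        next if $output == $row->{$expected_header};
        tune( $nerve, $row, $row->{$expected_header} ? TUNE_UP : TUNE_DOWN );
        $tuned++;
    }
    return $tuned;    # 0 means every row was already classified correctly
}

my $nerve = { threshold => 0.5, learning_rate => 0.1, weights => { a => 0.2, b => 0.2 } };
my @rows  = ( { a => 1, b => 0, expected => 1 }, { a => 0, b => 1, expected => 0 } );
for my $round ( 1 .. 50 ) {
    last if train_one_round( $nerve, \@rows, 'expected' ) == 0;
}
```

Repeating rounds until nothing needs tuning reflects the advice above to train for as many rounds as needed; the cap of 50 rounds simply stops data that can never be separated from looping forever.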
All the validation methods here have the same parameters as the actual validate method and they all do the same stuff. They are also used in the same way.
This method validates the perceptron against another set of data after it has undergone the training process.
This method calculates the output of each row of data and writes the result into the predicted column. The data being written into the new file or the original file will maintain its sequence.
Please take note that this method loads all the data of the validation stimuli into memory, so if possible, split large stimuli into multiple files and call this method once for each file.
stimuli_validate: This is the CSV file containing the validation data. Make sure that it contains a column for the predicted values, as it is needed by the next key: predicted_column_index
predicted_column_index: This is the index of the column that contains the predicted output values. $index starts from 0.
This column will be filled with binary numbers and the full new data will be saved to the file specified in the results_write_to key.
results_write_to: Optional.
The default behaviour will write the predicted output back into stimuli_validate ie the original data. The sequence of the data will be maintained.
*This method will call _real_validate_or_test to do the actual work.
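The overall idea of validation (compute each row's output, write it into the column at predicted_column_index, keep the row order, write to results_write_to or back to the original file) can be sketched as below. This is not _real_validate_or_test: the sub name fill_predictions and the $nerve layout are hypothetical, only the three option keys come from this section, and it assumes a plain CSV file with a header row and no quoted fields.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical sketch of filling in predicted values, one row at a time.
sub fill_predictions {
    my ( $nerve, $opt ) = @_;
    my $in_file  = $opt->{stimuli_validate};
    my $out_file = $opt->{results_write_to} // $in_file;    # default: overwrite the original
    my $index    = $opt->{predicted_column_index};          # counts from 0

    open my $in, '<', $in_file or die "Cannot open $in_file: $!";
    chomp( my @lines = <$in> );    # the whole file is loaded into memory
    close $in;

    my $header_line = shift @lines;
    my @headers     = split /,/, $header_line;
    my @out         = ($header_line);
    for my $line (@lines) {
        my @fields = split /,/, $line, -1;    # -1 keeps an empty predicted column
        my %row;
        @row{@headers} = @fields;
        my $sum = sum0( map { $row{$_} ? $nerve->{weights}{$_} : 0 } keys %{ $nerve->{weights} } );
        $fields[$index] = $sum > $nerve->{threshold} ? 1 : 0;
        push @out, join ',', @fields;
    }

    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} map {"$_\n"} @out;    # same row order as the input
    close $out;
    return scalar @lines;              # number of rows predicted
}
```

Because the whole file is read before anything is written, overwriting the original file in place is safe here, at the cost of the memory use mentioned above.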
All the testing methods here have the same parameters as the actual test method and they all do the same stuff. They are also used in the same way.
This method is used to put the trained nerve to the test. You can think of it as deploying the nerve for the actual work or maybe putting the nerve into an empty brain and see how well the brain survives :)
This method works and behaves the same way as the validate method. See validate for the details.
*This method will call &_real_validate_or_test to do the actual work.
This is where the actual validation or testing takes place.
$data_hash_ref is the list of parameters passed into the validate or test methods.
This is a method, so use the OO way. This is one of the exceptions to the rules where private subroutines are treated as methods :)
This is where the filling in of the predicted values takes place. Take note that the parameter names are the same as the ones used in the validate and test methods.
This subroutine should be called in the procedural way.
This part is related to generating the confusion matrix.
The parameters and usage are the same as get_confusion_matrix. See the next method.
Returns the confusion matrix in the form of a hash. The hash will contain these keys: true_positive, true_negative, false_positive, false_negative, total_entries, accuracy, sensitivity. More stats like precision, specificity and F1_Score can be obtained by setting the optional more_stats key to 1.
If you are trying to manipulate the confusion matrix hash or something, take note that all the stats are percentages (%), possibly with decimals, except the total entries.
full_data_file: This is the CSV file filled with the predicted values.
Make sure that you don't do anything to the actual and predicted output in this file after testing the nerve. These two columns must contain binary values only!
The binary values are treated as follows:
more_stats: Setting it to 1 will process more stats that are usually not so important, eg. precision, specificity and F1_Score.
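The core of the confusion matrix is just counting how the actual and predicted columns agree. The sketch below is not the module's code: the sub name basic_confusion_matrix is hypothetical, and because the list describing how the binary values are treated is not shown here, it simply assumes 1 is the positive class and 0 the negative class. Accuracy and sensitivity are expressed as percentages, and undefined ratios are reported as 0 (also an assumption).

```perl
use strict;
use warnings;

# Hypothetical sketch: counts the four outcomes, then derives the basic stats.
sub basic_confusion_matrix {
    my ( $actual, $predicted ) = @_;    # two array refs of 0/1 values, same length
    my %c = map { $_ => 0 } qw(true_positive true_negative false_positive false_negative);
    for my $i ( 0 .. $#$actual ) {
        my ( $act, $pred ) = ( $actual->[$i], $predicted->[$i] );
        if    ( $act == 1 && $pred == 1 ) { $c{true_positive}++ }
        elsif ( $act == 0 && $pred == 0 ) { $c{true_negative}++ }
        elsif ( $act == 0 && $pred == 1 ) { $c{false_positive}++ }
        else                              { $c{false_negative}++ }
    }
    $c{total_entries} = scalar @$actual;
    my $positives = $c{true_positive} + $c{false_negative};
    $c{accuracy} = $c{total_entries}
        ? ( $c{true_positive} + $c{true_negative} ) / $c{total_entries} * 100
        : 0;
    $c{sensitivity} = $positives ? $c{true_positive} / $positives * 100 : 0;
    return %c;
}

my %c = basic_confusion_matrix( [ 1, 1, 0, 0, 1 ], [ 1, 0, 0, 1, 1 ] );
print "$_ => $c{$_}\n"
    for qw(true_positive true_negative false_positive false_negative total_entries accuracy sensitivity);
```

In this example there are 2 true positives, 1 true negative, 1 false positive and 1 false negative, so accuracy is 3 out of 5 entries (60%) and sensitivity is 2 out of 3 actual positives.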
Generates a hash of confusion matrix based on %options given in the get_confusion_matrix method.
Calculates and adds the data for the total_entries key in the confusion matrix hash.
Calculates and adds the data for the accuracy key in the confusion matrix hash.
Calculates and adds the data for the sensitivity key in the confusion matrix hash.
Calculates and adds the data for the precision key in the confusion matrix hash.
Calculates and adds the data for the specificity key in the confusion matrix hash.
Calculates and adds the data for the F1_Score key in the confusion matrix hash.
Calculates and adds the data for the negative_predicted_value key in the confusion matrix hash.
Calculates and adds the data for the false_negative_rate key in the confusion matrix hash.
Calculates and adds the data for the false_positive_rate key in the confusion matrix hash.
Calculates and adds the data for the false_discovery_rate key in the confusion matrix hash.
Calculates and adds the data for the false_omission_rate key in the confusion matrix hash.
Calculates and adds the data for the balanced_accuracy key in the confusion matrix hash.
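For reference, the extra stats above can be computed from the four counts with the standard textbook definitions. The sketch is hypothetical (the sub name extended_stats is made up) and may differ from the module in details such as rounding or how division by zero is handled; here every ratio becomes a percentage and undefined ratios are reported as 0.

```perl
use strict;
use warnings;

# Hypothetical sketch using the standard definitions, expressed as percentages.
sub extended_stats {
    my (%c) = @_;
    my ( $tp, $tn, $fp, $fn ) = @c{qw(true_positive true_negative false_positive false_negative)};
    my $pct = sub { my ( $num, $den ) = @_; return $den ? $num / $den * 100 : 0 };
    my %s = (
        sensitivity              => $pct->( $tp, $tp + $fn ),
        precision                => $pct->( $tp, $tp + $fp ),
        specificity              => $pct->( $tn, $tn + $fp ),
        F1_Score                 => $pct->( 2 * $tp, 2 * $tp + $fp + $fn ),
        negative_predicted_value => $pct->( $tn, $tn + $fn ),
        false_negative_rate      => $pct->( $fn, $fn + $tp ),
        false_positive_rate      => $pct->( $fp, $fp + $tn ),
        false_discovery_rate     => $pct->( $fp, $fp + $tp ),
        false_omission_rate      => $pct->( $fn, $fn + $tn ),
    );
    $s{balanced_accuracy} = ( $s{sensitivity} + $s{specificity} ) / 2;
    return %s;
}

my %stats = extended_stats( true_positive => 2, true_negative => 1, false_positive => 1, false_negative => 1 );
printf "%-24s => %.2f\n", $_, $stats{$_} for sort keys %stats;
```

Several of these come in complementary pairs, for example sensitivity and false_negative_rate always add up to 100%, which is a quick sanity check when reading the output.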
The parameters are the same as display_confusion_matrix. See the next method.
Displays the confusion matrix. If %confusion_matrix has more_stats elements, they will be displayed as well. The default elements, ie. accuracy and sensitivity, must be present, while the rest can be absent.
%confusion_matrix is the same confusion matrix returned by the get_confusion_matrix method.
For %labels, since 0's and 1's won't make much sense as output labels in most cases, the following keys must be specified: zero_as and one_as.
Please take note that non-ASCII characters, ie. non-English alphabets, might misalign the output :)
For the %labels, there is no need to enter "actual X", "predicted X" etc. It will be prefixed with A: for actual and P: for the predicted values by default.
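The labelling convention can be illustrated with a plain-text table built with sprintf. This is only a sketch and not the module's output: the module uses Text::Matrix, while the hypothetical sub format_confusion_matrix below uses plain sprintf and assumes a layout of predicted labels as rows and actual labels as columns, with 1 treated as the positive class.

```perl
use strict;
use warnings;

# Hypothetical plain-text layout: rows are predicted (P:), columns are actual (A:).
sub format_confusion_matrix {
    my ( $c, $labels ) = @_;
    my ( $zero, $one ) = @{$labels}{qw(zero_as one_as)};
    my @rows = (
        [ '',         "A: $zero",           "A: $one" ],
        [ "P: $zero", $c->{true_negative},  $c->{false_negative} ],
        [ "P: $one",  $c->{false_positive}, $c->{true_positive} ],
    );
    my $width = 0;    # widest cell decides the column width
    for my $cell ( map {@$_} @rows ) {
        $width = length($cell) if length($cell) > $width;
    }
    return join '', map { join( ' | ', map { sprintf '%-*s', $width, $_ } @$_ ) . "\n" } @rows;
}

my %c_matrix = ( true_positive => 2, true_negative => 1, false_positive => 1, false_negative => 1 );
print format_confusion_matrix( \%c_matrix, { zero_as => 'bad apples', one_as => 'good apples' } );
```

Padding is based on character counts, which is why wide non-ASCII characters can still throw the columns off, as noted above.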
Builds the matrix using the Text::Matrix module.
$c_matrix and $labels are the same as the ones passed to display_exam_results and display_confusion_matrix.
Returns a list ( $matrix, $c_matrix ) which can directly be passed to _print_extended_matrix.
Extends and outputs the matrix on the screen.
$matrix and $c_matrix are the same as returned by &_build_matrix.
This part is about saving the data of the nerve. These subroutines can be imported using the :local_data tag.
The subroutines are to be called in the procedural way. No checking is done currently.
See PERCEPTRON DATA and KNOWN ISSUES sections for more details on the subroutines in this section.
The parameters and usage are the same as save_perceptron. See the next subroutine.
Saves the AI::Perceptron::Simple object into a Storable file. There shouldn't be a need to call this subroutine manually, since it is called automatically after every training process.
The parameters and usage are the same as load_perceptron. See the next subroutine.
Loads the data and turns it into an AI::Perceptron::Simple object as the return value.
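Saving and loading with Storable boils down to its store and retrieve functions. The sketch below is a hypothetical pair of helpers (save_nerve and load_nerve are made-up names, and My::Nerve is a made-up class), not the module's save_perceptron and load_perceptron. It shows that a blessed object comes back blessed into the same class with its data intact.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helpers: store() writes the whole data structure to a file,
# retrieve() reads it back as a fresh reference (blessing included).
sub save_nerve {
    my ( $nerve, $file ) = @_;
    store( $nerve, $file ) or die "Cannot store nerve data to $file\n";
    return;
}

sub load_nerve {
    my ($file) = @_;
    return retrieve($file);
}
```

A typical round trip would be save_nerve( $nerve, "apples.nerve" ) after training and my $apple_nerve = load_nerve("apples.nerve") in the program that uses it. The resulting file is in Storable's binary format, which is the source of the cross-version caveat described under KNOWN ISSUES.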
These subroutines can be imported using the :portable_data tag.
The file type currently supported is YAML. Please be careful with the data, as you don't want the nerve data to be accidentally modified.
The parameters and usage are the same as save_perceptron_yaml. See the next subroutine.
Saves the AI::Perceptron::Simple object into a YAML file.
Loads the YAML data and turns it into an AI::Perceptron::Simple object as the return value.
These are the to-do's that MIGHT be done in the future. Don't put too much hope in them please :)
Clean up and refactor source codes
Add more useful data for confusion matrix
Implement shuffling data feature
Implement fast/smart training feature
Write a tutorial or something for this module
and something yet to be known...
Take note that the Storable nerve data is not compatible across different versions.
If you really need to send the nerve data to different computers with different versions of the Storable module, see the docs of the following subroutines:
&preserve_as_yaml or &save_perceptron_yaml for storing data.
&revive_from_yaml or &load_perceptron_yaml for retrieving the data.
Raphael Jong Jun Jie, <ellednera at cpan.org>
Please report any bugs or feature requests to bug-ai-perceptron-simple at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=AI-Perceptron-Simple. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc AI::Perceptron::Simple
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
https://rt.cpan.org/NoAuth/Bugs.html?Dist=AI-Perceptron-Simple
CPAN Ratings
https://cpanratings.perl.org/d/AI-Perceptron-Simple
Search CPAN
https://metacpan.org/release/AI-Perceptron-Simple
Besiyata d'shmaya, Wikipedia
AI::Perceptron, Text::Matrix, YAML
This software is Copyright (c) 2021 by Raphael Jong Jun Jie.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)
To install AI::Perceptron::Simple, copy and paste the appropriate command into your terminal.
cpanm
cpanm AI::Perceptron::Simple
CPAN shell
perl -MCPAN -e shell
install AI::Perceptron::Simple
For more information on module installation, please visit the detailed CPAN module installation guide.