NAME

AI::XGBoost - Perl wrapper for XGBoost library https://github.com/dmlc/xgboost

VERSION

version 0.11

SYNOPSIS

 use 5.010;
 use aliased 'AI::XGBoost::DMatrix';
 use AI::XGBoost qw(train);
 
 # We are going to solve a binary classification problem:
 #  whether a mushroom is poisonous or not
 
 my $train_data = DMatrix->From(file => 'agaricus.txt.train');
 my $test_data = DMatrix->From(file => 'agaricus.txt.test');
 
 # With XGBoost we can solve this problem using the 'gbtree' booster
 #  (Gradient Boosting Regression Tree) with the logistic regression
 #  loss function 'binary:logistic'
 # The XGBoost tree booster has many parameters that we can tune
 # (https://github.com/dmlc/xgboost/blob/master/doc/parameter.md)
 
 my $booster = train(data => $train_data, number_of_rounds => 10, params => {
         objective => 'binary:logistic',
         eta => 1.0,
         max_depth => 2,
         silent => 1
     });
 
 # For binary classification, predictions are probabilities in [0, 1]
 #  that the label is positive (1 in the first column of agaricus.txt.test)
 my $predictions = $booster->predict(data => $test_data);
 
 say join "\n", @$predictions[0 .. 10];

 use aliased 'AI::XGBoost::DMatrix';
 use AI::XGBoost qw(train);
 use Data::Dataset::Classic::Iris;
 
 # We are going to solve a multiclass classification problem:
 #  determining plant species from a set of flower measurements
 
 # XGBoost expects numeric class labels, so we map each species to a number
 my %class = (
     setosa => 0,
     versicolor => 1,
     virginica => 2
 );
 
 my $iris = Data::Dataset::Classic::Iris::get();
 
 # Separate features from labels. For simplicity, this example uses the
 #  full dataset for both training and testing
 my $train_dataset = [map {$iris->{$_}} grep {$_ ne 'species'} keys %$iris];
 my $test_dataset = [map {$iris->{$_}} grep {$_ ne 'species'} keys %$iris];
 
 sub transpose {
 # Transposing without using PDL, Data::Table, Data::Frame or other modules
 # to keep minimal dependencies
     my $array = shift;
     my @aux = ();
     for my $row (@$array) {
         for my $column (0 .. scalar @$row - 1) {
             push @{$aux[$column]}, $row->[$column];
         }
     }
     return \@aux;
 }
 
 $train_dataset = transpose($train_dataset);
 $test_dataset = transpose($test_dataset);
 
 my $train_label = [map {$class{$_}} @{$iris->{'species'}}];
 my $test_label = [map {$class{$_}} @{$iris->{'species'}}];
 
 my $train_data = DMatrix->From(matrix => $train_dataset, label => $train_label);
 my $test_data = DMatrix->From(matrix => $test_dataset, label => $test_label);
 
 # Multiclass problems need a different objective function and the number
 #  of classes; in this case we are using 'multi:softprob' and
 #  num_class => 3
 my $booster = train(data => $train_data, number_of_rounds => 20, params => {
         max_depth => 3,
         eta => 0.3,
         silent => 1,
         objective => 'multi:softprob',
         num_class => 3
     });
 
 my $predictions = $booster->predict(data => $test_data);
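
With 'multi:softprob' each row gets one probability per class rather than a single score. The sketch below turns such output into class indices. It assumes predict returns either one array reference of num_class probabilities per row, or a flat list of rows * num_class values (the layout the underlying XGBoost library produces). The helper name predicted_classes and the sample numbers are hypothetical and not part of AI::XGBoost.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::XGBoost.
# Accepts either [[p0, p1, p2], ...] or a flat [p0, p1, p2, p0, ...]
# and returns the index of the most probable class for each row.
sub predicted_classes {
    my ($predictions, $num_class) = @_;
    my @rows;
    if (@$predictions && ref($predictions->[0]) eq 'ARRAY') {
        @rows = @$predictions;
    }
    else {
        # Reshape the flat list into rows of $num_class probabilities
        my @flat = @$predictions;
        push @rows, [splice(@flat, 0, $num_class)] while @flat;
    }
    my @classes;
    for my $row (@rows) {
        my $best = 0;
        for my $i (1 .. $#{$row}) {
            $best = $i if $row->[$i] > $row->[$best];
        }
        push @classes, $best;
    }
    return \@classes;
}

# Made-up probabilities for two rows and three classes
my $nested = predicted_classes([[0.9, 0.05, 0.05], [0.1, 0.2, 0.7]], 3);
my $flat   = predicted_classes([0.9, 0.05, 0.05, 0.1, 0.2, 0.7], 3);
print "@$nested\n";    # prints "0 2"
print "@$flat\n";      # prints "0 2"
```

The resulting indices can be compared with the numeric labels built from %class to estimate accuracy.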

DESCRIPTION

Perl wrapper for XGBoost library.

The easiest way to use the wrapper is the train function, but first the data must be loaded into an AI::XGBoost::DMatrix object.
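
Since the data has to be prepared before it is wrapped in a DMatrix, that is also a natural point to hold out a test set. A minimal pure-Perl sketch follows. The helper name split_rows, the every-Nth-row rule and the toy rows are assumptions for illustration; only the DMatrix->From(matrix => ..., label => ...) call mentioned in the final comment comes from the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of AI::XGBoost.
# Sends every $nth row (and its label) to the test set, the rest to training.
sub split_rows {
    my ($rows, $labels, $nth) = @_;
    my (%train, %test);
    for my $i (0 .. $#{$rows}) {
        my $set = ($i % $nth == $nth - 1) ? \%test : \%train;
        push @{ $set->{matrix} }, $rows->[$i];
        push @{ $set->{label} },  $labels->[$i];
    }
    return (\%train, \%test);
}

# Toy data: five rows of two features each, with numeric class labels
my @rows   = ([5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0], [6.3, 3.3]);
my @labels = (0, 0, 1, 1, 2);

my ($train, $test) = split_rows(\@rows, \@labels, 2);
printf "train rows: %d, test rows: %d\n",
    scalar @{ $train->{matrix} }, scalar @{ $test->{matrix} };

# Each half could then be passed on as in the SYNOPSIS, e.g.
#   DMatrix->From(matrix => $train->{matrix}, label => $train->{label});
```

A fixed stride keeps the split reproducible between runs; a shuffled split would need a seeded random permutation of the row indices instead.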

This is a work in progress. Feedback, comments, issues, suggestions and pull requests are welcome!

The XGBoost library is provided through Alien::XGBoost, which downloads, compiles and installs it if it is not already available on your system.

FUNCTIONS

train

Performs gradient boosting using the given data and parameters.

Returns a trained AI::XGBoost::Booster.

Parameters

params

Parameters for the booster object.

Full list available: https://github.com/dmlc/xgboost/blob/master/doc/parameter.md

data

AI::XGBoost::DMatrix object used for training

number_of_rounds

Number of boosting iterations

ROADMAP

The goal is to make a full wrapper for XGBoost.

VERSIONS

0.2

Full C API that is "easy" to use, with PDL support, as AI::XGBoost::CAPI.

"Easy" means clients don't have to use FFI::Platypus or modules dealing with C structures.

0.25

Alien package for libxgboost.so/xgboost.dll

0.3

Moose-based object-oriented API with DMatrix and Booster classes

0.4

Complete object-oriented API

0.5

Use Perl subroutine signatures (https://metacpan.org/pod/distribution/perl/pod/perlexperiment.pod#Subroutine-signatures)

SEE ALSO

AI::MXNet
FFI::Platypus
NativeCall

AUTHOR

Pablo Rodríguez González <pablo.rodriguez.gonzalez@gmail.com>

COPYRIGHT AND LICENSE

Copyright (c) 2017 by Pablo Rodríguez González.

CONTRIBUTOR

Ruben <me@ruben.tech>