NAME
Statistics::Cluto - Perl binding for CLUTO
INSTALLATION
Download CLUTO from http://glaros.dtc.umn.edu/gkhome/views/cluto.
Find the libcluto.a that matches your environment and place it under your library path (or specify its path with the LIBS option as shown below).
Then do:
perl Makefile.PL [LIBS='-L/where/to/find/libcluto.a -lcluto']
make
make test
make install
Tested with cluto-2.1.2/Darwin-i386, cluto-2.1.2/Darwin-ppc and cluto-2.1.1/Linux-i686.
SYNOPSIS
use Statistics::Cluto;
use Data::Dumper;
my $c = Statistics::Cluto->new;
$c->set_dense_matrix(4, 5, [
[8, 8, 0, 3, 2],
[2, 9, 9, 1, 4],
[7, 6, 1, 2, 3],
[1, 7, 8, 2, 1]
]);
$c->set_options({
rowlabels => [ 'row0', 'row1', 'row2', 'row3' ],
collabels => [ 'col0', 'col1', 'col2', 'col3', 'col4' ],
nclusters => 2,
rowmodel => CLUTO_ROWMODEL_NONE,
colmodel => CLUTO_COLMODEL_NONE,
pretty_format => 1,
});
my $clusters = $c->VP_ClusterRB;
print Dumper $clusters;
my $cluster_features = $c->V_GetClusterFeatures;
print Dumper $cluster_features;
DESCRIPTION
This is a Perl binding for CLUTO. Please refer to sections 5.6 - 5.8 of the CLUTO manual for details of each function. Basically, Statistics::Cluto has a corresponding method for each function described there.
loading matrix
The initial matrix can be set with either the set_dense_matrix or the set_sparse_matrix method.
# loading 4x5 dense matrix
#
# 1 1 0 1 1
# 1 0 0 1 0
# 0 1 1 0 0
# 0 0 1 0 0
my $c = Statistics::Cluto->new;
my $nrows = 4;
my $ncols = 5;
my $rowval = [
[1, 1, 0, 1, 1],
[1, 0, 0, 1, 0],
[0, 1, 1, 0, 0],
[0, 0, 1, 0, 0]
];
$c->set_dense_matrix($nrows, $ncols, $rowval);
# loading 4x5 sparse matrix
#
# 1 1 0 1 1
# 1 0 0 1 0
# 0 1 1 0 0
# 0 0 1 0 0
my $c = Statistics::Cluto->new;
my $nrows = 4;
my $ncols = 5;
my $rowval = [
[1, 1, 2, 1, 4, 1, 5, 1],
[1, 1, 4, 1],
[2, 1, 3, 1],
[3, 1]
];
$c->set_sparse_matrix($nrows, $ncols, $rowval);
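Judging from the example above, each sparse row is a flat list of (column, value) pairs with columns counted from 1. The following is a minimal sketch of building that layout from a dense matrix; dense_to_sparse_rows is a hypothetical helper for illustration and is not part of Statistics::Cluto.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Statistics::Cluto): turn dense rows
# into the [col, val, col, val, ...] layout, assuming 1-based column numbers
# as the example above suggests.
sub dense_to_sparse_rows {
    my ($dense) = @_;
    my @sparse;
    for my $row (@$dense) {
        my @pairs;
        for my $j (0 .. $#$row) {
            next if $row->[$j] == 0;            # zero entries are omitted
            push @pairs, $j + 1, $row->[$j];    # column number, then value
        }
        push @sparse, \@pairs;
    }
    return \@sparse;
}

my $dense = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
];
my $sparse_rows = dense_to_sparse_rows($dense);
```

The resulting $sparse_rows has the same shape as the $rowval passed to set_sparse_matrix above.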
A sparse matrix can also be set with set_raw_sparse_matrix, using the data format described in the manual, section 3.3, Fig. 16.
# loading sparse matrix via set_raw_sparse_matrix()
#
# 1 1 0 1 1
# 1 0 0 1 0
# 0 1 1 0 0
# 0 0 1 0 0
my $c = Statistics::Cluto->new;
my $nrows = 4;
my $ncols = 5;
my $rowptr = [0, 4, 6, 8, 9];
my $rowind = [0, 1, 3, 4, 0, 3, 1, 2, 2];
my $rowval = [1, 1, 1, 1, 1, 1, 1, 1, 1];
$c->set_raw_sparse_matrix($nrows, $ncols, $rowptr, $rowind, $rowval);
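The three arrays in the example above follow a compressed-row layout: rowptr marks where each row starts in rowind and rowval, and rowind holds 0-based column indices. As a rough sketch, they could be derived from a dense matrix as follows; dense_to_csr is a hypothetical helper, not a method of this package.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Statistics::Cluto): build the
# rowptr / rowind / rowval arrays from dense rows, using 0-based column
# indices as in the example above.
sub dense_to_csr {
    my ($dense) = @_;
    my @rowptr = (0);
    my (@rowind, @rowval);
    for my $row (@$dense) {
        for my $j (0 .. $#$row) {
            next if $row->[$j] == 0;      # store non-zero entries only
            push @rowind, $j;
            push @rowval, $row->[$j];
        }
        push @rowptr, scalar @rowind;     # end of this row, start of the next
    }
    return (\@rowptr, \@rowind, \@rowval);
}

my ($rowptr, $rowind, $rowval) = dense_to_csr([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
]);
```

For this matrix the helper reproduces the $rowptr, $rowind and $rowval values used in the example above.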
setting input parameters
The input parameters nrows, ncols, rowptr, rowind and rowval are set automatically when the initial matrix is loaded. All other input parameters should be set via the set_options method before calling clustering functions. See sections 5.6 - 5.8 of the manual for the necessary parameters.
$c->set_options({
rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
nclusters => 2,
nfeatures => 2,
clfun => CLUTO_CLFUN_I2,
treetype => CLUTO_TREE_TOP,
});
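For larger matrices, writing the label lists by hand becomes tedious. A small sketch of generating them from the matrix dimensions is shown below; the %options hash and the 5x5 dimensions are illustrative assumptions, and only option names that appear in this document are used.

```perl
use strict;
use warnings;

# Illustrative dimensions; in practice these would match the loaded matrix.
my ($nrows, $ncols) = (5, 5);

# Build 'row0' .. 'row4' and 'col0' .. 'col4' instead of typing them out.
my %options = (
    rowlabels => [ map { "row$_" } 0 .. $nrows - 1 ],
    collabels => [ map { "col$_" } 0 .. $ncols - 1 ],
    nclusters => 2,
    nfeatures => 2,
);
```

A reference to the hash (\%options) could then be passed to set_options in place of the literal hashref.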
calling functions
CLUTO's API functions described in sections 5.6 to 5.8 of the manual can be called as methods of the same name, without the "CLUTO_" prefix. For example, CLUTO_VP_ClusterDirect (section 5.6.1) is named VP_ClusterDirect in this package.
Routines with a single output parameter return a single value or arrayref. Routines with multiple output parameters return a list whose members are the output parameters, in the same order as they appear in the manual.
# suppose $c is initialized with 5x5 sparse matrix:
# col0 ... col4
# row0: 2 2 0 2 2
# row1: 2 1 0 1 4
# row2: 0 2 5 0 0
# row3: 0 1 6 0 0
# row4: 2 1 0 3 4
$c->set_options({
rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
nclusters => 2,
nfeatures => 2,
});
my $part = $c->VP_ClusterDirect;
# $part = [
# '1',
# '1',
# '0',
# '0',
# '1'
# ];
my ($internalids, $internalwgts, $externalids, $externalwgts) = $c->V_GetClusterFeatures;
# $internalids =
# [
# '2',
# '0',
# '4',
# '0'
# ]
# $internalwgts =
# [
# '1',
# '0',
# '0.598181843757629',
# '0.209491595625877'
# ]
# $externalids =
# [
# '2',
# '4',
# '2',
# '4'
# ]
# $externalwgts =
# [
# '0.5',
# '0.299090921878815',
# '0.5',
# '0.299090921878815'
# ]
Please refer to the manual for the details of the returned data structure.
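The flat arrays above can be regrouped by cluster with a few lines of plain Perl. The sketch below uses the values copied from the example output and assumes that the feature arrays hold nfeatures consecutive entries per cluster, in cluster order; that layout is inferred from the example in this document, so check it against the manual before relying on it.

```perl
use strict;
use warnings;

# Values copied from the example output above.
my $part        = [1, 1, 0, 0, 1];
my $internalids = [2, 0, 4, 0];
my ($nclusters, $nfeatures) = (2, 2);

# Group row indices by the cluster number stored in $part.
my @rows_in;
for my $i (0 .. $#$part) {
    my $k = $part->[$i];
    next if $k < 0;                 # defensive: ignore rows without a cluster
    push @{ $rows_in[$k] }, $i;
}

# Cut the flat feature array into one chunk of $nfeatures per cluster
# (assumed layout, see the note above).
my @features_of;
for my $k (0 .. $nclusters - 1) {
    my $first = $k * $nfeatures;
    my $last  = $first + $nfeatures - 1;
    $features_of[$k] = [ @{$internalids}[ $first .. $last ] ];
}
```

The same slicing would apply to $internalwgts, $externalids and $externalwgts under the same assumption.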
When the pretty_format option is set to 1, results are returned as a single reference to a nested data structure (an arrayref in the examples below), in a (hopefully) more comprehensible form. The meaning of the returned data should be largely self-explanatory.
# with the same matrix and options as above...
$c->set_options({ pretty_format => 1 });
my $result = $c->VP_ClusterDirect;
# $result =
# [
# [
# { 'row' => 2, 'rowlabel' => 'row2' },
# { 'row' => 3, 'rowlabel' => 'row3' }
# ],
# [
# { 'row' => 0, 'rowlabel' => 'row0' },
# { 'row' => 1, 'rowlabel' => 'row1' },
# { 'row' => 4, 'rowlabel' => 'row4' }
# ]
# ];
$result = $c->V_GetClusterFeatures;
# $result =
# [
# [
# {
# 'discriminating' => [
# {
# 'externalwgt' => '0.5',
# 'collabel' => 'col2',
# 'externalid' => 2
# },
# {
# 'externalwgt' => '0.299090921878815',
# 'collabel' => 'col4',
# 'externalid' => 4
# }
# ],
# 'descriptive' => [
# {
# 'internalid' => 2,
# 'internalwgt' => '1',
# 'collabel' => 'col2'
# },
# {
# 'internalid' => 0,
# 'internalwgt' => '0',
# 'collabel' => 'col0'
# }
# ]
# },
# {
# 'discriminating' => [
# {
# 'externalwgt' => '0.5',
# 'collabel' => 'col2',
# 'externalid' => 2
# },
# {
# 'externalwgt' => '0.299090921878815',
# 'collabel' => 'col4',
# 'externalid' => 4
# }
# ],
# 'descriptive' => [
# {
# 'internalid' => 4,
# 'internalwgt' => '0.598181843757629',
# 'collabel' => 'col4'
# },
# {
# 'internalid' => 0,
# 'internalwgt' => '0.209491595625877',
# 'collabel' => 'col0'
# }
# ]
# }
# ]
# ];
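As an illustration of consuming the pretty-format output, the sketch below walks the cluster membership structure shown above for VP_ClusterDirect. The data is copied literally from that example so the snippet runs without CLUTO; the %cluster_of and @summary names are only illustrative.

```perl
use strict;
use warnings;

# Literal copy of the pretty-format VP_ClusterDirect result shown above.
my $result = [
    [
        { row => 2, rowlabel => 'row2' },
        { row => 3, rowlabel => 'row3' },
    ],
    [
        { row => 0, rowlabel => 'row0' },
        { row => 1, rowlabel => 'row1' },
        { row => 4, rowlabel => 'row4' },
    ],
];

my (%cluster_of, @summary);
for my $k (0 .. $#$result) {
    my @labels = map { $_->{rowlabel} } @{ $result->[$k] };
    $cluster_of{$_} = $k for @labels;                 # label => cluster number
    push @summary, "cluster $k: " . join(', ', @labels);
}
print "$_\n" for @summary;
```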
Exportable constants
use Statistics::Cluto qw(:all) will export all constants defined in cluto.h (auto-generated by h2xs). See section 5 of CLUTO's manual, or cluto.h, for details.
CLUTO_CLFUN_CLINK
CLUTO_CLFUN_CLINK_W
CLUTO_CLFUN_CUT
CLUTO_CLFUN_E1
CLUTO_CLFUN_G1
CLUTO_CLFUN_G1P
CLUTO_CLFUN_H1
CLUTO_CLFUN_H2
CLUTO_CLFUN_I1
CLUTO_CLFUN_I2
CLUTO_CLFUN_MMCUT
CLUTO_CLFUN_NCUT
CLUTO_CLFUN_RCUT
CLUTO_CLFUN_SLINK
CLUTO_CLFUN_SLINK_W
CLUTO_CLFUN_UPGMA
CLUTO_CLFUN_UPGMA_W
CLUTO_COLMODEL_IDF
CLUTO_COLMODEL_NONE
CLUTO_CSTYPE_BESTFIRST
CLUTO_CSTYPE_LARGEFIRST
CLUTO_CSTYPE_LARGESUBSPACEFIRST
CLUTO_DBG_APROGRESS
CLUTO_DBG_CCMPSTAT
CLUTO_DBG_CPROGRESS
CLUTO_DBG_MPROGRESS
CLUTO_DBG_PROGRESS
CLUTO_DBG_RPROGRESS
CLUTO_GRMODEL_ASYMETRIC_DIRECT
CLUTO_GRMODEL_ASYMETRIC_LINKS
CLUTO_GRMODEL_EXACT_ASYMETRIC_DIRECT
CLUTO_GRMODEL_EXACT_ASYMETRIC_LINKS
CLUTO_GRMODEL_EXACT_SYMETRIC_DIRECT
CLUTO_GRMODEL_EXACT_SYMETRIC_LINKS
CLUTO_GRMODEL_INEXACT_ASYMETRIC_DIRECT
CLUTO_GRMODEL_INEXACT_ASYMETRIC_LINKS
CLUTO_GRMODEL_INEXACT_SYMETRIC_DIRECT
CLUTO_GRMODEL_INEXACT_SYMETRIC_LINKS
CLUTO_GRMODEL_NONE
CLUTO_GRMODEL_SYMETRIC_DIRECT
CLUTO_GRMODEL_SYMETRIC_LINKS
CLUTO_MEM_NOREUSE
CLUTO_MEM_REUSE
CLUTO_MTYPE_HEDGE
CLUTO_MTYPE_HSTAR
CLUTO_MTYPE_HSTAR2
CLUTO_OPTIMIZER_MULTILEVEL
CLUTO_OPTIMIZER_SINGLELEVEL
CLUTO_ROWMODEL_LOG
CLUTO_ROWMODEL_MAXTF
CLUTO_ROWMODEL_NONE
CLUTO_ROWMODEL_SQRT
CLUTO_SIM_CORRCOEF
CLUTO_SIM_COSINE
CLUTO_SIM_EDISTANCE
CLUTO_SIM_EJACCARD
CLUTO_SUMMTYPE_MAXCLIQUES
CLUTO_SUMMTYPE_MAXITEMSETS
CLUTO_TREE_FULL
CLUTO_TREE_TOP
CLUTO_VER_MAJOR
CLUTO_VER_MINOR
CLUTO_VER_SUBMINOR
SEE ALSO
http://glaros.dtc.umn.edu/gkhome/views/cluto
AUTHOR
Ikuhiro IHARA <tsukue@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2007 by Ikuhiro IHARA
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.