
NAME

Statistics::Cluto - Perl binding for CLUTO

INSTALLATION

Download CLUTO from http://glaros.dtc.umn.edu/gkhome/views/cluto.

Find the libcluto.a that matches your platform and place it in your library path (or specify its location with the LIBS option, as shown below).

Then do:

   perl Makefile.PL [LIBS='-L/path/to/dir/containing/libcluto.a -lcluto']
   make
   make test
   make install

Tested with cluto-2.1.2/Darwin-i386, cluto-2.1.2/Darwin-ppc and cluto-2.1.1/Linux-i686.

SYNOPSIS

   use Statistics::Cluto;
   use Data::Dumper;
   
   my $c = new Statistics::Cluto;
   
   $c->set_dense_matrix(4, 5, [
     [8, 8, 0, 3, 2],
     [2, 9, 9, 1, 4],
     [7, 6, 1, 2, 3],
     [1, 7, 8, 2, 1]
   ]);
   $c->set_options({
     rowlabels => [ 'row0', 'row1', 'row2', 'row3' ],
     collabels => [ 'col0', 'col1', 'col2', 'col3', 'col4' ],
     nclusters => 2,
     rowmodel => CLUTO_ROWMODEL_NONE,
     colmodel => CLUTO_COLMODEL_NONE,
     pretty_format => 1,
   });
   
   my $clusters = $c->VP_ClusterRB;
   print Dumper $clusters;
   
   my $cluster_features = $c->V_GetClusterFeatures;
   print Dumper $cluster_features;

DESCRIPTION

This is a Perl binding for CLUTO. Refer to sections 5.6 to 5.8 of the CLUTO manual for details of each function; Statistics::Cluto provides a corresponding method for each of the functions described there.

loading a matrix

The initial matrix can be set with either the set_dense_matrix or the set_sparse_matrix method.

   # loading 4x5 dense matrix
   #
   # 1 1 0 1 1
   # 1 0 0 1 0
   # 0 1 1 0 0
   # 0 0 1 0 0
   
   my $c = new Statistics::Cluto;
   my $nrows = 4;
   my $ncols = 5;
   my $rowval = [
     [1, 1, 0, 1, 1],
     [1, 0, 0, 1, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 1, 0, 0]
   ];
   $c->set_dense_matrix($nrows, $ncols, $rowval);


   # loading 4x5 sparse matrix
   #
   # 1 1 0 1 1
   # 1 0 0 1 0
   # 0 1 1 0 0
   # 0 0 1 0 0
   
   my $c = new Statistics::Cluto;
   my $nrows = 4;
   my $ncols = 5;
   my $rowval = [
     [1, 1, 2, 1, 4, 1, 5, 1],
     [1, 1, 4, 1],
     [2, 1, 3, 1],
     [3, 1]
   ];
   $c->set_sparse_matrix($nrows, $ncols, $rowval);
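Each row passed to set_sparse_matrix in the example above lists its nonzero entries as (column, value) pairs, with column numbers starting at 1: row0 = [1, 1, 2, 1, 4, 1, 5, 1] encodes the value 1 in columns 1, 2, 4 and 5. If your data starts out dense, a small pure-Perl helper can produce that layout. This is only a sketch: dense_to_sparse_rows is a hypothetical helper, not part of Statistics::Cluto, and the 1-based column convention is inferred from the example rather than stated by the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Cluto): convert a dense
# matrix (arrayref of row arrayrefs) into the per-row [col, val, ...]
# layout used in the set_sparse_matrix example. Column numbers are
# 1-based, as inferred from that example; zero entries are skipped.
sub dense_to_sparse_rows {
    my ($dense) = @_;
    my @rows;
    for my $row (@$dense) {
        my @pairs;
        for my $j (0 .. $#$row) {
            push @pairs, $j + 1, $row->[$j] if $row->[$j] != 0;
        }
        push @rows, \@pairs;
    }
    return \@rows;
}

my $sparse_rows = dense_to_sparse_rows([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
]);
# $sparse_rows is [[1,1,2,1,4,1,5,1], [1,1,4,1], [2,1,3,1], [3,1]]
# It could then be loaded with: $c->set_sparse_matrix(4, 5, $sparse_rows);
```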

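A sparse matrix can also be set with set_raw_sparse_matrix, which takes the rowptr, rowind and rowval arrays (compressed sparse row format) described in section 3.3, Fig. 16 of the manual. Note that rowind holds 0-based column indices.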

   # loading sparse matrix via set_raw_sparse_matrix()
   #
   # 1 1 0 1 1
   # 1 0 0 1 0
   # 0 1 1 0 0
   # 0 0 1 0 0
   
   my $c = new Statistics::Cluto;
   my $nrows = 4;
   my $ncols = 5;
   my $rowptr = [0, 4, 6, 8, 9];
   my $rowind = [0, 1, 3, 4, 0, 3, 1, 2, 2];
   my $rowval = [1, 1, 1, 1, 1, 1, 1, 1, 1];
   $c->set_raw_sparse_matrix($nrows, $ncols, $rowptr, $rowind, $rowval);
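In the raw arrays, rowptr has nrows + 1 entries, and the entries of row i occupy positions rowptr[i] up to (but not including) rowptr[i+1] of rowind and rowval. The sketch below derives the three arrays from the per-row pair layout used with set_sparse_matrix, converting the 1-based column numbers into 0-based indices. sparse_rows_to_csr is a hypothetical helper, and the 1-based input convention is an assumption drawn from the examples in this section.

```perl
use strict;
use warnings;

# Hypothetical helper: turn per-row [col, val, col, val, ...] lists
# (1-based columns) into the 0-based rowptr/rowind/rowval arrays
# expected by set_raw_sparse_matrix.
sub sparse_rows_to_csr {
    my ($rows) = @_;
    my (@rowptr, @rowind, @rowval);
    push @rowptr, 0;
    for my $row (@$rows) {
        my @pairs = @$row;
        while (my ($col, $val) = splice(@pairs, 0, 2)) {
            push @rowind, $col - 1;    # 1-based column -> 0-based index
            push @rowval, $val;
        }
        push @rowptr, scalar @rowind;  # end of this row = start of the next
    }
    return (\@rowptr, \@rowind, \@rowval);
}

my ($ptr, $ind, $val) = sparse_rows_to_csr([
    [1, 1, 2, 1, 4, 1, 5, 1],
    [1, 1, 4, 1],
    [2, 1, 3, 1],
    [3, 1],
]);
# $ptr = [0, 4, 6, 8, 9]; $ind = [0, 1, 3, 4, 0, 3, 1, 2, 2]; $val = nine 1s
```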

setting input parameters

The input parameters nrows, ncols, rowptr, rowind and rowval are set automatically when the initial matrix is loaded. All other input parameters should be set with the set_options method before calling a clustering function. See sections 5.6 to 5.8 of the manual for the parameters each function requires.

   $c->set_options({
       rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
       collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
       nclusters => 2,
       nfeatures => 2,
       clfun => CLUTO_CLFUN_I2,
       treetype => CLUTO_TREE_TOP,
   });
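Because rowlabels and collabels are expected to hold one label per row and per column (as in the examples in this document), it can be convenient to generate them from the matrix dimensions rather than typing them out. The sketch below only builds the hashref that would be passed to set_options; make_label_options is a hypothetical helper, and the one-label-per-row/column requirement is an assumption based on the examples here.

```perl
use strict;
use warnings;

# Hypothetical helper: build rowlabels (row0, row1, ...) and collabels
# (col0, col1, ...) from the matrix dimensions, merged with any other
# options supplied as key/value pairs.
sub make_label_options {
    my ($nrows, $ncols, %extra) = @_;
    return {
        rowlabels => [ map { "row$_" } 0 .. $nrows - 1 ],
        collabels => [ map { "col$_" } 0 .. $ncols - 1 ],
        %extra,
    };
}

my $opts = make_label_options(5, 5, nclusters => 2, nfeatures => 2);
# With a loaded matrix: $c->set_options($opts);
```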

calling functions

The CLUTO API functions described in sections 5.6 to 5.8 of the manual are available as methods with the same name, minus the "CLUTO_" prefix.

For example, CLUTO_VP_ClusterDirect (section 5.6.1) is available as VP_ClusterDirect in this package.

Routines with a single output parameter return a single value or array reference. Routines with multiple output parameters return a list whose elements are the output parameters, in the same order as they appear in the manual.
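Since the method names follow mechanically from the manual's function names, the mapping can be expressed in a line of Perl, and the resulting name can be used for a dynamic method call. cluto_method_name is a hypothetical helper; the actual call is left as a comment because it needs an initialized Statistics::Cluto object.

```perl
use strict;
use warnings;

# Hypothetical helper: map a CLUTO C function name from the manual
# to the corresponding Statistics::Cluto method name.
sub cluto_method_name {
    my ($c_name) = @_;
    (my $method = $c_name) =~ s/^CLUTO_//;
    return $method;
}

my $method = cluto_method_name('CLUTO_VP_ClusterDirect');  # 'VP_ClusterDirect'
# With an initialized object $c, the method can be called dynamically:
# my $part = $c->$method;
```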

   # suppose $c has been initialized with this 5x5 sparse matrix:
   #     col0 ... col4
   # row0: 2 2 0 2 2
   # row1: 2 1 0 1 4
   # row2: 0 2 5 0 0
   # row3: 0 1 6 0 0
   # row4: 2 1 0 3 4
   
   $c->set_options({
       rowlabels => ['row0', 'row1', 'row2', 'row3', 'row4'],
       collabels => ['col0', 'col1', 'col2', 'col3', 'col4'],
       nclusters => 2,
       nfeatures => 2,
   });
   my $part = $c->VP_ClusterDirect;
   
   # $part =   [
   #             '1',
   #             '1',
   #             '0',
   #             '0',
   #             '1'
   #           ];
   
   my ($internalids, $internalwgts, $externalids, $externalwgts) = $c->V_GetClusterFeatures;
   
   # $internalids =
   #           [
   #             '2',
   #             '0',
   #             '4',
   #             '0'
   #           ]
   # $internalwgts =
   #           [
   #             '1',
   #             '0',
   #             '0.598181843757629',
   #             '0.209491595625877'
   #           ]
   # $externalids =
   #           [
   #             '2',
   #             '4',
   #             '2',
   #             '4'
   #           ]
   # $externalwgts =
   #           [
   #             '0.5',
   #             '0.299090921878815',
   #             '0.5',
   #             '0.299090921878815'
   #           ]
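In this raw output each of the four arrays has nclusters * nfeatures entries, and each cluster's entries appear to be stored consecutively: comparing with the pretty_format output below, cluster 0 owns positions 0 and 1 and cluster 1 owns positions 2 and 3. The following sketch regroups the raw arrays per cluster under that assumption; group_features is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: split the four flat arrays returned by
# V_GetClusterFeatures into one set of features per cluster, assuming
# each cluster's nfeatures entries are stored consecutively.
sub group_features {
    my ($nfeatures, $intids, $intwgts, $extids, $extwgts) = @_;
    my @clusters;
    for my $i (0 .. $#$intids) {
        my $k = int($i / $nfeatures);    # cluster index of entry $i
        push @{ $clusters[$k]{descriptive} },
            { internalid => $intids->[$i], internalwgt => $intwgts->[$i] };
        push @{ $clusters[$k]{discriminating} },
            { externalid => $extids->[$i], externalwgt => $extwgts->[$i] };
    }
    return \@clusters;
}

my $grouped = group_features(2,
    [2, 0, 4, 0], [1, 0, 0.598181843757629, 0.209491595625877],
    [2, 4, 2, 4], [0.5, 0.299090921878815, 0.5, 0.299090921878815]);
# $grouped->[1]{descriptive}[0] is { internalid => 4, internalwgt => 0.598181843757629 }
```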

Refer to the manual for details of the returned data structures.

When the pretty_format option is set to 1, results are returned as a single reference (an array reference in the examples below) in a (hopefully) more readable structure: rows are grouped by cluster, and row and column labels are attached. The meaning of the returned data should be largely self-explanatory.

   # with the same matrix and options as above...
   
   $c->set_options({ pretty_format => 1 });
   my $result = $c->VP_ClusterDirect;
   
   # $result =
   #         [
   #           [
   #             { 'row' => 2, 'rowlabel' => 'row2' },
   #             { 'row' => 3, 'rowlabel' => 'row3' }
   #           ],
   #           [
   #             { 'row' => 0, 'rowlabel' => 'row0' },
   #             { 'row' => 1, 'rowlabel' => 'row1' },
   #             { 'row' => 4, 'rowlabel' => 'row4' }
   #           ]
   #         ];
   
   $result = $c->V_GetClusterFeatures;
   
   # $result =
   #         [
   #           [
   #             {
   #               'discriminating' => [
   #                                     {
   #                                       'externalwgt' => '0.5',
   #                                       'collabel' => 'col2',
   #                                       'externalid' => 2
   #                                     },
   #                                     {
   #                                       'externalwgt' => '0.299090921878815',
   #                                       'collabel' => 'col4',
   #                                       'externalid' => 4
   #                                     }
   #                                   ],
   #               'descriptive' => [
   #                                  {
   #                                    'internalid' => 2,
   #                                    'internalwgt' => '1',
   #                                    'collabel' => 'col2'
   #                                  },
   #                                  {
   #                                    'internalid' => 0,
   #                                    'internalwgt' => '0',
   #                                    'collabel' => 'col0'
   #                                  }
   #                                ]
   #             },
   #             {
   #               'discriminating' => [
   #                                     {
   #                                       'externalwgt' => '0.5',
   #                                       'collabel' => 'col2',
   #                                       'externalid' => 2
   #                                     },
   #                                     {
   #                                       'externalwgt' => '0.299090921878815',
   #                                       'collabel' => 'col4',
   #                                       'externalid' => 4
   #                                     }
   #                                   ],
   #               'descriptive' => [
   #                                  {
   #                                    'internalid' => 4,
   #                                    'internalwgt' => '0.598181843757629',
   #                                    'collabel' => 'col4'
   #                                  },
   #                                  {
   #                                    'internalid' => 0,
   #                                    'internalwgt' => '0.209491595625877',
   #                                    'collabel' => 'col0'
   #                                  }
   #                                ]
   #             }
   #           ]
   #         ];
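Comparing the two VP_ClusterDirect results above, the pretty_format structure is the partition vector grouped by cluster number, with each row's label attached. A plain-Perl sketch that reproduces that shape from the raw partition might look like this. group_partition is a hypothetical helper that only mirrors the output shown; it is not the module's implementation, and skipping negative cluster numbers assumes CLUTO's convention of marking unclustered rows with -1.

```perl
use strict;
use warnings;

# Hypothetical helper: group a flat partition vector (cluster number per
# row) into per-cluster lists of { row, rowlabel } hashes, mirroring the
# pretty_format output of VP_ClusterDirect shown above.
sub group_partition {
    my ($part, $rowlabels) = @_;
    my @clusters;
    for my $row (0 .. $#$part) {
        my $cid = $part->[$row];
        next if $cid < 0;    # assumption: -1 marks a row left unclustered
        push @{ $clusters[$cid] }, { row => $row, rowlabel => $rowlabels->[$row] };
    }
    return \@clusters;
}

my $pretty = group_partition(['1', '1', '0', '0', '1'],
                             [qw(row0 row1 row2 row3 row4)]);
# $pretty->[0] holds rows 2 and 3; $pretty->[1] holds rows 0, 1 and 4
```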

Exportable constants

  use Statistics::Cluto qw(:all);

exports all constants defined in cluto.h (the list below was auto-generated by h2xs). See section 5 of the CLUTO manual, or cluto.h, for details.

  CLUTO_CLFUN_CLINK
  CLUTO_CLFUN_CLINK_W
  CLUTO_CLFUN_CUT
  CLUTO_CLFUN_E1
  CLUTO_CLFUN_G1
  CLUTO_CLFUN_G1P
  CLUTO_CLFUN_H1
  CLUTO_CLFUN_H2
  CLUTO_CLFUN_I1
  CLUTO_CLFUN_I2
  CLUTO_CLFUN_MMCUT
  CLUTO_CLFUN_NCUT
  CLUTO_CLFUN_RCUT
  CLUTO_CLFUN_SLINK
  CLUTO_CLFUN_SLINK_W
  CLUTO_CLFUN_UPGMA
  CLUTO_CLFUN_UPGMA_W
  CLUTO_COLMODEL_IDF
  CLUTO_COLMODEL_NONE
  CLUTO_CSTYPE_BESTFIRST
  CLUTO_CSTYPE_LARGEFIRST
  CLUTO_CSTYPE_LARGESUBSPACEFIRST
  CLUTO_DBG_APROGRESS
  CLUTO_DBG_CCMPSTAT
  CLUTO_DBG_CPROGRESS
  CLUTO_DBG_MPROGRESS
  CLUTO_DBG_PROGRESS
  CLUTO_DBG_RPROGRESS
  CLUTO_GRMODEL_ASYMETRIC_DIRECT
  CLUTO_GRMODEL_ASYMETRIC_LINKS
  CLUTO_GRMODEL_EXACT_ASYMETRIC_DIRECT
  CLUTO_GRMODEL_EXACT_ASYMETRIC_LINKS
  CLUTO_GRMODEL_EXACT_SYMETRIC_DIRECT
  CLUTO_GRMODEL_EXACT_SYMETRIC_LINKS
  CLUTO_GRMODEL_INEXACT_ASYMETRIC_DIRECT
  CLUTO_GRMODEL_INEXACT_ASYMETRIC_LINKS
  CLUTO_GRMODEL_INEXACT_SYMETRIC_DIRECT
  CLUTO_GRMODEL_INEXACT_SYMETRIC_LINKS
  CLUTO_GRMODEL_NONE
  CLUTO_GRMODEL_SYMETRIC_DIRECT
  CLUTO_GRMODEL_SYMETRIC_LINKS
  CLUTO_MEM_NOREUSE
  CLUTO_MEM_REUSE
  CLUTO_MTYPE_HEDGE
  CLUTO_MTYPE_HSTAR
  CLUTO_MTYPE_HSTAR2
  CLUTO_OPTIMIZER_MULTILEVEL
  CLUTO_OPTIMIZER_SINGLELEVEL
  CLUTO_ROWMODEL_LOG
  CLUTO_ROWMODEL_MAXTF
  CLUTO_ROWMODEL_NONE
  CLUTO_ROWMODEL_SQRT
  CLUTO_SIM_CORRCOEF
  CLUTO_SIM_COSINE
  CLUTO_SIM_EDISTANCE
  CLUTO_SIM_EJACCARD
  CLUTO_SUMMTYPE_MAXCLIQUES
  CLUTO_SUMMTYPE_MAXITEMSETS
  CLUTO_TREE_FULL
  CLUTO_TREE_TOP
  CLUTO_VER_MAJOR
  CLUTO_VER_MINOR
  CLUTO_VER_SUBMINOR

SEE ALSO

http://glaros.dtc.umn.edu/gkhome/views/cluto

AUTHOR

Ikuhiro IHARA <tsukue@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2007 by Ikuhiro IHARA

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.5 or, at your option, any later version of Perl 5 you may have available.