NAME

Algorithm::Kmeanspp - Perl implementation of K-means++

SYNOPSIS

use Algorithm::Kmeanspp;

# input documents
my %documents = (
    Alex => { 'Pop'     => 10, 'R&B'    => 6, 'Rock'   => 4 },
    Bob  => { 'Jazz'    => 8,  'Reggae' => 9                },
    Dave => { 'Classic' => 4,  'World'  => 4                },
    Ted  => { 'Jazz'    => 9,  'Metal'  => 2, 'Reggae' => 6 },
    Fred => { 'Hip-hop' => 3,  'Rock'   => 3, 'Pop'    => 3 },
    Sam  => { 'Classic' => 8,  'Rock'   => 1                },
);

my $kmp = Algorithm::Kmeanspp->new;

foreach my $id (keys %documents) {
    $kmp->add_document($id, $documents{$id});
}

my $num_cluster = 3;
my $num_iter    = 20;
$kmp->do_clustering($num_cluster, $num_iter);

# show clustering result
foreach my $cluster (@{ $kmp->clusters }) {
    print join "\t", @{ $cluster };
    print "\n";
}
# show cluster centroids
foreach my $centroid (@{ $kmp->centroids }) {
    print join "\t", map { sprintf "%s:%.4f", $_, $centroid->{$_} }
        keys %{ $centroid };
    print "\n";
}

DESCRIPTION

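Algorithm::Kmeanspp is a Perl implementation of K-means++, a variant of the k-means clustering algorithm that chooses its initial cluster centers so that they tend to be spread apart before the usual k-means iterations begin.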

METHODS

new

Create a new instance.

add_document($id, $vector)

Adds an input document to the Algorithm::Kmeanspp instance. $id is the identifier of the document, and $vector is its feature vector. $vector must be a hash reference in which each key identifies a feature and each value is the degree (weight) of that feature in the document.
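As an illustration of the $vector format, the following minimal sketch builds a feature vector by counting tokens. The helper name build_vector and the token list are hypothetical and not part of Algorithm::Kmeanspp; only the final shape (a hash reference of feature => degree) comes from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): count how often each token
# occurs and return the counts as a hash reference, which is the shape
# add_document expects for $vector.
sub build_vector {
    my @tokens = @_;
    my %vector;
    $vector{$_}++ for @tokens;
    return \%vector;
}

my $vector = build_vector(qw(Jazz Reggae Jazz Metal Jazz));
# $vector now holds Jazz => 3, Reggae => 1, Metal => 1

# With the module installed, the vector could then be registered like this:
# $kmp->add_document('Ted', $vector);
```

Features that do not occur in a document are simply left out of the hash, as in the SYNOPSIS.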

do_clustering($num_cluster, $num_iter)

Clusters the input documents. $num_cluster specifies the number of output clusters, and $num_iter specifies the number of clustering iterations.
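For readers unfamiliar with K-means++, the sketch below shows the seeding step of the general technique on sparse hash-reference vectors: the first seed is picked uniformly at random, and each later seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. This is not the module's own code. The helper names squared_distance and choose_seeds are hypothetical, and the use of squared Euclidean distance is an assumption, since this documentation does not say which distance measure the module uses. It requires Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Squared Euclidean distance between two sparse vectors (hash references).
# Features missing from a vector count as 0.
sub squared_distance {
    my ($u, $v) = @_;
    my %keys = map { $_ => 1 } keys %$u, keys %$v;
    my $sum = 0;
    for my $k (keys %keys) {
        my $diff = ($u->{$k} // 0) - ($v->{$k} // 0);
        $sum += $diff * $diff;
    }
    return $sum;
}

# Hypothetical helper: choose up to $k seed document ids from \%docs.
sub choose_seeds {
    my ($docs, $k) = @_;
    my @ids   = sort keys %$docs;
    my @seeds = ($ids[ int rand @ids ]);    # first seed: uniform at random

    while (@seeds < $k) {
        # Squared distance from each document to its nearest chosen seed.
        my %dist;
        my $total = 0;
        for my $id (@ids) {
            $dist{$id} = min(
                map { squared_distance($docs->{$id}, $docs->{$_}) } @seeds
            );
            $total += $dist{$id};
        }
        last if $total == 0;    # every document coincides with some seed

        # Draw the next seed with probability proportional to that distance.
        my $r = rand($total);
        my $next;
        for my $id (@ids) {
            next if $dist{$id} == 0;    # already a seed (or identical to one)
            $next = $id;
            $r -= $dist{$id};
            last if $r < 0;
        }
        push @seeds, $next;
    }
    return \@seeds;
}
```

Calling choose_seeds(\%documents, 3) on the %documents hash from the SYNOPSIS would return an array reference of three distinct document ids; which ids are returned varies between runs because the choice is random.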

clusters

Accessor for the clustering result. It returns an array reference in which each item is itself an array reference listing the identifiers of the input documents in one cluster.

# format of output clusters
[
    [ document_id1, document_id2, ... ],  # cluster-1
    [ document_id3, document_id4, ... ],  # cluster-2
    ...
]
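A common follow-up is to look up which cluster a given document landed in. The sketch below inverts a structure of the shape shown above into a document-to-cluster-index map. The $clusters value here is made-up sample data, not actual output of the module; with the module it would come from $kmp->clusters. The name %cluster_of is likewise only illustrative.

```perl
use strict;
use warnings;

# Sample data in the documented shape (hypothetical grouping).
my $clusters = [
    [ 'Alex', 'Fred' ],
    [ 'Bob',  'Ted'  ],
    [ 'Dave', 'Sam'  ],
];

# Map each document id to the index of the cluster that contains it.
my %cluster_of;
for my $i (0 .. $#{$clusters}) {
    $cluster_of{$_} = $i for @{ $clusters->[$i] };
}

print "$_ is in cluster $cluster_of{$_}\n" for sort keys %cluster_of;
```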

centroids

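Accessor for the cluster centroid vectors. As the SYNOPSIS shows, it returns an array reference in which each item is a hash reference mapping feature identifiers to values.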
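One way to make a centroid readable is to list only its strongest features. The sketch below does that for a single centroid in the hash-reference form used in the SYNOPSIS. The helper name top_features is hypothetical, and the centroid values are made up for illustration rather than taken from a real run.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n highest-valued features of a centroid
# (a hash reference of feature => value), with ties broken by feature name.
sub top_features {
    my ($centroid, $n) = @_;
    my @sorted = sort { $centroid->{$b} <=> $centroid->{$a} or $a cmp $b }
        keys %$centroid;
    $n = @sorted if $n > @sorted;    # never ask for more than exist
    return @sorted[ 0 .. $n - 1 ];
}

# Made-up centroid for illustration only.
my $centroid = { Jazz => 8.5, Reggae => 7.5, Metal => 1.0 };
my @top = top_features($centroid, 2);    # ('Jazz', 'Reggae')
```

With the module, the same helper could be applied to each item of @{ $kmp->centroids }.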

AUTHOR

Mizuki Fujisawa <fujisawa@bayon.cc>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

