NAME

Clusterize - clustering text documents.

VERSION

Version 0.02

SYNOPSIS

use Clusterize;

my %pairs = (
	key1 => [ string1, string2, ...stringN ],
	key2 => [ string5, string6, ...stringM ],
	...
	keyN => [ ... ],
);

my $clusterize = Clusterize->new();
while (my @pair = each %pairs) { $clusterize->add_pair(@pair) }

foreach my $c ( $clusterize->list ) {
	printf "# /%s/ (digest=%s) (accuracy=%.3f) (size=%d)\n",
		$c->pattern, $c->digest, $c->accuracy, $c->size;
	my $pairs = $c->pairs;
	for ( keys %{$pairs} ) { print $_." ".$pairs->{$_} }
}

DESCRIPTION

The Clusterize module implements a specific algorithm for clustering text documents.

PUBLIC METHODS

new

This is the constructor. No parameter is required.

add_pair

This method adds a new document to the cluster set:

$clusterize->add_pair($key, [$string1, $string2, ...]);

$key is the unique name of the document (e.g. a filename); [$string1, $string2, ...] is the text of the document, passed as an array reference of strings.
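As a rough sketch of how documents on disk might be turned into the ($key, [strings]) pairs described above, the helpers below read each file into a list of lines. The names read_lines and build_pairs are hypothetical and not part of Clusterize. Splitting a document into chomped lines is also an assumption; the module may accept other kinds of strings.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Clusterize): read a file and
# return a reference to the list of its lines, newlines removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Hypothetical helper: build a flat list of key => [strings] pairs,
# using each file name as the unique document key.
sub build_pairs {
    my (@paths) = @_;
    return map { $_ => read_lines($_) } @paths;
}
```

With these helpers, my %pairs = build_pairs(@ARGV); yields a hash whose entries can be passed one by one to $clusterize->add_pair($key, $pairs{$key}).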

remove_pair

This method removes a document from the cluster set:

$clusterize->remove_pair($key);

$key is the name of the document (e.g. a filename).

list

This method returns the list of built clusters:

my @clusters = $clusterize->list();

Returns a list of Clusterize::Pattern objects with the following attributes:

$c->pattern - regexp that matches all strings in the given cluster;

$c->accuracy - a value from 0 to 1 that reflects how similar the strings in the cluster are;

$c->size - the number of documents in the cluster;

$c->digest - MD5 digest of the cluster, used to identify duplicate clusters;

$c->pairs - a list of { key => $key1, val => $val1 } hash pairs, where key is the name of a document and val is a string from that document.
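The attributes above can be combined to post-process the result of list. The following is a minimal sketch, not part of Clusterize: select_clusters is a hypothetical helper that keeps one cluster per digest, drops clusters below a chosen accuracy threshold, and orders the rest from most to least accurate. It relies only on the documented accuracy and digest accessors.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Clusterize): filter and order
# cluster objects that provide ->accuracy and ->digest.
sub select_clusters {
    my ($min_accuracy, @clusters) = @_;
    my %seen;    # digests already kept, to skip duplicate clusters
    return
        sort { $b->accuracy <=> $a->accuracy }
        grep { $_->accuracy >= $min_accuracy && !$seen{ $_->digest }++ }
        @clusters;
}
```

A call such as my @best = select_clusters(0.8, $clusterize->list); would then return only the distinct clusters whose accuracy is at least 0.8. The threshold 0.8 is an arbitrary illustration.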

AUTHOR

Slava Moiseev, <slava.moiseev@yahoo.com>
