Gene Boggs

# NAME

Statistics::RankCorrelation - Compute the rank correlation between two vectors

# VERSION

version 0.1205

# SYNOPSIS

```
use Statistics::RankCorrelation;

$x = [ 8, 7, 6, 5, 4, 3, 2, 1 ];
$y = [ 2, 1, 5, 3, 4, 7, 8, 6 ];

$c = Statistics::RankCorrelation->new( $x, $y, sorted => 1 );

$n = $c->spearman;
$t = $c->kendall;
$m = $c->csim;

$s = $c->size;
$xd = $c->x_data;
$yd = $c->y_data;
$xr = $c->x_rank;
$yr = $c->y_rank;
$xt = $c->x_ties;
$yt = $c->y_ties;
```

# DESCRIPTION

This module computes rank correlation coefficient measures between two sample vectors.

Examples can be found in the distribution's `eg/` directory and in the methods test.

# METHODS

## new

```
$c = Statistics::RankCorrelation->new( \@u, \@v );
$c = Statistics::RankCorrelation->new( \@u, \@v, sorted => 1 );
```

This method constructs a new `Statistics::RankCorrelation` object.

If given two numeric vectors (as array references), the statistical ranks are computed. If the vectors are of different sizes, the shorter one is padded with zeros.

If the `sorted` flag is set, both vectors are sorted together, as pairs, by the first (x) vector.

## x_data

```
$c->x_data( $x );
$x = $c->x_data;
```

Set or return the x data as a one-dimensional array reference. This is the "unit" array, used as a reference for size and iteration.

## y_data

```
$c->y_data( $y );
$y = $c->y_data;
```

Set or return the y data as a one-dimensional array reference. This vector is dependent on the x vector.

## size

```
$s = $c->size;
```

Return the number of array elements.

## x_rank

```
$r = $c->x_rank;
```

Return the x ranks as an array reference.

## y_rank

```
$r = $c->y_rank;
```

Return the y ranks as an array reference.

## x_ties

```
$t = $c->x_ties;
```

Return the x ties as a hash reference.

## y_ties

```
$t = $c->y_ties;
```

Return the y ties as a hash reference.

## spearman

```
$n = $c->spearman;

      6 * sum( (xi - yi)^2 )
  1 - --------------------------
               n^3 - n
```

Return Spearman's rho.

Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values and is a special case of the Pearson product-moment correlation.

Here `x` and `y` are the two rank vectors and `i` is an index running from 1 to `n`, the number of samples.
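The formula above can be sketched in plain Perl as follows. This is a minimal illustration, not the module's code: the name `spearman_rho` is hypothetical, it takes rank vectors directly, and it applies the simple formula with no adjustment for ties. The input is the pair of SYNOPSIS vectors, which are permutations of 1..8 and therefore equal to their own ranks.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Spearman's rho from two equal-length rank vectors,
# using 1 - 6 * sum(d^2) / (n^3 - n) with no tie correction.
sub spearman_rho {
    my ($x_rank, $y_rank) = @_;
    my $n = @$x_rank;
    die "rank vectors must have the same length\n" unless $n == @$y_rank;
    die "need at least two samples\n" if $n < 2;

    # Sum of squared rank differences.
    my $sum_sq = sum( map { ( $x_rank->[$_] - $y_rank->[$_] ) ** 2 } 0 .. $n - 1 );

    return 1 - 6 * $sum_sq / ( $n ** 3 - $n );
}

my $rho = spearman_rho( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );
printf "%.4f\n", $rho;    # prints -0.8333
```

The squared differences sum to 154, so the result is 1 - 924/504.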

## kendall

```
$t = $c->kendall;

         c - d
  t = -------------
      n (n - 1) / 2
```

Return Kendall's tau.

Here `c` and `d` are the numbers of concordant and discordant pairs, and `n` is the number of samples.
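As an illustration of the counting involved, here is a small plain-Perl sketch of the formula above. The name `kendall_tau` is hypothetical, and treating pairs that are tied in either vector as neither concordant nor discordant is an assumption of this sketch, not a statement about the module.

```perl
use strict;
use warnings;

# Hypothetical helper: Kendall's tau as (c - d) / (n (n - 1) / 2).
# ASSUMPTION: pairs tied in either vector count toward neither c nor d.
sub kendall_tau {
    my ($x, $y) = @_;
    my $n = @$x;
    die "vectors must have the same length\n" unless $n == @$y;
    die "need at least two samples\n" if $n < 2;

    my ( $c, $d ) = ( 0, 0 );
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            # Product of the two orderings: positive if they agree,
            # negative if they disagree, zero on a tie.
            my $s = ( $x->[$i] <=> $x->[$j] ) * ( $y->[$i] <=> $y->[$j] );
            $c++ if $s > 0;
            $d++ if $s < 0;
        }
    }
    return ( $c - $d ) / ( $n * ( $n - 1 ) / 2 );
}

my $tau = kendall_tau( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );
printf "%.4f\n", $tau;    # prints -0.6429
```

For the SYNOPSIS vectors there are 5 concordant and 23 discordant pairs out of 28.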

## csim

```
$n = $c->csim;
```

Return the contour similarity index measure. This is a single dimensional measure of the similarity between two vectors.

This returns a measure in the (inclusive) range `[-1..1]` and is computed using matrices of binary data representing "higher or lower" values in the original vectors.

This measure has been studied in musical contour analysis.

# FUNCTIONS

## rank

```
$v = [qw(1 3.2 2.1 3.2 3.2 4.3)];
$ranks = rank($v);
# [1, 4, 2, 4, 4, 6]
my( $ranks, $ties ) = rank($v);
# [1, 4, 2, 4, 4, 6], { 1=>[], 3.2=>[]}
```

Return a list consisting of an array reference of the ordinal ranks and a hash reference of the tied data.

In the case of a tie in the data (identical values) the rank numbers are averaged. An example will elucidate:

```
sorted data:    [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
ranks:          [ 1,   2,   3,   4,   5,   6   ]
tied ranks:     3, 4, and 5
tied average:   (3 + 4 + 5) / 3 == 4
averaged ranks: [ 1,   2,   4,   4,   4,   6   ]
```
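The averaging step can be sketched in plain Perl as below. This is only an illustration of the technique: `average_ranks` is a hypothetical name, it returns the ranks alone (no ties hash), and it is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical helper: average ("fractional") ranking. Tied values share
# the mean of the 1-based positions they occupy in sorted order.
sub average_ranks {
    my ($v) = @_;

    # Indices of $v ordered by ascending value.
    my @order = sort { $v->[$a] <=> $v->[$b] } 0 .. $#$v;

    my @ranks;
    my $i = 0;
    while ( $i < @order ) {
        # Extend $j over the run of values equal to the one at position $i.
        my $j = $i;
        $j++ while $j < $#order && $v->[ $order[ $j + 1 ] ] == $v->[ $order[$i] ];

        # The run covers ranks $i + 1 .. $j + 1; assign their mean to each member.
        my $avg = ( ( $i + 1 ) + ( $j + 1 ) ) / 2;
        $ranks[ $order[$_] ] = $avg for $i .. $j;

        $i = $j + 1;
    }
    return \@ranks;
}

my $ranks = average_ranks( [ 1, 3.2, 2.1, 3.2, 3.2, 4.3 ] );
print "@$ranks\n";    # prints 1 4 2 4 4 6
```

The result is reported in the original (unsorted) order of the input, matching the `rank` example above.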

## pad_vectors

```
( $u, $v ) = pad_vectors( [ 1, 2, 3, 4 ], [ 9, 8 ] );
# [1, 2, 3, 4], [9, 8, 0, 0]
```

Append zeros to either input vector for all values in the other that do not have a corresponding value. That is, "pad" the tail of the shorter vector with zero values.
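A minimal sketch of this padding behaviour, written independently of the module: the name `pad_with_zeros` is hypothetical, and copying the inputs rather than modifying them in place is a choice made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: copy both vectors and append zeros to the shorter
# one so that they end up the same length.
sub pad_with_zeros {
    my ($u, $v) = @_;
    my @u = @$u;
    my @v = @$v;

    # At most one of these pushes runs, since padding equalizes the lengths.
    push @u, (0) x ( @v - @u ) if @u < @v;
    push @v, (0) x ( @u - @v ) if @v < @u;

    return ( \@u, \@v );
}

my ( $pu, $pv ) = pad_with_zeros( [ 1, 2, 3, 4 ], [ 9, 8 ] );
print "@$pu | @$pv\n";    # prints 1 2 3 4 | 9 8 0 0
```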

## co_sort

```
( $u, $v ) = co_sort( $u, $v );
```

Sort the vectors as two dimensional data-point pairs with u values sorted first.
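One way to picture this is to sort an index list and then slice both vectors with it. The sketch below uses a hypothetical name, `co_sort_pairs`, and assumes ascending numeric order with ties in u broken by v; the module's exact ordering rules may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: sort (u, v) pairs by u.
# ASSUMPTION: ascending numeric order, ties in u broken by v.
sub co_sort_pairs {
    my ($u, $v) = @_;
    die "vectors must have the same length\n" unless @$u == @$v;

    # Sort the indices, then apply the same permutation to both vectors.
    my @idx = sort { $u->[$a] <=> $u->[$b] || $v->[$a] <=> $v->[$b] } 0 .. $#$u;

    return ( [ @$u[@idx] ], [ @$v[@idx] ] );
}

my ( $su, $sv ) = co_sort_pairs( [ 3, 1, 2 ], [ 30, 10, 20 ] );
print "@$su | @$sv\n";    # prints 1 2 3 | 10 20 30
```

Sorting indices keeps each u value attached to its v partner, which is the point of treating the data as pairs.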

## correlation_matrix

```
$matrix = correlation_matrix( $u );
```

Return the correlation matrix for a single vector.

This function builds a square, binary matrix that represents "higher or lower" value within the vector itself.
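To make the "higher or lower" idea concrete, here is a small sketch of such a matrix. The name `higher_lower_matrix` is hypothetical, and the convention used (entry `[i][j]` is 1 when element `i` is lower than element `j`, otherwise 0) is an assumption for illustration; the module may orient or encode its matrix differently.

```perl
use strict;
use warnings;

# Hypothetical helper: square binary matrix comparing every element of a
# vector with every other element.
sub higher_lower_matrix {
    my ($u) = @_;
    my @m;
    for my $i ( 0 .. $#$u ) {
        for my $j ( 0 .. $#$u ) {
            # ASSUMPTION: 1 marks "element i is lower than element j".
            $m[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@m;
}

my $m = higher_lower_matrix( [ 2, 1, 3 ] );
print join( ' ', @$_ ), "\n" for @$m;
# prints:
# 0 0 1
# 1 0 1
# 0 0 0
```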

## sign

Return 0, 1 or -1 given a number.
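Perl's numeric comparison operator already yields these three values, so a sign function can be sketched in one line. The name `sign_of` is hypothetical and this is not necessarily how the module implements `sign`.

```perl
use strict;
use warnings;

# Hypothetical helper: comparing a number with zero gives -1, 0 or 1.
sub sign_of {
    my ($x) = @_;
    return $x <=> 0;
}

print join( ' ', map { sign_of($_) } ( -2.5, 0, 7 ) ), "\n";    # prints -1 0 1
```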

# TO DO

Handle any number of vectors instead of just two.

Implement other rank correlation measures that are out there...

# SEE ALSO

For the `csim` method:

http://personal.systemsbiology.net/ilya/Publications/JNMRcontour.pdf

For the `spearman` and `kendall` methods:

http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html

http://en.wikipedia.org/wiki/Kendall's_tau

# THANK YOU

For helping make this sturdier code:

Thomas Breslin

Jerome

Jon Schutz

Andy Lee

anno

mst

# AUTHOR

Gene Boggs <gene@cpan.org>