- NAME
- VERSION
- SYNOPSIS
- DESCRIPTION
- METHODS
- FUNCTIONS
- TO DO
- SEE ALSO
- THANK YOU
- AUTHOR
- COPYRIGHT AND LICENSE

# NAME

Statistics::RankCorrelation - Compute the rank correlation between two vectors

# VERSION

version 0.1205

# SYNOPSIS

```
use Statistics::RankCorrelation;
$x = [ 8, 7, 6, 5, 4, 3, 2, 1 ];
$y = [ 2, 1, 5, 3, 4, 7, 8, 6 ];
$c = Statistics::RankCorrelation->new( $x, $y, sorted => 1 );
$n = $c->spearman;
$t = $c->kendall;
$m = $c->csim;
$s = $c->size;
$xd = $c->x_data;
$yd = $c->y_data;
$xr = $c->x_rank;
$yr = $c->y_rank;
$xt = $c->x_ties;
$yt = $c->y_ties;
```

# DESCRIPTION

This module computes rank correlation coefficient measures between two sample vectors.

Examples can be found in the distribution's `eg/` directory and in the methods test.

# METHODS

## new

```
$c = Statistics::RankCorrelation->new( \@u, \@v );
$c = Statistics::RankCorrelation->new( \@u, \@v, sorted => 1 );
```

This method constructs a new `Statistics::RankCorrelation` object.

If given two numeric vectors (as array references), the statistical ranks are computed. If the vectors are of different size, the shorter is padded with zeros.

If the `sorted` flag is set, both are sorted by the first (**x**) vector.

## x_data

```
$c->x_data( $x );
$x = $c->x_data;
```

Set or return the one dimensional array reference data. This is the "unit" array, used as a reference for size and iteration.

## y_data

```
$c->y_data( $y );
$y = $c->y_data;
```

Set or return the one dimensional array reference data. This vector is dependent on the x vector.

## size

`$s = $c->size;`

Return the number of array elements.

## x_rank

`$r = $c->x_rank;`

Return the ranks as an array reference.

## y_rank

`$r = $c->y_rank;`

Return the ranks as an array reference.

## x_ties

`$t = $c->x_ties;`

Return the x ties as a hash reference.

## y_ties

`$t = $c->y_ties;`

Return the y ties as a hash reference.

## spearman

```
$n = $c->spearman;

      6 * sum( (xi - yi)^2 )
  1 - --------------------------
               n^3 - n
```

Return Spearman's rho.

Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values and is a special case of the Pearson product-moment correlation.

Here `x` and `y` are the two rank vectors and `i` is an index from one to **n** number of samples.
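The formula above can be sketched in plain Perl. This is an illustration only, not the module's implementation: `spearman_rho` is a hypothetical helper name, and the simple formula assumes the ranks contain no ties.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of the module's API): rho from two
# equal-length rank vectors using 1 - 6 * sum(d^2) / (n^3 - n).
sub spearman_rho {
    my ( $xr, $yr ) = @_;
    my $n = @$xr;
    die "rank vectors must have the same length\n" unless $n == @$yr;
    die "need at least two samples\n" if $n < 2;

    # Sum of squared differences between paired ranks.
    my $sum_sq = sum( map { ( $xr->[$_] - $yr->[$_] )**2 } 0 .. $n - 1 );

    return 1 - ( 6 * $sum_sq ) / ( $n**3 - $n );
}

# The SYNOPSIS data is a permutation of 1..8 with no ties, so each
# value is also its own (ascending) rank.
my $rho = spearman_rho( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );
printf "%.4f\n", $rho;    # -0.8333
```

For this data the squared rank differences sum to 154, giving 1 - 924 / 504.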

## kendall

```
$t = $c->kendall;

          c - d
  t = -------------
      n (n - 1) / 2
```

Return Kendall's tau.

Here, **c** and **d** are the numbers of concordant and discordant pairs, respectively, and **n** is the number of samples.
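As a rough sketch of how **c** and **d** can be counted, the following plain-Perl code compares every pair of samples. `kendall_tau` is a hypothetical helper, not the module's code, and tied pairs are simply counted as neither concordant nor discordant.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): tau computed as
# (c - d) / (n (n - 1) / 2) over all pairs of samples.
sub kendall_tau {
    my ( $x, $y ) = @_;
    my $n = @$x;
    die "vectors must have the same length\n" unless $n == @$y;
    die "need at least two samples\n" if $n < 2;

    my ( $c, $d ) = ( 0, 0 );
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            # Positive product: both vectors move the same way (concordant).
            # Negative product: they move in opposite ways (discordant).
            my $s = ( $x->[$i] <=> $x->[$j] ) * ( $y->[$i] <=> $y->[$j] );
            $c++ if $s > 0;
            $d++ if $s < 0;
        }
    }

    return ( $c - $d ) / ( $n * ( $n - 1 ) / 2 );
}

my $tau = kendall_tau( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );
printf "%.4f\n", $tau;    # -0.6429
```

With the SYNOPSIS data there are 5 concordant and 23 discordant pairs out of 28, so the result is -18 / 28.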

## csim

`$n = $c->csim;`

Return the contour similarity index measure. This is a single dimensional measure of the similarity between two vectors.

This returns a measure in the (inclusive) range `[-1..1]` and is computed using matrices of binary data representing "higher or lower" values in the original vectors.

This measure has been studied in musical contour analysis.

# FUNCTIONS

## rank

```
$v = [qw(1 3.2 2.1 3.2 3.2 4.3)];
$ranks = rank($v);
# [1, 4, 2, 4, 4, 6]
my( $ranks, $ties ) = rank($v);
# [1, 4, 2, 4, 4, 6], { 1=>[], 3.2=>[]}
```

Return a list consisting of an array reference of the ordinal ranks and a hash reference of the tied data.

In the case of a tie in the data (identical values) the rank numbers are averaged. An example will elucidate:

```
sorted data: [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
ranks: [ 1, 2, 3, 4, 5, 6 ]
tied ranks: 3, 4, and 5
tied average: (3 + 4 + 5) / 3 == 4
averaged ranks: [ 1, 2, 4, 4, 4, 6 ]
```
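The averaging steps above can be sketched as follows. This is a minimal illustration, not the module's `rank` function: `average_ranks` is a hypothetical name, it returns only the ranks (in the original input order), and it does not build the ties hash.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not the module's rank()): averaged ranks,
# returned in the same order as the input values.
sub average_ranks {
    my ($v) = @_;

    # Indices of the input, ordered by ascending value.
    my @order = sort { $v->[$a] <=> $v->[$b] } 0 .. $#$v;

    my @ranks;
    my $i = 0;
    while ( $i < @order ) {
        # Extend $j to the end of the run of identical values.
        my $j = $i;
        $j++ while $j < $#order && $v->[ $order[ $j + 1 ] ] == $v->[ $order[$i] ];

        # Sorted positions $i..$j hold ranks $i+1..$j+1; all share the mean.
        my $mean = sum( $i + 1 .. $j + 1 ) / ( $j - $i + 1 );
        $ranks[ $order[$_] ] = $mean for $i .. $j;

        $i = $j + 1;
    }

    return \@ranks;
}

my $ranks = average_ranks( [ 1, 3.2, 2.1, 3.2, 3.2, 4.3 ] );
print "@$ranks\n";    # 1 4 2 4 4 6
```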

## pad_vectors

```
( $u, $v ) = pad_vectors( [ 1, 2, 3, 4 ], [ 9, 8 ] );
# [1, 2, 3, 4], [9, 8, 0, 0]
```

Append zeros to either input vector for all values in the other that do not have a corresponding value. That is, "pad" the tail of the shorter vector with zero values.
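A minimal sketch of this padding, assuming the inputs are left untouched and padded copies are returned (whether the module pads in place is not stated here). `pad_with_zeros` is a hypothetical name used for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's pad_vectors()): return copies
# of both vectors, with zeros appended to the shorter one.
sub pad_with_zeros {
    my ( $u, $v ) = @_;
    my @u = @$u;
    my @v = @$v;

    # At most one of these loops runs, since only one vector can be shorter.
    push @u, 0 while @u < @v;
    push @v, 0 while @v < @u;

    return ( \@u, \@v );
}

my ( $pu, $pv ) = pad_with_zeros( [ 1, 2, 3, 4 ], [ 9, 8 ] );
print "@$pu | @$pv\n";    # 1 2 3 4 | 9 8 0 0
```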

## co_sort

`( $u, $v ) = co_sort( $u, $v );`

Sort the vectors as two dimensional data-point pairs with **u** values sorted first.
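One way to read this is sketched below: sort the index positions by the **u** value, break ties on the **v** value, and reorder both vectors with the same permutation. The ascending numeric order and the tie-break are assumptions, and `co_sort_pairs` is a hypothetical name rather than the module's function.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's co_sort()): reorder both
# vectors together, ascending by u and then by v (assumed ordering).
sub co_sort_pairs {
    my ( $u, $v ) = @_;
    die "vectors must have the same length\n" unless @$u == @$v;

    # Sort index positions so each (u, v) pair stays together.
    my @order = sort { $u->[$a] <=> $u->[$b] || $v->[$a] <=> $v->[$b] } 0 .. $#$u;

    # Apply the same permutation to both vectors via array slices.
    return ( [ @{$u}[@order] ], [ @{$v}[@order] ] );
}

my ( $su, $sv ) = co_sort_pairs( [ 3, 1, 2, 1 ], [ 30, 10, 20, 5 ] );
print "@$su | @$sv\n";    # 1 1 2 3 | 5 10 20 30
```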

## correlation_matrix

`$matrix = correlation_matrix( $u );`

Return the correlation matrix for a single vector.

This function builds a square, binary matrix that represents "higher or lower" value within the vector itself.
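The sketch below shows one plausible construction of such a matrix. The convention used (entry `[i][j]` is 1 when element `i` is lower than element `j`, and 0 otherwise) is an assumption made for illustration, and `higher_lower_matrix` is a hypothetical name; the module's own orientation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's correlation_matrix()):
# square 0/1 matrix where [i][j] is 1 when u[i] < u[j] (assumed convention).
sub higher_lower_matrix {
    my ($u) = @_;
    my @matrix;
    for my $i ( 0 .. $#$u ) {
        for my $j ( 0 .. $#$u ) {
            $matrix[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@matrix;
}

my $m = higher_lower_matrix( [ 1, 3, 2 ] );
print join( ' ', @$_ ), "\n" for @$m;
# 0 1 1
# 0 0 0
# 0 1 0
```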

## sign

Return 0, 1 or -1 according to whether the given number is zero, positive or negative.
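In Perl this behaviour falls out of the numeric comparison operator. The helper below is a sketch under the name `sign_of`, which is hypothetical; it is not a copy of the module's function.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's sign()): <=> against zero
# yields -1, 0 or 1 for negative, zero and positive numbers.
sub sign_of {
    my ($x) = @_;
    return $x <=> 0;
}

print join( ' ', map { sign_of($_) } ( -3.5, 0, 42 ) ), "\n";    # -1 0 1
```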

# TO DO

Handle any number of vectors instead of just two.

Implement other rank correlation measures that are out there...

# SEE ALSO

For the `csim` method:

http://personal.systemsbiology.net/ilya/Publications/JNMRcontour.pdf

For the `spearman` and `kendall` methods:

http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html

http://en.wikipedia.org/wiki/Kendall's_tau

# THANK YOU

For helping make this sturdier code:

Thomas Breslin

Jerome

Jon Schutz

Andy Lee

anno

mst

# AUTHOR

Gene Boggs <gene@cpan.org>

# COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by Gene Boggs.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.