Statistics::RankCorrelation - Compute the rank correlation between two vectors
version 0.1206
use Statistics::RankCorrelation;

$x = [ 8, 7, 6, 5, 4, 3, 2, 1 ];
$y = [ 2, 1, 5, 3, 4, 7, 8, 6 ];

$c = Statistics::RankCorrelation->new( $x, $y, sorted => 1 );

$n = $c->spearman;
$t = $c->kendall;
$m = $c->csim;

$s = $c->size;
$xd = $c->x_data;
$yd = $c->y_data;
$xr = $c->x_rank;
$yr = $c->y_rank;
$xt = $c->x_ties;
$yt = $c->y_ties;
This module computes rank correlation coefficient measures between two sample vectors.
Examples can be found in the distribution's eg/ directory and in the method tests.
$c = Statistics::RankCorrelation->new( \@u, \@v );
$c = Statistics::RankCorrelation->new( \@u, \@v, sorted => 1 );
This method constructs a new Statistics::RankCorrelation object.
If given two numeric vectors (as array references), the statistical ranks are computed. If the vectors are of different sizes, the shorter is padded with zeros.
If the sorted flag is set, both are sorted by the first (x) vector.
$c->x_data( $x );
$x = $c->x_data;
Set or return the x data as a one-dimensional array reference. This is the "unit" array, used as a reference for size and iteration.
$c->y_data( $y );
$y = $c->y_data;
Set or return the y data as a one-dimensional array reference. This vector is dependent on the x vector.
$s = $c->size;
Return the number of array elements.
$r = $c->x_rank;
Return the ranks as an array reference.
$y = $c->y_rank;
Return the y ranks as an array reference.
$t = $c->x_ties;
Return the x ties as a hash reference.
$t = $c->y_ties;
Return the y ties as a hash reference.
$n = $c->spearman;

    6 * sum( (xi - yi)^2 )
1 - --------------------------
             n^3 - n
Return Spearman's rho.
Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values and is a special case of the Pearson product-moment correlation.
Here x and y are the two rank vectors and i is an index running from 1 to n, the number of samples.
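The formula above can be sketched in plain Perl as follows. This is only an illustration, not the module's own code: `spearman_rho` is a hypothetical helper name, it expects two rank vectors of equal size, and it applies the formula exactly as written, without any adjustment for ties that the module may make.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): Spearman's rho from
# two equal-length rank vectors, 1 - 6 * sum((xi - yi)^2) / (n^3 - n).
sub spearman_rho {
    my ( $x_rank, $y_rank ) = @_;
    my $n = @$x_rank;
    die "Rank vectors must have the same size\n" unless $n == @$y_rank;
    die "Need at least two samples\n" if $n < 2;

    # Sum of squared rank differences.
    my $sum = 0;
    for my $i ( 0 .. $n - 1 ) {
        my $d = $x_rank->[$i] - $y_rank->[$i];
        $sum += $d * $d;
    }
    return 1 - ( 6 * $sum ) / ( $n**3 - $n );
}

# The synopsis data has no ties and uses the values 1..8,
# so each value is already its own rank.
my $rho = spearman_rho( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );
printf "%.4f\n", $rho;    # -0.8333
```

For this data the sum of squared differences is 154, so the result is 1 - 924 / 504.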
$t = $c->kendall;

       c - d
t = -------------
    n (n - 1) / 2
Return Kendall's tau.
Here c and d are the numbers of concordant and discordant pairs, and n is the number of samples.
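As a rough illustration of the pair counting, the sketch below evaluates the formula directly. `kendall_tau` is a hypothetical helper name, not the module's code, and it assumes that a pair tied in either vector counts as neither concordant nor discordant.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): Kendall's tau as
# (c - d) / (n (n - 1) / 2), counting pairs by brute force.
sub kendall_tau {
    my ( $x, $y ) = @_;
    my $n = @$x;
    die "Vectors must have the same size\n" unless $n == @$y;
    die "Need at least two samples\n" if $n < 2;

    my ( $c, $d ) = ( 0, 0 );
    for my $i ( 0 .. $n - 2 ) {
        for my $j ( $i + 1 .. $n - 1 ) {
            # The product is positive when both vectors order the pair the
            # same way, negative when they disagree, and zero on a tie.
            my $s = ( $x->[$i] <=> $x->[$j] ) * ( $y->[$i] <=> $y->[$j] );
            $c++ if $s > 0;
            $d++ if $s < 0;
        }
    }
    return ( $c - $d ) / ( $n * ( $n - 1 ) / 2 );
}

my $tau = kendall_tau( [ 8, 7, 6, 5, 4, 3, 2, 1 ], [ 2, 1, 5, 3, 4, 7, 8, 6 ] );
printf "%.4f\n", $tau;    # -0.6429
```

With the synopsis data there are 5 concordant and 23 discordant pairs out of 28, giving (5 - 23) / 28.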
$n = $c->csim;
Return the contour similarity index measure. This is a single dimensional measure of the similarity between two vectors.
This returns a measure in the (inclusive) range [-1..1] and is computed using matrices of binary data representing "higher or lower" values in the original vectors.
This measure has been studied in musical contour analysis.
$v = [qw(1 3.2 2.1 3.2 3.2 4.3)];

$ranks = rank($v);
# [1, 4, 2, 4, 4, 6]

my( $ranks, $ties ) = rank($v);
# [1, 4, 2, 4, 4, 6], { 1=>[], 3.2=>[]}
Return a list consisting of an array reference of the ordinal ranks and a hash reference of the tied data.
In the case of a tie in the data (identical values) the rank numbers are averaged. An example will elucidate:
sorted data:    [ 1.0, 2.1, 3.2, 3.2, 3.2, 4.3 ]
ranks:          [ 1,   2,   3,   4,   5,   6   ]
tied ranks:     3, 4, and 5
tied average:   (3 + 4 + 5) / 3 == 4
averaged ranks: [ 1,   2,   4,   4,   4,   6   ]
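The averaging step can be sketched in plain Perl like this. `averaged_ranks` is a hypothetical helper written for illustration; it returns only the ranks (not the ties hash) and is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module's API): ordinal ranks in the
# original data order, with tied values given the average of their ranks.
sub averaged_ranks {
    my ($data) = @_;

    # Indices of the data, ordered by ascending value.
    my @order = sort { $data->[$a] <=> $data->[$b] } 0 .. $#$data;

    my @ranks;
    my $i = 0;
    while ( $i < @order ) {
        # Extend $j over the run of values equal to the one at position $i.
        my $j = $i;
        $j++ while $j < $#order
            && $data->[ $order[ $j + 1 ] ] == $data->[ $order[$i] ];

        # The run holds the 1-based ranks $i + 1 .. $j + 1; use their mean.
        my $average = ( ( $i + 1 ) + ( $j + 1 ) ) / 2;
        $ranks[ $order[$_] ] = $average for $i .. $j;

        $i = $j + 1;
    }
    return \@ranks;
}

my $ranks = averaged_ranks( [qw(1 3.2 2.1 3.2 3.2 4.3)] );
print "@$ranks\n";    # 1 4 2 4 4 6
```

The three 3.2 values occupy sorted positions 3, 4 and 5, so each receives rank 4, matching the example above.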
( $u, $v ) = pad_vectors( [ 1, 2, 3, 4 ], [ 9, 8 ] );
# [1, 2, 3, 4], [9, 8, 0, 0]
Append zeros to either input vector for all values in the other that do not have a corresponding value. That is, "pad" the tail of the shorter vector with zero values.
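A minimal sketch of this padding behaviour, assuming copies are returned rather than the inputs being modified in place (the text above does not say which). `pad_with_zeros` is a hypothetical name used to avoid confusion with the module's own function.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): pad the tail of the shorter
# vector with zeros so that both vectors end up the same size.
sub pad_with_zeros {
    my ( $u, $v ) = @_;

    # Work on copies so the caller's arrays are left untouched.
    my @u = @$u;
    my @v = @$v;

    # At most one of these fires; after it, the sizes are equal.
    push @u, (0) x ( @v - @u ) if @u < @v;
    push @v, (0) x ( @u - @v ) if @v < @u;

    return ( \@u, \@v );
}

my ( $p, $q ) = pad_with_zeros( [ 1, 2, 3, 4 ], [ 9, 8 ] );
print "@$p | @$q\n";    # 1 2 3 4 | 9 8 0 0
```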
( $u, $v ) = co_sort( $u, $v );
Sort the vectors as two dimensional data-point pairs with u values sorted first.
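One way to picture this is to sort the index positions and then apply the same ordering to both vectors, so each (u, v) pair stays together. The sketch below assumes an ascending numeric sort on u with ties broken by v; the ordering direction and the tie-breaking rule are assumptions, and `co_sort_pairs` is a hypothetical name rather than the module's function.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): reorder two equal-length
# vectors together, keyed on the first vector.
sub co_sort_pairs {
    my ( $u, $v ) = @_;
    die "Vectors must have the same size\n" unless @$u == @$v;

    # Sort index positions by u, then by v (assumed tie-breaker).
    my @order = sort {
        $u->[$a] <=> $u->[$b] || $v->[$a] <=> $v->[$b]
    } 0 .. $#$u;

    # Array slices apply the same permutation to both vectors.
    return ( [ @{$u}[@order] ], [ @{$v}[@order] ] );
}

my ( $su, $sv ) = co_sort_pairs( [ 3, 1, 2 ], [ 30, 10, 20 ] );
print "@$su | @$sv\n";    # 1 2 3 | 10 20 30
```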
$matrix = correlation_matrix( $u );
Return the correlation matrix for a single vector.
This function builds a square, binary matrix that represents "higher or lower" value within the vector itself.
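A sketch of such a matrix is shown below. It assumes that the entry at row i, column j is 1 when element i is lower than element j and 0 otherwise; the module's exact convention is not stated in this text, so treat that choice, and the name `contour_matrix`, as hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): square 0/1 matrix recording,
# for every ordered pair of positions, whether the first value is lower.
sub contour_matrix {
    my ($u) = @_;
    my @matrix;
    for my $i ( 0 .. $#$u ) {
        for my $j ( 0 .. $#$u ) {
            # Assumed convention: 1 means "element i is lower than element j".
            $matrix[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@matrix;
}

my $m = contour_matrix( [ 1, 3, 2 ] );
print "@$_\n" for @$m;
# 0 1 1
# 0 0 0
# 0 1 0
```

Under this convention the diagonal is always zero, since no element is lower than itself.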
Return 0, 1 or -1 for a zero, positive or negative number, respectively.
Handle any number of vectors instead of just two.
Implement other rank correlation measures that are out there...
For the csim method:
http://personal.systemsbiology.net/ilya/Publications/JNMRcontour.pdf
For the spearman and kendall methods:
http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html
http://en.wikipedia.org/wiki/Kendall's_tau
For helping make this sturdier code:
Thomas Breslin
Jerome
Jon Schutz
Andy Lee
anno
mst
Gene Boggs <gene@cpan.org>
This software is copyright (c) 2022 by Gene Boggs.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.