PDLDM::Rank - Calculates and finds tied ranks of a PDL data matrix


    use strict;
    use warnings;
    use PDL;
    use PDLDM::Rank qw(TiedRank EstimateTiedRank EstimateTiedRankWithDups UniqueRank EstimateUniqueRankWithDups);

    # Training data
    my $training_pdl = pdl([[1,2,3,3,4,4,4,5,6,6], [1,1,1,2,2,4,4,5,6,6]]);
    print "training data $training_pdl";

    # Tied ranks of the training data and the duplicate count of each value
    my ($ranked_training_pdl, $duplicates_training_pdl) = TiedRank($training_pdl);
    print "ranked training data $ranked_training_pdl";
    print "duplicate count in the training data $duplicates_training_pdl";

    # Test data, ranked against the already-ranked training data
    my $test_pdl = pdl([[0.5,4,4.5,6.5], [0.2,1,2,2.5]]);
    print "test data $test_pdl";
    my ($ranked_test_pdl, $unique_test_pdl) = EstimateTiedRank($test_pdl, $training_pdl, $ranked_training_pdl);
    print "ranked test data $ranked_test_pdl";
    print "is the value unique? $unique_test_pdl";

    # Same estimate, but returning training duplicate counts instead of uniqueness flags
    my ($ranked_dup_test_pdl, $dup_test_pdl) = EstimateTiedRankWithDups($test_pdl, $training_pdl, $ranked_training_pdl, $duplicates_training_pdl);
    print "ranked test data $ranked_dup_test_pdl";
    print "number of duplicates in the training data $dup_test_pdl";

    # Gap-free (unique) ranks of the training data
    my ($uranked_training_pdl, $urank_training_dup_pdl) = UniqueRank($training_pdl);
    print "Unique ranked training data $uranked_training_pdl";
    print "duplicate count in the training data $urank_training_dup_pdl";

    # Test data ranked against the unique ranks of the training data
    my ($uranked_dup_test_pdl, $udup_test_pdl) = EstimateUniqueRankWithDups($test_pdl, $training_pdl, $uranked_training_pdl, $urank_training_dup_pdl);
    print "Unique ranked test data $uranked_dup_test_pdl";
    print "number of duplicates in the training data $udup_test_pdl";



PDLDM::Rank calculates the tied ranks (and gap-free unique ranks) of a given PDL. In the data PDL, the rows should represent the data instances and the columns should represent the attributes.


TiedRank returns two PDLs, each the same size as the input PDL. The first contains the tied rank values. The second contains the number of instances that share the same value. The TiedRank function should produce the same results as the MATLAB tiedrank function.
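The tied-rank idea can be sketched in plain Perl for a single attribute held in an ordinary array rather than a PDL. This is an illustrative sketch, not the module's code: the name `tied_rank_sketch` is hypothetical, ties are assumed to receive the mean of the positions they occupy (the MATLAB tiedrank convention), and the duplicate count is assumed to include the instance itself.

```perl
use strict;
use warnings;

# Hypothetical sketch of tied (average) ranking for ONE attribute.
# Assumptions: ties get the mean of the positions they occupy (MATLAB
# tiedrank convention) and the duplicate count includes the instance itself.
sub tied_rank_sketch {
    my @values = @_;
    # Indices of the values, ordered by ascending value.
    my @order = sort { $values[$a] <=> $values[$b] } 0 .. $#values;
    my (@rank, @dups);
    my $i = 0;
    while ($i < @order) {
        # Extend $j over the run of values equal to the one at position $i.
        my $j = $i;
        $j++ while $j + 1 < @order && $values[$order[$j + 1]] == $values[$order[$i]];
        # Sorted positions $i..$j correspond to ranks $i+1..$j+1; ties share their mean.
        my $avg   = ($i + 1 + $j + 1) / 2;
        my $count = $j - $i + 1;
        for my $k ($i .. $j) {
            $rank[$order[$k]] = $avg;
            $dups[$order[$k]] = $count;
        }
        $i = $j + 1;
    }
    return (\@rank, \@dups);
}

my ($rank, $dups) = tied_rank_sketch(1, 2, 3, 3, 4, 4, 4, 5, 6, 6);
print "ranks: @$rank\n";   # ranks: 1 2 3.5 3.5 6 6 6 8 9.5 9.5
print "dups:  @$dups\n";   # dups:  1 1 2 2 3 3 3 1 2 2
```

Sorting index positions instead of the values themselves lets each rank be written back to the original position of its value, so unsorted input works as well.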


In some cases the data are divided into two parts: training and testing (or evaluation). Tied ranks are first evaluated for the training data, and re-evaluating the tied ranks of the training and test data together may be inefficient.

EstimateTiedRank finds the lowest nearest rank for the test data. It needs three PDL inputs, in this order: the test data, the training data and the tied ranks of the training data. The tied ranks of the training data are the first variable returned by the TiedRank function.

EstimateTiedRank returns two PDL variables, each the same size as the test data PDL. The first contains the lowest nearest ranks taken from the tied ranks of the training data. The second indicates whether each value is unique, i.e., whether it does not exist in the corresponding attribute of the training data.
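A rough plain-Perl sketch of this estimation for one attribute is shown below. "Lowest nearest rank" is interpreted here, as an assumption, as the rank of the largest training value that does not exceed the test value; returning rank 0 for a test value below every training value is likewise a hypothetical choice. The function `estimate_rank_sketch` is illustrative and not part of PDLDM::Rank.

```perl
use strict;
use warnings;

# Hypothetical sketch for ONE attribute. Assumptions (not taken from the
# module): the "lowest nearest rank" is the rank of the largest training
# value <= the test value, and a test value below every training value gets
# rank 0. The flag is 1 when the test value does not occur in the training
# values, i.e. it is unique.
sub estimate_rank_sketch {
    my ($test, $train, $train_rank) = @_;   # array references
    my (@est, @unique);
    for my $x (@$test) {
        my ($best_value, $best_rank, $seen);
        for my $k (0 .. $#$train) {
            my $v = $train->[$k];
            $seen = 1 if $v == $x;
            # Keep the largest training value that does not exceed $x.
            if ($v <= $x && (!defined $best_value || $v > $best_value)) {
                ($best_value, $best_rank) = ($v, $train_rank->[$k]);
            }
        }
        push @est, defined $best_rank ? $best_rank : 0;
        push @unique, $seen ? 0 : 1;
    }
    return (\@est, \@unique);
}

my @train = (1, 2, 3, 3, 4, 4, 4, 5, 6, 6);
my @rank  = (1, 2, 3.5, 3.5, 6, 6, 6, 8, 9.5, 9.5);   # tied ranks of @train
my ($est, $unique) = estimate_rank_sketch([0.5, 4, 4.5, 6.5], \@train, \@rank);
print "estimated ranks: @$est\n";    # estimated ranks: 0 6 6 9.5
print "unique flags:    @$unique\n"; # unique flags:    1 0 1 1
```

The nested loop is O(n*m) and is only meant to make the idea concrete; sorting the training values once and searching them would scale better.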


EstimateTiedRankWithDups provides functionality similar to EstimateTiedRank, but it additionally needs the number of duplicates returned by TiedRank. Instead of the uniqueness flag, it returns the duplicate count in the training data. Therefore, if a value in the second returned variable (the duplicate count) is zero, the corresponding value in the test/evaluation data is unique.
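The duplicate-count output can be illustrated with a short plain-Perl sketch for one attribute: count how many training instances share each test value, so that a count of 0 marks a unique test value. The function name `training_dup_count_sketch` is hypothetical, and exact numeric equality of the values is assumed.

```perl
use strict;
use warnings;

# Hypothetical sketch for ONE attribute: how many training instances share
# each test value. A count of 0 marks a test value that is unique with
# respect to the training data. Assumes numeric inputs compared for exact
# equality (hash keys are the values' string forms).
sub training_dup_count_sketch {
    my ($test, $train) = @_;   # array references
    my %count;
    $count{$_}++ for @$train;  # occurrences of each training value
    return [ map { $count{$_} // 0 } @$test ];
}

my $dups = training_dup_count_sketch([0.5, 4, 4.5, 6.5], [1, 2, 3, 3, 4, 4, 4, 5, 6, 6]);
print "training duplicates: @$dups\n";   # training duplicates: 0 3 0 0
```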

UniqueRank takes the same input and returns the same kinds of output as the TiedRank function. However, it ranks without leaving gaps for duplicates. For example, the values [1 4 4 6 8 8 8 9] are given the ranks [1 2 2 3 4 4 4 5].
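Gap-free ranking of this kind is often called dense ranking. A minimal plain-Perl sketch for one attribute is shown below; `unique_rank_sketch` is a hypothetical name and covers only the rank output, not the duplicate count.

```perl
use strict;
use warnings;

# Hypothetical sketch of gap-free (dense) ranking for ONE attribute:
# equal values share a rank and the next distinct value gets the next integer.
sub unique_rank_sketch {
    my @values = @_;
    my %seen;
    # Distinct values in ascending numeric order.
    my @distinct = sort { $a <=> $b } grep { !$seen{$_}++ } @values;
    my %rank_of;
    @rank_of{@distinct} = (1 .. scalar @distinct);   # 1-based dense ranks
    return [ map { $rank_of{$_} } @values ];
}

my $ranks = unique_rank_sketch(1, 4, 4, 6, 8, 8, 8, 9);
print "@$ranks\n";   # 1 2 2 3 4 4 4 5
```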


EstimateUniqueRankWithDups performs a similar function to EstimateTiedRankWithDups, but uses the ranks and duplicate counts returned by the UniqueRank function.
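A plain-Perl sketch of the unique-rank estimate for one attribute follows. As an assumption (not documented behaviour), a test value takes the unique rank of the largest training value not exceeding it, or 0 if there is none; with gap-free ranks that equals the number of distinct training values less than or equal to the test value. Unlike the module, this hypothetical `estimate_unique_rank_sketch` derives the ranks from the training values directly instead of taking them as input.

```perl
use strict;
use warnings;

# Hypothetical sketch for ONE attribute. Assumption: a test value takes the
# unique rank of the largest training value not exceeding it (0 if none),
# which for gap-free ranks is the number of distinct training values <= it.
# The second result counts training duplicates; 0 means the value is unique.
sub estimate_unique_rank_sketch {
    my ($test, $train) = @_;   # array references
    my %count;
    $count{$_}++ for @$train;
    my @distinct = sort { $a <=> $b } keys %count;
    my (@rank, @dups);
    for my $x (@$test) {
        push @rank, scalar grep { $_ <= $x } @distinct;
        push @dups, $count{$x} // 0;
    }
    return (\@rank, \@dups);
}

my ($rank, $dups) = estimate_unique_rank_sketch([0.5, 4, 4.5, 6.5], [1, 2, 3, 3, 4, 4, 4, 5, 6, 6]);
print "ranks: @$rank\n";   # ranks: 0 4 4 6
print "dups:  @$dups\n";   # dups:  0 3 0 0
```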


This module requires the PDL module.



Please refer to the PDL documentation for details of PDL. PDL is very efficient in terms of memory and execution time.


Muthuthanthiri B Thilak Laksiri Fernando


Copyright (C) 2015 by Muthuthanthiri B Thilak L Fernando

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.18.2 or, at your option, any later version of Perl 5 you may have available.