Anand Jha

NAME

Text::SenseClusters::LabelEvaluation::ConfusionMatrixTotalCalc - Module responsible for processing the decision matrix.

DESCRIPTION

This module provides two functions. The first calculates a probability decision matrix from the scores of the original decision matrix. The second then uses the new decision matrix to decide whether the labels have been assigned appropriately.

function: printCalculatedScoreMatrix

        The following function is responsible for printing the calculated score 
        matrix from the decision matrix.

        @argument1      :  outputFileHandle:    DataType(File Handle)
                                        The file handle that determines where the output
                                        messages of this module are printed.
                                        Its default value is STDERR.
                                         
        @argument2      : clusterNameArrayRef:          DataType(Reference_Of_Array)
                                        Reference to an array containing the cluster names.

        @argument3      : standardTermsArrayRef:        DataType(Reference_Of_Array)
                                        Reference to an array containing the standard terms.

        @argument4      : hashForClusterTopicScoreRef:  DataType(Reference_Of_Hash)
                                        Reference to a hash containing each cluster name, its
                                        corresponding standard topic, and their score.

        @argument5      : topicTotalSumHashRef:  DataType(Reference_Of_Hash)
                                        Reference to a hash containing, for each topic, its
                                        total score across all clusters.

        @argument6      : clusterTotalSumHashRef:  DataType(Reference_Of_Hash)
                                        Reference to a hash containing, for each cluster, its
                                        total score across all topics.

        @argument7      : $isDecisionMatrixDebugOn:  DataType(number 0 or 1)
                                  Verbose flag: decides whether detailed output is printed.


        @return         : SimilarityScore
                                  The similarity score between the labels and the actual
                                  topics that were correctly identified by SenseClusters
                                  or a similar application.
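The argument list above implies a fixed relationship between the score hash and the two total hashes. The following is a minimal Python sketch of that relationship, not the module's Perl code. It assumes hashForClusterTopicScore is nested as cluster -> topic -> score; `compute_totals` and the sample scores are hypothetical, for illustration only.

```python
from collections import defaultdict


def compute_totals(cluster_topic_score):
    """Derive the two total hashes from a {cluster: {topic: score}} mapping.

    Assumption: the nesting is cluster -> topic -> score, mirroring
    hashForClusterTopicScore as described above.
    """
    topic_total = defaultdict(float)    # like topicTotalSumHash: topic -> sum over clusters
    cluster_total = defaultdict(float)  # like clusterTotalSumHash: cluster -> sum over topics
    for cluster, topic_scores in cluster_topic_score.items():
        for topic, score in topic_scores.items():
            topic_total[topic] += score
            cluster_total[cluster] += score
    return dict(topic_total), dict(cluster_total)


if __name__ == "__main__":
    # Hypothetical raw overlap scores, for illustration only.
    scores = {
        "Cluster0": {"Bill Clinton": 2, "Tony Blair": 5},
        "Cluster1": {"Bill Clinton": 3, "Tony Blair": 1},
    }
    topic_total, cluster_total = compute_totals(scores)
    print(topic_total)    # {'Bill Clinton': 5.0, 'Tony Blair': 6.0}
    print(cluster_total)  # {'Cluster0': 7.0, 'Cluster1': 4.0}
```

Both totals are sums over the same cells, so their grand totals are always equal; that shared grand total is the denominator used in the normalization step.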

        @description    :

        This function is responsible for the calculated decision matrix, illustrated below:

        Calculated Decision MATRIX:

                =================================================
                                |   Cluster0    |   Cluster1    |
                -------------------------------------------------
                Bill Clinton:   |     0.478     |     0.522     |
                -------------------------------------------------
                Tony Blair:     |     0.625     |     0.375     |
                =================================================


         Where: 1) Cluster0 and Cluster1 are cluster names (column headers).
                2) Bill Clinton and Tony Blair are standard topics (row headers).
                3) Each cell holds a probability measure indicating how likely
                   the cluster's label is to match the topic.
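A minimal Python sketch of rendering such a matrix in the layout shown, with topics as rows and clusters as columns. This is an illustration only, not printCalculatedScoreMatrix itself; `format_matrix` and `print_matrix` are hypothetical names, and the `out` parameter stands in for outputFileHandle (defaulting to STDERR, as described above).

```python
import sys


def format_matrix(matrix, clusters, topics, width=10):
    """Render a {topic: {cluster: probability}} matrix as a text table.

    Rows are topics (row headers) and columns are clusters (column headers).
    Topic labels longer than 15 characters widen their own row.
    """
    header = f"{'':<16}|" + "".join(f" {c:^{width}} |" for c in clusters)
    rule = "-" * len(header)
    lines = ["=" * len(header), header, rule]
    for topic in topics:
        cells = "".join(f" {matrix[topic][c]:^{width}.3f} |" for c in clusters)
        lines.append(f"{topic + ':':<16}|" + cells)
        lines.append(rule)
    lines[-1] = "=" * len(header)  # close the table with a double rule
    return "\n".join(lines)


def print_matrix(matrix, clusters, topics, out=sys.stderr):
    """Write the table to `out`; STDERR by default, like outputFileHandle."""
    print(format_matrix(matrix, clusters, topics), file=out)


if __name__ == "__main__":
    # The probabilities from the example matrix, keyed topic -> cluster.
    probs = {
        "Bill Clinton": {"Cluster0": 0.478, "Cluster1": 0.522},
        "Tony Blair": {"Cluster0": 0.625, "Cluster1": 0.375},
    }
    print_matrix(probs, ["Cluster0", "Cluster1"], ["Bill Clinton", "Tony Blair"])
```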
                            
        
         Steps:
                1. First, it iterates through the hash '%hashForClusterTopicScore'.
                2. It divides each cluster-topic overlap score by the total count
                   of the decision matrix.
                3. This gives the normalized score.
                4. Depending on the Verbose setting, it displays the normalized
                   decision matrix.
                5. It then calls the function 'concludingFromDecisionMatrix', which
                   uses the normalized decision matrix to conclude:
                        a) which cluster's labels match which Gold-Standard topic's
                           data, and
                        b) which Gold-Standard topic's data label matches which
                           cluster's labels.
                6. Finally, it compares the cluster-wise results with the topic-wise
                   results to determine the final cluster-topic matches along with
                   their matching scores.
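The steps above can be sketched in Python. This is a minimal sketch, not the module's Perl code: `normalize_by_grand_total` and `best_matches` are hypothetical names, the score hash is assumed to be keyed cluster -> topic, and step 6 is assumed to keep a cluster-topic pair only when the cluster-wise and topic-wise choices agree.

```python
def normalize_by_grand_total(cluster_topic_score):
    """Steps 1-3: divide every cluster-topic score by the sum of all scores."""
    total = sum(s for row in cluster_topic_score.values() for s in row.values())
    if total == 0:
        raise ValueError("decision matrix contains no non-zero scores")
    return {c: {t: s / total for t, s in row.items()}
            for c, row in cluster_topic_score.items()}


def best_matches(norm):
    """Steps 5-6 (sketch): cluster-wise and topic-wise best picks, then agreement.

    `norm` maps cluster -> topic -> normalized score. Ties go to whichever
    key max() sees first (dict insertion order).
    """
    # a) for each cluster, the topic with the highest score
    cluster_wise = {c: max(row, key=row.get) for c, row in norm.items()}
    # b) for each topic, the cluster with the highest score
    topics = {t for row in norm.values() for t in row}
    topic_wise = {t: max(norm, key=lambda c: norm[c].get(t, 0.0)) for t in topics}
    # Assumed step 6: keep a pair only when both directions agree
    final = {c: (t, norm[c][t]) for c, t in cluster_wise.items() if topic_wise[t] == c}
    return cluster_wise, topic_wise, final


if __name__ == "__main__":
    # Probabilities from the example matrix, keyed cluster -> topic.
    norm = {
        "Cluster0": {"Bill Clinton": 0.478, "Tony Blair": 0.625},
        "Cluster1": {"Bill Clinton": 0.522, "Tony Blair": 0.375},
    }
    _, _, final = best_matches(norm)
    print(final)  # {'Cluster0': ('Tony Blair', 0.625), 'Cluster1': ('Bill Clinton', 0.522)}
```

With the example values, both directions agree: Cluster0 pairs with Tony Blair and Cluster1 with Bill Clinton.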

function: concludingFromDecisionMatrix

        The following function uses the calculated (normalized) score matrix
        to conclude which cluster matches which topic, and vice versa.

        @argument1      : hashForClusterTopicScoreRef:  DataType(Reference_Of_Hash)
                                        Reference to a hash containing each cluster name, its
                                        corresponding standard topic, and their score.
        @argument2      : topicTotalSumHashRef:  DataType(Reference_Of_Hash)
                                        Reference to a hash containing, for each topic, its
                                        total score across all clusters.
        @argument3      : clusterTotalSumHashRef:  DataType(Reference_Of_Hash)
                                        Reference to a hash containing, for each cluster, its
                                        total score across all topics.
        @argument4      : directClusterTopicHashRef:  DataType(Reference_Of_Hash)
                                        Hash of hashes that stores the conclusion of the direct
                                        calculation row-wise, i.e. a topic's (outer key) score
                                        against each cluster (inner key).
        @argument5      : directTopicClusterHashRef:  DataType(Reference_Of_Hash)
                                        Hash of hashes that stores the conclusion of the direct
                                        calculation column-wise, i.e. a cluster's (outer key)
                                        scores against each topic (inner key).

        
        @return1        : directClusterTopicHashRef:  DataType(Reference_Of_Hash)
                                        Hash of hashes that stores the conclusion of the
                                        calculation row-wise, i.e. a topic's (outer key) score
                                        against each cluster (inner key).
        @return2        : directTopicClusterHashRef:  DataType(Reference_Of_Hash)
                                        Hash of hashes that stores the conclusion of the
                                        calculation column-wise, i.e. a cluster's (outer key)
                                        scores against each topic (inner key).
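The two output hashes described above hold the same probabilities keyed in opposite orders. A minimal Python sketch of building both views from one cluster -> topic mapping; `build_direct_hashes` is a hypothetical name, and the outer/inner key convention simply follows the argument descriptions above.

```python
def build_direct_hashes(norm_cluster_topic):
    """Build both nested views of a normalized {cluster: {topic: p}} matrix.

    direct_cluster_topic: topic (outer key) -> cluster (inner key) -> p  (row-wise)
    direct_topic_cluster: cluster (outer key) -> topic (inner key) -> p  (column-wise)
    """
    # Column-wise view: same nesting as the input, copied so callers can mutate it.
    direct_topic_cluster = {c: dict(row) for c, row in norm_cluster_topic.items()}
    # Row-wise view: transpose the nesting so topics become the outer keys.
    direct_cluster_topic = {}
    for cluster, row in norm_cluster_topic.items():
        for topic, p in row.items():
            direct_cluster_topic.setdefault(topic, {})[cluster] = p
    return direct_cluster_topic, direct_topic_cluster


if __name__ == "__main__":
    norm = {
        "Cluster0": {"Bill Clinton": 0.478, "Tony Blair": 0.625},
        "Cluster1": {"Bill Clinton": 0.522, "Tony Blair": 0.375},
    }
    row_wise, col_wise = build_direct_hashes(norm)
    print(row_wise["Tony Blair"])  # {'Cluster0': 0.625, 'Cluster1': 0.375}
```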

        @description :
        
                                        The following block of code is responsible for:
                                        1. Calculating the probabilities (normalized values) of
                                           all the topics against a cluster.
                                        2. Choosing the topic with the maximum probability
                                           (normalized value) for the given cluster.
                                        3. In the current approach, the probability (normalized
                                           value) is calculated by dividing the similarity score
                                           of a topic against a cluster by the total similarity
                                           score of all the topics against all the clusters.


                                        Future enhancement::
                                        4. The above approach can be done in two ways, i.e. the
                                           direct way and the inverse way.
                                        5. In the direct approach, the probability is calculated
                                           by dividing the similarity score of a topic against a
                                           cluster by the total similarity score of all the
                                           topics against that cluster.
                                        6. In the inverse approach, the probability is calculated
                                           by dividing the similarity score of a topic against a
                                           cluster by the total similarity score of all the
                                           clusters against that topic.
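The three normalization schemes listed above differ only in their denominator. A minimal Python sketch comparing them, assuming raw scores keyed cluster -> topic; `normalize`, its `mode` values, and the sample scores are hypothetical, and a zero denominator is mapped to 0.0 as a choice of this sketch.

```python
def normalize(scores, mode="total"):
    """Turn raw {cluster: {topic: score}} values into probabilities.

    mode="total":   score / sum of all scores in the matrix      (item 3)
    mode="direct":  score / sum over all topics for that cluster  (item 5)
    mode="inverse": score / sum over all clusters for that topic  (item 6)
    A zero denominator yields 0.0 instead of raising.
    """
    if mode not in ("total", "direct", "inverse"):
        raise ValueError(f"unknown mode: {mode!r}")
    cluster_total = {c: sum(row.values()) for c, row in scores.items()}
    topic_total = {}
    for row in scores.values():
        for topic, s in row.items():
            topic_total[topic] = topic_total.get(topic, 0) + s
    grand_total = sum(cluster_total.values())

    result = {}
    for cluster, row in scores.items():
        result[cluster] = {}
        for topic, s in row.items():
            if mode == "total":
                denom = grand_total
            elif mode == "direct":
                denom = cluster_total[cluster]
            else:
                denom = topic_total[topic]
            result[cluster][topic] = s / denom if denom else 0.0
    return result


if __name__ == "__main__":
    raw = {"Cluster0": {"A": 1, "B": 3}, "Cluster1": {"A": 1, "B": 1}}  # hypothetical
    for m in ("total", "direct", "inverse"):
        print(m, normalize(raw, m))
```

Under "total" all cells sum to 1; under "direct" each cluster's scores sum to 1; under "inverse" each topic's scores sum to 1. Each row of the example matrix above sums to 1 across its clusters, which is the property the inverse variant produces.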

SEE ALSO

http://senseclusters.cvs.sourceforge.net/viewvc/senseclusters/LabelEvaluation/

@Last modified by : Anand Jha
@Last_Modified_Date : 24th Dec. 2012
@Modified Version : 1.6

AUTHORS

        Ted Pedersen, University of Minnesota, Duluth
        tpederse at d.umn.edu

        Anand Jha, University of Minnesota, Duluth
        jhaxx030 at d.umn.edu

COPYRIGHT AND LICENSE

Copyright (C) 2012 Ted Pedersen, Anand Jha

See http://dev.perl.org/licenses/ for more information.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

        The Free Software Foundation, Inc., 59 Temple Place, Suite 330, 
        Boston, MA  02111-1307  USA