label.pl - Assign labels to clusters in a confusion matrix to maximize agreement
label.pl [OPTIONS] PRELABEL
Type label.pl --help for a quick summary of options.
Labels the discovered clusters with sense tags such that the maximum number of contexts are correctly assigned.
PRELABEL should be the output of cluto2label.pl.
Sample CLUTO2LABEL format
    2
    // cord phone text div
    C0: 4 3 0 0
    C1: 2 2 2 2
    C2: 1 3 3 2

where
    the 1st line shows the number of unclustered instances (= 2), and
    the 2nd line shows a space-separated list of sense classes, starting with the // mark.
Each line thereafter shows the sense distribution of the instances belonging to each discovered cluster in the form of a cluster by sense distribution matrix. A cell value at (i,j) in the matrix shows the number of instances belonging to cluster Ci that have the sense tag Sj.
Note that each row begins with the cluster id, followed by a colon (:). Also, the number of sense classes on the 2nd line should be the same as the number of columns in the cluster by sense distribution table.
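The format rules above can be checked with a short parser. The following is a minimal sketch in Python, not the code used by label.pl; the function name parse_prelabel and its return shape are assumptions made for illustration. It reads the unclustered count, the sense list after the // mark, and one row per cluster, and rejects rows whose column count differs from the number of sense classes.

```python
def parse_prelabel(text):
    """Parse PRELABEL text into (unclustered, senses, matrix).

    Hypothetical helper for illustration; label.pl itself is a Perl script.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    unclustered = int(lines[0])            # 1st line: unclustered instances
    if not lines[1].startswith("//"):
        raise ValueError("2nd line must start with the // mark")
    senses = lines[1][2:].split()          # 2nd line: sense classes
    matrix = {}
    for ln in lines[2:]:
        # Each row is "<cluster id>: <count> <count> ..."
        cluster_id, _, counts = ln.partition(":")
        row = [int(c) for c in counts.split()]
        if len(row) != len(senses):
            raise ValueError(
                f"{cluster_id.strip()}: expected {len(senses)} columns, got {len(row)}"
            )
        matrix[cluster_id.strip()] = row
    return unclustered, senses, matrix


sample = """2
// cord phone text div
C0: 4 3 0 0
C1: 2 2 2 2
C2: 1 3 3 2
"""
unclustered, senses, matrix = parse_prelabel(sample)
print(unclustered, senses, matrix["C0"])
```

Here matrix["C0"][0] is the cell (C0, cord), that is, the number of instances in cluster C0 that carry the sense tag cord.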
--help
    Displays this message.
--version
    Displays the version information.
The output shows the sense label attached to each discovered cluster, along with a score. The score is the percentage of the total number of instances that are correctly clustered when the clusters are tagged with the suggested sense labels.
Prelabel file =>
    0
    // cord divi form phon prod text
    C0: 35 26 44 18 23 43
    C1: 64 34 50 43 57 52
    C2: 0 3 1 2 0 3
    C3: 0 0 2 31 0 0
    C4: 1 28 0 4 6 0
    C5: 0 9 3 2 14 2
Label Output =>
    ClusterID -> SenseID
    C0 -> form
    C1 -> cord
    C2 -> text
    C3 -> phon
    C4 -> divi
    C5 -> prod
    Score = 30.67
This output shows that
    cluster C0 represents the 'form' sense,
    cluster C1 represents the 'cord' sense,
    cluster C2 represents the 'text' sense,
    cluster C3 represents the 'phon' sense,
    cluster C4 represents the 'divi' sense, and
    cluster C5 represents the 'prod' sense.
Also, 30.67% of the total instances are in their correct sense classes if the clusters are tagged with this labeling scheme.
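The example can be reproduced with a brute-force search over one-to-one cluster-to-sense assignments. This is a sketch in Python under stated assumptions, not the algorithm label.pl necessarily uses: the function name best_labeling is hypothetical, the search tries every permutation and so is only practical for small matrices, and the unclustered count is assumed to be part of the score's denominator (it is 0 here, so the example cannot confirm that choice).

```python
from itertools import permutations


def best_labeling(senses, matrix, unclustered=0):
    """Return (labels, score) for the one-to-one labeling with most agreement.

    Hypothetical helper: exhaustive search, suitable only for small matrices.
    """
    clusters = list(matrix)
    # Pad with None so that surplus clusters (more clusters than senses)
    # can remain unlabeled.
    columns = list(range(len(senses)))
    columns += [None] * max(0, len(clusters) - len(senses))
    best_hits, best_perm = -1, None
    for perm in permutations(columns, len(clusters)):
        # Instances whose sense tag matches the label given to their cluster.
        hits = sum(matrix[c][j] for c, j in zip(clusters, perm) if j is not None)
        if hits > best_hits:
            best_hits, best_perm = hits, perm
    # Assumption: unclustered instances count toward the total.
    total = unclustered + sum(sum(row) for row in matrix.values())
    labels = {
        c: (senses[j] if j is not None else None)
        for c, j in zip(clusters, best_perm)
    }
    score = 100.0 * best_hits / total if total else 0.0
    return labels, score


senses = ["cord", "divi", "form", "phon", "prod", "text"]
matrix = {
    "C0": [35, 26, 44, 18, 23, 43],
    "C1": [64, 34, 50, 43, 57, 52],
    "C2": [0, 3, 1, 2, 0, 3],
    "C3": [0, 0, 2, 31, 0, 0],
    "C4": [1, 28, 0, 4, 6, 0],
    "C5": [0, 9, 3, 2, 14, 2],
}
labels, score = best_labeling(senses, matrix, unclustered=0)
for cluster, sense in labels.items():
    print(f"{cluster} -> {sense}")
print(f"Score = {score:.2f}")
```

For this matrix the matched cells are 44 + 64 + 3 + 31 + 28 + 14 = 184 out of 600 instances, which gives the score of 30.67 shown in the label output.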
Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu

Amruta Purandare, University of Pittsburgh

Anagha Kulkarni, Carnegie-Mellon University
Copyright (c) 2002-2008, Ted Pedersen, Amruta Purandare, Anagha Kulkarni
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.