NNexus::Classification - Disambiguation logic for NNexus concept harvests
    use NNexus::Classification qw(disambiguate msc_similarity);
    $concepts_refined = disambiguate($concept_harvest,%options);
    $similarity_score = msc_similarity($category1,$category2);
NNexus::Classification contains disambiguation and clustering algorithms for determining a subset of "relevant" concept candidates from a given concept harvest. Relevance is determined heuristically.
The current algorithm considers two facets of "relevance":
1. Relevant candidates come from empirically similar domains of knowledge. To this end, a similarity metric has been extracted from 3+ million mathematical reviews in Zentralblatt MATH, each annotated with categories from the Mathematics Subject Classification (MSC).
2. Technical terms are more likely to be relevant. Consequently:
   - The more words in a candidate, the more likely that it is a term.
   - The more characters in a candidate, the more likely that it is a term.
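The second facet can be illustrated with a minimal sketch. It is not the NNexus implementation: the helper name rank_by_termhood and the exact ordering (word count first, then character count, then alphabetical order as a tie-breaker) are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of NNexus: order candidate strings so that
# candidates with more words come first; ties are broken by character
# count, then alphabetically to keep the result deterministic.
sub rank_by_termhood {
    my @candidates = @_;
    my @scored = map {
        my @words = split ' ', $_;    # split on runs of whitespace
        +{ text => $_, words => scalar(@words), chars => length($_) }
    } @candidates;
    return map { $_->{text} }
        sort {
               $b->{words} <=> $a->{words}
            or $b->{chars} <=> $a->{chars}
            or $a->{text} cmp $b->{text}
        } @scored;
}

my @ranked = rank_by_termhood(
    'ring', 'commutative ring', 'field', 'local commutative ring');
print join("\n", @ranked), "\n";
```

Here the three-word candidate is ranked first and the single-word candidates last, with "field" ahead of "ring" because it has more characters.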
$concepts_refined = disambiguate($concept_harvest,%options);
Disambiguates a concept harvest, as returned by NNexus::Discover, following the algorithm in the description.
Currently the only accepted option is a boolean value for "verbosity".
$similarity_score = msc_similarity($category1,$category2);
Retrieves the Zentralblatt (ZBL) similarity score of two MSC categories, given in the standard MSC naming scheme (e.g. 00-XX, 15Axx, 15B33).
Note that currently the similarity metric only covers the top-level MSC categories.
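To make the top-level restriction concrete, the sketch below reduces an MSC category string to its two-digit top-level code. The helper msc_top_level is hypothetical and not part of NNexus; how msc_similarity itself normalizes its arguments is not stated here, so this only illustrates the naming scheme.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of NNexus: return the two-digit top-level
# MSC code of a category string, or undef if the string does not start
# with two digits.
sub msc_top_level {
    my ($category) = @_;
    return undef unless defined $category;
    return $category =~ /^\s*(\d{2})/ ? $1 : undef;
}

for my $category ('00-XX', '15Axx', '15B33') {
    printf "%s -> %s\n", $category, msc_top_level($category);
}
```

Under this reading, 15Axx and 15B33 both reduce to the top-level category 15, while 00-XX reduces to 00.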
Deyan Ginev <d.ginev@jacobs-university.de>
Research software, produced as part of work done by the KWARC group at Jacobs University Bremen. Released under the MIT License (MIT).