UMLS::Similarity CHANGES

  Changes from version 1.45 to 1.47
    1. Added error checking for feature measures

  Changes from version 1.43 to 1.45
    1. Added error checking to the new measures.

  Changes from version 1.41 to 1.43
    1. Added the following similarity measures:

        a. Pekar and Staab (2002) referred to as pks

        b. Pirro and Euzenat (2010) referred to as faith

        c. Maedche and Staab (2001) referred to as cmatch

        d. Batet, et al (2011) referred to as batet

        e. Sanchez, et al. (2012) referred to as sanchez

  Changes from version 1.39 to 1.41
    1. revision to include Makefile.PL

  Changes from version 1.37 to 1.39
    1. modified an umls-similarity for the closeness error

  Changes from version 1.35 to 1.37
    1. updated to include the upath measure

    2. add the module

  Changes from version 1.33 to 1.35
    1. updated test case config files for the SNOMEDCT name change

    2. updated the web documents to account for the name change as well

  Changes from version 1.31 to 1.33
    1. ensure that jcn returns 1 if both concepts are equal

  Changes from version 1.29 to 1.31
    1. updated --intrinsic option to and the IC measure
    modules to store the parameters required to calculate Intrinisic IC in
    the index rather than done on the fly. To do them on the fly you can use
    the --realtime option

  Changes from version 1.27 1.29
    1. added --intrinsic option to and the IC measure
    modules to allow for the Intrinisic IC to be used rather than the corpus
    based IC.

  Changes from version 1.25 to 1.27
    1. updated documentation

  Changes from version 1.23 to 1.25
    1. updated documentation

    2. modified to return an error if the icfrequency
    information does not match the configuration file -- similar to what we
    do with the icpropagation option

  Changes from version 1.21 to 1.23
    1. modified to return -1 when one
    or both input terms/cuis is unknown.

  Changes from version 1.19 to 1.21
    1. added a --loadcache and --getcache option to speed things up if

  Changes from version 1.17 to 1.19
    1. added a --word option to in order to use it with files that
    do not contain CUIs as the term pairs.

  Changes from version 1.15 to 1.17
    1. added the measure proposed by Zhong, et al 2002 to the
    UMLS::Similarity package. You can use it by using the --measure zhong
    option in the program. I will add this to the web
    interface shortly. If you are in a hurry though, let me know :-)

  Changes from version 1.13 to 1.15
    1. added the --original option which will return the original distance
    score between the two cuis rather than the similarity score (this is for
    nam, cdist and jcn)

    2. fixed cdist and nam to return a -1 for lack of sufficient path

  Changes from version 1.11 to 1.13
    1. modified the to retry the query
    if an error is thrown due to a busy server rather than just return

  Changes from version 1.09 to 1.11
    1. modified the synopsis code in the measures

    2. modified the web documentation

    3. included the icpropagation and vectorfiles in the distribution for
    the web interface

    4. added a --url option to the

  Changes from version 1.07 to 1.09
    1. added the --st option to create-icfrequency and create-icpropagation
    to propagation semantic types to find their probability using the is-a
    heirarchy in the semantic network

    2. added the --st option to for the Resnik (res)
    measure - this will return the information content of the semantic type
    of the two concepts least common subsumer

  Changes from version 1.05 to 1.07
    1. added the program to
    automatically query the web interface

    2. added default vectormatrix and vectorindex files if they are not
    specified on the command line

    3. added default icpropagation information to the icpropagation
    similarity measures if the --icpropagation or --icfrequency options are
    not specified on the command line

    4. updated the web interface to include OMIM

  Changes from version 1.03 to 1.05
    1. modified the .pm modules and program to account
    for the changes in the UMLS::Interface API which now passes all array
    information back by reference.

  Changes from version 1.01 to 1.03
    1. Fixed - it was printing out all possible senses of
    a term pair if one of the terms didn't exists. Now it just prints out
    one unless the --allsenses option is used.

  Changes from version 0.99 to 1.01
    1. Added a --word option to to use for input files that do
    not consist of CUIs and/or are not in the output

  Changes from version 0.97 to 0.99
    1. modified the program to account for spaces
    in the preferred term when a CUI is entered

    2. modified the program to include showing the
    shortest path between the concepts and their definitions if they exist
    in the view of the UMLS that is specified at the SAB.

  Changes from version 0.95 to 0.97
    1. updated the --debugfile option print out the concept's definition
    after --dbouledef, --compoundfile, --stoplist, --defraw, --stem options.

    2. fix the bug when use --dictfile and --config together, if a word is
    not in UMLS, it will check the dictfile.

  Changes from version 0.93 to 0.95
    1. modified the check to see if a concept exists when using the vector
    or lesk measure - it was checking the existence based on the SAB/REL
    parameters rather than the SABDEF/RELDEF

  Changes from version 0.91 to 0.93
    1. updated the --metamap option in to require the
    year (version) of metamap that is being used in order to call it on the
    command line.

    2. modified the program to obtain the CUIs of terms
    from a different function when using the lesk or vector measures (this
    is due to the config options).

  Changes from version 0.89 to 0.91
    1. modified documentation

    2. fixed an error in the output of the program

    3. stipulated the --realtime option can only be used with the similarity
    measures and modified the tests accordingly

    4. turned back on the error checking - not certain why I had it off

    5. added error.t test to check the module

  Changes from version 0.85 to 0.89
    1. Added the web interface scripts

  Changes from version 0.85 to 0.87
    1. Modified the documentation in, and

    2. update

  Changes from version 0.83 to 0.85
    1. Add --doubledef document in and

  Changes from version 0.81 to 0.83
    1. Add --doubledef in and

  Changes from version 0.79 to 0.81
    1. added the --compoundfile option for vector

    2. added the --dictfile option for vector

  Changes from version 0.77 to 0.79
    1. I am testing a new measure and forgot to it from umls-similarity

  Changes from version 0.75 to 0.77
    1. there was an error in the program. I was messing
    around with a new measure and forgot to remove its reference

    2. updated documentation

  Changes from version 0.73 to 0.75
    1. modified umls-similarity with the --compoundfile option for lesk and

    2. modified lesk and vector to accept compound words

    3. fixed a small output error in umls-similarity when no similarity is
    found between two concepts

    4. added the --compoundify option to icfrequency

  Changes from version 0.71 to 0.73
    1. added the undirected option to umls-similarity for the path measure

  Changes from version 0.69 to 0.71
    1. modified cdist and nam to use findShortestPathLength

    2. added developers-check to t/ directory

  Changes from version 0.67 to 0.69
    1. added a --precision option to and set the default to

    2. updated jcn to return a -1 if thre is no lcs or the IC of the lcs is

    3. modified the path based measures to use findShortestPathLength rather
    than findShortestPath

  Changes from version 0.65 to 0.67
    1. umls-similarity outputs the preferred term of a CUI rather than
    randomy picking one of the associated terms

    2. the path measures return 1 if the CUIs are the same prior to going
    out and finding the shortest path

    3. modified to allow the --config option to be used
    with the '--measure random' option.

    4. fixed the dictfile errors for lesk and vector

    5. modified the ic measures to return a -1, if the IC of the CUIs (or
    their LCS) is 0. The idea is that there is not enough information to
    determine their similarity.

    6. modified vector and lesk to return a -1, if the definition or vector
    is empty. Again, the idea is that there isnot enought information to
    determine their similarity.

  Changes from version 0.63 to 0.65
    1. modified the documentation for the lesk and vector measure

    2. added the ST option to RELDEF

    3. fixed small warnings/errors in the modules

    4. fixed the synopsis code so they should run properly

    6. Added the total number of concepts to the icfrequency and
    icpropagation files

  Changes from version 0.61 to 0.63
    1. Modified the --dictfile option for the lesk and vector measures.

    This is a dictionary file for the vector measure. It contains the
    'definitions' of a concept or term which would be used rather than the
    definitions from the UMLS. If you would like to use dictfile as a
    augmentation of the UMLS definitions, then use the --config option in
    conjunction with the --dictfile option.

    The expect format for the --dictfile file is:

    CUI: <definition> CUI: <definition> TERM: <definition> TERM:

    Keep in mind, when using this file with the --config option, if one of
    the CUIs or terms that you are obtaining the similarity for does not
    exist in the file the vector or lesk will return -1.

  Changes from version 0.59 to 0.61
    1. modified create-icfrequency to run faster - the new change really
    slowed things down

  Changes from version 0.57 to 0.59
    1. modified the create-icfrequency to only tag CUIs to the terms/words
    in the data that exist in the set of sources/ relations in the
    configuration file

    2. updated documentation - specifically the configuration options and

  Changes from version 0.55 to 0.57
    1. added propagation files to the t/options directory

    2. revised errorchecking w.r.t. propagation

  Changes from version 0.53 to 0.55
    There was an errorhandler mistake when no config option was defined.
    This has been fixed.

  Changes from version 0.51 to 0.53
    1. add configuration checking for the modules

    2. check the number of tests planned for the long tests. It looks like
    they didn't get updated

    3. increment the modules by 2

    4. fix long tests

        plan skip_all => "Lengthy Tests Disabled - set UMLS_SIMILARITY_RUN_ALL
        to run this test suite\n"
        unless defined $ENV{UMLS_SIMILARITY_RUN_ALL} and

        rather than what I have.

    5. add --smooth option

    6. document how the smoothing is done in the perldoc

    --term --metamap

    --smooth --config CONFIGURATION_FILE

    9. Note: add time stamp to output for default output file name

    10. released the lesk measure!!!

    11. added the --stem option in the vector measure - this option is also
    available in lesk

    12. add regular expression stoplist support to vector - this option is
    also available in lesk

  Changes from version 0.49 to 0.51
     1. modified the to be a bit
        more robust in order for it to accepting a file with 
        spaces when using the --icfrequency option.

     2. added remove of the output file in create-propgation-file.t
        prior to actually running the test

     3. added a check in jcn that returns 0 if the distance is equal
        to 0 <- not certain why I didn't have that in there.

     4. add --precision option to

     5. add check to make certain that that the .t programs can only
        be called in the main directory

     6. add a check so that the make test long environment variable
        has to equal 1 - it won't run if it equals 0

     7. remove the output directory prior to running in order be certain 
        they are removed

     8. add --precision option to

     9. change mkpath and rmtree to make_path() and remove_tree() in the 
        .t files

  Changes from version 0.47 to 0.49
    Note: the make test longs are not working properly on all of the

     1. added documentation on how to create propagation file 
        for the the ic measure's 

     2. add a more complete configuration file in the samples directory

     3. add --stoplist in vector method

     4. modified the umls-similarity package for consistency with
        the new UMLS-Interface. We wanted to remove some of the 
        redundancy in the package which meant modifying this 
        package. The package is now not back compatible with 
        older versions of the UMLS-Interface package.

  Changes from version 0.45 to 0.47
    1. updated documentation

    2. added to the --dictfile option the ability to have TERMS not just
    CUIs as input.

  Changes from version 0.43 to 0.45
    1. Fixed some small errors that went out due to lesk

    2. Updated pod documentation

    3. Added README to samples/

    4. Renamed --matrixfile to --vectormatrix Renamed --indexfile to
    --vectorindex Renamed --propagationfile to --icpropagation

    5. Added the --defraw option for lesk and vector. This will stop any
    cleaning that is done to the definition prior to use.

    6. Created a program to create a file
    containing the information content of the CUIs in a specified set of
    sources to be used by umls-similarity when using the information content
    semantic similarity measures.

    7. Created testing file vector-input.t add bigrams input file under
    t/tests/utils add index and matrix files under t/key/static

    8. Added the --defraw option to If this option is set it will
    NOT clean the definition otherwise the words in the definition are
    cleaned up (ie remove punctuation, lower case, ...)

    9. Created testing file create-propagation-file.t

  Changes from version 0.41 to 0.43
    Modified documentation and cleaned up the package

    Updated the that the vector method read the index file and
    co-occurrence matrix from the command line.

    Added the --dictfile options to read the definitions from a text file.
    Each line is a definition. def1: the first definition def2: the second

    Added the --debugfile options. It can print the definition of each
    concept and the vector (words) of every word covered by the
    co-occurrence matrix.

    Added the This file generates the index file and
    co-occurrence matrix file from the bigrams list for vector measure use.

    Added test cases for each of the measures in the t/ directory

    Added a samples/ directory which contains examples of the configuration
    file, matrix and index files for the vector measure, and propagation
    file for the information content measures.

  Changes from version 0.39 to 0.41
        Modified documentation and cleaned up the package

  Changes from version 0.37 to 0.39
    Updated the way that the lcs and shortestpath was being done in
    UMLS-Interface. There were some complications.

    Added the module and the module

    Added the --measure vector option to umls-similarity.

  Changes from version 0.35 to 0.37
    I messed up when modifying the way the lcs and shortestpath information
    was obtained in UMLS-Similarity after I had changed it in
    UMLS-Interface. It should all be fixed now.

    I also removed the vector and lesk measures. We are not quite ready for
    those and are in the process of getting them together for a fresh

    I changed the Jiang and Conrath measure from jnc to jcn like it is
    suppose to be

  Changes from version 0.33 to 0.35
    1. Modified the way lcs and shortestpath information is returned by the
    UMLS-Interface. Needed to modify the UMLS-Similarity to reflect these

  Changes from version 0.31 to 0.33
    1. Modified the propogation in UMLS-Interface and needed to reflect this
    in UMLS-Similarity

  Changes from version 0.29 to 0.31
    1. Added the information content measures proposed by Resnik (1995),
    Jiang and Conrath (1997) and Lin (1998), as well as a random measure
    that returns a random number between one and zero as the similarity

    2. Added a --debug option which prints out UMLS-Interface informatin for
    debugging purposes

  Changes from version 0.21 to 0.29
    I used the following require command:

        require "UMLS::Similarity::vector"

    rather than

        require "UMLS/Similarity/";

    Not certain what I was thinking ...

  Changes from version 0.19 to 0.21
    I fixed (for certain this time) how the modules are installed in the program in the utils/ directory. It should not
    require BerkeleyDB now unless you are running the
    UMLS::Similarity::vector measure.

    For documentation puposes:

            'use' loads the package at compile time

    where as

            'require' loads the package at run time

    Here is some documentation on it:


    So when using 'use UMLS::Similarity::vector' - the program was loading at compile time which used which requires BerkeleyDB
    to be installed. I switched to 'require "UMLS::Similarity::vector"'
    which now only loads at runtime and only when use specify the
    vector measure.

  Changes from version 0.17 to 0.19
    The --verbose option originally in the package is changed to a --info
    option. This option prints out a little more information about a concept
    if it doesn't exist in the sources that are being used.

    The reason for this change is because a new --verbose option was added
    to UMLS-Interface and we wanted to keep the options consistent. Since I
    do not think too many people were using the old --verbose option, I
    don't *think* it will cause too many problems.

    The new --verbose option will print out path information to a file
    rather than having this be done automatically. This will reduce the
    amount of storage space required to hold the path information for a
    given set of sources and relations.

    A new --forcerun option was also added. This option will just create the
    what we call the index - the path information - required by the program
    without prompting you to continue. So this will assume you know what you
    are doing and will not question you :)

    I also fixed (I hope) how the modules are installed in the program in the utils/ directory. It should not
    require BerkeleyDB now unless you are running the
    UMLS::Similarity::vector measure.

  Changes from version 0.15 to 0.17
    Modified the output of umls-similarity to return only the pair of CUIs
    that obtain the highest similarity score when terms that map to multiple
    concepts are being used. I added a --allsenses option if you would like
    the original output that displayed all possible CUIs with their
    similarity score for a given pair of terms.

    Added the vector measure. This measure is in 'beta' version so there
    will be some modifications to it in the future.

    I also removed the print statements that were displayed when the Wu and
    Palmer (wup) measure was being used. Sorry about the noise.

  Changes from version 0.13 to 0.15
    Modified the output of umls-similarity so that if a concept doesn't
    exist you can tell which one it is. I also added a --verbose option
    which gives a little more information about the concept that doesn't

    Modified the module. The Wu and Palmer measure is twice the depth
    of the two concepts LCS divided by the product of the the depths of the
    individual concepts. I was using the minimum depths of these concepts
    where as I should have been using the depth of the path that contained
    the LCS itself.

  Changes from version 0.11 to 0.13
    Added two new semantic similarity measure modules: i) which is an
    implemantion of the semantic similarity measure described by Nguyan and
    Al-Mubaid, 2006 and ii) which is the Conceptual Distance
    measure described by Rada, et. al., 1989.

    Modified the utility program to incorporate the and similarity modules.

  Changes from version 0.09 to 0.11
    Allow the program to obtain the semantic similarity
    between a term and CUI as well as CUI-CUI and term-term pairs.

    Added some error checking to the program and the
    measure modules.

  Changes from version 0.07 to 0.09
    Removed the module and put in its place
    This does exactly what the original does but it also now
    accepts files and is much nicer.

  Changes from version 0.05 to 0.07
    Added the similarity measure described by Wu and Palmer (1994)

    Updated the documentation

  Changes from version 0.03 to 0.05
    Modified the Changelog directory

    Modified documenation - tried to get the misspelling and obvious errors

    Removed the HTML documentation

  Changes from version 0.01 to 0.03
    Modified documentation

    Modified the utils/ program