
Bridget McInnes
and 1 contributor


create-icfrequency.pl - This program sums the frequency counts of the CUIs from a specified set of sources in plain text.


This program sums the frequency counts of the CUIs from a specified set of sources in plain text. The CUIs are determined by mapping the words in the text to CUIs in the UMLS using the strings in the MRCONSO table or MetaMap.
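The counting step can be pictured with a short Python sketch. Python is used only for illustration, since the program itself is written in Perl. The sketch makes three simplifications:

- It uses a hypothetical in-memory dictionary, `TERM_TO_CUIS`, in place of the MRCONSO string lookup.
- The CUIs are placeholders, not real UMLS identifiers.
- It matches single tokens only, whereas the real program may also match multi-word strings.

```python
import re
from collections import Counter

# Hypothetical stand-in for the MRCONSO string lookup: maps a lower-cased
# term to the CUIs whose strings match it. Placeholder CUIs, not real UMLS IDs.
TERM_TO_CUIS = {
    "aspirin": ["C0000001"],
    "headache": ["C0000002"],
}

def sum_cui_counts(text, term_to_cuis):
    """Count each single-token term, then credit its frequency to every CUI it maps to."""
    term_counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    cui_counts = Counter()
    for term, freq in term_counts.items():
        for cui in term_to_cuis.get(term, []):
            cui_counts[cui] += freq
    return cui_counts

if __name__ == "__main__":
    text = "Aspirin relieved the headache. A second headache followed."
    for cui, freq in sorted(sum_cui_counts(text, TERM_TO_CUIS).items()):
        print(cui, freq)
```

Terms that map to no CUI are simply ignored. A term that maps to several CUIs adds its full count to each of them.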


Usage: create-icfrequency.pl [OPTIONS] OUTPUTFILE INPUTFILE


The output file contains frequency counts for CUIs in the following format:

    SAB :: (include|exclude) <sources>
    REL :: (include|exclude) <relations>
    N :: NUMBER
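Below is a minimal Python sketch for reading such a file back. The SAB, REL and N header lines follow the format shown above. The per-CUI body lines are assumed to be `CUI<>freq`, modeled on the count.pl unigram output mentioned in this document. The function name `parse_frequency_file` is hypothetical.

```python
def parse_frequency_file(lines):
    """Split a frequency file into header fields and per-CUI counts.

    Header lines look like 'KEY :: value' (SAB, REL, N), as documented.
    ASSUMPTION: body lines are 'CUI<>freq', modeled on count.pl unigram
    output; adjust the separator if your file differs.
    """
    header, counts = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if " :: " in line:
            key, value = line.split(" :: ", 1)
            header[key.strip()] = value.strip()
        elif "<>" in line:
            cui, freq = line.split("<>", 1)
            counts[cui] = int(freq)
    return header, counts

sample = """SAB :: include MSH
REL :: include RB, RN
N :: 3
C0000001<>1
C0000002<>2
"""
header, counts = parse_frequency_file(sample.splitlines())
```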


The input file (INPUTFILE) contains plain text.

Optional Arguments:


Outputs the semantic types of the CUIs along with their frequency counts. The concepts are determined by the sources and relations in the configuration file. For that reason, I recommend using UMLS_ALL with the PAR/CHD/RB/RN relations unless you are certain of your source.


Indicates that the text contains compounds joined by underscores. For example, the term blood_pressure is counted as a single term rather than as blood followed by pressure.
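The effect of this option can be sketched with a hypothetical `tokenize` helper in Python. The regular expressions are an assumption, not the program's actual tokenizer.

```python
import re

def tokenize(text, compounds):
    """Split text into terms.

    With compounds=True, an underscore-joined string such as blood_pressure
    stays a single term; otherwise underscores act as separators.
    """
    pattern = r"[A-Za-z0-9_]+" if compounds else r"[A-Za-z0-9]+"
    return re.findall(pattern, text)
```

With compounds enabled, "High blood_pressure noted." yields three terms. Without it, the same text yields four.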


Obtains the CUI counts using the term counts. This is the default.

--metamap TWO_DIGIT_YEAR

This option takes the two-digit year of the MetaMap version being used. For example, --metamap 10 uses ./metamap10 to call MetaMap to tag the text.

This option obtains the CUI counts using MetaMap, which must be installed on your system. MetaMap is available from http://mmtx.nlm.nih.gov/.
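The naming convention described for --metamap can be illustrated with a hypothetical Python helper. It turns the two-digit year into the executable path from the example, so 10 becomes ./metamap10. It only builds the path; it does not call MetaMap or pass it any flags.

```python
def metamap_command(two_digit_year):
    """Build the MetaMap executable path following the --metamap example.

    Hypothetical helper: accepts an int (9 -> '09') or a two-character string.
    """
    if isinstance(two_digit_year, int):
        year = f"{two_digit_year:02d}"
    else:
        year = str(two_digit_year)
    if len(year) != 2 or not year.isdigit():
        raise ValueError(f"expected a two-digit year, got {two_digit_year!r}")
    return f"./metamap{year}"
```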


The frequency counts in the output file are used to obtain the propagation counts. The format of the output file is similar to the output of count.pl from Text::NSP run with the unigram option.

--config FILE

This is the configuration file. The format of the configuration file is as follows:

SAB :: <include|exclude> <source1, source2, ... sourceN>

REL :: <include|exclude> <relation1, relation2, ... relationN>

For example, to include only those CUIs in the MSH vocabulary, using the RB and RN relations:

SAB :: include MSH
REL :: include RB, RN

Or, to use all the CUIs except those in MSH:

SAB :: exclude MSH

The configuration file directory contains example configuration files for the different runs that you have performed.
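Below is a Python sketch of reading the SAB and REL lines of a configuration file into (mode, items) pairs. The function name `parse_config` is hypothetical. Accepting both comma-separated and space-separated lists is an assumption based on the examples in this section.

```python
def parse_config(text):
    """Parse 'KEY :: include|exclude item1, item2, ...' lines.

    Returns {KEY: (mode, [items])}. Hypothetical helper; items may be
    separated by commas and/or spaces.
    """
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, rest = line.partition("::")
        mode, _, items = rest.strip().partition(" ")
        if mode not in ("include", "exclude"):
            raise ValueError(f"{key.strip()}: expected include or exclude, got {mode!r}")
        config[key.strip()] = (mode, items.replace(",", " ").split())
    return config
```

For example, `parse_config("SAB :: include MSH\nREL :: include RB, RN")` gives `{"SAB": ("include", ["MSH"]), "REL": ("include", ["RB", "RN"])}`.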

--username STRING

The username required to access the UMLS database in MySQL. Note: if --username is specified, --password is also required.

--password STRING

The password required to access the UMLS database in MySQL. Note: if --password is specified, --username is also required.

--hostname STRING

Hostname where MySQL is located. DEFAULT: localhost

--database STRING

Database containing the UMLS. DEFAULT: umls
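The rules for the MySQL connection options can be summarized in a hypothetical Python helper:

- --username and --password must appear together.
- --hostname and --database fall back to the documented defaults.

This is a sketch of the documented behavior, not the program's actual option handling.

```python
def connection_settings(username=None, password=None, hostname=None, database=None):
    """Apply the documented rules for the MySQL connection options (hypothetical helper)."""
    # --username and --password are only valid as a pair.
    if (username is None) != (password is None):
        raise ValueError("--username and --password must be specified together")
    return {
        "username": username,
        "password": password,
        "hostname": hostname or "localhost",  # documented default
        "database": database or "umls",       # documented default
    }
```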


Turns on the UMLS-Interface debug flag for testing.


Displays the quick summary of program options.


Displays the version information.


The Information Content (IC) of a concept is defined as the negative log of its probability. The probability of a concept, c, is determined by summing two terms: the probability of c occurring in some text, and the probabilities of its descendants occurring in some text.
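Written as formulas, the two definitions above read as follows. The notation is ours, not the program's:

- \hat{P}(x) is the probability estimated from a concept's own frequency count.
- desc(c) is the set of distinct descendants of c.
- Estimating \hat{P}(x) as freq(x)/N, with N taken from the output file header, is an assumption.

```latex
\begin{align*}
\mathrm{IC}(c) &= -\log P(c) \\
P(c) &= \hat{P}(c) + \sum_{d \in \mathrm{desc}(c)} \hat{P}(d),
\qquad \hat{P}(x) = \frac{\mathrm{freq}(x)}{N}
\end{align*}
```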

For more information on how this is calculated please see the README file or the perldoc for create-icpropagation.pl
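The same computation can be sketched in Python. The sketch assumes a small hypothetical parent-to-children hierarchy in place of the UMLS relations, with N as the normalizing total. Descendants are collected as a set, so a concept reachable along several paths is counted once. Returning infinity for a concept with a zero count is a choice made here, not necessarily what create-icpropagation.pl does.

```python
import math

def information_content(freqs, children, n_total):
    """IC(c) = -log P(c), where P(c) sums the relative frequency of c
    and of every distinct descendant of c.

    freqs:    {concept: frequency count}
    children: {concept: [child concepts]}  (hypothetical hierarchy)
    n_total:  the N used to turn counts into probabilities (assumption)
    """
    def descendants(c):
        seen, stack = set(), list(children.get(c, []))
        while stack:
            d = stack.pop()
            if d not in seen:
                seen.add(d)
                stack.extend(children.get(d, []))
        return seen

    concepts = set(freqs) | set(children) | {d for kids in children.values() for d in kids}
    ic = {}
    for c in concepts:
        count = freqs.get(c, 0) + sum(freqs.get(d, 0) for d in descendants(c))
        ic[c] = -math.log(count / n_total) if count else math.inf
    return ic

ic = information_content(
    {"root": 0, "A": 2, "B": 1, "C": 1},
    {"root": ["A", "B"], "A": ["C"]},
    n_total=4,
)
```

In this toy hierarchy the root absorbs every count, so P(root) is 1 and its IC is 0. The leaves B and C have the highest IC.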


  • Perl (version 5.8.5 or better) - http://www.perl.org

  • UMLS::Interface - http://search.cpan.org/dist/UMLS-Interface

  • UMLS::Similarity - http://search.cpan.org/dist/UMLS-Similarity

  • Text::NSP - http://search.cpan.org/dist/Text-NSP

  • MetaMap - http://mmtx.nlm.nih.gov/


  If you have any trouble installing and using this program,
  please contact us via the users mailing list:
  You can join this group by going to:
  You may also contact us directly if you prefer:
      Bridget T. McInnes: bthomson at cs.umn.edu

      Ted Pedersen: tpederse at d.umn.edu


 Bridget T. McInnes, University of Minnesota


Copyright (c) 2007-2011,

 Bridget T. McInnes, University of Minnesota
 bthomson at cs.umn.edu
 Ted Pedersen, University of Minnesota Duluth
 tpederse at d.umn.edu

 Siddharth Patwardhan, University of Utah, Salt Lake City
 Serguei Pakhomov, University of Minnesota Twin Cities

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.