NAME

Statistics::ROC - receiver-operator-characteristic (ROC) curves with nonparametric confidence bounds

SYNOPSIS

  use Statistics::ROC;

  my ($y)    = loggamma($x);
  my ($y)    = betain($x, $p, $q, $beta);
  my ($y)    = Betain($x, $p, $q);
  my ($y)    = xinbta($p, $q, $beta, $alpha);
  my ($y)    = Xinbta($p, $q, $alpha);
  my (@rk)   = rank($type, \@r);
  my (@ROC)  = roc($model_type,$conf,\@val_grp);
  

DESCRIPTION

This module determines the ROC curve and its nonparametric confidence bounds for data categorized into two groups. A ROC curve plots, for a given test, the probability of false alarm (x-axis) against the probability of detection (y-axis). In medical terms, it plots the probability of a positive test given no disease against the probability of a positive test given disease. The ROC curve may be used to determine an optimal cutoff point for the test.

The main function is roc(). The other exported functions are used by roc(), but might be useful for other nonparametric statistical procedures.

loggamma

This procedure evaluates the natural logarithm of gamma(x) for all x>0, accurate to 10 decimal places. Stirling's formula is used for the central polynomial part of the procedure. For x=0 a value of 743.746924740801 is returned: this is loggamma(9.9999999999E-324).
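
As a quick plausibility check, for integer arguments ln(gamma(n)) equals the sum of ln(k) for k = 1..n-1, because gamma(n) = (n-1)!. The following minimal sketch uses only core Perl; log_factorial_gamma is a hypothetical helper, not part of Statistics::ROC. It yields a reference value that loggamma(10) should agree with.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::ROC):
# ln(gamma(n)) for a positive integer n, using gamma(n) = (n-1)!
sub log_factorial_gamma {
    my ($n) = @_;
    my $sum = 0;
    $sum += log($_) for 1 .. $n - 1;   # empty range for n = 1, so the sum stays 0
    return $sum;
}

printf "%.10f\n", log_factorial_gamma(10);   # ln(9!) = ln(362880)
```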

betain

Computes the incomplete beta function ratio.

    Remarks:
    Complete beta function: B(p,q)=gamma(p)*gamma(q)/gamma(p+q)
                       log(B(p,q))=ln(gamma(p))+ln(gamma(q))-ln(gamma(p+q))

    Incomplete beta function ratio:
                 I_x(p,q)=1/B(p,q) * \int_0^x t^{p-1}*(1-t)^{q-1} dt

    --> log(B(p,q)) has to be supplied to calculate I_x(p,q)
    log denotes the natural logarithm
        $beta = log(B(p,q))
        $x    = x
        $p    = p
        $q    = q
    The subroutine returns I_x(p,q). If an error occurs a negative value 
    {-1,-2} is returned.
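
The relation between $beta and I_x(p,q) can be illustrated without the module. The sketch below computes log(B(p,q)) with lgamma from the core POSIX module (assumed: Perl 5.22 or later) and approximates the integral with a simple midpoint rule. Both helper names are hypothetical, and the midpoint rule is a stand-in for illustration only, not the algorithm betain() uses.

```perl
use strict;
use warnings;
use POSIX qw(lgamma);   # assumption: Perl 5.22+, where POSIX provides lgamma

# Hypothetical helpers for illustration; not part of Statistics::ROC.

# $beta = log(B(p,q)) = ln(gamma(p)) + ln(gamma(q)) - ln(gamma(p+q))
sub log_beta {
    my ($p, $q) = @_;
    return lgamma($p) + lgamma($q) - lgamma($p + $q);
}

# I_x(p,q) by the midpoint rule; adequate only for p >= 1 and q >= 1,
# where the integrand has no singularity at the end points.
sub beta_ratio_midpoint {
    my ($x, $p, $q, $steps) = @_;
    $steps ||= 10_000;
    my $h   = $x / $steps;
    my $sum = 0;
    for my $i (0 .. $steps - 1) {
        my $t = ($i + 0.5) * $h;
        $sum += $t**($p - 1) * (1 - $t)**($q - 1);
    }
    return $sum * $h / exp(log_beta($p, $q));
}

my $beta = log_beta(3, 4);
printf "log(B(3,4)) = %.6f\n", $beta;
printf "I_0.6(3,4) ~ %.6f\n", beta_ratio_midpoint(0.6, 3, 4);
```

For p=3 and q=4 the integrand is a polynomial, so the result can be checked by hand: B(3,4) = 1/60 and I_0.6(3,4) = 0.8208.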

Betain

Computes the incomplete beta function ratio I_x(p,q) by calling loggamma() to obtain log(B(p,q)) and then betain().

xinbta

Computes the inverse of the incomplete beta function ratio.

    Remarks:
 
    Complete beta function: B(p,q)=gamma(p)*gamma(q)/gamma(p+q)
                       log(B(p,q))=ln(gamma(p))+ln(gamma(q))-ln(gamma(p+q))

    Incomplete beta function ratio:
              alpha = I_x(p,q) = 1/B(p,q) * \int_0^x t^{p-1}*(1-t)^{q-1} dt

    --> log(B(p,q)) has to be supplied to calculate I_x(p,q)
    log denotes the natural logarithm
        $beta = log(B(p,q))
        $alpha= I_x(p,q)
        $p    = p
        $q    = q
    The subroutine returns x. If an error occurs a negative value {-1,-2,-3}
    is returned.
      
Xinbta

Computes the inverse of the incomplete beta function ratio by calling loggamma() to obtain log(B(p,q)) and then xinbta().

rank

Computes the ranks of the values given as the second argument (an array) and returns a vector of ranks corresponding to the input vector. The type of ranking ('high', 'low' or 'mean') is given as the first argument. The types differ in how ties, i.e. identical values in the input vector, are treated:

  • high:

    replace ranks of identical values with their highest rank

  • low:

    replace ranks of identical values with their lowest rank

  • mean:

    replace ranks of identical values with the mean of their ranks
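
The three tie-handling rules can be sketched in plain Perl as follows. rank_sketch is a hypothetical stand-in, not the module's rank(); it takes an array reference and assumes that rank 1 goes to the smallest value, which the text above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration; not Statistics::ROC::rank.
# Assumption: ranks ascend with the values, starting at 1.
sub rank_sketch {
    my ($type, $values) = @_;
    # Indices of the input, ordered by ascending value
    my @order = sort { $values->[$a] <=> $values->[$b] } 0 .. $#$values;
    my @ranks;
    my $i = 0;
    while ($i < @order) {
        # Extend $j to the last position holding the same value as position $i
        my $j = $i;
        $j++ while $j < $#order
            && $values->[ $order[$j + 1] ] == $values->[ $order[$i] ];
        my ($low, $high) = ($i + 1, $j + 1);
        my $rank = $type eq 'low'  ? $low
                 : $type eq 'high' ? $high
                 :                   ($low + $high) / 2;   # 'mean'
        $ranks[$_] = $rank for @order[$i .. $j];
        $i = $j + 1;
    }
    return @ranks;
}

my @e = (0.7, 0.7, 0.9, 0.6, 1.0, 1.1, 1, .7, .6);
print join(' ', rank_sketch($_, \@e)), "\n" for qw(low high mean);
# low:  3 3 6 1 7 9 7 3 1
# high: 5 5 6 2 8 9 8 5 2
# mean: 4 4 6 1.5 7.5 9 7.5 4 1.5
```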

roc

Determines the ROC curve and its nonparametric confidence bounds. The ROC curve shows the relationship of "probability of false alarm" (x-axis) to "probability of detection" (y-axis) for a certain test. Or in medical terms: the "probability of a positive test, given no disease" to the "probability of a positive test, given disease". The ROC curve may be used to determine an "optimal" cutoff point for the test.

The routine takes three arguments:

(1) The type of model, 'decrease' or 'increase'. This states the assumption about the data: with 'increase', a higher value tends to indicate a positive test result; with 'decrease', a lower value does.

(2) The confidence level of the two-sided confidence interval (usually 0.95 is chosen).

(3) The data, stored as a list-of-lists: each entry in this list consists of a "value / true group" pair, i.e. value / disease present. Group values are from {0,1}: 0 stands for disease (or signal) not present and 1 for disease (or signal) present, both known a priori. Example: @s=([2, 0], [12.5, 1], [3, 0], [10, 1], [9.5, 0], [9, 1]); Notice the small overlap of the groups. The optimal cutoff point to separate the two groups would be between 9 and 9.5 if the criterion of optimality is to maximize the probability of detection and simultaneously minimize the probability of false alarm.

Returns a list-of-lists with the three curves: @ROC=([@lower_b], [@roc], [@upper_b]). Each of the curves is again a list-of-lists, with each entry consisting of one (x,y) pair.
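
To make the (x,y) pairs concrete, the sketch below computes the points of an empirical ROC curve for the example data @s under the 'increase' assumption (a value at or above the cutoff counts as a positive test). empirical_roc is a hypothetical helper, not the module's roc(): it computes no confidence bounds, and the points roc() returns may be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not Statistics::ROC::roc.
# Assumes model 'increase' and at least one entry in each group.
sub empirical_roc {
    my ($val_grp) = @_;
    my @neg = map { $_->[0] } grep { $_->[1] == 0 } @$val_grp;   # group 0
    my @pos = map { $_->[0] } grep { $_->[1] == 1 } @$val_grp;   # group 1

    # Every distinct observed value serves as a cutoff, highest first
    my %seen;
    my @cutoffs = sort { $b <=> $a }
                  grep { !$seen{$_}++ } map { $_->[0] } @$val_grp;

    my @points = ([0, 0]);   # cutoff above all values: nothing tests positive
    for my $c (@cutoffs) {
        my $false_alarms = grep { $_ >= $c } @neg;   # count in scalar context
        my $detections   = grep { $_ >= $c } @pos;
        push @points, [ $false_alarms / @neg, $detections / @pos ];
    }
    return @points;
}

my @s = ([2, 0], [12.5, 1], [3, 0], [10, 1], [9.5, 0], [9, 1]);
printf "%.3f  %.3f\n", @$_ for empirical_roc(\@s);
```

Each printed line is one (probability of false alarm, probability of detection) pair, running from (0,0) to (1,1).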

Examples

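   use Statistics::ROC;

   $,=" ";                                  # separate printed list items with a space
   print loggamma(10), "\n";                # ln(gamma(10))
   print Xinbta(3,4,Betain(.6,3,4)),"\n";   # round trip: should recover x = 0.6

   # ranks of the same data under the three tie-handling rules
   @e=(0.7, 0.7, 0.9, 0.6, 1.0, 1.1, 1,.7,.6);
   print rank('low',@e),"\n";
   print rank('high',@e),"\n";
   print rank('mean',@e),"\n";

   # value / true group pairs; here lower values indicate group 1
   @var_grp=([1.5,0],[1.4,0],[1.4,0],[1.3,0],[1.2,0],[1,0],[0.8,0],
          [1.1,1],[1,1],[1,1],[0.9,1],[0.7,1],[0.7,1],[0.6,1]);
   @curves=roc('decrease',0.95,@var_grp);
   # x and y of the third point on the lower confidence bound
   print "$curves[0][2][0]  $curves[0][2][1] \n";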

AUTHOR

Hans A. Kestler, hans.kestler@uni-ulm.de or h.kestler@ieee.org

SEE ALSO

Perl/Tk user interface for drawing ROC curves (requires an installed Tk, and X11 on Mac OS X).

R.A. Hilgers, Distribution-Free Confidence Bounds for ROC Curves (1991), Meth Inform Med, 30:96-101

Algorithm 291, Logarithm of the gamma function. Collected Algorithms of the ACM, Vol II, 1980

Numerical Recipes in C, second edition, by Press, Teukolsky, Vetterling and Flannery, Cambridge University Press, 1992.

G.W. Cran, K.J. Martin and G.E. Thomas (1977). Remark AS R19 and Algorithm AS 109, A Remark on Algorithms AS 63: The Incomplete Beta Integral and AS 64: Inverse of the Incomplete Beta Function Ratio, Appl Statist, 26:111-114.

K.J. Berry, P.W. Mielke, Jr and G.W. Cran (1990) Algorithm AS R83, A Remark on Algorithm AS 109: Inverse of the Incomplete Beta Function Ratio, Appl Statist, 39:309-310.
