 
ROC Analysis in Medical Imaging
1. Maximum-Likelihood Estimation of ROC Curves from Continuously-Distributed Data
We have shown that truth-state runs in rank-ordered data constitute a natural categorization of continuously-distributed test results for maximum-likelihood (ML) estimation of ROC curves. On this basis, we developed two new algorithms for fitting binormal ROC curves to continuously-distributed data: a true ML algorithm (LABROC4) and a quasi-ML algorithm (LABROC5) that requires substantially less computation with large datasets. Simulation studies indicated that both algorithms produced reliable estimates of the binormal ROC curve parameters a and b, the ROC-area index Az, and the standard errors of those estimates. (612)
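
As an illustration of the categorization idea only (not the LABROC4 or LABROC5 code), the following Python sketch collapses rank-ordered test results into truth-state runs. The function name, the (negatives, positives) tally format, and the rule that tied scores always stay in one category are assumptions made for this example.

```python
from itertools import groupby


def truth_state_runs(scores, truths):
    """Collapse rank-ordered results into categories of (n_negative, n_positive).

    Assumption for this sketch: cases with identical scores are never split,
    and adjacent score values are merged only when both contain a single,
    identical truth state.
    """
    ordered = sorted(zip(scores, truths))

    # One tally per distinct score value, so ties stay together.
    tallies = []
    for _, group in groupby(ordered, key=lambda pair: pair[0]):
        labels = [t for _, t in group]
        n_pos = sum(1 for t in labels if t)
        tallies.append((len(labels) - n_pos, n_pos))

    # Merge neighbouring tallies that form a run of one truth state.
    runs = []
    for n_neg, n_pos in tallies:
        if runs:
            prev_neg, prev_pos = runs[-1]
            both_negative = n_pos == 0 and prev_pos == 0
            both_positive = n_neg == 0 and prev_neg == 0
            if both_negative or both_positive:
                runs[-1] = (prev_neg + n_neg, prev_pos + n_pos)
                continue
        runs.append((n_neg, n_pos))
    return runs


example = truth_state_runs(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [0, 0, 1, 0, 1, 1],
)
```

In this example six continuous results reduce to four categories, which is the kind of reduction that makes an ordinal-category ML fit applicable to continuous data.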

2. Statistical Comparison of Two ROC Curve Estimates Obtained from Partially-Paired Datasets
We developed a new generalized method for ROC curve fitting and statistical testing that allows researchers to utilize all of the data collected in an experimental comparison of two diagnostic modalities, even if some patients have not been studied with both modalities. The corresponding algorithm, ROCKIT, subsumes our previous LABROC, CORROC and INDROC algorithms as special cases. We tested ROCKIT on more than one-half million computer-simulated datasets of various sizes and configurations representing a range of population ROC curves. The algorithm successfully converged for more than 99.8% of all datasets studied. The Type I error rates of the new algorithm's statistical test for differences in Az estimates were excellent for datasets typically encountered in practice, but diverged from alpha for datasets arising from some extreme situations. (611)
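
To make the term "partially paired" concrete, the sketch below splits the cases of a two-modality study into those read with both modalities and those read with only one. It is a bookkeeping illustration only; the function name and the dictionary input format are hypothetical and say nothing about how ROCKIT itself stores or fits the data.

```python
def partition_cases(results_a, results_b):
    """Group case identifiers by which modalities produced a test result.

    results_a and results_b map a case identifier to that case's test result
    under modality A and modality B (hypothetical input format).
    """
    ids_a, ids_b = set(results_a), set(results_b)
    return {
        "paired": sorted(ids_a & ids_b),   # studied with both modalities
        "only_a": sorted(ids_a - ids_b),   # studied with modality A alone
        "only_b": sorted(ids_b - ids_a),   # studied with modality B alone
    }


groups = partition_cases(
    {"p1": 0.8, "p2": 0.3, "p3": 0.6},
    {"p2": 0.4, "p3": 0.7, "p4": 0.1},
)
```

A fully paired design has empty "only_a" and "only_b" groups, and a fully unpaired design has an empty "paired" group; a partially-paired dataset has cases in more than one group.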

3. An ROC Partial Area Index for Highly Sensitive Diagnostic Tests
The area under an ROC curve that has been fit by the conventional binormal model (Az) is widely used as an index of diagnostic performance. However, Az may not be a meaningful summary of clinical diagnostic performance when high sensitivity must be maintained clinically. We developed a new ROC partial area index to summarize an ROC curve only in a high-sensitivity region (see figure). The mathematical formulation of this partial area index was derived from the conventional bivariate binormal model for ROC analysis. Statistical tests of apparent differences in this index were formulated analogously to those for the conventional Az. We validated one common statistical test involving the partial area index using computer simulations under realistic conditions. We also presented an example in mammography illustrating a situation in which the partial area index is more meaningful than the conventional Az index in measuring clinical diagnostic performance. We conclude that the partial area index can be used as a more meaningful alternative to the conventional Az index for highly sensitive diagnostic tests. (507)
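
The following Python sketch shows one way such an index could be evaluated numerically for a conventional binormal curve, TPF = Phi(a + b * Phi^-1(FPF)). It assumes the index is the area under the curve and above the line TPF = TPF0, divided by (1 - TPF0); that normalization, the function names, and the midpoint-rule integration are assumptions of this example rather than the closed-form formulation referred to above.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def binormal_az(a, b):
    """Total area under a conventional binormal ROC curve."""
    return _STD.cdf(a / math.sqrt(1.0 + b * b))


def fpf_at_tpf(a, b, tpf):
    """Invert TPF = Phi(a + b * z_FPF) to obtain FPF at a given TPF."""
    return _STD.cdf((_STD.inv_cdf(tpf) - a) / b)


def partial_area_index(a, b, tpf0, steps=20000):
    """Average specificity over TPF in [tpf0, 1] (assumed definition).

    Integrates (1 - FPF) with respect to TPF by the midpoint rule, then
    divides by (1 - tpf0) so that the index lies between 0 and 1.
    """
    width = (1.0 - tpf0) / steps
    total = 0.0
    for i in range(steps):
        tpf = tpf0 + (i + 0.5) * width  # midpoints avoid TPF = 0 and TPF = 1
        total += 1.0 - fpf_at_tpf(a, b, tpf)
    return total / steps


chance_index = partial_area_index(0.0, 1.0, 0.9)
full_range_index = partial_area_index(1.0, 1.0, 0.0)
```

Under this assumed definition, a chance-level curve yields (1 - TPF0)/2, and setting TPF0 = 0 recovers the conventional Az, which gives two simple checks on the numerical integration.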


4. Variance-Component Modeling in the Analysis of ROC Index Estimates
Each way of replicating an observer study (e.g., by using the same readers but resampling cases, or by resampling readers but using the same cases) produces distinct variances and correlations of the estimates of an ROC index value such as Az. We systematized and clarified the large number of such variances and correlations by relating them to distinct combinations of the components of variance of a simple multivariate statistical model. Moreover, we introduced a notation that identifies both the method of replication and, when estimate differences are examined, the estimate pairing scheme. We and other investigators have found this approach to be a useful tool for modeling and understanding the distinct sources of variation that contribute to empirical variance and correlation in statistical analyses of ROC index estimates. (577, 747, 749, 797)

5. Multivariate Analysis of ROC Data by “Jackknifing”
The conclusion drawn from a statistical test of the difference between two ROC estimates can be generalized to both other cases and other readers only if the statistical test takes both case-sample variation and reader-sample variation into account. In collaboration with Kevin Berbaum and the late Donald Dorfman at the University of Iowa, we developed and validated the first practical approach to testing the statistical significance of differences between ROC index estimates that accounts for both of these sources of variation. Our approach is based upon a statistical technique called “jackknifing,” in which the relevant ROC estimate (e.g., Az value) for each reader is recalculated after each case is deleted individually from the dataset and then replaced. Extensive simulation studies demonstrated that the approach produced accurate or slightly conservative P-values for datasets that contain on the order of 100 cases or more. (338, 578)
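
The delete-one-case recalculation can be sketched for a single reader as follows. For simplicity this example substitutes the empirical (Wilcoxon) area for the ML estimate of Az, and it forms the standard jackknife pseudovalues n * theta_all - (n - 1) * theta_without_i; the function names are hypothetical, and the subsequent analysis across readers and modalities is not shown.

```python
def empirical_auc(neg_scores, pos_scores):
    """Wilcoxon area: P(pos > neg) + 0.5 * P(pos == neg)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def jackknife_pseudovalues(scores, truths):
    """Return one jackknife pseudovalue per case for a single reader."""
    n_cases = len(scores)

    def auc_without(skip):
        # Recompute the index with case `skip` left out of the dataset.
        kept = [(s, t) for i, (s, t) in enumerate(zip(scores, truths)) if i != skip]
        neg = [s for s, t in kept if not t]
        pos = [s for s, t in kept if t]
        return empirical_auc(neg, pos)

    full = auc_without(None)  # None matches no index, so no case is deleted
    return [n_cases * full - (n_cases - 1) * auc_without(i) for i in range(n_cases)]


scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
truths = [0, 0, 1, 1, 1, 0]
pseudovalues = jackknife_pseudovalues(scores, truths)
```

Each case contributes one pseudovalue per reader and modality, so case-to-case and reader-to-reader variation can both be examined on the same table of numbers. The sketch requires at least two actually-positive and two actually-negative cases so that every delete-one subset still contains both truth states.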

6. Proper Binormal ROC Curves: Theory and Maximum-Likelihood Estimation
The conventional binormal model, which assumes that a pair of latent normal distributions underlies ROC data, has been used successfully for many years to fit smooth ROC curves. However, if the conventional binormal model is used for small datasets and/or ordinal category data with poorly allocated category boundaries, a “hook” in the fitted ROC may be evident near the lower-left or upper-right corner of the unit square. To overcome this curve-fitting artifact, we developed a “proper” binormal model and a new algorithm for maximum-likelihood (ML) estimation of the corresponding ROC curves. Extensive simulation studies have shown the algorithm to be highly reliable. ML estimates of the proper and conventional binormal ROC curves are virtually identical when the conventional ROC shows no hook, but the proper binormal curves have monotonic slope for all datasets, including those for which the conventional binormal model produces degenerate fits. (575, 679)
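
The hook can be located directly from the conventional binormal parameters. Writing the curve as TPF = Phi(a + b * z), where z = Phi^-1(FPF), the curve meets the chance line TPF = FPF where a + b * z = z, i.e., at z = a / (1 - b). The Python sketch below evaluates that crossing point; the function names are hypothetical, and the code illustrates only the conventional model, not the proper binormal model or its ML algorithm.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def binormal_tpf(a, b, fpf):
    """Conventional binormal ROC curve: TPF as a function of FPF."""
    return _STD.cdf(a + b * _STD.inv_cdf(fpf))


def chance_line_crossing(a, b):
    """FPF at which the curve crosses TPF = FPF, or None when b == 1.

    With a > 0, b < 1 puts the crossing near the upper-right corner and
    b > 1 puts it near the lower-left corner.
    """
    if b == 1.0:
        return None  # equal-variance case: the curve never crosses the chance line
    return _STD.cdf(a / (1.0 - b))


upper_right = chance_line_crossing(1.0, 0.5)
lower_left = chance_line_crossing(1.0, 2.0)
```

For a = 1 and b = 0.5 the crossing lies at an FPF of about 0.977, and beyond it the fitted curve falls below the chance line. When the crossing lies extremely close to a corner the hook is usually not visible on a plot.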


7. An “Optimal” Method for Combining Two Correlated Diagnostic Assessments with Application to Computer-Aided Diagnosis
Some computer-aided diagnosis (CAD) methods produce quantitative diagnostic assessments, such as the likelihood of malignancy of a breast lesion. Radiologists who use this type of computer aid must combine the computer's quantitative assessment with their own. No theoretical or empirical methods are currently available to help radiologists perform this task. Results of recent observer studies showed that while CAD helped radiologists improve performance, radiologists' ad hoc performance tended to be inferior to that of the computer alone, indicating that they were unable to use computer aids optimally. We developed a general method to combine two correlated diagnostic assessments. We calculated a likelihood ratio based on a bivariate binormal model that describes the joint probability density of the latent decision variables from two sources of diagnostic assessments. To the extent that the bivariate binormal model is valid and that the model's parameters can be estimated reliably, results that we obtain in this way would be optimal because the ideal observer uses this likelihood ratio in combining the diagnostic assessments. Analyses of observer study data indicated that this method could produce better performance than that achieved by radiologists when they used computer aids in an ad hoc way. Therefore, this method can potentially help radiologists more effectively use quantitative computer analysis results and surpass the accuracy of the computer. (764)
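
A likelihood ratio of this kind can be sketched in Python as the ratio of two bivariate normal densities, one for actually-positive and one for actually-negative cases. The parameter layout (mean_x, mean_y, sd_x, sd_y, rho), the function names, and the numerical values below are hypothetical; in practice the parameters would have to be estimated from data, and the assessments would first have to be expressed on the latent decision-variable scales.

```python
import math


def bivariate_normal_pdf(x, y, mean_x, mean_y, sd_x, sd_y, rho):
    """Density of a bivariate normal distribution with correlation rho."""
    zx = (x - mean_x) / sd_x
    zy = (y - mean_y) / sd_y
    one_minus_r2 = 1.0 - rho * rho
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    norm = 2.0 * math.pi * sd_x * sd_y * math.sqrt(one_minus_r2)
    return math.exp(-0.5 * quad) / norm


def likelihood_ratio(x, y, positive_params, negative_params):
    """Combined decision variable: f(x, y | positive) / f(x, y | negative)."""
    return (bivariate_normal_pdf(x, y, *positive_params)
            / bivariate_normal_pdf(x, y, *negative_params))


# Hypothetical parameters: (mean_x, mean_y, sd_x, sd_y, rho).
POSITIVE = (1.0, 1.0, 1.0, 1.0, 0.0)
NEGATIVE = (0.0, 0.0, 1.0, 1.0, 0.0)

lr_moderate = likelihood_ratio(1.0, 1.0, POSITIVE, NEGATIVE)
lr_strong = likelihood_ratio(2.0, 2.0, POSITIVE, NEGATIVE)
```

Here x would stand for the radiologist's latent decision variable and y for the computer's. Cases are then ranked by the single likelihood-ratio value instead of by either assessment alone, so a larger ratio indicates stronger combined evidence for the positive state.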

8. Software for ROC Analysis
Since the late 1970s, we have been developing computer software for maximum-likelihood estimation of ROC curves and for testing the statistical significance of differences between ROC estimates. Most of this software is available in versions for the Windows, Macintosh and UNIX/LINUX operating systems. Our software, which is provided without charge to all investigators who request it, is now widely accepted as the standard for ROC data analysis in medical imaging applications, and registered copies of the software are employed in more than 6000 laboratories in more than 40 countries. The software is available from the ROC Software Internet page. (119, 171, 210, 260, 575, 611, 612, 679)
