Validation and statistical power comparison of methods for analyzing free-response observer performance studies

Dev P Chakraborty

doi:10.1016/j.acra.2008.07.018

Acad Radiol. 2008 Dec;15(12):1554-66. doi: 10.1016/j.acra.2008.07.018.

Author

Dev P Chakraborty¹

Affiliation

¹ Department of Radiology, University of Pittsburgh, 3520 Forbes Ave, Suite 109, Pittsburgh, PA 15261, USA. dpc10@pitt.edu

Abstract

Rationale and objectives: The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator.

Materials and methods: A free-response data simulator is described that can model either a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms or two human observers interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated.
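To make the idea of a variance components model concrete, the sketch below shows how a case effect shared across two modalities induces intermodality correlation. It is a minimal, hypothetical Roe-Metz-style decomposition for a single reader (z = mu_i * truth + C + TC), not the free-response model from this paper; the function name `simulate_case_scores` and the parameters `var_case` and `var_mod_case` are illustrative assumptions.

```python
import numpy as np


def simulate_case_scores(truth, mu, var_case, var_mod_case, rng):
    """Draw one z-sample per case for each modality (single reader).

    Hypothetical Roe-Metz-style decomposition (illustrative, not the paper's model):
        z[i, k] = mu[i] * truth[k] + C[k] + TC[i, k]
    C[k] is a case effect shared by all modalities; TC[i, k] is a
    modality-by-case term (here it also absorbs pure error).
    """
    truth = np.asarray(truth, dtype=float)
    n_cases = truth.size
    # Shared case effect: the source of intermodality correlation.
    case_effect = rng.normal(0.0, np.sqrt(var_case), n_cases)
    scores = np.empty((len(mu), n_cases))
    for i, mu_i in enumerate(mu):
        # Independent modality-by-case term for each modality.
        mod_case = rng.normal(0.0, np.sqrt(var_mod_case), n_cases)
        scores[i] = mu_i * truth + case_effect + mod_case
    return scores


rng = np.random.default_rng(0)
normals = np.zeros(200_000)  # all-normal cases, so mu does not inflate the correlation
s = simulate_case_scores(normals, mu=(1.0, 1.5), var_case=0.6, var_mod_case=0.4, rng=rng)
r = np.corrcoef(s[0], s[1])[0, 1]
print(round(r, 2))  # theory: var_case / (var_case + var_mod_case) = 0.6
```

Under these assumptions the expected correlation between modalities for the same case is var_case / (var_case + var_mod_case); raising the modality-by-case variance weakens it.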

Results: All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA ≈ NP) > ROC. For the CAD algorithm, the ranking was (NP ≈ IDCA) > (JAFROC-1 ≈ JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power.
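The JAFROC versus JAFROC-1 difference can be illustrated with unweighted Wilcoxon-type figures of merit, assuming the commonly cited definitions: JAFROC compares each lesion rating with the highest-rated false positive on each normal case, while JAFROC-1 draws the highest-rated false positive from every case, normal and abnormal. Because JAFROC-1 also uses false positives on abnormal cases, one plausible reading is that its advantage grows when abnormal cases outnumber normal ones. The function names and toy ratings below are hypothetical; lesion weighting and the significance-testing step are omitted.

```python
import math
from typing import Sequence

UNMARKED = -math.inf  # rating for an unmarked lesion or a case with no marks


def psi(fp: float, ll: float) -> float:
    """Wilcoxon kernel: 1 if the lesion outranks the false positive, 0.5 on a tie."""
    if ll > fp:
        return 1.0
    if ll == fp:
        return 0.5
    return 0.0


def highest_nl(nl_ratings: Sequence[float]) -> float:
    """Highest-rated non-lesion (false-positive) mark on one case."""
    return max(nl_ratings, default=UNMARKED)


def wilcoxon_fom(fp_cases: Sequence[Sequence[float]], ll: Sequence[float]) -> float:
    """Average of psi over all (case-level false positive, lesion) pairs."""
    fps = [highest_nl(case) for case in fp_cases]
    return sum(psi(f, l) for f in fps for l in ll) / (len(fps) * len(ll))


def jafroc_fom(nl_normal, ll):
    # JAFROC: false positives taken from normal cases only.
    return wilcoxon_fom(nl_normal, ll)


def jafroc1_fom(nl_normal, nl_abnormal, ll):
    # JAFROC-1: false positives taken from all cases, normal and abnormal.
    return wilcoxon_fom(list(nl_normal) + list(nl_abnormal), ll)


nl_normal = [[], [2.0], [1.0, 3.0]]  # NL ratings on each normal case
nl_abnormal = [[1.5], []]            # NL ratings on each abnormal case
ll = [4.0, 2.5, UNMARKED]            # one rating per lesion; last lesion unmarked
print(jafroc_fom(nl_normal, ll))                # 5.5 / 9
print(jafroc1_fom(nl_normal, nl_abnormal, ll))  # 10 / 15
```

In this toy example the JAFROC figure of merit averages over 3 x 3 = 9 pairs, while JAFROC-1 averages over 5 x 3 = 15 pairs, so it uses more of the rated data.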

Conclusion: Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.

Publication types

Evaluation Study
Research Support, N.I.H., Extramural
Validation Study

MeSH terms

Data Interpretation, Statistical*
Humans
Image Interpretation, Computer-Assisted / methods*
Observer Variation*
Professional Competence*
ROC Curve*
Reproducibility of Results
Sensitivity and Specificity
Task Performance and Analysis*
