Validation and statistical power comparison of methods for analyzing free-response observer performance studies

Acad Radiol. 2008 Dec;15(12):1554-66. doi: 10.1016/j.acra.2008.07.018.

Abstract

Rationale and objectives: The aim of this work was to validate and compare the statistical powers of proposed methods for analyzing free-response data using a search-model-based simulator.

Materials and methods: A free-response data simulator is described that can model either a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms or two human observers interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver-operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers of ROC, jackknife alternative free-response operating characteristic (JAFROC), a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach were investigated.
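The abstract does not give the simulator's equations, so the following is only a minimal sketch of what a search-model-style free-response simulator with a shared case effect might look like. All names and parameter values (`mus`, `lam`, `nu`, `case_var`, `simulate_study`) are illustrative assumptions, not the authors' implementation: noise marks per case are drawn as Poisson with mean `lam`, each lesion is marked with probability `nu`, and a case-level random effect shared by both modalities induces the intermodality correlation.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (small lam only)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def rating(rng, mean, case_effect, case_var):
    """Rating = mean + shared case effect + independent noise (total variance 1)."""
    return mean + case_effect + math.sqrt(1.0 - case_var) * rng.gauss(0.0, 1.0)


def simulate_case(rng, n_lesions, mus, lam, nu, case_var=0.3):
    """Simulate one case read in each modality; mus holds one lesion mean per modality.

    Assumed (hypothetical) structure: the same case effect is reused in every
    modality, which correlates the ratings across modalities.
    """
    case_effect = rng.gauss(0.0, math.sqrt(case_var))
    marks = []
    for mu in mus:
        # Non-lesion localizations (NL): Poisson number of noise marks.
        nl = [rating(rng, 0.0, case_effect, case_var)
              for _ in range(poisson(rng, lam))]
        # Lesion localizations (LL): each lesion is found with probability nu.
        ll = [rating(rng, mu, case_effect, case_var)
              for _ in range(n_lesions) if rng.random() < nu]
        marks.append({"n_lesions": n_lesions, "nl": nl, "ll": ll})
    return marks


def simulate_study(seed, n_normal, n_abnormal, mus,
                   lam=1.0, nu=0.7, lesions_per_case=1):
    """Return a list of cases; each case is a list with one mark set per modality."""
    rng = random.Random(seed)
    lesion_counts = [0] * n_normal + [lesions_per_case] * n_abnormal
    return [simulate_case(rng, n, mus, lam, nu) for n in lesion_counts]
```

Setting both entries of `mus` to the same value would correspond to a null hypothesis condition, while unequal values would correspond to an alternative hypothesis condition under which power could be estimated by repeated simulation.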

Results: All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, for the human observer, the statistical power ranking of the methods was JAFROC-1 > JAFROC > (IDCA ≈ NP) > ROC. For the CAD algorithm, the ranking was (NP ≈ IDCA) > (JAFROC-1 ≈ JAFROC) > ROC. In either case, the statistical power of the highest ranked method exceeded that of the lowest ranked method by about a factor of two. Dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal cases than normal cases, JAFROC-1 power significantly exceeded JAFROC power.
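The abstract does not state the figure-of-merit formulas. As a rough illustration of how JAFROC and JAFROC-1 differ, the sketch below assumes the definitions commonly given in the JAFROC literature: both compare each lesion rating against the highest-rated non-lesion mark on a case, with JAFROC using normal cases only and JAFROC-1 using all cases. The function names and the dictionary layout (`n_lesions`, `nl`, `ll`) are hypothetical.

```python
import math

NEG_INF = -math.inf  # rating assigned to unmarked lesions and unmarked cases


def wilcoxon_kernel(false_positive, lesion):
    """1 if the lesion rating is higher, 0.5 for a tie, 0 otherwise."""
    if lesion > false_positive:
        return 1.0
    if lesion == false_positive:
        return 0.5
    return 0.0


def jafroc_fom(cases, use_abnormal_nl=False):
    """Assumed JAFROC-style figure of merit for one reader in one modality.

    use_abnormal_nl=False: highest NL rating on normal cases only (JAFROC).
    use_abnormal_nl=True:  highest NL rating on every case (JAFROC-1).
    """
    false_positives = [max(c["nl"], default=NEG_INF)
                       for c in cases
                       if use_abnormal_nl or c["n_lesions"] == 0]
    lesions = []
    for c in cases:
        lesions.extend(c["ll"])
        # Lesions that were never marked count as rated at minus infinity.
        lesions.extend([NEG_INF] * (c["n_lesions"] - len(c["ll"])))
    total = sum(wilcoxon_kernel(fp, les)
                for fp in false_positives for les in lesions)
    return total / (len(false_positives) * len(lesions))


cases = [
    {"n_lesions": 0, "nl": [1.0], "ll": []},     # normal case, one NL mark
    {"n_lesions": 0, "nl": [], "ll": []},        # normal case, no marks
    {"n_lesions": 2, "nl": [3.0], "ll": [2.0]},  # abnormal case, one lesion missed
]
jafroc = jafroc_fom(cases)                          # 0.625
jafroc1 = jafroc_fom(cases, use_abnormal_nl=True)   # 2.5 / 6
```

Under these assumed definitions, JAFROC-1 draws on non-lesion marks from abnormal cases as well, so it uses more comparisons when abnormal cases outnumber normal ones, which is consistent with the power advantage reported above.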

Conclusion: Based on this work, the recommendation is to use JAFROC-1 for human observers (including human observers with CAD assist) and the NP method for evaluating CAD algorithms.

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural
  • Validation Study

MeSH terms

  • Data Interpretation, Statistical*
  • Humans
  • Image Interpretation, Computer-Assisted / methods*
  • Observer Variation*
  • Professional Competence*
  • ROC Curve*
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Task Performance and Analysis*