Rationale and objectives: The aim of this work was to use a search-model-based simulator to validate proposed methods for analyzing free-response data and to compare their statistical powers.
Materials and methods: A free-response data simulator is described that can model either a single reader interpreting the same cases in two modalities, or two computer-aided detection (CAD) algorithms or two human observers interpreting the same cases in one modality. A variance components model, analogous to the Roe and Metz receiver operating characteristic (ROC) data simulator, is described; it models intracase and intermodality correlations in free-response studies. Two generic observers were simulated: a quasi-human observer and a quasi-CAD algorithm. Null hypothesis (NH) validity and statistical powers were investigated for ROC analysis, jackknife alternative free-response operating characteristic (JAFROC) analysis, a variant of JAFROC termed JAFROC-1, initial detection and candidate analysis (IDCA), and a nonparametric (NP) approach.
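To make the idea of a Roe and Metz style variance components model concrete, the sketch below simulates ROC ratings (not free-response marks) for one reader rating the same cases in two modalities. It is a minimal illustration under stated assumptions, not the simulator described in this work: the function names, the parameters `mu`, `tau`, `var_c`, `var_tc`, `var_e`, and the choice to apply the modality effect only to abnormal cases are all hypothetical. The case effect `C_k` is shared by both modalities, which is what induces intermodality correlation; the empirical ROC area is computed with the Wilcoxon statistic.

```python
import math
import random


def simulate_two_modality_ratings(n_normal, n_abnormal, mu=1.0, tau=(0.0, 0.5),
                                  var_c=0.3, var_tc=0.2, var_e=0.5, seed=None):
    """Hypothetical ROC-rating sketch: one reader, same cases, two modalities.

    rating[i][k] = truth_k * (mu + tau[i]) + C_k + (tauC)_ik + E_ik
    C_k is shared across modalities and induces intermodality correlation.
    Returns (normal, abnormal), each a list of two per-modality rating lists.
    """
    rng = random.Random(seed)
    truth = [0] * n_normal + [1] * n_abnormal
    # Case effect, drawn once per case and reused in both modalities.
    case_effect = [rng.gauss(0.0, math.sqrt(var_c)) for _ in truth]
    ratings = []
    for i in range(2):
        row = []
        for k, t in enumerate(truth):
            tc = rng.gauss(0.0, math.sqrt(var_tc))  # modality-by-case interaction
            e = rng.gauss(0.0, math.sqrt(var_e))    # pure within-case noise
            row.append(t * (mu + tau[i]) + case_effect[k] + tc + e)
        ratings.append(row)
    normal = [row[:n_normal] for row in ratings]
    abnormal = [row[n_normal:] for row in ratings]
    return normal, abnormal


def wilcoxon_auc(normal, abnormal):
    """Empirical ROC area: P(abnormal rating > normal rating), ties count 1/2."""
    score = 0.0
    for y in abnormal:
        for x in normal:
            score += 1.0 if y > x else (0.5 if y == x else 0.0)
    return score / (len(normal) * len(abnormal))


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    normal, abnormal = simulate_two_modality_ratings(500, 500, seed=1)
    print("AUC modality 1:", wilcoxon_auc(normal[0], abnormal[0]))
    print("AUC modality 2:", wilcoxon_auc(normal[1], abnormal[1]))
    # Expected near var_c / (var_c + var_tc + var_e) = 0.3 with the defaults.
    print("intermodality correlation (normal cases):", pearson(normal[0], normal[1]))
```

With the default variances summing to 1, the expected areas are roughly Φ(1/√2) ≈ 0.76 for the first modality and Φ(1.5/√2) ≈ 0.86 for the second, and the within-truth intermodality correlation is var_c / (var_c + var_tc + var_e).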
Results: All methods had valid NH behavior over a wide range of simulator parameters. For equal numbers of normal and abnormal cases, the statistical power ranking for the human observer was JAFROC-1 > JAFROC > (IDCA ≈ NP) > ROC; for the CAD algorithm it was (NP ≈ IDCA) > (JAFROC-1 ≈ JAFROC) > ROC. In either case, the highest ranked method had about twice the statistical power of the lowest ranked method. The dependence of statistical power on simulator parameters followed expected trends. For data sets with more abnormal than normal cases, JAFROC-1 power significantly exceeded JAFROC power.
Conclusion: Based on this work, JAFROC-1 is recommended for human observers (including human observers assisted by CAD), and the NP method is recommended for evaluating CAD algorithms.