Rationale and objectives: The aim of this study is to compare the ratings of a group of readers who used two different rating scales in a receiver operating characteristic (ROC) study and to clarify some remaining issues in selecting a rating scale for such studies.
Materials and methods: We reanalyzed a previously conducted ROC study in which readers used both a 5-point and a 101-point scale to identify abdominal masses in 95 cases. Summary statistics included the distribution of scores by reader for each rating scale, the proportion of scores tied on the 5-point scale that were correctly resolved on the 101-point scale, and the proportion of paired normal-abnormal cases in which the two rating scales resulted in a different selection of an abnormal case.
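The two pairwise summary statistics can be illustrated with a minimal sketch for a single reader. It rests on assumptions the abstract does not spell out: a "tie" is taken to mean a normal-abnormal pair with equal 5-point scores, a tie is "correctly resolved" when the abnormal case scores strictly higher on the 101-point scale, and the two scales "differ" whenever the ordering of the pair (abnormal higher, tied, or normal higher) is not the same on both. The function name `pairwise_summary` and the example scores are hypothetical, not the study's code or data.

```python
from itertools import product


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pairwise_summary(normal, abnormal):
    """Summarize all normal-abnormal pairs for one reader.

    normal, abnormal: lists of (score_5pt, score_101pt) tuples, one per case.
    The definitions below are assumptions made for illustration only.
    """
    total = ties = resolved_correctly = differing = 0
    for (n5, n101), (a5, a101) in product(normal, abnormal):
        total += 1
        order5 = sign(a5 - n5)        # +1 means abnormal rated higher (correct)
        order101 = sign(a101 - n101)
        if order5 == 0:               # tied on the 5-point scale
            ties += 1
            if order101 > 0:          # tie broken in favor of the abnormal case
                resolved_correctly += 1
        if order5 != order101:        # the two scales order the pair differently
            differing += 1
    return {
        "pairs": total,
        "ties_on_5pt": ties,
        "prop_ties_resolved_correctly": resolved_correctly / ties if ties else None,
        "prop_pairs_differing": differing / total if total else None,
    }


# Hypothetical scores for one reader: (5-point score, 101-point score).
normal_cases = [(1, 10), (2, 30)]
abnormal_cases = [(2, 45), (4, 80), (1, 5)]
summary = pairwise_summary(normal_cases, abnormal_cases)
print(summary)
```

Enumerating every normal-abnormal pair mirrors the pairwise interpretation of the area under the ROC curve; other definitions of a tie or of a differing selection would change the counts.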
Results: As a group, the readers used 84 of the 101 rating categories on the 101-point scale, but the categories used differed among individual readers. All readers tended to resolve the majority of ties on the 5-point scale in favor of correct decisions and to maintain correct decisions when the more refined scale was used.
Conclusions: The reanalysis presented here provides additional evidence that readers in an ROC study can adjust to a 101-point scale and that the use of such a refined scale can increase discriminative ability. However, the choice of an appropriate scale should also take into account the underlying abnormality in question and relevant clinical considerations.