Abstract
This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from published literature. Part two begins by explaining two methods for quantifying interpretive accuracy: inter-reader and intra-reader reliability. Agreement among readers can be expressed simply as a percentage. However, Cohen’s kappa is a more robust measure of agreement because it accounts for the agreement expected by chance. The higher the kappa score, the greater the agreement between readers. When three or more readers are being compared, Fleiss’ kappa is used. Significance testing determines whether the difference between two conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a P-value. Calculation of P-values is beyond the scope of this review. However, knowing how to interpret P-values is important for understanding scientific literature. Generally, a P-value less than 0.05 is considered significant and suggests that the results of the experiment are unlikely to be due to chance alone. Variance, standard deviation, confidence intervals, and standard error describe the dispersion of data around the mean of a sample drawn from a population. Standard deviation is commonly reported in the literature. A small standard deviation indicates that there is not much variation in the sample data. Many biologic measurements follow what is referred to as a normal distribution, which takes the shape of a bell curve. In a normal distribution, about 68% of the data fall within one standard deviation of the mean, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations.
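As an illustration of the difference between percent agreement and chance-corrected agreement, the following minimal Python sketch computes both for two readers. The function name `cohens_kappa` and the ten example reads are hypothetical and invented for illustration; they are not data from the article. The sketch assumes each reader assigns exactly one category per study.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Return (percent agreement, Cohen's kappa) for two readers.

    reader_a and reader_b are equal-length lists of category labels,
    one label per study read by both readers.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("Both readers must rate the same, non-empty set of studies")

    n = len(reader_a)

    # Observed agreement: fraction of studies given the same label by both readers.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

    # Expected chance agreement: for each category, multiply the two readers'
    # marginal proportions, then sum over categories.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both readers used a single identical category; kappa is undefined.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical reads of ten studies (illustrative values only).
reader_a = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
reader_b = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos"]

agreement, kappa = cohens_kappa(reader_a, reader_b)
print(f"Percent agreement: {agreement:.0%}")
print(f"Cohen's kappa: {kappa:.2f}")
```

In this invented example the readers agree on 8 of 10 studies (80%), but because roughly half of that agreement would be expected by chance, the kappa is noticeably lower, at about 0.58.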
Confidence intervals define the range of values within which the population parameter is likely to lie and give an idea of the precision of the statistic being measured. A wide confidence interval indicates that if the experiment were repeated multiple times in other samples, the measured statistic would lie within a wide range of possibilities. Confidence intervals rely on the calculation of another metric called the standard error.
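The relationship among the standard deviation, the standard error, and a confidence interval for a mean can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `mean_confidence_interval` and the sample values are hypothetical, and the interval uses the normal approximation (a multiplier of 1.96 for 95%). For small samples, a t-distribution multiplier would give a somewhat wider interval.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return the mean, sample SD, standard error, and an approximate CI.

    z=1.96 corresponds to a 95% interval under the normal approximation
    (an assumption of this sketch, not a method taken from the article).
    """
    n = len(values)
    if n < 2:
        raise ValueError("At least two measurements are needed")

    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, sd, se, (mean - z * se, mean + z * se)


# Hypothetical measurements (illustrative values only).
sample = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0, 5.0]

mean, sd, se, (low, high) = mean_confidence_interval(sample)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"approximate 95% CI: {low:.2f} to {high:.2f}")
```

Because the standard error divides the standard deviation by the square root of the sample size, a larger sample with the same spread yields a narrower confidence interval.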