Abstract
Does subtraction scintigraphy improve the diagnostic utility of scintigraphic evaluation in acute lower gastrointestinal hemorrhage? Methods: This research was a retrospective clinical study using a repeated-measures design of randomized control and experimental groups. A single patient dataset provided both the control group (conventional scintigraphy) and the experimental group (conventional and subtraction techniques). Forty-nine raw 99mTc-red blood cell studies were randomized and interpreted by 4 independent physicians as conventional scintigraphy data only (round 1). The conventional scintigraphy studies were then combined with subtraction images and randomized for reinterpretation (round 2). Results: Although the mean time to bleed detection decreased, the difference between interpretive rounds 1 and 2 was not statistically significant (P = 0.524). The addition of subtraction scintigraphy to the interpretation process changed the outcome from “probably present” to “absent” for 14% of patients and from “equivocal” to “absent” for another 12%, and this change had a marked effect on the false-positive rate. The false-positive rate decreased from 9.6% in round 1 to 3.6% in round 2. Receiver operator characteristic analysis showed that combining conventional scintigraphy with subtraction scintigraphy improved test performance. Conclusion: False-positive studies can be reduced by using subtraction scintigraphy in conjunction with conventional scintigraphy in the interpretive process.
- bowel hemorrhage
- gastrointestinal tract bleeding
- subtraction scintigraphy
- localization
- lower gastrointestinal hemorrhage
Subtraction techniques have been used in nuclear medicine to improve image contrast in 99mTc-red blood cell scintigraphic evaluation of acute lower gastrointestinal hemorrhage (1,2). We recently performed phantom experiments to compare the performance of a variety of subtraction techniques in lower gastrointestinal hemorrhage (3). Under ideal conditions in a phantom, reference subtraction scintigraphy, in which the first frame acquired is used to represent a background image and is subsequently subtracted from each individual image (3), was superior to other subtraction techniques, comparable to conventional scintigraphy in uniform background areas, and superior to conventional scintigraphy when a bleed was superimposed on a small vascular structure. Alternate sequential subtraction performed better than sequential subtraction, as reported previously (4–6). Although the phantom experimentation indicated a role for both reference subtraction and alternate sequential subtraction as an adjunct to conventional scintigraphy in the evaluation of acute lower gastrointestinal hemorrhage, an evaluation of the diagnostic efficacy of these subtraction techniques in a clinical population was warranted.
The research question to be answered was, Does reference subtraction scintigraphy or alternate sequential subtraction scintigraphy provide earlier detection, more accurate localization, greater diagnostic utility, and greater clinician confidence than conventional scintigraphic imaging alone in acute lower gastrointestinal hemorrhage?
MATERIALS AND METHODS
Study Design
This investigation was a retrospective clinical study using a repeated-measures design of randomized control and experimental groups. This design allowed the response variables to be evaluated by manipulating the explanatory variables within each patient dataset; thus, a single patient dataset provided both the control group (conventional processing) and the experimental group (conventional, reference subtraction, and alternate sequential subtraction processing). The major disadvantage of this design lies in observer interpretation, because the order in which studies are read may confound the results. Consequently, all datasets were randomized before observer interpretation.
This investigation was approved by the Ethics in Human Research Committee of Charles Sturt University.
Data Acquisition
A total of 46 patients were included in the sample, 4 of whom had undergone 2 separate lower gastrointestinal hemorrhage scintigraphy procedures, for a total of 50 studies. One patient study was excluded because of data corruption, leaving 49 cases. All data were acquired on a Prism γ-camera (Philips) using an Odyssey computer (Philips). Acquisition parameters for all datasets included a 128 × 128 matrix and a 60-s-per-frame continuous dynamic acquisition. Although the standard protocol was to acquire data until 60 min after intravenous administration of the 99mTc-red blood cells, an early detection of a bleed site justified earlier termination of data acquisition for some patients. Early termination of the acquisition (before 60 min) occurred in 55.1% (27/49) of studies. The mean total acquisition time was 49.1 min (95% CI, 45.5–52.6 min), with a range of 26–60 min. A rapid 3-s angiographic-phase dynamic acquisition preceded the 60-s dynamic acquisition in 87.8% (43/49) of studies. Studies were generally performed using an in vitro 99mTc-red blood cell label prepared through a commercially available kit; however, the retrospective and masked data without accompanying history and reports made confirmation of the labeling procedure for individual patients impossible. The modified in vitro method (“in vivtro”) would be the alternative method of choice.
Data Processing
The 49 raw patient datasets were displayed conventionally without using subtraction (conventional scintigraphy), using reference subtraction, and using alternate sequential subtraction. Subtraction techniques have previously been described in phantom analysis (3). Studies were randomized and interpreted as conventional scintigraphy data only in the first instance by 4 independent physicians. The conventional, reference subtraction, and alternate sequential subtraction studies were subsequently combined and randomized for reinterpretation. Although studies were randomized within each of the 2 pools of data outlined above, conventional scintigraphy studies were reported before the combined conventional scintigraphy and subtraction data to remove possible bias. The nature of the study and the interpretation process might have allowed remembered subtraction information to aid in the interpretation of conventional scintigraphy studies if the order had been reversed and if unique interpretation dilemmas had triggered that memory.
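The subtraction displays can be illustrated with a minimal sketch. Reference subtraction follows the description in the introduction: the first frame acquired serves as the background image and is subtracted from each later frame. The text does not spell out sequential or alternate sequential subtraction, so the lag-based versions below (lag 1 and lag 2, respectively) are assumptions, as are the function names, the list-of-rows frame representation, and the clipping of negative differences to zero.

```python
from typing import List

Frame = List[List[float]]  # one dynamic frame as rows of pixel counts


def subtract(later: Frame, earlier: Frame) -> Frame:
    """Pixel-wise later - earlier; negative results are clipped to zero (assumption)."""
    return [
        [max(a - b, 0.0) for a, b in zip(row_later, row_earlier)]
        for row_later, row_earlier in zip(later, earlier)
    ]


def reference_subtraction(frames: List[Frame]) -> List[Frame]:
    """Subtract the first frame (background reference) from every later frame."""
    reference = frames[0]
    return [subtract(frame, reference) for frame in frames[1:]]


def lagged_subtraction(frames: List[Frame], lag: int) -> List[Frame]:
    """Subtract the frame `lag` positions earlier from each frame.

    Assumed mapping: lag=1 for sequential subtraction, lag=2 for
    alternate sequential subtraction.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [subtract(frames[i], frames[i - lag]) for i in range(lag, len(frames))]


# Hypothetical 2 x 2 frames with activity accumulating in a single pixel.
frames = [
    [[10, 10], [10, 10]],
    [[10, 12], [10, 10]],
    [[10, 15], [9, 10]],
    [[10, 19], [10, 10]],
]
reference_images = reference_subtraction(frames)
alternate_images = lagged_subtraction(frames, lag=2)
```

In this toy series the accumulating pixel stands out against a zeroed background in both outputs. Because every reference image depends on the first frame, any change after that frame (such as motion) carries through the whole reference series, whereas a lagged difference only spans the chosen lag.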
For each set of initial conventional-scintigraphy patient files, the interpreting physicians reported the presence or absence of a bleed, the frame number on which the bleed was first identified, the site of the bleed, the confidence with which the bleed was detected, and the confidence with which the location was determined. The presence or absence of a bleed was rated using the following 5-point scale to facilitate receiver operator characteristic (ROC) analysis: “definitely present,” “probably present,” “equivocal,” “probably absent,” or “definitely absent.” The bleed site was localized using the following 10 segments: upper gastrointestinal tract, small bowel, cecum, ascending colon, hepatic flexure, transverse colon, splenic flexure, descending colon, sigmoid colon, and rectum. The interpreters were asked to rate, on an open scale from 0% to 100%, their confidence that the localization was sufficiently accurate to guide intervention. For the combined studies, in addition to the data reported for the conventional-scintigraphy studies alone, the interpreting physicians rated the relative contribution of each processing method (conventional, reference subtraction, and alternate sequential subtraction) to the detection, the localization, and the interpretation confidence. The percentage contributions of the 3 processing methods had to total 100%.
Statistical Analysis
Statistical significance was calculated using χ2 analysis for nominal data and the Student t test for continuous data. The Pearson χ2 test was used for categoric data with a normal distribution, and the G2 likelihood ratio χ2 test was used for categoric data without a normal distribution. F test ANOVAs were used to determine statistically significant differences within grouped data. A P value of less than 0.05 was considered significant.
Differences between independent means and proportions were calculated with a 95% confidence interval (CI). ROC analysis was performed using JROCFIT software, version 1.0.2, which was developed by Dr. John Eng at Johns Hopkins University. Interobserver correlation was evaluated with χ2 analysis, and interobserver reliability was measured using Cohen's κ-coefficient. The matched-pairs t test was used to assess agreement between pairs.
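As an illustration of the reliability measure named above, the following is a minimal sketch of Cohen's κ for a single pair of observers. The function name and the example ratings are hypothetical, and how κ was applied across the 4 observers (for example, pairwise) is not stated in the text.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who rated the same studies."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    # Observed agreement: fraction of studies given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings for 4 studies (not study data).
observer_x = ["present", "present", "absent", "absent"]
observer_y = ["present", "absent", "absent", "absent"]
kappa = cohen_kappa(observer_x, observer_y)
```

Here the observers agree on 3 of 4 studies (75%), chance agreement from the marginals is 50%, and κ is therefore 0.5.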
RESULTS
The mean age of the study population was 68.9 y (95% CI, 64.0–73.9 y), with a range of 18.8–92.8 y and a median of 71.8 y. Women represented 61.2% (30/49) of the study population, and men 38.8% (19/49), although no statistically significant variation in sex distributions was noted (P = 0.11). No statistically significant difference in mean age was noted between the sexes (P = 0.50).
Just 7 studies (14.3%) were positive for gastrointestinal tract bleeding as determined by the post hoc consensus of an expert panel. The retrospective and masked nature of the data collection prevented use of other gold standards. The remaining 42 studies showed no active bleeding at detectable rates during data acquisition. The mean age of patients with positive studies was 79.2 y, with a range of 69.1–84.4 y, which was significantly higher than the mean age of the study population (P = 0.024). Men represented 42.9% of positive studies, and women represented 57.1% of positive studies; these values were not significantly different from the sex distribution of patients in the study population (P = 0.827).
Interobserver Agreement
A statistically significant variation in the confidence of diagnosis (definitely present, probably present, equivocal, probably absent, or definitely absent) was noted between observers in interpretive rounds 1 and 2 (each, P < 0.001) (Table 1). A statistically significant variation in diagnostic outcome (true-positive, false-positive, false-negative, or true-negative) was noted between observers (P = 0.016) (Table 2).
Detection Confidence
Twenty-two (11.2%) of 196 studies were reported as “definitely present” for lower gastrointestinal hemorrhage in interpretive round 1. A further 8.2% (16/196) were reported as “probably present,” 3.1% (6/196) as “equivocal,” 27.0% (53/196) as “probably absent,” and 50.5% (99/196) as “definitely absent.” Only 14.3% of studies (28/196) were actually positive for an active bleed. Detection confidence improved during interpretive round 2, with a decrease in false-positive studies, the elimination of equivocal studies, and an increase in negative studies reported as “definitely absent” to 59.7% (117/196). A further 10.7% (21/196) were reported as “definitely present,” 5.1% (10/196) as “probably present,” and 24.5% (48/196) as “probably absent.” A statistically significant variation was noted between interpretive rounds 1 and 2 (P < 0.001) (Table 3).
Detection Time
Although there was a decrease in the mean time to bleed detection between interpretive rounds 1 (6.0 min, with a 95% CI of 4.4–7.6 min) and 2 (5.5 min, with a 95% CI of 3.7–7.3 min), no statistically significant difference was noted between the means (P = 0.524). This finding was supported by the overlap in 95% CIs and the matched-pairs t test (P = 0.190).
Localization Confidence
Although there was an increase in mean localization confidence between interpretive rounds 1 (72.1%, with a 95% CI of 64.1%–80.1%) and 2 (75.2%, with a 95% CI of 66.7%–83.7%), the difference was not statistically significant (P = 0.440). This finding was supported by the overlap in 95% CIs. The matched-pairs t test, however, indicated a statistically significant difference between matched pairs (P = 0.029). This discordance is possibly explained by the removal of 10 false-positive findings from the analysis (i.e., no matched pair in round 2). As a result, mean observer confidence increased to 74.1% and 79.5% for interpretive rounds 1 and 2, respectively, resulting in a mean improvement in confidence of 5.3% in round 2. The significance of this observation is further supported by the 95% CI of the mean difference (0.6%–10.1%), not including zero.
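The difference between the two tests can be made concrete with a minimal sketch. The independent-samples statistic compares the round means against the spread across readings, whereas the matched-pairs statistic uses only within-reading differences after dropping readings that have no partner in round 2. All names and confidence values below are hypothetical and are not study data.

```python
import math
from statistics import mean, stdev, variance
from typing import List, Optional, Sequence, Tuple


def matched_pairs(
    round1: Sequence[Optional[float]], round2: Sequence[Optional[float]]
) -> Tuple[List[float], List[float]]:
    """Keep only readings that have a value in both rounds."""
    kept = [(a, b) for a, b in zip(round1, round2) if a is not None and b is not None]
    return [a for a, _ in kept], [b for _, b in kept]


def independent_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Student t statistic for two independent samples (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def paired_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Matched-pairs t statistic computed on within-reading differences."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical localization confidence (%) per reading; None marks a reading
# with no localization in round 2 (e.g., a false-positive that was withdrawn).
round1 = [50, 60, 70, 80, 90, 65]
round2 = [54, 66, 75, 84, 96, None]

x, y = matched_pairs(round1, round2)
t_independent = independent_t(x, y)
t_paired = paired_t(x, y)
```

With these values the mean gain is 5 percentage points in both calculations, but the paired statistic is far larger because the gain is consistent within readings while the spread between readings is wide.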
Contribution to Detection
Statistically significant differences were noted in the degree to which each of the 3 processing methods contributed to bleed detection (all, P < 0.001) (Table 4). The mean contribution of reference subtraction to bleed detection was significantly lower for a detection confidence of “definitely absent” (1.5%) than for “definitely present” (9.5%; P < 0.001) or “probably present” (8.3%; P = 0.011). Conversely, the mean contribution of alternate sequential subtraction was significantly higher for a detection confidence of “definitely absent” (14.3%) than for “definitely present” (4.0%; P < 0.012) or “probably present” (2.8%; P = 0.045).
The mean contribution of reference subtraction to bleed detection was significantly higher for true-positive studies (11.1%) than for false-positive (2.9%; P = 0.010) or true-negative (1.7%; P < 0.001) studies. The degree to which reference subtraction contributed to bleed detection differed significantly between true-positive studies (11.1%) and false-negative studies (0%; P = 0.042). The mean contribution of alternate sequential subtraction to bleed detection was significantly higher for true-negative studies (14.5%) than for false-positive (1.4%; P = 0.042) or true-positive (4.6%; P = 0.010) studies.
Contribution to Localization
Statistically significant differences were noted in the degree to which each of the 3 processing methods contributed to bleed localization (Table 5). No statistically significant relationships were found in the degree to which any of the 3 methods contributed to bleed localization with respect to bleed confidence or diagnostic outcome.
Contribution to Interpretive Confidence
Statistically significant differences were noted in the degree to which each of the 3 processing methods contributed to interpretive confidence (all, P < 0.001) (Table 6). The mean contribution of conventional scintigraphy to interpretive confidence was significantly lower for a detection confidence of “probably absent” (75.2%) than for “definitely present” (87.9%; P = 0.015) or “definitely absent” (84.8%; P = 0.015). The mean contribution of reference subtraction was significantly lower for a detection confidence of “definitely absent” (2.6%) than for “definitely present” (9.8%; P < 0.001), “probably present” (10.0%; P < 0.001), or “probably absent” (9.0%; P < 0.001).
Conversely, the mean contribution of alternate sequential subtraction to interpretive confidence was significantly higher for detection confidences of “definitely absent” and “probably absent,” accounting for deficiencies highlighted in reference subtraction and conventional scintigraphy, respectively. The contribution was significantly higher for “definitely absent” (12.7%) than for “definitely present” (2.4%; P = 0.005) or “probably present” (3.0%; P = 0.050), and the contribution was significantly higher for “probably absent” (15.8%) than for “definitely present” (2.4%; P = 0.003) or “probably present” (3.0%; P = 0.026).
The mean contribution of reference subtraction to interpretive confidence was significantly higher for the diagnostic outcome of true-positive studies (12.2%) than for true-negative (3.9%; P < 0.001), false-positive (1.4%; P = 0.002), or false-negative (0.0%; P = 0.039) studies. The mean contribution of alternate sequential subtraction was significantly higher for true-negative studies (13.5%) than for true-positive (3.2%; P = 0.002) or false-positive (1.4%; P = 0.043) studies.
ROC Analysis
Combining conventional scintigraphy, reference subtraction scintigraphy, and alternate sequential subtraction scintigraphy increased the area under the ROC curve from 0.933 to 0.936, only a subtle difference in favor of the combination. The accuracy of conventional scintigraphy alone was 89.8%, whereas the accuracy of the combined methods was 95.4% (P = 0.017). ROC analysis was also performed for scan evidence for each individual display method (Table 7). Although reference subtraction demonstrated fewer false-positive studies than conventional scintigraphy, each false-positive in reference subtraction corresponded to greater certainty in observer confidence, which reduced the area under the ROC curve.
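The effect of confident false-positives on the area under the curve can be shown with a minimal sketch. It uses the empirical (trapezoidal) ROC area computed from counts on the 5-point scale, which is a simpler stand-in for the fitted curve produced by JROCFIT and will not reproduce that program's values. The function name and all counts are hypothetical.

```python
from typing import Sequence

# Rating categories ordered from most to least suspicious for a bleed.
SCALE = (
    "definitely present",
    "probably present",
    "equivocal",
    "probably absent",
    "definitely absent",
)


def empirical_auc(pos_counts: Sequence[int], neg_counts: Sequence[int]) -> float:
    """Trapezoidal area under the empirical ROC curve.

    pos_counts[i] and neg_counts[i] are the numbers of truly bleeding and
    truly nonbleeding studies that received rating SCALE[i].
    """
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative study")
    area = 0.0
    tpr = fpr = 0.0
    # Lower the decision threshold one category at a time.
    for p, n in zip(pos_counts, neg_counts):
        new_tpr = tpr + p / n_pos
        new_fpr = fpr + n / n_neg
        area += (new_fpr - fpr) * (tpr + new_tpr) / 2.0
        tpr, fpr = new_tpr, new_fpr
    return area


# Hypothetical counts: the same 10 positives and 40 negatives in both displays.
positives = [8, 2, 0, 0, 0]
# Display A: 4 false-positives, all rated only "probably present".
negatives_a = [0, 4, 0, 6, 30]
# Display B: 2 false-positives, both rated "definitely present".
negatives_b = [2, 0, 0, 6, 32]

auc_a = empirical_auc(positives, negatives_a)
auc_b = empirical_auc(positives, negatives_b)
```

In this example display B has half as many false-positives as display A, yet its area is lower (about 0.97 versus 0.99) because its false-positives sit at the most confident rating and so are counted against it at every threshold.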
DISCUSSION
The variable interobserver agreement found in this investigation was not thought to be a threat to its statistical power. In fact, the implications strengthen the external validity of the results. The lack of interobserver agreement reflects the lack of consistency in the interpretive confidence and capability of physicians, a product of both their variable expertise and the variable frequency with which they perform the procedure. This assertion is further supported by the Australian Industry Survey, which indicated that fewer than 40% of departments use procedures consistent with the minimum recommendations of the professional body (7). Furthermore, interpreting the variations in interobserver agreement offers little insight in the absence of an industry-recognized gold standard. What these results do offer is insight into the difficulties of evaluating scintigraphic studies of patients with acute hemorrhaging of the lower gastrointestinal tract. The results thus validate efforts to improve interpretive power and confidence.
The interobserver differences extend to the relative degree to which the 3 processing methods contributed to detection, localization, and interpretive confidence. At one end of the spectrum, observer 1 attributed 100% of judgment and decision making to conventional scintigraphy. Further investigation revealed that the ratings reflected a combination of the extensive expertise of this observer in lower gastrointestinal hemorrhage scintigraphy and the feeling of the observer that subtraction data were more confusing than helpful. At the other end of the spectrum, observer 3 reported that reference subtraction contributed greatly to bleed localization and that alternate sequential subtraction contributed greatly to both bleed detection and interpretive confidence. This finding was reflected in the elimination of all false-positive results for this observer in interpretive round 2.
Subtraction scintigraphy interpreted in conjunction with conventional scintigraphy significantly affected diagnostic outcomes and detection confidence. Specifically, the addition of subtraction scintigraphy to the interpretation process changed the outcome from “probably present” to “absent” for 14% of patients and from “equivocal” to “absent” for another 12%, markedly affecting the false-positive rate (Fig. 1). This change is important not only because false-positive findings, and thus unnecessary angiograms, were reduced from 9.6% to 3.6% but also because it may allow more accurate guidance of surgical intervention.
Although the mean time to bleed detection decreased, the difference between interpretive rounds 1 and 2 was not statistically significant (P = 0.524). The lack of improved time to detection in interpretive round 2 might represent a limitation of the subtraction techniques used. The mean time to detection was low, indicating that studies in this population tended to be positive early. Consequently, the first subtraction frame may represent an interval equal to (frame 3 or later) or later than (frames 1 and 2) the conventional dataset. We were, however, somewhat surprised to observe no statistically significant difference between the time to detection for true-positive and false-positive studies. This finding perhaps highlights the difficulties in imaging the abdomen with 99mTc-red blood cells; the causes of false-positive findings are prominent and convincing early on.
Unfortunately, there were instances in which negative studies were confounded by the reference subtraction findings. Four of 16 false-positives were determined by observers to have supporting evidence of a bleed on reference subtraction scintigraphy, although in all cases, alternate sequential subtraction supported normality. A further 4 of 25 negative studies had reference subtraction evidence of a bleed in the absence of such evidence on either conventional or alternate sequential subtraction scintigraphy. Thus, reference subtraction introduced the potential for false-positive findings in 19.0% (8/42) of negative studies. This finding is not entirely surprising, given the limitation of reference subtraction with respect to the wide window of opportunity for physical (e.g., motion), radiopharmaceutical (e.g., label breakdown), or physiologic (e.g., excretion) changes between the reference frame and later acquisition frames. More significantly, alternate sequential subtraction never suggested that a study might be falsely positive (0/43).
For true-positive findings, subtraction scintigraphy offered little in terms of improving detection (Fig. 2). Six true-positive studies were identified as positive by all 4 observers in both interpretive round 1 and interpretive round 2. Although it did not improve detection, subtraction may have improved observer confidence. Perhaps the most important contribution of subtraction scintigraphy to bleed detection was in excluding an accumulation of radiotracer as a bleeding source. Of the false-positive studies in round 1 that became true-negative studies in round 2, all 10 had no evidence of bleeding on alternate sequential subtraction, whereas 6 had no evidence of bleeding on reference subtraction. More important, 100% of false-positive findings in interpretive round 2 had no evidence of bleeding on either alternate sequential subtraction or reference subtraction. That is, greater reliance on the findings of subtraction scintigraphy, in particular alternate sequential subtraction, would have eliminated all false-positives and false-negatives (100% sensitivity and 100% specificity).
Reference subtraction demonstrated a significantly lower mean contribution to bleed detection for a detection confidence of “definitely absent” than for “definitely present” (P < 0.001) or “probably present” (P = 0.011). Alternate sequential subtraction, on the other hand, demonstrated a significantly higher mean contribution to bleed detection for a detection confidence of “definitely absent” than for “definitely present” (P < 0.012) or “probably present” (P = 0.045). These results suggest that reference subtraction plays an important role in identifying positive bleeding sites, whereas alternate sequential subtraction plays an important role in confirming the normality of findings.
Reference subtraction also demonstrated a significantly higher mean contribution to bleed detection for true-positive studies than for false-positive (P = 0.010) or true-negative (P < 0.001) studies. Alternate sequential subtraction demonstrated a significantly higher mean contribution to bleed detection for true-negative studies than for false-positive (P = 0.042) or true-positive (P = 0.010) studies. These results further support the proposition that reference subtraction plays a more important role in identifying positive bleeding sites, whereas alternate sequential subtraction plays a more important role in confirming the normality of findings. These relationships are noted despite the inclusion of observer 1 data, in which all contributions of reference subtraction and alternate sequential subtraction were recorded as 0%.
Reference subtraction demonstrated a significantly higher mean contribution to interpretive confidence for the diagnostic outcome of true-positive studies than for the diagnostic outcome of true-negative (P < 0.001), false-positive (P = 0.002), or false-negative (P = 0.039) studies. Alternate sequential subtraction demonstrated a significantly higher mean contribution to interpretive confidence for true-negative studies than for true-positive (P = 0.002) or false-positive (P = 0.043) studies. These results further support the proposition that reference subtraction plays an important role in identifying positive bleeding sites, whereas alternate sequential subtraction plays a more important role in confirming the normality of findings. Again, these observations are noted despite the fact that observer 1 uniformly attributed 0% to reference subtraction and alternate sequential subtraction.
CONCLUSION
The specificity of 99mTc-red blood cell scintigraphy has been reported to be poor because of a high number of false-positive studies (8). False-positive studies can be reduced by using alternate sequential subtraction in conjunction with conventional scintigraphy in the interpretive process. Specificity can be improved by limiting the use of reference subtraction to bleed localization. Sensitivity might be improved by putting more emphasis on the concordance of positive findings between alternate sequential subtraction and reference subtraction. 99mTc-red blood cell scintigraphy should be performed to provide a protracted period of imaging. Both reference subtraction and alternate sequential subtraction can be used to improve the target-to-background ratio and improve diagnostic efficacy. Alternate sequential subtraction provides a superior interpretive tool in the clinical environment.
Acknowledgments
This clinical study was supported by a Charles Sturt University Small Grant. We thank Mitchell Holmes, research assistant, for help with data analysis.
Footnotes
- Received for publication October 9, 2006.
- Accepted for publication January 23, 2007.
COPYRIGHT © 2007 by the Society of Nuclear Medicine, Inc.