Abstract
Standardized uptake values (SUVs) have been widely used in the diagnosis of malignant tumors and in clinical trials of tumor therapies as semiquantitative metrics of tumor 18F-FDG uptake. However, SUVs for small lesions are liable to errors due to partial-volume effect and statistical noise. The purpose of this study was to evaluate the reproducibility and accuracy of maximum and peak SUV (SUVmax and SUVpeak, respectively) of small lesions in phantom experiments. Methods: We used a body phantom with 6 spheres in a quarter warm background. The PET data were acquired for 1,800 s in list-mode, from which data were extracted to generate 15 PET images for each of the 60-, 90-, 120-, 150-, and 180-s scanning times. The SUVmax and SUVpeak of the hot spheres in the 1,800-s scan were used as a reference (SUVref,max and SUVref,peak). Coefficients of variation for both SUVmax and SUVpeak in hot spheres (CVmax and CVpeak) were calculated to evaluate the variability of the SUVs. On the other hand, percentage differences between SUVmax and SUVref,max and between SUVpeak and SUVref,peak were calculated for evaluation of the accuracy of SUV. We additionally examined the coefficients of variation of background activity and the percentage background variability as parameters for the physical assessment of image quality. Results: Visibility of a 10-mm-diameter hot sphere was considerably different among scan frames. The CVmax and CVpeak increased as the sphere size became smaller and as the acquisition time became shorter. SUVmax was generally overestimated as the scan time shortened and the sphere size increased. The SUVmax and SUVpeak of a 37-mm-diameter sphere for 60-s scans had average positive biases of 28.3% and 4.4%, compared with the reference. Conclusion: SUVmax was variable and overestimated as the scan time decreased and the sphere size increased. In contrast, SUVpeak was a more robust and accurate metric than SUVmax. 
The measurements of SUVpeak (or SUVpeak normalized to lean body mass) in addition to SUVmax are desirable for reproducible and accurate quantification in clinical situations.
Whole-body oncologic 18F-FDG PET is useful for detection and staging of various malignant tumors, monitoring of responses to therapy, and prognostic stratification (1–4). Standardized uptake values (SUVs) have been widely used for diagnosing malignant tumors and for clinical trials of tumor therapies as semiquantitative metrics of tumor 18F-FDG uptake (1,5,6). Recently, SUV has been used to monitor metabolic response to therapy (1,6). Although SUV makes tumor metabolic changes easy to derive (5), accurate and reproducible quantification is crucial to the clinical evaluation of sequential PET/CT imaging.
Positron emission itself is characterized statistically by a Poisson distribution (7). Even if the same object is scanned, the same image cannot be obtained because of the statistical fluctuation (8). It is important to reduce the variability and to improve the accuracy of SUV by sufficient scan duration (8). Under the current guideline (9,10) on phantom experiments, the PET image quality is generally evaluated by the percentage contrast and the percentage background variability (N10 mm). Various organizations including the Society of Nuclear Medicine and Molecular Imaging/Clinical Trials Network (11), the European Association of Nuclear Medicine/EANM Research Ltd. (12), and the American College of Radiology/ACR Imaging Network (13) evaluated the accuracy of SUV with the aim of the harmonization of quantitative values (14). Although the percentage contrast, N10 mm, and the accuracy of SUV are important metrics of good image quality, the reproducibility of small lesion uptake is also essential. In general, maximum SUV (SUVmax) is considered to be overestimated in low-count-statistics images because the highest pixel value tends to present high values due to noise (15). A previous clinical study by Lodge et al. (16) indicated that SUVmax was overestimated in low-count-statistics images. Boellaard et al. (15) also reported that SUVmax showed positive bias for images with higher noise. However, the relationship among scan time, image noise, lesion size, and variability of SUVmax and peak SUV (SUVpeak) has not yet been investigated. The purpose of this study was to evaluate the reproducibility and accuracy of SUVmax and SUVpeak of small lesions using a phantom.
MATERIALS AND METHODS
Imaging Protocols
In this study, we used a Discovery-690 PET/CT scanner (GE Healthcare) and National Electrical Manufacturers Association (NEMA)/International Electrotechnical Commission body phantom (Data Spectrum Corp.). The PET scanner comprises a total of 13,824 lutetium yttrium-orthosilicate crystals with dimensions of 4.2 × 6.3 × 25 mm, covering an axial field of view of 15.7 cm and a transaxial field of view of 70 cm in diameter. The coincidence time window was 4.9 ns. The time-of-flight time resolution was 544.3 ps. The NEMA body phantom with a lung insert and 6 spheres of 37, 28, 22, 17, 13, and 10-mm inner diameter had a background activity of 2.65 kBq/mL at 15 min from scan start time. The activity level simulated an injection dose of 3.7 MBq/kg (9,10). The sphere-to-background ratio was 4:1. The whole inner volume of the phantom was 9,780 mL. The PET data were acquired for 30 min in list-mode and reconstructed using the ordered-subsets expectation maximization plus time-of-flight algorithm with 3 iterations and 8 subsets. Fifteen PET images were reconstructed for each scan time of 60, 90, 120, 150, and 180 s from the 30-min list-mode data (Fig. 1). The image matrix was 192 × 192 with a 3.12-mm pixel size. The display field of view was 60.0 cm. The PET image slice thickness was 3.27 mm. A gaussian filter of 4 mm in full width at half maximum was used as a postsmoothing filter. CT scanning was performed using the following parameters: 120 kV, 40 mA, 0.5-s tube rotation, and 5-mm slice collimation. The CT data were used for the attenuation correction.
Image Analysis
We analyzed the SUVmax and SUVpeak of all spheres using the PET Volume Computer Assisted Reading software (GE Healthcare). The SUVpeak was the average SUV within a spheric volume of interest (VOI) (12-mm diameter) positioned so as to maximize the enclosed average activity. The SUVpeak of the 10-mm-diameter sphere (SUVpeak,10) was not evaluated because the predetermined VOI with a 12-mm diameter was larger than the 10-mm-diameter sphere. Coefficients of variation (CVs) of the SUV of each sphere were calculated for evaluation of variability of the SUV as follows:

$$\mathrm{CV}_{\max,j}\ (\%) = \frac{\mathrm{SD}_{\max,j}}{\mathrm{Mean}_{\max,j}} \times 100, \qquad \mathrm{CV}_{\mathrm{peak},j}\ (\%) = \frac{\mathrm{SD}_{\mathrm{peak},j}}{\mathrm{Mean}_{\mathrm{peak},j}} \times 100$$

where j is the diameter of the sphere, SD is the standard deviation, and Mean is the mean of the SUV over the repeated images.
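The SUVmax and SUVpeak definitions above can be illustrated with a minimal Python sketch. This is not the algorithm of the vendor software used in the study: the function names are hypothetical, the VOI is centered on voxel centers only (no sub-voxel positioning or partial-voxel weighting), and the voxel size is taken from the reconstruction parameters reported in this article (3.12-mm pixels, 3.27-mm slices).

```python
import math

def suv_max(volume):
    """Highest single-voxel value in a 3-D nested list indexed [z][y][x]."""
    return max(v for plane in volume for row in plane for v in row)

def sphere_offsets(diameter_mm, voxel_mm):
    """Voxel offsets (dz, dy, dx) whose centers lie inside the sphere."""
    r = diameter_mm / 2.0
    sz, sy, sx = voxel_mm
    nz, ny, nx = (int(r // s) for s in (sz, sy, sx))
    return [(dz, dy, dx)
            for dz in range(-nz, nz + 1)
            for dy in range(-ny, ny + 1)
            for dx in range(-nx, nx + 1)
            if (dz * sz) ** 2 + (dy * sy) ** 2 + (dx * sx) ** 2 <= r * r]

def suv_peak(volume, diameter_mm=12.0, voxel_mm=(3.27, 3.12, 3.12)):
    """Largest mean over a spherical VOI that fits entirely inside the volume.

    Assumption: the VOI center is restricted to voxel centers (brute-force search).
    Returns -inf if the volume is too small to hold the VOI.
    """
    offsets = sphere_offsets(diameter_mm, voxel_mm)
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    best = -math.inf
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                total, inside = 0.0, True
                for dz, dy, dx in offsets:
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if not (0 <= zz < nz and 0 <= yy < ny and 0 <= xx < nx):
                        inside = False
                        break
                    total += volume[zz][yy][xx]
                if inside:
                    best = max(best, total / len(offsets))
    return best

# Hypothetical example: uniform SUV of 2.0 with one noisy voxel of 10.0.
demo = [[[2.0] * 5 for _ in range(5)] for _ in range(5)]
demo[2][2][2] = 10.0
```

In this toy volume a single outlying voxel sets SUVmax directly, whereas SUVpeak averages it with its neighbors, which is the intuition behind the lower noise sensitivity of SUVpeak. With the voxel size above, the voxel-centered approximation of a 12-mm sphere contains 27 voxels.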
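The SUVmax and SUVpeak of the j-mm-diameter sphere in the 1,800-s scan data were defined as the reference SUVmax and SUVpeak (SUVref,max,j and SUVref,peak,j, respectively). Then, differences between the SUVmax,j and SUVref,max,j (%Diffmax,j) and between SUVpeak,j and SUVref,peak,j (%Diffpeak,j) were calculated for the evaluation of accuracy of the SUV as follows:

$$\%\mathrm{Diff}_{\max,j} = \frac{\mathrm{SUV}_{\max,j} - \mathrm{SUV}_{\mathrm{ref},\max,j}}{\mathrm{SUV}_{\mathrm{ref},\max,j}} \times 100, \qquad \%\mathrm{Diff}_{\mathrm{peak},j} = \frac{\mathrm{SUV}_{\mathrm{peak},j} - \mathrm{SUV}_{\mathrm{ref},\mathrm{peak},j}}{\mathrm{SUV}_{\mathrm{ref},\mathrm{peak},j}} \times 100$$

The physical assessment of the PET image quality was additionally performed with the CV of background activity (CVbackground) and the percentage background variability (N10 mm) (9,10). We placed 12 circular regions of interest (ROIs) of 30 and 10 mm in diameter on the central slice and on the slices ±1 and ±2 cm away from the central slice (total of 120 ROIs) in each PET image. The CVbackground was calculated using the data of 30-mm ROIs as follows:

$$\mathrm{CV}_{\mathrm{background}}\ (\%) = \frac{\mathrm{SD}_{\mathrm{background}}}{\mathrm{Mean}_{\mathrm{background}}} \times 100$$

The N10 mm was calculated using the data of 10-mm ROIs as follows:

$$N_{10\,\mathrm{mm}}\ (\%) = \frac{\mathrm{SD}_{10\,\mathrm{mm}}}{C_{B,10\,\mathrm{mm}}} \times 100$$

where CB,10 mm is the mean measured activity in the background ROIs for the 10-mm-diameter sphere, and SD10 mm is the SD of the background ROI values for the 10-mm-diameter sphere.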
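As a rough illustration of these metrics, the following Python sketch assumes the usual definitions: SD divided by mean for a CV or background variability, and the difference from the reference divided by the reference for a percentage difference. The function names and input values are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the choice is not stated here.

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    return stdev(values) / mean(values) * 100.0

def percent_diff(suv, suv_ref):
    """Signed percentage difference of an SUV from its reference value."""
    return (suv - suv_ref) / suv_ref * 100.0

def mean_percent_diff(suvs, suv_ref):
    """Average percentage difference over repeated images of one scan time."""
    return mean(percent_diff(s, suv_ref) for s in suvs)

# Hypothetical SUVmax readings of one sphere in five repeated short scans,
# compared with a hypothetical long-scan reference value.
repeated_suvmax = [4.6, 5.1, 4.9, 5.4, 4.8]
reference_suvmax = 4.2

variability = cv_percent(repeated_suvmax)                 # reproducibility
bias = mean_percent_diff(repeated_suvmax, reference_suvmax)  # accuracy
```

The same `cv_percent` form applies to the background metrics when it is given the background ROI values instead of sphere SUVs. Keeping variability and bias as separate functions mirrors the distinction drawn in the text between reproducibility and accuracy.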
RESULTS
Figure 2 illustrates all the PET images for various scanning durations. The PET image quality was improved as the scanning time increased. The visibility of the 10-mm-diameter hot sphere was considerably different among the frames of the same scanning time. Figure 3A shows representative images with different SUVmax,10. Although both images were acquired for 60 s, SUVmax,10 varied from 1.92 to 2.85. Figure 3B shows a 3-dimensional graph of the activity of the 37-mm-diameter sphere in the 1,800-s scan image and in the 60-s scan image. A maximum value of a hot sphere in the low-count-statistics image was often higher than that in the high-count-statistics image (Fig. 3B).
The variability of SUVmax and SUVpeak of the hot spheres and that of mean SUV of the background in relation to the scanning time are shown in Table 1. The variability of both SUVmax and SUVpeak increased as the sphere size became smaller and as the scanning time became shorter. When the CVbackground was 10% or lower, all of the CVmax and CVpeak were lower than 10%. It took 150 s or longer scanning time for small spheres to have the CVmax of 10% or lower. On the other hand, all of the CVpeak were lower than 5% regardless of the scanning time and sphere size.
Table 2 and Figure 4 show the accuracy of both SUVmax and SUVpeak in relation to the scanning time and the sphere size. The %Diffmax showed that the SUVmax was generally overestimated as the scanning time shortened and the sphere size became larger. In particular, the SUVmax,37 for the 60-s scan showed a 28.3% overestimation, compared with the SUVref,max,37. In contrast, the %Diffpeak showed minimal overestimation. The SUVpeak,37 for the 60-s scan was overestimated by 4.4%. A 180-s or longer scan time was required to achieve the recommended N10 mm of 6.3% for 2.65 kBq/mL background concentration in the guideline (Table 1). Although the N10 mm achieved the recommended value (10), the CVmax,10 and %Diffmax,37 were 8.5% and 12.3%, respectively.
DISCUSSION
Our phantom study evaluated the reproducibility and accuracy of SUVmax and SUVpeak in hot spheres simulating 18F-FDG–avid tumors. To achieve a CV of 10% or lower, the required scanning time was 150 s or longer for SUVmax and 60 s or longer for SUVpeak. We found that SUVmax was variable and overestimated even in the images that satisfied the guideline-recommended image quality. Although many studies attempted to minimize technical factors affecting accuracy of SUV for the purpose of harmonization of quantitative values (5,12–14,17), minimizing the variability of SUVs is also important for multicenter PET studies such as clinical evaluation of new PET tracers and new applications of the PET technique (18). On the other hand, SUVpeak was more robust to statistical fluctuation than SUVmax.
The variability of SUVmax in hot spheres was higher as the scanning time shortened in this study. On images with high background variability, SUVmax in hot spheres showed high variability. Furthermore, the SUVmax resulted in large overestimations as the image noise increased. The SUVmax of the 37-mm-diameter sphere with 60-s scan time showed 28.3% overestimation, compared with that with 1,800-s scan time. These results were consistent with a previous simulation study reported by Boellaard et al. (15). Many studies have also reported that SUVmax in small lesions strongly depended on image noise (15,16,19–21). In this study, the variability of the SUVmax in hot spheres was higher as the hot sphere became smaller. The SUVmax in small lesions was underestimated because of partial-volume effect (21,22). The higher variability of SUVmax in small spheres must be due to the fluctuation in low count statistics in comparison with the large spheres. Many PET examinations have been performed to detect small lesions and to evaluate responses to therapy for small lesions in both clinical trials and clinical practice (23). It is important to obtain sufficient reproducibility of SUVmax in small lesions. The image noise levels depended on the scanning time. On the basis of these results, determination of the appropriate scan time is important for ensuring reproducibility and accuracy of SUVmax.
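The mechanism behind the positive bias of SUVmax can be sketched with a small Monte Carlo experiment. This is a deliberately simplified model, not a simulation of the phantom study: voxels are treated as independent samples of a uniform region, Poisson noise is approximated by a Gaussian with SD equal to the square root of the counts, and reconstruction, smoothing, and partial-volume effects are ignored. The function name and all parameter values are hypothetical.

```python
import random
from statistics import mean

def mean_max_bias(n_voxels, counts_per_voxel, n_trials=500, seed=0):
    """Mean percentage bias of the maximum of n_voxels noisy samples.

    Every voxel has the same true value (counts_per_voxel), so any
    deviation of the maximum from that value is caused by noise alone.
    """
    rng = random.Random(seed)
    sigma = counts_per_voxel ** 0.5  # Poisson SD, Gaussian approximation
    biases = []
    for _ in range(n_trials):
        peak = max(rng.gauss(counts_per_voxel, sigma) for _ in range(n_voxels))
        biases.append((peak - counts_per_voxel) / counts_per_voxel * 100.0)
    return mean(biases)

# More voxels stands in for a larger sphere; fewer counts for a shorter scan.
small_region = mean_max_bias(n_voxels=20, counts_per_voxel=100)
large_region = mean_max_bias(n_voxels=500, counts_per_voxel=100)
long_scan = mean_max_bias(n_voxels=100, counts_per_voxel=3000)
short_scan = mean_max_bias(n_voxels=100, counts_per_voxel=100)
```

Under these assumptions the maximum is biased upward, and the bias grows both when the counts per voxel fall and when more voxels compete for the maximum. This is qualitatively the same direction as the overestimation observed for shorter scans and larger spheres, although the magnitudes are not comparable with the measured values.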
The variability in the background area was evaluated using the CV and N10 mm in this study. The Quantitative Imaging Biomarkers Alliance/Uniform Protocol in Clinical Trial (UPICT) recommends that the CV in the uniform area should ideally be below 10% (24). When the CVbackground was smaller than 10%, the CV of SUVmax in hot spheres was also below 10% in this study. On the other hand, the Japanese guideline (10) recommends that the N10 mm should be 6.3% or lower for the NEMA body phantom with 2.65 kBq/mL background activity. When the N10 mm satisfied the criterion, the CV of SUVmax was also below 10% in this study. A scanning time of 180 s or longer was required to satisfy the criteria of CVbackground or N10 mm in this study. Although the CVbackground or N10 mm achieved the recommendation, the CVmax,10 and %Diffmax,37 were 8.5% and 12.3%, respectively. These uncertainties have a potential influence on the assessment of metabolic response based on the relative change in SUVmax.
Many studies for response assessment in clinical trials adopted SUVmax for quantification because of the many advantages of SUVmax (16,21,25,26). In this study, the variability of SUVpeak was smaller than that of SUVmax although the SUVpeak showed close correlation with the SUVmax. Several studies have also reported that SUVpeak was more robust to statistical fluctuation than SUVmax (16,27). In contrast to the SUVmax, the SUVpeak showed less than a 5% overestimation for all scanning times and sphere sizes. On the basis of these results, SUVpeak was considered to be a more reproducible and accurate metric than SUVmax. SUVpeak has also been proposed as a more robust alternative (16) that is expected to reduce uncertainties in the quantification of responses to therapy (16,19,25,27). Vanderhoek et al. (27) emphasized that the most robust and predictive method of SUV measurement should be selected for quantification of responses to therapy in clinical trials. Furthermore, the 18F-FDG PET/CT UPICT protocol (24) recommends that SUVpeak for 3-dimensional volume should be obtained in addition to SUVmax in clinical trials. The use of SUVpeak is suitable especially for the purpose of harmonizing the quantitative performance of various PET scanners. In PET Response Criteria In Solid Tumors, the measurement of SUVpeak normalized to lean body mass (SULpeak) is recommended to assess response to therapy (28). Although there are limitations in applicability to small lesions (below 12-mm diameter) and a requirement for dedicated image analysis software, we consider the measurement of SUVpeak (or SULpeak) in addition to SUVmax desirable for reproducible and accurate quantification in clinical situations. It would improve diagnostic accuracy if both SUVmax and SUVpeak were evaluated (27).
In this study, we adopted the 1-cm3-volume VOI to compare with previous studies. However, several studies used various VOI sizes to calculate SUVpeak (15,19,27). A further study to determine the ideal VOI size for the SUVpeak is necessary for standardization of quantitative performance.
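As a quick check on how the 12-mm-diameter VOI described in the image analysis relates to a nominal 1-cm3 volume, the sphere volume formula gives the following (a worked calculation added for illustration, not taken from the original analysis):

```latex
V = \frac{4}{3}\pi r^{3} = \frac{4}{3}\pi\,(0.6\ \mathrm{cm})^{3} \approx 0.905\ \mathrm{cm}^{3},
\qquad
d_{1\,\mathrm{cm}^{3}} = 2\left(\frac{3}{4\pi}\right)^{1/3}\ \mathrm{cm} \approx 1.24\ \mathrm{cm}
```

A 12-mm-diameter sphere therefore encloses about 0.9 cm3, and a sphere of exactly 1 cm3 would be about 12.4 mm in diameter, so the two descriptions agree to within roughly 10% in volume.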
This study did not take the subject size into consideration. However, the PET image quality of overweight subjects (body mass index ≥ 25) is degraded because of an increase in statistical noise (29–31). To obtain sufficient image quality, adjusting the injection activity or scanning time in each patient based on body weight or body mass index (29,32,33) might be required for overweight subjects (34,35). In a simulation study reported by Boellaard et al. (15), SUV increased as a function of patient weight. A further clinical study is required to examine the reproducibility and accuracy of the SUV.
CONCLUSION
SUVmax is variable and overestimated as the scanning time decreases and the lesion size increases. Although the standardization and harmonization of quantitative oncology 18F-FDG PET imaging protocols are increasingly being focused on, sufficient scanning time is required to obtain enough reproducibility and accuracy of SUVs of small lesions. We suggest that CV in the uniform area of the appropriate phantom should be below 10% to minimize the effect of statistical fluctuation for the SUVmax. On the other hand, SUVpeak is a more robust and accurate metric than SUVmax. Measurement of SUVpeak (or SULpeak) in addition to SUVmax is desirable for reproducible and accurate quantification in clinical trials.
DISCLOSURE
This study was supported in part by “Fukuoka Foundation for Sound Health” Cancer Research Fund. No other potential conflict of interest relevant to this article was reported.
Footnotes
Published online Aug. 13, 2015.
- Received for publication June 1, 2015.
- Accepted for publication July 21, 2015.