Introduction

Accurate delineation of tumour lesions in lung cancer based on PET/CT is required in radiotherapy treatment planning. Moreover, as the amount of 18F-fluorodeoxyglucose (FDG) uptake is prognostic for survival, precise measurement of the standardized uptake value (SUV) may be of clinical value [1–3]. However, delineation of lung tumours is fraught with error and suffers from significant inter- and intraobserver variability [4, 5]. Tumour definition based on FDG-PET/CT is the current standard, but the method is far from ideal [6–9]. In particular, the FDG uptake pattern in 3D PET imaging suffers from breathing averaging, which prevents accurate SUV and volume measurements.

Respiratory-correlated PET imaging (4D PET) is an attractive alternative [10–15]. In 4D PET, a surrogate measurement of the patient’s breathing is made during the scan, and based on this information the PET data are sorted according to respiratory phase or amplitude. Hybrid methods have also been proposed [16]. Phantom studies have shown that 4D PET predicts lesion volume recovery more accurately than 3D PET and yields more accurate SUV values [17–22]. In spite of its promise, 4D PET has had limited clinical acceptance. Perhaps the most significant limitation is the technical challenge of making the respiratory measurement in a clinical setting, handling the large amount of 4D data, and applying the respiratory motion information in treatment planning. Moreover, 4D PET images have relatively more statistical noise than 3D PET images, since only a fraction of the acquired data is used in each image. There is concern about inconsistent attenuation correction in 4D PET due to mismatches between PET and CT [23]. Even if 4D CT is used for attenuation correction of the corresponding 4D PET, errors due to the mismatch of CT and corresponding PET gates may be introduced due to differences in the binning methods used for reconstructing 4D CT and 4D PET [24]. Despite its drawbacks, 3D PET is still used for tumour delineation and characterization of tumours and lymph nodes.

In this study we evaluated a new method, optimal gating (OG), for applying respiratory-correlated measurements to PET. OG finds a single amplitude interval which minimizes the blurring due to respiratory motion while maximizing the number of coincidence events. We compared 3D, 4D and OG PET in terms of SUV, tumour volume and noise.

Materials and methods

Patient overview and PET acquisition

An FDG-PET scan was performed in 26 patients with lung cancer treated in the period between August 2008 and March 2009. The scan was performed during free breathing using a Siemens Biograph 40 PET/CT scanner (Siemens Medical Solutions, Knoxville, TN, USA) for radiotherapy treatment planning purposes. Patients were positioned supine with a dedicated arm support in the radiotherapy position. Our standard clinical protocol included a 4D CT scan and a 24-min list-mode PET acquisition of a single bed position (16.2 cm axial extent) centred on the primary tumour. The signal was measured using a respiratory monitor system (RMON) with a pressure sensor in a belt strapped around the patient’s chest (AZ-733 V; Anzai Medical Corporation, Tokyo, Japan) [25, 26]. The injected activity (megabecquerel) of FDG depended on the weight (kilograms) of the patient and for the first nine patients in the study was four times the body weight plus 20 MBq. For the second set of 17 patients, the injected activity was 2.5 times the body weight following the recommendations of Boellaard et al. [27].
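The two weight-based dosing rules described above can be written out as a short sketch. The function name and the protocol labels "first" and "second" are hypothetical and chosen only for illustration; the coefficients are the ones stated in the text.

```python
def injected_activity_mbq(weight_kg, protocol):
    """Injected FDG activity in MBq for a given body weight in kg.

    protocol == "first":  first nine patients, 4 x weight + 20 MBq
    protocol == "second": remaining 17 patients, 2.5 x weight
    (the labels are illustrative, not taken from the study)
    """
    if protocol == "first":
        return 4.0 * weight_kg + 20.0
    if protocol == "second":
        return 2.5 * weight_kg
    raise ValueError("unknown protocol: %r" % (protocol,))
```

For a 75-kg patient this gives 320 MBq under the first rule and 187.5 MBq under the second.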

3D, 4D and OG PET image reconstruction

The list-mode data were reconstructed using three methods: (1) standard 3D PET reconstruction, resulting in a free-breathing PET image for a 24-min acquisition; (2) phase-based 4D PET reconstruction, with eight gates; and (3) reconstruction using the OG algorithm.
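As an illustration of phase-based gating with eight gates, the sketch below assigns a list-mode event to a gate from its time stamp. It assumes that the respiratory monitor provides one trigger time per breathing cycle and that each cycle is divided into equal time fractions; the binning actually used by the clinical system is not described here, so the function and its arguments are hypothetical.

```python
from bisect import bisect_right


def phase_gate(event_time, trigger_times, n_gates=8):
    """Return the gate index (0..n_gates-1) for an event, or None.

    trigger_times: sorted times marking the start of each breathing cycle
    (assumed input). Events outside a complete cycle are left unassigned.
    """
    i = bisect_right(trigger_times, event_time) - 1
    if i < 0 or i + 1 >= len(trigger_times):
        return None  # before the first trigger or after the last one
    start, end = trigger_times[i], trigger_times[i + 1]
    phase = (event_time - start) / (end - start)  # fraction of the cycle, 0 <= phase < 1
    return min(int(phase * n_gates), n_gates - 1)
```

Because each gate covers one-eighth of every cycle, each gated image contains roughly one-eighth of the acquired events, which is the origin of the higher noise in 4D PET.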

The OG PET reconstruction was applied after processing the respiratory signals, s(t), acquired during the scan. Prototype software, described in Appendix A, was used to create a modified PET data list with the motion largely removed, and this modified list was processed normally on the clinical PET/CT system. The first step in the OG method is to form a histogram of the s(t) values. Second, the histogram is converted to a cumulative distribution function, cdf(s), representing the probability of observing a signal of amplitude s or less. Third, an algorithm considers all possible combinations of lower (L) and upper (U) levels, subject to a constraint on U that forces the total sensitivity to equal a specific percentage (e.g. 35%) of the acquired breathing signals. This percentage is a parameter called the optimal gating yield, OG_yield. The constraint can be written as:

$$ \mathrm{cdf}(U) - \mathrm{cdf}(L) = \mathrm{OG}_{\mathrm{yield}} \qquad (1) $$

To emphasize that this is a parameter, we write:

$$ \mathrm{OG}_{\mathrm{yield}} = 0.35 \qquad (2) $$

A rationale for choosing the specific value of 0.35 is presented in Appendix B. The OG method determines the narrowest such interval by selecting the L value which makes the difference U – L as small as possible. In most patients, the result is a narrow range of breathing amplitudes near the end-expiration phase. Figure 1 shows a 30-s plot of breathing amplitudes in a representative patient study, along with the histogram from the entire 24-min acquisition. The optimal amplitude interval is represented by a shaded region representing a range of breathing amplitudes, or alternatively a portion of the histogram.
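The search for the narrowest amplitude interval can be sketched as follows. This is a minimal illustration and not the prototype software of Appendix A: instead of building a histogram and a cumulative distribution function, it slides a window over the sorted respiratory samples, which should give the same interval up to the histogram bin width. The function names and the handling of ties are assumptions.

```python
import math


def optimal_gate(signal, og_yield=0.35):
    """Return (L, U), the narrowest amplitude interval that contains
    at least og_yield of the respiratory samples."""
    s = sorted(signal)
    n = len(s)
    k = max(1, math.ceil(og_yield * n))  # samples the window must contain
    best = None
    for i in range(n - k + 1):
        lower, upper = s[i], s[i + k - 1]
        if best is None or (upper - lower) < (best[1] - best[0]):
            best = (lower, upper)
    return best


def in_gate(amplitude, gate):
    """True if a respiratory amplitude falls inside the gate (L, U)."""
    lower, upper = gate
    return lower <= amplitude <= upper
```

List-mode events recorded while `in_gate` is true would be kept and all others discarded. For a breathing trace that dwells near end-expiration, the selected interval lies at the low-amplitude end of the histogram.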

Fig. 1 Example breathing pattern (left) over 30 s showing the optimal gating window. The histogram (right) shows the distribution of breathing amplitudes over the entire 24-min list-mode acquisition. The OG method selects the narrowest bandwidth (shaded area) containing 35% of the respiratory signal

For all PET image reconstructions, attenuation correction was performed using the maximum exhalation phase of the 4D CT scan. In most patients, this is the most representative phase. For some patients 4D CT images were not available at the maximum exhalation phase. In these cases we used the CT phase closest to maximum exhalation. Reconstructions were performed using a Fourier-rebinning-based OSEM 2D algorithm with four iterations and eight subsets, with a final image size of 168 × 168 pixels with a typical pixel size of approximately 4 mm and a 3-mm slice thickness.

Phantom validation

To validate the OG PET reconstruction we performed a phantom study. We used a sphere with a diameter of approximately 3.2 cm and volume of 17 cm3 filled with 10 MBq of FDG that was mounted on a motorized platform (Respiratory Phantom; Anzai Medical Corporation, Tokyo, Japan). The RMON pressure sensor was attached to the platform to monitor the motion. The sphere moved at a rate of ten cycles per minute on a trajectory with an amplitude of 1.7 cm. The period and the amplitude of the motion correspond to the movement of lung tumours in vivo. A 10-min acquisition time was used for these experiments. Signal quality was very high because the source was strong and there was little attenuating medium around the phantom. A 4D CT scan was also acquired. In a similar manner to the PET reconstructions for the patients, 3D, 4D and OG PET reconstructions were performed. The average over all eight gates of the 4D PET measurements was used as the reference value to which the other reconstructions were compared.
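The stated phantom geometry can be checked with a few lines of arithmetic. The sketch below only uses the sphere-volume formula and the figures quoted above; the function name is illustrative.

```python
import math


def sphere_volume_cm3(diameter_cm):
    """Volume of a sphere from its diameter: V = pi * d^3 / 6."""
    return math.pi * diameter_cm ** 3 / 6.0


volume = sphere_volume_cm3(3.2)   # about 17.2 cm3, consistent with the stated 17 cm3
period_s = 60.0 / 10              # ten cycles per minute gives a 6-s period
```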

Analysis of PET images

PET images of the patients were analysed using TrueD software (version VC60; Siemens, Erlangen, Germany). For all ten datasets, i.e. the 3D PET dataset, the 4D PET dataset with eight gates and the OG PET dataset, the volume of interest (VOI) was selected manually in the transverse, sagittal and coronal anatomical planes around the primary tumour. This VOI was carefully selected so as not to include any high FDG uptake regions of nonprimary tumour tissue such as the heart or the involved mediastinal lymph nodes. In the VOI the threshold for volume calculation was based on two autocontouring criteria: first, a fixed SUV threshold of 2.5 [28], and second, a relative threshold of 40% of the maximum SUV (SUVmax) in the primary tumour [8, 29]. We recorded the volume of the region to which the threshold was applied, and the minimum, average, maximum and standard deviation of SUV values inside the region.
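The two autocontouring criteria can be illustrated with a simplified sketch that works on a flat list of SUV values taken from the VOI. It is not the TrueD implementation: it ignores spatial connectivity, uses the population standard deviation, and assumes a voxel of about 4 × 4 × 3 mm (0.048 cm3) based on the reconstruction parameters given above. All names are hypothetical.

```python
from statistics import mean, pstdev

VOXEL_CM3 = 0.4 * 0.4 * 0.3  # assumed voxel volume: 4 mm x 4 mm x 3 mm


def region_stats(suv_values, threshold, voxel_cm3=VOXEL_CM3):
    """Volume and SUV statistics of all voxels at or above the threshold."""
    region = [v for v in suv_values if v >= threshold]
    if not region:
        return None
    return {
        "volume_cm3": len(region) * voxel_cm3,
        "min": min(region),
        "mean": mean(region),
        "max": max(region),
        "sd": pstdev(region),
    }


def fixed_threshold_stats(suv_values, suv_cutoff=2.5):
    """Criterion 1: fixed SUV threshold of 2.5."""
    return region_stats(suv_values, suv_cutoff)


def relative_threshold_stats(suv_values, fraction=0.40):
    """Criterion 2: threshold at 40% of SUVmax in the VOI."""
    return region_stats(suv_values, fraction * max(suv_values))
```

Because the relative threshold scales with SUVmax, any noise-driven increase in SUVmax raises the threshold and shrinks the delineated volume, whereas the fixed threshold is unaffected.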

To provide a measure of image noise, we created an additional VOI in the contralateral lung, selecting a region that appeared to be homogeneous. This VOI was copied to all datasets, and the standard deviation of the SUV values was computed and recorded as a fraction of the mean value in the VOI.
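This noise measure is the coefficient of variation of the SUV values in the background VOI. A minimal sketch follows; the choice of the sample rather than the population standard deviation is an assumption, as are the function names.

```python
from statistics import mean, stdev


def noise_fraction(suv_values):
    """Standard deviation of SUV in the VOI as a fraction of the mean."""
    return stdev(suv_values) / mean(suv_values)


def relative_noise_change(noise, noise_3d):
    """Noise of a reconstruction relative to the 3D PET value,
    expressed as a fractional increase (0.31 means 31% higher)."""
    return noise / noise_3d - 1.0
```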

In addition, the displacement of the centre of mass of the SUV volumes in each of the corresponding gates of the 4D PET was automatically calculated in three dimensions by the TrueD software. A 3D motion vector, defined as the square root of the quadratic sum of motion amplitudes in the transverse, sagittal and coronal directions, was determined for each patient.
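The length of the 3D motion vector can be sketched as below. The sketch assumes that the motion amplitude in each direction is the peak-to-peak range of the centre-of-mass coordinate over the gates; how TrueD defines the amplitude is not stated, so this is only one plausible reading.

```python
import math


def motion_vector_mm(centroids):
    """Length of the 3D motion vector from per-gate centres of mass.

    centroids: list of (x, y, z) positions in mm, one per gate.
    """
    amplitudes = [max(axis) - min(axis) for axis in zip(*centroids)]
    return math.sqrt(sum(a * a for a in amplitudes))
```

For example, amplitudes of 1, 2 and 2 mm in the three directions give a motion vector of 3 mm.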

Evaluation strategy

Our hypothesis was that OG would produce less blur than 3D reconstruction and less noise than 4D reconstruction, and yet be quantitatively accurate. The most important objective in the analysis was to retrieve quantitative values for the maximum SUV inside the primary tumour. Because a gold standard was missing, we chose the average of the maximum SUV in the 4D PET as the reference. As a secondary objective, we quantified differences in the volume as determined by different reconstruction methods and different delineation criteria. Finally, noise in the images was evaluated to give an objective surrogate for image quality.

Statistical analysis

Paired variables were compared using the Wilcoxon signed-rank test (Matlab R2009a; The Mathworks, Natick, MA). A Bonferroni correction was applied to correct for multiple testing: because three comparisons were performed in this analysis, p values obtained from statistical testing were multiplied by 3. P values smaller than 0.05 were assumed to be statistically significant. Results are expressed as means ± standard deviation, unless mentioned otherwise.
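The Bonferroni step amounts to multiplying each raw p value by the number of comparisons. The sketch below shows this for three comparisons; capping the result at 1 is the usual convention and is an assumption here, as are the function names. The original analysis was carried out in Matlab, not with this code.

```python
def bonferroni(p_values, n_comparisons=3):
    """Bonferroni-adjusted p values, capped at 1.0."""
    return [min(1.0, p * n_comparisons) for p in p_values]


def is_significant(p_adjusted, alpha=0.05):
    """Significance decision on an already adjusted p value."""
    return p_adjusted < alpha
```

A raw p value of 0.01 becomes 0.03 and remains significant, whereas a raw value of 0.02 becomes 0.06 and does not.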

Results

Phantom experiment

Figure 2 shows coronal sections through 4D, 3D, and OG images from the phantom experiment. The 3D PET image was visibly blurred, whereas the OG image appeared to be as sharp as the 4D images. Maximum pixel values were similar in all measurements: compared to the average over all phases of the 4D PET dataset, the 3D image was 1.2% lower and the OG image was 1.2% higher. The variation in SUVmax within the 4D acquisition was 1.5%. The volumes were compared by computing autothresholded volumes using the 40% of the maximum SUV criterion. The volume in 4D PET was 15.8 ± 0.3 cm3, averaged over phases. The volume in 3D PET was 13.8 cm3. The volume in OG PET was 15.1 cm3. Average SUVs within autothresholded volumes varied by 4.7% across gates in the 4D PET acquisition, and in OG PET the average SUV was 3.4% higher. In 3D PET, the average SUV was 10.2% lower.

Fig. 2 Phantom experiment showing 4D PET reconstructed images together with the motion blurred (static) 3D PET and the OG PET images

Patient characteristics

Table 1 shows the patient characteristics. The average length of the motion vector in 3D of the primary lung tumour of the 26 patients was 3.8 ± 2.7 mm, with a range between 1.3 and 11.3 mm. Examples of the 3D, 4D and OG PET images are shown in Fig. 3.

Table 1 Patient characteristics
Fig. 3 Axial images from an example patient (patient 18) comparing OG, 3D and 4D PET reconstruction methods: top row CT images, middle row PET images, bottom row fused PET/CT images; left column OG reconstruction, middle column 3D reconstruction, right column 4D reconstruction

SUVmax values

Table 2 compares average SUVmax values in the primary tumours. The maximum values for the 3D, 4D and OG PET reconstruction methods were 13.1 ± 5.4, 13.7 ± 5.6 and 14.1 ± 6.5, respectively. The large standard deviations were caused by the large interpatient variability, with SUVmax ranging from approximately 5 to 35. The SUVmax values for the 3D PET method were significantly lower than the values for the 4D and OG methods: they were 4.9 ± 4.8% lower than the 4D values (p < 0.001) and 6.9 ± 8.8% lower than the OG values (p < 0.001; Table 3). The average SUVmax values for the OG and 4D methods were similar, the differences being 2.0 ± 8.4% (p = NS). Individual SUVmax values are shown in Fig. 4.

Table 2 Results of the 3D, 4D and OG PET reconstruction: maximum SUVmax, and volumes within the SUV 2.5 and 40% of the SUVmax cut-off (values are means±SD)
Table 3 Differences between maximum SUV, volumes within the SUV 2.5 and 40% of the SUVmax cut-off for the 3D, 4D and OG PET reconstruction.
Fig. 4 SUVmax values for the OG, 3D and 4D PET reconstruction methods classified according to tumour location inside the lung

Differences were observed between the SUVmax of the different gates of the 4D PET images. In several cases, these variations exceeded 10%; this is shown in Fig. 5. The standard deviation of the SUVmax values within the individual patients, expressed as ratios in relation to the mean 4D PET value, was 4.0 ± 2.1% for the entire patient population, ranging from 1.9% (patient 19) to 12.7% (patient 17). In the case of patient 17, the SUVmax ranged from 6.8 to 10.0 with a median value of 7.7 for a small tumour of 1.7 cm3.

Fig. 5 Top row: box plot of the SUVmax values normalized to the mean value for the different gates of the 4D PET method. Bottom row: box plot of the volumes defined by 40% of the SUVmax normalized to the mean volume for the individual patients. In both plots, the edges of the boxes represent the 25th and 75th percentiles, and the whiskers represent the extreme data points. Outliers are plotted individually (circles)

Fixed SUV threshold

The volumes computed with autocontours at a fixed threshold of 2.5 SUV were not significantly different between the three methods (the differences were of the order of 1 cm3): 74.2 ± 76.8 cm3, 75.1 ± 77.9 cm3 and 74.9 ± 76.2 cm3 for the 3D, 4D and OG methods, respectively. The average SUV values in the volume within the contour were 5.4 ± 1.2, 5.4 ± 1.1 and 5.6 ± 1.3, and the values from both the 3D and 4D methods differed slightly, but significantly (p < 0.001), from the average OG PET values.

Relative SUV threshold

Volumes computed by the 40% thresholding criterion did show significant differences between the 3D and 4D methods, but otherwise no significant differences were seen. This is shown in Table 2. A significant reduction of 5.3 ± 7.1% (p = 0.007) in the average 4D volume compared to the 3D volume was found, although the average absolute numbers were all within 2 cm3. Figure 5 shows the variation in autothresholded volumes for the different gates of the 4D reconstruction. As an example, considering all gates for patient 17, the volumes determined by the 40% SUVmax method in this dataset varied between 61% and 139% of the mean volume. The average SUV values within the 40% SUVmax contour were 7.8 ± 3.1, 8.1 ± 3.2 and 8.4 ± 3.8, for 3D, 4D and OG methods, respectively. Again, average values for the 3D method differed significantly from those for the OG and 4D methods, whereas there was no significant difference between the values for the OG and 4D methods.

Noise quantification

For the evaluation of image noise based on the contralateral lung, VOIs with an average size of 37.8 ± 18.5 cm3 were created. The standard deviations of the SUV values of the VOI inside the contralateral lung are shown for all patients in Fig. 6. Compared to the 3D PET method, the noise was higher for the 4D PET method (89 ± 52%, p < 0.001) and the OG PET method (31 ± 21%, p < 0.001). Noise for the OG PET method was 44 ± 30% (p < 0.001) lower than that for the 4D PET method.

Fig. 6 Standard deviations of the SUV values inside the VOI in the contralateral lung tissue for all patients relative to the value of the 3D PET reconstruction

Discussion

Accurate quantification of FDG uptake is important in oncology. PET imaging is used for staging and diagnosis of lung tumours, for quantifying the SUV, and for delineating the actual border of the tumour. For conventional radiotherapy schemes, a better delineation of the primary tumour will in general lead to smaller safety margins for defining the treated volume. These smaller irradiated volumes will encompass less normal lung tissue and hence cause lower (lung) toxicity, or, if an isotoxic dose-escalation strategy is used, will allow a higher tumour dose and hence higher overall survival in lung cancer patients [30]. Delineating subvolumes of the tumour is currently also a focus of research, as areas with high SUV are presumably more therapy-resistant [2, 31, 32]. Especially in the context of more advanced radiotherapy techniques that rely on dose painting or subvolume boosting strategies, accurate quantification of the PET uptake is essential to define these regions. Furthermore, accurate quantification is becoming more important in current trials and clinical practice. In a large Dutch randomized trial (NVALT 8), patients were randomized between two treatment arms based on a SUVmax cut-off value of 7. Applying this to the cohort of patients analysed in this study would mean an increase of approximately 10% in the number of patients in the SUVmax >7 group if 4D or OG PET was used for quantification.

Validation of automatic segmentation methods and delineations in patients with lung cancer is difficult. Results from pathological studies are limited and hampered by technological difficulties in specimen extraction from thoracic surgery [4, 5]. Hence, a gold standard for tumour delineation is difficult to obtain [33]. Validation of new PET reconstruction methods for accurate SUV quantification in vivo is also not directly possible and is limited to phantom evaluations. We therefore performed a phantom validation, which demonstrated correct technical behaviour of the optimal gating method and provides confidence that the method is applicable in a clinical setting. The phantom movement was similar to, or even slightly smaller than, the size of the sphere. Hence, no large differences in SUVmax were expected for these reconstructions. The volume recovery of the 3D PET reconstruction differed from that of the 4D PET reconstruction, whereas that of the OG PET method was similar. However, the images in Fig. 2 clearly show that either the 4D or the OG PET reconstruction method was necessary to reduce blurring artefacts from respiratory motion. Figure 2 also shows the correct processing of the list-mode files and OG PET reconstruction technique.

The prototype software was expected to be quantitative in the case of regular breathing, as described in Appendix A. The prototype software treated the OG sinogram as if it came from a static acquisition, and therefore could not accurately correct for effects that are correctly modelled in 4D PET, for example changes in activity distribution during the scan, detector dead time, and decay of the radioisotope. In spite of this limitation, in most scans the quantitative values (SUVmax) were similar for the OG method and the average of the 4D method. Figure 4 demonstrates the close agreement and supports our hypothesis that motion blurring would be reduced by OG and that it is possible to extract accurate quantitative values from the OG method. In 3D PET, the SUVmax was significantly reduced due to respiratory motion blurring. Residual motion is also likely to play a role in determining the SUVmax for the 4D phase-based PET reconstruction we used. Choosing an amplitude-based binning approach might reduce the variation to some extent by reducing the residual motion component, but the counts in each gate would vary and lead to different statistics for the various gates. For the volumes delineated using 40% of the SUVmax, the values from the 4D method were smaller. There could be several reasons for this. Accurate determination of the SUVmax of the individual phases was hampered by increased noise, leading to a somewhat larger SUVmax, which resulted in smaller delineated volumes. Another possibility is that the tumours in this study were quite large and their motion was limited, making volume definition in 3D already a good estimate of the actual volume of the lesion. For smaller tumours and in the case of more respiratory motion this might be different. However, our dataset was too limited to draw definite statistical conclusions.

OG PET images were sharper than conventionally gated 4D PET images, and not as noisy. Since image quality is difficult to quantify directly in absolute numbers, we used a surrogate of noise defined as the standard deviation of the SUV in a homogeneous part of the lung, and this clearly showed reduced noise for OG PET compared to 4D PET. Also, OG PET images were visually sharper at the border of the primary tumour (Fig. 3) compared to the 3D PET images which had intrinsic motion blurring. The average motion of the primary tumour was about 4 to 5 mm which was comparable to the resolution and voxel size of the PET scanner. Some deterioration of the edges of the primary tumour was visible in the static 3D PET images.

Noise in the 4D PET images was also reflected in the variations between the various gates of the 4D reconstruction. Volume differences up to 40%, and differences as large as 27% in maximum SUV were shown in this study. These results were similar to variations reported by Erdi et al. [23]. These effects might be partly caused by incorrect attenuation correction due to a mismatch between the CT and PET scan [22]. Using the end-expiration phase of the 4D CT would be expected to minimize the errors caused by the attenuation correction. In patients who breathe normally, end-expiration is the most stable breathing phase. In our experience, the OG algorithm typically found an amplitude interval close to that of the end-expiration phase. One could prospectively gate the CT scanner to acquire attenuation data at the respiratory amplitude determined by the OG calculation, and the patient would receive very little additional radiation exposure from the prospectively gated CT scan. This was not possible in the retrospective study described here. For patients with an irregular or nonphysiological breathing pattern, choosing a different CT gate for the attenuation correction might circumvent possible errors in attenuation correction. However, for the patient group described in this study all OG amplitude intervals were close to the end-exhale position.

Others have tried breath-hold techniques for motion freezing, mainly for the CT acquisition [34, 35]. For PET imaging, breath-hold techniques might not be feasible: acquisition times are typically in the order of minutes, so the patient would need to hold their breath repeatedly, whereas the CT scan can generally be performed in a few seconds. Reproducibility of the breath-hold phase for multiple breath holds is difficult, and may require specially designed equipment. The approach described in this paper is simple, and comfortable for the patient as free breathing is allowed during the scan. If 4D images are not required for evaluation of the tumour motion, and OG images are to be used instead, the duration of the study can be significantly reduced, further increasing patient comfort.

The simplicity of the OG method is expected to make it comparatively robust. This simple method may not be able to recover all of the resolution possible with a PET scanner. It is our view that better resolution imaging might one day be achievable with accurate modelling of the complex motions that occur during human breathing, perhaps using approaches such as the optical flow-based methods of Dawood et al. [36]. In Appendix B we show that OG reduces the blurring from 1 to 2 cm, in extreme cases, to the range 6 to 7 mm that is normal in PET imaging today. Indeed, the OG methodology should work well in the imaging of small tumours, for example metastatic lesions. In such cases, the motion trajectory may exceed the size of the tumour. Another potential advantage of the OG method over 3D and 4D PET would be the detection of involved lymph nodes in the mediastinum. OG PET might reveal these nodes, whereas in 3D PET they might be blurred by motion and in 4D PET the noise might be too high to distinguish them from the surrounding uptake. However, this has to be validated in clinical practice. Similarly, the OG method is also applicable in cardiac imaging, where respiratory motion is in many cases larger in magnitude than the heart’s contractile motions. Indeed, Frey et al. have recently reported examples in which the OG method significantly sharpened the images in tumour imaging and cardiac PET/CT, with better image quality than 4D PET [37].

However, motion amplitude of the primary tumour is difficult to predict and selection of 3D vs. 4D PET acquisition is frequently not possible prior to the imaging session. The OG reconstruction might be the intermediate step suitable for all patients providing good signal-to-noise ratios for quantitative analysis of the SUV parameters and sharp (non-motion-blurred) images for visual assessment.

Conclusion

The SUVmax values for the OG and 4D PET reconstruction methods were comparable and were both significantly higher than those for the 3D PET reconstruction method. Optimal gating accurately determined the SUVmax values, while reducing image noise and providing accurate volume determination, comparable to that from 3D and 4D PET. Based on the better quantification of the maximum value and the less noisy appearance, we conclude that OG PET is a better alternative both to 3D PET, which suffers from breathing averaging, and to 4D PET, which produces noisy images.