Introduction

Over the last 20 years, 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) has played an increasing role in the management of non-small cell lung cancer (NSCLC) patients for staging [1] and restaging [2, 3]. More recently, 18F-FDG PET has been used for response evaluation of chemotherapy and molecularly targeted therapies [4–6]. The standardized uptake value (SUV) is the most frequently used quantitative parameter in oncology [7]. When using SUV as a diagnostic [8, 9] or prognostic [10, 11] tool (i.e. single measurement) or for therapy monitoring (i.e. longitudinal studies) in multicentre trials or in sites equipped with multiple scanners, one needs to minimize the variability in semi-quantitative measurements by harmonizing both patient preparation in the PET unit and acquisition and reconstruction parameters [12–14].

The European Association of Nuclear Medicine (EANM) and the Society of Nuclear Medicine (SNM) have published guidelines [15, 16] regarding patient preparation, data acquisition, reconstruction parameters and definition of volume of interest (VOI) in or around the tumours. With regard to reconstruction parameters, the EANM guidelines, in line with the Netherlands protocol for standardization and quantification of 18F-FDG PET studies in multicentre trials [17], provide recommendations based on an expected spatial resolution of the PET system equal to 7 mm. These recommendations include the use of the NEMA NU-2 phantom to check that activity concentration recoveries are concordant with those expected. Regarding quantitative analysis, SUVmax is currently the most frequently used quantitative parameter in oncological studies [18] despite being a suboptimal parameter due to noise-induced bias [19]. Therefore the EANM guidelines focus on getting comparable SUVs when using SUVmax in multicentre studies.

Hardware and software evolutions can lead to important device-dependent and reconstruction-dependent variations in quantitative values [20–22]. For instance, point spread function (PSF) reconstruction, which improves spatial resolution throughout the entire field of view, has recently become commercially available in clinical PET/CT systems. Our group has shown that, by improving activity recovery, especially for non-enlarged nodes, PSF reconstruction significantly improves the diagnostic performance of 18F-FDG PET for nodal staging in NSCLC [23]. On average, PSF reconstruction increases SUVmax and SUVmean by 48 and 28 %, respectively. As a result, recovery coefficient (RC) values obtained with PSF reconstruction are much higher than EANM’s expected activity concentration recoveries as shown recently by Boellaard [24].

There is therefore a need for standardization of reconstruction protocols, keeping in mind that centres that run PET systems with advanced reconstruction algorithms and participate in multicentre trials often wish to use parameters chosen to achieve optimal lesion detection. One way to optimize PET image quality for diagnostic purposes while still being able to use quantitative values within the framework of multicentre trials is to apply an additional filtering step [25] or to generate two sets of images: one to provide optimal diagnostic quality and a second one to meet quantitative harmonizing standards [24], with NEMA NU-2 phantom-based filtering chosen so that activity concentration recoveries are as close as possible to those recommended by EANM guidelines.

We aimed at prospectively evaluating such a strategy in NSCLC patients imaged on a PET/CT system equipped with PSF reconstruction. For that purpose, in order to mimic a situation in which a patient would undergo pre- and post-treatment scans on different generation PET systems, the same PET raw data were reconstructed with an ordered subset expectation maximization (OSEM) algorithm known to produce activity concentration recoveries meeting EANM requirements, PSF reconstruction for optimal tumour detection and PSF reconstruction with a filter optimized to fulfil EANM requirements. In addition, the potential impact of several confounding factors [tumour size, location and type as well as patient body mass index (BMI) and image noise] on the accuracy of our method was studied.

Materials and methods

Patient population

Over a 6-month period, 52 patients referred to our institution for staging or restaging of NSCLC were included in this study. The study was approved by the local Ethics Committee (ref A12-D24-VOL13, Comité de protection des personnes Nord Ouest III), which waived the requirement for signed informed consent. Among these patients, ten underwent two PET examinations for the purpose of therapy monitoring. Patient demographics are described in Table 1.

Table 1 Patient demographics

Calibration and cross-calibration of the PET system

The calibration of the PET system was performed daily with a 68Ge cylinder with a known radioactive concentration. In addition, a cross-calibration procedure was performed twice during the present study. A solution of 18F-FDG (70.6 and 70.5 MBq, as assessed by the dose calibrator) was introduced into a cylindrical phantom with an exactly known volume, which was then filled with water, resulting in a solution with an exactly known concentration. A two-bed acquisition of the phantom was performed and images were reconstructed with attenuation and scatter correction identical to patient studies. Twelve VOIs were drawn on consecutive axial slices to determine the average activity concentration of 18F-FDG within the phantom. The cross-calibration factor was calculated as the ratio of the measured activity to the true activity. The cross-calibration factors were found to be 0.99 and 1.04.
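The ratio described above can be illustrated with a minimal Python sketch. The phantom volume (6,000 ml) and the mean VOI reading (11.65 kBq/ml) are hypothetical values chosen only for illustration, since neither is reported here; both concentrations are assumed to be decay-corrected to the same reference time.

```python
def cross_calibration_factor(measured_kbq_per_ml, activity_mbq, volume_ml):
    """Ratio of the PET-measured to the true activity concentration."""
    true_kbq_per_ml = activity_mbq * 1000.0 / volume_ml  # MBq -> kBq
    return measured_kbq_per_ml / true_kbq_per_ml


# 70.6 MBq is taken from the text; the volume and the VOI reading are
# hypothetical placeholders, not values from the study.
factor = cross_calibration_factor(
    measured_kbq_per_ml=11.65, activity_mbq=70.6, volume_ml=6000.0
)
print(f"cross-calibration factor: {factor:.2f}")
```

A factor close to 1 indicates that the PET system and the dose calibrator agree on the activity concentration.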

Phantom preparation

The phantom set is the International Electrotechnical Commission body phantom set, which consists of a torso cavity containing a 5-cm-diameter cylindrical insert filled with foam pellets with an average density of 0.30 g/ml positioned in the centre of the phantom to simulate lung tissue and six coaxial isocentred spheres with internal diameters of 10, 13, 17, 22, 28 and 37 mm. According to the EANM guidelines, the phantom was filled with a solution of 18F-FDG (2.0 kBq/ml) and all of the spheres with a radioactivity concentration of 20.0 kBq/ml resulting in a lesion to background activity ratio equal to 10.

Patient studies

The weight and height of patients on the day of the PET examination were recorded. BMI was computed as follows and was used to separate overweight (BMI > 25 to < 30 kg/m2) and obese patients (BMI ≥ 30 kg/m2) from low to normal weight patients (BMI < 25 kg/m2):

$$ BMI=\frac{{weight\,\left( {kg} \right)}}{{height\,{{\left( m \right)}^2}}} $$
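As a minimal sketch, the BMI computation and the three weight groups can be written as follows, with weight in kilograms and height in metres. The function names are hypothetical, and assigning a BMI of exactly 25 kg/m2 to the overweight group is an assumption of this sketch.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m2: weight divided by the square of height."""
    return weight_kg / height_m ** 2


def weight_group(bmi_value):
    """Map a BMI value onto the three groups used in the analysis."""
    if bmi_value >= 30.0:
        return "obese"
    if bmi_value >= 25.0:  # assumption: exactly 25 counts as overweight
        return "overweight"
    return "low to normal weight"


example = bmi(weight_kg=80.0, height_m=1.80)
print(f"BMI = {example:.1f} kg/m2 -> {weight_group(example)}")
```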

After a 15-min rest in a warm room, patients who had been fasting for 6 h were injected with 18F-FDG. Mean (SD) injected activity was 4 (0.2) MBq per kg of body weight. The mean (SD) delay between tracer injection and image acquisition was 62 (4) min, thus meeting EANM guidelines [15].

PET/CT acquisition and reconstruction parameters

All PET imaging studies were performed on a Biograph TrueV (Siemens Medical Solutions) with a 6-slice spiral CT component. The technical and performance characteristics of the PET component of the TrueV system can be found elsewhere [26].

CT acquisition was performed first, with the following parameters: 60 mAs, 130 kVp, pitch 1 and 6 × 2 mm collimation. Subsequently, the PET emission acquisition was performed in 3-D mode. Patients were scanned from the skull base to the mid-thighs. The acquisition time per bed position was 2 min 40 s for low to normal weight patients and 3 min 40 s for overweight and obese patients. For phantom scanning, two bed positions were acquired. The duration of each bed position was set to 2 min 40 s and 10 min, as per EANM guidelines. In addition, phantom studies with durations of 1 min 40 s and 3 min 40 s per bed position were performed in order to study the impact of image noise on the accuracy of our method.

In our department, PET images are reconstructed with a PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) without filtering (PSFallpass), as modelling the PSF during iterative reconstruction introduces correlations between neighbouring voxels in a manner similar to smoothing filters and thus has been shown to achieve maximal performance with little or no filtering [27].

For the purpose of this study, raw data were also reconstructed with the OSEM 3-D reconstruction algorithm (4 iterations and 8 subsets) and with the PSF reconstruction algorithm (HD; TrueX, Siemens Medical Solutions; 3 iterations and 21 subsets) followed by a Gaussian filter with kernel sizes ranging from 6 to 8 mm in 0.5-mm increments. Only the PSF-reconstructed data without filtering were used for the purpose of diagnostic workup. The OSEM reconstruction parameters were chosen as recommended by the manufacturer. These parameters meet the EANM requirements regarding activity recoveries and they were recently used by another group with the same PET system [28]. For all reconstructions, matrix size was 168 × 168, resulting in a 4.07 × 4.07 × 4.07 mm voxel size. Scatter and attenuation corrections were applied.
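To relate the tested kernel sizes to the voxel grid, the short sketch below converts each kernel to a Gaussian standard deviation. It assumes that the kernel size denotes the full width at half maximum (FWHM) of the Gaussian, which is not stated explicitly here; the conversion itself, sigma = FWHM / (2·sqrt(2·ln 2)), is the standard relation for a Gaussian.

```python
import math

VOXEL_MM = 4.07  # isotropic voxel size stated in the text


def fwhm_to_sigma(fwhm_mm):
    """Standard deviation of a Gaussian with the given FWHM."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


# Candidate kernels: 6 to 8 mm in 0.5-mm increments.
kernels_mm = [6.0 + 0.5 * i for i in range(5)]

for kernel in kernels_mm:
    sigma = fwhm_to_sigma(kernel)
    print(f"{kernel:.1f} mm FWHM -> sigma {sigma:.2f} mm "
          f"({sigma / VOXEL_MM:.2f} voxels)")
```

Under that assumption, all tested kernels correspond to a standard deviation of less than one voxel.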

PET/CT analysis

Phantom studies

Activity concentration RCs as a function of sphere (tumour) size were measured. RCs are defined as the ratio between measured and true activity concentration in a sphere. For that purpose, 3-D 50 % isocontour VOIs were drawn over each sphere for each set of reconstructed data and maximum and mean pixel values were recorded.
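The RC measurement can be sketched as follows. This is a simplified illustration, not the analysis software used in the study: the voxel values are hypothetical, the 50 % isocontour is approximated by thresholding a box already cropped around one sphere at half its maximum, and spatial connectivity of the contour is not modelled. The true sphere concentration of 20.0 kBq/ml is the value given for the phantom preparation.

```python
def isocontour_stats(voxels, fraction=0.5):
    """Return (maximum, mean of voxels at or above fraction * maximum)."""
    peak = max(voxels)
    inside = [v for v in voxels if v >= fraction * peak]
    return peak, sum(inside) / len(inside)


def recovery_coefficient(measured, true):
    """Ratio between measured and true activity concentration."""
    return measured / true


TRUE_SPHERE_KBQ_PER_ML = 20.0  # sphere concentration from the phantom set-up

# Hypothetical voxel values (kBq/ml) cropped around a single sphere.
voxels = [2.1, 3.0, 8.5, 12.0, 15.5, 17.0, 16.0, 9.0, 2.4]

peak, mean50 = isocontour_stats(voxels)
rc_max = recovery_coefficient(peak, TRUE_SPHERE_KBQ_PER_ML)
rc_mean = recovery_coefficient(mean50, TRUE_SPHERE_KBQ_PER_ML)
print(f"RCmax = {rc_max:.2f}, RCmean = {rc_mean:.2f}")
```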

Patient analysis

The same reader (CL) analysed all PET data sets to extract PET quantitative values for OSEM and PSF reconstructions. Regions of interest (ROIs) were drawn over primary tumour lesions, mediastinal and hilar nodes considered to have pathologically increased uptake and metastatic lesions. ROIs were drawn on the axial slice on which lesions displayed the highest 18F-FDG uptake, by means of a 50 % isocontour method.

The mean and maximum pixel values were extracted from each ROI and mean and maximum SUVs were computed as follows:

$$ SUV=\frac{{tumour\,activity\,\left( {{Bq \left/ {cc } \right.}} \right)\times body\,weight(g)}}{{injected\,dose\,\left( {Bq} \right)}} $$
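The SUV formula translates directly into code. The sketch below takes body weight in kilograms and converts it to grams; the function name and the example values are hypothetical, and the injected dose is assumed to be already decay-corrected to the acquisition time.

```python
def suv(tissue_bq_per_ml, body_weight_kg, injected_bq):
    """Body-weight SUV: tissue concentration x body weight (g) / injected dose."""
    body_weight_g = body_weight_kg * 1000.0
    return tissue_bq_per_ml * body_weight_g / injected_bq


# Hypothetical example: a 70-kg patient injected with 4 MBq/kg (280 MBq)
# and a voxel measuring 20 kBq/ml.
value = suv(tissue_bq_per_ml=20_000.0, body_weight_kg=70.0, injected_bq=280e6)
print(f"SUV = {value:.1f}")
```

Feeding the maximum or the mean pixel value of an ROI into the same function gives SUVmax or SUVmean, respectively.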

Finally, short axis size (cm), as determined on axial CT slices, was recorded for each mediastinal and hilar lymph node.

For patients who underwent a post-therapeutic examination, the post-therapeutic status of each lesion was determined by using European Organization for Research and Treatment of Cancer (EORTC) criteria [29, 30]. SUVmax, recorded as described above, was used. The changes in SUVmax between the PET1 and PET2 scans were recorded for all lesions. The percentage change in SUVmax allowed classification into the following groups:

  • Complete metabolic response (CMR): complete resolution of 18F-FDG uptake in the tumour volume (indistinguishable from surrounding normal tissue)

  • Partial metabolic response (PMR): at least 25 % reduction in tumour uptake

  • Stable metabolic disease (SMD): less than 25 % increase or less than 25 % decrease in tumour 18F-FDG SUV and no visible increase in extent of tumour uptake

  • Progressive metabolic disease (PMD): greater than 25 % increase in 18F-FDG tumour SUV within the tumour
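The four categories above can be sketched as a simple decision rule on the percentage change in SUVmax. This is an illustrative simplification: complete resolution of uptake is a visual judgement and is passed in as a flag, a visible increase in the extent of uptake is not modelled, and assigning an increase of exactly 25 % to stable disease is an assumption of this sketch.

```python
def percent_change(suv_pet1, suv_pet2):
    """Percentage change in SUVmax from PET1 to PET2."""
    return 100.0 * (suv_pet2 - suv_pet1) / suv_pet1


def classify_response(suv_pet1, suv_pet2, uptake_resolved=False, threshold=25.0):
    """Return CMR, PMR, SMD or PMD using the 25 % thresholds listed above."""
    if uptake_resolved:  # visual assessment, supplied by the reader
        return "CMR"
    change = percent_change(suv_pet1, suv_pet2)
    if change <= -threshold:  # at least 25 % reduction
        return "PMR"
    if change > threshold:  # greater than 25 % increase
        return "PMD"
    return "SMD"


print(classify_response(10.0, 7.0))   # 30 % decrease
print(classify_response(10.0, 9.0))   # 10 % decrease
print(classify_response(10.0, 13.0))  # 30 % increase
```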

Statistical analysis

The first step of the analysis was to determine the optimal filter settings for PSF reconstruction to meet EANM harmonizing standards. For that purpose, for all sets of reconstructed data, RCs for all spheres were compared to EANM expected values by means of the root mean square error (RMSE) method. The kernel size that minimized the RMSE when compared to EANM expected values was selected as the optimal filter for PSF reconstruction on our PET/CT system. RMSE values were computed with R, a free statistical software package (http://www.r-project.org/foundation/).
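The kernel selection step can be sketched as follows (in Python rather than R, for illustration only). All RC values in this example are made-up placeholders: they are neither the EANM expected values nor measurements from this study, and only three kernels are shown to keep the example short.

```python
import math


def rmse(measured, expected):
    """Root mean square error between two equally long series."""
    squared = [(m - e) ** 2 for m, e in zip(measured, expected)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder expected RCs for the six spheres (37 mm down to 10 mm).
expected_rc = [1.00, 0.90, 0.80, 0.70, 0.60, 0.40]

# Placeholder measured RCs for each candidate kernel size (mm).
rc_by_kernel = {
    6.0: [1.10, 1.00, 0.92, 0.80, 0.70, 0.50],
    7.0: [1.02, 0.91, 0.82, 0.69, 0.61, 0.42],
    8.0: [0.93, 0.82, 0.71, 0.62, 0.50, 0.31],
}

# The optimal kernel is the one with the smallest RMSE.
best_kernel = min(rc_by_kernel, key=lambda k: rmse(rc_by_kernel[k], expected_rc))

for kernel, rcs in rc_by_kernel.items():
    print(f"{kernel:.1f} mm: RMSE = {rmse(rcs, expected_rc):.3f}")
print(f"selected kernel: {best_kernel:.1f} mm")
```

In practice the same comparison would be run for both the mean and the maximum RCs and for every tested kernel size.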

Quantitative data extracted from clinical PET/CT examinations are presented as mean (standard deviation, SD). In all statistical tests, a two-tailed p value of less than 0.05 was considered statistically significant. The ratios between PSFEANM and OSEM quantitative values (SUVmean, SUVmax), according to lesion size, location and type (heterogeneous vs homogeneous uptake), BMI (low to normal weight vs overweight vs obese patients) and acquisition time per bed position (2 min 40 s vs 3 min 40 s) were compared using the Mann–Whitney test for unpaired samples and the Kruskal-Wallis test to compare multiple groups. The relationship between PSFallpass or PSFEANM and OSEM quantitative values was assessed using a linear regression analysis and Bland-Altman plots [31]. In the subset of ten patients that underwent two PET/CT examinations for therapy monitoring purposes, levels of agreement between the different types of reconstruction were evaluated using the kappa statistic. The use of OSEM reconstruction both for pre- and post-therapeutic PET examination (OSEMPET1/OSEMPET2) was used as the “current standard” to determine the post-treatment status of each lesion. This was compared to the use of PSFEANM reconstruction either for pre-therapeutic PET evaluation (PSFEANM-PET1/OSEMPET2) or for post-therapeutic PET evaluation (OSEMPET1/PSFEANM-PET2), to the use of PSFallpass reconstruction either for pre-therapeutic PET evaluation (PSFallpass-PET1/OSEMPET2) or for post-therapeutic PET evaluation (OSEMPET1/PSFallpass-PET2) and to the use of PSFEANM reconstruction for both pre- and post-therapeutic PET evaluation (PSFEANM-PET1/PSFEANM-PET2). Kappa values were reported using the benchmarks of Landis and Koch [32] (0.81–1 almost perfect agreement, 0.61–0.8 substantial agreement, 0.41–0.6 moderate agreement and 0.21–0.4 fair agreement). For the kappa estimates, 95 % confidence intervals were calculated using bootstrapping. 
Graphs and analyses were carried out using the GraphPad software and VassarStats (http://vassarstats.net/).
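The agreement analysis can be illustrated with the sketch below. It assumes an unweighted Cohen's kappa and a percentile bootstrap over lesions; the text does not specify which kappa variant or bootstrap scheme was used, so the output is not expected to reproduce the published values. The example labels are rebuilt from lesion counts reported in the Results (reference classification, and the scenario with four lesions reclassified from partial response to stable disease).

```python
import random


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    n = len(a)
    labels = set(a) | set(b)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    p_exp = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if p_exp == 1.0:  # degenerate case: a single label shared by both
        return 1.0
    return (p_obs - p_exp) / (1.0 - p_exp)


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval, resampling lesions."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2.0) * n_boot)]
    upper = stats[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper


# 84 lesions: 37 CMR, 28 PMR, 13 SMD and 6 PMD in the reference scenario;
# in the comparator, 4 of the PMR lesions are classified as SMD instead.
reference = ["CMR"] * 37 + ["PMR"] * 28 + ["SMD"] * 13 + ["PMD"] * 6
comparator = ["CMR"] * 37 + ["PMR"] * 24 + ["SMD"] * 4 + ["SMD"] * 13 + ["PMD"] * 6

kappa = cohen_kappa(reference, comparator)
ci_low, ci_high = bootstrap_ci(reference, comparator)
print(f"kappa = {kappa:.2f} (95 % CI {ci_low:.2f} to {ci_high:.2f})")
```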

Results

Phantom data

As shown in Fig. 1, the OSEM 3-D reconstruction algorithm RCs for mean and maximum values fulfilled the EANM recommendations for both the 160-s and the 600-s emission scan. It is noticeable that for mean values (Fig. 1a), the OSEM RCs of the smallest spheres were slightly below the proposed minimum EANM specification. As expected, RCs for mean and maximum values of the PSF reconstruction algorithm without filtering were above the maximum EANM specifications whatever the duration of the emission scans, especially for the smallest hot spheres. When considering maximum values (Fig. 1b), with the exception of the 10-mm sphere, PSFallpass RCs were even greater than 1.0. This can be explained by the fact that PSF modelling results in overshoot along the edge. This artefact (the so-called Gibbs artefact [21, 33, 34]) was visible for the largest sphere for PSFallpass reconstruction and was partially corrected for by applying the Gaussian filters. When using shorter acquisition times, there were higher noise levels, which in combination with the Gibbs artefact led to less accurate (overestimated) measurements, especially for the maximum pixel value. The application of Gaussian filters with an increasing kernel during PSF reconstruction allowed for RCs to be more consistent with the EANM recommendations. When calculating the RMSE, the kernel size that minimized the error compared to EANM expected values was the kernel of 7 mm (supplementary material). This kernel size of 7 mm was then selected as the optimal filter for PSF reconstruction (PSFEANM).

Fig. 1

Recovery coefficients for mean (a) and maximum (b) values for OSEM 3-D reconstruction algorithm, PSF reconstruction algorithm without filtering (PSFallpass) and PSF reconstruction algorithm with a 7-mm Gaussian filter (PSFEANM). Corresponding NEMA NU-2 transverse images through the hot spheres (c). Phantom images are scaled on the same maximum value

An evaluation of the potential impact of image noise on the accuracy of our method was performed in a second experiment by scanning the phantom for 1 min 40 s, 2 min 40 s and 3 min 40 s. As expected, the RC values for PSFallpass reconstruction were higher for the shortest acquisition, due to noise in the reconstructed images (supplementary material: Fig. 1). However, calculation of the RMSE showed that our strategy performed well even when image noise was higher (supplementary material: Table 4).

Clinical data

Validation of the PSFEANM reconstruction to overcome reconstruction-dependent variability

A total of 52 consecutive patients with NSCLC were included, for whom clinical data are summarized in Table 1. Among these patients, 36 were referred for initial staging of NSCLC and 16 for restaging of NSCLC recurrence.

Overall, 195 ROIs were drawn over 64 (32.8 %) primary tumour lesions, 91 (46.7 %) mediastinal and hilar nodes considered to have pathologically increased uptake and 40 (20.5 %) visceral and bone metastatic lesions. The mean (SD) number of lesions per patient, all types combined, was 3.8 (3.6). Among the 91 analysed nodes, 45 (49.4 %) had a short axis less than 1 cm [mean (SD) short axis, 0.80 (0.13)], whereas 46 (50.6 %) had a short axis 1 cm or greater [mean (SD) short axis, 1.46 (0.43)]. The mean SUVmean (SD) for OSEM, PSFEANM and PSF reconstruction were 4.70 (3.43), 4.77 (3.46) and 6.24 (4.30), respectively. The mean SUVmax (SD) for OSEM, PSFEANM and PSF reconstruction were 6.60 (4.95), 6.71 (4.97) and 9.52 (6.85), respectively. Linear regression and Bland-Altman analysis are shown in Fig. 2. As expected, a good correlation was found between quantitative values extracted from the PSF and OSEM reconstructions, with an r² greater than 0.90 for both SUVmax and SUVmean values. As shown in the Bland-Altman analysis, PSF reconstruction increased SUVmax and SUVmean by 48 and 37 %, respectively. An even better correlation was found between PSFEANM and OSEM reconstruction with r² equal to 1.0 for SUVmax and close to 1.0 for SUVmean (0.99). Bland-Altman analysis demonstrated that the mean ratios between PSFEANM and OSEM quantitative values were 1.03 and 1.02 for SUVmax and SUVmean, respectively, with very narrow 95 % limits of confidence in both cases. Amongst the 195 analysed lesions, Bland-Altman plots identified 8 outliers for the SUVmax values for which the ratios of SUVmax PSFEANM and SUVmax OSEM were all above the upper limit of the confidence interval. These outliers corresponded to one tumour, five nodes (four mediastinal nodes and one hilar node) and two bone metastases.
For the SUVmean values, Bland-Altman plots identified 14 outliers of which 9 had a ratio below the lower limit of the confidence interval (2 tumours, 2 mediastinal nodes, 2 hilar nodes, 2 bone metastases and 1 lung metastasis) and 5 above the upper limit of the confidence interval (1 tumour, 1 hilar node and 3 bone metastases).
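The ratio-based Bland-Altman analysis and the identification of outliers can be sketched as follows. The sketch assumes that the 95 % limits are computed as the mean ratio ± 1.96 SD, which is the usual Bland-Altman convention but is not spelled out here, and the paired SUV values are hypothetical.

```python
import statistics


def bland_altman_ratio(test, reference):
    """Mean ratio, 95 % limits (mean +/- 1.96 SD) and indices of outliers."""
    ratios = [t / r for t, r in zip(test, reference)]
    mean_ratio = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    lower = mean_ratio - 1.96 * sd
    upper = mean_ratio + 1.96 * sd
    outliers = [i for i, q in enumerate(ratios) if q < lower or q > upper]
    return mean_ratio, (lower, upper), outliers


# Hypothetical paired SUVmax values for ten lesions.
osem = [2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
psf_eanm = [2.04, 3.09, 4.04, 5.15, 6.06, 8.24, 10.1, 12.36, 14.14, 20.8]

mean_ratio, (lower, upper), outliers = bland_altman_ratio(psf_eanm, osem)
print(f"mean ratio = {mean_ratio:.3f}, limits = {lower:.3f} to {upper:.3f}")
print(f"outlier indices: {outliers}")
```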

Fig. 2

Relationship between quantitative values extracted from PSFallpass or PSFEANM and OSEM images, assessed using linear regression analysis and Bland-Altman plots for SUVmax (a) and SUVmean (b)

As shown in Fig. 3, the ratios between PSFEANM and OSEM quantitative values (SUVmax and SUVmean) were not different according to the size of the lesion. The mean ratio (SD) for SUVmax values and SUVmean values (SD) ranged from 1.01 (0.04) (4th quartile) to 1.04 (0.05) (3rd quartile) and from 1.01 (0.07) (1st quartile) to 1.03 (0.06) (3rd quartile), respectively. Similarly, there was no significant difference according to the BMI, the location of the lesion or the type of lesion (homogeneous versus heterogeneous). When analysing the ratios between PSFEANM and OSEM quantitative values for SUVmax values according to BMI, there was a trend towards higher ratios (p = 0.051) in obese patients.

Fig. 3

Impact of the size of the lesion (a), the BMI (b), the location of the lesion (c), tumour homogeneity (d) and emission scan duration (e) on the ratio between PSFEANM PET and OSEM PET quantitative values (left panels SUVmax, right panels SUVmean). Note that 30 lesions were not measurable and are therefore not included in the “per size” analysis (a)

An example of OSEM, PSFallpass and PSFEANM reconstructions is shown in Fig. 4.

Fig. 4

Representative coronal slices for OSEM, PSFallpass and PSFEANM reconstructions in a patient with a lung tumour in the left upper lobe, bilateral nodal involvement (a) and distant metastases (lung, bone and liver) (b). Images have been scaled on the same maximum value. Note the improvement in activity recovery visible in a small lung metastasis on the PSFallpass image (arrow)

The use of PSFEANM quantitative values for therapy monitoring

Among the series of 52 consecutive patients, 10 patients underwent both a pre- and post-therapy PET evaluation with an average time between the first and the second PET scan of 72.6 ± 34.6 days (Table 2).

Table 2 Characteristics of patients who underwent post-therapy evaluation

Overall, 84 lesions were evaluated post-treatment: 12 (14.3 %) primary tumour lesions, 41 (48.8 %) mediastinal and hilar nodes and 31 (36.9 %) visceral and bone metastatic lesions. When OSEM reconstruction was used for interpreting both pre- and post-therapeutic PET examinations (OSEMPET1/OSEMPET2), 37 lesions were considered to have had a CMR, 28 a PMR, 13 were stable and 6 had progressed. These results were then compared to several scenarios (Fig. 5, Table 3) when using PSFEANM, OSEM or PSFallpass for either the pre- or post-therapeutic PET examination or both. OSEMPET1/OSEMPET2 was regarded as the standard of reference. All lesions considered, there was almost perfect agreement between OSEMPET1/OSEMPET2 and OSEMPET1/PSFEANM-PET2, PSFEANM-PET1/OSEMPET2 or PSFEANM-PET1/PSFEANM-PET2 with kappa values higher than 0.90. In addition, the associated 95 % confidence intervals virtually matched the almost perfect range of kappa values. When analysing tumours, nodes or visceral and bone metastases separately, the strength of agreement was also considered to be very good. There were four cases (4.8 %) of disagreement (two nodes and two metastatic lesions) in which OSEMPET1/PSFEANM-PET2 diagnosed stable disease, whereas OSEMPET1/OSEMPET2 identified partial response. When PSFEANM-PET1/OSEMPET2 was used, there was only one disagreement (1.2 %) that occurred in a node, coming to a conclusion of stable disease, whereas OSEMPET1/OSEMPET2 identified partial response. With PSFEANM-PET1/PSFEANM-PET2, there were two cases of disagreement that occurred in nodes, coming to a conclusion of stable disease, whereas OSEMPET1/OSEMPET2 identified partial response.

Fig. 5

Flow chart for the evaluation of the level of agreement when using PSFEANM or PSFallpass reconstructions for response monitoring (EORTC criteria) either pre- or post-treatment as compared to the exclusive use of OSEM reconstruction (current standard)

Table 3 Impact of PSFEANM on response evaluation

Importantly, when PSFallpass reconstruction was used either for the pre- or post-therapeutic examination (OSEMPET1/PSFallpass-PET2 or PSFallpass-PET1/OSEMPET2), there was considerably less agreement. With OSEMPET1/PSFallpass-PET2, there were overall 23 cases (27.4 %) of disagreement (4 tumours, 13 nodes and 6 metastatic lesions) in which OSEMPET1/PSFallpass-PET2 underestimated the therapeutic response when compared to OSEMPET1/OSEMPET2. With PSFallpass-PET1/OSEMPET2, there were 11 cases (13.1 %) of disagreement including 9 cases (6 nodes and 3 metastatic lesions) where a conclusion of partial response was reached, whereas OSEMPET1/OSEMPET2 diagnosed stable disease. The remaining two cases corresponded to tumours: one for which PSFallpass-PET1/OSEMPET2 reached a conclusion of stable disease, whereas OSEMPET1/OSEMPET2 identified partial response, and one for which PSFallpass-PET1/OSEMPET2 diagnosed progression, whereas OSEMPET1/OSEMPET2 identified stable disease.
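The direction of these disagreements can be illustrated with a short calculation. The sketch applies the average 48 % increase in SUVmax observed with PSFallpass as a uniform gain on one of the two scans; this is a simplification, since the actual increase varies from lesion to lesion, and the function name is hypothetical.

```python
def apparent_change(true_change_pct, gain_pet1=1.0, gain_pet2=1.0):
    """Percentage SUVmax change observed when each scan carries its own gain."""
    ratio = (1.0 + true_change_pct / 100.0) * gain_pet2 / gain_pet1
    return 100.0 * (ratio - 1.0)


PSF_GAIN = 1.48  # average SUVmax increase with PSFallpass relative to OSEM

# OSEM at PET1, PSFallpass at PET2: a true 40 % decrease appears much smaller
# and falls short of the 25 % threshold for partial response.
osem_then_psf = apparent_change(-40.0, gain_pet2=PSF_GAIN)

# PSFallpass at PET1, OSEM at PET2: an unchanged lesion appears to respond.
psf_then_osem = apparent_change(0.0, gain_pet1=PSF_GAIN)

print(f"OSEM then PSFallpass: {osem_then_psf:.1f} %")
print(f"PSFallpass then OSEM: {psf_then_osem:.1f} %")
```

This is consistent with the pattern reported above: using PSFallpass only post-treatment tends to underestimate the response, whereas using it only pre-treatment tends to overestimate it.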

Discussion

18F-FDG PET has an increasing role in oncology for staging, restaging and therapy monitoring of chemotherapy and molecularly targeted therapies and is being increasingly implemented in clinical trials, especially for the early assessment of antineoplastic treatments. This prospective study in NSCLC patients validates a strategy allowing the use of quantitative values within the framework of multicentre trials, which is based on the production of protocol-specific images, in addition to images optimized for diagnostic purpose.

Standardized quantification of PET data in multicentre trials as described in the EANM guidelines allows for reliable and reproducible treatment response assessment. However, standardization remains a major challenge as new, more sensitive PET systems and reconstruction algorithms are continuously being developed and introduced into clinical practice [20, 23, 35]. In the present study, we validated a strategy in which the recently introduced PSF reconstruction algorithm can be used not only for visual but also for quantitative analysis of PET imaging, whilst adhering to the EANM guidelines. Our results demonstrate, by mimicking a situation in which a patient would undergo the pre- and post-therapy PET scans on different generation PET systems, that it is possible to minimize reconstruction-dependent variability. Hence, Bland-Altman analysis (Fig. 2) showed that after having applied an adequate filter (PSFEANM) the upper limit of the confidence intervals was 12 %, a value well below the 25 and 30 % cut-off values recommended by EORTC [30] and PERCIST [36], respectively, to discriminate between responders and non-responders when using 18F-FDG PET for therapy monitoring. Importantly, we confirmed this finding in a subset of ten patients who underwent two PET examinations for response assessment (Table 2). In these patients, an excellent agreement was found (kappa values 0.95 and 0.99) in the post-treatment classification of 84 lesions according to EORTC criteria when comparing PSFEANM either pre- or post-therapy to OSEM as the current standard, and no major discordance occurred. However, when the PSFallpass data were used either pre- or post-therapy compared to OSEM, we saw considerably less agreement. Due to system updates on existing PET systems or the purchase of a new PET machine, OSEMPET1/PSFallpass-PET2 is the situation most likely to occur. In this situation, our data showed discordance in 27.4 % of lesions.

The proposed strategy can be useful in the case of patients undergoing pre- and post-treatment scans on different PET systems, for example in centres running two or more PET systems or updating their equipment during the course of a trial. Of course, it would be preferable to scan the patient repeatedly on the same machine, but in practice this is often not possible. Moreover, in the setting of multicentre trials there are two other situations in which standardization of PET quantitative values is required: when pooling SUV from different PET/CT systems for diagnostic purposes (i.e. to determine a specific diagnostic threshold value for a given disease) [8, 9] or as a prognostic tool (i.e. to search for the impact of tumour tracer uptake on disease-free and overall survival) [10, 11].

Regarding practical issues related to the proposed methodology, the appropriate filter must be determined for each PET system by performing the phantom studies and reconstructions with a Gaussian filter of increasing kernel size as described in the “Materials and methods” section. Once the optimal filter meeting the EANM expected values is determined, the filtered PET data can be used for both local and multicentre quantitative PET analysis. This method can be readily applied on any PET scanner equipped with PSF; the purchase of additional software is not necessary. However, this method does not obviate the need to generate a second data set, which is time consuming. Of course, the choice to use either an OSEM reconstruction or a filtered PSF algorithm for the standardized quantitative analysis remains a choice of local nuclear medicine physicians, physicists and researchers, just like the choice to systematically reconstruct non-attenuation-corrected images or only when clinically needed. Choosing PSFEANM could be the preferred solution, as PET systems equipped with PSF reconstruction are expected to progressively replace former generation PET systems.

As pointed out by Boellaard [24], patients are frequently included in clinical trials after the first PET examination has been performed. This emphasizes the need to standardize the PET procedure from the very beginning of patient care. However, PET acquisition and reconstruction parameters are not the only source of variability that has to be taken into account. Other technical and biological factors also affect SUV measurements. These factors have been discussed extensively elsewhere [12, 24, 37]. In the present study, one technical factor, the reconstruction protocol, has been analysed. To minimize the influence of the other technical and biological factors affecting SUV measurements in this study, all PET examinations were performed according to the EANM guidelines. Of note, the injected activity per kilogram and the delay between injection and acquisition met the EANM requirements.

The potential impact of image noise on the accuracy of our method was evaluated in phantom studies by varying the acquisition time. Calculation of the RMSE values between PSFEANM and EANM expected values showed that our strategy performed well when image noise was higher, the values being similar for the shortest and longest acquisition times. This was confirmed by clinical data showing no difference in PSFEANM/OSEM ratios for the 2 min 40 s and 3 min 40 s per bed position acquisition times (Fig. 3e).

We found no confounding factors (lesion size and location, tumour heterogeneity, patient BMI) affecting the accuracy of our method. However, we noticed a trend towards higher PSFEANM/OSEM ratios in overweight and obese patients for SUVmax (Fig. 3b). This may be due to the fact that noise in PET images is higher in obese patients and SUVmax is more affected by noise than SUVmean. The observed difference was minimal and did not affect the EORTC classification based on SUVmax (Table 3). The use of SUVpeak, which is defined as the mean value within an ROI centred on the area with the highest uptake, has been reported as a slightly more robust alternative for assessing the most metabolically active part of a tumour [19]. However, SUVpeak is highly sensitive to the ROIpeak definition (i.e. shape, size and location) [38], was shown to have similar repeatability as compared to SUVmax [39] and does not necessarily perform better than SUVmax for therapy assessment [40]. In the present study, a wide range of tumour intensities was studied and no systematic error was depicted by Bland-Altman analysis (i.e. the strategy performs equally for lesions with low 18F-FDG avidity and for those with very intense 18F-FDG uptake). This finding, taken together with the lack of confounding factors affecting our strategy, suggests that it could be applicable in other solid tumours.

Conclusion

The generation of protocol-specific images with NEMA NU-2 phantom-based filtering to meet EANM quantitative harmonizing standards, in addition to images optimized for diagnostic purposes, reduces reconstruction-dependent variation in SUVs. This can be of use in multicentre trials, when using SUV for therapy monitoring, or as a diagnostic or prognostic tool. As no confounding factors (lesion size and location, tumour heterogeneity, patient BMI, image noise) affecting the accuracy of our method were found, this strategy validated in NSCLC patients could be extrapolated to other solid tumours.