Introduction

The guidelines for the management of patients with thyroid nodules and differentiated thyroid carcinoma (DTC) recently published by the American Thyroid Association (ATA) [1] represent a valuable, comprehensive resource for the clinician. These guidelines contain 80 recommendations based on 434 cited references and on the knowledge and experience of an impressive roster of not just American but also European experts.

However, unlike most other recent DTC management or treatment guidelines, e.g. those of the European Association of Nuclear Medicine (EANM) [2], the German Society of Nuclear Medicine (DGN) [3, 4] and the Latin American Thyroid Society (LATS) [5], the ATA guidelines emphasize their “evidence-based” nature: they rate and rank their recommendations according to the degree of support judged to be in the literature. The authors describe in detail their rating criteria (Table 1), which they characterize as being based mainly on patient outcome.

Table 1 Rating criteria for recommendations in the ATA 2009 Thyroid Nodule and DTC Patient Management Guidelines [1]

Nonetheless, careful reading of the ATA guidelines suggests that many of their recommendations may not in fact be as definitively evidence based as is implied. The gold standard of scientific evidence of course traditionally consists of randomized controlled trial results and, ideally, of confirmatory data from subsequent such trials in different populations. In DTC management, however, this type of published data is lacking, except regarding 131I activities for radioiodine thyroid remnant ablation (RRA) [6, 7] and regarding RRA stimulation with recombinant human thyroid-stimulating hormone (rhTSH) versus with thyroid hormone withdrawal [8–11]. Only recommendations that can be backed up by results conforming to this gold standard should be graded “A” or “F” (Table 1) in the ATA guidelines; it appears to be an over-interpretation to rate the strength of evidence of other kinds of published studies according to “outcome”. The lack of phase 3 trials in most areas of the DTC literature means that these strongest classifications should be reserved for at best two to three recommendations in the ATA guidelines. However, of the ATA’s 69 recommendations on DTC, 8 recommendations or recommendation components are graded “A” [recommendations 25a, 26, 32a (M1 disease), 43, 45, 48b, 52b, 56] and 2 recommendations are graded “F” (2 and 46).

Careful reading of the ATA document also raises the question of whether the guidelines always rely on a balanced selection of available data. Only a minimal summary of some of the cited publications introduces each recommendation; comprehensive tabular overviews of supporting data are not given. The reader therefore cannot easily assess the thoroughness with which the literature regarding controversial topics was considered. In particular, the ATA guidelines’ summaries often fail to mention evidence against the recommendation.

Further, the guidelines’ methodology description does not specify whether published meta-analyses such as those by Sawka et al. [12, 13] are accorded greater weight than single-centre observational studies. Meta-analyses of course may show effects that individual studies are too small to detect; for example, the meta-analysis by Sawka et al. [13] regarding the effect of RRA on recurrence and disease-specific mortality considered six studies addressing recurrence at distant sites. None of the studies individually included enough distant recurrences for a 95% confidence interval (CI) for risk reduction that excluded zero. However, when the results from all six studies were pooled, the absolute risk reduction was 2% (95% CI 1–4%). A significantly lower number of new metastases over the course of the disease after 131I ablation is a robust surrogate measure for a better prognosis and constitutes evidence for RRA effectiveness that would be missed if only the individual studies were considered.
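Why pooling can reveal an effect that no single study shows can be sketched with the standard fixed-effect, inverse-variance method for risk differences. This is a minimal illustration under stated assumptions: the event counts below are hypothetical and are not the data analysed by Sawka et al., the Wald variance approximation is assumed, and the published meta-analysis may have used a different pooling model. Group A stands for ablated patients and group B for non-ablated patients, so a negative risk difference means fewer distant recurrences after ablation.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile

def risk_difference(events_a, n_a, events_b, n_b):
    """Risk difference (A minus B) and its Wald variance."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    var = p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
    if var == 0:
        raise ValueError("zero variance: no events in either group")
    return p_a - p_b, var

def study_ci(events_a, n_a, events_b, n_b, z=Z_95):
    """Single-study risk difference with its 95% CI."""
    rd, var = risk_difference(events_a, n_a, events_b, n_b)
    se = math.sqrt(var)
    return rd, rd - z * se, rd + z * se

def pool_fixed_effect(studies, z=Z_95):
    """Inverse-variance fixed-effect pooled risk difference with 95% CI.

    studies: iterable of (events_a, n_a, events_b, n_b) tuples.
    """
    weight_sum = 0.0
    weighted_rd_sum = 0.0
    for study in studies:
        rd, var = risk_difference(*study)
        weight = 1.0 / var          # more precise studies count more
        weight_sum += weight
        weighted_rd_sum += weight * rd
    pooled = weighted_rd_sum / weight_sum
    se = math.sqrt(1.0 / weight_sum)  # pooled SE shrinks as studies accumulate
    return pooled, pooled - z * se, pooled + z * se

if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from any cited study).
    HYPOTHETICAL_STUDIES = [
        (3, 150, 7, 150), (2, 120, 6, 120), (4, 200, 9, 200),
        (1, 80, 4, 80), (5, 250, 11, 250), (2, 100, 5, 100),
    ]
    for s in HYPOTHETICAL_STUDIES:
        print("single study: RD %.3f, CI %.3f to %.3f" % study_ci(*s))
    print("pooled: RD %.3f, CI %.3f to %.3f" % pool_fixed_effect(HYPOTHETICAL_STUDIES))
```

With these made-up counts, every single-study CI spans zero, whereas the pooled CI lies entirely below zero: the same pattern the text describes for distant recurrence.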

Likewise, it is unclear whether the weighting of individual studies took into account the studies’ duration of follow-up. This variable is crucial in DTC outcome studies: for example, in the 2004 meta-analysis by Sawka et al. [12], the statistically significant benefits of 131I ablation on relapse and mortality rates were predominantly found in studies with a long-term follow-up of more than 10 years, but failed to be detected in studies with a shorter follow-up.

Thus, the ATA document demonstrates how evidence-based guidelines and ratings—unfortunately if unsurprisingly—may be vulnerable to the biases of their formulators. Therefore in weighing the ATA recommendations against those from other groups, we argue that it may be more appropriate to regard the ATA guidelines, like those of the EANM, DGN, and LATS, as representing “literature-based consensus by a group of experts”.

The aim of the present paper is to illustrate these points by critically analysing the “evidence-based” nature of the ATA guidelines regarding three DTC-related topics important to nuclear medicine, namely (1) indications and contraindications for RRA, (2) diagnostic follow-up procedures and (3) thyroid hormone therapy in athyreotic patients. To that end, we shall closely examine the ranking given and the literature cited with respect to certain recommendations. After providing this analysis, the editorial will elaborate on another caveat that should be borne in mind by the clinician when considering the ATA recommendations: their applicability to regions outside the USA.

Indications and contraindications for RRA

Among the ATA guidelines’ recommendations regarding indications and contraindications for RRA is one favouring the procedure in the presence of aggressive variants of papillary thyroid cancer (PTC) in pT1b or pT1m tumours (recommendation 32b, rating C, favourable “expert opinion”). Two other recommendations advise against RRA in patients with solitary or multifocal pT1a tumours (< 1 cm diameter) without an aggressive histology of PTC (recommendations 32c and d, respectively; rating E, “outcome not improved” for both). Of note, this last position disagrees with those of the EANM [2], LATS [5, 14] and European Society of Medical Oncology guidelines [15].

ATA justification for performing RRA in aggressive histological variants of small papillary thyroid carcinomas (PTCs)

To support the recommendation of RRA in the presence of aggressive histological variants of papillary microcarcinomas, the guidelines cite the study of Jung et al. [16], which showed that 131I ablation improved survival in patients with poorly differentiated thyroid cancer or with aggressive histological variants of PTC. Since the latter variants were present in only 14 of 72 patients in the study, this report cannot, however, reliably address whether special management is required in these relatively rare cases.

ATA justification against performing RRA in unifocal PTC ≤1 cm

To support omitting RRA in patients with solitary PTC foci ≤ 1 cm in diameter, a report by Hay et al. [17] is cited. This publication describes an observational study involving a 60-year follow-up in 900 such cases. However, only 155 of 900 patients received RRA and follow-up data were available for only 103 of these 155 patients. Additionally, just 23 of 155 patients (15%) who received 131I ablation were staged as negative for lymph node metastases (cN0). Obviously there was a major treatment bias in this study, in that in addition to their unifocal papillary microcarcinoma, most patients given RRA had unfavourable disease characteristics. The ablation group’s higher relapse rate is the logical consequence of such patient selection. Due to this bias, the Hay et al. study [17] is not convincing evidence against radioiodine ablation in small PTC. Rather, what we can learn from this paper is that lymph node metastases are not so rare in this setting. Moreover, a study by Machens and coworkers [18] that is cited in the ATA guidelines, but not in the section dealing with RRA of papillary microcarcinomas, showed that the cumulative risk of developing lymph node metastases increases continuously with the diameter of the PTC, starting as low as 5 mm. This continuous rise calls into question what truly constitutes a “very low-risk” small PTC.

It also should be noted that the ATA assessment of the papillary microcarcinoma literature seems to simply not include other observational studies which found a significant reduction of recurrence rates after RRA [19, 20].

ATA justification against performing RRA in multifocal PTC ≤ 1 cm

With respect to multifocal papillary microcarcinoma, the ATA guidelines support omitting RRA by citing a study by Ross and coworkers [21]. In that study, patients with multifocal papillary microcarcinoma showed a higher DTC recurrence rate when surgical treatment was less extensive than near-total thyroidectomy; the use of RRA in these selected patients did not change this trend. However, the study population was inhomogeneous, in that it included patients with more advanced disease, e.g. N1 (22%), pT3 (4%) or M1 (1.3% of patients) status. The investigators appear to have used RRA selectively in patients with unfavourable disease characteristics, with post-ablation 131I whole-body scintigraphy (WBS) as well as stimulated thyroglobulin (TG) measurement leading to upstaging in these individuals. This phenomenon of less accurate staging when ablation is withheld is crucial for interpreting the Ross et al. data, as the mean follow-up was relatively short: only 4 years. Staging without post-therapy WBS or stimulated TG measurement may have underestimated the recurrence risk in the non-ablated cohort. In a slowly growing malignancy such as DTC, that risk may well become apparent only after long-term follow-up. Additionally, the ∼1.1 GBq (∼30 mCi) ablative activity used by Ross and colleagues may be insufficient in patients who underwent only subtotal thyroidectomy, who had nodal metastases, or both. Considering these weaknesses, any recommendation regarding 131I ablation certainly should not be based on the Ross et al. [21] study alone.
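Ablative activities are quoted in both SI units (GBq) and conventional units (mCi). Because 1 Ci is defined as 3.7 × 10^10 Bq, 1 mCi equals exactly 37 MBq. The following minimal sketch simply encodes that conversion; the function names are illustrative, not taken from any library.

```python
MBQ_PER_MCI = 37.0  # 1 Ci = 3.7e10 Bq exactly, hence 1 mCi = 37 MBq

def gbq_to_mci(activity_gbq: float) -> float:
    """Convert an activity from GBq to mCi."""
    return activity_gbq * 1000.0 / MBQ_PER_MCI

def mci_to_gbq(activity_mci: float) -> float:
    """Convert an activity from mCi to GBq."""
    return activity_mci * MBQ_PER_MCI / 1000.0

if __name__ == "__main__":
    print(round(gbq_to_mci(1.1), 1))   # about 29.7 mCi
    print(round(mci_to_gbq(30.0), 2))  # 1.11 GBq
```

So an activity of ∼1.1 GBq corresponds to roughly 30 mCi, and the conventional 30-mCi "low-activity" ablation to 1.11 GBq.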

On the other hand, evidence has emerged in recent years that supports performing RRA in the setting of multifocal papillary microcarcinoma. Shattuck et al. [22] found that in at least 50% of patients with such disease the different tumour foci were of independent clonal origin. This observation suggests that patients with DTC most likely have a heightened susceptibility for the development of multiple primary thyroid carcinomas. RRA presumably would eliminate a potential reservoir for further carcinogenesis.

Diagnostic follow-up procedures

ATA ratings

To detect persistent or recurrent DTC after the completion of primary treatment, the ATA guidelines strongly recommend, among other procedures (A ratings), TG measurement during TSH suppression and after rhTSH (recommendations 43 and 45a) and fine-needle aspiration (FNA) of small ultrasonographically suspicious lymph nodes >5–8 mm, with TG measurement in the FNA washout (recommendation 48b). This endorsement contrasts with the cautious interpretation (C rating) of the value of diagnostic 131I WBS (dxWBS) in patients at intermediate or high risk of recurrence (recommendation 47).

The enthusiasm for TG testing in general and for FNA with TG testing in the aspirates contrasts even more with the strict recommendation against dxWBS in patients at low risk for recurrence (recommendation 46, rating F). To support this last recommendation, the ATA guidelines cite a review article [23], a letter to the editor [24], and the studies of Pacini et al. [25] and of Torlontano and colleagues [26], in both of which all patients with negative TG also had a negative first dxWBS. However, the guidelines do not cite other publications on this topic showing different results, for example, the observational study by Robbins et al. [27]. This study included a subgroup of 90 low-risk patients, 8% of whom showed iodine-positive metastases on dxWBS despite negative rhTSH-stimulated TG levels (≤2 ng/ml). In view of these contradictory results, so strong a rating as “F” for dxWBS in low-risk patients does not seem to be justified.

Additionally, in patients with subtotal or near-total thyroidectomy, the high thyroid bed uptake on the post-ablation WBS will obscure small metastases in the neck and the upper chest. For low-risk patients who have undergone such surgery, the ATA follow-up strategy would at a minimum lead to uncertainty, because disease-free status would never be documented by an unremarkable follow-up dxWBS. Indeed, the ATA strategy could even lead to a higher long-term relapse rate, since occult lesions might not be detected or adequately treated.

Further, as a threshold for a follow-up strategy without dxWBS, the TG cutoff level of ≤ 2 ng/ml suggested in the ATA guidelines’ “Algorithm for Management of DTC 6–12 Months after Remnant Ablation” cannot be considered state of the art. The 2007 comparison by Schlumberger et al. [28] of five conventional and two “supersensitive” TG assays found that the conventional assays had a de facto standard functional sensitivity of 0.2–0.3 ng/ml. Given the points discussed above, the ATA follow-up strategy of omitting dxWBS in low-risk patients would seem to be safe only under the following premises [4]:

  • Minimal thyroid bed uptake on the post-ablation WBS

  • Use of a TG assay with a functional sensitivity around 0.2–0.3 ng/ml

  • rhTSH-stimulated TG concentration, measured ∼6 months post-ablation, below that functional sensitivity

  • Absence of anti-TG antibody elevation, normal TG recovery (ca. 80–120%) or both these findings
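Because the four premises above are conjunctive, they can be expressed as a simple checklist. The sketch below is an illustration of that logic only, not a clinical decision tool. The field and function names are hypothetical; the thresholds are taken from the text (functional sensitivity ≤ 0.3 ng/ml, TG recovery 80–120%); the fourth premise is read as satisfied by either the absence of anti-TG antibody elevation or a normal recovery; and it is assumed that the TG sample was drawn after rhTSH stimulation about 6 months post-ablation, which the code does not check.

```python
from dataclasses import dataclass
from typing import Optional

MAX_FUNCTIONAL_SENSITIVITY = 0.3  # ng/ml, upper end of the 0.2-0.3 ng/ml norm
RECOVERY_RANGE = (80.0, 120.0)    # percent, "normal" TG recovery

@dataclass
class FollowUpFindings:
    # Hypothetical field names, chosen for illustration only.
    minimal_thyroid_bed_uptake: bool     # judged on the post-ablation WBS
    assay_functional_sensitivity: float  # ng/ml
    stimulated_tg: float                 # ng/ml, rhTSH-stimulated, ~6 months post-ablation
    anti_tg_antibodies_elevated: bool
    tg_recovery_percent: Optional[float] = None  # None if recovery was not measured

def recovery_normal(percent: Optional[float]) -> bool:
    """True only if a recovery value exists and lies in the normal range."""
    return percent is not None and RECOVERY_RANGE[0] <= percent <= RECOVERY_RANGE[1]

def omitting_dxwbs_premises_met(f: FollowUpFindings) -> bool:
    """Return True only if all four premises listed in the text hold."""
    sensitive_assay = f.assay_functional_sensitivity <= MAX_FUNCTIONAL_SENSITIVITY
    tg_undetectable = f.stimulated_tg < f.assay_functional_sensitivity
    interference_excluded = (not f.anti_tg_antibodies_elevated) or recovery_normal(
        f.tg_recovery_percent
    )
    return (f.minimal_thyroid_bed_uptake and sensitive_assay
            and tg_undetectable and interference_excluded)
```

Note that a stimulated TG of, say, 1.5 ng/ml would pass the ATA’s ≤ 2 ng/ml cutoff but fails this checklist whenever a 0.2–0.3 ng/ml assay is used, which is the discrepancy the text points out.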

Interestingly, the ATA guidelines state (p. 1188) that “[f]ollow-up of low-risk patients … without 131I remnant ablation … may represent a challenge” and for that statement cite the study of Torlontano et al. [29]. In this study, which described the omission of RRA in a very low-risk group, rhTSH-stimulated TG values of ≤1 ng/ml or of >1 ng/ml could not distinguish the patients with lymph node metastases.

The low diagnostic specificity of rhTSH-stimulated TG in patients who did not receive RRA presumably provides a rationale for the ATA guidelines’ strong endorsement of FNA of ultrasonographically suspicious lymph nodules of >5–8 mm. However, such nodules also may be a nonspecific marker, since modern high-resolution ultrasonography will detect 5- to 6-mm nodules in many individuals, especially during winter. The ATA’s “A” rating when recommending FNA implies a direct effect of this procedure on patient outcome; currently, however, no published evidence supports this view. Therefore this rating seems too high and “over-ambitious”. In our view, different management strategies from those recommended by the ATA guidelines are possible in patients with 6- to 8-mm lymph nodules during follow-up, but adequate first-line therapy including RRA in low-risk patients is still the best approach to facilitate accurate follow-up.

Regarding the recommendation advocating TG testing in fine-needle aspirates of cervical lymph nodes, the ATA guidelines refer to two studies [30, 31]. Both studies demonstrated a good diagnostic accuracy for such testing, but outcome evaluation was beyond their scope. Therefore the “A” designation for this recommendation is inconsistent with the ATA’s own statement on guidelines rating methodology presented in our Table 1. Moreover, given the specificity issues with small lymph nodules, the applicability and practicability of FNA with TG measurement of the washout is questionable.

Thyroid hormone therapy in athyreotic patients

Recommended target TSH levels and their rationale

In one of the ATA guidelines’ recommendations (49c) regarding target TSH levels for long-term thyroid hormone therapy of athyreotic patients who have undergone primary treatment of DTC, a level of 0.3–2.0 mU/l is endorsed for low-risk ablated patients. To support this recommendation, the ATA cites the recent work by Hovens et al. [32]. This observational study included 250 patients, who had unremarkable findings, i.e. negative dxWBS, stimulated TG < 2 ng/ml, at 1 year post-RRA. The authors found no difference in the relapse rate or death rate between patients with a median TSH above versus below the investigators’ retrospectively defined cutoff, 0.4 mU/l. Changing the median TSH cutoff to 2.0 mU/l, the authors identified far more adverse outcomes (both relapses and deaths) in patients with median TSH above that level. However, cutoffs between 0.4 and 2.0 mU/l appear not to have been extensively investigated. Considering that the effects found by Hovens and coworkers most likely represent a sliding scale, the study findings can be interpreted as an argument for maintaining TSH levels well below the 2 mU/l cutoff. By and large, the ATA’s “B” rating seems too high for the small body of evidence on which these very precise recommendations are based.

Another caveat: applicability

As alluded to earlier, a noteworthy caveat for clinicians using the ATA guidelines is the applicability of at least some of the recommendations outside the USA. Indications for RRA provide one example. In areas of former or current iodine deficiency such as Germany, the patient population referred for thyroid surgery is quite different from that in iodine-sufficient parts of the world such as the USA. In highly specialized American clinics, the a priori suspicion of malignancy of a given nodule is much higher, resulting in a more radical approach to surgery with the aim of removing as much thyroid tissue as possible. In contrast, in iodine-deficient areas, up to 50% of thyroid cancers are encountered unexpectedly after (sub)total (hemi)thyroidectomy for presumably benign goiter. In surgery for this less serious disorder, there is a greater emphasis on minimizing the risk of complications such as recurrent laryngeal nerve palsy or hypoparathyroidism and hence a more conservative approach. Consequently, thyroid remnants often are relatively large. The dilemma therefore arises of weighing the considerable risk of disabling recurrent laryngeal nerve damage during completion surgery, leading to a severe reduction in quality of life, against the relatively low risks of RRA. When faced with this choice, both physician and patient nearly always opt for 131I therapy.

Conclusions

The recently published ATA guidelines on the management of patients with thyroid nodules and DTC represent an admirable and extensive effort and contain much information and many ideas of interest for the clinician. However, a close analysis of certain recommendations in this document, the cited supporting literature and the strength-of-evidence ratings illustrates how even guidelines characterized as “evidence based” may not always escape bias caused by the views of their formulators. In the elaboration of DTC guidelines, the major sources of subjective and disputable interpretation are the generally excellent prognosis of this neoplasm and the resultant lack of randomized controlled trials for most interventions. As a result of the excellent prognosis, varying disease management algorithms can produce at most small differences in patient outcome. Comparisons of interventions frequently require large sample sizes and lengthy follow-ups—sometimes much longer than the average physician’s career—that render most randomized controlled trials impracticable. Barring the availability of definitive evidence, the input of experienced clinicians from all disciplines involved in DTC care is needed. However, the multidisciplinary involvement runs the danger of introducing biases favouring the procedures of the experts’ specialties—a phenomenon to which we as nuclear medicine physicians are of course not immune. For all these reasons, truly objective DTC management guidelines are nearly impossible to develop. Thus before applying DTC management guidelines to everyday practice, the clinician should be aware that recommendations not infrequently suffer from their authors’ reading what they believe instead of believing what they read.
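How quickly the required sample size grows when event rates are low can be illustrated with the usual normal-approximation formula for comparing two independent proportions, n per group = (z₁₋α/₂ + z₁₋β)² [p₁(1 − p₁) + p₂(1 − p₂)] / (p₁ − p₂)². The sketch below assumes a two-sided α of 0.05, 80% power and unpooled variances; the event rates in the usage lines are hypothetical, and the calculation ignores dropout and the many years of follow-up needed for events to accrue, both of which would inflate the numbers further.

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm to detect a difference between event rates p1 and p2.

    Normal approximation for two independent proportions, two-sided test,
    unpooled variance; no allowance for dropout or loss to follow-up.
    """
    if p1 == p2:
        raise ValueError("event rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

if __name__ == "__main__":
    # Hypothetical event rates, for illustration only.
    print(n_per_group(0.20, 0.10))  # common events: a few hundred patients in total
    print(n_per_group(0.04, 0.02))  # rare events: over a thousand patients per arm
```

Halving a 4% event rate already calls for more than a thousand patients per arm, which helps explain why randomized trials of most DTC interventions have not been undertaken.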