Remodeling 99mTc-Pertechnetate Thyroid Uptake: Statistical, Machine Learning, and Deep Learning Approaches

Geoffrey M. Currie; Basit Iqbal

doi:10.2967/jnmt.121.263081

Abstract

Although reference ranges for ^99mTc thyroid percentage uptake vary, the seemingly intuitive evaluation of thyroid function does not reflect the complexity of thyroid pathology and biochemical status. The emergence of artificial intelligence in nuclear medicine has driven problem solving associated with logic and reasoning, warranting reexamination of established benchmarks in thyroid functional assessment. Methods: This retrospective study of 123 patients compared scintigraphic findings with ground truth established through biochemical status. Conventional statistical approaches were used in conjunction with an artificial neural network to determine predictors of thyroid function from data features. A convolutional neural network was also used to extract features from the input tensor (images). Results: Analysis was confounded by subclinical hyperthyroidism, primary hypothyroidism, subclinical hypothyroidism, and triiodothyronine toxicosis. Binary accuracy for identifying hyperthyroidism was highest for thyroid uptake classification using a threshold of 4.5% (82.6%), followed by pooled physician interpretation with the aid of uptake values (82.3%). Visual evaluation without quantitative values reduced accuracy to 61.0% for pooled physician determinations and to 61.4% when classifying on the basis of thyroid gland intensity relative to salivary glands. The machine learning (ML) algorithm produced 84.6% accuracy; however, this included biochemistry features not available to the semantic analysis. The deep learning (DL) algorithm had an accuracy of 80.5% based on image inputs alone. Conclusion: Thyroid scintigraphy is useful in identifying hyperthyroid patients suitable for radioiodine therapy when using an appropriately validated cutoff for the patient population (4.5% in this population). ML artificial neural network algorithms can be developed to improve accuracy as second-reader systems when biochemistry results are available. DL convolutional neural network algorithms can be developed to improve accuracy in the absence of biochemistry results. ML and DL do not displace the role of the physician in thyroid scintigraphy but can be used as second-reader systems to minimize errors and increase confidence.

In 1967, Atkins and Richards (1) evaluated the potential role of ^99mTc-pertechnetate in evaluating thyroid function as an alternative to sodium iodide with ¹³¹I on the basis that ^99mTc uptake in the thyroid reflects the gland’s trapping function. This landmark work used a probe detector rather than γ-camera imaging for the uptake calculation. A small number of hypothyroid patients were included, and all had percentage uptakes below 0.5%. Only 2 of 15 hyperthyroid patients fell below 4%, whereas 4 of 133 euthyroid patients had uptake above 4%. Thus, a cutoff for normality was set at 0.4%–4.0% to provide 87% accuracy in hyperthyroidism, 97% accuracy in euthyroidism, and 100% accuracy in hypothyroidism.

Later work, in 1973, by Maisey et al. (2) used a γ-camera, pinhole collimation, and interfaced computer to generate regions of interest for calculation of ^99mTc-pertechnetate uptake in the thyroid. Uptake was 0.2%–3.6% in euthyroid patients, 0.3%–6.2% in the presence of a goiter, 2.8%–8.8% in hyperthyroidism, and 0.1%–0.3% in hypothyroidism, leading to establishment of a reference range of 0.3%–3.4%. More recently, ^99mTc-pertechnetate uptake in euthyroidism was characterized in the range of 0.4%–1.7% in 47 clinically normal patients (3). It is widely acknowledged that reference values change with geography and time, particularly in relation to iodine deficiency (4). Although widespread use of international standards is common (0.5%–4.5% for example), these values may not reflect either the technique used (probe vs. γ-camera) or population characteristics (e.g., iodine deficiency). In Namibia, investigators found the reference range to be 0.15%–2.14% (4) although the study included only 76 patients and all were euthyroid. A U.K. study (5) used 60 euthyroid patients to estimate the local reference range as 0.2%–2.0%.

Although reference ranges for percentage uptake vary, the method for calculation of thyroid function on ^99mTc scintigraphy also varies (6). A seemingly intuitive evaluation of thyroid function has also been used as a visual evaluation of thyroid activity relative to salivary gland activity (Fig. 1). Such an evaluation does not reflect the complexity of thyroid pathology and biochemical status. When the bulk of patients are euthyroid or hyperthyroid, this simplification is intuitive, but it fails to accommodate subclinical hyperthyroidism, which can produce a low thyroid accumulation of ^99mTc; triiodothyronine (T3) toxicosis, which can have high or low ^99mTc uptake; subclinical hypothyroidism, which can have elevated or normal ^99mTc accumulation; and primary hypothyroidism, which can have normal or elevated ^99mTc accumulation. Thus, the accuracy of ^99mTc uptake may be more dependent on the pathologic cross section of patients than on the technique itself.

FIGURE 1.

Intuitive, but sometimes inaccurate, visual evaluation of thyroid status relative to salivary gland activity. (Left) Salivary gland activity exceeding thyroid gland activity suggests hypothyroidism. (Middle) Salivary gland activity and thyroid gland activity being similar (within same scale) suggests euthyroidism. (Right) Salivary gland activity not being apparent relative to thyroid activity suggests hyperthyroidism. All images were obtained with ^99mTc-pertechnetate using high-resolution, parallel-hole imaging.

The emergence of artificial intelligence in nuclear medicine has driven problem solving associated with logic and reasoning (7,8). Developments in machine learning (ML) and deep learning (DL) provide valuable research tools, particularly for image segmentation and interpretation (9). The artificial neural network (ANN) provides the backbone for both ML and DL algorithms. An ANN that relies on input of specific data (features) is generally referred to as ML. More complex ANNs with deep architectures (a high number of layers and nodes) are referred to as DL. Deep ANNs are generally associated, in medical imaging, with convolutional neural networks (CNNs) that use convolution and pooling layers to extract features from input tensors (images) (9,10). Although there have been historical uses of neural networks to classify thyroid-based ophthalmologic conditions and evaluate in vitro laboratory tests, it is only recently that DL approaches have been applied to thyroid scintigraphy. Using SPECT thyroid scintigraphy, 3 DL models based on AlexNet, VGGNet, and ResNet architectures trained on 1,430 clinical studies were modeled and compared with residents in nuclear medicine (11). The performance of these algorithms was enhanced by a sanitized dataset with a case population comprising healthy individuals (175), patients with Graves disease (834), and patients with subacute thyroiditis (421). The 3 DL architectures reported a high degree of recall for subacute thyroiditis, poor accuracy for normality, and moderate accuracy for Graves disease (11). Although the investigators concluded that DL approaches performed well in thyroid scintigraphy, the role of DL might be limited to assisting the physician in training rather than having any specific clinical utility. The algorithms marginally outperformed first-year residents but did not perform as well as second-year residents, let alone experienced physicians. Concurrent use of the DL approaches improved the performance of residents on the order of 5% and reduced reporting time. Nonetheless, there is a need to explore potential clinical and research applications, and the less complex nature of planar thyroid scintigraphy may be better suited to DL approaches.

The aim of this investigation was to correlate each of the following with biochemical status and compare performance: percentage uptake of ^99mTc, visual evaluation of thyroid activity relative to salivary gland activity, ML algorithms using an ANN, and DL approaches using a CNN.

MATERIALS AND METHODS

The study retrospectively analyzed 123 patients (90.2% female), with a mean age of 35 y (range, 10–70 y). The mean intravenous dose of ^99mTc was 153.4 MBq. ^99mTc-based thyroid uptake was determined using background-corrected thyroid regions of interest and a measured standard. All calculations were decay-corrected and accounted for residual dose in the syringe after injection. The extracted image features included both background-corrected and non–background-corrected total thyroid, left-side and right-side area (cm²), counts, and counts per pixel. The ratio of the right lobe to the left lobe for area (cm²), counts, and counts per pixel was also determined with and without background correction. Additionally, the ratio of thyroid count to background count for total thyroid, right lobe, and left lobe was determined (trapping index). The dose relative to the total count was also calculated, and visual classification of thyroid activity relative to the salivary glands was recorded. Biochemical features included the levels of free thyroxine (T4) (pmol/L), free T3 (pmol/L), and thyroid-stimulating hormone (µIU/mL). The biochemical status of the patient was determined (Table 1) and was further stratified as ternary (hypothyroid, euthyroid, or hyperthyroid) or binary (hyperthyroid or not hyperthyroid) (1–6,12,13). Other imaging features were also recorded (e.g., hot or cold nodule and multinodular goiter). Only 96 patients had both imaging features and biochemical status available. The investigation was approved by the institutional ethics committee.
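The uptake calculation described above can be sketched in a few lines. The text does not give the exact formula, so this is only an illustration under stated assumptions: the function names, the area-scaled background subtraction, and the assumption that the standard and the thyroid are imaged with the same sensitivity and acquisition time are all hypothetical. Only the 99mTc half-life (about 6.01 h) is a physical constant.

```python
import math

TC99M_HALF_LIFE_H = 6.01  # physical half-life of 99mTc, in hours


def decay_correct(activity_mbq: float, elapsed_h: float) -> float:
    """Activity remaining after elapsed_h hours of physical decay."""
    return activity_mbq * math.exp(-math.log(2) * elapsed_h / TC99M_HALF_LIFE_H)


def percent_uptake(
    thyroid_counts: float,   # total counts in the thyroid region of interest
    thyroid_pixels: int,     # size of the thyroid region of interest
    background_cpp: float,   # background counts per pixel
    standard_counts: float,  # counts acquired from the measured standard
    standard_mbq: float,     # activity of the standard at imaging time
    syringe_mbq: float,      # assayed activity before injection
    residual_mbq: float,     # activity left in the syringe (same reference time)
    assay_to_scan_h: float,  # hours between dose assay and imaging
) -> float:
    """Background-, decay-, and residual-corrected thyroid uptake (%).

    Hypothetical formulation: net thyroid counts divided by the counts
    the injected activity would have produced on the same camera.
    """
    net_thyroid = thyroid_counts - background_cpp * thyroid_pixels
    counts_per_mbq = standard_counts / standard_mbq
    injected_mbq = decay_correct(syringe_mbq - residual_mbq, assay_to_scan_h)
    return 100.0 * net_thyroid / (counts_per_mbq * injected_mbq)


# Illustrative numbers only; they are not taken from the study.
example = percent_uptake(
    thyroid_counts=60_000, thyroid_pixels=1_000, background_cpp=10.0,
    standard_counts=100_000, standard_mbq=10.0,
    syringe_mbq=155.0, residual_mbq=5.0, assay_to_scan_h=0.0,
)
print(f"Uptake: {example:.2f}%")
```

Keeping the decay and residual corrections in separate steps makes it easy to swap in a site-specific protocol without touching the count arithmetic.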

TABLE 1.

Biochemical Stratification of Patient Studies and Findings (1–6,12,13)

Conventional statistical analysis was undertaken using JMP software (version 15.2.1; SAS Institute). The statistical significance was calculated using χ² analysis for nominal data and the Student t test for continuous data. The Pearson χ² test was used for categoric data with a normal distribution, and the likelihood ratio χ² test was used for categoric data without a normal distribution. F test ANOVA was used to determine statistically significant differences within grouped data. A P value of less than 0.05 was considered significant. Interobserver correlation was evaluated with χ² analysis, and interobserver reliability was measured using the Cohen κ-coefficient.
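For readers unfamiliar with the Cohen κ-coefficient, the following minimal sketch shows how agreement between two raters is corrected for chance. It is not the JMP implementation used in the study; the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must classify the same, non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical ternary ratings from two readers (not study data).
reader_1 = ["hyper", "hyper", "eu", "eu", "hypo", "eu"]
reader_2 = ["hyper", "eu", "eu", "eu", "hypo", "hyper"]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.3f}")
```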

The data were also evaluated using an ANN (Neural Analyser, version 2.9.5; Artificial Intelligence Techniques, Ltd.). There were 42 input variables in 123 patients (instances) using a binary classification of hyperthyroid or euthyroid. A 50:25:25 split of the 96 valid instances (instances with missing biochemistry data were excluded) was used for training, selection, and testing. The initial network architecture included 16 scaling layer inputs and 3 hidden layers of 6, 4, and 3 nodes, using a logistic activation function (which defines the output of each node based on its input) and a single probabilistic layer (binary). The weighted squared error method was used to determine the loss index, and the neural parameter norm was used for the regularization method. A quasi-Newton training method was applied, using gradient information to estimate the inverse Hessian matrix at each iteration of the algorithm (no second derivatives are computed). The loss function associated with the training phase estimates the error associated with the data that the neural network observes.
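To make the initial architecture concrete, the sketch below builds a fully connected network with 16 inputs, hidden layers of 6, 4, and 3 nodes, and one output node, all with logistic activations. It is an illustration only: the weights are random rather than trained, the inputs are assumed to be already scaled, treating the probabilistic layer as a single logistic node is an assumption, and all function names are hypothetical.

```python
import math
import random


def logistic(x: float) -> float:
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(layer_sizes, seed=0):
    """Create (weights, biases) per layer with small random values (untrained)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Propagate one scaled input vector through the network."""
    activations = list(inputs)
    for weights, biases in layers:
        activations = [
            logistic(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]  # single output interpreted as a probability


def count_parameters(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


LAYER_SIZES = [16, 6, 4, 3, 1]  # inputs, 3 hidden layers, binary output
network = init_network(LAYER_SIZES)
probability = forward(network, [0.0] * 16)
print(count_parameters(network), round(probability, 3))
```

Even this small network has well over 100 trainable parameters, which is worth keeping in mind given that only 96 instances were available for training, selection, and testing.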

A single anterior neck image for the 96 patients was evaluated by 3 independent expert physicians masked to other image and biochemical features. On the basis of the visual appearance, each scan was recorded as euthyroid, hypothyroid, or hyperthyroid. On completion of the stratification, each physician reevaluated the ternary status, with the visual inspection supplemented by the calculated thyroid uptake (%). The physician rating was determined by majority group consensus.

Individual, nonannotated, anterior neck images representative of each patient were evaluated using a CNN classifier (Deep Learning Toolbox Deep Network Designer App in MATLAB, version R2020b; MathWorks). Given the lack of discriminatory power of either visual evaluation or thyroid uptake quantitation using various cutoffs to identify hypothyroidism, the CNN classifier was designed to identify hyperthyroidism or no hyperthyroidism (euthyroid and hypothyroid). Given the lack of complexity in the image data, the architecture used for the CNN was initially modeled on a binary version of AlexNet with 25 layers but optimized using a model that resembled the VGG-19 CNN architecture with a binary output and 30 layers (Table 2; Fig. 2). All patient files were trained and validated 3 times (70:30 random data split) for each of 3 image types: white on black gray scale, black on white gray scale, and the magnitude spectrum of the Fourier transformation of each image (Fig. 3). Specific parameters included an Adam (adaptive moment estimation) stochastic gradient descent optimizer algorithm, an initial learn rate of 0.001, a maximum of 50 epochs (1 epoch = 1 iteration), and randomization with each epoch.
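The 3 input representations can be illustrated on a toy image. This is a sketch under assumptions, not the preprocessing used in the study: the images were prepared in MATLAB and by a collaborator, and the text does not say whether the magnitude spectrum was log-scaled or centered, so neither is done here. Function names and the 2 × 2 example are hypothetical.

```python
import cmath
import math


def invert(image, max_value=255):
    """Swap gray-scale polarity (e.g., white on black to black on white)."""
    return [[max_value - pixel for pixel in row] for row in image]


def magnitude_spectrum(image):
    """Magnitude of the 2-dimensional discrete Fourier transform.

    Naive O(N^4) implementation, adequate only for tiny illustrative images.
    """
    rows, cols = len(image), len(image[0])
    spectrum = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2.0 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = abs(total)
    return spectrum


# Hypothetical 2 x 2 gray-scale image (not study data).
toy_image = [[0, 255], [255, 0]]
toy_inverted = invert(toy_image)
toy_spectrum = magnitude_spectrum(toy_image)
print(toy_inverted)
print([[round(value, 3) for value in row] for row in toy_spectrum])
```

The zero-frequency term of the spectrum equals the sum of all pixel values, so the spectrum discards spatial position while retaining overall intensity and texture information.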

TABLE 2.

CNN Architecture, Activations, and Parameters

FIGURE 2.

CNN architecture. 2D = 2-dimensional; ReLU = rectified linear unit.

FIGURE 3.

Three example patients (top, middle, and bottom) with black on white (left), white on black (center), and magnitude spectrum from Fourier transformation (right) used as inputs for CNN.

Situation analysis was undertaken using the confusion matrix for classifier prediction, including true-positives (TPs), false-positives (FPs), true-negatives (TNs), and false-negatives (FNs). Several performance indicators can be gleaned from the confusion matrix, including precision (TPs/[TPs + FPs]), recall (TPs/[TPs + FNs]), accuracy ([TPs + TNs]/[TPs + TNs + FPs + FNs]), and F1 score (2 × TPs/[2 × TPs + FPs + FNs]).
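The 4 indicators follow directly from the confusion matrix counts, as the minimal sketch below shows. The class name and the example counts are hypothetical and are not study data; a production version would also need to guard against empty denominators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # true-positives
    fp: int  # false-positives
    tn: int  # true-negatives
    fn: int  # false-negatives

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=40, fp=10, tn=45, fn=5)
print(cm.precision(), cm.recall(), cm.accuracy(), cm.f1())
```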

RESULTS

Statistical Analysis

For the 123 patients, the mean thyroid uptake was 4.4% (95% CI, 3.3%–5.5%), with a median of 2.2% (Table 3). Among the visual findings, 9 patients had increased uptake associated with primary hypothyroidism, 22 had increased uptake because of Graves disease, 9 had multinodular goiters, 2 had nodular thyroids, 28 had a normal morphology, 3 had goiters, 11 had reduced or absent uptake, 7 had autonomous glands with contralateral suppression (6 on the right), 24 had cold nodules (16 on the right), and 8 had hot nodules (4 on the right). Table 4 summarizes other key demographic data.

TABLE 3.

Ternary Classification of Thyroid Function Based on Various Published Reference Ranges

TABLE 4.

Key Variables

The mean age of hypothyroid patients (48.0 y) was statistically higher than that for biochemically euthyroid patients (33.7 y) (P = 0.041) but not for hyperthyroid patients (36.7 y). There was also a weak positive correlation between age and thyroid size (P < 0.001; R² = 0.117). No other statistically significant relationships were noted for patient age. Men demonstrated a statistically higher mean thyroid area (48.5 cm²) than women (32.2 cm²) (P = 0.003). There was also a statistically significant difference in the biochemical status (P = 0.019), with a disproportionately high representation of hyperthyroidism for men and a lower euthyroid rate. Given the lower representation of men in the thyroid scan population, this observation may reflect lower presentation rates for men in the absence of markedly abnormal thyroid function driving more pressing symptoms. No other statistically significant relationships were noted for patient gender or patient dose (MBq).

There was no statistically significant correlation between thyroid uptake and right-lobe–to–left-lobe ratio (P = 0.672), thyroid area (P = 0.166), or background counts per pixel (CPP) (P = 0.416). The increase in thyroid uptake associated with increasing total counts (P < 0.001; R² = 0.458) and total CPP (P < 0.001; R² = 0.356) was expected. There were also statistically significant relationships between increasing thyroid uptake and increasing thyroid-to-background ratios (P < 0.001; R² = 0.376). The mean thyroid uptake was statistically higher (P < 0.001) when the scan showed, relative to appropriately thresholded thyroid activity, no salivary activity (9.1%) than when it showed, relative to faint thyroid activity (2.5%), salivary activity less than thyroid activity (1.7%), salivary activity equal to thyroid activity (1.1%), or salivary activity greater than thyroid activity (0.4%). A positive correlation between thyroid uptake and both free T4 (P < 0.001; R² = 0.351) and free T3 (P < 0.001; R² = 0.365) was noted; however, no correlation was noted between thyroid uptake and thyroid-stimulating hormone (P = 0.695; R² = 0.002).

Biochemical status demonstrated a statistically significant difference (P < 0.001) in mean thyroid uptake stratified as hyperthyroid (9.5%; 95% CI, 7.1%–12.0%), hypothyroid (4.0%; 95% CI, 1.3%–6.7%), and euthyroid (2.5%; 95% CI, 0.9%–4.2%). Hypothyroid studies had a higher mean thyroid uptake than euthyroid studies because of the primary hypothyroidism cases. Excluding primary hypothyroidism, there was no statistically significant difference in thyroid uptake between hypothyroidism and euthyroidism, or between hypothyroidism and subclinical hyperthyroidism or suppressed hyperthyroidism. Although 4.5% is the cutoff reflecting 100% sensitivity for standard hyperthyroidism, clinical hyperthyroidism with suppression and subclinical hyperthyroidism (both biochemically) are not identified by this reference range.

The optimized cutoff range for thyroid uptake against biochemical status was 0.45%–4.5%, although the lower end of this range is a poor discriminator for hypothyroidism against euthyroidism. For biochemical hyperthyroidism, 70.8% of cases had an uptake greater than 4.5% whereas 29.3% fell below 4.5%. Of those below 4.5%, 100% had biochemically subclinical hyperthyroidism or T3 toxicosis. Of patients with true hyperthyroidism biochemically, 100% had uptake above 4.5%. Conversely, 27.8% of hypothyroidism cases had uptake above 4.5%. There were no hypothyroidism cases that had uptake values below the 0.45% cutoff (all values below this were hyperthyroid or euthyroid biochemically). In the biochemically euthyroid range, only 6% had an uptake above 4.5%, and only 2% had an uptake below 0.45%.

Using the ternary classification, a thyroid uptake above 4.5% had a sensitivity of 70.8% and a specificity of 88.2% for detecting hyperthyroidism. A thyroid uptake below 0.45% had a sensitivity of 0% and specificity of 95.9% for hypothyroidism (Fig. 4, left). A broader biochemical classification of hyperthyroidism saw the sensitivity of the 4.5% cutoff reach 100%, with a specificity of 88.2% (Fig. 4, right).
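The cutoff-based classification and its sensitivity and specificity can be sketched as follows. The 0.45% and 4.5% cutoffs come from the text; the function names and the small dataset are hypothetical, constructed only to show how low-uptake hyperthyroidism and high-uptake primary hypothyroidism degrade the figures.

```python
LOWER_CUTOFF = 0.45  # % uptake, lower end of the optimized range
UPPER_CUTOFF = 4.5   # % uptake, upper end of the optimized range


def classify_uptake(uptake_percent, lower=LOWER_CUTOFF, upper=UPPER_CUTOFF):
    """Ternary classification from percentage uptake alone."""
    if uptake_percent > upper:
        return "hyperthyroid"
    if uptake_percent < lower:
        return "hypothyroid"
    return "euthyroid"


def sensitivity_specificity(uptakes, truths, target):
    """Sensitivity and specificity for one class against biochemical status."""
    tp = fp = tn = fn = 0
    for uptake, truth in zip(uptakes, truths):
        predicted = classify_uptake(uptake) == target
        actual = truth == target
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical cases (not study data): index 2 mimics low-uptake
# subclinical hyperthyroidism, index 4 high-uptake primary hypothyroidism.
uptakes = [9.5, 6.0, 2.0, 1.0, 5.0, 0.3, 3.0, 7.2]
truths = ["hyperthyroid", "hyperthyroid", "hyperthyroid", "euthyroid",
          "hypothyroid", "euthyroid", "euthyroid", "hyperthyroid"]
hyper_result = sensitivity_specificity(uptakes, truths, "hyperthyroid")
hypo_result = sensitivity_specificity(uptakes, truths, "hypothyroid")
print(hyper_result, hypo_result)
```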

FIGURE 4.

(Left) Ternary biochemical status classification against thyroid uptake. (Right) Broader biochemical status classification against thyroid uptake. Horizontal line represents overall mean, and diamonds represent class mean and 95% CIs.

On the basis of the ternary biochemical status, there was a statistically higher thyroid area for hyperthyroidism (40.7 cm²) than for hypothyroidism (29.5 cm²) or euthyroidism (33.0 cm²) (P = 0.049). With reference to Figure 1, the scintigraphic appearance of thyroid activity relative to salivary gland activity correctly identified 70.3% of hyperthyroid studies, 0% of hypothyroid studies, and 62.7% of euthyroid studies (Table 5). Excluding subclinical hyperthyroidism and T3 toxicosis, 94.1% of hyperthyroidism studies were identified using the visual criteria. Table 5 also provides an outline of TP rate (recall) for each set of cutoffs against the biochemical status.

TABLE 5.

Ternary Classification of Thyroid Function Based on Recall Against Biochemical Status

ML

There were 42 input variables in 96 patients (instances) using a binary classification of hyperthyroid or euthyroid. The heat-map correlation matrix identified several redundant variables, and the highest correlation scores were associated with thyroid-stimulating hormone (0.888), appearance of salivary glands on scans (0.627), free T4 (0.575), percentage uptake (0.501), and free T3 (0.491), consistent with the conventional statistical analysis. The network architecture included 16 scaling layer inputs and 3 hidden layers of 6, 4 and 3 nodes. The initial value of the training loss was 1.5473, and the final value after 105 iterations was 0.0172. The initial value of the selection loss was 1.5570, and the final value after 105 iterations was 1.1895.

A growing-inputs method was used to calculate the correlation for every input against each output in the dataset. Beginning with the most highly correlated inputs, progressively decreasing correlated inputs were added to the network until the selection loss increased. The final architecture of the neural network reflected the optimized subset of inputs with the lowest selection loss. In this case, the selection loss and the training loss identified the optimal number of inputs to be 4, with the training loss optimized at 0.0298 and the selection loss being less than 0.0001. The final architecture was 4 scaling-layer inputs; 3 hidden layers of 6, 4, and 1 nodes; an unscaling layer; and a single binary probabilistic layer (Fig. 5).
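The growing-inputs procedure amounts to a greedy forward selection. The sketch below is not the Neural Analyser implementation: the function name is hypothetical, the input names are placeholders, and the loss values are invented so that the search stops at 4 inputs. In practice, the `selection_loss` callback would retrain the network on the candidate inputs and return its loss on the selection subset.

```python
from typing import Callable, Sequence


def growing_inputs(
    ranked_inputs: Sequence[str],
    selection_loss: Callable[[Sequence[str]], float],
) -> tuple[list[str], float]:
    """Add inputs in order of decreasing correlation until the loss increases."""
    if not ranked_inputs:
        raise ValueError("At least one candidate input is required")
    selected: list[str] = []
    best_loss = float("inf")
    for name in ranked_inputs:
        candidate = selected + [name]
        loss = selection_loss(candidate)
        if loss > best_loss:
            break  # selection loss increased: keep the previous subset
        selected, best_loss = candidate, loss
    return selected, best_loss


# Placeholder inputs, already ranked by correlation with the target.
ranked = ["input_a", "input_b", "input_c", "input_d", "input_e"]

# Invented losses keyed by subset size, standing in for real training runs.
toy_losses = {1: 0.9, 2: 0.5, 3: 0.2, 4: 0.05, 5: 0.3}

chosen, final_loss = growing_inputs(ranked, lambda subset: toy_losses[len(subset)])
print(chosen, final_loss)
```

Because the search is greedy, it can miss a better subset that omits a highly correlated but redundant input; that is a known trade-off of this kind of selection.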

FIGURE 5.

Final architecture of trained and validated neural network. TSH = thyroid-stimulating hormone.

Several metrics were used to test the final architecture using a subset of the original patient data. Receiver-operating-characteristic analysis demonstrated an area under the curve of 0.933. This correlated with a sensitivity of 100%, a specificity of 80%, and a classification accuracy of 0.846. These results were consistent with scores of 0.60 for precision, 0.75 for F1 score (harmonic mean of sensitivity and precision), 0.693 for the Matthews correlation coefficient (correlation between targets and outputs), and 0.8 for the Youden index (probability of a correct decision as opposed to guessing). The cumulative gain analysis demonstrates the benefit of using the developed model over a random guess. On the graph in Figure 6, the positive cumulative gain shows the percentage of positive instances found (y-axis) against the percentage of population (x-axis). Similarly, the negative cumulative gain shows the percentage of negative instances found against the percentage of population. The straight line represents a random classifier. The broader the separation, the better the predictive model. Since the instance ratio provides maximum separation (maximized percentage of positive and negative instances), an instance ratio of 0.40 has a maximum gain score of 0.8. Specifically, but individually, hyperthyroidism is predicted by ^99mTc uptake above 5.7%, free T4 below 20 pmol/L or above 34 pmol/L, free T3 above 9.8 pmol/L, and thyroid-stimulating hormone below 5.5 μIU/mL. In combination, these scaled and weighted input features of the neural network can be expressed mathematically, enhancing the collective predictive capability.
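The reported test metrics are mutually consistent, which can be checked with a small calculation. The confusion matrix counts below are hypothetical: they were chosen only because their proportions (no false-negatives, 1 false-positive per 4 true-negatives) reproduce the reported values, and they are not the actual test-set counts, which the text does not give. The function name is likewise an assumption.

```python
import math


def summarize(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        # Matthews correlation coefficient.
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # Youden index: sensitivity + specificity - 1.
        "youden": sensitivity + specificity - 1,
    }


# Hypothetical counts in the ratio 3:2:8:0 (TP:FP:TN:FN); not study data.
metrics = summarize(tp=3, fp=2, tn=8, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Any multiple of these counts gives the same values, so the check confirms internal consistency of the reported figures without revealing the size of the test subset.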

FIGURE 6.

Cumulative gain chart demonstrating maximum separation of positive and negative curves to provide cumulative gain score of 0.8 and instances ratio of 0.4 (arrow).

DL

Preliminary network development demonstrated overfitting beyond 30 iterations (epochs); therefore, the maximum epoch number was reset to 30. The results of the triplicated training and validation passes are summarized in Table 6. The variations in validation accuracy reflect the smaller dataset and the random assignment of cases to training and validation. No statistically significant differences (grouped F test) were noted between training and validation accuracy for different types of input tensors (P = 0.161 for training accuracy and 0.531 for validation accuracy) despite the higher accuracy for white on black and the lower accuracy for the magnitude spectrum. A direct comparison of white on black against the magnitude spectrum showed P values of 0.068 for training accuracy and 0.280 for validation accuracy.

TABLE 6.

Triplicate Training and Validation Binary Results (Hyperthyroid or Not Hyperthyroid) for 30-Layer CNN Architecture

DISCUSSION

Although thyroid scintigraphy is a well-established technique for the assessment of thyroid function, opinions vary on its role in distinguishing low from high thyroid uptake to guide radionuclide therapy. Thyroid scintigraphy is useful in the evaluation of hyperthyroidism to differentiate causes and guide therapy (14). Although the specific scintigraphic patterns associated with thyroid pathology do not easily differentiate the biochemical status of the patient (Fig. 7), scintigraphic imaging does provide information useful in identifying patients suitable for radioiodine therapy (14). Despite being in widespread use for this purpose internationally, ^99mTc-pertechnetate–based thyroid uptake is not considered suitable in some circles for guiding the therapeutic dosage of radioiodine (14). Consistent with the observations of this study, scintigraphy has a limited role in hypothyroidism (15).

FIGURE 7.

Various scintigraphic appearances of thyroid pathology using parallel-hole (high-resolution) collimation and ^99mTc-pertechnetate. MNG = multinodular goiter.

The challenges and limitations of thyroid scintigraphy are highlighted by poor agreement of physician interpretation. However, with the exclusion of patient history and biochemistry results, the physician interpretation is not done under normal conditions, but for the purpose of this study, the constrained interpretation provides a useful benchmark. Using a thyroid uptake cutoff of 0.45%–4.5%, agreement with physician interpretation was only 63.5%, and using salivary gland appearance, agreement was just 53.1%. Agreement between physicians was not strong, at 59.4%–86.5%, and agreement with biochemistry-grounded truth ranged from 42.7% to 68.8%. This, combined with the poor prediction utility of the salivary gland appearance, contradicts the simplicity of thyroid imaging depicted in Figure 1.

Using the ternary classification of euthyroid, hyperthyroid, and hypothyroid, a thyroid uptake above 4.5% had a sensitivity of 70.8% for detecting hyperthyroidism and a specificity of 88.2%. A thyroid uptake below 0.45% had a sensitivity for hypothyroidism of 0% and a specificity of 95.9%. Specific biochemical classification of hyperthyroidism that excluded T3 toxicosis and subclinical hyperthyroidism improved the sensitivity of the 4.5% cutoff to 100%, with a specificity of 88.2%. This finding highlights the value of thyroid uptake with a cutoff of 4.5% in identifying patients suitable for radioiodine therapy. Given that this goal is the primary one and that scintigraphy has a limited role in hypothyroidism in adults, a binary classification (hyperthyroidism or no hyperthyroidism) provides a more suitable evaluation. The value of an appropriate thyroid uptake cutoff is highlighted in Table 5, which shows that in this population, binary accuracy was high for a 4.5% cutoff (82.6%) and for physician interpretation augmented by uptake value (82.3%) but was low for salivary gland appearance alone (59.4%) and for masked physician interpretation (61.0%). Indeed, the value and accuracy of 4.5% as the cutoff are reinforced by the similarity in physician interpretation with and without the uptake-augmented information.

Although ML was able to demonstrate improved performance (sensitivity of 100% and accuracy of 84.6%), the algorithm relied on biochemistry not available for physician interpretation. Indeed, the ground truth relied on the additional value that biochemistry insights bring to physician insights. In the absence of available biochemistry results, the ML algorithm relies on uptake alone. Conversely, the physician interpretation would improve substantially with the additional insights from biochemistry. In this study, regardless of the apparent performance results, ML augmentation outperformed physician interpretation only because the physician was masked to the biochemistry results available to the ML algorithm. Nonetheless, the role of ML is not and should not be to displace physician reporting but rather to improve accuracy by eliminating error. In this instance, the ML algorithm has been shown to be an accurate second-reader system that can be automated with minimal cost and resources to identify hyperthyroid patients suitable for radioiodine therapy.

In contrast to the success of ML algorithm development, the DL CNN performed more poorly than either the 4.5% cutoff discriminator or the uptake-augmented physician interpretation. The best result was achieved using the white-on-black images (80.5%). Although this result represents only a marginal decrease in performance compared with uptake alone (82.6%) and physician interpretation (82.3%), the CNN was trained on only a single anterior neck image and had no inputs for either the thyroid uptake percentage or the biochemistry results. As a result, the comparative performance should be considered the physician rating without uptake values. In this regard, the 80.5% binary accuracy of the CNN was superior to the physician interpretation (61.0%) and the visual classification against salivary gland appearance (61.5%). Although this result does not suggest displacement of physician interpretation, it does indicate that the accuracy of physician reporting might be improved using the CNN algorithm when biochemistry results are not available.

CONCLUSION

Thyroid scintigraphy is useful in identifying hyperthyroid patients suitable for radioiodine therapy. Physician interpretation relies on an accurate thyroid function assessment (uptake) and an appropriately validated cutoff for the patient population (4.5% in this population). An inappropriate cutoff significantly undermines accuracy. ML ANN algorithms can be developed to improve accuracy as second-reader systems when biochemistry results are available. DL CNN algorithms can be developed to improve accuracy in the absence of biochemistry results. ML and DL do not displace the role of the physician in thyroid scintigraphy but can be used as second-reader systems to minimize errors and increase confidence.

DISCLOSURE

No potential conflict of interest relevant to this article was reported.

ACKNOWLEDGMENT

We thank the 3 physicians who performed the visual analysis of the images. We also thank Hugo Currie from the College of Engineering and Computer Science, Australian National University, Canberra, Australia, for producing the Fourier magnitude spectrum images for analysis.

KEY POINTS

QUESTION: Can ML and DL approaches improve semantic evaluation of thyroid scintigraphy and uptake in hyperthyroidism?

PERTINENT FINDINGS: ML algorithms can be developed to improve accuracy as second-reader systems when biochemistry results are available. DL CNN algorithms can be developed to improve accuracy in the absence of biochemistry results.

IMPLICATIONS FOR PATIENT CARE: ML and DL do not displace the role of the physician in thyroid scintigraphy but can be used as second-reader systems to minimize errors and increase confidence.

Footnotes

Published online Dec. 7, 2021.


Received for publication August 19, 2021.
Revision received November 10, 2021.


