Journal of Nuclear Medicine Technology
Research Article, Professional Development

Comparison of Large Language Models’ Performance on 600 Nuclear Medicine Technology Board Examination–Style Questions

Michael A. Oumano and Shawn M. Pickett
Journal of Nuclear Medicine Technology May 2025, jnmt.124.269335; DOI: https://doi.org/10.2967/jnmt.124.269335
Michael A. Oumano
  (1) Landauer Medical Physics, Glenwood, Illinois;
  (2) Department of Medicine and Biological Sciences, Brown University, Providence, Rhode Island; and
  (3) Department of Medical Physics and Radiation Safety, Rhode Island Hospital, Providence, Rhode Island
Shawn M. Pickett
  (1) Landauer Medical Physics, Glenwood, Illinois

[Figure: Visual Abstract]

Abstract

This study evaluated large language models (LLMs), with and without retrieval-augmented generation (RAG), across a range of nuclear medicine topics. The aim was to assess their potential as reliable tools for professional education and clinical decision-making.

Methods: We evaluated LLMs from OpenAI (the GPT-4o series), Google (Gemini), Cohere, Anthropic, and Meta (Llama3) across 15 nuclear medicine topics. Accuracy was assessed with a set of 600 sample questions covering clinical and technical domains of nuclear medicine. Overall accuracy was computed by averaging performance across the 15 topics. Performance was also compared between individual models.

Results: OpenAI's models achieved the highest overall accuracy when RAG was implemented. In particular, openai_nvidia_gpt-4o_final scored 0.787 and openai_mxbai_gpt-4o_final scored 0.783. Anthropic Opus and Google Gemini 1.5 Pro followed closely, with overall accuracies of 0.773 and 0.750 with RAG. Cohere and Llama3 models showed more variable performance, and the Llama3 ollama_llama3 model without RAG had the lowest accuracy. Discrepancies in question interpretation were observed, particularly for questions on complex clinical guidelines and for imaging-based questions.

Conclusion: LLMs show promising potential in nuclear medicine, performing especially well in areas such as radiation safety and skeletal system scintigraphy. This study also shows that adding a RAG workflow can increase the accuracy of an off-the-shelf model. However, challenges persist in handling nuanced guidelines and visual data, underscoring the need for further optimization of LLMs for medical applications.
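The abstract states that overall accuracy was obtained by averaging performance across topics. That is an unweighted (macro) mean of per-topic accuracies. Below is a minimal Python sketch of that calculation. It assumes each graded answer is stored as a hypothetical (topic, is_correct) pair. The function names and toy data are illustrative only; they are not the authors' code or results. For contrast, the sketch also computes pooled accuracy, defined as correct answers divided by all questions.

```python
from collections import defaultdict


def per_topic_accuracy(results):
    """Return {topic: fraction correct} from hypothetical (topic, is_correct) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for topic, is_correct in results:
        total[topic] += 1
        correct[topic] += int(is_correct)
    return {topic: correct[topic] / total[topic] for topic in total}


def overall_accuracy(results):
    """Unweighted (macro) mean of per-topic accuracies: every topic counts equally."""
    acc = per_topic_accuracy(results)
    return sum(acc.values()) / len(acc)


def pooled_accuracy(results):
    """Correct answers divided by all questions: every question counts equally."""
    results = list(results)
    return sum(int(is_correct) for _, is_correct in results) / len(results)


if __name__ == "__main__":
    # Hypothetical toy data for one model; NOT results from the study.
    toy = (
        [("radiation safety", True)] * 9
        + [("radiation safety", False)]
        + [("nuclear cardiology", True), ("nuclear cardiology", False)]
    )
    print(per_topic_accuracy(toy))
    print(f"topic-averaged: {overall_accuracy(toy):.3f}")
    print(f"pooled:         {pooled_accuracy(toy):.3f}")
```

With the toy data, radiation safety scores 9/10 = 0.9 and nuclear cardiology scores 1/2 = 0.5. The topic-averaged accuracy is therefore 0.7, while pooled accuracy is 10/12 ≈ 0.833. The two measures coincide whenever every topic contributes the same number of questions, but they can diverge otherwise. The abstract does not say how the 600 questions were distributed across the 15 topics.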

Keywords

  • nuclear medicine
  • AI models
  • retrieval-augmented generation
  • large language models
  • diagnostic accuracy
  • radiation safety

Footnotes

  • Published online May 9, 2025.


© 2025 SNMMI
