Can Multimodal Large Language Models Diagnose Diabetic Retinopathy from Fundus Photos? A Quantitative Evaluation

Jesse A. Most; Evan H. Walker; Nehal N. Mehta; Ines D. Nagel; Jimmy S. Chen; Jonathan F. Russell; Nathan L. Scott; Shyamanga Borooah

doi:10.1016/j.xops.2025.100911

Journal article

Open access

Peer reviewed

Ophthalmology science (Online), Vol.6(1), 100911

01/2026

DOI: 10.1016/j.xops.2025.100911

PMCID: PMC12478077

PMID: 41030829

Files and links (1)

https://doi.org/10.1016/j.xops.2025.100911

Published (Version of record) Open Access

Abstract

Objective: To evaluate the diagnostic accuracy of four multimodal large language models (MLLMs) in detecting and grading diabetic retinopathy (DR) using their new image analysis features.

Design: Single-center retrospective study.

Subjects: Patients diagnosed with pre-diabetes and diabetes.

Methods: Ultrawide field (UWF) fundus images from patients seen at the University of California San Diego were graded for DR severity by three retina specialists using the Early Treatment Diabetic Retinopathy Study (ETDRS) classification system to establish ground truth. Four MLLMs (ChatGPT-4o, Claude 3.5 Sonnet, Google Gemini 1.5 Pro, and Perplexity Llama 3.1 Sonar/Default) were tested using four distinct prompts. The prompts assessed multiple-choice disease diagnosis, binary disease classification, and disease severity. MLLMs were assessed for accuracy, sensitivity, and specificity in identifying the presence or absence of DR, and relative disease severity.

Main Outcome Measures: Accuracy, sensitivity, and specificity of diagnosis.

Results: A total of 309 eyes from 188 patients were included in the study. Average patient age was 58.7 (56.7, 60.7) years, with 55.3% being female. After specialist grading, 70.2% of eyes had DR of varying severity and 29.8% had no DR. For disease identification with multiple choices provided, Claude and ChatGPT scored significantly higher (P < 0.0006, per Bonferroni correction) than other MLLMs for accuracy (0.608, 0.566) and sensitivity (0.618, 0.641). In binary DR versus No DR classification, accuracy was highest for ChatGPT (0.644) and Perplexity (0.602). Sensitivity varied [ChatGPT (0.539), Perplexity (0.488), Claude (0.179), and Gemini (0.042)], while specificity for all models was relatively high (range: 0.870 to 0.989). For the DR severity prompt with the best overall results (Prompt 3.1), no significant differences between models were found in accuracy [Perplexity (0.411), ChatGPT (0.395), Gemini (0.392), Claude (0.314)]. All models demonstrated low sensitivity [Perplexity (0.247), ChatGPT (0.229), Gemini (0.224), Claude (0.184)]. Specificity ranged from 0.840 to 0.866.

Conclusion: MLLMs are powerful tools that may eventually assist retinal image analysis. Currently, however, there is variability in the accuracy of image analysis, and diagnostic performance falls short of clinical standards for safe implementation in diabetic retinopathy diagnosis and grading. Further training and correction of common errors may enhance their clinical utility.
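
The three outcome measures named in the abstract can be illustrated with a minimal sketch for the binary DR versus No DR case. This is not the authors' analysis code: the names `BinaryCounts`, `tally`, and the toy labels below are hypothetical, the data are invented for illustration and are not taken from the study, and the sketch assumes the standard definitions of accuracy, sensitivity (true positive rate), and specificity (true negative rate).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryCounts:
    """Confusion-matrix counts for a binary DR / No DR task (hypothetical helper)."""
    tp: int  # DR present, model says DR
    fp: int  # DR absent, model says DR
    tn: int  # DR absent, model says No DR
    fn: int  # DR present, model says No DR


def tally(truth: Sequence[bool], predicted: Sequence[bool]) -> BinaryCounts:
    """Count outcomes; True means 'DR present'."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    counts = BinaryCounts(tp=0, fp=0, tn=0, fn=0)
    for t, p in zip(truth, predicted):
        if t and p:
            counts.tp += 1
        elif t and not p:
            counts.fn += 1
        elif not t and p:
            counts.fp += 1
        else:
            counts.tn += 1
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy(c: BinaryCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


def sensitivity(c: BinaryCounts) -> float:
    # Share of eyes with DR that the model flags as DR.
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: BinaryCounts) -> float:
    # Share of eyes without DR that the model labels No DR.
    return _ratio(c.tn, c.tn + c.fp)


# Toy labels (invented, not study data): four eyes with DR, two without.
toy_truth = [True, True, True, True, False, False]
toy_predicted = [True, False, False, False, False, False]
toy_counts = tally(toy_truth, toy_predicted)
```

On the toy labels, the model misses three of the four DR eyes but never raises a false alarm, so sensitivity is low while specificity is high. This is the same qualitative pattern the abstract reports for the binary prompt, though the toy numbers carry no evidential weight.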

Diabetic retinopathy

Ultra-widefield fundus photography

Multimodal large language model

Artificial intelligence

Image analysis

Details

Title: Can Multimodal Large Language Models Diagnose Diabetic Retinopathy from Fundus Photos? A Quantitative Evaluation
Creators: Jesse A. Most - University of California San Diego
Evan H. Walker - University of California San Diego
Nehal N. Mehta - University of California San Diego
Ines D. Nagel - University of California San Diego
Jimmy S. Chen - University of California San Diego
Jonathan F. Russell - University of Iowa
Nathan L. Scott - University of California San Diego
Shyamanga Borooah - University of California San Diego
Resource Type: Journal article
Publication Details: Ophthalmology science (Online), Vol.6(1), 100911
DOI: 10.1016/j.xops.2025.100911
PMID: 41030829
PMCID: PMC12478077
NLM abbreviation: Ophthalmol Sci
ISSN: 2666-9145
eISSN: 2666-9145
Publisher: ELSEVIER
Language: English
Electronic publication date: 08/2025
Date published: 01/2026
Academic Unit: Ophthalmology and Visual Sciences
Record Identifier: 9984946696602771
