Reference Standard for Validation of Age-Related Macular Degeneration Screening Algorithms
Journal article   Open access   Peer reviewed

Amitha Domalpally, Emily Y Chew, Malvina B Eydelman, Tiarnán D L Keenan, Pearse A Keane, Aaron Y Lee, Cecilia S Lee, Eleonora M Lad, Jennifer I Lim, Anat Lowenstein, …
Ophthalmology (Rochester, Minn.)
04/16/2026
DOI: 10.1016/j.ophtha.2026.04.013
PMID: 41999903
URL: https://doi.org/10.1016/j.ophtha.2026.04.013
Published (Version of record) Open Access

Abstract

PURPOSE: Artificial intelligence (AI)-based screening models hold promise for identifying individuals with undiagnosed age-related macular degeneration (AMD) in non-specialist settings. A standardized reference framework for image labeling is needed to enable consistent training, validation, and deployment of AI-based screening algorithms. The goal of the present study is to establish expert consensus on an image-based reference standard for labeling AMD.

DESIGN: Modified Delphi consensus study.

SUBJECTS/PARTICIPANTS: Fellowship-trained retina specialists, ophthalmologists, AI specialists, and imaging specialists.

METHODS: A prespecified Delphi process was conducted using structured surveys. Over two rounds, the surveys assessed panelists' opinions on existing reference standards, including the AREDS scale and the Beckman scale, and on imaging modalities such as color fundus photography, optical coherence tomography (OCT), and autofluorescence. The surveys also evaluated imaging features of AMD, including drusen, pseudodrusen, and pigment changes, as well as referral criteria. Responses were rated on a 9-point Likert scale, and consensus was defined by predefined statistical thresholds for agreement.

MAIN OUTCOME MEASURES: Agreement on key elements of a reference standard.

RESULTS: Consensus was reached on adopting the Beckman Classification as the level 1 reference standard (median score 8; agreement). Use of OCT for identifying key AMD features, including drusen, geographic atrophy (GA), and choroidal neovascularization (CNV), also reached consensus (median scores 8.5-9; agreement). Pigment change detection did not reach consensus (median 7.5; uncertain), nor did screening age thresholds (median 8; uncertain). Referral thresholds reached consensus, including urgent referral for neovascular AMD and non-urgent referral for GA and intermediate AMD (median 9; agreement).

CONCLUSIONS: This study defines a consensus-based reference standard for labeling AMD from images for AI-based screening. These recommendations are intended to support consistent AI model development and evaluation, while remaining distinct from clinical practice guidelines.
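
As a purely illustrative aid, the referral thresholds reported in the results could be encoded as a lookup for labeling algorithm outputs. The sketch below is a minimal example, not the panel's or the authors' implementation. The names `AmdLabel`, `Referral`, `REFERRAL_BY_LABEL`, and `referral_for` are hypothetical, and the label set is an assumption made for illustration. Only the three mappings stated in the abstract are encoded; every other label deliberately returns `None`, because the abstract gives no referral rule for it.

```python
from enum import Enum
from typing import Optional


class AmdLabel(Enum):
    """Hypothetical label set for illustration; not taken from the paper."""
    NO_AMD = "no AMD"
    EARLY = "early AMD"
    INTERMEDIATE = "intermediate AMD"
    GEOGRAPHIC_ATROPHY = "geographic atrophy"
    NEOVASCULAR = "neovascular AMD"


class Referral(Enum):
    URGENT = "urgent"
    NON_URGENT = "non-urgent"


# Only the mappings stated in the abstract: urgent referral for neovascular AMD,
# non-urgent referral for GA and intermediate AMD.
REFERRAL_BY_LABEL = {
    AmdLabel.NEOVASCULAR: Referral.URGENT,
    AmdLabel.GEOGRAPHIC_ATROPHY: Referral.NON_URGENT,
    AmdLabel.INTERMEDIATE: Referral.NON_URGENT,
}


def referral_for(label: AmdLabel) -> Optional[Referral]:
    """Return the referral urgency for a label, or None if the abstract states none."""
    return REFERRAL_BY_LABEL.get(label)
```

Returning `None` for unlisted labels keeps the sketch from implying a recommendation that the abstract does not state.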
