Retina layers thickness guided vision transformer for glaucoma diagnosis

Wenjun Yang; Jui-Kai Wang; Randy H. Kardon

doi:10.1117/12.3047523

Conference proceeding

Vol.13407, pp.1340734-1340734-8

Progress in Biomedical Optics and Imaging

04/04/2025

Abstract

Purpose: Convolutional neural networks (CNNs) encode medical image features in a latent space for diagnostic classification and decode generated images for segmentation or prognosis. However, image compression in the encoder loses resolution, which makes the resulting activation maps hard to interpret. The Vision Transformer (ViT) processes small image patches in parallel and so preserves focused attention maps. We applied a retinal layer thickness guided ViT to glaucoma diagnosis, obtaining pathology-related attention maps and high accuracy.

Materials and Methods: Eighty patients, balanced between glaucoma and normal groups, were scanned with optical coherence tomography (OCT) over multiple visits. Macular and optic nerve head (ONH) OCT scans were acquired at each visit, and the GCIPL, ILM, RNFL, and RPE layers were segmented with an in-house algorithm. The dataset contained more than 3000 samples, built by cross-assembling macular and ONH layers across multiple visits. We evaluated binary classification using the data-efficient image transformer (DeiT) with retinal layer thickness maps and en face images. DeiT was tuned with transfer learning and with re-training, then compared with benchmark models including ResNet18 and Swin transformers.

Results: DeiT tuned by re-training with retinal layer thickness maps achieved the highest classification accuracy, 1.0 for training and 0.99 for testing. For en face images, transfer learning yielded a test accuracy of 0.95, which was 3% higher than re-training. With transfer learning, DeiT outperformed the ResNet18, Swin, and Swin V2 models in accuracy by 2-10%. DeiT attention maps detected low-contrast regional pathology with re-training, while they were more sensitive to high-contrast neuron bundles or vessels with transfer learning.

Conclusion: DeiT, a distilled model with superior small-object detection, outperformed larger state-of-the-art ViT models for glaucoma diagnosis. Models with a suitable design should be selected for medical applications rather than simply pursuing ever-larger model size. The choice between transfer learning and re-training should depend on whether the pathology is fine-textured or regional.
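
Illustrative note (not part of the published abstract): the patch-based processing mentioned in the Purpose can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline. The abstract gives neither the input resolution nor the patch size, so the 224 x 224 map and the 16 x 16 patches are assumed values (common DeiT defaults), and `split_into_patches` and `thickness_map` are hypothetical names.

```python
def split_into_patches(image, patch_size):
    """Split a 2-D grid (list of rows) into non-overlapping square patches.

    Patches are returned in row-major order, the order in which a ViT
    typically turns them into a token sequence.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


# Assumption: a 224 x 224 thickness map filled with placeholder values.
thickness_map = [[float(r * 224 + c) for c in range(224)] for r in range(224)]

# Assumption: 16 x 16 patches, giving a 14 x 14 grid of patches.
patches = split_into_patches(thickness_map, 16)
print(len(patches))  # 196
```

Because each patch keeps its position in the grid, an attention weight assigned to a patch can be mapped back to a specific region of the thickness map, which is what makes patch-level attention maps spatially interpretable.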

Glaucoma

RNFL

vision transformer

OCT

CNN

Details

Title: Retina layers thickness guided vision transformer for glaucoma diagnosis
Creators: Wenjun Yang - The Univ. of Iowa Hospitals and Clinics (United States)
Jui-Kai Wang - University of Iowa
Randy H. Kardon - University of Iowa
Contributors: Susan M. Astley (Editor) - University of Manchester
Axel Wismüller (Editor) - University of Rochester
Resource Type: Conference proceeding
Publication Details: Vol.13407, pp.1340734-1340734-8
Publisher: SPIE
Series: Progress in Biomedical Optics and Imaging
DOI: 10.1117/12.3047523
ISSN: 1605-7422
Grant note: This study was supported, in part, by the Department of Veterans Affairs (VA) Rehabilitation Research and Development (RR&D) grants I50RX003002 and I01RX003797, and by National Institutes of Health (NIH) grant R01EY031544.
Language: English
Date published: 04/04/2025
Academic Unit: Iowa Neuroscience Institute; Ophthalmology and Visual Sciences
Record Identifier: 9984813173702771
