Conference proceeding
Retina layers thickness guided vision transformer for glaucoma diagnosis
Vol.13407, pp.1340734-1340734-8
Progress in Biomedical Optics and Imaging
04/04/2025
DOI: 10.1117/12.3047523
Abstract
Purpose: Convolutional neural networks (CNNs) encode medical image features in a latent space for diagnostic classification and decode generated images for segmentation or prognosis. However, image compression in the encoder reduces resolution, which makes the resulting activation maps hard to interpret. Vision Transformers (ViTs) process small image patches in parallel and thereby preserve focused attention maps. We applied a retinal layer thickness guided ViT to glaucoma diagnosis, aiming for pathology-related attention maps and high accuracy.
Materials and Methods: Eighty patients, balanced between glaucoma and normal groups, were scanned with optical coherence tomography (OCT) over multiple visits. Macular and optic nerve head (ONH) OCT scans were acquired at each visit, and the GCIPL, ILM, RNFL, and RPE layers were segmented with an in-house algorithm. The dataset contained more than 3000 samples, obtained by cross-assembling macular and ONH layers across visits. We evaluated binary classification using the data-efficient image transformer (DeiT) with retinal layer thickness maps and en face images. DeiT was tuned with transfer learning and with re-training, then compared with benchmark models including ResNet18 and Swin transformers.
Results: DeiT re-trained on retinal layer thickness maps achieved the highest classification accuracy, 1.0 for training and 0.99 for testing. With en face images, transfer learning yielded a test accuracy of 0.95, which was 3% higher than re-training. With transfer learning, DeiT outperformed the ResNet18, Swin, and Swin V2 models by 2-10% in accuracy. DeiT attention maps detected low-contrast regional pathology with re-training, whereas transfer learning was more sensitive to high-contrast neuron bundles or vessels.
Conclusion: DeiT, a distilled model with superior small-object detection, outperformed larger state-of-the-art ViT models for glaucoma diagnosis. Models with a suitable design should be selected for medical applications rather than simply pursuing ever-larger model size. The choice between transfer learning and re-training should depend on whether the pathology is fine-texture or regional.
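Illustrative note (not part of the published abstract): the patch-based input that the abstract attributes to ViT models can be sketched as follows. This is a minimal, dependency-free sketch and not the authors' pipeline. The function name `split_into_patches`, the toy 4x4 map, and the 2x2 patch size are all hypothetical; the abstract does not state the input resolution or patch size used.

```python
from typing import List

Image = List[List[float]]  # 2-D grid: rows of pixel (or thickness) values


def split_into_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Split a 2-D map into non-overlapping square patches.

    Each patch is flattened row-major, which is the form a ViT-style
    model would project into a token embedding.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    # Walk the grid patch by patch, top-to-bottom then left-to-right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical 4x4 "thickness map" split into 2x2 patches (assumed sizes).
toy_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
toy_patches = split_into_patches(toy_map, patch_size=2)
```

Because every patch covers a fixed, known region of the input, per-patch attention weights can be mapped back to locations on the thickness map, which is the property the abstract relies on for interpretable attention maps.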
Details
- Title: Subtitle
- Retina layers thickness guided vision transformer for glaucoma diagnosis
- Creators
- Wenjun Yang - The Univ. of Iowa Hospitals and Clinics (United States); Jui-Kai Wang - University of Iowa; Randy H. Kardon - University of Iowa
- Contributors
- Susan M. Astley (Editor) - University of Manchester; Axel Wismüller (Editor) - University of Rochester
- Resource Type
- Conference proceeding
- Publication Details
- Vol.13407, pp.1340734-1340734-8
- Publisher
- SPIE
- Series
- Progress in Biomedical Optics and Imaging
- DOI
- 10.1117/12.3047523
- ISSN
- 1605-7422
- Grant note
- Department of Veteran Affairs (VA) Rehabilitation Research and Development (RR&D): I50RX003002, I01RX003797; National Institutes of Health (NIH): R01EY031544
This study was supported, in part, by the Department of Veteran Affairs (VA) Rehabilitation Research and Development (RR&D) I50RX003002, RR&D I01RX003797, and National Institutes of Health (NIH) R01EY031544.
- Language
- English
- Date published
- 04/04/2025
- Academic Unit
- Iowa Neuroscience Institute; Ophthalmology and Visual Sciences
- Record Identifier
- 9984813173702771