Conference proceeding
Disentangled Contrastive Representation Learning for Zero-Shot Biomedical Text Classification
Proceedings (IEEE International Conference on Data Mining), pp.1425-1434
11/12/2025
DOI: 10.1109/ICDM65498.2025.00152
Abstract
Zero-shot biomedical text classification requires accurately assigning biomedical text (e.g., scientific abstracts) to previously unseen labels (or concepts). Existing methods often struggle to generalize to novel concepts such as new diseases, drugs, and genes. To address this limitation, we propose a framework that combines feature disentanglement with contrastive learning. It separates each abstract into a content representation relevant for classification and a variance representation. This disentanglement ensures that the content features are invariant to writing style, improving generalization to unseen labels. A contrastive learning strategy further structures the latent space by encouraging semantic clustering and separation of categories. Moreover, we model intra-class variance as a shared distribution across labels to enable variational data augmentation, enhancing robustness. The framework uses a domain-specific biomedical language model for feature extraction and fixed label anchors for semantic alignment. In extensive experiments on the largest available biomedical corpus, the framework achieves superior performance on zero-shot multi-label classification tasks by learning discriminative and style-invariant representations.
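The abstract gives no implementation details, so the following is only an illustrative sketch of the ideas it names, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the slice-based split into content and variance parts (standing in for learned projection heads), the InfoNCE-style loss against fixed label anchors, the diagonal Gaussian used for augmentation, and the cosine threshold used for prediction.

```python
import math
import random


def split_embedding(h, content_dim):
    """Split an encoder output into (content, variance) parts.

    Assumption: a plain slice stands in for learned projection heads.
    """
    return h[:content_dim], h[content_dim:]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def anchor_contrastive_loss(content, anchors, positives, temperature=0.1):
    """InfoNCE-style loss against fixed label anchors (an assumed loss form).

    Pulls the content vector toward the anchors of its labels and pushes it
    away from the remaining anchors.
    """
    logits = [cosine(content, a) / temperature for a in anchors]
    top = max(logits)  # subtract the max for numerical stability
    log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
    # Mean negative log-probability over the positive labels.
    return -sum(logits[i] - log_denom for i in positives) / len(positives)


def sample_variance(mean, std, rng):
    """Draw a variance vector from a shared diagonal Gaussian (assumed form)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def augment(content, mean, std, rng):
    """Build a synthetic embedding: same content, freshly sampled variance."""
    return content + sample_variance(mean, std, rng)


def zero_shot_predict(content, anchors, threshold=0.5):
    """Return indices of label anchors whose similarity clears the threshold."""
    return [i for i, a in enumerate(anchors) if cosine(content, a) >= threshold]


# Toy data: three fixed label anchors in a 2-d content space.
rng = random.Random(0)
anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
h = [0.9, 0.1, 0.3, -0.2]  # pretend encoder output: 2 content + 2 variance dims
content, variance = split_embedding(h, 2)

loss_match = anchor_contrastive_loss(content, anchors, positives=[0])
loss_mismatch = anchor_contrastive_loss(content, anchors, positives=[2])
augmented = augment(content, mean=[0.0, 0.0], std=[1.0, 1.0], rng=rng)
predicted = zero_shot_predict(content, anchors)
```

In this toy setup the loss is lower when the positive label is the anchor closest to the content vector, and prediction for an unseen label needs only a new anchor, with no retraining. Because the augmented embedding keeps the content part unchanged, a classifier that reads only the content part is unaffected by the sampled variance.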
Details
- Title: Subtitle
- Disentangled Contrastive Representation Learning for Zero-Shot Biomedical Text Classification
- Creators
- Ratri Mukherjee - University of Iowa; Shailesh Dahal - University of Iowa; Kishlay Jha - University of Iowa
- Resource Type
- Conference proceeding
- Publication Details
- Proceedings (IEEE International Conference on Data Mining), pp.1425-1434
- DOI
- 10.1109/ICDM65498.2025.00152
- eISSN
- 2374-8486
- Publisher
- IEEE
- Language
- English
- Date published
- 11/12/2025
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9985141959902771