Journal article
Cross-modal retrieval with dual multi-angle self-attention
Journal of the Association for Information Science and Technology, Vol.72(1), pp.46-65
01/2021
DOI: 10.1002/asi.24373
Abstract
In recent years, cross-modal retrieval has been a popular research topic in both fields of computer vision and natural language processing. There is a huge semantic gap between different modalities on account of heterogeneous properties. How to establish the correlation among different modality data faces enormous challenges. In this work, we propose a novel end-to-end framework named Dual Multi-Angle Self-Attention (DMASA) for cross-modal retrieval. Multiple self-attention mechanisms are applied to extract fine-grained features for both images and texts from different angles. We then integrate coarse-grained and fine-grained features into a multimodal embedding space, in which the similarity degrees between images and texts can be directly compared. Moreover, we propose a special multistage training strategy, in which the preceding stage can provide a good initial value for the succeeding stage and make our framework work better. Very promising experimental results over the state-of-the-art methods can be achieved on three benchmark datasets of Flickr8k, Flickr30k, and MSCOCO.
Details
- Title: Subtitle
- Cross-modal retrieval with dual multi-angle self-attention
- Creators
- Wenjie Li - Fudan University; Yi Zheng - Fudan University; Yuejie Zhang - Fudan University; Rui Feng - Fudan University; Tao Zhang - Shanghai University of Finance and Economics; Weiguo Fan - University of Iowa
- Resource Type
- Journal article
- Publication Details
- Journal of the Association for Information Science and Technology, Vol.72(1), pp.46-65
- Publisher
- Wiley
- DOI
- 10.1002/asi.24373
- ISSN
- 2330-1635
- eISSN
- 2330-1643
- Number of pages
- 20
- Grant note
- 61976057, 61572140 / National Natural Science Foundation of China (NSFC); 19ZR1417200 / Natural Science Foundation of Shanghai; 19YJA630116 / Humanities and Social Sciences Planning Fund of Ministry of Education of China; Henry Tippie Endowed Chair Fund from the University of Iowa
- Language
- English
- Date published
- 01/2021
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380404902771