Journal article
Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning
IEEE transactions on image processing, Vol.29, pp.694-709
01/01/2020
DOI: 10.1109/TIP.2019.2928144
PMID: 31331893
Abstract
Visual saliency and semantic saliency are important in image captioning. However, a single-phase image captioning model benefits little from limited saliency information without a saliency predictor. In this paper, a novel saliency-enhanced re-captioning framework via two-phase learning is proposed to enhance single-phase image captioning. In the framework, both visual and semantic saliency cues are distilled from the first-phase model and fused with the second-phase model for model self-boosting. The visual saliency mechanism can generate a saliency map and a saliency mask for an image without learning a saliency predictor. The semantic saliency mechanism sheds some light on the properties of words with the part-of-speech Noun in a caption. In addition, another type of saliency, sample saliency, is proposed to compute the saliency degree of each sample, which is helpful for more robust image captioning. How to combine the three types of saliency for a further performance boost is also examined. Our framework can treat an image captioning model as a saliency extractor, which may benefit other captioning models and related tasks. The experimental results on both the Flickr30k and MSCOCO datasets show that the saliency-enhanced models can obtain promising performance gains.
Details
- Title: Subtitle
- Re-Caption: Saliency-Enhanced Image Captioning Through Two-Phase Learning
- Creators
- Lian Zhou - Fudan University
- Yuejie Zhang - Fudan University
- Yu-Gang Jiang - Fudan University
- Tao Zhang - Shanghai University of Finance and Economics
- Weiguo Fan - University of Iowa
- Resource Type
- Journal article
- Publication Details
- IEEE transactions on image processing, Vol.29, pp.694-709
- Publisher
- IEEE
- DOI
- 10.1109/TIP.2019.2928144
- PMID
- 31331893
- ISSN
- 1057-7149
- eISSN
- 1941-0042
- Grant note
- University of Iowa (10.13039/100008893)
- 17DZ1100504; 16JC1420401 / Shanghai Municipal R&D Foundation
- 19ZR1417200 / Natural Science Foundation of Shanghai (10.13039/100007219)
- 19YJA630116 / Humanities and Social Sciences Planning Fund of Ministry of Education of China
- 61572140; 61976057 / National Natural Science Foundation of China (10.13039/501100001809)
- Language
- English
- Date published
- 01/01/2020
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380554602771