Journal article
Stacked Multimodal Attention Network for Context-Aware Video Captioning
IEEE transactions on circuits and systems for video technology, Vol.32(1), pp.31-42
01/2022
DOI: 10.1109/TCSVT.2021.3058626
Abstract
Recent neural models for video captioning usually employ an attention-based encoder-decoder framework. However, current approaches mainly attend to the motion and object features of the video when generating the caption and ignore potentially useful historical information. In addition, exposure bias and vanishing gradient problems persist in current caption generation models. In this paper, we propose a novel video captioning framework named Stacked Multimodal Attention Network (SMAN). It uses additional visual and textual historical information as context features during caption generation, employs a stacked architecture to process the different features gradually, and applies reinforcement learning and a coarse-to-fine training strategy to further improve the generated results. Both quantitative and qualitative experiments on the MSVD and MSR-VTT benchmark datasets show the effectiveness and feasibility of our framework. The code is available at https://github.com/zhengyi123456/SMAN.
Details
- Title
- Stacked Multimodal Attention Network for Context-Aware Video Captioning
- Creators
- Yi Zheng - Fudan University
- Yuejie Zhang - Fudan University
- Rui Feng - Fudan University
- Tao Zhang - Shanghai University of Finance and Economics
- Weiguo Fan - University of Iowa
- Resource Type
- Journal article
- Publication Details
- IEEE transactions on circuits and systems for video technology, Vol.32(1), pp.31-42
- Publisher
- IEEE
- DOI
- 10.1109/TCSVT.2021.3058626
- ISSN
- 1051-8215
- eISSN
- 1558-2205
- Grant note
- 19ZR1417200 / Shanghai Natural Science Foundation (10.13039/100007219)
- 20511101203; 20511102702; 20511101403; 19DZ2205700; 2021SHZDZX0103 / Science and Technology Development Plan of Shanghai Science and Technology Commission
- 19YJA630116 / Humanities and Social Sciences Planning Fund of Ministry of Education of China (10.13039/501100013139)
- 61976057; 61572140 / National Natural Science Foundation of China (10.13039/501100001809)
- Language
- English
- Date published
- 01/2022
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380476002771