Conference proceeding
VIDEO CAPTIONING WITH TEMPORAL AND REGION GRAPH CONVOLUTION NETWORK
2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), Vol.2020-, pp.1-6
IEEE International Conference on Multimedia and Expo
01/01/2020
DOI: 10.1109/ICME46284.2020.9102967
Abstract
Video captioning aims to generate a natural language description for a given video clip, which includes not only spatial information but also temporal information. To better exploit the spatial-temporal information in videos, we propose a novel video captioning framework with a Temporal Graph Network (TGN) and a Region Graph Network (RGN). TGN focuses on utilizing the sequential information of frames, which most existing methods ignore. RGN is designed to explore the relationships among salient objects. Unlike previous work, we introduce a Graph Convolution Network (GCN) to encode frames with their sequential information and build a region graph to utilize object information. We also adopt a stacked GRU decoder with a coarse-to-fine structure for caption generation. Promising experimental results on two benchmark datasets (MSVD and MSR-VTT) show the effectiveness of our model.
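As a rough illustration of encoding frames with a graph convolution over their temporal order, the sketch below builds a chain graph that links each frame to its neighbours and applies one generic GCN propagation step, ReLU(Â H W). The abstract does not specify how the temporal graph is constructed, so the chain adjacency, the symmetric normalisation, and the names `chain_adjacency`, `normalize`, `matmul`, and `gcn_layer` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def chain_adjacency(num_frames):
    """Adjacency with self-loops linking each frame to its temporal neighbours.

    Assumption: a simple chain graph; the paper's actual graph may differ.
    """
    adj = [[0.0] * num_frames for _ in range(num_frames)]
    for i in range(num_frames):
        adj[i][i] = 1.0  # self-loop keeps the frame's own features
        if i + 1 < num_frames:
            adj[i][i + 1] = 1.0
            adj[i + 1][i] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalisation D^-1/2 A D^-1/2 of an adjacency matrix."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(features, weights, adj_norm):
    """One graph convolution step: ReLU(adj_norm @ features @ weights)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


# Three frames with toy 2-d features and an identity weight matrix.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
encoded = gcn_layer(frames, identity, normalize(chain_adjacency(3)))
```

Each output row mixes a frame's features with those of the frames directly before and after it, which is one simple way sequential information can enter a frame encoding.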
Details
- Title: Subtitle
- VIDEO CAPTIONING WITH TEMPORAL AND REGION GRAPH CONVOLUTION NETWORK
- Creators
- Xinlong Xiao - Fudan University
- Yuejie Zhang - Fudan University
- Rui Feng - Fudan University
- Tao Zhang - Shanghai Finance University
- Shang Gao - Deakin University
- Weiguo Fan - University of Iowa
- Resource Type
- Conference proceeding
- Publication Details
- 2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), Vol.2020-, pp.1-6
- Publisher
- IEEE
- Series
- IEEE International Conference on Multimedia and Expo
- DOI
- 10.1109/ICME46284.2020.9102967
- ISSN
- 1945-7871
- eISSN
- 1945-788X
- Number of pages
- 6
- Grant note
- 61976057; 61572140 / National Natural Science Foundation of China; National Natural Science Foundation of China (NSFC)
- 19ZR1417200 / Shanghai Natural Science Foundation; Natural Science Foundation of Shanghai
- 19YJA630116 / Humanities and Social Sciences Planning Fund of Ministry of Education of China
- Language
- English
- Date published
- 01/01/2020
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380479602771