Journal article
Deep cascaded cross-modal correlation learning for fine-grained sketch-based image retrieval
Pattern recognition, Vol.100, p.107148
04/2020
DOI: 10.1016/j.patcog.2019.107148
Abstract
- A simple yet effective pipeline for FG-SBIR is created through combining all the beneficial multimodal cues involved in sketches and annotated images.
- A deep cascaded neural network architecture with deep representation, embedding, and ranking is established for revealing multimodal relationships.
- Two extended image datasets are collected to validate the generalization ability of our scheme, which demonstrates its effectiveness for both SBIR and FG-SBIR.
Fine-grained Sketch-based Image Retrieval (FG-SBIR), which uses hand-drawn sketches to search for target object images, has recently drawn much attention. It is a challenging task because sketches and images belong to different modalities, and sketches are highly abstract and ambiguous. Existing solutions either focus on visual comparisons between sketches and images while ignoring the multimodal characteristics of annotated images, or treat retrieval as a one-time process. In this paper, we formulate FG-SBIR as a coarse-to-fine process and propose a Deep Cascaded Cross-modal Ranking Model (DCCRM) that exploits all the beneficial multimodal information in sketches and annotated images and improves both retrieval efficiency and top-K ranking effectiveness. We concentrate on constructing deep representations for sketches, images, and descriptions, and on learning optimized deep correlations across these different domains. Thus, for a given query sketch, the relevant images with fine-grained instance-level similarities in a specific category can be returned, satisfying the strict instance-level retrieval requirement of FG-SBIR. Experiments on a large quantity of public data yield very positive results.
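The coarse-to-fine idea in the abstract can be illustrated with a minimal sketch. This is not the authors' DCCRM: the function names, the hand-written two-dimensional vectors, the use of cosine similarity, and the `shortlist_size` parameter are all assumptions made for illustration. In the paper the representations are learned by deep networks, whereas here they are simply given. The sketch shows only the cascade structure: a coarse stage narrows the gallery to a shortlist, and a fine stage re-ranks that shortlist to produce the top-K result.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cascaded_retrieve(query_coarse, query_fine, gallery, shortlist_size, top_k):
    """Two-stage retrieval: coarse shortlist, then fine re-ranking.

    `gallery` is a list of dicts with hypothetical keys "id", "coarse", "fine".
    """
    # Stage 1 (coarse): keep the items most similar at the category level.
    shortlist = sorted(
        gallery,
        key=lambda item: cosine(query_coarse, item["coarse"]),
        reverse=True,
    )[:shortlist_size]
    # Stage 2 (fine): re-rank only the shortlist at the instance level.
    reranked = sorted(
        shortlist,
        key=lambda item: cosine(query_fine, item["fine"]),
        reverse=True,
    )
    return [item["id"] for item in reranked[:top_k]]


# Toy gallery with made-up vectors (assumption: not data from the paper).
gallery = [
    {"id": "shoe_a", "coarse": [1.0, 0.0], "fine": [0.9, 0.1]},
    {"id": "shoe_b", "coarse": [0.9, 0.1], "fine": [0.1, 0.9]},
    {"id": "chair_a", "coarse": [0.0, 1.0], "fine": [0.1, 0.9]},
]

print(cascaded_retrieve([1.0, 0.0], [0.0, 1.0], gallery, shortlist_size=2, top_k=1))
```

In this toy gallery, `chair_a` matches the fine query as well as `shoe_b` does, but the coarse stage drops it first, so only items from the matching category reach the fine re-ranking step.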
Details
- Title: Subtitle
- Deep cascaded cross-modal correlation learning for fine-grained sketch-based image retrieval
- Creators
- Yanfei Wang - Fudan University; Fei Huang - Fudan University; Yuejie Zhang - Fudan University; Rui Feng - Fudan University; Tao Zhang - Shanghai University of Finance and Economics; Weiguo Fan - University of Iowa
- Resource Type
- Journal article
- Publication Details
- Pattern recognition, Vol.100, p.107148
- Publisher
- Elsevier Ltd
- DOI
- 10.1016/j.patcog.2019.107148
- ISSN
- 0031-3203
- eISSN
- 1873-5142
- Grant note
- DOI: 10.13039/100007219, name: Natural Science Foundation of Shanghai, award: 19ZR1417200; DOI: 10.13039/100008893, name: University of Iowa; DOI: 10.13039/501100001809, name: National Natural Science Foundation of China, award: 61572140, 61976057; DOI: 10.13039/501100013139, name: Humanities and Social Science Fund of Ministry of Education of China, award: 19YJA630116; DOI: 10.13039/501100018625, name: Science and Technology Innovation Plan Of Shanghai Science and Technology Commission, award: 16JC1420401, 17DZ1100504
- Language
- English
- Date published
- 04/2020
- Academic Unit
- Business Analytics
- Record Identifier
- 9984380476502771