Balanced single-shot object detection using cross-context attention-guided network

Shuyu Miao; Shanshan Du; Rui Feng; Yuejie Zhang; Huayu Li; Tianbi Liu; Lin Zheng; Weiguo Fan

doi:10.1016/j.patcog.2021.108258

Journal article

Peer reviewed

Shuyu Miao, Shanshan Du, Rui Feng, Yuejie Zhang, Huayu Li, Tianbi Liu, Lin Zheng and Weiguo Fan

Pattern recognition, Vol.122, p.108258

02/2022

DOI: 10.1016/j.patcog.2021.108258

Abstract

•A light but effective Cross-context Attention-guided Network is proposed to maximize the balance between accuracy and speed for real-world object detection applications.
•A Cross-context Attention Mechanism (CCAM) is established to take multi-region context information into consideration simultaneously, such as cross-region, adjacent-region, channel-region, and spatial-region context.
•Three attention mechanisms are introduced to guide network optimization toward better region focusing.

In real-world application scenarios, object detection usually faces two technical requirements: high accuracy and high speed. Although the latest detection frameworks based on anchor-free detection have achieved outstanding performance, they cannot be widely used in real-world scenarios due to their model complexity and slow speed. In this paper, inspired by the cross-context attention mechanism of the human visual system, we propose a light but effective single-shot detection framework using a Cross-context Attention-guided Network (CCAGNet) to balance accuracy and speed. CCAGNet uses an attention-guided mechanism to highlight the interaction of object-synergy regions and suppress non-object-synergy regions by combining the Cross-context Attention Mechanism (CCAM), the Receptive Field Attention Mechanism (RFAM), and the Semantic Fusion Attention Mechanism (SFAM). The main contribution of our work is a novel attention mechanism that takes the context information of channel, spatial, cross-, and adjacent regions into consideration simultaneously. Extensive experiments on public benchmark datasets demonstrate the feasibility and effectiveness of our method. To the best of our knowledge, CCAGNet obtains state-of-the-art performance on both PascalVOC and MSCOCO, with an excellent trade-off between accuracy and speed among single-shot detectors. In particular, the Average Precision (AP) for small object detection on MSCOCO is significantly improved, by 17.0%.
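The abstract names channel and spatial context as two of the cues the attention mechanism combines, but it does not give the formulation. As a rough illustration only, the sketch below shows generic, parameter-free channel and spatial gating on a small feature map. It is not the paper's CCAM: the function names, the use of mean pooling, and the sigmoid gates are all assumptions made for this example, and the cross-region and adjacent-region components are not reproduced because the abstract does not describe them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(fmap):
    """One gate per channel: sigmoid of the channel's global average.

    fmap is a list of C channels, each an H x W nested list (assumed layout).
    """
    gates = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def spatial_gates(fmap):
    """One gate per position: sigmoid of the mean across channels."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]


def apply_attention(fmap):
    """Rescale every activation by its channel gate and its spatial gate."""
    cg = channel_gates(fmap)
    sg = spatial_gates(fmap)
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[fmap[k][i][j] * cg[k] * sg[i][j] for j in range(w)] for i in range(h)]
        for k in range(len(fmap))
    ]


# Toy input: channel 0 is silent, channel 1 is uniformly active.
fmap = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
out = apply_attention(fmap)
```

In this toy input the silent channel stays at zero while the active channel is scaled by both gates, which is the "highlight some regions, suppress others" behaviour in its simplest form. A learned attention module would replace the fixed pooling-plus-sigmoid gates with trainable layers.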

Accuracy and speed balance

Cross-context attention mechanism

Cross-context attention-guided network

Receptive field attention mechanism

Semantic fusion attention mechanism

Details

Title: Balanced single-shot object detection using cross-context attention-guided network
Creators: Shuyu Miao - Fudan University
Shanshan Du - Fudan University
Rui Feng - Fudan University
Yuejie Zhang - Fudan University
Huayu Li - Fudan University
Tianbi Liu - Fudan University
Lin Zheng - Antea Group
Weiguo Fan - University of Iowa
Resource Type: Journal article
Publication Details: Pattern recognition, Vol.122, p.108258
Publisher: Elsevier Ltd
DOI: 10.1016/j.patcog.2021.108258
ISSN: 0031-3203
eISSN: 1873-5142
Language: English
Date published: 02/2022
Academic Unit: Business Analytics
Record Identifier: 9984380390502771

Metrics

11 Record Views

17 Times Cited - Web of Science
